spindump8930's comments | Hacker News

Is the proxy here LinkedIn messaging/mail instead of direct email?

https://marco.org/2013/10/25/linkedin-intro-insecurity

I don't recall all of the specific details, but I just remember reading about it at the time and how they bypassed some of iOS's security protections to do it. How they didn't get perma-banned from the various app stores back then is beyond me. It's a huge part of why I avoid installing apps on my phone in general.


It was mentioned elsewhere in the thread, but this article is relevant: https://www.cbssports.com/mlb/news/guardians-reliever-emmanu...

The ability to bet on short-term individual events means that even a single, otherwise nearly inconsequential pitch can be abused.


For many of us, a better Turing test is one that's contextual to a topic we CARE about. Lots of LLMs sound better than a randomly sampled human on a topic I don't know too much about (e.g. opinions on new movies). They're decent on engineering topics I only vaguely know about, but still below the bar (though getting better!) on topics I really care about.


Fairly certain that AI (meaning an expensive LLM-type model) isn't needed to detect spam a large amount of the time. Classical classification methods could work while also being more privacy friendly (e.g. running on device).
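As a rough sketch of what I mean (illustrative only; the example messages, labels, and the choice of scikit-learn are mine, not anything the carrier describes), a small bag-of-words classifier gets surprisingly far:

    # Minimal sketch: classical spam classification, no LLM involved.
    # Training data here is made up; a real filter would use a large labeled corpus.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    messages = [
        "Your package could not be delivered, click here to reschedule",
        "You have won a $500 gift card, claim now",
        "Running 10 minutes late, see you soon",
        "Can you send me the notes from today's meeting?",
    ]
    labels = ["spam", "spam", "ham", "ham"]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
    model.fit(messages, labels)

    print(model.predict(["Claim your free gift card today"]))  # likely ['spam']

A model like that is tiny and needs no network access, which is the point of keeping it on device.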

> According to AT&T, a big difference with its product is that it is built into the network itself.

This doesn't seem like an asset...


I recall that there were similarly motivated lawsuits over the earlier answer boxes that used to appear (prior to direct genai injection into the SERP). Whatever happened with those? I'm finding it difficult to search for.


That's not what this is about.

"I had no problem getting deterministic LLM outputs when I experimented with this 6 months ago" looks like you're using llama-cpp in that repo. This is about vllm serving many requests at once, at long sequence lengths.

> As it turns out, our request’s output does depend on the parallel user requests. Not because we’re somehow leaking information across batches — instead, it’s because our forward pass lacks “batch invariance”, causing our request’s output to depend on the batch size of our forward pass.

Your situation isn't really comparable.
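For anyone wondering how batch size can change the numbers at all, here's a minimal illustration of the underlying mechanism (just NumPy, not vLLM): floating-point addition isn't associative, so the same logical reduction split into different trees, which is roughly what different batch sizes and kernel configurations give you, need not produce bitwise-identical results.

    # Float addition is not associative:
    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False

    # So the same logical sum, split into different reduction trees
    # (as different batch sizes / kernel configs would split it), can differ.
    import numpy as np

    rng = np.random.default_rng(0)
    v = rng.standard_normal(2**20).astype(np.float32)

    s_one_pass = np.sum(v)                                   # one big reduction
    s_chunked  = np.sum(v.reshape(1024, 1024).sum(axis=1))   # chunked reduction
    print(s_one_pass, s_chunked, s_one_pass == s_chunked)    # usually not bitwise equal

Tiny differences like these get amplified once they flip a greedy argmax or a sampling decision, which is how the same prompt ends up with different completions.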


This topic is interesting, but the repo and paper have a lot of inconsistencies that make me think this work is hiding behind lots of dense notation and language. For one, the repo states:

> This implementation follows the framework from the paper “Compression Failure in LLMs: Bayesian in Expectation, Not in Realization” (NeurIPS 2024 preprint) and related EDFL/ISR/B2T methodology.

There doesn't seem to be a paper by that title, either as a preprint or as an actual NeurIPS publication. There is https://arxiv.org/abs/2507.11768, which has a different title and contains lots of inconsistencies with regard to the model. For example, from the appendix:

> All experiments used the OpenAI API with the following configuration:

> • Model: *text-davinci-002*

> • Temperature: 0 (deterministic)

> • Max tokens: 0 (only compute next-token probabilities)

> • Logprobs: 1 (return top token log probability)

> • Rate limiting: 10 concurrent requests maximum

> • Retry logic: Exponential backoff with maximum 3 retries

That model is not remotely appropriate for these experiments and was deprecated in 2023.

I'd suggest that anyone excited by this try running the codebase on GitHub and take a close look at the paper.


It's telling that neither the repo nor the linked paper has a single empirical demonstration of the ability to predict hallucination. Let's see a few prompts and responses! Instead, all I see is a lot of handwavy philosophical pseudo-math, like using Kolmogorov complexity and Solomonoff induction, two poster children for abstract concepts that are inherently not computable, as explicit algorithmic objectives.


Yeah, I saw no comparison with other methods in the paper, which is odd for an ML paper.


It seems like the repo is mostly if not entirely LLM generated; not a great sign.


Don't forget that theoretical peak performance is (probably) half the performance listed on the NVIDIA datasheet, because they used the "with sparsity" numbers! I've seen this bite folks who miss the * on the figure or aren't used to reading those spec sheets.
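A quick worked example of how this bites (the TFLOPS figures below are placeholders from memory, read them off your own spec sheet; the headline numbers typically assume 2:4 structured sparsity):

    # Placeholder numbers for illustration only.
    headline_tflops_with_sparsity = 1979.0                   # headline "with sparsity" figure (assumed)
    dense_peak_tflops = headline_tflops_with_sparsity / 2    # what dense matmuls can actually reach

    measured_tflops = 620.0                                  # made-up measured throughput
    print(f"utilization vs headline figure: {measured_tflops / headline_tflops_with_sparsity:.0%}")  # ~31%
    print(f"utilization vs dense peak:      {measured_tflops / dense_peak_tflops:.0%}")              # ~63%

Same measurement, but the denominator you pick halves or doubles how "efficient" the run looks.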


It's good to have this support in APIs, but grammar-constrained decoding has been around for quite a while, even before the contemporary LLM era (e.g. [1] is similar in spirit). Local vs global planning is a huge issue here, though: if you enforce local constraints at decoding time, an LLM might be forced to make suboptimal token decisions. This can result in a "global" (i.e. all-tokens) miss, where the probability of the constrained output is far lower than the probability of the optimal response (which may also conform to the grammar). Algorithms like beam search can alleviate this, but it's still difficult; there's a toy sketch of the issue below. This is one of the reasons XML tags work better than JSON outputs: fewer constraints on "weird" tokens.

[1] https://aclanthology.org/P17-2012/
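To make the local-vs-global tension concrete, here's the promised toy sketch (made-up vocabulary, logits, and function names, not any particular library's API). The mask step is locally fine, but the grammar can force the model onto a token it assigns low probability, and everything afterwards conditions on that forced prefix:

    # Toy grammar-constrained greedy step over a 4-token vocabulary.
    import numpy as np

    def log_softmax(logits):
        z = logits - logits.max()
        return z - np.log(np.exp(z).sum())

    def constrained_greedy_step(logits, valid_ids):
        """Mask grammar-invalid tokens, pick the best remaining one,
        and report its log-prob under the *unconstrained* distribution."""
        masked = np.full_like(logits, -np.inf)
        masked[valid_ids] = logits[valid_ids]
        tok = int(np.argmax(masked))
        return tok, float(log_softmax(logits)[tok])

    # The model strongly prefers token 3, but the grammar only allows {0, 1}.
    logits = np.array([1.0, 0.5, 2.0, 4.0])
    tok, logp = constrained_greedy_step(logits, valid_ids=[0, 1])
    print(tok, logp)   # token 0, at a much lower log-prob than the unconstrained choice

Summed over a whole sequence, those forced low-probability steps are why the constrained output can end up far less likely than the best unconstrained response that also satisfies the grammar; beam search over the masked logits helps because it compares whole prefixes rather than single steps.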


Why would this be? I'm probably missing something.

Don't these LLMs fundamentally work by outputting a vector over all possible tokens, with a strength assigned to each, which is then run through some form of sampler (typically some softmax variant that picks a random output from that distribution)? That output becomes the newest input token, and the process repeats until some limit is hit or an end-of-output token is selected.

I don't see why limiting that sampling to the set of valid tokens to fit a grammar should be harmful vs repeated generation until you get something that fits your grammar. (Assuming identical input to both processes.) This is especially the case if you maintain the relative probability of valid (per grammar) tokens in the restricted sampling. If one lets the relative probabilities change substantially, then I could see that giving worse results.

Now, I could certainly imagine that blindsiding the LLM with output restrictions, when it is expecting to be able to give a freeform response, might give worse results than prompting it to give output in that format without restricting it. (Simply because forcing an output format that isn't natural or a good fit for its training can mean the LLM struggles to produce good output.) I'd imagine the best results likely come from both textually prompting it to give output in your desired format and constraining the output to prevent it from accidentally going off the rails.
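For reference, the "maintain the relative probability of valid tokens" option described above looks something like this sketch (toy logits, no real tokenizer or grammar):

    # Zero out grammar-invalid tokens and renormalize, so the surviving
    # tokens keep their relative probabilities at this step.
    import numpy as np

    def sample_constrained(logits, valid_ids, rng):
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                    # unconstrained softmax
        masked = np.zeros_like(probs)
        masked[valid_ids] = probs[valid_ids]    # drop grammar-invalid tokens
        masked /= masked.sum()                  # renormalize the survivors
        return int(rng.choice(len(probs), p=masked))

    rng = np.random.default_rng(0)
    print(sample_constrained(np.array([1.0, 0.5, 2.0, 4.0]), valid_ids=[0, 1, 2], rng=rng))

One subtlety: doing this per token is not the same distribution as generating freely and rejecting whole outputs that violate the grammar, which is part of why constrained decoding can still shift quality even when relative probabilities are preserved at each step.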


Great article, thanks for sharing. The tension between "SQL is declarative" and "Write the query like this or it will OOM" has always made me uncomfortable.

I've used closed, mature systems with custom cost-based optimization and still find that I sometimes need to override the optimizer on join order. It would be interesting to see whether any of the benchmarks in this paper have similar shapes.
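As a toy illustration of that "override the optimizer" escape hatch (using SQLite here only because it's easy to run; the closed systems I mean have their own hint mechanisms), SQLite documents CROSS JOIN as pinning the left table as the outer loop:

    # SQLite sketch: same logical query, with and without forcing join order.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE big(id INTEGER PRIMARY KEY, small_id INT);
        CREATE TABLE small(id INTEGER PRIMARY KEY, v TEXT);
    """)

    queries = (
        # Planner is free to choose the nesting order:
        "SELECT * FROM big, small WHERE small.id = big.small_id",
        # CROSS JOIN pins 'small' as the outer loop:
        "SELECT * FROM small CROSS JOIN big WHERE small.id = big.small_id",
    )
    for sql in queries:
        print(sql)
        for row in con.execute("EXPLAIN QUERY PLAN " + sql):
            print("   ", row)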

