Hacker News | nearbuy's comments

If you include the encoder outputs as part of the state, then encoder-decoder LLMs are Markovian as well, while in token space, decoder-only LLMs are not. Anything can be a Markov process depending on what state you include. Humans, or even the universe itself, are Markovian. I don't see what insight about LLMs you and other commenters are gesturing at.
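A toy sketch of that point (hypothetical code, with a seeded random choice standing in for an LLM's sampling step): if you define the state as the full token prefix, the next state depends only on the current state, so the process is Markov by construction; define the state as just the last token and it isn't.

    import random

    def next_token(prefix):                  # stand-in for an LLM's sampling step
        random.seed(hash(prefix))            # output depends on the whole prefix...
        return random.choice(["a", "b", "c"])

    def step(state):                         # ...but as a function of this state,
        return state + (next_token(state),)  # the transition is memoryless

    state = ("<s>",)
    for _ in range(5):
        state = step(state)
    print(state)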

> (Also, Wikipedia is a tertiary source, as it is meant to only cite secondary sources, not primary sources.)

Wikipedia absolutely cites primary sources (as well as secondary and tertiary sources), and this is in accordance with their policy. Breaking news stories and scientific papers are some commonly used primary sources. You may be thinking of their "no original research" policy or their warnings against editors adding their own interpretation to primary sources.


Perhaps I was a bit strict, but Wikipedia is mainly meant to cite secondary sources.

When they explain where primary sources are allowed, they emphasize they "should be used carefully":

https://en.wikipedia.org/wiki/Wikipedia:Identifying_and_usin...

Also "Wikipedia articles should be based on reliable, published secondary sources, and to a lesser extent, on tertiary sources and primary sources."

https://en.wikipedia.org/wiki/Wikipedia:No_original_research...

The general idea is that primary sources have not been judged as notable by anyone. The fact that a secondary source considers it notable to include a primary source is a strong signal that the information has passed a first, minimal bar for inclusion in an encyclopedia.

And when primary sources are cited, Wikipedia is exceptionally clear that they must be cited only for verifiable statements of fact, not interpretations or synthesis. That's what secondary sources are for.


It was $15k in 2017. Now their GDP per capita (PPP) is about $29k. It's growing fast.


You realize that Shanghai has a world-class cost of living? People there probably make 30k+ just to live there, while rural residents, who aren't even allowed to consider living in Shanghai, bring that average back down. Quoting this average without noting the median is about as misleading as saying everyone in America makes 80k. (This is household income, so divide by the average household size of 1.7 or whatever.)

The AI had older data, but it proves the point: "While the average disposable income for urban residents in 2019 was approximately 39,244 CNY (roughly $5,800 USD), rural residents earned significantly less at 14,389 CNY (roughly $2,100 USD)"
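A toy illustration of the mean-vs-median point, with made-up numbers (not actual Chinese income data): a small high-earning group pulls the mean far above what the typical person makes.

    import statistics

    # 600 low earners, 300 middle, 100 high earners (made-up units)
    incomes = [2_000] * 600 + [6_000] * 300 + [60_000] * 100
    print("mean:  ", round(statistics.mean(incomes)))   # 9000
    print("median:", statistics.median(incomes))        # 2000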


Not sure why you're telling me this. I was giving an update for the parent comment which was complaining that China's GDP per capita was only $15k and much lower than Japan and Korea. You can tell them why they shouldn't compare GDP per capita across countries if you want.


It’s worth mentioning that the economic numbers reported from China about China are not terribly trustworthy.


They can learn to generalize patterns during training and develop some model of the world. So for example, if you were to train an LLM on chess games, it would likely develop an internal model of the chess board. Then when someone plays chess with it and gives a move like Nf3, it can use that internal model to help it reason about its next move.

Or if you ask it, "what is the capital of the state that has the city Dallas?", it understands the relations and can internally reason through the two-step process of Dallas is in Texas -> the capital of Texas is Austin. A simple n-gram model may occasionally get questions like that right by a lucky guess (though usually not), while we can see experimentally that the LLM is actually applying the proper reasoning to the question.
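On the chess example: one common way researchers check for this kind of internal board model is a linear probe on the network's hidden activations. A minimal sketch of the idea, using random placeholder arrays standing in for activations and board labels collected from a chess-trained model:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_positions, d_model = 2000, 512

    # Placeholders: hidden states after each move, and the true contents of one
    # board square (0 = empty, 1 = white piece, 2 = black piece) at that point.
    activations = rng.normal(size=(n_positions, d_model))
    square_contents = rng.integers(0, 3, size=n_positions)

    # If a simple linear probe predicts the square contents well above chance,
    # the board state is linearly decodable from the model's activations.
    probe = LogisticRegression(max_iter=1000)
    probe.fit(activations[:1500], square_contents[:1500])
    print("probe accuracy:", probe.score(activations[1500:], square_contents[1500:]))

(With random placeholders the accuracy is of course at chance; the interesting result in the research is that it comes out far above chance on real activations.)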

You can say this is all just advanced applications of memorizing and predicting patterns, but you would have to use a broad definition of "predicting patterns" that would likely include human learning. People who declare LLMs are just glorified auto-complete are usually trying to imply they are unable to "truly" reason at all.


This isn't true at all. LLMs absolutely do build world models, and researchers have shown this many times on smaller language models.

> techniques like vector tokenization

(I assume you're talking about the input embedding.) This is really not an important part of what gives LLMs their power. The core is that you have a large-scale artificial neural net. This is very different from an n-gram model and is probably capable of figuring out anything a human can figure out, given sufficient scale and the right weights. We don't have that yet in practice, but it's not due to a theoretical limitation of ANNs.

> probability distribution of the most likely next token given a preceding text.

What you're talking about is an autoregressive model. That's more of an implementation detail. There are other kinds of LLMs.

I think talking about how it's just predicting the next token is misleading. It implies it's not reasoning, not world-modeling, or is somehow limited. Reasoning is predicting, and predicting well requires world-modeling.


>This is really not an important part of what gives LLMs their power. The core is that you have a large scale artificial neural net.

What separates transformers from LSTMs is their ability to process the entire corpus in parallel rather than in sequence, and the inclusion of the more efficient "attention" mechanism that allows them to pick up long-range dependencies across a language. We don't actually understand the full nature of the latter, but I suspect it is the basis for the more "intelligent" actions of the LLM. There's quite a general range of problems that long-range dependencies can encompass, but that's still ultimately limited by language itself.
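For concreteness, a minimal single-head scaled dot-product attention in plain numpy (a sketch of the mechanism only, with no learned projections or masking): every position attends to every other position in one matrix multiply, which is what gives both the parallelism and the long-range dependencies.

    import numpy as np

    def attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                 # (seq, seq) pairwise similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
        return weights @ V                              # each output mixes all positions

    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 16))        # 8 tokens, 16-dim embeddings
    # In a real transformer, Q, K, V are learned linear projections of X.
    print(attention(X, X, X).shape)     # (8, 16)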

But if you're talking about this being fundamentally a probability-distribution model, I stand by that, because that's literally the mathematical model (softmax in the encoder and decoder) being used in transformers here. It very much is generating a probability distribution over the vocabulary and just picking the highest probability (or using beam search) as your next output.
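That decoding step, sketched with made-up logits (not any particular model's output):

    import numpy as np

    vocab = ["the", "cat", "sat", "mat", "dog"]
    logits = np.array([2.0, 0.5, 1.2, -0.3, 0.1])    # placeholder final-layer output

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                             # softmax -> distribution over vocab

    next_token = vocab[int(np.argmax(probs))]        # greedy pick; beam search would
    print(dict(zip(vocab, probs.round(3))), "->", next_token)  # keep several candidates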

>The LLMs absolutely world model and researchers have shown this many times on smaller language models.

We don't have a formal semantic definition of a "world model". I would take a lot of what these researchers are writing with a grain of salt, because something like that crosses more into philosophy (especially the limits of language and logic) than the hard engineering these researchers are trained in.


For what it's worth, I've been trying Opus 4.1 in VS Code through GitHub Copilot and it's been really bad. Maybe worse than Sonnet and GPT 4.1. I'm not sure why it was doing so poorly.

In one instance, I asked it to optimize a roughly 80-line C# method that matches some object positions by object ID and delta encodes their positions from the previous frame. It seemed to be confused about how all this should work and output completely wrong code. It had all the context it needed in the file, and the method is fairly self-contained. Other models did much better. GPT-5 understood what to do immediately.
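For context, a rough Python sketch of what that method is supposed to do (the real code is C# and the names here are made up): match each object to its previous-frame position by ID and store only the delta, falling back to the absolute position for new objects.

    def delta_encode(prev_frame, curr_frame):
        """prev_frame / curr_frame: dict of object_id -> (x, y)."""
        encoded = {}
        for obj_id, (x, y) in curr_frame.items():
            if obj_id in prev_frame:
                px, py = prev_frame[obj_id]
                encoded[obj_id] = ("delta", x - px, y - py)   # seen before: store offset
            else:
                encoded[obj_id] = ("absolute", x, y)          # new object: store as-is
        return encoded

    prev = {1: (10.0, 5.0), 2: (0.0, 0.0)}
    curr = {1: (10.5, 5.2), 3: (7.0, 7.0)}
    print(delta_encode(prev, curr))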

I tried a few other tasks/questions that also had underwhelming results. Now I've switched to using GPT-5.

If you have a quick prompt you'd like me to try, I can share the results.


Use Claude Code, the rest aren't worth the bother.


What does Claude Code do differently to Copilot Agent? Shouldn't they produce the same(ish) result if they're using the same model?


If they prompt the same and ..., they should.

But they definitely don't, taking into account whatever prompts the tools are really using (or MS is using a neutered version to reduce cost). So I would agree with the suggestion. Using Sonnet through Copilot seems very, very different from Cursor or Cline or Claude Code.

Using the exact same model, Copilot consistently fails to finish tasks or makes a mess. It is consistent at this across IDEs (i.e., using the JetBrains plugin generates nearly identical bad results as VS Code Copilot). I then discard all it did and try the exact same (user) prompt in Cursor or Claude Code or Cline with the same model, and it does the same task perfectly.


I've used both aider and opencode with both Opus and Sonnet. Opencode, at least initially, used Claude Code's exact prompt; and I found the results surprisingly different.

Perhaps it shouldn't be surprising; after all, we do want the LLMs to listen to the prompts and act differently. And the Claude team will presumably be tuning Claude and Claude Code's prompts to each other to optimize their own experience, so it's perhaps not surprising that Claude and Claude Code's prompts work well together.


Copilot sucks more at applying what the model is instructing it to do


To me it seems that Opus is really good at writing code if you give it a spec. The other day I had GPT come up with a spec for a DnD text game that uses the GPT API. It one-shotted a 1k-line program.

However, if I'm not detailed with it, it does seem to make weird choices that end up being unmaintainable. It's like it has poor creative instincts but is really good at following the directions you give it.


Wait, are you talking about Opus or GPT? Which GPT? You switched models mid-sentence.


GPT 4o came up with a design spec that I gave to Opus to implement.


Opus seems to need more babysitting IME, which is great if you are going to actually pair program. Terrible if you like leaving it to do its own thing or try to do multiple things at once.


I just want a model that feels like an extension of me. For example, if there's a task I can describe in one sentence - "add a REST API for user management in the DB, and make sure only users in the admin group are allowed to use it" - it would result in an API endpoint that's properly wired up to the right places, and the model does what I tell it, and nothing else, even if more would logically follow from what I told it.

And if it gets confused, needs clarification, or has its own initiative, I want it to stop and ask.

Oh, and it needs to be fast: its tokens per minute should be as fast as I can read what it generates (and I can read boilerplate-y code quite fast). It shouldn't stop and think on every prompt, only when it needs to, and it should be much faster and more granular in backtracking.

The loop of waiting on the AI then having to fix and steer it constantly as it doggedly follows its own ideas has really taken the enjoyment out of vibe coding for me.


Have it break the problem into phases. Have it write unit tests after every phase. Only move forward after all the tests for the phase have passed. I'm using the free Qwen3-Coder, and with proper prompting it's fairly good.


That's insightful.

I spend a lot of time planning tasks, generating various documents per pr (requirements, questions, todo), having AI poke my ideas (business/product/ux/code-wise) etc.

After 45 minutes of back and forth in general we end up with a detailed plan.

This also has many benefits:
- writing tests becomes very simple (unit, integration, E2Es)
- writing documentation becomes very simple
- writing meaningful PRs becomes very simple

It is quite boring though, not gonna lie. But that's a price I have accepted for quality.

Also, clearing up the ideas so much beforehand often leads me to come up with creative ideas later in the day, when I go for walks and mentally review what we've done and how.


You might want to try Claude Code if you haven't. It's perfect for exactly this plan, then build flow with a ton of documents. A colleague set up some strict code guidelines, right down to say, put constructors at the top, constants at the bottom, use this name for this, snake case for that. Code quality just shoots up with these details. Can't just hack away with a blunt axe.

People tend to hate Claude Code because it's not vibe coding anymore but it was never really meant to be.


Yes, I use Claude Code a lot, but I'm on the $20 tier, so I've never seen Opus in action (I think it's Sonnet-only?).


It's not the case. Effective altruists give to dozens of different causes, such as malaria prevention, environmentalism, animal welfare, and (perhaps most controversially) extinction risk. It can't tell you which root values to care about. It just asks you to consider whether the charity is impactful.

Even if an individual person chooses to direct all their donations to a single cause, there's no way to get everyone to donate to a single cause (nor is EA attempting to). Money gets spread around because people have different values.

It absolutely does take some money away from other causes, but only in the sense that all charities do: if you give a lot to one charity, you may have less money to give to others.


When you go deeper into physics, mass and energy don't seem real either, in that, like entropy, they're emergent properties of a system rather than fixed, localized things.

I always thought of the energy of a system (kinetic + potential) as a useful mathematical invariant that helps us predict systems rather than a physical thing. If you put a cart at the top of a hill, then the cart has more potential energy (but only from certain reference frames). It doesn't feel like that potential energy is physical. It doesn't have a specific location; it's a property of the whole Earth-cart system. And yet, it's this total energy that gives rise to the physical properties we're familiar with. In fact, almost all of the mass in your body comes not from the mass of elementary particles, but from the potential energy in the bonds between quarks. Your mass is more than 99% from potential energy.
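A rough back-of-the-envelope check of that last claim, using approximate current-quark masses (values are approximate PDG figures):

    m_u + m_u + m_d \approx 2.2 + 2.2 + 4.7 \approx 9\ \mathrm{MeV}/c^2
    m_p \approx 938\ \mathrm{MeV}/c^2
    \frac{9}{938} \approx 1\%

So only about 1% of a proton's mass comes from the quark rest masses; the rest is the energy of the strong-force field binding them, and similarly for neutrons.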

And then when you get into Quantum Field Theory, it turns out particles (like electrons) are no longer truly point particles but rather another emergent phenomenon from ripples in an underlying field. A particle is just a model that describes it well when looked at from a distance. (I hope I'm not butchering that, as I'm not a physicist.)

So mass, matter, energy, and entropy are all emergent properties of a system rather than being localized, "real" things in the way we'd intuitively think. And at that point, I'm not sure how we would define "real" or why it would be a useful distinction. Is there a useful insight to be gained by putting entropy in a different category of realness than mass?


One Harvard study trained an AI that could reliably determine someone's race from a chest X-ray. AIs can be trained to see things we can't.

The difficulty is likely in making a good training dataset of labeled images with pathologies radiologists couldn't see. I imagine in some cases (like cancer), you may happen to have an earlier CT scan or X-ray from the patient where the pathology is not quite yet detectable to the human eye.


I suspect that radiologists could identify race from a plain chest x-ray if they were given the patient’s race and asked to start noticing the difference. They just aren’t doing it because, if it’s important, you can just look at the patient.

There are a lot of things in medicine that aren’t in literature, but are well-known among certain practitioners. I’m an anesthesiologist and practice in an area with a large African-American population. About 10-15% (rough guess) of people of West African descent will have a ridiculously strong salivary response to certain drugs (muscarinic agonists). As in, after one dose their mouths will be full of saliva in seconds. We don’t have East Africans for comparison, so I can’t say it’s a pan-Bantu thing, but I have seen it in a Nigerian who lived here. Not in the literature, but we all know it. I had a (EDIT: non-anesthesia) colleague ask me about a hypersecretory response from such a drug. I said, oh, was he black? Yes, how did I know? Because we give those drugs all the time and have eyes. It’s very rare to see in European-descended populations.


It's possible humans could learn to do this, but I'm skeptical they could do it this well. According to the article, human experts couldn't tell race from the chest X-rays, and the researchers couldn't figure out how the AI was detecting race. They fed it corrupted images to figure out what information it was relying on. It was robust against both low-pass and high-pass filters.
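A sketch of that kind of corruption test, using a random placeholder array in place of an X-ray (this is just the filtering idea, not the study's actual pipeline): build a low-pass version with a Gaussian blur and a high-pass version as the residual, then re-run the classifier on each.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    rng = np.random.default_rng(0)
    image = rng.random((224, 224))               # placeholder for a chest X-ray

    low_pass = gaussian_filter(image, sigma=5)   # keeps only coarse structure
    high_pass = image - low_pass                 # keeps only fine detail

    # The study's model kept predicting race accurately on both versions,
    # so it wasn't relying solely on fine texture or on coarse shape.
    print(low_pass.shape, high_pass.shape)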


Sigh.

Rads can see a person's race. We look at them.

That's the reason rads never train to determine race from a chest x-ray.

BTW, models don't need to be trained for that either, because if it's important, it's recorded, along with a picture, in the guy's medical record.

I'd just like to gently suggest that determining someone's race from an X-Ray instead of, say, their photograph, is maybe not how we should be burning training cycles if we want to push medical imaging forward. Human radiologists had that figured out ages ago.


You're being snide about the Harvard/MIT researchers being idiots doing useless research because they don't realize radiologists can just look at the patient's face, but that's obviously not what happened. They were trying to see if AI could introduce racial bias. They're not releasing an AI to tell radiologists the race of their patients.

According to the article, human experts could not tell race from chest X-rays, while the AI could do so reliably. Further, it could still determine race when given an X-ray image passed through a low-pass filter or high-pass filter, showing that it's not relying solely on fine detail or large features to do so.


Sigh.

Firstly, that doesn't tell us whether there is bias in data. That tells us whether or not there is bias in their data.

Secondly, it tells us it can train to spot things that human rads do not train to spot. It tells us nothing at all about whether or not an AI can train to spot things a human rad also trains to spot, but can't.

Human rads don't train to spot race. Why? Because they don't need to do so. Human rads do train to spot pathologies in as early a stage as possible. I've never seen an AI spot one at an earlier stage than the best human rads can. But I have seen several AIs fail to spot pathologies that even human rads at the median could spot.

That's the state of play today. And it's likely to remain that way for a long, long time. Human rads will be needed to review this work not because they are human, but right now, it's because human rads are just better. At the top end, human rads are not only better, but are manifestly superior.


Sigh.

They aren't studying bias in data. They were studying bias in AI. The data used was 850,000 chest X-rays from 6 publicly available datasets. They aren't studying whether this dataset differs from the general public or has some kind of racial bias; that's irrelevant to the study.

> it tells us it can train to spot things that human rads do not train to spot

You're kidding yourself if you think you could determine someone's race with 97%+ accuracy from a chest x-ray if only you trained at it. The study authors (who are themselves a mix of radiologists and computer scientists) claim that radiologists widely believe it to be nearly impossible to determine race from a chest x-ray. No one is ever going to try to train radiologists to distinguish race from chest x-rays, so you'll always be able to hold out hope that maybe humans could do it with enough training. But your hope is based on nothing; you don't have a shred of evidence that radiologists could ever do this.

> I've never seen an AI spot one at an earlier stage than the best human rads can.

According to the article, AIs aren't trained to do this, because we don't have datasets to train this. You need a dataset where the disease is correctly labeled despite the best radiologists not being able to see it in the x-ray. Trained with a good enough dataset, they'd be able to see things we miss.


Polio type 2 and 3 were eradicated and overall we went from 400k cases of paralytic polio per year down to less than 1% of that.


You may want to check your numbers because polio type 2 and type 3 are both still around (in fact, cVDPV type 2 is very, very common). The GPEI website has the recent numbers, updated weekly (although with ~3 month lag).

400k to 4k is not 400k to 0. Eradication means 0. People don't get smallpox vaccines today because we hit 0. American children get ~4 doses of IPV still, despite what you are claiming as "eradication."


Reduced the number of cases by 99%*

That's amazing and I am very happy I live in a world where someone helped that many people, regardless of who you compare their accomplishments to.


Attributing that whole reduction to Gates alone is a ridiculous thing to do. He didn't even fund most of the project and the only thing he really brings to the table is money. This has been a multinational effort with literally millions of people for 3 decades. Gates mostly wrested control from them in the last decade.

If he manages to bring the project over the finish line, I will celebrate his achievement. At this point, signs point to failure of the GPEI being a near certainty. Unfortunately, we'll be back to 400k in about 10-15 years if we give up at this point.


You're conflating cVDPV with WPV. Despite both having a "type 2", they are not the same virus.

Calling a few hundred reported cases a year globally "very, very common" is... a stretch. You have better odds of getting struck by lightning.


I am not conflating cVDPV with WPV. The parent comment claimed type 2 and type 3 polio were gone, which is not true. WPV 3 and WPV 2 are gone, but type 2 and type 3 polio are both still around. If you get polio today (unlikely but possible), there's a pretty good chance it's cVDPV 2.


So you're saying you knew the two types of polio were eradicated but inexplicably assumed the parent comment must have meant the vaccine-derived poliovirus was eradicated. Or you think these are the same virus because you saw "type 2" in the names of both.


I don't think these are the same thing. But when someone says that polio type 2 has been eradicated, that means the entire family of WPV, cVDPV, and VAPP. Not just WPV.

For all intents and purposes, yes, cVDPV is the same thing as WPV. There have actually been instances where cVDPVs have evolved into things that look quite a bit like WPV. Declaring victory over "type 2/3" because WPV is gone is meaningless when lots of people still get cVDPV and have exactly the same symptoms.

