For me as a lay-person, the article is disjointed and kinda hard to follow. It's fascinating that all the quotes are emotional responses or about academic politics. Even now, they are suspicious of transformers and are bitter that they were wrong. No one seems happy that their field of research has been on an astonishing rocketship of progress in the last decade.



The way I see this is that for a long time there was an academic field that was working on parsing natural human language and it was influenced by some very smart people who had strong opinions. They focused mainly on symbolic approaches to parsing, rather than probabilistic. And there were some fairly strong assumptions about structure and meaning. Norvig wrote about this: https://norvig.com/chomsky.html and I think the article bears repeated, close reading.

Unfortunately, because ML models went brr some time ago (Norvig was at the leading edge of this when he worked on the early Google search engine and had access to huge amounts of data), we've since seen that probabilistic approaches produce excellent results, surpassing everything in the NLP space in terms of producing real-world systems, without addressing any of the issues that the NLP folks believe are key (see https://en.wikipedia.org/wiki/Stochastic_parrot and the referenced paper). Personally I would have preferred if the parrot paper hadn't also discussed environmental costs of LLMs, and focused entirely on the semantic issues associated with probabilistic models.

I think there's a huge amount of jealousy in the NLP space that probabilistic methods worked so well, so fast (with transformers being the key innovation that improved metrics). And it's clear that even state-of-the-art probabilistic models lack features that NLP people expected.

Repeatedly we have seen that probabilistic methods are the most effective way to make forward progress, provided you have enough data and good algorithms. It would be interesting to see the NLP folks try to come up with models that did anything near what a modern LLM can do.


This is pretty much correct. I'd have to search for it, but I remember an article from a couple of years back that detailed how LLMs blew up the field of NLP overnight.

Although I'd also offer a slightly different lens through which to look at the reaction of other researchers. There's jealousy, sure, but overnight a ton of NLP researchers basically had to come to terms with the fact that their research was useless, at least from a practical perspective.

For example, imagine you just got your PhD in machine translation, which took you 7 years of laboring away in grad/post grad work. Then something comes out that can do machine translation several orders of magnitude better than anything you have proposed. Anyone can argue about what "understanding" means until they're blue in the face, but for machine translation, nobody really cares that much - people just want to get text in another language that means the same thing as the original language, and they don't really care how.

The majority of research leads to "dead ends", but most folks understand that's the nature of research, and there is usually still value in discovering "OK, this won't work". Usually, though, this process is pretty incremental. With LLMs, all of a sudden you had lots of folks whose life work was pretty useless (again, from a practical perspective), and that'd be tough for anyone to deal with.


You might be thinking of this article by Sebastian Ruder: https://www.ruder.io/nlp-imagenet/

Note that the author has a background spanning a lot of the timespans/topics discussed - much work in multilingual NLP, translation, and more recently at DeepMind, Cohere, and Meta (in other words, someone with a great perspective on everything in the top article).

Re: Machine Translation, note that Transformers were introduced for this task, and built on one of the earlier notions of attention in sequence models: https://arxiv.org/abs/1409.0473 (2014, 38k citations)

That's not to say there weren't holdouts or people who really were "hurt" by a huge jump in MT capability - just that this is a logical progression in language understanding methods as seen by some folks (though who could have predicted the popularity of chat interfaces).


Yes, I think a lot of NLP folks must’ve had their “God does not play dice with the univers(al grammar)” moment.


The majority of NLP people were not into universal grammar at all.


I wouldn't say NLP as a field was resistant to probabilistic approaches or even neural networks. From maybe 2000-2018 almost all the papers were about using probabilistic methods to figure out word sense disambiguation or parsing or sentiment analysis or whatever. What changed was that these tasks turned out not to be important for the ultimate goal of making language technologies. We thought things like parsing were going to be important because we thought any system that can understand language would have to do so on the basis of the parse tree. But it turns out a gigantic neural network text generator can do nearly anything we ever wanted from a language technology, without dealing with any of the intermediate tasks that used to get so much attention. It's like the whole field got short-circuited.


The way I have experienced this, starting from circa 2018, it was a bit more incremental. First, LSTMs and then transformers led to new heights on the old tasks, such as syntactic parsing and semantic role labelling, which was sad for the previous generation, but at least we were playing the same game. But then not only the old tools of NLP but the research questions themselves became irrelevant, because we could just ask a model nicely and get good results on very practical downstream tasks that didn't even exist a short while ago. NLP suddenly turned into a general document/information processing field, with a side hustle in conversational assistants. Already GPT-2 essentially mastered the grammar of English, and what difficulties remain are super-linguistic and have more to do with general reasoning. I would say that it's not that people are bitter that other people make progress; it's more that there is not much progress to be had in the old haunts at all.


I think you greatly understate the impact as EVERYONE is freaking the fuck out about AI, not just NLP researchers.

AI is obliterating the usefulness of all mental work. Look at the high percentage of HN articles trying to figure out whether LLMs can eliminate software developers. Or professional writers. Or composers. Or artists. Or lawyers.

Focusing on the NLP researchers really understates the scope of the insecurity induced by AI.


As someone in NLP who lived through this experience: there's something uniquely ironic and cruel about building the wave that washes yourself away.


I agree with criticism of Noam Chomsky as a linguist. I was raised in the typological tradition which has its very own kind of beef with Chomsky due to other reasons (his singular focus on English for constructing his theories amongst other things), but his dislike of statistical methods was of course equally suspect.

Nevertheless there is something to be said for classical linguistic theory in terms of constituent (or dependency) grammars and various other tools. They give us much simpler models that, while incomplete, can still be fairly useful at a fraction of the cost and size of transformer architectures (e.g. 99% of morphology can be modeled with finite state machines). They also let us understand languages better - we can't really peek into a transformer to understand structural patterns in a language or to compare them across different languages.
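
To make the finite-state point concrete, here is a minimal sketch in Python of the kind of thing a morphological analyzer encodes. The suffix rules and tags below are invented for the example; real toolkits in the HFST/foma tradition compile full lexicons and rewrite rules into a single transducer, so treat this as the shape of the model rather than any actual system:

    # Toy finite-state-style morphological analyzer: a handful of English
    # suffix rules, applied in order, each matching rule yielding a
    # (stem, tag) analysis. A real transducer encodes these as states and
    # arcs; this sketch only illustrates the idea.
    SUFFIX_RULES = [
        ("ies", "y", "+Pl"),    # ponies  -> pony +Pl
        ("s",   "",  "+Pl"),    # cats    -> cat  +Pl
        ("ed",  "",  "+Past"),  # walked  -> walk +Past
        ("ing", "",  "+Prog"),  # walking -> walk +Prog
    ]

    def analyze(word):
        """Return candidate (stem, tag) analyses, including the bare stem."""
        analyses = [(word, "+Base")]
        for suffix, replacement, tag in SUFFIX_RULES:
            if word.endswith(suffix) and len(word) > len(suffix):
                analyses.append((word[:-len(suffix)] + replacement, tag))
        return analyses

    if __name__ == "__main__":
        for w in ["cats", "ponies", "walking", "dog"]:
            print(w, "->", analyze(w))

Like any small rule set it overgenerates ("ponies" also matches the bare "s" rule), but the whole model is a few dozen lines and runs in microseconds, which is the cost/size contrast with transformers being drawn above.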


The claim that UG is based only on English is simply false. Maybe in 1950, but any modern generativist theory uses data from many, many languages, and English has been re-analysed in light of other languages (see here for an example of quantifiers in English being analysed on the basis of data from a Salish language: https://philpapers.org/rec/MATQAT)


All of this matches my understanding. It was interesting taking an NLP class in 2017; the professors said, basically: listen, this curriculum is all historical and now irrelevant given LLMs, we'll tell you a little about them, but basically it's all cutting edge, sorry.


Same for my NLP class of 2021. It just went directly on to talking about transformers after a brief intro to the old stuff.


> most effective way to make forward progress

Powerful response, but "fit for what purposes"? All human writings are not functionally equivalent. This has been discussed at length, e.g. poetry versus factual reporting or summation.


https://www.amazon.com/dp/B0DYDGZTMV makes the case that DeepSeek is a poet.


At least the author is upfront that the poetry is a showcase of AI.


Even 15-ish years ago when I was in school, the NLP folks viewed probabilistic models with suspicion. They treated everyone from our Math department with suspicion and gave them a hard time. It created so much political friction that some folks who wanted to do statistical approaches would call themselves CS just so the NLP old guard would leave them alone.


Sounds like the bitter lesson is bitter indeed!


On the contrary, to some of us (who have focused on probability, big data, algorithms, and HPC, while eschewing complex theories that require geniuses to understand) the bitter lesson is incredibly sweet.

Very much like when I moved from tightly coupled to "embarrassing" parallelism. A friend said "don't call it embarrassing... it's pleasant not to have to think about hard distributed computing problems".


The progression reminds me of how brute force won out in the chess AI game long ago with Deep Blue. Custom VLSI and FPGA acceleration and all.


Do transformers not use both a symbolic and a probabilistic approach?


Well, if you’ve built a career on something, you will usually actively resist anything that threatens to destroy it.

In other words, what is progress for the field might not be progress for you !

This reminds me of Thomas Kuhn's excellent book "The Structure of Scientific Revolutions" https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Re...


It reminds me much more of Paul Feyerabend's even better book "Against Method" https://en.wikipedia.org/wiki/Against_Method


Or Planck's principle - "Science progresses one funeral at a time".


And the more general version, “Humanity progresses one funeral at a time.” Which is why the hyper-longevity people are basically trying to freeze all human progress.


Or Effective Altruism's long-termism that effectively makes everyone universally poor now. Interestingly, Guillaume Verdon (e/acc) is friends with Bryan Johnson and seems to be pro-longevity.


Can you elaborate?


Long-termism is essentially a utilitarian doctrine, but uses the "total utilitarianism" method:

https://en.wikipedia.org/wiki/Average_and_total_utilitariani...

Take that moral position, then extend the calculation to cover all potential generations of humans to come, and the weight of maximising utility for people alive now becomes vanishingly small.
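
To make the arithmetic concrete, here is a toy version of that argument (my own framing of the point above, not a calculation from the long-termist literature):

    % Suppose an action yields utility u for the present generation, and the
    % total-utilitarian sum runs over N equally weighted generations, each of
    % which could also receive u. The present generation's share is then
    \[
      \frac{u}{\sum_{k=1}^{N} u} \;=\; \frac{1}{N} \;\to\; 0
      \quad \text{as } N \to \infty,
    \]
    % so with an unbounded horizon of potential people, benefits confined to
    % "now" carry a vanishing fraction of the total weight.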


It's a truly bitter pill to swallow when your whole area of research goes redundant.

I have a bit of background in this field, so it's nice to see even people who were at the top of the field raise concerns that I had. That comment about the LHC was exactly what I told my professor: That the whole field seems to be moving in a direction where you need a lot of resources to do anything. You can have 10 different ideas on how to improve LLMs but unless you have the resources there is barely anything you can do.

NLP was the main reason I pursued an MS degree, but by the end of my course I was no longer interested in it, mostly because of this.


> That the whole field seems to be moving in a direction where you need a lot of resources to do anything. You can have 10 different ideas on how to improve LLMs but unless you have the resources there is barely anything you can do.

I think you're confusing problems, or not realizing that improving the efficiency of a class of models is a research area in its own right. Look at any field that involves expensive computational work: model reduction strategies dominate research.


I felt that way maybe a year or two ago. It seemed like most research was only concerned with building bigger models to beat benchmarks. There was also this prevalent idea that models need to be big and have massive compute, especially from companies like OpenAI. I was glad that models like DeepSeek were made. It brought back some hope.


> No one seems happy that their field of research has been on an astonishing rocketship of progress in the last decade.

Well, they're unhappy that an unrelated field of research more-or-less accidentally solved NLP. All the specialized NLP techniques people spent a decade developing were obviated by bigger deep learning models.



