
Cool new efficient inference method that halves memory usage without degrading performance for large language models!

More from the author about this at: https://twitter.com/Tim_Dettmers/status/1559892888326049792


Thanks for posting our paper! If anyone has any questions, I'll stick around this thread for a bit.

There's a summary of our paper on twitter: https://twitter.com/OfirPress/status/1344387959563325442

And our code is on GitHub: https://github.com/ofirpress/shortformer


No questions. After giving it just a quick skim, this paper looks like great work. The findings are remarkable, and they're presented in clear, to-the-point language.

I confess to being a bit shocked that given the same number of parameters, training is 1.65x faster (whoa), generation is 9x faster (wait, what!?), and perplexity is better (which is a flawed measure, but still), and all by using a new form of "curriculum learning" and adding position embeddings to the queries and keys but not the values.
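
For anyone curious what "position embeddings added to the queries and keys but not the values" looks like concretely, here's a rough sketch (not the authors' code; just a simplified single-head attention in PyTorch, causal masking omitted, where only the queries and keys see position information):

    import torch
    import torch.nn.functional as F

    def attention_pos_on_qk(x, pos_emb, w_q, w_k, w_v):
        # x: (seq_len, d_model) token representations
        # pos_emb: (seq_len, d_model) position embeddings
        # Positions are added to queries and keys only, so the attention weights
        # are position-aware while the values (and the output) stay position-free.
        q = (x + pos_emb) @ w_q
        k = (x + pos_emb) @ w_k
        v = x @ w_v  # note: no position embeddings here
        scores = (q @ k.T) / (q.shape[-1] ** 0.5)
        return F.softmax(scores, dim=-1) @ v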

And it's so nice to see new ideas and improvements that don't rely on yet more computation or yet more parameters (I'm looking at you, GPT-3).

Congratulations!


Thank you! We spent a lot of time on making this as easy to understand as possible.


What's "perplexity" a measure of? First I've heard of it.


e^loss. It's a bad name for an already-confusing concept: loss. (e^loss is just another way of plotting loss, after all.)
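
Concretely (a toy illustration in Python, with a made-up loss value):

    import math

    loss = 3.2                   # hypothetical average per-token cross-entropy, in nats
    perplexity = math.exp(loss)  # e^loss
    print(perplexity)            # ~24.5: the model is about as uncertain as a
                                 # uniform choice among ~25 tokens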

Loss isn't the whole story -- the part of training where the loss curve falls most steeply often produces the worst-quality language models. You want a nice, gentle downward slope.

SubsimulatorGPT2 (https://reddit.com/r/subsimulatorgpt2) continued to improve in terms of human evaluation even though the loss stayed flat for over a week.


Lots of people have heard about deep learning but don't really know where to start. When I was starting out a year or so ago, I was in the same place. I just took the resources that I used to learn about the field and put them all in one place. As far as I know, this hasn't been done before. I'm not trying to provide any new insights into deep learning.


Greg Brockman (co-founder of OpenAI) has written this amazing answer:

If you want to read one main resource... the Goodfellow, Bengio, Courville book (available for free from http://www.deeplearningbook.org/) is an extremely comprehensive survey of the field. It contains essentially all the concepts and intuition needed for deep learning engineering (except reinforcement learning).

If you'd like to take courses... Pieter Abbeel and Wojciech Zaremba suggest the following course sequence:

- Linear Algebra — Stephen Boyd’s EE263 (Stanford)
- Neural Networks for Machine Learning — Geoff Hinton (Coursera)
- Neural Nets — Andrej Karpathy’s CS231N (Stanford)
- Advanced Robotics (the MDP / optimal control lectures) — Pieter Abbeel’s CS287 (Berkeley)
- Deep RL — John Schulman’s CS294-112 (Berkeley)

(Pieter also recommends the Cover & Thomas information theory and Nocedal & Wright nonlinear optimization books).

If you'd like to get your hands dirty... Ilya Sutskever recommends implementing simple MNIST classifiers, small convnets, reimplementing char-rnn, and then playing with a big convnet. Personally, I started out by picking Kaggle competitions (especially the "Knowledge" ones) and using those as a source of problems. Implementing agents for OpenAI Gym (or algorithms for the set of research problems we’ll be releasing soon) could also be a good starting place.

Quora link: https://www.quora.com/What-are-the-best-ways-to-pick-up-Deep...
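
If it helps to see what that "simple MNIST classifier" starting point looks like, here's a minimal sketch (my own, assuming PyTorch/torchvision; not from the Quora answer):

    import torch
    from torch import nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # Tiny fully-connected MNIST classifier: the "hello world" of the suggestion above.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(), nn.Linear(256, 10))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    train_set = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=64, shuffle=True)

    for epoch in range(3):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
        print(f"epoch {epoch}: last batch loss {loss.item():.3f}")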


That's awesome, I wasn't aware of this. I slightly disagree with Greg though about the "one main resource". While the DL book is amazing, it's not really aimed at newbies, and so I think CS231n is a much better starting point. For example, for a beginner I think the way backprop is explained in Stanford's course is better than the explanation in the book.


"One main resource" as in, it goes through all the underlying math required in details etc. It's usually assumed that people entering into DL have some experience with Machine Learning. Of course, for someone staring with ML, CS229 is the first thing she should pick up.


I just wanted to say this is a great recommendation for anyone who is serious about learning deep learning / machine learning. I also want to note that there are other resources available for people who are less serious about it and just want a general overview.


I'm "on the other side". I took a very good AI course 6 years ago, right before deep learning exploded. And while I had the concepts to understand why they work, it took me a while to start using the new tools.

The compilation of links is good, and you present them very clearly. I wish I'd had this earlier to save some time, but still, thanks for compiling this list!


I would ignore the top reply. I found this a very helpful collection, and I appreciated the way you broke down resources for getting started in different areas of deep learning. It's at the perfect level for a summer student I'm currently supervising.

