Thanks.

So when, for example, we train an ImageNet model over multiple epochs using rotation/scaling/etc. augmentation, is it really better to think of this as one epoch over a larger set of unique images than as multiple epochs per se? I was really thinking of augmentation as a way to get coverage over the input space rather than as a way of ensuring the training data doesn't repeat, but I guess it serves both purposes.
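To make that concrete, here's a minimal sketch of what I mean, assuming a PyTorch/torchvision setup (the dataset path and transform choices are just placeholders, not anyone's actual training recipe): because the random rotation/scale is applied on the fly, each "epoch" actually feeds the model slightly different tensors for every image.

    import torch
    from torchvision import datasets, transforms

    # Random augmentation applied on the fly: each epoch draws a fresh
    # random rotation/scale/flip of every image, so no exact sample repeats.
    train_tf = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # random scale + crop
        transforms.RandomRotation(degrees=15),                # random rotation
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])

    # "path/to/imagenet/train" is a placeholder; any ImageFolder-style dataset works.
    train_set = datasets.ImageFolder("path/to/imagenet/train", transform=train_tf)
    loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True)

    for epoch in range(90):              # nominally "multi-epoch" training...
        for images, labels in loader:    # ...but the augmented batches differ each pass
            pass  # forward/backward step goes here

So "N epochs with augmentation" looks a lot like one pass over roughly N times as many distinct images.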

It does still seem that many LLMs are overfitting/memorizing to a fair degree, though - maybe just because they are still too big for the amount of data they are trained on? It seems like a bit of a balancing act - wanting an LLM to generalize, yet also to serve as somewhat of a knowledge store for rare data it has only seen once.
