holy347's favorites | Hacker News

		creativeSlumber 63 days ago \| parent \| context \| on: GPT-5 This one? "Mind the Gap: Assessing Temporal Generalization in Neural Language Models" https://arxiv.org/abs/2102.01951
		skdidjdndh 73 days ago \| parent \| context \| on: Tao on “blue team” vs. “red team” LLMs > I trust them to add tests almost indiscriminately because tests are usually cheap; if they are wrong it’s easy to remove or modify them Having worked on legacy codebases this is extremely wrong and harmful. Tests are the source of truth more so than your code - and incorrect tests are even more harmful than incorrect code. Having worked on legacy codebases, some of the hardest problems are determining “why is this broken test here that appears to test a behavior we don’t support”. Do we have a bug? Or do we have a bad test? On the other end, when there are tests for scenarios we don’t actually care about it’s impossible to determine if that test is meaningful or was added because “it’s testing the code as written”.
		danielhanchen 79 days ago \| parent \| context \| on: Qwen3-Coder: Agentic coding in the world I wrote approximately in the blog about it and linked some papers! I also wrote about it here - https://unsloth.ai/blog/dynamic-4bit - one has to inspect the activation and weight quantization errors!
		zero_k 7 months ago \| parent \| context \| on: XOR But you forgot! It's also a 3-wise independent linear hashing function! Which means it can be used for probabilistically approximately uniform sampling and counting of solutions to boolean functions. This is super-duper useful. We use it to build counters that give probabilistic, but proven, counts. I explained the idea here in more understandable terms [1]. Basically, it halves the solution space approximately correctly each time. So you keep on adding them, until you have say, 10 solutions. Then you multiply the 10 with 2^k, where k is the number of XORs you added. That's it! So cool, no? And it's super-scalable, because it haves it each time, so you'll get to, say, 10 pretty quick! Some research papers are here [2,3]. I work on this, the tools are here [4,5]. In the last model counting competition, it dominated all other competitors, when combined with an exact counter, slides of the competition here [6]. [1] https://www.msoos.org/2018/12/how-approximate-model-counting... [2] https://arxiv.org/abs/1306.5726 [3] https://www.cs.toronto.edu/~meel/Papers/cav20-sgm.pdf [4] https://github.com/meelgroup/approxmc [5] https://github.com/meelgroup/unigen [6] https://mccompetition.org/assets/files/2024/MC2024_awards.pd...