Could you elaborate on the progress that has been made?
To me, it seems only small, incremental changes are made between models, with all of them still hallucinating.
I can see no clear steps towards AGI.
I suspect instructing the model to respond with "I don't know" more readily will result in more of those responses even though there are other options that seem viable according to the training data / model.
Remember, LLMs are just statistical sentence completion machines. So telling it what to respond with will increase the likelihood of that happening, even if there are other options that are viable.
But since you can't blindly trust LLM output anyway, I guess increasing "I don't know" responses is a good way of reducing incorrect responses (which will still happen frequently enough) at the cost of missing some correct ones.
> Remember, LLMs are just statistical sentence completion machines. So telling it what to respond with will increase the likelihood of that happening, even if there are other options that are viable.
Obviously. When I say "tuned" I don't mean adding stuff to a prompt. I mean tuning in the way models are also tuned to be more or less professional, tuned to defer certain tasks to other models (i.e. counting or math, something statistical models are almost unable to do) and so on.
I am almost certain that the chain of models we use on chatgpt.com are "tuned" to always give an answer, and not to answer with "I am just a model, I don't have information on this". Early models and early toolchains did this far more often, but today they are quite probably tuned to "always be of service".
"Quite probably" because I have no proof, other than that it will gladly hallucinate, invent URLs and references, etc. And knowing that all the GPT competitors are battling for users, their products are quite certainly tuned to help in this battle - e.g. to appear helpful and all-knowing, rather than factually correct and therefore often admittedly ignorant.
Whether you train the model to do math internally or tell it to call an external model which only does math, the root problem still exists. It's not as if a model which only does math won't hallucinate how to solve math problems just because it doesn't know about history. For the same number of parameters, it's probably better not to have to duplicate the parts needed to understand the basis of things multiple times.
The root problem is that training models to be uncertain of their answers results in lower benchmark scores in every area except hallucinations. It's like being in a multiple choice test: instead of picking whichever of answers A-D you think makes the most sense, you pick E, "I don't know". Helpful for the test grader, but a bad bet for a model trying to claim it gets the most answers right compared to other models.
The technical solution is the easy half, the hard part is convincing people this is how we should be testing everything because we care about knowing the uncertainty in any test.
E.g. look at the math section of the SATs: it rewards guessing the right answer instead of rewarding admitting you don't know. It's not that the people writing the SATs can't figure out how to grade it otherwise; it's just not what people seem to care most about finding out, for one reason or another.
Last time I tried Elevenlabs for German text, it got a lot of numbers and dates wrong.
E.g. saying "1963" when the actual year in the text was 1967. Yeah, the voices sound very realistic. But I'm not sure how useful that is if you can't trust the spoken words.
Does anyone know if it has gotten better in the last few weeks?
I'm currently building an open source CMS in Golang. Meaning a fully implemented backend and frontend, plus support for custom themes in the frontend and custom collections/items in the backend.
Now, there is a lot of CMS software out there. Some of the better ones are paid products.
What I'm hoping to eventually accomplish is easy local creation of a website (content and themes), followed by easy one-click deployment to a cheap hosting provider. Alternatively, just copying a local folder to your own VPS/server running the CMS should be enough.
My dream outcome would be a CMS that is a one-stop solution for most types of websites (blogs, company sites, shops, ...), to hopefully help people stop using Facebook, Twitter, and other centralized, eventually login-demanding services for hosting content people would like to read.
For this, a free/cheap one click hosting solution after locally creating and previewing a site would be necessary.
PHP is still pretty widely used here because cheap web hosting packages support it.
I like PHP, but for open source projects I prefer Go because of the maintainability and fun of writing it.
Sorry, but that doesn't work in real-world applications. Multiple requests are fired simultaneously all the time. E.g. browsers starting with multiple tabs, smartphone apps starting up and firing several requests at once, etc.
I've heard that CRDT needs to be the source of truth for certain features to work (handling users going online/offline?). I was never quite sure why it would not work to just create a new CRDT document from another source of truth (markdown, database, ...) when starting an editing session. Maybe someone else can explain.
I'm using your service (found it on HN too) for a side project and I'm very happy with it. Good job!
One nitpick: I'm seeing occasional timeouts (probably because the residential endpoint went down recently). Do you have a best practice on how to work around that?
I'm currently prototyping a product using the Raspberry Pi + GPIO. I've often wondered how, if the thing makes it to market, I would substitute the Rpi in mass production (1000+ units). The SD card is a big point of failure for many Rpi users, for example, so storage is probably something I would change.
Does anyone have experience with a mass produced product that used the Rpi in prototypes?
I know element14 is offering Raspberry Pi customization for mass production.
I've looked at the compute module. It is positioned as the way to go for mass production, but the price point seems higher than the Rpi's, as you still need a base board for it to function.