As someone fully supportive of the social media ban for Australian kids, I think we need to teach UK kids to vibecode their own VPNs with OSS models at this point so they can save what's left of their future civil liberties.
We all know where this is going: they're going to ban the one mathematical tool we have that gives us control over machines, encryption.
All in TypeScript too. Actually very impressive. Well, some .webm videos and .glb 3D files, but only the essentials it seems; the rest is all proper TypeScript.
I'm very much pro hyper-automation, especially for all government work... but can't help but think this type of branding is just in bad faith and that these are not good people.
It just screams fried serotonin-circuits to me. I don't like it. I looked at the site for 2-3 seconds and I want nothing to do with these guys.
Do I think we should stop this type of competitive behaviour fueled by kids and investors both microdosed on meth? No.
I just wouldn't do business with them; they don't look like a trustworthy brand to me.
Edit: They got me with the joke; being in this field, there are people who do actually talk like that, both startups and established executives alike. E.g. Artisan's billboard ads saying STOP HIRING HUMANS, and another New York company, I think, pushing newspaper ads for complete replacement. Also, if you're up on the latest engineering in agentic scaffolding work, this type of thing is no joke.
>I'm very much pro hyper-automation, especially for all government work... but can't help but think this type of branding is just in bad faith and that these are not good people.
>It just screams fried serotonin-circuits to me. I don't like it. I looked at the site for 2-3 seconds and I want nothing to do with these guys.
Enlightenment is realizing they aren't any different from those other guys.
>Edit: They got me with the joke; being in this field, there are people who do actually talk like that, both startups and established executives alike.
Keep in mind that Theo said the Vanilla benchmark was running too fast, so he made it "way way slower", meaning the 4x figure is not representative of a direct comparison.
While I don't want to discount the work of any physician-founded org, knowing the pain they go through from working with them after they've seen 18 patients in a day's work, this still just looks like bad software, with no testing and no internal bench.
Did you do some kind of zod schema, or compare the error rate of how different models perform on this task? Did you bother setting up any kind of JSON output at all? Did you add a second validation step with a different model and then compare that their numbers are the same?
It looks like no; they just deferred the whole thing to authority. Technically there's no difference between that and them saying that gpt5-mini or llama2-7b did this.
Literally every single LLM will make errors and hallucinate. It's your job to put the scaffolding around it to make sure it doesn't, or that it errs a lot less than a skilled human would.
So then, have you measured the error rate, or at least tried to put some kind of error-catching mechanism in place, just like any professional software would?
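For what it's worth, the scaffolding being asked about here fits in about 40 lines of TypeScript. A minimal sketch, assuming a caller-supplied callModel() wrapper over whichever LLM API they actually use; the schema fields and model names are made up for illustration, not their real data model:

    // Rough sketch of the missing scaffolding. callModel is a
    // hypothetical wrapper over whatever LLM API is actually in use.
    import { z } from "zod";

    type ModelCall = (model: string, prompt: string) => Promise<string>;

    // Zod schema: malformed or hallucinated output fails loudly here
    // instead of flowing silently into a patient record.
    const ExtractionSchema = z.object({
      patientId: z.string(),
      medication: z.string(),
      doseMg: z.number().positive(),
    });
    type Extraction = z.infer<typeof ExtractionSchema>;

    async function extractValidated(
      callModel: ModelCall,
      model: string,
      note: string,
    ): Promise<Extraction> {
      const raw = await callModel(
        model,
        `Return ONLY JSON with patientId, medication, doseMg for:\n${note}`,
      );
      // JSON.parse throws on non-JSON output; safeParse catches
      // well-formed JSON with the wrong shape or types.
      const result = ExtractionSchema.safeParse(JSON.parse(raw));
      if (!result.success) {
        throw new Error(`Schema validation failed: ${result.error.message}`);
      }
      return result.data;
    }

    // Second validation step: extract with two different models and only
    // accept the result when they agree; disagreements go to human review.
    async function crossValidated(
      callModel: ModelCall,
      note: string,
    ): Promise<Extraction> {
      const [a, b] = await Promise.all([
        extractValidated(callModel, "model-a", note),
        extractValidated(callModel, "model-b", note),
      ]);
      if (a.doseMg !== b.doseMg || a.medication !== b.medication) {
        throw new Error("Models disagree, route to human review");
      }
      return a;
    }

Measuring an error rate is then just a matter of running crossValidated over a labelled sample and counting the schema failures and disagreements.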
At the end of the day, if you look at almost any government, roughly 2/3 of expenses go towards healthcare and education, areas where AI workflows are very likely to keep offsetting a larger and larger percentage of the costs.
Can we still have a financial crisis from all this investment going bust, because it might take too long to manufacture enough automation hardware for everyone for it to make a difference? Yes.
But the fundamentals are still there: parents will still send their kids to some type of school, and people will trade goods in exchange for health services. That's not going to change. Neither will the need for robots in nursing homes; I think that assumption is safe to make.
What's difficult to predict is the change in adoption in manufacturing and repairs (be that repairing bridges or repairing your espresso machine), because that is more of a "3D" problem and hard to automate reliably (think about how many GPUs it would actually take today to get a robot to reason out and repair a hole in your drywall), given that your RL environments and training data needs grow exponentially. Technically, your phone should have enough GPU performance to do your taxes with a 3B model and a bunch of tools; eventually it'll even be better than you at it. But to run an actual robot with multiple cameras and such, doing troubleshooting and decision making... you're going to need a whole 8x GPU rack for that.
And that's what makes it difficult to predict what's going to happen now. The areas under the curve can vary widely. We could get a 1B AGI model in 6 months, or it could take 5 years for agentic workflows to fully automate everyone's taxes and actually replace 2/3 of radiology work...
Either way, while there's a significant chance of this transition to the automation age being rough, I am overall quite optimistic, given the fundamentals of what governments actually spend the majority of their money on.
I wouldn't even call it political. It's financial, and should be criminal. The people who are elected to represent us are just taking bribes and being paid off to allow corporations to screw us over.
I wouldn't even say "corporations" because honestly, it's just the one corporation that's keeping the US tax system mired in pointless, manual complexity: Intuit.
There is also a whole political line of thinking that making taxes easier makes them more palatable, so if you want to “starve the beast” at all costs you actually want tax filing to be as painful as possible.
An easy position for people wealthy enough to painlessly have their accountant do their taxes for them. If they really wanted people to struggle with their taxes, they should be discouraging or outlawing companies like TurboTax that make taxes easier for the peasant class, forcing most people to fill everything out by hand on paper forms.
Talk to an educator.
Education is being actively harmed by AI. Kids don’t want to do any difficult thinking work so they aren’t learning. (Literally any teacher you talk to will confirm this)
AI in medicine is challenging because AI is bad at systems thinking, citation of fact, and data privacy: three things that are absolutely essential for medicine. Also, everything for healthcare needs regulatory approval, so costs go up and flexibility goes down. We're ten years away from any AI for medicine being cost-effective.
Having an AI do your taxes is absurd. They regularly hallucinate. I 100% guarantee that if you do your taxes with AI you won't pass an audit. AI literally can't count. You'd be better off asking it to vibecode a replacement for TurboTax. But again, the product won't be AI; it will be traditional code.
Trying for AGI down the road of an LLM is insanity sauce. It's a simulated language center that can't count, can't do systems thinking, and can't cite known facts. We're not six months away; we're a decade away, or at "cost-effective fusion" distance (defined as perpetually 20 years in the future from any point in time).
There are at least six Silicon Valley startups working on AGI. Not a single one of them has published an architecture strategy that might work. None of the “almost AGI” products that have ever come out have a path to AGI.
Meh is the most likely outcome. I say this as someone who uses it a lot for things it is good at.
> AI in medicine is challenging because AI is bad at systems thinking, citation of fact and data privacy.
The main question is whether humans are any better at that. I've had this experience with a doctor: he prescribed Xmg, I asked why, he said because some study said so; I went home, pulled the study, and it said XXmg. Doctors can make things up all the time without much consequence, and likely do. For AI, corps and the community can do all kinds of benchmarking and evaluation at industrial scale.