We're a high-performing, diverse, tight-knit team with a mission to radically improve mainstream science and maths education in schools. By creating the best lessons in the world, coupled with intuitive tools that allow teachers to take advantage of the latest pedagogies, we've already helped millions of students in Australia get excited about science and maths.
45% of Australian science students in years 7-10 use Stile. Help us scale from 500k to more than 5 million students across Australia and the US over the next two years!
We now have offices in Melbourne, Boston, Portland and more! We're primarily hiring in Melbourne (relocation assistance and visa sponsorship available), but there will be lots of travel opportunities.
We're hiring for a bunch of new roles right now (not all of them are on our jobs site yet; please reach out if you're interested, even if you don't seem to fit a specific role!):
Trying two things and giving up. It's like opening a REPL for a new language, typing some common commands you're familiar with, getting some syntax errors, then giving up.
You need to learn how to use your tools to get the best out of them!
Start by thinking about what you'd need to tell a new junior human dev you'd never met before about the task, if you could only send a single email to spec it out. There are shortcuts, but that's a good starting place.
In this case, I'd specifically suggest:
1. Write a CLAUDE.md listing the toolchains you want to work with, giving context for your projects, and spelling out the specific build, test, etc. commands you use on your system (including any helpful scripts/aliases). Start simple; you can have claude add to it as you find new things you need to tell it, or things it spends time working out, so that you don't have to repeat that every time. (A rough sketch follows this list.)
2. In your initial command, include a pointer to an example project using similar tech, in a directory that claude can read.
3. Ask it to come up with a plan and get your approval before starting.
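To make point 1 concrete, here's a rough sketch of what a starter CLAUDE.md could look like. Everything in it (the project layout, commands, and scripts) is a hypothetical placeholder for whatever your own setup uses, not anything prescribed by the tool:

    # CLAUDE.md

    ## Project context
    - Monorepo: TypeScript frontend in `web/`, Python API in `api/`.
    - Target Node 20 and Python 3.12.

    ## Commands I actually use
    - Build frontend: `cd web && npm run build`
    - Frontend tests: `cd web && npm test`
    - API tests: `pytest api/tests -q`
    - Lint everything: `./scripts/lint-all.sh` (my own helper script)

    ## Conventions
    - Keep diffs small and focused; don't reformat unrelated files.
    - Ask before adding new dependencies.

The point is just to capture, in one place, the stuff you'd otherwise have to repeat at the start of every session.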
Eh. This is true for humans too and doesn’t make humans useless at evaluating business plans or other things.
You just want the signal from the object-level question to drown out irrelevant bias (which plan was proposed first, which of the plan proposers is more attractive, which plan seems cooler, etc.).
Very good question. Currently, I’ve only sourced power supplies that meet US standards. However, I’m confident I can provide a power supply that works for Australia as well.
When you place the order, make sure you set the shipping address to your Australian address, so that I will know I need to source the corresponding power supply.
For what it's worth, I wasn't able to pick an address outside of the United States when it still had units for sale. If you want a first international sale, I'm estsauver at gmail dot com and would be thrilled to be your first customer from The Netherlands.
Another way to phrase this is LLM-as-compiler, with Python (or whatever) as an intermediate compiler artefact.
Finally, a true 6th generation programming language!
I've considered building a toy of this with really aggressive modularisation of the output code (e.g. Python) and a query-based caching system, so that each module of output code only changes when the relevant part of the prompt or upstream modules change (the generated code would be committed to source control like a lockfile).
I think that (+ some sort of WASM-encapsulated execution environment) would be one of the best ways to write one-off things like scripts, which don't need to incrementally get better and more robust over time in the way that ordinary code does.
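To make the caching idea concrete, here's a minimal sketch of how the query-based cache could work (ignoring the WASM sandbox part). All the names here (call_llm, the module graph, the cache layout) are made up for illustration; the real thing would call an actual model API:

    import hashlib
    import json
    from pathlib import Path

    CACHE_DIR = Path("generated")  # checked into source control, like a lockfile

    # Hypothetical module graph: each output module has its own prompt section
    # and a list of upstream modules whose generated code it depends on.
    MODULES = {
        "parse_args": {"prompt": "Write an argparse CLI for ...", "deps": []},
        "fetch_data": {"prompt": "Write a function that downloads ...", "deps": ["parse_args"]},
        "main": {"prompt": "Wire the pieces together into a script.", "deps": ["parse_args", "fetch_data"]},
    }

    def call_llm(prompt: str) -> str:
        """Stand-in for the real model call; would return Python source text."""
        return f"# generated from prompt: {prompt[:40]}...\n"

    def build(name: str, built: dict[str, str]) -> str:
        """Build one module, regenerating only if its prompt or upstream code changed."""
        if name in built:
            return built[name]
        spec = MODULES[name]
        upstream_sources = [build(dep, built) for dep in spec["deps"]]
        # Cache key covers this module's prompt section plus its upstream code.
        key = hashlib.sha256(
            json.dumps({"prompt": spec["prompt"], "upstream": upstream_sources}).encode()
        ).hexdigest()[:16]
        cached = CACHE_DIR / f"{name}.{key}.py"
        if cached.exists():
            source = cached.read_text()  # cache hit: nothing relevant changed
        else:
            CACHE_DIR.mkdir(exist_ok=True)
            source = call_llm(spec["prompt"] + "\n\nUpstream modules:\n" + "".join(upstream_sources))
            cached.write_text(source)  # cache miss: regenerate just this module
        built[name] = source
        return source

    if __name__ == "__main__":
        built: dict[str, str] = {}
        for module in MODULES:
            build(module, built)

The nice property is the one described above: tweak one part of the prompt and only the modules downstream of it get regenerated, while the rest of the committed output stays byte-for-byte identical.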
I think that commenter was disagreeing with this line:
> because omniscient-yet-dim-witted models terminate at "superhumanly assistive"
It might be that with dim wits + enough brute force (knowledge, parallelism, trial-and-error, specialisation, speed) models could still substitute for humans and transform the economy in short order.
Sorry, I can't edit it any more, but what I was trying to say is that if the authors are correct that this distinction is philosophically meaningful, then that is the conclusion. If they are not correct, then all their papers on this subject are basically meaningless.
> I think AI maximalists will continue to think that the models are in fact getting less dim-witted
I'm bullish (and scared) about AI progress precisely because I think they've only gotten a little less dim-witted in the last few years, but their practical capabilities have improved a lot thanks to better knowledge, taste, context, tooling etc.
What scares me is that I think there's a reasoning/agency capabilities overhang, i.e. we're only one or two breakthroughs away from something which is both kinda omniscient (where we are today) and able to out-think you very quickly (if only by dint of applying parallelism to actually competent outcome-modelling and strategic decision-making).
That combination is terrifying. I don't think enough people have really imagined what it would mean for an AI to be able to out-strategise humans in the same way that they can now — say — out-poetry humans (by being both decent in terms of quality and super fast). It's like when you're speaking to someone way smarter than you and you realise that they're 6 steps ahead, and actively shaping your thought process to guide you where they want you to end up. At scale. For everything.
This exact thing (better reasoning + agency) is also the top priority for all of the frontier researchers right now (because it's super useful), so I think a breakthrough might not be far away.
Another way to phrase it: I think today's LLMs are about as good at snap judgements in most areas as the best humans (probably much better at everything that rhymes with inferring vibes from text), but they kinda suck at:
1. Reasoning/strategising step-by-step for very long periods
2. Snap judgements about reasoning or taking strategic actions (in the way that expert strategic humans don't actually need to think through their actions step-by-step very often - they've built intuition which gets them straight to the best answer 90% of the time)
Getting good at the long range thinking might require more substantial architectural changes (eg. some sort of separate 'system 2' reasoning architecture to complement the already pretty great 'system 1' transformer models we have). OTOH, it might just require better training data and algorithms so that the models develop good enough strategic taste and agentic intuitions to get to a near-optimal solution quickly before they fall off a long-range reasoning performance cliff.
Of course, maybe the problem is really hard and there's no easy breakthrough (or it requires 100,000x more computing power than we have access to right now). There's no certainty to be found, but a scary breakthrough definitely seems possible to me.
I think you are right, and that the next step-function improvement can be achieved using the models we have, either by scaling inference or by changing the way inference is done.
People are doing all manner of very sophisticated inferency stuff - it just tends to be extremely expensive for now, and... people are keeping it secret.
If it were good enough to replace people, then it wouldn't be too expensive: they would have launched it, replaced a bunch of people, and made trillions of dollars by now.
So at best their internal models are still just performance multipliers, unless some breakthrough happened very recently. It might be a bigger multiplier, but that still keeps humans in jobs, etc., and thus doesn't revolutionize much.