They belong to Microsoft Research, not DevDiv, so while that doesn't protect them from layoffs, it certainly gives them some protection, being under different management.
Microsoft Research sites tend to be based in collaborations with university research labs.
Would be helpful to know why you think this. Even if there are common reasons that others could point to (and please don't; it won't be helpful), your comment doesn't make any sense without that context.
It's a large corporation. I'm certain someone asked that question and got an answer before they started producing Python tools. It's management's job to ask that question and get answers, you know.
Anthropomorphising a corporation is fanciful, and a manipulative writing technique, given that it appears you're hallucinating an emotional state you have no way of knowing!
> have taken on a state in court
You are suggesting Microsoft should get involved politically? That would also mean condoning their involvement in causes you wouldn't support.
Trying to fix symptoms is a losing battle: the cause needs fixing.
MS is a group of people, led by people, and people have emotions. The corporation is a legal abstraction/concept through which states view MS, for legal purposes and whatever. I don't have to; I can view it for what it is: an organized group of people doing stuff for both money and other reasons.
Saying "MS does something" is a shortcut for "decision makers at MS doing something".
I'm not suggesting anything other than that MS are boycott-worthy for their past decisions and actions, incl. recent ones. I'm not interested in "fixing" MS; it's much easier just to try to avoid it and its products.
Yes, I don't know for sure whether this particular decision was made happily by someone in MS. It doesn't matter much in the end, either.
I've been programming with Python for over 10 years now, and I use type hints whenever I can because of how many bugs they help catch. At this point, I'm beginning to form a rather radical view. As LLMs get smarter and vibe coding (or even more abstract ways of producing software) becomes normalized, we'll be less and less concerned about compatibility with existing codebases because new code will be cheaper, faster to produce, and more disposable. If progress continues at this pace, generating tests with near 100% coverage and fully rewriting libraries against those tests could be feasible within the next decade. Given that, I don't think backward compatibility should be the priority when it comes to language design and improvements. I'm personally ready to embrace a "Python 4" with a strict ownership model like Rust's (hopefully more flexible), fully typed, with the old baggage dropped and all the new bells and whistles. Static typing should also help LLMs produce more correct code and make iteration and refactoring easier.
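To illustrate the kind of bug type hints catch before the code ever runs (the User/greeting names here are made up for the example; the checker could be mypy or pyright):

    from dataclasses import dataclass

    @dataclass
    class User:
        name: str
        age: int

    def greeting(user: User) -> str:
        # The annotation documents intent; a checker enforces it before runtime.
        return f"Hello, {user.name}!"

    print(greeting(User("Alice", 30)))   # fine

    # greeting("Alice")  # mypy/pyright reject this before it ever runs:
    #                    # argument 1 has incompatible type "str"; expected "User"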
I agree, older code is evidence of survivorship bias. We don't see all of the code written alongside the older code that has since been removed or replaced (without a code repository, at least).
> I'm personally ready to embrace a "Python 4" with a strict ownership model like Rust's (hopefully more flexible), fully typed, with the old baggage dropped and all the new bells and whistles. Static typing should also help LLMs produce more correct code and make iteration and refactoring easier.
So...a new language? I get it except for borrow checking, just make it GC'ed.
But this doesn't work in practice: if you break compatibility, you also break compatibility with the training data of decades and decades of Python code.
Interestingly, I think as we use more and more LLMs, types get even more important, as they're basically a hint to the program as well.
I think people are still fooling themselves about the relevance of 3GL languages in an AI dominated future.
It is similar to how Assembly developers thought about their relevance until optimising compiler backends turned that into a niche activity.
It is a matter of time, maybe a decade who knows, until we can produce executables directly from AI systems.
Most likely we will still need some kind of formalisation tools to tame natural language uncertainties, however most certainly they won't be Python/Rust like.
We are moving into another abstraction layer, closer to the 4GL, CASE tooling dreams.
"Since FORTRAN should virtually eliminate coding and debugging…" -- FORTRAN report, 1954 [1]
If, as you seem to imply and as others have stated, we should no longer even look at the "generated" code, then the LLM prompts are the programs / programming language.
I can't think of a worse programming language, and I am not the only one [2]
However, it does indicate that our current programming languages are way too low-level, too verbose. Maybe we should fix that?
> I think people are still fooling themselves about the relevance of 3GL languages in an AI dominated future.
I think, as happens in the AI summer before each AI winter, people are fooling themselves about both the shape and proximity of the “AI dominated future”.
That's the wrong distinction, and bringing it up causes pointless arguments like those in the replies.
The right distinction is that assemblers and compilers have semantics and an idea of correctness. If your input doesn't lead to a correct program, you can find the problem. You can examine the input and determine whether it is correct. If the input is wrong, it's theoretically possible to find the problem and fix it without ever running the assembler/compiler.
Can you examine a prompt for an LLM and determine whether it's right or wrong without running it through the model? The idea is ludicrous. Prompts cannot be source code. LLMs are fundamentally different from programs that convert source code into machine code.
This is something like "deterministic" in the colloquial sense, but not at all in the technical sense. And that's where these arguments come from. I think it's better to sidestep them and focus on the important part: compilers and assemblers are intended to be predictable in terms of semantics of code. And when they aren't, it's a compiler bug that needs to be fixed, not an input that you should try rephrasing. LLMs are not intended to be predictable at all.
So focus on predictability, not determinism. It might forestall some of these arguments that get lost in the weeds and miss the point entirely.
LLMs are deterministic. So far every vendor is giving them random noise in addition to your prompt, though. They don't have free will or a soul or anything; feed them exactly the same tokens and exactly the same tokens will come out.
If you change one letter in the prompt, however insignificant you may think it is, it will change the results in unpredictable ways, even with temperature 0 etc. The same is not true of renaming a variable in a programming language, most refactorings etc.
Locally that's possible, but for multi-tenant ones I think there are other challenges related to batch processing (not necessarily in terms of the random seed, but because of other non-determinism sources).
> Most likely we will still need some kind of formalisation tools to tame natural language uncertainties, however most certainly they won't be Python/Rust like
No, I didn't miss it. I think the fact that LLMs are non-deterministic means we'll need a lot more than "some kind of formalization tools"; we'll need real programming languages for some applications!
There is a world of difference between "my code is generated by an LLM where a tiny change in the prompt might produce an entirely different program" and "this CPU doesn't have AVX2".
You moved the goal posts and declared victory - that's not what deterministic means. It means same source, same flags, same output. Under that definition, the actual definition, they're 99.9% deterministic (we strive for 100% but bugs do happen).
Nope, the goal posts stayed where they were: people argue for deterministic results while using tools that by definition aren't deterministic unless a big chunk of work is done to ensure that they are.
"It means same source, same flags, same output", it suffices to change the CPU and the Assembly behaviour might not be the same.
You see how this folder has folders for each target? Then within each target folder there are tests (thousands of tests)? Each of those tests is verified deterministically on each commit.
Edit: there's an even more practical way to understand how you're wrong: if what you were saying were true, ccache wouldn't work.
You keep being you, but you also have to admit, not only do you move goal posts, but most of your arguments are on dollies, performing elaborate choreographies that would make Merce Cunningham blush.
You have a point, but in making it I think you're undermining your argument.
Yes, it's true that computer systems are nondeterministic if you deconstruct them enough. Because writing code for a nondeterministic machine is fraught with peril, as an industry we've gone to great lengths to move this nondeterminism as far away from programmers as possible, so they can at least pretend their code is executing in a deterministic manner.
Formal languages are a big part of this, because even though different machines may execute the program differently, at least you and I can agree on the meaning of the program in the context of the language semantics. Then we can at least agree there's a bug and try to fix it.
But LLMs bring nondeterminism right to the programmer's face. They make writing programs so difficult that people are inventing new formalisms, "prompt engineering", to deal with them. Which are kind of like a mix between a protocol and a contract that's not even enforced. People are writing full-on specs to shape the output of LLMs, taking something that's nondeterministic and turning it into something more akin to a function, which is deterministic and therefore useful (as an aside, this also harks back to language design, where recently languages have been moving toward immutable variables and idempotent functions -- two features that, combined, help deal with nondeterministic output in programs, thereby making them easier to debug).
I think what's going to happen is the following:
- A lot of people will try to reduce nondeterminism in LLMs through natural language constrained by formalisms (prompt engineering)
- Those formalisms will prove insufficient and people will move to LLMs constrained with formal languages that work with LLMs. Something like SQL queries that can talk to a database.
- Those formal languages will work nicely enough to do simple things like collecting data and making views on them, but they will prove insufficient to build systems with. That's when programming languages and LLMs come back together, full circle.
Ultimately, my feeling is the idea we can program without programming languages is misunderstanding what programming languages are; programming languages are not for communicating with a computer, they are for communicating ideas in an unambiguous way, whether to a computer or a human or an LLM. This is important whether or not a machine exists to execute those programs. After all, programming languages are languages.
And so LLMs cannot and will not replace programming languages, because even if no computers are executing them, programs still need to be written in a programming language. How else are we to communicate what the program does? We can't use English and we know why. And we can't describe the program to the LLM in English for the same reason. The way to describe the program to the LLM is a programming language, so we're stuck building and using them.
I think the question is: What is the value of that intermediate step? It depends on how long the full path takes.
If we're one year away from realizing a brave new world where everyone is going straight from natural language to machine code or something similar, then any work to make a "python 4" - or any other new programming languages / versions / features - is rearranging deck chairs on the Titanic. But if that's 50 years away, then it's the opposite.
It's hard to know what to work on without being able to predict the future :)
Wild thought: maybe coding is a thing of the past? Given that an LLM can get fast and deterministic results if needed, maybe a backend, for instance, can be a set of functions that are all textual specifications; by following them it can perform actions (validations, calculations, etc.), approach APIs and connect to databases, then produce output. Then the LLM can auto-refine the specifications to avoid bugs and roll the changes out in real time for the next calls. Like a brain that doesn't need predefined coding instructions to fulfil a task, but just understands its scope, knows how to approach it, and learns from the past.
Fast forward to the near future: why wouldn't it, with the correct restrictions? For instance, would you let it run SELECT queries today? As Hemingway once said, "if it's about price, we know who you are".
I'd think LLMs would be more dependent on compatibility than humans, since they need training data in bulk. Humans can adapt with a book and a list of language changes, and a lot of grumbling about newfangled things. But an LLM isn't going to produce Python++ code without having been trained on a corpus of such code.
It should work if you feed the data yourself, or at the very least the documentation. I do this with niche languages and it seems to work more or less, but you will have to pay attention to your context length, and of course if you start a new chat, you are back to square one.
I don't know if that's a big blocker now we have abundant synthetic data from a RL training loop where language-specific things like syntax can be learned without any human examples. Human code may still be relevant for learning best practices, but even then it's not clear that can't happen via transfer learning from other languages, or it might even emerge naturally if the synthetic problems and rewards are designed well enough. It's still very early days (7-8 months since o1 preview) so to draw conclusions from current difficulties over a 2-year time frame would be questionable.
Consider a language designed only FOR an LLM, and a corresponding LLM designed only FOR that language. You'd imagine there'd be dedicated single tokens for common things like "class" or "def" or "import", which allows more efficient representation. There's a lot to think about ...
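A toy sketch of that idea at the tokenizer level; everything here is hypothetical and far simpler than a real BPE tokenizer:

    # Hypothetical vocabulary where each language keyword is a single token,
    # with a byte-level fallback for everything else. Illustrative only.
    KEYWORD_TOKENS = {"class": 1, "def": 2, "import": 3, "return": 4}
    BYTE_OFFSET = 16  # reserve the low IDs for keywords

    def toy_tokenize(source: str) -> list[int]:
        tokens: list[int] = []
        for word in source.split():
            if word in KEYWORD_TOKENS:
                tokens.append(KEYWORD_TOKENS[word])                      # one token per keyword
            else:
                tokens.extend(BYTE_OFFSET + b for b in word.encode())    # one token per byte
        return tokens

    print(toy_tokenize("def greet"))   # the keyword 'def' costs a single token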
It’s just as questionable to declare victory because we had a few early wins, and to assume that time will fix everything.
Lots of people had predicted that we wouldn’t have a single human-driven vehicle by now. But many issues happened to be a lot more difficult to solve than previously thought!
One has to wonder, why would there be any bugs at all if the LLM could fix them? Given Kernighan's Law, does this mean the LLM can't debug the bugs it makes?
My feeling is unless you are using a formal language, then you're expressing an ambiguous program, and that makes it inherently buggy. How does the LLM infer your intended meaning otherwise? That means programmers will always be part of the loop, unless you're fine just letting the LLM guess.
Kernighan's Law - Debugging is twice as hard as writing the code in the first place.
The same applies to humans, who are capable of fixing bugs and yet still produce bugs. It's easier to detect bugs with tests and fix them than to never have introduced bugs.
But the whole idea of Kernighan’s law is to not be so clever that no one is available to debug your code.
So what happens when an LLM writes code that is too clever for it to debug? If it weren’t too clever to debug it, it would have recognized the bug and fixed it itself.
Do we then turn to the cleverest human coder? What if they can’t debug it, because we have atrophied human debugging ability by removing them from the loop?
Just want you to know this heart monitor we gave you was engineered with vibe coding, that's why your insurance was able to cover it. Nobody really knows how the software works (because...vibes), but the AI of course surpasses humans on all current (human-created) benchmarks like SAT and bar exam tests, so there's no reason to think its software isn't superior to human-coded (crusty old non "vibe coded" software) as well. You should be able to resume activity immediately! good luck
Welcome to the flight, this is your captain speaking. Just want to let you know our entire flight system was vibe coded to the strict standards you expect from our industry, iterated and refined in a virtual environment over twenty virtual-years, with no fallible human eyes reviewing it - even if it were possible to review the mountain of impenetrable machine-generated code. The pilot will be controlling the plane via a cutting-edge LLM interface, prompt-engineering our way to our overseas destination. Relax, get comfortable, and pray to the collective intelligence distilled from Reddit posts.
this is black and white thinking. if the practice of "let the AI write the code and assume it's fine because I'm only an incurious amateur anyway" becomes normalized, the tendency of AI to produce inaccurate slop will become more and more part of software we use every day and definitely will begin impacting functions that are more and more critical over time.
Different tools for different jobs is not black and white thinking.
I remember when people said the same thing about Basic; how dare anyone create such an abomination, whole generations of programmers will be useless because they learned this oversimplified, terrible language instead of proper assembly.
> embrace a "Python 4" with a strict ownership model like Rust
Rust only does this because it targets low-level use cases without automatic memory management, and makes a conscious tradeoff against ease of programming.
> Static typing should also help LLMs produce more correct code and make iteration and refactoring easier.
You say that as if they "understand". As actual usage has shown us, they're perfectly comfortable making up whole function names, so I'm super confident a little nuance like "what type is this?" is not going to stand in the way of just placating the requestor.
Ownership models like Rust's require a greater ability for holistic refactoring; otherwise a change in one place causes a lifetime issue elsewhere.
This is actually exactly what LLMs are doing the worst at.
Beyond that, a Python with something like lifetimes implies doing away with garbage-collection - there really isn't any need for lifetimes otherwise.
What you are suggesting has nothing to do with Python and completely misses the point of why python became so widely used.
The more general point is that garbage collection is very appealing from a usability standpoint and it removes a whole class of errors.
People who don't see that value should look again at the rise of Java vs C/C++.
Businesses largely aren't paying for "beautiful", exacting memory management, but for programs which work and hopefully can handle more business concerns with the same development budget.
While I go in another direction in a sibling comment, lifetimes do not imply not needing garbage collection.
On the contrary, having both allows the productivity of automatic resource management, while providing the necessary tooling to squeeze the ultimate performance when needed.
No need to worry about data structures not friendly to affine/linear types, Pin and Phantom types and so forth.
It is no accident that while Rust has been successful bringing modern lifetime type systems into mainstream, almost everyone else is researching how to combine linear/affine/effects/dependent types with classical automatic resource management approaches.
Rust lifetimes are generally fairly local and don’t impact refactoring too much unless you fundamentally change the ownership structure.
Also a reminder that Rc, Arc, and Box are garbage collection. Indeed, Rust is a garbage-collected language unless you drop to unsafe. It’s best to specify tracing GC, which is what I think you meant.
A lot of people want a garbage collected Rust without all the complexity caused by borrow checking rules. I guess it's because Rust is genuinely a great language even if you ignore that part of it.
Thankfully, like many other languages that would rather combine models than go full speed into affine types, OCaml is getting both.
Besides the effects type system initially introduced to support multicore OCaml, Jane Street is sponsoring the work for explicit stack allocation, unboxed types, modal types.
I have not used Reason ML as I have not had the reason to. :D
But apparently the target audience is JavaScript / TypeScript developers, and I think it is mainly used for web development IIRC, whereas OCaml is much more general-purpose and even low-level at times.
Jane Street is doing a great job at contributing to OCaml itself and its libraries.
By the way, wouldn't it be possible to have a garbage-collecting container in Rust? Where all the various objects are owned by the container, and available for as long as they are reachable from a borrowed object.
I'd rather love to see confluent persistence in Python, i.e. a git-like management of an object tree.
So when you create a new call stack (generator, async task, thread), you can create a twig/branch, and that is modified in place, copy-on-write.
And you decide when and how to merge a data branch; there are support frameworks for this, even defaults, but in general merging data is a deliberate operation, like with git.
Locally, a Python with this option looks and feels single-threaded: no brain knots. Sharing and merging intermediate results becomes a deliberate operation with synchronisation points that you can reason about.
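A minimal sketch of what such a branch-and-merge API could look like; the DataTree/branch/merge names are invented for illustration, and a real implementation would use structural sharing rather than deep copies:

    import copy

    class DataTree:
        """Toy sketch: branch an object tree, mutate the branch freely,
        then merge deliberately, like a git merge."""

        def __init__(self, data=None):
            self.data = data if data is not None else {}

        def branch(self) -> "DataTree":
            # Copy-on-write is faked with a deep copy here.
            return DataTree(copy.deepcopy(self.data))

        def merge(self, other: "DataTree", resolve=lambda ours, theirs: theirs) -> None:
            # Merging is an explicit, deliberate synchronisation point.
            for key, theirs in other.data.items():
                if key in self.data and self.data[key] != theirs:
                    self.data[key] = resolve(self.data[key], theirs)
                else:
                    self.data[key] = theirs

    main = DataTree({"count": 1})
    twig = main.branch()        # per-task view: locally it feels single-threaded
    twig.data["count"] += 1     # modified in place on the branch only
    main.merge(twig)            # explicit, deliberate merge
    print(main.data)            # {'count': 2}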
cinder includes changes for immutable module objects. I wonder if the implementation is similar? Or is cinder so old that it would be incompatible with the future noGil direction?
I work on cinder, it is primarily a JIT compiler, and does not have this sort of concurrency construct (which looks extremely exciting and well thought out, props to the verona team!).
cinder is actively developed (we recently upgraded it to 3.12) and is definitely going to be compatible with free threaded (nogil) python.
If I understand it correctly, this is only catching ownership violations at runtime, so it doesn't actually prevent writing/shipping the bug? But it does seem to be able to improve the detection rate and determinism, and also help with diagnosing the bug when it's discovered. If this does let simple unit tests in CI discover concurrency bugs, that's a big improvement, even if it's not as strong as static analysis. I imagine there are still cases where the ownership violation is not deterministic though, e.g. depending on the data or the app's configuration, and maybe will not be caught until production
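Purely as a hypothetical illustration of runtime (rather than static) ownership checking, and not the project's actual API:

    import threading

    class Owned:
        """Hypothetical wrapper: only the owning thread may mutate the value.
        Invented for illustration; not the real API of any ownership runtime."""

        def __init__(self, value):
            self._value = value
            self._owner = threading.get_ident()

        def set(self, value):
            if threading.get_ident() != self._owner:
                # Detected when it happens, not at analysis time, so a test
                # has to actually exercise this code path to catch the bug.
                raise RuntimeError("ownership violation: mutated from a non-owning thread")
            self._value = value

    counter = Owned(0)

    def worker():
        try:
            counter.set(1)              # this thread does not own `counter`
        except RuntimeError as exc:
            print(f"caught: {exc}")     # raised deterministically at runtime

    t = threading.Thread(target=worker)
    t.start()
    t.join()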
Sounds like a fun job, I’d love to do something like this in my 9 to 5.
It’s also amazing how much work goes into making Python a decent platform because it’s popular. Work that will never be finished and could have been avoided with better design.
Get users first, lock them in, fix problems later seems to be the lesson here.
I would say it was closer to 2005 that Python really took off. Coincidentally around when I started using it, but I remember a noticeable increase in "buzz".
> Get users first, lock them in, fix problems later seems to be the lesson here.
Or with a less cynical spin: deliver something that's useful and solves a problem for your potential users, and iterate over that without dying in the process (and Python suffered a lot already in the 2 to 3 transition)
The 2 to 3 transition was possible precisely because of user lock-in and sunk cost. This kind of global update was unprecedented, and could have been totally avoided with better design.
Imo it is less about locking anyone in (in this case) and more about what Python actually enables: exceedingly fast prototyping and iteration. Turns out the ability to ship fast and iterate is actually more useful than performance, esp. in a web context where the bottlenecks are frequently not program execution speed.
Python has compounding problems that make it extremely tricky though.
If it was just slow because it was interpreted they could easily have added a good JIT or transpiler by now, but it's also extremely dynamic so anything can change at any time, and the type mess doesn't help.
If it was just slow one could parallelise, but it has a GIL (although they're finally trying to fix it), so one needs multiple processes.
If it just had a GIL but was somewhat fast, multiple processes would be OK, but as it is also terribly slow, any single process can easily hit its performance limit if one request or task is slow. If you make the code async to fix that, you either get threads or extremely complex cooperative multitasking code that keeps breaking when there's some bit of slow performance or blocking you missed.
If the problem was just the GIL, but it was OK fast and had a good async model, you could run enough processes to cope, but it's slow so you need a ridiculous number, which has knock-on effects on needing a silly number of database/api connections
I've tried very hard to make this work, but when you can replace 100 servers struggling to serve the load on python with 3 servers running Java (and you only have 3 because of redundancy as a single one can deal with the load), you kinda give up on using python for a web context
If you want a dynamic web backend language that's fast to write, typescript is a much better option, if you can cope with the dependency mess
If it's a tiny thing that won't need to scale or is easy to rewrite if it does, I guess python is ok
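To make the GIL point concrete, a minimal sketch of why CPU-bound Python work pushes you toward processes rather than threads (timings will vary; on a free-threaded build the thread version can actually parallelise):

    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
    import time

    def cpu_bound(n: int) -> int:
        # Pure-Python loop: holds the GIL, so threads cannot run it in parallel.
        total = 0
        for i in range(n):
            total += i * i
        return total

    if __name__ == "__main__":
        work = [5_000_000] * 4

        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=4) as pool:
            list(pool.map(cpu_bound, work))
        print(f"threads:   {time.perf_counter() - start:.2f}s")  # roughly serial under the GIL

        start = time.perf_counter()
        with ProcessPoolExecutor(max_workers=4) as pool:
            list(pool.map(cpu_bound, work))
        print(f"processes: {time.perf_counter() - start:.2f}s")  # parallel, at the cost of extra processes and memory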
> If it was just slow because it was interpreted they could easily have added a good JIT or transpiler by now, but it's also extremely dynamic so anything can change at any time, and the type mess doesn't help.
See Smalltalk, Common Lisp, Self.
Their dynamism, image based development, break-edit-compile-redo.
Want to change everything in Smalltalk in a single call?
a becomes: b
Now every single instance of a in a Smalltalk image has been replaced by b.
Just one example; there is hardly anything that one can do in Python that those languages don't do as well.
Smalltalk and Self are the genesis of JIT research that eventually gave birth to Hotspot and V8.
Second paragraph is not really true, unless you’ve gone out of your way. Cython is used primarily for compute-bound problems, not processing user input.
I agree that fast iteration and the "easy to get something working" factor is a huge asset in Python, which contributed to its growth. A whole lot of things were done right from that point of view.
An additional asset was the friendliness of the language to non-programmers, and features enabling libraries that are similarly friendly.
Python is also unnecessarily slow - 50x slower than Java, 20x slower than Common Lisp and 10x slower than JavaScript. Its iterative development is worse than Common Lisp's.
I’d say that the biggest factor is simply that American higher education adopted Python as the introductory learning language.
For American higher education, it was Pascal ages ago, and then it was Java for quite a while.
But Java is too bureaucratic to be an introductory language, especially for would-be non-programmers. Python won on "introductoriness" merits: capable of getting everything done in every field (bio, chem, stat, humanities) while still being (relatively) friendly. I remember the days it was frowned upon for being a "script language" (thus not a real language). But it won on merit.
Microsoft just fired 3% of its staff, more than it ever did before. I would stick with type-checked free-threaded Python with locks and queues. Someone should be able to enhance the type checker to also check for unsafe mutation of variables.
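A minimal sketch of that conventional pattern: plain threads coordinating through explicit locks and queues, which works the same on the free-threaded build. The names here are illustrative:

    import threading
    import queue

    results: "queue.Queue[int]" = queue.Queue()   # thread-safe hand-off between workers
    counter_lock = threading.Lock()
    counter = 0

    def worker(items: list[int]) -> None:
        global counter
        for item in items:
            results.put(item * item)
            with counter_lock:          # explicit lock guards the shared mutable int
                counter += 1

    threads = [threading.Thread(target=worker, args=([i, i + 1],)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(counter, results.qsize())     # 8 8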
This looks like a pivot of the Project Verona research, as there have not been many other papers out since the initial announcement regarding the programming language itself.
I'm a big BEAM person, but Python 3.0.0 was released in December 2008. At that time, I believe OTP R12 was current, and it only gained SMP support in R11. [1] In 2008, I don't know that it would have been clear that the BEAM would be a good target. And I don't know how switching to BEAM then would have addressed what I think is the core issue Python 3 was working on, Unicode strings; BEAM didn't start taking on Unicode until R13 and IMHO is kind of on the slow end of Unicode adoption (which isn't always bad... being late means adopting industry consensus with fewer of the intermediate false steps).
At the time of the original Py3 release, PyPy was not ready for wide use. Otherwise maybe there could have been a chance of it replacing CPython. They were in too big a hurry to ship Py3 though. Tragedy.
Which is a pity. Python ends up being the only major dynamic language where, for all practical purposes, there is no JIT support, because while there are alternative implementations with great JIT achievements, the community behaves as if all that effort was for nothing other than helping PhD students do their theses.