> If these hundred tasks make heavy use of the CPU, then the sync and async solutions would have similar performance, since the speed at which the CPU runs is fixed, Python's speed of executing code is always the same and the work to be done by the application is also equal.
I think one of the things the author has not discussed is that if you have high CPU then you are going to get starvation as async does not have a fairness mechanism - timesharing is co-operative and based on FIFO in all the implementations I've seen.
If you start to saturate CPU (even a bit) you get uneven distribution of cpu time between your various async threads. This translates into chronic timeouts. For async to be applicable you really need to have a) high io load and b) lowish cpu load.
In production systems, you don't use coroutines alone. You have several processes, some may contain several threads, some thread may contain an event loop.
In fact, it's already the case with WSGI setups: you may have async nginx on front, multi-process gunicorn behind, and a thread pool for your db connections. And asyncio already provides an API for sending tasks to process pools and threads and getting them back as async Futures, as a convenience. It's turtles all the way down.
Bottom line, async, threads or processes are not silver bullets, and you should use the right tool for the right job.
Just like when Go users realized that spawning goroutines for everything actually didn't work at scale, throwing awaits everywhere is not a magical solution.
Having asyncio at your disposal makes some tasks easier, like websockets, crawling your network, etc. It opened the door to new architectures, and maybe new features (live settings, task queues that feed the results back to you, etc.).
But it's been oversold, as with anything new and shiny.
I believe I did discuss this, and in fact, I list the same a) and b) conditions you mention using slightly different words. Quoting the article: "To benefit from the async style, an application needs to have tasks that are often blocked by I/O and don't have too much CPU work."
This is a great point! If you use an established framework like tornado / twisted, they have idiomatic ways to deal with this.
E.g. if you have a loop that can potentially starve the rest of your service when the iterable is too long, go ahead and throw a `yield gen.moment` in there. `yield gen.sleep` can also be incredibly handy. Once you realize that you are just living in a world of partially executed functions, you can always have your coroutine phone home to the event loop and ask if anyone else needs some CPU.
Of course, for this to work you need lumpiness in your workload; if the CPU is just always starved, this is more likely a problem of under-allocation of compute.
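A minimal sketch of that `yield gen.moment` trick in Tornado's classic `@gen.coroutine` style (the loop body is just a stand-in for real CPU-bound work):

    from tornado import gen
    from tornado.ioloop import IOLoop

    def process_item(item):
        return item * item            # stand-in for some CPU-bound work

    @gen.coroutine
    def handle_big_iterable(items):
        results = []
        for i, item in enumerate(items):
            results.append(process_item(item))
            if i % 1000 == 0:
                yield gen.moment      # hand control back to the IOLoop briefly
        return results

    IOLoop.current().run_sync(lambda: handle_big_iterable(range(100000)))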
I can attest to this fact. I've isolated this issue in OpenStack components once or twice, where we identified CPU-bound tasks blocking other greenlets, leading to new MySQL database connections being rejected because there's an initial auth phase with a very short 10 second timeout while the server waits for the client's response to a security challenge.
> ... you are going to get starvation as async does not have a fairness mechanism - timesharing is co-operative and based on FIFO in all the implementations I've seen.
FIFO systems can have starvation when dealing with time bound things like web requests - slow request handlers can cause other (waiting) requests to timeout.
In a thread-per-request or process-per-request model where you run say 4 * cpu workers the operating system scheduler is able to interrupt stuff that's hogging the CPU but that isn't the case in a typical 1 * cpu worker async model.
When there's little cpu load this is not a real threat but when there is material load you can easily run into problems.
I see; it sounds like latency would become very unpredictable under that model. I can imagine that's a serious problem in practice, but it's not typically what's meant by starvation.
"Task does not receive resources to advance within the required time interval" is exactly starvation. A "task" here is responding to a request. The required time interval is the timeout.
ctrl+f'd "time interval", zero matches. The first linked source in Wikipedia, "Modern operating systems" even says a FIFO prevents starvation, "Starvation can be avoided by using a first-come, first-serve, resource allocation policy. In due course of time, any given process will eventually become the oldest and thus get the needed resource"
I think these are equivalent when applicable, because starving a task long enough of a resource for either the resource to expire or the task's deadline to expire has the same effect as starving the task of the resource indefinitely: task fails to progress.
That's called "firm real-time failure". Again, the standard definitions of starvation/liveness only use "eventually" and "never" without mentioning actual physical time IIRC, and real-time systems use a whole other set of concepts to deal with actual, physical time.
And it doesn't really fail to progress if it's cancelled on deadline expiration; it progresses to a non-successful completion. If it instead hung indefinitely, yes, that'd be a problem.
Because any function can use as much time as it wants before returning. Event loops process events by calling functions which must return in order to allow the loop to make progress. Complex handlers will stall the loop and increase latency.
The recent innovations in asynchronous programming involve ways to signal to the programming language that the current task is blocked, thereby allowing the underlying loop to move on to the next event. While this improves the efficiency of the asynchronous processing system, programmers still need to use these features correctly just like coroutines.
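A minimal asyncio sketch of that point (all names here are illustrative): a handler that blocks stalls the whole loop, while one that awaits lets other tasks keep running.

    import asyncio, time

    async def blocking_handler():
        time.sleep(1)            # never yields: the whole event loop stalls for a second

    async def cooperative_handler():
        await asyncio.sleep(1)   # suspends this task: the loop keeps running others

    async def main():
        # Two cooperative handlers finish together in ~1s; swap one for
        # blocking_handler and everything behind it waits an extra second.
        await asyncio.gather(cooperative_handler(), cooperative_handler())

    asyncio.run(main())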
If you're curious about this topic then I would love to link you to https://techspot.zzzeek.org/2015/02/15/asynchronous-python-a... written by Mike Bayer (author of SQLAlchemy) -- it's an article that I keep referring to as a must-read, incredibly well written and all about real-world implications (totally transferable to other non-Python ecosystems).
If you like, maybe also have a read of https://gehrcke.de/gipc/background.html. This is where I tried to put into words what I learned a couple of years ago when I wrote a library for the gevent ecosystem. You'll find some paragraphs about "Cooperative scheduling vs. preemptive scheduling", about "Asynchronous vs. synchronous", and about a few more simple and yet important topics in the world of event-driven architectures.
Not mentioned in TFA, but I'm utterly convinced that the only compelling reason for async is to avoid the per-thread stack memory allocation of 8MB per thread or whatever it is, in order to be able to scale to an extremely large number of concurrent threads/coroutines. You can't do this with threads.
Async lets you do this whilst still storing the state of unfinished things in a stack, i.e. not having to make a million callbacks. Making it look like you're using threads even though you're not.
The better async code and interfaces become, the more it looks like regular old multithreaded code, except there aren't actual threads underlying it. You still need to make sure you serialise access to shared resources, don't share data that shouldn't be shared, etc. All the same considerations as multithreaded code.
Almost ends up looking like the underlying mechanism: threads with a GIL vs async ought to be an implementation detail that doesn't require you to modify your entire programming model.
Extension code not holding the GIL can still run in true parallel with real threads, so that's a meaningful difference but is usually not relevant for IO where async is usually used.
> avoid the per-thread stack memory allocation of 8MB per thread or whatever it is, in order to be able to scale to an extremely large number of concurrent threads/coroutines. You can't do this with threads.
It's 8 KB per thread, so you can scale a thousand times further than you thought. One dark secret of the async movement is that if your goal is C10K (10,000 concurrent clients) then actually bog standard threading will handle that fine these days.
> The better async code and interfaces become, the more it looks like regular old multithreaded code, except there aren't actual threads underlying it. You still need to make sure you serialise access to shared resources, don't share data that shouldn't be shared, etc. All the same considerations as multithreaded code.
Depends what approach you're using. I prefer making an explicit distinction between sync and async functions ( https://glyph.twistedmatrix.com/2014/02/unyielding.html ), so you effectively invert the notion of a "critical section" - instead of marking which sections can't yield, you mark which sections can yield, so your code is safe by default and you can introduce concurrency explicitly as and when you need it for performance, rather than your code being fast-but-unsafe by default and you're expected to fix a bunch of rare nondeterministic bugs with minimal support from your tools, which is how it works in a multithreading world.
> It's 8 KB per thread, so you can scale a thousand times further than you thought. One dark secret of the async movement is that if your goal is C10K (10,000 concurrent clients) then actually bog standard threading will handle that fine these days.
Default virtual memory allocation for threads on Linux distributions tends to be 8 megabytes. Actual memory used is the peak stack depth used, rounded up a bit. It'd be pretty unusual to only use as little as 8 kilobytes per thread; just the standard per-thread libc context information for concurrency is a few kilobytes, plus at least one page of stack, plus the kernel's information about the thread (which isn't counted against the process)...
Yes, you can spawn thousands of threads on relatively modest hardware; I was spawning thousands of threads a decade ago.
Spawning 5000 bare-minimal python threads that do nil seems to use about 300 megs of ram on my system; real threads that do anything substantial will use a whole lot more, even if their use of the stack depth is intermittent.
Not to mention allocators that cache part of freed heap per-thread, etc.
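A rough way to reproduce the kind of measurement described above (a sketch only; exact numbers depend on the libc, the allocator, and the Python build):

    import os, threading, time

    def idle():
        time.sleep(60)           # threads that do nothing but sleep

    threads = [threading.Thread(target=idle, daemon=True) for _ in range(5000)]
    for t in threads:
        t.start()

    # While they sleep, check resident memory, e.g.:
    #   grep VmRSS /proc/<pid>/status
    print("pid:", os.getpid())
    time.sleep(60)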
I believe 8k is the size of the per thread stack on the kernel side. This is non-pageable memory, so it will consume physical memory whether it is needed or not, while of course the 8 megabytes is paged in on demand.
Yah, I'm ignoring the kernel stack and all kernel data structures. User space memory used will be at least a page of stack (reaching up to the maximum amount used in the thread), plus the libc reentrancy data structures, plus per-thread heap caches, etc. It can all get paged out, but we hardly want that these days.
The key distinction is that the 8MB of stack VM doesn't have a backing until the memory is used in the thread, but afterwards it does forever.
Well, it's forever if you assume that the threads live forever; for a web server it's perfectly practical to have threads that only live for a single request, or to reuse them for multiple requests but not allow a single thread to live longer than say 10 minutes.
Short-lived threads (for one request) are a performance and scalability disaster; tens of microseconds or worse to spawn and join, contention on important locks, bad for caches, etc. There's not much concurrency when it comes to spawning threads, either.
If you have long-lived threads in a pool, yes, they may not live forever, but you generally have to assume that each thread will end up with a resident stack size equal to the largest stack use: each will get a turn to run the stack-intensive functions.
I'd argue that the main benefit of day-to-day async programming isn't performance but actually the concurrency patterns that help you sequence your code and resource access in ways that `thread { work() }` could not.
For example, future/result combinators and `await [task1, task2.then(task3)]`.
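A hedged asyncio sketch of that combinator idea (task1/task2/task3 are made-up placeholders, not anyone's real API):

    import asyncio

    async def task1():
        await asyncio.sleep(0.1)
        return "one"

    async def task2():
        await asyncio.sleep(0.1)
        return 40

    async def task3(x):
        return x + 2

    async def main():
        async def task2_then_3():
            return await task3(await task2())

        # Roughly the `await [task1, task2.then(task3)]` from above.
        results = await asyncio.gather(task1(), task2_then_3())
        print(results)   # ['one', 42]

    asyncio.run(main())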
Not really - it gets very messy because every time you transform a future you have to figure out where you're getting the thread for that transformation to run on.
sorry, what transformation? A future is simply a placeholder for something being computed asynchronously. On a threadful design you would simply spawn a thread (or pick one from a thread pool) to handle the computation. Normally your future runtime would handle it for you.
Basically you end up with something similar to the fork-join model.
Whenever you want to transform a result that's in a future, e.g. you have a future for a number and want to add 2 to it.
> On a threadful design you would simply spawn a thread (or pick one from a thread pool) to handle the computation.
If you allow yourself to spawn threads everywhere you'll quickly run out of resources. So you have to manage which thread pool you're using where and ensure you're not bringing in priority inversions etc.. It's really not that easy.
> Basically you end up with something similar to the fork-join model.
The fork-join model isn't really a purely thread-based model - the work-stealing technique is pretty much trying to reimplement what async-style code would do naturally.
> ensure you're not bringing in priority inversions etc..
Remember we're comparing to async/futures, which are not really guaranteed to not starve either. At least with thread pools you can, in theory, manage this well.
> Remember we're comparing to async/futures, which are not really guaranteed to not starve either. At least with thread pools you can, in theory, manage this well.
With async/futures you're giving the runtime control over these decisions, whereas with threads you're managing them yourself, which can be an advantage but only if you don't make errors with that manual control. An async/future runtime can know which tasks are waiting for which other tasks, letting it avoid deadlocks and a lot of possible priority inversions, and the async style naturally lends itself to writing code that's logically end-to-end (on a single "fiber" even as that fiber moves between threads), which means there's less need to balance resources across multiple thread pools.
That's almost certainly a bad idea in a web server context like this article is talking about. You improve best-case latency when the server's not loaded, but now you're using 3 threads per request to get a less than 2x speedup (and in a bigger example it would be worse), so your scaling behaviour will get worse.
always spawning a thread is of course the naive implementation. You can put an upper bound on the number of threads and fall back to synchronous execution of async operations in the worst case (for example inside the wait call).
If your threads are a bit more than dumb os threads (say, a hybrid M:N scheduler) you can do smarter scheduling, including work stealing of course.
Well, as your threads become less like threads and more like a future/async runtime you come closer to the advantages and disadvantages of a future/async runtime, yes.
The underlying thread model has always been 'async' in some form under the hood, i.e. at some point there is always a multiplexer/scheduler that schedules continuations. Normally this is inside the kernel, but M:N or purely userspace-based thread models have been used for decades.
Really the only difference between the modern async model and other 'threaded' model is its 'stacklessness' nature. This is both a major problem (due to the green/red function issue and not being able to abstract away asynchronicity) and an advantage (due to the guaranteed fixed stack size, and, IMHO overrated, ability to identify yield points).
At the end of the day it's always continuations all the way down.
In a multi-threaded context yes, reduced memory usage is the main benefit.
But async/await can also be used for other things! In a C# Windows GUI application, it's normal to use async/await on the UI thread. Your UI can await multiple tasks at the same time, yet you don't need any locks when accessing the UI state; because all your code runs on the UI thread.
This is a really useful programming model made possible by cooperative task-switching via `await` on a single thread.
Here the "await" being explicit is a crucial feature, it allows the programmer to reason about when the shared state might be mutated by other tasks (or maybe by the user clicking cancel while the current task is waiting).
Any pre-emptive task switching adds a lot of additional complexity and isn't really suitable for UI code.
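The same pattern can be sketched with single-threaded asyncio in Python (the counter below is an illustrative stand-in for UI state):

    import asyncio

    counter = 0    # shared state, touched only from the single event loop thread

    async def bump(n):
        global counter
        for _ in range(n):
            counter += 1            # no await here, so no other task can interleave
            await asyncio.sleep(0)  # explicit yield point: other tasks may run now

    async def main():
        await asyncio.gather(bump(10_000), bump(10_000))
        print(counter)              # always 20000, no locks needed

    asyncio.run(main())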
“Your UI can await multiple tasks at the same time, yet you don't need any locks when accessing the UI state; because all your code runs on the UI thread.”
Is that true? I thought the code after each await runs on a different thread from the code before the await. At least that’s what I have observed when debugging things.
There are also other language implementation styles that get roughly the same benefit without writing async and await all over the codebase. The implied yield at defined synchronization points, coupled with a decent scheduler like in Go, would make this all a non-issue [1].
> I'm utterly convinced that the only compelling reason for async is to avoid the per-thread stack memory allocation of 8MB per thread
Yes. The point is to use a single thread with one stack to process several tasks. This consumes less memory.
For example, Go currently has a minimum stack size of 2 KiB so a machine with 4 GiB of memory will be able to process less than 2 million goroutines. An event loop uses a single thread with a single stack, reducing memory usage at the cost of complexity.
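For reference, the back-of-the-envelope arithmetic behind that goroutine figure:

    4 GiB / 2 KiB per stack = 2^32 B / 2^11 B = 2^21 ≈ 2.1 million stacks

before counting the heap, the runtime, and per-goroutine bookkeeping, which is why "less than 2 million" is the practical ceiling.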
Asynchronous functions are just like coroutines. The difference is they return to the awaiting caller instead of yielding to another function. The order of execution is determined by the underlying loop.
> I'm utterly convinced that the only compelling reason for async is to avoid the per-thread stack memory allocation of 8MB per thread or whatever it is, in order to be able to scale to an extremely large number of concurrent threads/coroutines.
If it's possible to avoid shared state then I tend to prefer threads, but in the presence of shared state there are good reasons to think that explicit coroutines are easier to reason about than threads or green-threads: https://glyph.twistedmatrix.com/2014/02/unyielding.html
In general, it is far easier to reason about locking with async code, as the number of preemption points is far far lower. As a result, you get very low overhead inter-task communication.
The "8 megabytes" is virtual memory. You're only incrementing a counter in a table, nothing is actually allocated until you actually start using that stack.
Also there is a point where managing many threads becomes a load on the CPU, which as I understand it is why Rust is not providing green threads anymore.
Why are you "utterly convinced" of that though? If I run 20 threads in Python, unix top reports resident memory usage to me as 14mb. Why do I see that number instead of 160mb?
This explanation works great as long as the underlying implementation of threads in your particular python implementation is ultimately concurrent, but not parallel.
For me, async code is easier to debug and understand because it is composed of regular functions (tagged with async).
This means if you have a function that is 10 layers deep, you can pause the debugger and see the stack context. Same with your IDE, you can jump to each function.
With threads that 10 function stack would be 10 threads, each requiring some sort of tooling at both software write time and runtime to get the context.
In summary, composing a system of just "functions (sync + async)" is easier than "functions + threads".
If you wanted each function to wait for IO or an event without blocking a thread you need an event loop, or one thread per function that needs to block to wait on incoming events.
Well, I guess that is possible but I've never seen multithreaded server side code (in a thread per request type environment) bother - database calls or other IO just block the current thread.
If you had a thread-per-request, you would see the whole call stack for the request in your debugger, like you do now with async/await and an event loop.
If you had a greenlet-per-request and an event loop, you would see the whole call stack for the request in your debugger, like you do now with async/await and an event loop.
1. If you are building the web server, the async stack would have the functions of the web server code too. The thread only sees its history from when it was spawned for that request.
2. As an example, if you needed to start 5 async tasks and await them, the async code would keep caller context if you break inside the task. For the thread-request model, you start new threads for each 5 tasks, if you debug-break in those threads you would not get the caller stack context. Or do you?
If there are dependencies between those tasks (i.e. task 2 depends on the result of task 1, task 3 depends on the result of task 2...) what option do you have?
And if there aren't dependencies better to drop messages on a queue and have a completely different process handle those tasks.
Ideally, the tasks would be serialized when dependent on each other and not when not dependent on each other. Dropping them on a queue discards information about the caller, which is what this thread of the discussion is about. Using greenlet or async/await, you can serialize only when necessary and retain information about the caller.
hum, I'm missing something. The same call stack you get with async would map to exactly one thread call stack. Sure each thread will get its own call stack, but all related continuations that participate in an async call stack would be owned by the same thread (Assuming the same application design).
My point is that with async both the runtime and the dev tools try to make async call stacks look exactly like sync ones.
This keeps the context of how your code gets into certain states. If you have 10 threads, it’s like you have 10 different processes (without the context of how they are related - which you get with composing functions) which is harder to understand.
Also assuming that each thread does not have an event loop.
> I'm utterly convinced that the only compelling reason for async is to avoid the per-thread stack memory allocation of 8MB per thread
I needed to write an interface to an SSE API (tldr: long-running socket connection with occasional messages arriving as content).
There simply was no way to poll an http connection (e.g. with requests) for new data. It would block no matter what. So I would have had to start writing threaded code, or just use asyncio, which seemed much more ergonomic.
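A very rough sketch of the shape of that with plain asyncio streams (host and port are placeholders, and a real SSE client still needs proper HTTP handling on top): poll for new lines with a timeout instead of blocking forever.

    import asyncio

    async def poll_lines(host, port):
        reader, writer = await asyncio.open_connection(host, port)
        while True:
            try:
                line = await asyncio.wait_for(reader.readline(), timeout=5)
            except asyncio.TimeoutError:
                continue         # no new data yet; loop around and poll again
            if not line:
                break            # connection closed by the server
            print(line)          # handle the message line
        writer.close()
        await writer.wait_closed()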
All this async stuff should die as soon as possible. It exposes the limitations of the underlying tech and forces the application developer to deal with things that the systems level people couldn't bother with.
Look at the Erlang eco-system to see how this is done the right way.
Threads, asynchronicity, locks and so on should be dealt with at the OS or the framework level, not at the application level.
Javascript also suffers from this and it makes JS ugly and hard to read. Callback hell will be yours or you end up with crutches such as promises.
At the application level things should be as deterministic as possible and the default should be that statements executed in sequence will have their side effects updated in sequence as well.
No, now you get it all mixed up. One library will use callback functions you supply, another will use callbacks but inline them and yet another will use async/await. And you, the application developer are left somewhere in the middle.
Oh, and you can't use 'await' in the main thread of your JS, you can only use it from within another async function, so when you need it most - for instance during application start-up - it isn't available.
You can use promisify for the first one. For the second one, I think libraries that do that but still don't support promises are not that common anymore, and it's nothing you can't wrap in a promise yourself.
Javascript has a single thread, but I see what you mean, and it has been fixed recently with top-level await. To support older versions you can simply declare an anonymous async function and execute it immediately.
Yeah, but now we have this weird mix of awaits, thens, and missing awaits that you're not sure were intentional.
I may be in the minority here, but I kind of prefer callback hell to async hell. It's fine on projects with one or two devs, but more than that and the situation seems to devolve into tabs-vs-spaces, where one group finds the other incomprehensible and won't use it.
This is pretty unfortunate. I expect that if JS had gotten fibers and a switch primitive, other languages would have copied that, but now users of unrelated languages are stuck with low-quality things forever.
I think of it the other way around: Thanks to javascript being single-threaded, we now have async/await keywords strewn throughout the software ecosystem where they provide no benefit.
Async is basically a super convenient way to interact with an event loop.
Any time you call `await` you are saying "schedule an event on the event loop, and call back into this method with the result once it's available"
Multi-threaded code is good for some things, but often you need an event loop to keep things reasonable (GUI code for example). That's where async / await really shines.
Exposing the limitations of the underlying tech is a good thing, because it makes it clear what needs to be fixed.
Early operating system APIs grew up around batch systems, which were inherently synchronous. As multi-user systems and networking became ubiquitous, computers have become inherently asynchronous. But system APIs have been slow to evolve, or else have made explicit decisions to go all-in with threading (e.g. WinAPI).
Threads are a paper-over. You can use threads to simulate asynchronicity over synchronous system APIs. As I'm sure you know, threads have a widespread performance tax (context switching blowing out your cache), and are an obstacle for the developer who wants to understand exactly what the machine is really doing.
When you use async, you can reason about what the machine is doing. Operating system APIs have been slow to effectively support this. For example, they usually support something like select(3) but it's usually non-trivial to check for keyboard input in the same select call as you are checking for network activity.
It's encouraging to see Linux add io_uring recently. I think I have read elsewhere that Windows is your main platform. Indeed, the Windows API steers programmers heavily towards threads due to design choices. Sometimes in linux you can hack in a path towards async by abusing file descriptors+threads but Windows is inflexible here.
Tanenbaum defends the synchronous model, "Parallel programming and interrupts are REALLY REALLY hard to get right. There are all kinds of race conditions you have to worry about. Synchronous is much safer. Sometimes there is no way around using an asynchronous interface, but it is always the last resort. Making it hard to do is a good idea." But the RMOX os seems to have solved the general problem of creating a pure-async system api.
The dialects of Javascript that are in the current generation of browsers are confusing partly because the async model in that js generation is half-baked. In another comment, you highlight the ecosystem problems that have grown around js callback hell.
But these problems are not an inherent problem of async programming. For example, python3 asyncio does not suffer from callback hell, because (1) it is straightforward to await on a coroutine to return a value, and (2) the ecosystem has grown up around a mature async model. There are still hassles with python asyncio libraries that use threads under the hood. For example, I've had trouble with aiofiles at scale.
Regarding Erlang: I see the actor model as a weak response to multicomputing. Some of the difficult problems of that domain: (1) message reliability; (2) grid-application survival when you lose a physical host. Erlang/Actor avoids responsibility here by telling the application programmer that they must assume that message delivery is unreliable. This pushes handling of all related problems into the application layer, where it is a distraction, and arduous to test. I think the actor model dodges hard problems that are inherent to its chosen domain. (Apologies if I have misunderstood the reason you highlighted Erlang.)
> I think I have read elsewhere that Windows is your main platform.
I don't know where you read that but I've been a Linux user since 2004 and used SGI Irix prior to that, QNX before that, FreeBSD before that and Xenix before that... Windows 3.1 is the last one I remember actually using (with Trumpet Winsock...). The only windows box we had was the one to do the administration on and to build the webcam software. Other than that it's been UNIX all the way since as far back as I care to remember.
> Greenlets are similar to coroutines [...] the async ecosystem in Python is fractured in two big groups.
Async in software development in general is fractured in two big groups. On the one hand languages like C#, C++ and Dart support (stackless) coroutines like Python’s - and on the other hand, Go, Java and Lua support stackful coroutines that are more like Python’s greenlets.
There are pros and cons to each approach. I wonder if one or the other will eventually become dominant.
to be fair, stackless coroutines are in the C++ language because they require language support. There are also options for stackful coroutines and of course good old threads.
I just want to rep Trio, mentioned in the article. I'm using it to prototype a system in a different language before doing another iteration of said system and it is quite nice to use, at least compared to what I remember asyncio being like.
> it is quite nice to use, at least compared to what I remember asyncio being like.
Yes, I agree. Fairly recently there's been anyio [0], which brings ideas from trio to other async libraries, in particular asyncio. E.g. it has equivalent of trio nurseries (what anyio calls "task groups") for implementing structured concurrency ideas in asyncio environments (saving you from headaches due to having to deal with tasks manually). Very neat way to use trio ideas when stuck on asyncio (e.g. for web apps). :)
anyio was also discussed in the latest Python Bytes episode [1].
It lets me do stuff that would be hard with ordinary Python.
No doubt it takes some effort and practice to grasp, but once you have it worked out you'll find it's a powerful tool in the toolbelt.
Many languages now have async and await - it's not exclusive to Python. The reason so many languages have gone this way is that so far it's the easiest and most sensible solution to writing concurrent code in a way that won't become incomprehensible.
And if you are trying to get your head around it, the easiest analogy is the web browser and JavaScript. If you have grasped JavaScript and the event loop then essentially you have grasped async Python, or not far from it.
Some people misunderstand async Python - they think the point is to make Python faster - it isn't. The point is to enable concurrent programming.
I do understand the haters - at first, until you grasp it, async Python is so incredibly foreign and so different from ordinary synchronous Python that you might think "AAAARGH why did they do this and make it so hard?" But really it's not hard once you grasp the mental model of what is going on.
> it's the easiest and most sensible solution to writing concurrent code in a way that won't become incomprehensible
I don't think that async/await is necessarily the most sensible solution, there is also the "implicit" concurrency model used in Go, Erlang, and even Python (using gevent/eventlet). A big benefit of the "implicit" model is that it doesn't split code into async vs sync, and so you don't have two versions of every library, one for sync and one for async. In a lot of cases in fact, you can run existing Python code on gevent and have concurrency "just work", without having to spring async/await everywhere.
Python's implementation of async/await is also particularly bad for teaching: not only do you now need to teach people that there are multiple ways to call functions, but async functions cannot be called from the CPython interpreter directly. (You can from the IPython interpreter though).
I love concurrent programming, but I don't love the effect that async/await has on code - it makes for less re-usability, and has a more difficult mental model to grasp than the implicit style. And when you can just import and monkey-patch Gevent to get all of the benefits of re-writing your entire codebase to be async/await, why would you?
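A minimal sketch of the gevent monkey-patching approach described above (assumes gevent is installed; the URL is just a placeholder):

    from gevent import monkey
    monkey.patch_all()          # make sockets, sleep, etc. cooperative

    import gevent
    import urllib.request

    def fetch(url):
        # ordinary synchronous-looking code; gevent switches greenlets on I/O
        return urllib.request.urlopen(url).read()

    jobs = [gevent.spawn(fetch, "http://example.com") for _ in range(5)]
    gevent.joinall(jobs)
    print([len(job.value) for job in jobs])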
Mostly agree around reusing existing libs, and it being a split point for the community, but I also can't understand why there are so many detractors. It's a tool and does its job quite well. Anyway, in my case I prefer explicit and don't like monkey patching :)
It does something we were already doing worse than we were already doing it. In Python's case, it does it worse than we were already doing it in the same language. At least for me, this is why I don't have many nice things to say about it.
It's also fine to "prefer to be explicit", but it is weird to prefer to be explicit ONLY about whether a function can do the "switch stacks" operation on your behalf (not whether it can consume random numbers from the global RNG, not whether it can create files, not whether it can do blocking I/O, not whether it can open a server socket, not whether it can send HTTP requests, not whether it can write log lines, not whether it mutates its parameters, etc.), and to be explicit about it in a way that invites bugs that don't need to be possible to write (async functions produce a value that can then be awaited, so it's a bug to not await them when you meant to, or to await them when you didn't mean to, and it's common to write such bugs when adding an await operation to a formerly non-async function that is called by other formerly non-async functions).
I must say that python3.8 does a better job of detecting un-awaited awaitables.
Anyway, async I/O is not new to Python; the stdlib still has asyncore in it. There are still awesome web servers around that are threaded but do async I/O on the main thread: ZServer, medusa (look at what powers supervisor, for example).
The file I/O thing is still a pain, but the same happens on the Node side... though it seems something is on the way that will fix it.
I think the whole monkey patching thing is a bit of a red herring, gevent changes the python execution model enough that there may as well be a "gpython" command which runs Python - Python+gevent may as well be a different distribution of Python.
But even so, is the cost of having to run a monkey patch (or some other startup script) greater than the cost of having to rewrite an entire ecosystem of libraries? I don't think so, not by a long shot. Just look at the whole "aiolibs" project to see how ridiculous this has become. Are some of those libraries better put-together than their originals? Sure. Is it worth splitting an ecosystem in half and having one half not even able to call into the other half? Absolutely not.
Someone should take care of it, but it doesn't seem like a trivial problem. Btw, I'd also be so happy to be able to reuse parts of the twisted core in asyncio services :)
I had so much trouble getting the python tcp client to work like I wanted it to. Then I tried my hand on the async tcp client and was pleasantly surprised how simple it was.
    import asyncio

    # async generator: yield lines until EOF, with connect and read timeouts
    async def read_lines(host, port, connect_timeout, read_timeout):
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), connect_timeout)
        while True:
            line = await asyncio.wait_for(reader.readline(), read_timeout)
            if line == b"":
                break
            yield line
The current stream interface is still somewhat awkward. Apart from the question whether the reader and writer really need to separate objects, there's also the problem that the write and close methods are synchronous—in order to use them correctly, it's necessary to throw in asynchronous calls to drain and wait_closed.
There was a plan to fix this in Python 3.8, but that never went anywhere.
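For what it's worth, a small sketch of what correct use looks like today with those extra awaits (host, port and payload are placeholders):

    import asyncio

    async def send_and_close(host, port, payload):
        reader, writer = await asyncio.open_connection(host, port)
        writer.write(payload)        # synchronous: only buffers the bytes
        await writer.drain()         # wait for the buffer to flush (backpressure)
        writer.close()               # synchronous: starts closing the transport
        await writer.wait_closed()   # wait until the connection is really gone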
I've written a bunch of things that were really well suited to async Python including an MJPEG server, a queueing server and a process management server that monitors the stdout and stderr of various processes, reads them in real time and acts upon the messages it reads.
Each one of those applications was made possible comprehensible and even easy by async Python.
I made a python clone of multitail in just under a day thanks to being able to read multiple files simultaneously with async. It really is overpowered.
When the async keyword was released in Python, that was during the "callback hell" era of Javascript.
The first code using async in Python was mostly written by developers coming from the JS world. Reading code written that way may have turned off a lot of Python devs.
IMO, 2020 async code is cleaner in both Javascript and Python; maybe some haters would change their mind if they gave a second chance to "newer" async code.
That's where it _could_ provide a solid benefit (though you could probably usually do better by instantiating an appropriately sized thread pool and avoiding async entirely), but that's missing the real power of async which is that it makes concurrency easier to reason about. Except at an await the application behaves serially; you can't get interrupted in between lines or in the middle of an operation that it turns out isn't atomic.
Doesn't it do that by not really having concurrency? There is only a single locus of control so a whole class of problems should simply cease to exist.
We might be nit-picking definitions, but I don't think that's a fair characterization. From the perspective of the user programming with async, when you gather the results of two coroutines you don't care at all the order used for each stack frame. The mental model is one of concurrent operations which might suspend at yield points.
What this really highlights is that you need to learn the basics to make the right decision for your application and platform. There's no one solution for everyone but if you don't take the time to understand context switching and the role of your OS then you will be doomed to tilting at windmills.