> If these hundred tasks make heavy use of the CPU, then the sync and async solutions would have similar performance, since the speed at which the CPU runs is fixed, Python's speed of executing code is always the same and the work to be done by the application is also equal.
I think one of the things the author has not discussed is that if you have high CPU then you are going to get starvation as async does not have a fairness mechanism - timesharing is co-operative and based on FIFO in all the implementations I've seen.
If you start to saturate CPU (even a bit) you get uneven distribution of cpu time between your various async threads. This translates into chronic timeouts. For async to be applicable you really need to have a) high io load and b) lowish cpu load.
In production systems, you don't use coroutines alone. You have several processes, some may contain several threads, some thread may contain an event loop.
In fact, it's already the case with WSGI setups: you may have async nginx on front, multi-process gunicorn behind, and a thread pool for your db connections. And asyncio already provides an API for sending tasks to process pools and threads and getting them back as async Futures, as a convenience. It's turtles all the way down.
Bottom line, async, threads or processes are not silver bullets, and you should use the right tool for the right job.
Just like when Go users realized that spawning goroutines for everything actually didn't work at scale, throwing awaits everywhere is not a magical solution.
Having asyncio at your disposal makes some tasks easier, like websockets, crawling your network, etc. It opened the door to new architectures, and maybe new features (live settings, task queues that feed the results back to you, etc.).
But it's been oversold, as with anything new and shiny.
I believe I did discuss this, and in fact, I list the same a) and b) conditions you mention using slightly different words. Quoting the article: "To benefit from the async style, an application needs to have tasks that are often blocked by I/O and don't have too much CPU work."
This is a great point! If you use an established framework like tornado / twisted, they have idiomatic ways to deal with this.
E.g. if you have a loop that can potentially starve the rest of your service when the iterable is too long, go ahead and throw a `yield gen.moment` in there. `yield gen.sleep` can also be incredibly handy. Once you realize that you are just living in a world of partially executed functions, you can always have your coroutine phone home to the event loop and ask if anyone else needs some CPU.
Of course, for this to work you need lumpiness in your workload; if the CPU is just always starved, this is more likely a problem of under-allocation of compute.
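A minimal sketch of that `yield gen.moment` trick in Tornado's classic `@gen.coroutine` style (the loop body is just a stand-in for real CPU-bound work):

    from tornado import gen
    from tornado.ioloop import IOLoop

    def process_item(item):
        return item * item            # stand-in for some CPU-bound work

    @gen.coroutine
    def handle_big_iterable(items):
        results = []
        for i, item in enumerate(items):
            results.append(process_item(item))
            if i % 1000 == 0:
                yield gen.moment      # hand control back to the IOLoop briefly
        return results

    IOLoop.current().run_sync(lambda: handle_big_iterable(range(100000)))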
I can attest to this fact. I've isolated this issue in OpenStack components once or twice, where we identified CPU-bound tasks blocking other greenlets, leading to new MySQL database connections being rejected because there's an initial auth phase with a very short 10 second timeout while the server waits for the client's response to a security challenge.
> ... you are going to get starvation as async does not have a fairness mechanism - timesharing is co-operative and based on FIFO in all the implementations I've seen.
FIFO systems can have starvation when dealing with time bound things like web requests - slow request handlers can cause other (waiting) requests to timeout.
In a thread-per-request or process-per-request model where you run say 4 * cpu workers the operating system scheduler is able to interrupt stuff that's hogging the CPU but that isn't the case in a typical 1 * cpu worker async model.
When there's little cpu load this is not a real threat but when there is material load you can easily run into problems.
I see; it sounds like latency would become very unpredictable under that model. I can imagine that's a serious problem in practice, but it's not typically what's meant by starvation.
"Task does not receive resources to advance within the required time interval" is exactly starvation. A "task" here is responding to a request. The required time interval is the timeout.
ctrl+f'd "time interval", zero matches. The first linked source in Wikipedia, "Modern operating systems" even says a FIFO prevents starvation, "Starvation can be avoided by using a first-come, first-serve, resource allocation policy. In due course of time, any given process will eventually become the oldest and thus get the needed resource"
I think these are equivalent when applicable, because starving a task long enough of a resource for either the resource to expire or the task's deadline to expire has the same effect as starving the task of the resource indefinitely: task fails to progress.
That's called "firm real-time failure". Again, the standard definitions of starvation/liveness only use "eventually" and "never" without mentioning actual physical time IIRC, and real-time systems use a whole other set of concepts to deal with actual, physical time.
And it doesn't really fail to progress if it's cancelled on deadline expiration; it progresses to a non-successful completion. If it instead hung indefinitely, yes, that'd be a problem.
Because any function can use as much time as it wants before returning. Event loops process events by calling functions which must return in order to allow the loop to make progress. Complex handlers will stall the loop and increase latency.
The recent innovations in asynchronous programming involve ways to signal to the programming language that the current task is blocked, thereby allowing the underlying loop to move on to the next event. While this improves the efficiency of the asynchronous processing system, programmers still need to use these features correctly just like coroutines.
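A minimal asyncio sketch of that point (all names here are illustrative): a handler that blocks stalls the whole loop, while one that awaits lets other tasks keep running.

    import asyncio, time

    async def blocking_handler():
        time.sleep(1)            # never yields: the whole event loop stalls for a second

    async def cooperative_handler():
        await asyncio.sleep(1)   # suspends this task: the loop keeps running others

    async def main():
        # Two cooperative handlers finish together in ~1s; swap one for
        # blocking_handler and everything behind it waits an extra second.
        await asyncio.gather(cooperative_handler(), cooperative_handler())

    asyncio.run(main())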
If you're curious about this topic then I would love to link you to https://techspot.zzzeek.org/2015/02/15/asynchronous-python-a... written by Mike Bayer (author of SQLAlchemy) -- it's an article that I keep referring to as a must-read, incredibly well written and all about real-world implications (totally transferable to other non-Python ecosystems).
If you like, maybe also have a read of https://gehrcke.de/gipc/background.html. This is where I tried to put into words what I learned a couple of years ago when I wrote a library for the gevent ecosystem. You'll find some paragraphs about "Cooperative scheduling vs. preemptive scheduling", about "Asynchronous vs. synchronous", and about a few more simple and yet important topics in the world of event-driven architectures.
Not mentioned in TFA, but I'm utterly convinced that the only compelling reason for async is to avoid the per-thread stack memory allocation of 8MB per thread or whatever it is, in order to be able to scale to an extremely large number of concurrent threads/coroutines. You can't do this with threads.
Async lets you do this whilst still storing the state of unfinished things in a stack, i.e. not having to make a million callbacks. Making it look like you're using threads even though you're not.
The better async code and interfaces become, the more it looks like regular old multithreaded code, except there aren't actual threads underlying it. You still need to make sure you serialise access to shared resources, don't share data that shouldn't be shared, etc. All the same considerations as multithreaded code.
Almost ends up looking like the underlying mechanism: threads with a GIL vs async ought to be an implementation detail that doesn't require you to modify your entire programming model.
Extension code not holding the GIL can still run in true parallel with real threads, so that's a meaningful difference but is usually not relevant for IO where async is usually used.
> avoid the per-thread stack memory allocation of 8MB per thread or whatever it is, in order to be able to scale to an extremely large number of concurrent threads/coroutines. You can't do this with threads.
It's 8 KB per thread, so you can scale a thousand times further than you thought. One dark secret of the async movement is that if your goal is C10K (10,000 concurrent clients) then actually bog standard threading will handle that fine these days.
> The better async code and interfaces become, the more it looks like regular old multithreaded code, except there aren't actual threads underlying it. You still need to make sure you serialise access to shared resources, don't share data that shouldn't be shared, etc. All the same considerations as multithreaded code.
Depends what approach you're using. I prefer making an explicit distinction between sync and async functions ( https://glyph.twistedmatrix.com/2014/02/unyielding.html ), so you effectively invert the notion of a "critical section" - instead of marking which sections can't yield, you mark which sections can yield, so your code is safe by default and you can introduce concurrency explicitly as and when you need it for performance, rather than your code being fast-but-unsafe by default and you're expected to fix a bunch of rare nondeterministic bugs with minimal support from your tools, which is how it works in a multithreading world.
> It's 8 KB per thread, so you can scale a thousand times further than you thought. One dark secret of the async movement is that if your goal is C10K (10,000 concurrent clients) then actually bog standard threading will handle that fine these days.
Default virtual memory allocation for threads on Linux distributions tends to be 8 megabytes. Actual memory used is the peak stack depth used, rounded up a bit. It'd be pretty unusual to only use as little as 8 kilobytes per thread; just the standard per-thread libc context information for concurrency is a few kilobytes, plus at least one page of stack, plus the kernel's information about the thread (which isn't counted against the process)...
Yes, you can spawn thousands of threads on relatively modest hardware; I was spawning thousands of threads a decade ago.
Spawning 5000 bare-minimal python threads that do nil seems to use about 300 megs of ram on my system; real threads that do anything substantial will use a whole lot more, even if their use of the stack depth is intermittent.
Not to mention allocators that cache part of freed heap per-thread, etc.
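A rough way to reproduce the kind of measurement described above (a sketch only; exact numbers depend on the libc, the allocator, and the Python build):

    import os, threading, time

    def idle():
        time.sleep(60)           # threads that do nothing but sleep

    threads = [threading.Thread(target=idle, daemon=True) for _ in range(5000)]
    for t in threads:
        t.start()

    # While they sleep, check resident memory, e.g.:
    #   grep VmRSS /proc/<pid>/status
    print("pid:", os.getpid())
    time.sleep(60)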
I believe 8k is the size of the per thread stack on the kernel side. This is non-pageable memory, so it will consume physical memory whether it is needed or not, while of course the 8 megabytes is paged in on demand.
Yah, I'm ignoring the kernel stack and all kernel data structures. User space memory used will be at least a page of stack (reaching up to the maximum amount used in the thread), plus the libc reentrancy data structures, plus per-thread heap caches, etc. It can all get paged out, but we hardly want that these days.
The key distinction is that the 8MB of stack VM doesn't have a backing until the memory is used in the thread, but afterwards it does forever.
Well, it's forever if you assume that the threads live forever; for a web server it's perfectly practical to have threads that only live for a single request, or to reuse them for multiple requests but not allow a single thread to live longer than say 10 minutes.
Short-lived threads (for one request) are a performance and scalability disaster; tens of microseconds or worse to spawn and join, contention on important locks, bad for caches, etc. There's not much concurrency when it comes to spawning threads, either.
If you have long-lived threads in a pool, yes, they may not live forever, but you generally have to assume that each thread will end up with a resident stack size equal to the largest stack use: each will get a turn to run the stack-intensive functions.
I'd argue that the main benefit of day-to-day async programming isn't performance but actually the concurrency patterns that help you sequence your code and resource access in ways that `thread { work() }` could not.
For example, future/result combinators and `await [task1, task2.then(task3)]`.
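A hedged asyncio sketch of that combinator idea (task1/task2/task3 are made-up placeholders, not anyone's real API):

    import asyncio

    async def task1():
        await asyncio.sleep(0.1)
        return "one"

    async def task2():
        await asyncio.sleep(0.1)
        return 40

    async def task3(x):
        return x + 2

    async def main():
        async def task2_then_3():
            return await task3(await task2())

        # Roughly the `await [task1, task2.then(task3)]` from above.
        results = await asyncio.gather(task1(), task2_then_3())
        print(results)   # ['one', 42]

    asyncio.run(main())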
Not really - it gets very messy because every time you transform a future you have to figure out where you're getting the thread for that transformation to run on.
sorry, what transformation? A future is simply a placeholder for something being computed asynchronously. On a threadful design you would simply spawn a thread (or pick one from a thread pool) to handle the computation. Normally your future runtime would handle it for you.
Basically you end up with something similar to the fork-join model.
Whenever you want to transform a result that's in a future, e.g. you have a future for a number and want to add 2 to it.
> On a threadful design you would simply spawn a thread (or pick one from a thread pool) to handle the computation.
If you allow yourself to spawn threads everywhere you'll quickly run out of resources. So you have to manage which thread pool you're using where and ensure you're not bringing in priority inversions etc.. It's really not that easy.
> Basically you end up with something similar to the fork-join model.
The fork-join model isn't really a purely thread-based model - the work-stealing technique is pretty much trying to reimplement what async-style code would do naturally.
> ensure you're not bringing in priority inversions etc..
Remember we're comparing to async/futures, which are not really guaranteed to not starve either. At least with thread pools you can, in theory, manage this well.
> Remember we're comparing to async/futures, which are not really guaranteed to not starve either. At least with thread pools you can, in theory, manage this well.
With async/futures you're giving the runtime control over these decisions, whereas with threads you're managing them yourself, which can be an advantage but only if you don't make errors with that manual control. An async/future runtime can know which tasks are waiting for which other tasks, letting it avoid deadlocks and a lot of possible priority inversions, and the async style naturally lends itself to writing code that's logically end-to-end (on a single "fiber" even as that fiber moves between threads), which means there's less need to balance resources across multiple thread pools.
That's almost certainly a bad idea in a web server context like this article is talking about. You improve best-case latency when the server's not loaded, but now you're using 3 threads per request to get a less than 2x speedup (and in a bigger example it would be worse), so your scaling behaviour will get worse.
always spawning a thread is of course the naive implementation. You can put an upper bound on the number of threads and fall back to synchronous execution of async operations in the worst case (for example inside the wait call).
If your threads are a bit more than dumb os threads (say, a hybrid M:N scheduler) you can do smarter scheduling, including work stealing of course.
Well, as your threads become less like threads and more like a future/async runtime you come closer to the advantages and disadvantages of a future/async runtime, yes.
The underlying thread model has always been 'async' in some form under the hood, i.e. at some point there is always a multiplexer/scheduler that schedules continuations. Normally this is inside the kernel, but M:N or purely userspace-based thread models have been used for decades.
Really the only difference between the modern async model and other 'threaded' model is its 'stacklessness' nature. This is both a major problem (due to the green/red function issue and not being able to abstract away asynchronicity) and an advantage (due to the guaranteed fixed stack size, and, IMHO overrated, ability to identify yield points).
At the end of the day it's always continuations all the way down.
In a multi-threaded context yes, reduced memory usage is the main benefit.
But async/await can also be used for other things! In a C# Windows GUI application, it's normal to use async/await on the UI thread. Your UI can await multiple tasks at the same time, yet you don't need any locks when accessing the UI state; because all your code runs on the UI thread.
This is a really useful programming model made possible by cooperative task-switching via `await` on a single thread.
Here the "await" being explicit is a crucial feature, it allows the programmer to reason about when the shared state might be mutated by other tasks (or maybe by the user clicking cancel while the current task is waiting).
Any pre-emptive task switching adds a lot of additional complexity and isn't really suitable for UI code.
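The same pattern can be sketched with single-threaded asyncio in Python (the counter below is an illustrative stand-in for UI state):

    import asyncio

    counter = 0    # shared state, touched only from the single event loop thread

    async def bump(n):
        global counter
        for _ in range(n):
            counter += 1            # no await here, so no other task can interleave
            await asyncio.sleep(0)  # explicit yield point: other tasks may run now

    async def main():
        await asyncio.gather(bump(10_000), bump(10_000))
        print(counter)              # always 20000, no locks needed

    asyncio.run(main())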
“Your UI can await multiple tasks at the same time, yet you don't need any locks when accessing the UI state; because all your code runs on the UI thread.”
Is that true? I thought the code after each await runs on a different thread from the code before the await. At least that’s what I have observed when debugging things.
There are also other language implementation styles that get roughly the same benefit without writing async and await all over the codebase. The implied yield at defined synchronization points, coupled with a decent scheduler like in Go, would make this all a non-issue [1].
> I'm utterly convinced that the only compelling reason for async is to avoid the per-thread stack memory allocation of 8MB per thread
Yes. The point is to use a single thread with one stack to process several tasks. This consumes less memory.
For example, Go currently has a minimum stack size of 2 KiB so a machine with 4 GiB of memory will be able to process less than 2 million goroutines. An event loop uses a single thread with a single stack, reducing memory usage at the cost of complexity.
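For reference, the back-of-the-envelope arithmetic behind that goroutine figure:

    4 GiB / 2 KiB per stack = 2^32 B / 2^11 B = 2^21 ≈ 2.1 million stacks

before counting the heap, the runtime, and per-goroutine bookkeeping, which is why "less than 2 million" is the practical ceiling.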
Asynchronous functions are just like coroutines. The difference is they return to the awaiting caller instead of yielding to another function. The order of execution is determined by the underlying loop.
> I'm utterly convinced that the only compelling reason for async is to avoid the per-thread stack memory allocation of 8MB per thread or whatever it is, in order to be able to scale to an extremely large number of concurrent threads/coroutines.
If it's possible to avoid shared state then I tend to prefer threads, but in the presence of shared state there are good reasons to think that explicit coroutines are easier to reason about than threads or green-threads: https://glyph.twistedmatrix.com/2014/02/unyielding.html
In general, it is far easier to reason about locking with async code, as the number of preemption points is far far lower. As a result, you get very low overhead inter-task communication.
The "8 megabytes" is virtual memory. You're only incrementing a counter in a table, nothing is actually allocated until you actually start using that stack.
Also there is a point where managing many threads becomes a load on the CPU, which as I understand it is why Rust is not providing green threads anymore.
Why are you "utterly convinced" of that though? If I run 20 threads in Python, unix top reports resident memory usage to me as 14mb. Why do I see that number instead of 160mb?
This explanation works great as long as the underlying implementation of threads in your particular python implementation is ultimately concurrent, but not parallel.
For me, async code is easier to debug and understand because it is composed of regular functions (tagged with async).
This means if you have a function that is 10 layers deep, you can pause the debugger and see the stack context. Same with your IDE, you can jump to each function.
With threads that 10 function stack would be 10 threads, each requiring some sort of tooling at both software write time and runtime to get the context.
In summary, composing a system of just "functions (sync + async)" is easier than "functions + threads".
If you wanted each function to wait for IO or an event without blocking a thread you need an event loop, or one thread per function that needs to block to wait on incoming events.
Well, I guess that is possible but I've never seen multithreaded server side code (in a thread per request type environment) bother - database calls or other IO just block the current thread.
If you had a thread-per-request, you would see the whole call stack for the request in your debugger, like you do now with async/await and an event loop.
If you had a greenlet-per-request and an event loop, you would see the whole call stack for the request in your debugger, like you do now with async/await and an event loop.
1. If you are building the web server, the async stack would have the functions of the web server code too. The thread only sees its history from when it was spawned for that request.
2. As an example, if you needed to start 5 async tasks and await them, the async code would keep caller context if you break inside the task. For the thread-request model, you start new threads for each 5 tasks, if you debug-break in those threads you would not get the caller stack context. Or do you?
If there are dependencies between those tasks (i.e. task 2 depends on the result of task 1, task 3 depends on the result of task 2...) what option do you have?
And if there aren't dependencies better to drop messages on a queue and have a completely different process handle those tasks.
Ideally, the tasks would be serialized when dependent on each other and not when not dependent on each other. Dropping them on a queue discards information about the caller, which is what this thread of the discussion is about. Using greenlet or async/await, you can serialize only when necessary and retain information about the caller.
hum, I'm missing something. The same call stack you get with async would map to exactly one thread call stack. Sure each thread will get its own call stack, but all related continuations that participate in an async call stack would be owned by the same thread (Assuming the same application design).
My point is that with async both the runtime and the dev tools try to make async call stacks look exactly like sync ones.
This keeps the context of how your code gets into certain states. If you have 10 threads, it’s like you have 10 different processes (without the context of how they are related - which you get with composing functions) which is harder to understand.
Also assuming that each thread does not have an event loop.
> I'm utterly convinced that the only compelling reason for async is to avoid the per-thread stack memory allocation of 8MB per thread
I needed to write an interface to an SSE API (tldr: long-running socket connection with occasional messages arriving as content).
There simply was no way to poll an http connection (e.g. with requests) for new data. It would block no matter what. So I would have had to start writing threaded code, or just use asyncio, which seemed much more ergonomic.
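A very rough sketch of the shape of that with plain asyncio streams (host and port are placeholders, and a real SSE client still needs proper HTTP handling on top): poll for new lines with a timeout instead of blocking forever.

    import asyncio

    async def poll_lines(host, port):
        reader, writer = await asyncio.open_connection(host, port)
        while True:
            try:
                line = await asyncio.wait_for(reader.readline(), timeout=5)
            except asyncio.TimeoutError:
                continue         # no new data yet; loop around and poll again
            if not line:
                break            # connection closed by the server
            print(line)          # handle the message line
        writer.close()
        await writer.wait_closed()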
All this async stuff should die as soon as possible. It exposes the limitations of the underlying tech and forces the application developer to deal with things that the systems level people couldn't bother with.
Look at the Erlang eco-system to see how this is done the right way.
Threads, asynchronicity, locks and so on should be dealt with at the OS or the framework level, not at the application level.
Javascript also suffers from this and it makes JS ugly and hard to read. Callback hell will be yours or you end up with crutches such as promises.
At the application level things should be as deterministic as possible and the default should be that statements executed in sequence will have their side effects updated in sequence as well.
No, now you get it all mixed up. One library will use callback functions you supply, another will use callbacks but inline them and yet another will use async/await. And you, the application developer are left somewhere in the middle.
Oh, and you can't use 'await' in the main thread of your JS, you can only use it from within another async function, so when you need it most - for instance during application start-up - it isn't available.
You can use promisify for the first one. For the second one, I think libraries that do that but still don't support promises are not that common anymore, and it's nothing you can't wrap in a promise yourself.
Javascript has a single thread, but I see what you mean, and it has been fixed recently with top-level await. To support older versions you can simply declare an anonymous async function and execute it immediately.
Yeah, but now we have this weird mix of awaits, thens, and missing awaits that you're not sure were intentional.
I may be in the minority here, but I kind of prefer callback hell to async hell. It's fine on projects with one or two devs, but more than that and the situation seems to devolve into tabs-vs-spaces, where one group finds the other incomprehensible and won't use it.
This is pretty unfortunate. I expect that if JS had gotten fibers and a switch primitive, other languages would have copied that, but now users of unrelated languages are stuck with low-quality things forever.
I think of it the other way around: Thanks to javascript being single-threaded, we now have async/await keywords strewn throughout the software ecosystem where they provide no benefit.
Async is basically a super convenient way to interact with an event loop.
Any time you call `await` you are saying "schedule an event on the event loop, and call back into this method with the result once it's available"
Multi-threaded code is good for some things, but often you need an event loop to keep things reasonable (GUI code for example). That's where async / await really shines.
Exposing the limitations of the underlying tech is a good thing, because it makes it clear what needs to be fixed.
Early operating system APIs grew up around batch systems, which were inherently synchronous. As multi-user systems and networking became ubiquitous, computers have become inherently asynchronous. But system APIs have been slow to evolve, or else have made explicit decisions to go all-in with threading (e.g. WinAPI).
Threads are a paper-over. You can use threads to simulate asynchronicity over synchronous system APIs. As I'm sure you know, threads have a widespread performance tax (context switching blowing out your cache), and are an obstacle for the developer who wants to understand exactly what the machine is really doing.
When you use async, you can reason about what the machine is doing. Operating system APIs have been slow to effectively support this. For example, they usually support something like select(3) but it's usually non-trivial to check for keyboard input in the same select call as you are checking for network activity.
It's encouraging to see Linux add io_uring recently. I think I have read elsewhere that Windows is your main platform. Indeed, the Windows API steers programmers heavily towards threads due to design choices. Sometimes in linux you can hack in a path towards async by abusing file descriptors+threads but Windows is inflexible here.
Tanenbaum defends the synchronous model, "Parallel programming and interrupts are REALLY REALLY hard to get right. There are all kinds of race conditions you have to worry about. Synchronous is much safer. Sometimes there is no way around using an asynchronous interface, but it is always the last resort. Making it hard to do is a good idea." But the RMOX os seems to have solved the general problem of creating a pure-async system api.
The dialects of Javascript that are in the current generation of browsers are confusing partly because the async model in that js generation is half-baked. In another comment, you highlight the ecosystem problems that have grown around js callback hell.
But these problems are not an inherent problem of async programming. For example, python3 asyncio does not suffer from callback hell, because (1) it is straightforward to await on a coroutine to return a value, and (2) the ecosystem has grown up around a mature async model. There are still hassles with python asyncio libraries that use threads under the hood. For example, I've had trouble with aiofiles at scale.
Regarding Erlang: I see the actor model as a weak response to multicomputing. Some of the difficult problems of that domain: (1) message reliability; (2) grid-application survival when you lose a physical host. Erlang/Actor avoids responsibility here by telling the application programmer that they must assume that message delivery is unreliable. This pushes handling of all related problems into the application layer, where it is a distraction, and arduous to test. I think the actor model dodges hard problems that are inherent to its chosen domain. (Apologies if I have misunderstood the reason you highlighted Erlang.)
> I think I have read elsewhere that Windows is your main platform.
I don't know where you read that but I've been a Linux user since 2004 and used SGI Irix prior to that, QNX before that, FreeBSD before that and Xenix before that... Windows 3.1 is the last one I remember actually using (with Trumpet Winsock...). The only windows box we had was the one to do the administration on and to build the webcam software. Other than that it's been UNIX all the way since as far back as I care to remember.
> Greenlets are similar to coroutines [...] the async ecosystem in Python is fractured in two big groups.
Async in software development in general is fractured in two big groups. On the one hand languages like C#, C++ and Dart support (stackless) coroutines like Python’s - and on the other hand, Go, Java and Lua support stackful coroutines that are more like Python’s greenlets.
There are pros and cons to each approach. I wonder if one or the other will eventually become dominant.
to be fair, stackless coroutines are in the C++ language because they require language support. There are also options for stackful coroutines and of course good old threads.
I just want to rep Trio, mentioned in the article. I'm using it to prototype a system in a different language before doing another iteration of said system and it is quite nice to use, at least compared to what I remember asyncio being like.
> it is quite nice to use, at least compared to what I remember asyncio being like.
Yes, I agree. Fairly recently there's been anyio [0], which brings ideas from trio to other async libraries, in particular asyncio. E.g. it has equivalent of trio nurseries (what anyio calls "task groups") for implementing structured concurrency ideas in asyncio environments (saving you from headaches due to having to deal with tasks manually). Very neat way to use trio ideas when stuck on asyncio (e.g. for web apps). :)
anyio was also discussed in the latest Python Bytes episode [1].
It lets me do stuff that would be hard with ordinary Python.
No doubt it takes some effort and practice to grasp, but once you have it worked out you'll find it's a powerful tool in the toolbelt.
Many languages now have async and await - it's not exclusive to Python. The reason so many languages have gone this way is that so far it's the easiest and most sensible solution to writing concurrent code in a way that won't become incomprehensible.
And if you are trying to get your head around it, the easiest analogy is the web browser and JavaScript. If you have grasped JavaScript and the event loop then essentially you have grasped async Python, or not far from it.
Some people misunderstand async Python - they think the point is to make Python faster - it isn't. The point is to enable concurrent programming.
I do understand the haters - at first, until you grasp it, async Python is so incredibly foreign and so different from ordinary synchronous Python that you might think "AAAARGH why did they do this and make it so hard?" But really it's not hard once you grasp the mental model of what is going on.
> it's the easiest and most sensible solution to writing concurrent code in a way that won't become incomprehensible
I don't think that async/await is necessarily the most sensible solution, there is also the "implicit" concurrency model used in Go, Erlang, and even Python (using gevent/eventlet). A big benefit of the "implicit" model is that it doesn't split code into async vs sync, and so you don't have two versions of every library, one for sync and one for async. In a lot of cases in fact, you can run existing Python code on gevent and have concurrency "just work", without having to spring async/await everywhere.
Python's implementation of async/await is also particularly bad for teaching: not only do you now need to teach people that there are multiple ways to call functions, but async functions cannot be called from the CPython interpreter directly. (You can from the IPython interpreter though).
I love concurrent programming, but I don't love the effect that async/await has on code - it makes for less re-usability, and has a more difficult mental model to grasp than the implicit style. And when you can just import and monkey-patch Gevent to get all of the benefits of re-writing your entire codebase to be async/await, why would you?
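A minimal sketch of the gevent monkey-patching approach described above (assumes gevent is installed; the URL is just a placeholder):

    from gevent import monkey
    monkey.patch_all()          # make sockets, sleep, etc. cooperative

    import gevent
    import urllib.request

    def fetch(url):
        # ordinary synchronous-looking code; gevent switches greenlets on I/O
        return urllib.request.urlopen(url).read()

    jobs = [gevent.spawn(fetch, "http://example.com") for _ in range(5)]
    gevent.joinall(jobs)
    print([len(job.value) for job in jobs])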
Mostly agree around reusing existing libs, and it being a split point for the community, but I also can't understand why there are so many detractors. It's a tool and does its job quite well. Anyway, in my case I prefer explicit and don't like monkey patching :)
It does something we were already doing worse than we were already doing it. In Python's case, it does it worse than we were already doing it in the same language. At least for me, this is why I don't have many nice things to say about it.
It's also fine to "prefer to be explicit", but it is weird to prefer to be explicit ONLY about whether a function can do the "switch stacks" operation on your behalf (not whether it can consume random numbers from the global RNG, not whether it can create files, not whether it can do blocking I/O, not whether it can open a server socket, not whether it can send HTTP requests, not whether it can write log lines, not whether it mutates its parameters, etc.), and to be explicit about it in a way that invites bugs that don't need to be possible to write (async functions produce a value that can then be awaited, so it's a bug to not await them when you meant to, or to await them when you didn't mean to, and it's common to write such bugs when adding an await operation to a formerly non-async function that is called by other formerly non-async functions).
I must say that python3.8 does a better job of detecting un-awaited awaitables.
Anyway, async I/O is not new to Python; the stdlib still has asyncore in it. There are still awesome web servers around that are threaded but do async I/O on the main thread: ZServer, medusa (look at what powers supervisor, for example).
The file I/O thing is still a pain, but the same happens on the Node side... though it seems something is on the way that will fix it.
I think the whole monkey patching thing is a bit of a red herring, gevent changes the python execution model enough that there may as well be a "gpython" command which runs Python - Python+gevent may as well be a different distribution of Python.
But even so, is the cost of having to run a monkey patch (or some other startup script) greater than the cost of having to rewrite an entire ecosystem of libraries? I don't think so, not by a long shot. Just look at the whole "aiolibs" project to see how ridiculous this has become. Are some of those libraries better put-together than their originals? Sure. Is it worth splitting an ecosystem in half and having one half not even able to call into the other half? Absolutely not.
Someone should take care of it, but it doesn't seem like a trivial problem. Btw, I'd also be so happy to be able to reuse parts of the twisted core in asyncio services :)
I had so much trouble getting the python tcp client to work like I wanted it to. Then I tried my hand on the async tcp client and was pleasantly surprised how simple it was.
    import asyncio

    # async generator: yield lines until EOF, with connect and read timeouts
    async def read_lines(host, port, connect_timeout, read_timeout):
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), connect_timeout)
        while True:
            line = await asyncio.wait_for(reader.readline(), read_timeout)
            if line == b"":
                break
            yield line
The current stream interface is still somewhat awkward. Apart from the question whether the reader and writer really need to separate objects, there's also the problem that the write and close methods are synchronous—in order to use them correctly, it's necessary to throw in asynchronous calls to drain and wait_closed.
There was a plan to fix this in Python 3.8, but that never went anywhere.
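For what it's worth, a small sketch of what correct use looks like today with those extra awaits (host, port and payload are placeholders):

    import asyncio

    async def send_and_close(host, port, payload):
        reader, writer = await asyncio.open_connection(host, port)
        writer.write(payload)        # synchronous: only buffers the bytes
        await writer.drain()         # wait for the buffer to flush (backpressure)
        writer.close()               # synchronous: starts closing the transport
        await writer.wait_closed()   # wait until the connection is really gone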
I've written a bunch of things that were really well suited to async Python including an MJPEG server, a queueing server and a process management server that monitors the stdout and stderr of various processes, reads them in real time and acts upon the messages it reads.
Each one of those applications was made possible comprehensible and even easy by async Python.
I made a python clone of multitail in just under a day thanks to being able to read multiple files simultaneously with async. It really is overpowered.
When the async keyword was released in Python, that was during the "callback hell" era of Javascript.
The first code using async in Python was mostly written by developers coming from the JS world. Reading code written that way may have turned off a lot of Python devs.
IMO, 2020 async code is cleaner in both Javascript and Python; maybe some haters would change their mind if they gave a second chance to "newer" async code.
That's where it _could_ provide a solid benefit (though you could probably usually do better by instantiating an appropriately sized thread pool and avoiding async entirely), but that's missing the real power of async which is that it makes concurrency easier to reason about. Except at an await the application behaves serially; you can't get interrupted in between lines or in the middle of an operation that it turns out isn't atomic.
Doesn't it do that by not really having concurrency? There is only a single locus of control so a whole class of problems should simply cease to exist.
We might be nit-picking definitions, but I don't think that's a fair characterization. From the perspective of the user programming with async, when you gather the results of two coroutines you don't care at all the order used for each stack frame. The mental model is one of concurrent operations which might suspend at yield points.
What this really highlights is that you need to learn the basics to make the right decision for your application and platform. There's no one solution for everyone but if you don't take the time to understand context switching and the role of your OS then you will be doomed to tilting at windmills.