> p.s.: I forgot to mention the reason interop was brought up - in C# it can be zero cost or nearly zero cost because it is something that was considered at the language and platform inception. It is one of the reasons C# will never use green threads and will eventually move over to an upgraded runtime-handled task system (mind you, the current-day state machine implementation works well, but it can be improved). The cost of FFI in Go, unless you use Cgo, is dramatic. This is something that is not acceptable for a language positioning itself as a systems programming one.
The cost of FFI in Go is a cgo callgate regardless of whether you use Cgo; I can point to the actual implementation if need be. The cost of going from Go execution to C execution is on the order of tens of nanoseconds, so it's not particularly easy to measure precisely. I would guess it's on the order of hundreds of CPU cycles. I've stepped through it in IDA from time to time; it's not a ton of instructions. Of course, "not a ton" is a shit load more than "literally zero", but it's worth talking about, I think.
There is a potential for greater cost: with the way the Go runtime scheduler works, if the call doesn't return quickly enough, the thread has to be marked "lost" and another is spawned to take over goroutine execution. This happens rather quickly, of course, and it isn't free, since it results in an OS thread spawning. But this is actually very important, more on this in a moment...
Meanwhile, C# async uses the old async/await mechanism. This is nice, of course, and it works very well, but it has the problem that execution never yields until you await. And if you DO call a C function and it blocks, unlike in Go, another thread does not spawn; that thread is just blocked on C execution until it's over. That was my experience playing with async in .NET 7, and I don't think it can change: either you have zero-cost C FFI or you have usermode scheduling. You can't really get both, because the latter requires breaks in the ABI.
I would be happy to talk more because I am honestly pretty disappointed that there's not really a better way to do what Go tries to do. I'd love to have the advantages of Go's usermode scheduling with preemption and integrated GC sequences with somehow-zero-overhead C calls, but it simply can't be done; it's literally not possible. You can take other tradeoffs, but they lose some of the most important advantages Goroutines have over most other green-thread implementations. Google and Microsoft have both produced papers researching this. Microsoft's paper on fibers basically comes to the conclusion that you literally shouldn't bother with usermode scheduling because it's not worth the trouble:
https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p13...
However, their conclusion about the cgo callgate taking around 130ns does not match what I've seen. But just to be sure, I searched for a random benchmark and found this one:
https://shane.ai/posts/cgo-performance-in-go1.21/
Of course, these may be fairly optimal conditions, and maybe it depends on the state of the goroutine stack leading up to it, but I think it's fair to say that "less than 50ns per op" is not an unreasonable amount of time for the cgocall to take, as long as we accept that "0ns" is strictly not an option for what Go wants to achieve. With Go you don't have to care whether something blocks or not; everything blocks and nothing is ever blocked. That's not something that can be accomplished without some runtime cost. The runtime cost it actually takes is very nearly zero, but the runtime cost of integrating that with something that doesn't eat that cost is unfortunately higher, and that's where the cgo problem lies.
(I admit that a substantial portion of this problem is actually around the stack pivoting, but if you squint hard enough you can see that this is also inextricably woven into how Goroutines manage to accomplish what they do.)
Wow, that's a long post. While I read it, I wanted to note that .NET deals with blocked threads by having the thread pool scale its worker thread count dynamically through a hill-climbing algorithm that works to reduce the time work items wait unhandled in their respective queues (the thread count can be 1, or it can be 200 or more, depending on what you do; 200 is clearly a degenerate case, but that's what you get if you manage to abuse it badly enough that it acts like the good old thread-per-request model). It also has blocked-thread detection outside of hill-climbing (for things like Thread.Sleep) to cope with blocked workers faster. It is all around a very resilient implementation.
As for the cost of FFI in .NET, regular p/invokes that don't require marshalling (most of the time it's just UTF-16<->UTF-8) cost approximately 1.5-4ns. The cost can be further reduced by
- Suppressing GC frame transition (safe for most sub-1ms calls)
- Generating direct P/Invokes when publishing as an AOT binary (they are bound at startup, but a dynamically linked dependency referenced this way needs to be available)
- Static linking. Yes, .NET's AOT binaries can be statically linked, and it is done by the system linker, which makes the call a direct jump that costs as much as a similar call in C. .NET can also produce statically linkable libraries which can be linked into C/C++/Rust binaries (although it can be tricky)
On AST - you are not parsing C# yourself; you are using the same facilities that are utilized by Roslyn. You can do quite a few tricks. I'm working on a UTF-8 string library (which, naturally, outperforms Go's implementation :P), and it uses the new interceptors API to fold UTF-16->UTF-8 literal conversions during build. My skill is way lower than that of engineers working with it in more advanced settings, and yet I was able to use it easily - it is very convenient despite the learning curve.
On Go hate - it's simple. It reached the critical adoption rate quite some time ago, past which it will be made to work in the domains it is applied to regardless of its merits (hello to that post on HN describing the woes of a company investing millions in tooling to undo the damage done by bolting in NILs and their unsoundness so tightly). It has serious hype and marketing behind it, because other languages are either perceived as Java-kind-of-uncool, or are not noticed, or are bundled, again, with Java, like C#. And developers, who have a rift where their knowledge of asynchronous and concurrent programming should be, stop reading at "async/await means no thread blocky" and never learn to appreciate the power and flexibility a task/future-based system gives (and how much less ceremony it needs compared to channels or manually scheduled threads, green or not).
Just look at https://madnight.github.io/githut/#/. Go has won, it pays well, it gets "interesting and novel projects" - it does not need your help. Hating it is correct, because it is both more popular and worse (sometimes catastrophically so) at what other languages do.
Surprisingly, I think we're actually mostly in agreement here, so there's not much to reply to. I think the only real takeaway is that we don't agree on the conclusions to draw.
> On Go hate - it's simple. It reached the critical adoption rate quite some time ago, past which it will be made to work in the domains it is applied to regardless of its merits (hello to that post on HN describing the woes of a company investing millions in tooling to undo the damage done by bolting in NILs and their unsoundness so tightly). It has serious hype and marketing behind it, because other languages are either perceived as Java-kind-of-uncool, or are not noticed, or are bundled, again, with Java, like C#. And developers, who have a rift where their knowledge of asynchronous and concurrent programming should be, stop reading at "async/await means no thread blocky" and never learn to appreciate the power and flexibility a task/future-based system gives (and how much less ceremony it needs compared to channels or manually scheduled threads, green or not).
I agree that bolting nil checking onto Go is pretty much an admission that the language design has issues. That said, of course it does. You can't eat your cake and have it too, and the Go designers chose to keep the cake more often than not. To properly avoid nil, Go would probably have needed to adopt something like sum types and pattern matching. To be honest, it may have been better if they had, but that also doesn't come at literally zero language-complexity cost, and the way Go is incredibly careful about that cost is a major part of what makes it uniquely appealing to begin with.
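To make the tradeoff concrete: with generics you can already sketch an Option type in today's Go (a hypothetical illustration, not anything from the standard library), but without pattern matching nothing forces the caller to check before use, which is exactly the gap that nil checkers try to paper over after the fact.

```go
package main

import "fmt"

// Option is an illustrative stand-in for a real sum type. Generics
// let us express the shape, but Go has no pattern matching, so an
// exhaustive "did you handle the empty case?" check is unenforced.
type Option[T any] struct {
	value T
	ok    bool
}

func Some[T any](v T) Option[T] { return Option[T]{value: v, ok: true} }
func None[T any]() Option[T]    { return Option[T]{} }

// Get returns the value and whether it is present.
func (o Option[T]) Get() (T, bool) { return o.value, o.ok }

// find is a hypothetical lookup that would otherwise return *User
// and rely on the caller remembering a nil check.
func find(id int) Option[string] {
	if id == 1 {
		return Some("gopher")
	}
	return None[string]()
}

func main() {
	if name, ok := find(1).Get(); ok {
		fmt.Println(name) // prints "gopher"
	}
	if _, ok := find(2).Get(); !ok {
		fmt.Println("not found")
	}
}
```

The compiler accepts code that ignores `ok` entirely, which is why this pattern stays a convention rather than a guarantee; sum types plus pattern matching would make skipping the check a compile error, at the language-complexity cost described above.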
Meanwhile, while Go gets nil checkers, JavaScript gets TypeScript, which I think really puts into perspective how relatively minor the problems Go has actually are.
> Just look at https://madnight.github.io/githut/#/. Go has won, it pays well, it gets "interesting and novel projects" - it does not need your help. Hating it is correct, because it is both more popular and worse (sometimes catastrophically so) at what other languages do.
I gotta say, I basically despise this mentality. It reads somewhere along the lines of, "How come Go gets all of the success and attention when other programming languages deserve it more?" To me that just sounds immature. I never thought this way when Go was relatively niche. People certainly use Python, JavaScript, and C++ in cases where they are far from the best tool for the job, but despite all of those languages being vastly more popular than Go, none of them enjoys the reputation of being talked about as the only programming language in history with no redeeming qualities.
People generally use Go (or whatever their favorite programming language is) for things because they know it and feel productive in it, not to spite C# proponents by choosing Go in a use case that C# might do better, or anything like that.
But if you want to think this way, then I can't stop you. I can only hope that some day it is apparent how this is not a very rational or productive approach to programming language debates.
Unfortunately, even though I'm sure it definitely plays no small part, I can't really assume that Go's popularity plays into any person's hatred of it, because flat-out, that would feel like a bad-faith assumption to make...