There's a certain level of skill required to compose and produce a song, but beyond that the genre/style of the song is more important, and which style is better is very subjective.
The best opera singer in the world will not take much business away from a techno DJ who merely presses a "play vocal sample" button, and both genres have their fans.
In a system where people listen to songs based on recommendations fitting their individual taste (where the recommender doesn't assume popular is good) you can have people listening to individual songs from the long tail of many artists.
Not to mention the part where adding compressors like this somewhat defeats the purpose of using a simple format like QOI (although at least zstd is faster than gzip, let alone 7zip).
But if we're modifying things like that, then they might as well make use of Nigel Tao's improved QOIR format, and replace the LZ4 compressor it uses with zstd. That's probably faster and likely compresses better than QOI.
So to clarify: my suggested point of comparison was replacing the GP's QOI + 7Zip with QOIR + zstd. QOIR already compresses better than QOI before the LZ4 pass, and zstd compresses faster than 7zip and often better. On top of that, you can negotiate zstd through the content-encoding header when streaming data to a browser, so you don't need to increase the JS bundle or whatever if the use case is the web. So that's basically a guaranteed net improvement all around.
Second of all, the speed/compression trade-off with zstd can be tuned a lot. The "half as fast as LZ4" stat is for the fastest setting, but for the proposed comparison point of 7zip a slower setting with better compression ratio is likely perfectly fine.
Sometimes you do just want to use off-the-shelf libraries for things and not need to re-implement them yourself. If you already have a web app you’re building around, it’s annoying to need to rewrite bits of it - especially rock solid 3rd party libraries - unnecessarily.
Another consideration is that any webview<->native bridge is going to impose some kind of bridge toll in the form of serialization and message-passing (memcpy) overhead. You can call into WASM from JS synchronously, and share memory between JS and WASM without copying. Sync calls from the webview to Tauri native code don't appear to be possible, according to https://github.com/tauri-apps/wry/issues/454
Finally, the security posture of native code running in the main process is quite different from WASM code in the webview. I maintain a WASM build of QuickJS, and I would never run untrusted JS in real native QuickJS, whereas I think it's probably safe enough to run untrusted JS in QuickJS inside a WASM container.
That's a common misconception. Advanced compilers like GCC and LLVM have optimizers working primarily on a common intermediate representation of the program, which is largely backend-independent and written for the semantics of the C abstract machine.
UB has such surprising, hard-to-explain side effects because the implicit assumptions baked into the optimization passes, and the symbolic execution used to simplify expressions, follow the semantics of the C spec, not the details of the specific target hardware.
Programmers have an ideal obvious translation of C programs to machine instructions in their head, but there's no spec for that.
It creates impossible expectations for compilers. My recent favorite paradox: in clang's optimizer, comparing the addresses of two different variables always gives false, because they're obviously separate entities (this hardcoded assumption lets the comparison be optimized out, and doesn't stop variables from being kept in registers).
But clang is also expected to avoid redundant memcpys and remove useless copies, so the same memory location can be reused instead of copied, and then two variables on the stack can end up having the same address, contradicting the previous hardcoded result. You get a different result for the same expression depending on the order of optimizations. Clang won't remove this paradox, because there are programs and benchmarks that rely on both behaviors.
And yet, in practice the C I have written over the last 20+ years mostly ended up with "undefined behaviour" = "what the CPU does"...
That may not be true in principle, but it ends up that way because "undefined behaviour" tends to be implemented "to work", and our expectations of what "to work" means are shaped by the behaviour of the common CPUs we're used to...
That's not how compiler developers interpret undefined behaviour. Undefined behaviour is closer to a 'U', 'X' or "don't care" in VHDL. Those values don't exist in real hardware, only in simulation, so the synthesis tool simply assumes they never happen and optimizes accordingly. However, C does not have a simulation environment, and UB propagation is not a thing. It will simply do weird shit, like run an infinite loop because you forgot to add a return statement when you changed a function's return type from void.
"undefined behavior behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
EXAMPLE An example of undefined behavior is the behavior on integer overflow." (The C99 standard, section 3.4.3)
translates into "whatever your CPU does", because while no requirement is imposed, in general the compiler does make it work "in a manner characteristic of the environment".
I believe that memory accesses outside of array bounds, signed integer overflow, and null pointer dereference are all examples of "undefined behaviour", which in practice all boil down to what the CPU does in those cases. E.g. a memory access outside of array bounds commonly returns whatever is at that address, as long as the address is valid, because there are no checks and that's what the CPU does when asked to load from an address. Integer overflow? When adding/subtracting, it commonly wraps around, because that's how the CPU behaves, etc.
And I believe this is all on purpose. C is an abstraction over assembly and I believe that people who were used to their CPU's behaviour wanted to keep it that way in C, and also compilers were simple.
For someone who's been writing "C for 20+ years" according to your other post, you come across as extremely ignorant of how modern optimizing C compilers work. I suggest you thoroughly read and understand Russ Cox's simple exposé [1] and the UB posts on John Regehr's blog [2].
The C security issues that plague us today are partly fueled by the attitude of guesswork and ignorance demonstrated in your posts.
This is a common pitfall in Rust wrappers for C libraries — they diligently copy the underlying library's typedefs as Rust's type aliases, but Rust doesn't have the same implicit type conversions as C, which makes it a leaky abstraction.
In this case, I bet older/Intel version had a `typedef char BOOL` (i8 in Rust), and another version/platform upgraded that to `typedef _Bool BOOL` (bool in Rust).
In C that wasn't a breaking change, because C doesn't care if you compare `false` to 0, but Rust does.
Similarly, the plain `char` in C is allowed to be signed or unsigned depending on platform. Bindings to C libraries break when Rust programs mix `c_char` and `u8`, which are the same type on some platforms but not others.
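A minimal sketch of that failure mode (the alias and function here are invented, not taken from any particular binding): the same Rust call site only compiles for one of the two aliases, while C callers would never notice the change.

```rust
// Hypothetical binding: flip the alias to mirror the two C headers.
pub type BOOL = i8;      // mirrors `typedef char BOOL;`
// pub type BOOL = bool; // mirrors `typedef _Bool BOOL;`

// Stand-in for an FFI function declared as `BOOL lib_is_enabled(void);`.
fn lib_is_enabled() -> BOOL {
    Default::default() // avoids committing to either representation
}

fn main() {
    // In C, `if (lib_is_enabled())` works with either typedef.
    // In Rust, `!= 0` only compiles for the i8 alias, and a bare
    // `if lib_is_enabled()` only compiles for the bool alias.
    if lib_is_enabled() != 0 {
        println!("enabled");
    }
    // The same shape of problem hits `std::os::raw::c_char`, which is i8 on
    // some targets and u8 on others, as soon as code mixes it with a concrete
    // `u8` or `i8`.
}
```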
> Rust's shared XOR mutable references […] makes linearity nearly useless or inevitably creates a parallel, incomplete universe of functions that also work on linear values.
Yup. Rust can't abstract over mutability. For owned values, it defaults to exclusive ownership, and needs explicit Rc<T> and clone() to share them. For references, in practice it requires making separate `foo()` and `foo_mut()` functions for each type of the loan.
Even though this sounds horribly clunky, it works okay in practice. Ability to temporarily borrow exclusively owned objects as either shared or exclusive-mutable adds enough flexibility.
Rust is known for being difficult, but I think a lot of that is due to lack of GC. Rust can't make any reference live longer, and programmers have to manually get scopes of loans right, or manually use Rc<RefCell> to have a mini DIY GC. Perhaps a shared XOR mutable with a GC could be the best of both?
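To make the `foo()`/`foo_mut()` point concrete, here is a tiny sketch (the `Counter` type and its methods are invented for illustration): the same accessor has to be written twice, but an exclusively owned value can still be lent out either way.

```rust
struct Counter {
    value: u32,
}

impl Counter {
    // Shared loan: any number of callers may hold this at once.
    fn value(&self) -> &u32 {
        &self.value
    }

    // Exclusive loan: same body, separate function, because Rust can't
    // abstract over the mutability of `&self` vs `&mut self`.
    fn value_mut(&mut self) -> &mut u32 {
        &mut self.value
    }
}

fn main() {
    let mut c = Counter { value: 0 };    // exclusively owned
    *c.value_mut() += 1;                 // temporary exclusive borrow...
    let (a, b) = (c.value(), c.value()); // ...then any number of shared ones
    println!("{a} {b}");
}
```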
> Ability to temporarily borrow exclusively owned objects as either shared or exclusive-mutable adds enough flexibility.
Rust quietly has several other features in order to improve the quality-of-life of its ownership model. Two examples: if you have a mutable reference then Rust will coerce it to an immutable reference if one is required, and if you have a mutable reference then Rust will transparently re-borrow it when calling functions that accept mutable references in order to allow you to use the mutable reference more than once despite the fact that they do not implement Copy.
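A small sketch of both conveniences (function names invented): a `&mut` is accepted where a `&` is expected, and it is implicitly reborrowed on each call rather than moved.

```rust
fn read(_: &i32) {}               // wants a shared reference
fn bump(n: &mut i32) { *n += 1; } // wants an exclusive reference

fn main() {
    let mut x = 0;
    let m = &mut x;

    read(m); // &mut i32 coerced to &i32 where a shared reference is expected
    bump(m); // implicit reborrow: `m` is not moved, even though &mut isn't Copy
    bump(m); // ...so it can be used again afterwards
    println!("{x}");
}
```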
There are a couple of proof-of-concept libraries adding a `Gc<T> where T: Traceable`, so it is doable.
You can't change the existing Rc, because it's a concrete type. Making it use a tracing GC internally would require a lot of compiler magic.
However, I don't think a GC will catch on in Rust in its current form. Rust's userbase likes it as a no-GC language. Plus, a 3rd-party GC wrapper type can't save you from having to learn the ownership and lifetimes used by literally everything else. Once you've invested the time to learn the "zero-cost" references, a runtime Gc is less appealing, and won't be as ergonomic as the built-in references.
Swift, OCaml, and Mojo are trying to add some subset of Rust-like ownership and borrowing, but in a simpler form.
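For illustration, here's roughly the shape of API those `Gc<T> where T: Traceable` proof-of-concept crates expose. Everything below is hypothetical (a real crate backs the handle with a traced heap rather than `Rc`); it's only meant to show why a trace bound, usually filled in by a derive, is needed at all.

```rust
use std::rc::Rc;

// Hypothetical trait: a tracing collector calls `trace` to discover the
// outgoing references of each managed object.
trait Traceable {
    fn trace(&self, visit: &mut dyn FnMut(&dyn Traceable));
}

// Hypothetical handle: stands in for a GC-managed pointer. Using Rc here is
// purely to make the sketch compile; no actual collection happens.
struct Gc<T: Traceable>(Rc<T>);

impl<T: Traceable> Gc<T> {
    fn new(value: T) -> Self {
        Gc(Rc::new(value))
    }
}

struct Node {
    next: Option<Gc<Node>>,
}

// This is the part real crates generate with a derive: without it, the
// collector has no idea where a Node keeps its references.
impl Traceable for Node {
    fn trace(&self, visit: &mut dyn FnMut(&dyn Traceable)) {
        if let Some(next) = &self.next {
            visit(&*next.0);
        }
    }
}

fn main() {
    let tail = Gc::new(Node { next: None });
    let _head = Gc::new(Node { next: Some(tail) });
}
```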
You can, but it turns out that, as one may intuitively expect, a GC is never needed unless you're implementing a VM for a GC-based language or an API that requires GC, like fd passing on unix domain sockets, and those generally want an ad-hoc GC tailored to whatever you are implementing instead.
Since it's not needed and it's massively worse than reference counting (assuming you only change reference counts when essential and use borrowing normally) due to the absurd behavior of scanning most of the heap at arbitrary times, there is no Rust GC crate in widespread use.
Almost all GCs used in practice today only scan the set of live objects, which in normal operation is much smaller than the entire heap. They also allow much more efficient allocation and de-allocation.
The problems with GC are threefold, and why you might not want it in a systems language:
1. GC requires more memory than strictly necessary to be efficient (usually about 1.5x - 2x the amount you absolutely need). You're basically trading runtime efficiency for memory.
2. GC performance is harder to predict and reason about than certain other allocation strategies
3. GC languages tend to encourage excessive heap allocation for various reasons, ending up with much more junk than a typical Rust or C program that has a similar amount of entities
Note that item 2 is the one that's least understood. The best part about GCs is that they make heap allocation trivial, and they make de-allocation a no-op. In contrast, both malloc() and free() are extremely complex and costly operations. The GC does impose a cost on every pointer write, similar to (but typically less than) the overhead of Arc<T> over a T*, but that has a very uniform and predictable cost. The problem of unpredictability only comes in the collection phase, and is mostly related to (a) when the collection happens, (b) how much data actually has to be scanned (how many live objects are present on the heap and stack), and (c) what type of collection needs to happen (is it enough to collect from this thread's young generation, or do you need to collect all generations from all threads).
Note that many of these problems are in fact solvable, and there actually exist GCs with constant predictable collection times, suitable even for realtime applications (which malloc/free don't support). They are very sophisticated technology that no one is distributing for free though (e.g. you have to pay Azul for a realtime-compatible Java).
Assuming you use a set of slabs of fixed size objects and keep free objects in a linked list, both malloc and free are trivial O(1) operations.
Destructors with cascading deletions can take time bounded only by memory allocation, but you can solve that for instance by destroying them on a separate thread, or having a linked list of objects to be destroyed and destroying a constant number/memory size of them on each allocation.
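A minimal sketch of that fixed-size-slab idea in Rust (single slab, single-threaded, indices instead of raw pointers; a real allocator adds size classes, more slabs, locking, and a large-object path): both operations are a constant amount of free-list pointer shuffling.

```rust
enum Slot<T> {
    Free { next_free: Option<usize> },
    Used(T),
}

struct Slab<T> {
    slots: Vec<Slot<T>>,
    free_head: Option<usize>,
}

impl<T> Slab<T> {
    fn with_capacity(n: usize) -> Self {
        // Chain every slot into the free list up front.
        let slots = (0..n)
            .map(|i| Slot::Free { next_free: if i + 1 < n { Some(i + 1) } else { None } })
            .collect();
        Slab { slots, free_head: if n > 0 { Some(0) } else { None } }
    }

    // "malloc": O(1), pop the head of the free list.
    fn alloc(&mut self, value: T) -> Option<usize> {
        let i = self.free_head?;
        self.free_head = match &self.slots[i] {
            Slot::Free { next_free } => *next_free,
            Slot::Used(_) => unreachable!("free list points at a used slot"),
        };
        self.slots[i] = Slot::Used(value);
        Some(i)
    }

    // "free": O(1), push the slot back onto the free list.
    fn free(&mut self, i: usize) -> Option<T> {
        let new_slot = Slot::Free { next_free: self.free_head };
        match std::mem::replace(&mut self.slots[i], new_slot) {
            Slot::Used(value) => {
                self.free_head = Some(i);
                Some(value)
            }
            already_free => {
                self.slots[i] = already_free; // double free: restore the slot untouched
                None
            }
        }
    }
}

fn main() {
    let mut slab: Slab<u64> = Slab::with_capacity(4);
    let a = slab.alloc(1).unwrap();
    let b = slab.alloc(2).unwrap();
    slab.free(a);
    let c = slab.alloc(3).unwrap(); // reuses slot `a`
    println!("{b} {c} reused={}", a == c);
}
```

The complexity the replies point at is everything this sketch leaves out: multiple size classes, large allocations, returning pages to the OS, and thread safety.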
You still need locking to make this work in a multithreaded environment, or at least atomics. And all malloc implementations used today are more complex than this, especially when allocating large objects, because you can't actually maintain a list for every possible size of allocation. That means they need to do extra work to handle fragmentation.
Plus, the free list has to occasionally be walked to actually free pages back to the OS. If you don't, then memory is never freed by free(), it is only marked for potential reuse.
There are several popular implementations of malloc (and their corresponding free), and they are all quite complex and have different tradeoffs. And, in fact, none of them is any more suitable for high performance or realtime code than a GC is. The golden rule for writing this type of code is to just never call malloc/free in critical sections.
And I will note that probably the most commonly used malloc, the one in GNU's glibc, actually uses Linux locks, not atomics. Which means that you are virtually guaranteed to deadlock if you try to use malloc() after fork() but before exec() in a multi-threaded process.
> Destructors with cascading deletions can take time bounded only by memory allocation, but you can solve that for instance by destroying them on a separate thread, or having a linked list of objects to be destroyed and destroying a constant number/memory size of them on each allocation.
Which means calculating how long free is going to take for a given object is incredibly difficult. Which is my entire point. Of course it's not non-deterministic unless someone's destructor calls some random code, but it is not easy to calculate and in the real world is basically unpredictable.
Spinning up a separate thread and doing an unknown amount of work there is no more predictable than doing it on the main thread.
Fundamentally destructors can do whatever the hell they want which means you never know how long destruction is going to take.
And of course if you're freeing up an object that has a linked list inside of it and you need to clear out the linked list, now you're jumping all around memory, and that's never fun; the run time sure as heck is not constant, depending on what needs to be paged in and out and what exists in which cache lines.
I'm not saying these problems are unsolvable for a given use case, and there are good reasons why custom allocators abound throughout the industry. My main point is that manual memory management does not necessarily lead to easy-to-predict run times for freeing up memory.
This is especially true because a lot of developers think that malloc and free are just magic somehow.
In reality, garbage collectors often aren't that complicated, and when it comes to large, complicated user applications, you're going to be spending a lot of time in the allocator and deallocator no matter what memory management scheme you choose to use.
(Edit: and then there's fragmentation, which is the death of many manually managed memory systems! A naively written Java or C# application can run for far longer than a naively written C++ application!)
But having such a naive malloc implementation would put it well below what a state-of-the-art tracing GC can do in allocation speed, and then we didn’t even get to fragmentation, elegantly solved by moving GCs. Of course, all these are tradeoffs, I’m just saying that it isn’t as easy.
> Destructors with cascading deletions can take time bounded only by memory allocation, but you can solve that for instance by destroying them on a separate thread, or having a linked list of objects to be destroyed and destroying a constant number/memory size of them on each allocation.
You can run your garbage collector on a separate thread, too. Or use a real time garbage collector.
> 2. GC performance is harder to predict and reason about than certain other allocation strategies
I'm not sure what you mean by this being the least understood. It seems like it's very well understood: GC introduces latency if you want good throughput, or it reduces throughput if you want excellent latency (sub-100us is possible).
Of course, that doesn't mean you can predict what specific throughput or latency properties your specific program will have, except for GCs that have maximum bounded latency guarantees.
The point is that it's very hard to predict how much of an impact the GC will have a priori. You can of course measure after the fact, and try to improve, but it's hard to architect your system in a way that you can more or less guarantee will get good GC performance (other than just not allocating any memory, of course). Malloc actually suffers from a similar problem: it's just hard to know a priori if your access patterns will work well with the internals of your allocator/GC.
IBM Metronome and similar approaches provide very strong guarantees (if you keep a certain bound on garbage creation ofc) at the expense of throughput.
What they lose: they also introduce a certain guaranteed loss of available CPU time, due to optimizing for latency and real-time predictability.
Respecting generational hypothesis is a good step towards engineering a GC-friendly design. Mind you, this brings back the necessity to reason about object lifetime, something a GC-based language absolves you from, but it does help. It tends to teach you a lesson that sometimes higher allocation count of objects that die in Gen 0 is preferable to fewer larger allocations which have higher chance of surviving until Gen 1 (.NET has three main generations for GC objects - 0 and 1 aka ephemeral and 2 aka long-lived, there are also LOH, POH and NonGC-heap).
That's fast enough for real-time audio, so yes, that's excellent. And that includes compaction and only a small hit to throughput. You can get sub-2us latency at a serious hit to throughput.
No, the more memory you give them, the more efficient they are, and generally they are much more efficient than an equivalent program using malloc() + free() for every piece of memory that gets allocated.
Take the extreme case of a program where every allocation is permanent. The GC allocation will be much much faster than malloc() (GC allocation is normally just bumping a pointer to the end of the heap, while malloc() typically does a lot more work to segregate objects by size etc), and collection will never run. So, overall, the program will be significantly faster.
Edit: More realistically, a program that typically produces a 1:1 ratio of live memory to junk, but occasionally spikes, will consume zero GC time in normal operation (the GC will just let the junk sit there), while free() needs to run on all that junk as soon as it is created, otherwise it leaks forever.
Also, the fact that GCs can defer collection to later than the object going out of scope can often make them more efficient than a traditional free(). For example, if you have a large graph data structure that is now junk, free() needs to do work that's O(size of graph) to free all the nodes. A copying collector, the most popular design, never even looks at the graph, so it frees it in O(1). Edit: of course, the converse is also true: if the graph survives N collections, the copying GC has to do O(N * size) work, where malloc()/free() do nothing at all.
This is in fact heavily related to the "ancient" Unix approach, and is partially why it optimized so hard for many small processes.
PDP-11 Unix (without split instruction/data space) had 32kB of memory for a userland process, total, plus another 24kB for the kernel (8kB are reserved for I/O devices).
The memory management interface that gave us malloc() and free() ultimately boiled down to brk() and sbrk(), which simply set the pointer describing where stack and heap met in memory. With many small programs, you'd have as little as possible of that 32kB used by code, and you'd prefer to simply allocate with a bump pointer and never free (better to reuse object pools, which is also a technique used with GCs). A program in a pipeline would mostly allocate some stuff at the start plus buffers, and while some of the early bits might become garbage, it doesn't really matter because you can't share parts of your 32kB block (or at best you can use 8kB segments). So you do "missile-style GC" and just use a bump-pointer allocator (the brk() + sbrk() + sizeof combo), then exit to the OS when your work is done and the entire memory space gets cleaned up.
EDIT: Some GCs (like in later versions of Symbolics Lisp Machines, or presently in SBCL behind some experimental options) provided multiple arena support explicitly to be able to force-declare everything allocated in an area as garbage.
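A minimal bump-arena sketch of that discipline (illustrative only; real arenas use interior mutability or unsafe code so several allocations can be live at once): allocation is an offset bump, individual frees don't exist, and everything is reclaimed in one go when the arena goes away.

```rust
struct BumpArena {
    buf: Vec<u8>,
    used: usize,
}

impl BumpArena {
    fn new(capacity: usize) -> Self {
        BumpArena { buf: vec![0; capacity], used: 0 }
    }

    // "malloc" is just advancing an offset; there is no per-object "free".
    fn alloc(&mut self, size: usize) -> Option<&mut [u8]> {
        if self.used + size > self.buf.len() {
            return None; // out of arena: the 32kB wall, in miniature
        }
        let start = self.used;
        self.used += size;
        Some(&mut self.buf[start..start + size])
    }
}

fn main() {
    let mut arena = BumpArena::new(32 * 1024); // the PDP-11-sized budget
    let header = arena.alloc(64).expect("arena exhausted");
    header[0] = 42;
    println!("first byte: {}", header[0]);
} // "exit to OS": the whole arena is dropped at once
```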
I'm guessing you're referring to the fact that the most popular GC languages allocate almost every object on the heap, even when used as a field of another object. This is by no means a constraint of the GC, it is a separate design choice.
And in fact C# has always supported "value objects", which do get allocated in-place, Java is adding the same support probably in the next release, and Go actually defaults to it.
Worst case, 2× is actually a steal compared to a non-moving allocator such as malloc/free or RC built on top of that, which cannot[1] do better than about 1 + ½log₂(largest allocation / smallest allocation). For example, if you have allocations from 1K to 16K bytes, any malloc/free implementation can require at least 3× the memory you actually use; if from 16 to 16K, 6×; etc.
At this point I must mention that paged memory can get you just enough movability that catastrophic fragmentation of this kind is avoided[2] with high probability. Paging tricks can be seriously expensive on a modern multicore processor, though, so I'm not sure this is the way forward. (The paper reports only a 1% perf overhead for Firefox; I don't know if that means Firefox is less multithreaded than it could be or what.)
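To check the arithmetic on the bound above with the two example ranges:

```latex
1 + \tfrac{1}{2}\log_2\frac{16\,\mathrm{K}}{1\,\mathrm{K}} = 1 + \tfrac{1}{2}\cdot 4 = 3\times,
\qquad
1 + \tfrac{1}{2}\log_2\frac{16\,\mathrm{K}}{16} = 1 + \tfrac{1}{2}\cdot 10 = 6\times
```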
It depends what you mean by "efficient": a world that controls when someone can free memory can be surprisingly efficient at dealing with the fundamental fragmentation problem in terms of overall churn over time (throughput), at the cost of space and immediate time (latency). Both are different forms of efficiency.
Manual memory management as a solution to the fragmentation problem trades that off: not knowing anything about when free might be called, it has to lean towards optimising space and immediate time rather than throughput. But there's still a memory manager behind the scenes that has to deal with fragmentation as well; there's no get-out-of-jail-free card for that, and that complexity is still hidden.
(Helpful memory usage disciplines like arenas/pools have their desirable properties for the same reasons: it’s a discipline on when you free memory in order to avoid fragmentation.)
Faster than ref counting AND manual memory management. Papers that have recorded traces of memory allocation and liveness info and then replayed them, either with a GC or by inserting manual new/free calls, show GC has a throughput advantage here.
Many use GC languages for productivity in code where that's more important. As far as slowdowns go, there are concurrent and real-time (fixed-timing) GCs out there which reduce or avoid the problems you mention. JX OS also lets you mix different GCs for different components.
So, there’s a performance hit of some kind with optional tuning that takes no expertise. Many people would go for those tradeoffs for some applications. Especially if they got to keep using safe, no-GC code in their libraries with their efficiency benefits.
> You can, but it turns out that, as one may intuitively expect, a GC is never needed unless [...]
Rust uses plenty of reference counting. You could replace most of that with a GC; whether that's a good trade depends on your GC implementation, your reference counting implementation, and your requirements in terms of latency, throughput, and the ability to handle cycles.
> Since it's not needed and it's massively worse than reference counting (assuming you only change reference counts when essential and use borrowing normally) due to the absurd behavior of scanning most of the heap at arbitrary times, there is no Rust GC crate in widespread use.
There are choices. You can have real time garbage collectors. And if you want to handle cycles, you need some kind of tracing anyway.
Also, if you only care about throughput and not about latency, garbage collection can easily be faster than reference counting.
If you don't allow cycles, you can also take advantage of that in your garbage collection. See how Erlang does it. (In Erlang, the garbage collector relocates your objects so that they are in topological order, so all references only point forwards in memory. You can combine that with generational gc, too, of course.)
> Since it's not needed and it's massively worse than reference counting
Lol, what? Maybe don’t go asserting stuff you clearly know little about. Reference counting is a fine tradeoff for manual memory-managed languages, but it is absolutely smoked out of the water by a tracing GC on most counts. It’s almost like JVM, V8, etc engineers know a thing about the topic and don’t have RC for a good reason.
Tracing GC doesn't burden the mutator threads with additional work, and almost everything can be done in parallel, resulting in vastly better throughput. Imagine dropping the last reference to a huge graph: you can actually observe this when exiting a C++ program, which might hang for a few seconds before returning control to you as all the destructors are recursively called, serially, on the program thread, literally jumping pointer by pointer across the heap, the very thing you are so afraid of. And I didn't even get to the atomic part: bumping a number up or down with synchronization between CPUs is literally the slowest operation you can do on a modern machine. Tracing GCs elegantly avoid all these problems at the price of some memory overhead. None of these GC algorithms (yes, RC is a GC) is a silver bullet, but let's not kid ourselves.
That's because, unlike Rust, those languages with RC end up with a lot of unnecessarily refcounted objects (they don't have value objects), do a whole lot of useless reference-count updates (they don't have borrowing), and always have to use atomics, because they can't ensure that certain objects are never shared between threads (and they also need a cycle collector on top of the reference counting).
If you use reference counting properly in a well-designed language, then it's obviously better than GC, since it's rarely invoked, fast, simple, local, and needs no arbitrary heuristics.
The destructor cascades are only a problem for latency and potential stack overflow, and can be solved by having custom destructors for recursive structures that queue nodes for destruction, or by using arena allocators where applicable.
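As a small sketch of "refcount only when essential" (the `Config` type and functions are invented): ownership is counted twice at the edges, and the hot path only ever borrows, so no counts and no atomics are touched there.

```rust
use std::rc::Rc;

struct Config {
    retries: u32,
}

// Borrows the shared data: no refcount traffic, regardless of call count.
fn effective_retries(cfg: &Config) -> u32 {
    cfg.retries.min(10)
}

fn main() {
    let shared = Rc::new(Config { retries: 3 }); // count: 1, at creation
    let other_owner = Rc::clone(&shared);        // count: 2, second owner

    // Hot path: plain borrows of the Rc's contents, zero counting.
    let mut total = 0;
    for _ in 0..1_000 {
        total += effective_retries(&shared);
        total += effective_retries(&other_owner);
    }
    println!("{total}");
} // two decrements, when the owners go out of scope
```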
So, RC is better than tracing GC, when it's not used as memory management, and it is special-cased everywhere... got you!
Like, as I explicitly wrote, it is probably the correct choice for low-level languages close to the metal, that want easy compatibility with other languages through FFI. But the method itself has still got a much slower throughput than a tracing GC, when used in a similar manner. Anything else is a useless comparison, like is a bicycle better than a tank.
> But the method itself has still got a much slower throughput than a tracing GC, when used in a similar manner
That is correct, but the issue is not with reference counting, but rather with having unnecessary extremely frequent RC/GC operations.
Once frequency is reduced to only necessary operations (which could be none at all for many programs), reference counting wins since its cost is proportional to the number of operations, while GC has fixed but large costs.
You can very well have a Rust-like language with Rust's model of 'compile-time memory management', but you replace all uses of reference counting with tracing garbage collection.
You probably already know this, but the gc crate is an implementation of a somewhat more sophisticated GC, but it uses `derive`s to implement tracing, so it's not exactly the Rc<T> interface.
UK tried to have a "silicon roundabout" and attract VC investment, but then Brexit happened and the allure of English-speaking entry point into the EU market has disappeared.
Europe has missed out on the craze of getting millions to build an Uber for Cats.
'Realtime' isn't a specific thing; there's just 'fast enough'. Old-school "render each frame during the scan line, staying ahead of the electron beam" graphics was pretty hardcore, though.
> Real-time computing (RTC) is the computer science term for hardware and software systems subject to a "real-time constraint", for example from event to system response.[1] Real-time programs must guarantee response within specified time constraints, often referred to as "deadlines".[2]
Strictly speaking, this definition doesn't say anything about how tight those deadlines are, as long as you can guarantee some deadlines.
There's also 'soft' real time, where you try to hit your deadlines often enough, but there are no guarantees and a missed deadline is not the end of the world. Games are a good example of that, including the example of chasing the electron beam.
ABS brakes are an example of a 'hard' real time system: the deadlines aren't nearly as tight as for video game frames, but you really, really can't afford to miss them.