> C libraries typically return opaque pointers to their data structures, to hide implementation details and ensure there's only one copy of each instance of the struct. This costs heap allocations and pointer indirections. Rust's built-in privacy, unique ownership rules, and coding conventions let libraries expose their objects by value
The primary reason c libraries do this is not for safety, but to maintain ABI compatibility. Rust eschews dynamic linking, which is why it doesn't bother. Common lisp, for instance, does the same thing as c, for similar reasons: the layout of structures may change, and existing code in the image has to be able to deal with it.
> Rust by default can inline functions from the standard library, dependencies, and other compilation units. In C I'm sometimes reluctant to split files or use libraries, because it affects inlining
This is again because c is conventionally dynamically linked, and rust statically linked. If you use LTO, cross-module inlining will happen.
Rust provides ABI compatibility against its C ABI, and if you want you can dynamically link against that. What Rust eschews is the insane fragile ABI compatibility of C++, which is a huge pain to deal with as a user:
I don't think we'll ever see as comprehensive an ABI out of Rust as we get out of C++, because exposing that much incidental complexity is a bad idea. Maybe we'll get some incremental improvements over time. Or maybe C ABIs are the sweet spot.
Rust has yet to standardize an ABI. Yes you can call or expose a function with C calling conventions. However, you cant pass all native rust types like this, and lose some semantics.
However, as the parent comment you responded to you can enable LTO when compiling C. As rust is mostly always statically linked it basically always got LTO optimizations.
Even with static linking, Rust produces separate compilation units a least at the crate level (and depending on compiler settings, within crates). You won't get LTO between crates if you don't explicitly request it. It does allow inlining across compilation units without LTO, but only for functions explicitly marked as `#[inline]`.
Swift has a stable ABI. It makes different tradeoffs than rust, but I don't think complexity is the cliff. There is a good overview at https://gankra.github.io/blah/swift-abi/
Swift has a stable ABI at the cost of what amounts to runtime reflection, which is expensive. That doesn't really fit with the goals of Rust, I don't think.
This is misleading, especially since Swift binaries do typically ship with actual reflection metadata (unless it is stripped out). The Swift ABI does keep layout information behind a pointer in certain cases, but if you squint at it funny it's basically a vtable but for data. (Actually, even more so than non-fragile ivars are in Objective-C, because I believe actual offsets are not provided, rather you get getter/setter functions…)
I don't disagree that Rust probably would not go this way, but I think that's less "this is spooky reflection" and more "Rust likes static linking and cares less about stable ABIs, plus the general attitude of 'if you're going to make an indirect call the language should make you work for it'".
Do you have a source on this? I didn't think Swift requires runtime reflection to make calling across module boundaries work - I thought `.swiftmodule` files are essentially IR code to avoid this
Pretty sure the link the parent (to my comment) provided explains this.
It's not the same kind of runtime reflection people talk about when they (for example) use reflection in Java. It's hidden from the library-using programmer, but the calling needs to "communicate" with the library to figure out data layouts and such, and that sounds a lot like reflection to me.
Yes, and if you use the C abi to dynamically link rust code, you will have exactly the same problem as c: you can't change the layout of your structures without breaking compatibility, unless you use indirecting wrappers.
That's ABI compatibility of the language, not of a particular API.
If you have an API that allows the caller to instantiate a structure on the stack and pass a reference to it to your function, then the caller must now be recompiled when the size of that structure changes. If that API now resides in a separate dynamic library, then changing the size of the structure is an ABI-breaking change, regardless of the language.
-rwxr-xr-x 1 root root 199K Nov 10 06:37 /usr/bin/grep
-rwxr-xr-x 1 root root 4.2M Jan 19 09:31 /usr/bin/rg
My very unscientific measurement of the startup time of grep vs ripgrep is 10ms when the cache is cold (ie, never run before) and 3ms when the cache is hot (ie, was run seconds prior). For grep even in the cold case libc will already be in memory, of course. The point I'm trying to make is even the worst case, 10ms, is irrelevant to a human using the thing.
However, speaking as a Debian Developer, it makes a huge difference to maintaining the two systems that use the two programs. If a security bug is found in libc, all Debian has to do is make the fixed version of libc as a security update. If a bug is found in the rust stdlib create Debian has to track down every ripgrep like program that statically includes it, recompile it. There are current 21,000 packages that link to libc6 right in Debian right now. If it was statically linked, Debian would have to rebuilt and distribute _all_ of them. (As a side note, Debian has a lot hardware resources donated to it but if libc wasn't dynamlic I wonder if it could get security updates to a series of bugs in libc6 out in a timely fashion.)
I don't know rust well, but I thought it could dynamically link. The Debian rust packagers don't, for some reason. (As opposed 21,000 dependencies, libstd-rust has 1.) I guess there must be some kink in the rust tool chain that makes it easier not to. I imagine that would have to change if rust replaces C.
I am sympathetic to the point you make but to be accurate, one can consume and create C and C compatible dynamic libraries with rust. So, one is not “losing” something because what you (and me) want - dynamic linking and shared libraries with a stable and safe rust ABI - was not there to begin with.
Also to be pedantic, C doesn't spec anything about linkage. Shared objects and how linkers use them to compose programs is a system detail more than a language one.
The reason Common Lisp uses pointers is because it is dynamically typed. It’s not some principled position about ABI compatibility. If I define an RGB struct for colours, it isn’t going to change but it would still need to be passed by reference because the language can’t enforce that the variable which holds the RGBs will only ever hold 3 word values. Similarly, the reason floats are often passed by reference isn’t some principled stance about the float representation maybe changing, it’s that you can’t fit a float and the information that you have a float into a single word[1].
If instead you’re referring to the fact that all the fields of a struct aren’t explicitly obvious when you have such a value, well I don’t really agree that it’s always what you want. A great thing about pattern matching with exhaustiveness checks is that it forces you to acknowledge that you don’t care about new record fields (though the Common Lisp way of dealing with this probably involves CLOS instead).
[1] some implementations may use NaN-boxing to get around this
Lisp users pointers because of the realization that the entities in a computerized implementation of symbolic processing can be adequately represented by tiny index tokens that fit into machine registers, whose properties are implemented elsewhere, and these tokens can be whipped around inside the program very quickly.
What your describing are symbols where the properties are much less important than the identity. Most CL implementations will use fixnums rather than pointers when possible because they don’t have some kind of philosophical affinity to pointers. For data structures, pointers aren’t so good with modern hardware. The reason Common Lisp tends to have to use pointers is that the type system cannot provide information about how big objects are. Compare this to the arrays which are often better at packing because they can know how big their elements are.
This is similar in typed languages with polymorphism like Haskell or ocaml where a function like concat (taking a list of lists to a single list) needs to work when the elements are floats (morally 8 bytes each) or bools (morally 1 bit each). The solution is to write the code once and have everything be in one word, either a fixnum or a pointer.
Dynamic linking is one thing I miss from Swift - I used dynamic linking for hot code reloading for several applications, which resulted in super fast and useful development loops. Given Rust's sometimes long compile times, this is something which would be welcome.
> This costs heap allocations and pointer indirections.
Heap allocations, yes; pointer indirections no.
A structure is referenced by pointer no matter what. Remember that the stack is accessed via a stack pointer.
The performance cost is that there are no inline functions for a truly opaque type; everything goes through a function call. Indirect access through functions is the cost, which is worse than a mere pointer indirection.
An API has to be well-designed this regard; it has to anticipate the likely use cases that are going to be performance critical and avoid perpetrating a design in which the application has to make millions of API calls in an inner loop. Opaqueness is more abstract and so it puts designers on their toes to create good abstractions instead of "oh, the user has all the access to everything, so they have all the rope they need".
Opaque structures don't have to cost heap allocations either. An API can provide a way to ask "what is the size of this opaque type" and the client can then provide the memory, e.g. by using alloca on the stack. This is still future-proof against changes in the size, compared to a compile-time size taken from a "sizeof struct" in some header file. Another alternative is to have some worst-case size represented as a type. An example of this is the POSIX struct sockaddr_storage in the sockets API. Though the individual sockaddrs are not opaque, the concept of providing a non-opaque worst-case storage type for an opaque object would work fine.
There can be half-opaque types: part of the structure can be declared (e.g. via some struct type that is documened as "do not use in application code"). Inline functions use that for direct access to some common fields.
Escape analysis is tough in C, and data returned by pointer may be pessimistically assumed to have escaped, forcing exact memory accesses. OTOH on-stack struct is more likely to get fields optimized as if they were local variables. Plus x86 has special treatment for the stack, treating it almost like a register file.
Sure, there are libraries which have `init(&struct, sizeof(struct))`. This adds extra ABI fragility, and doesn't hide fields unless the lib maintains two versions of a struct. Some libraries that started with such ABI end up adding extra fields behind internal indirection instead of breaking the ABI. This is of course all solvable, and there's no hard limit for C there. But different concerns nudge users towards different solutions. Rust doesn't have a stable ABI, so the laziest good way is to return by value and hope the constructor gets inlined. In C the solution that is both accepted as a decent practice and also the laziest is to return malloced opaque struct.
I'd like to point out that this is not always the case. Some libraries, especially those with embedded systems in mind, allow you to provide your own memory buffer (which might live on the stack), where the object should be constructed. Others allow you to pass your own allocator.
The primary reason c libraries do this is not for safety, but to maintain ABI compatibility. Rust eschews dynamic linking, which is why it doesn't bother. Common lisp, for instance, does the same thing as c, for similar reasons: the layout of structures may change, and existing code in the image has to be able to deal with it.
> Rust by default can inline functions from the standard library, dependencies, and other compilation units. In C I'm sometimes reluctant to split files or use libraries, because it affects inlining
This is again because c is conventionally dynamically linked, and rust statically linked. If you use LTO, cross-module inlining will happen.