Another build time improvement coming, especially for fresh CI builds, is a new registry protocol. Instead of git-cloning metadata for 100,000+ packages, it can download only the data for your dependencies.
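For anyone curious how "only the data for your dependencies" works in practice: as far as I understand it, the index keeps one small file per crate, and the file's path is derived from the crate name. Below is a rough sketch of that mapping in Rust. The function names are made up for illustration, and the base URL https://index.crates.io/ is assumed to be the sparse endpoint for crates.io. It also assumes crate names are non-empty ASCII.

```rust
/// Relative path of a crate's metadata file inside a Cargo registry index.
/// Hypothetical helper, not part of Cargo's public API.
/// Returns None for names that are empty or not ASCII.
fn sparse_index_path(crate_name: &str) -> Option<String> {
    if crate_name.is_empty() || !crate_name.is_ascii() {
        return None;
    }
    // Index paths use the lowercased crate name.
    let name = crate_name.to_ascii_lowercase();
    let path = match name.len() {
        // One- and two-character names live in the "1" and "2" directories.
        1 => format!("1/{}", name),
        2 => format!("2/{}", name),
        // Three-character names are grouped by their first character.
        3 => format!("3/{}/{}", &name[..1], name),
        // Everything else: first two characters, then the next two.
        _ => format!("{}/{}/{}", &name[..2], &name[2..4], name),
    };
    Some(path)
}

/// Full URL a client would fetch for one crate, assuming the
/// crates.io sparse index is served from https://index.crates.io/.
fn sparse_index_url(crate_name: &str) -> Option<String> {
    sparse_index_path(crate_name).map(|p| format!("https://index.crates.io/{}", p))
}
```

So a build that depends on serde would fetch se/rd/serde plus the equivalent files for its transitive dependencies, instead of cloning a repository that describes every crate.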
Great stuff. Now, if they can just have a globally shared (at least per $USER!), content-addressable target/ directory, two of my complaints with Cargo would be fixed nicely...
I really wonder how many Dockerfiles out there pull the entire Cargo "metadata" without a cache on every PR merge, and how wasteful that is from a bandwidth/electricity standpoint. Or is it a small drop in the bucket in the grand scheme of things?
In my experience it's pretty significant from the bandwidth side at reasonable levels of usage. You'd be astounded at how many things download packages and their metadata near constantly, and the rise of fully automated CI systems has really put the stress on bandwidth in particular, since most things are "from scratch." And now we have things like dependabot automatically creating PRs for downstream advisories constantly which can incur rebuilds, closing the loop fully.
If you use GitHub as like a storage server and totally externalize the costs of the package index onto them, then it's workable for free. But if you're running your own servers then it's a whole different ballgame.
I think GitHub would have throttled that Cargo index repository a long time ago if it wasn't used by Rust, i.e., they get some kind of special favour. Which is nice but maybe not sustainable.
GitHub employees personally reached out to various packagers (I know both Cargo and Homebrew for certain) asking them not to perform shallow clones on their index repos, because of the extra processing it was incurring on the server side.
You're correct that Cargo doesn't check the index if it's building using a lockfile, but I think the problem is that a freshly-installed copy of Cargo assumes that it needs to get the index the first time that any command is run. I assume (but haven't verified in the slightest) that this behavior will change with the move to an on-demand index by default.
I don't care much either way, but you have the privacy argument backwards. If you're downloading all the things, then no one knows if you are using xyz, only that you might be using xyz. If you're just downloading what you need and you're downloading xyz, then they know that you're using xyz.
I'm not sure I understand. This is talking about Cargo metadata download improvements. You still download individual packages regardless of receiving a copy of the entire registry, so privacy hasn't materially changed either way.
If knowing you use a crate is too much, then running your own registry with a mirror of packages seems like all you could do.
You're downloading specific packages either way, which can potentially be tracked, regardless of whether you're downloading metadata for all packages or just one.
Edit: A thought occurs to me. Cargo downloads metadata from crates.io but clones the package repo from GitHub/etc. So unless I'm missing something, downloading specific metadata instead of all metadata allows for crates.io to track your specific packages in addition to GitHub.
No, repos of packages are not used, at all. Crates don't even need to be in any repository, and the repository URL in the metadata isn't verified in any way. Crates can link to somebody else's repo or a repo full of fake code unrelated to what has been published on crates.io.
crates.io crates are tarballs stored in S3. The tarball downloads also go through a download-counting service, which is how you get download stats for all crates (it's not a tracker in the Google-is-watching-you sense, but just an integer increment in Postgres).
Use https://lib.rs/cargo-crev or source view on docs.rs to see the actual source code that has been uploaded by Cargo.
This has it backwards. crates.io has always hosted the crates themselves, but has used GitHub for the index. In the future, with the sparse HTTP index, crates.io will be the only one in the loop, cutting GitHub out of the equation.
https://blog.rust-lang.org/inside-rust/2023/01/30/cargo-spar...