Another build time improvement coming, especially for fresh CI builds, is a new ...

aseipp · on Feb 3, 2023

Great stuff. Now, if they can just have a globally shared (at least per $USER!), content-addressable target/ directory, two of my complaints with Cargo would be fixed nicely...

jynelson · on Feb 3, 2023

You can do that today: set CARGO_TARGET_DIR to an absolute path.

pdimitar · on Feb 4, 2023

Huh? But the docs say this:

"Location of where to place all generated artifacts, relative to the current working directory."

jenadine · on Feb 4, 2023

You can set an absolute path.

The problem is that if two workspaces build a dependency with slightly different features or flags, it will always be rebuilt when switching between workspaces.
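
On the docs wording quoted above: "relative to the current working directory" and "you can set an absolute path" are not in conflict. Here is a minimal sketch of the usual path-joining semantics, using Rust's standard `Path::join`. The helper name `resolve_target_dir` and the example paths are made up for illustration, this is not Cargo's actual code, and the absolute-path behavior shown is for Unix-style paths.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not Cargo's implementation): resolve a target-dir
/// setting against the current working directory.
fn resolve_target_dir(cwd: &Path, setting: &str) -> PathBuf {
    // `Path::join` appends a relative path to the base, but when the
    // argument is absolute it replaces the base entirely.
    cwd.join(setting)
}
```

With a working directory of `/home/me/project`, a setting of `target` resolves inside the project, while `/var/cache/cargo-target` is used unchanged no matter which workspace you are in.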

MuffinFlavored · on Feb 3, 2023

I really wonder how many Dockerfiles are out there that on every PR merge pull the entire cargo "metadata" without cache and how wasteful that is from a bandwidth/electricity standpoint or if in the grand scheme of things it's a small drop in the bucket?

aseipp · on Feb 3, 2023

In my experience it's pretty significant from the bandwidth side at reasonable levels of usage. You'd be astounded at how many things download packages and their metadata near constantly, and the rise of fully automated CI systems has really put the stress on bandwidth in particular, since most things are "from scratch." And now we have things like dependabot automatically creating PRs for downstream advisories constantly which can incur rebuilds, closing the loop fully.

If you use GitHub as like a storage server and totally externalize the costs of the package index onto them, then it's workable for free. But if you're running your own servers then it's a whole different ballgame.

kzrdude · on Feb 3, 2023

I think GitHub would have throttled that cargo index repository a long time ago if it wasn't used by Rust, i.e. they get some kind of special favour. Which is nice but maybe not sustainable.

kibwen · on Feb 3, 2023

Github employees personally reached out to various packagers (I know both Cargo and Homebrew for certain) asking them not to perform shallow clones on their index repos, because of the extra processing it was incurring on the server side.

MarkSweep · on Feb 3, 2023

You can also use something like this to cache build artifacts and dependencies between builds:

https://github.com/Swatinem/rust-cache

CodesInChaos · on Feb 3, 2023

Why would a CI build need the index at all? The lock file should already contain all the dependencies and their hashes.

kibwen · on Feb 3, 2023

You're correct that Cargo doesn't check the index if it's building using a lockfile, but I think the problem is that a freshly-installed copy of Cargo assumes that it needs to get the index the first time that any command is run. I assume (but haven't verified in the slightest) that this behavior will change with the move to an on-demand index by default.

Vecr · on Feb 3, 2023

Good thing they will continue to support the original protocol. I don't like downloading things on demand like that, not good for privacy.

charcircuit · on Feb 3, 2023

How is it bad for privacy?

Before:

Download all metadata, Download xyz package

After:

Download xyz's metadata, Download xyz

They already know you are using xyz.
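
To make the "After" case concrete, here is a rough sketch of which per-crate metadata file gets requested, assuming the index file layout described in Cargo's registry index documentation (`1/`, `2/`, `3/{first char}/`, otherwise `{first two}/{next two}/`). The function names are hypothetical and this is not Cargo's own code.

```rust
/// Base URL of the crates.io sparse index.
const SPARSE_INDEX: &str = "https://index.crates.io";

/// Hypothetical helper: relative path of a crate's metadata file in the index.
/// Assumes a non-empty ASCII crate name.
fn index_path(name: &str) -> String {
    assert!(!name.is_empty() && name.is_ascii());
    // Index paths use the lowercased crate name.
    let name = name.to_ascii_lowercase();
    match name.len() {
        1 => format!("1/{name}"),
        2 => format!("2/{name}"),
        3 => format!("3/{}/{name}", &name[..1]),
        _ => format!("{}/{}/{name}", &name[..2], &name[2..4]),
    }
}

/// Hypothetical helper: full URL of the per-crate metadata request.
fn index_url(name: &str) -> String {
    format!("{SPARSE_INDEX}/{}", index_path(name))
}
```

So fetching metadata for `xyz` means requesting `3/x/xyz` from the index, and the crate name is right there in the request path.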

throwaway894345 · on Feb 3, 2023

I don't care much either way, but you have the privacy argument backwards. If you're downloading all the things, then no one knows if you are using xyz, only that you might be using xyz. If you're just downloading what you need and you're downloading xyz, then they know that you're using xyz.

Xorlev · on Feb 3, 2023

I'm not sure I understand. This is talking about Cargo metadata download improvements. You still download individual packages regardless of receiving a copy of the entire registry, so privacy hasn't materially changed either way.

If knowing you use a crate is too much, then running your own registry with a mirror of packages seems like all you could do.

rascul · on Feb 3, 2023

You're downloading specific packages either way, which can potentially be tracked, regardless of whether you're downloading metadata for all packages or just one.

Edit: A thought occurs to me. Cargo downloads metadata from crates.io but clones the package repo from GitHub/etc. So unless I'm missing something, downloading specific metadata instead of all metadata allows for crates.io to track your specific packages in addition to GitHub.

pornel · on Feb 3, 2023

No, repos of packages are not used, at all. Crates don't even need to be in any repository, and the repository URL in the metadata isn't verified in any way. Crates can link to somebody else's repo or a repo full of fake code unrelated to what has been published on crates.io.

crates.io crates are tarballs stored in S3. The tarball downloads also go through a download-counting service, which is how you get download stats for all crates (it's not a tracker in the Google-is-watching-you sense, but just an integer increment in Postgres).

Use https://lib.rs/cargo-crev or source view on docs.rs to see the actual source code that has been uploaded by Cargo.

kibwen · on Feb 3, 2023

This has it backwards. crates.io has always hosted the crates themselves, but has used Github for the index. In the future, with the sparse HTTP index, crates.io will be the only one in the loop, cutting Github out of the equation.