I think the industry is soon going to look back on building with Wild West open-source repos the way we looked back, in the Snowden era, on not having absolutely everything running over HTTPS. I know Google has "assured" open source repos for Python and Java [1]. Are there other similar providers for those and other languages?
You're absolutely right, but you've just asserted that almost all companies making software are unreasonable.
Distressingly, doing what you suggest remains the exception by orders of magnitude. Very few people have internalized why it's necessary and few of those have the political influence in their organizations to make it happen.
JFrog / Artifactory is one very common provider of private npm registries. There are a ton of security-scan vendors out there (Mend (formerly WhiteSource), Socket, Black Duck...)
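For reference, wiring a project to a private registry is only a couple of lines of .npmrc. The host and repo path below are made up, but the shape is standard:

    # .npmrc: route all installs through a private (e.g. Artifactory) registry
    # hostname and repo path are hypothetical
    registry=https://artifactory.example.com/artifactory/api/npm/npm-virtual/
    //artifactory.example.com/artifactory/api/npm/npm-virtual/:_authToken=${NPM_TOKEN}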
I worked for an IBM acquiree 13 years ago and as part of the "Blue-washing" process to get our software up to IBM spec we had to use their proprietary tools for verifying our dependencies were okay.
Well, then I wouldn't expect to do business with every random company. TPRM (third-party risk management) is a big issue today, so I wouldn't expect any company that skips basic due diligence to keep getting business.
How much is that automated scanning worth? Sure, we have mirrored repos, but I assume the malware authors pre-test their code against a suite of detectors in CI. So infected packages will happily be mirrored internally for consumption.
Totally agree. Most companies using mirrors or proxies like Artifactory aren’t getting much real protection.
- They cache packages but don’t analyze what’s inside.
- They scan or review the first version, then auto-approve every update after that.
- They skip transitive deps, and in npm that's 79 on average per package (see the quick check below).
- They rely on scanners that claim to detect supply chain attacks but just check for known CVEs. The CVE system doesn’t track malware or supply chain attacks (except rarely), so it misses 99%+ of real threats.
Almost everything on the market today gives a false sense of security.
One exception is Socket — we analyze the actual package behavior to detect risks in real time, even in transitive deps. https://socket.dev (Disclosure: I’m the founder.)
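To make the transitive-deps point concrete, here's a quick way (on any project of your own) to see roughly how much code actually lands in your tree versus what package.json declares:

    # count every package installed in the tree, transitive deps included
    npm ls --all --parseable | wc -l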
Not much. As you say, static scanning is pretty much a dead-end strategy. Attackers have long since realized that you can just run the scan yourself and jiggle the bytes around to evade the signature detection.
At least at my company, I think someone at least has to approve/verify the scan results. Of course it's still a risk, but so are external emails, vendor files, and everything else.
It is worth a fair bit. If you control the mirroring you can ensure the malware is flagged but not deleted, so forensics can assess how much damage has been done or would have been done, for instance.
> npm is a package manager for the JavaScript programming language maintained by npm, Inc., a subsidiary of GitHub. -- [1]
and Microsoft owns GitHub, so is Microsoft the provider? Pretty sure they're running malware scanners over npm constantly at the least. npm also has (optional) provenance [2] tied to a GitHub build workflow, which is as strong as being "assured" by Google IMO. The only problem is that it's optional.
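For anyone who hasn't tried it, this is roughly the workflow with a recent npm (exact behavior and output vary by version):

    # publish with a provenance attestation from a supported CI system
    npm publish --provenance

    # as a consumer, verify registry signatures and provenance attestations
    npm audit signatures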
This is a coordination failure. We have ways to distribute the source, but not the reviews. Every time someone does any level of review, that should be publishable too.
Things like cargo-crev [0] or cargo vet [1] aim to tackle a subset of that problem.
There are also alternate implementations of crev [2] for other languages, but I'm not sure about the maturity of those integrations and their ecosystems.
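For the curious, the cargo vet loop looks roughly like this ("foo" is a hypothetical crate):

    cargo install cargo-vet
    cargo vet init              # bootstrap config; exempts the current tree
    cargo vet                   # fails if any new dep lacks a trusted audit
    cargo vet certify foo 1.2.3 # record your own review of a crate version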
Sorry, I wasn't clear. I meant it only in the general sense: in the not-too-distant past, the industry was content with a huge hole like running only the login page under HTTPS and the rest of the site in the clear, which in hindsight seems insane. What I mean is the situation (explored in the rest of this thread) where much of the industry seems content to consume code extensively from public repos without many obstacles to prevent a supply-chain attack. What I'm saying is that the industry will probably soon look back on this in the same way: "what were we thinking!?"
There's a deeper issue though. I frequently have difficulty getting things to build from source in a network-isolated environment. That's after I manually wrangle all the dependencies (and sub-deps, and sub-sub-deps, and ...).
Even worse is something like emscripten where you are fully expected to run `npm install`.
Any build process that depends on network access is fundamentally broken as far as I'm concerned.
Which is nearly all of them that I can think of, in terms of broadly adopted languages, except perhaps C/C++.
You can cache and/or emulate the network to go offline, but fundamentally a fresh build in most languages will want to hit a network, at least by default.
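With npm, for example, you can get most of the way there by populating a cache once and then forbidding the network (the cache path here is arbitrary):

    # online, once: populate a local cache from the lockfile
    npm ci --cache ./npm-cache

    # thereafter: fail the build rather than touch the network
    npm ci --offline --cache ./npm-cache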
In my world (VHDL/Verilog and some C/C++) there's a difference between the "fetch" and "build" steps. It's perfectly reasonable for the fetch step to require network access; the build step should not.
The real problem is that some language ecosystems conflate those two steps.
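Cargo is one ecosystem that does keep the two steps separate, which makes the dichotomy easy to enforce:

    cargo fetch            # network allowed: resolve and download all deps
    cargo build --offline  # network forbidden: errors if anything is missing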
I'm mostly on board with that dichotomy, except that I think it's also important that all fetched artifacts either come from a VCS or are similarly cryptographically versioned, with all historical versions made available in a reliable manner.
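At minimum that can be as crude as recording a digest at fetch time and checking it at build time (file names here are hypothetical):

    # fetch step: record a digest alongside the vendored artifact
    sha256sum vendored/lodash-4.17.21.tgz > vendored/lodash-4.17.21.tgz.sha256

    # build step: verify before use; any tampering fails the build
    sha256sum -c vendored/lodash-4.17.21.tgz.sha256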
How does HTTPS help with the problems Snowden uncovered? You don't "run on" HTTPS; HTTPS just does in-transit encryption between two points of the service architecture. That's why you can (could?) slap Cloudflare atop your HTTP-only site and get a padlock!
Because one of the methods reported was scanning HTTP packets, easily read without TLS from any hop in the chain. More importantly, he blew the lid off the fact that governments had access to this via the very ISPs everyone relies on for telecom. By making everything TLS, they can look all they want but they can't read it.
You could do TLS offloading at your load balancer, but then you have to secure your entire network starting with your ISP. For some workloads this is fine; you aren't dealing with super sensitive data. For others, you'd be violating compliance.
I'm referring to programs like MUSCULAR [1] and PRISM [2] where NSA was tapping inter- and intra-datacenter traffic of major internet communications platforms like Gmail, Yahoo Mail, Facebook etc. At the time, that kind of traffic was not encrypted. It was added in a hurry after these revelations.
Totally agree: we're going to look back and wonder how we ever shipped code without knowing what was in our dependencies. Socket is working on exactly this: we analyze the actual code of open source packages to detect supply chain risks, not just known CVEs. We support npm, PyPI, Maven, .NET, RubyGems, and Go. Would love to hear which ecosystems you care about most.
If you include commercial offerings, Red Hat has offered this for a while, and many semi-successful startups have tried building a business model around solving it.
Based on the staff I see at the average technology company I wouldn’t expect this to get any better any time soon. The state of things is definitely declining.
[1] https://cloud.google.com/assured-open-source-software/docs/o...