8 Comments

This is the best take on the topic (of single maintainers).

Hmm... If they're burning tons of energy on pretty-picture generation, cover-letter writing, and sloppy code snippets via those so-called "AI things," maybe that energy could be used to create an "AI thing" that detects issues in code? In my view, that would be a better way to waste the energy...

> The companies profiting from this infrastructure can afford to thoroughly vet and monitor key dependencies on behalf of the community.

I'm not sure what you're recommending, but it sounds like more eyes on dependencies (code review, testing, and so on) from packagers who are downstream from library maintainers, rather than the library maintainers themselves?

This is what Linux distributions do, among others. Unfortunately there's a lot of volunteer work in that, too. Also, a lot of duplicate work. It seems worth mentioning that some codebases are easier to review and test than others, and the language they're written in plays a role in that.

The Go SDK is an example of a more centralized, monolithic approach. I don't think there are any outside dependencies at all? Volunteers contribute patches, but things like mandatory code review happen because that's what the project requires.

It's pretty stable. Some improvements take a long time, though.

Consider that companies like Microsoft, Amazon, or Google are distro maintainers themselves; plus, they are major dependency consumers, building their own software on top of thousands of libraries (from glibc to ffmpeg). They can, in their own self-interest, build monitoring infrastructure that benefits them *and* the rest of the world - because their use cases probably cover 90%+ of what's used in the wild.

What that would look like is a good question; I think you'd need to rely heavily on automatic classification to detect suspicious or simply consequential changes, and then augment that with human deep dives where necessary. You actually benefit from the heuristics being non-public, if you can keep them that way. Again, nothing about this is cheap or trivial, but it's entirely within reach if you really want to build it and you have their scale.
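
A minimal sketch of what such automatic classification could look like - every heuristic, threshold, and signal below is illustrative, not anyone's actual system (though the flagged file names are from the real xz attack):

```python
# Illustrative only: a crude commit-scoring pass to decide what deserves
# a human deep dive. A real system would use many more (non-public) signals.

SENSITIVE_SUFFIXES = ("configure.ac", "Makefile.am", ".m4", ".sh")  # build machinery
BINARY_SUFFIXES = (".xz", ".gz", ".tar", ".bin")                    # opaque blobs

def score_commit(files, author_commit_count, insertions):
    """Return a rough suspicion score for a single commit."""
    score = 0
    for path in files:
        if path.endswith(SENSITIVE_SUFFIXES):
            score += 3  # build scripts were the actual vector in the xz attack
        if path.endswith(BINARY_SUFFIXES):
            score += 2  # binary "test data" is effectively unreviewable
    if author_commit_count < 10:
        score += 2      # little track record in this project
    if insertions > 500:
        score += 1      # unusually large change
    return score

commits = [
    {"files": ["m4/build-to-host.m4", "tests/files/bad-3-corrupt_lzma2.xz"],
     "author_commit_count": 4, "insertions": 120},
    {"files": ["doc/faq.md"], "author_commit_count": 300, "insertions": 12},
]

for c in commits:
    s = score_commit(c["files"], c["author_commit_count"], c["insertions"])
    if s >= 5:
        print("flag for human review:", c["files"], "- score", s)
```

The point isn't the specific heuristics - it's that a large consumer can afford to run something like this over every dependency it ships and route the hits to humans.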

I think it's a bit harder to tease out for smaller commercial players - for example, what can Red Hat do that they're not doing right now? On some level, the most bang for the buck is probably in tightening down the baseline. We've moved away from the computing paradigm where every machine had to be loaded with features for every eventuality, and if your compute image has 200 components instead of 4,000, it's easier to watch what's going on. In other words, more Alpine Linux and fewer kitchen-sink distros?

Google sort of already provides that, in a couple of ways too.

Most directly, there's the Assured Open Source Software program. It was born of the infamous Log4j vulnerability, and its goal is precisely to prevent software supply-chain attacks.

Less directly, but more openly, you can also just grab "whatever Google is OK with using" via the Chromium third_party packages tree (https://chromium.googlesource.com/chromium/src/third_party/). It doesn't have everything, but it is technically possible to import the open-source packages from there rather than from their upstream homes.

Chromium also constitutes the basis for Google's Container-Optimized OS, so that's what the corresponding cloud VM images are built from.

Finally, there's the OSS-Fuzz collection, which I thought you may have played some role in? (https://github.com/google/oss-fuzz)
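
For flavor, here's roughly what an OSS-Fuzz-style target looks like in Python, using the (real) Atheris engine; parse_header is a made-up stand-in for whatever library function is under test:

```python
# Sketch of an OSS-Fuzz-style fuzz target using the Atheris engine.
# parse_header is hypothetical - a stand-in for the code under test.
import sys
import atheris

def parse_header(data: bytes) -> None:
    # Toy parser: rejects short or malformed inputs.
    if len(data) < 4 or data[:2] != b"XZ":
        raise ValueError("bad magic")

def TestOneInput(data: bytes) -> None:
    # The fuzzing engine calls this with generated inputs; any
    # uncaught exception or crash is reported as a finding.
    try:
        parse_header(data)
    except ValueError:
        pass  # expected rejection of invalid input, not a bug

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
```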

> I think you'd need to rely heavily on automatic classification to detect suspicious or simply consequential changes, and then augment that with human deep dives where necessary. You actually benefit from the heuristics being non-public, if you can keep them that way.

This approach can only detect the simplest of backdoors. Some of these codebases have millions of lines of code, and if a malicious attacker has gained total control over the codebase, he can insert a backdoor that is as obfuscated as he likes. You're up against computability theory here - the problem you're trying to solve is fundamentally unsolvable, not just in theory but in practice as well. Binary test files are sometimes required, and arcane build scripts are often unavoidable. Automated and manual code reviews are fallible, and it takes only one mistake for a backdoor to slip through.
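
To make the computability point concrete: a perfect backdoor detector would let you decide the halting problem. A toy reduction, purely for illustration:

```python
# Toy reduction, illustration only. Suppose perfect_detector(source)
# could always tell whether running `source` performs a backdoor action.

def build_trapped_source(program_source: str) -> str:
    """Return source that performs the 'backdoor' action if and only if
    program_source halts when executed."""
    return (
        program_source + "\n"
        + 'open("/tmp/backdoor_marker", "w").write("triggered")\n'  # reached only on halt
    )

# perfect_detector(build_trapped_source(P)) would then answer "does P
# halt?" for arbitrary P - which Turing proved no algorithm can do.
# So any real detector is necessarily approximate and fallible.
```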

The only way to be sure that a large codebase contains no backdoors is to rewrite the entire thing from scratch. This is well within the capabilities of large companies like Google and Microsoft, so it does not change the essential point of your argument, but I think the specific implementation you proposed - having companies review codebases written by anonymous developers - is unworkable both in theory and in practice.

A-yup.

Comment deleted

You can backdoor any piece of software, though, as your question suggests, different packages present different opportunities; I think xz was chosen and the long-term effort was made both because there weren't a ton of eyes on it and because it is a dependency of a critical piece of software.
