Reminds me of when Apple started providing "smaller size updates" to OS X. I was curious about the details since my doctorate had touched on the topic, so I worked my contacts (I had a few in Apple engineering from the FreeBSD / OS X relationship) and after a few months I got back an answer: "We're using a tool called bsdiff, are you familiar with it?" I was indeed, since I was the author of said tool.
(Just to be clear, there was no license violation involved in this case; just a lack of awareness of the provenance of the open source software they were using.)
While I'm not the author of anything, I did on one occasion share Russ Cox's articles on regexes with a fellow developer, only for that developer to reply "that guy is making a mountain out of a molehill, just use re2".
For anybody who's lost: Russ Cox is the original author of re2, a fast C++ library implementing regular expressions that are guaranteed to run in linear time.
This is still relevant today, too. The last several JavaScript vulns that people at my company have had to upgrade around were caused by accidentally quadratic regexes in JavaScript. One poor library[1] was attempting to match a header whose grammar is the whoppingly complex,
1#token
(this is in the HTTP spec's notation: it means 1 or more `token`s, comma-separated, with optional whitespace around the comma) and hit the quadratic behavior by trying to split the incoming values with,
/ *, */
I was shocked that this wasn't compiled to a DFA. (I checked, too: my JS exhibited the behavior in the bug report.)
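To make the failure mode concrete, here's a rough sketch in Python (whose standard re module is also a backtracking engine, so it shows the same shape of behavior; the exact timings are purely illustrative):

    # Splitting on / *, */ goes quadratic on a backtracking engine when the
    # input is a long run of spaces with no comma: at every start position,
    # ' *' greedily eats the rest of the string and then backtracks one space
    # at a time looking for ',' -- O(n) work per position, O(n^2) overall.
    import re
    import time

    pattern = re.compile(r' *, *')

    for n in (10_000, 20_000, 40_000):
        evil = ' ' * n  # pathological header value: all spaces, no comma
        start = time.perf_counter()
        pattern.split(evil)
        print(f"n={n:>6}  {time.perf_counter() - start:.3f}s")
        # the time should roughly quadruple each time n doubles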
This is, I also think, another reason why "simple" text protocols are not really so simple. The grammar above is "trivial", and yet, this is the end result. I don't feel like the library is particularly at fault: I doubt I would have caught this in code review.
>> "that guy is making a mountain out of a molehill, just use re2"
That's an odd thing about the tech world: it's accessible. As you get better in different areas, you are actually more and more likely to make contact with important people (big names? people who did important stuff?). This can creep up on you if you're not aware of what level you're operating at. It can be a small world.
Also, people who say so-and-so company (usually Google) is hard to contact for support, or that they require expensive support contracts before they'll talk to you, have likely never tried sending email to the appropriate mailing list for the product.
It's amazing how often doing this completely bypasses any corporate first-line-support structure in the way, and just puts the email right into the inbox of the line engineers working directly on the product. It's also amazing how quickly those line engineers reply. (It's as if they treat "replying to random messages on the product mailing list" as their highest-priority job. Or maybe it's just that they're technical people, and my questions are usually very nerd-snipe-y, and get them hooked.)
It's a great concept in theory, but in practice... find me the mailing list for Google Photos. Or Google Keep. These are two Google products that I use daily (including paying for one!)
Well, yeah, there does have to be a public mailing list.
My point was that there are often public mailing lists, where engineers with real engineering problems could discuss those problems with the engineers responsible for the product/service; and yet the engineer with the problem nevertheless doesn't even think of using the mailing list to reach out, but instead decides to go through regular customer-service support channels to get their problem solved.
I've also gotten serious problems (as in, the service we're paying real money for is totally broken) solved by contacting a friend who worked at $BIG_COMPANY and having the friend escalate internally.
The point is that I shouldn't have to bypass the official channels that way. These organisations are operating at the level of ad-hoc individual heroics, which is the lowest tier in terms of organisational maturity. In a start-up where everyone has to do everything and no-one has worked anything out yet, that's completely understandable. In a many-billions-of-dollars business with enough influence that someone's quality of life or the viability of some other business could be profoundly affected if the giant screwed up, we should be demanding better by now.
What I've seen so far is that companies in general don't get better at this as they grow. They add process on top of process, and each level makes tackling 90 percent of support requests more efficient (for them), but the more difficult requests just don't make it to the person who could help.
That is certainly one pattern we see repeating, but I don't think it's what is happening with many of the big tech companies. They're opting for the alternative where tackling 90% of support requests is highly efficient because those requests are simply routed to /dev/null.
In one sense, it's hard to blame them. After all, if no-one who matters to their revenue stream is actually going to change behaviour because of that dismissive policy, it saves them all the overheads of providing useful support and costs them practically nothing. It's just good business, right?
What is strange is how they've got away with it for so long and most people still don't seem to be switching to alternatives, even as the tech giants casually squash them without even noticing. At some point around here, the words "competition" and "regulation" enter the room.
>Well, yeah, there does have to be a public mailing list.
Imo, this is not scalable or sustainable, and mailing lists are not a replacement for adequate customer support.
The only reason sending emails directly to mailing lists for specific Google products works is precisely because those mailing lists are not public and not flooded with bajillions of emails from the general public. So those who send the emails are already somewhat pre-screened in a way, because if you know that mailing list email address in the first place, you are very unlikely to send something like "my cousin couldn't remember password to their google photos account, can you fix this please". That's why everything there ends up being read and addressed. If those mailing lists were public, then they would be just as useless and ineffective as the current customer support routes currently are for Google.
TL;DR: mailing lists for specific products are a nifty workaround for the time being, but they aren't a good, sustainable solution for shitty customer support. Making those mailing lists public would not only fail to solve the problem, it would make them just as ineffective as the current customer support routes. There is no "one weird trick" to solve the customer support adequacy issues with Google; it has to be an actual customer support solution, one that won't be easy and will take time.
I'm confused about what you mean about "public." I'm just a regular guy with no connections to Google, other than being a GCP customer. I found the mailing list addresses for each GCP service listed directly in the support documentation. Literally anyone who has GCP problems would end up finding those addresses, if 1. they clicked on the "help" button and went through the workflow presented, and 2. didn't first pay for extended white-glove support and then immediately reach for it for any problem that came up.
By my thinking, that's a "public" mailing list. They're not hiding it from you. The opposite, really — they're trying to get everyone to know and use it, by making it free to any GCP customer, while the actual CSR kind of support requires paying for a subscription to a higher support tier. The mailing list, presented in Google Groups format, is literally what GCP calls their "support forum." It's supposed to take on all comers, including dumb customer asks.
I think the disconnect here is that Google engineers are much more likely to answer the low volume of technical "nerd-snipey" questions from other developers than the high volume of non-technical questions they'd get from the general public for something like Gmail.
Lately, I've seen a few open source projects go this way: GitHub issues; that gets unwieldy, so they create a Discord; that descends into chaos, so they nominate a community volunteer; issues are then filtered by that volunteer's preferences; etc.
I like Discord, especially in the early days, when you can reach out to the principal dev, etc. But it soon seems like they either disappear to get work done (good) or spend all their time on it (bad). Either way you end up with chaos.
Our company has a Discord, but we employ a professional Community Manager for it. When community members bring issues up:
1. they're encouraged to do so in public, so that other community members can help if possible, and/or so bots can reply with suggested FAQ answers;
2. the Community Manager will answer with the company line for questions the company has set answers to (e.g. "when are you releasing X?" or "why is [abusive DoS-like pattern of requests to your service] not working?");
3. otherwise, if the Community Manager knows the answer for sure off the top of their head, they'll give the answer;
4. and if not, the Community Manager relays the question to an engineer in our Slack, where we either have an answer off the top of our heads, or we file it as an issue.
Seems to work just fine for us so far.
Some of the engineers are also sometimes in the Discord (and we're all registered to it), but other than the Community Manager, it's not our job to be in there.
I've more than once had one of the core developers of Elixir or Phoenix answer a question almost right after asking it in the Slack or IRC channel. I often felt a bit embarrassed to take up their time considering how 'basic' these questions were.
I've had similar experiences in other language/framework communities. It's amazing how helpful some of these very productive people can be to random chat visitors :)!
The trick is to remember that they're almost certainly working on that stuff because they enjoy making users happy, so they're also doing support for the same reason.
I do feel a little bit embarrassed if it turns out they're reading the docs to me, but I feel embarrassed about that whether it's an expert or a fellow n00b ;)
It depends on your issue. We got good support emailing with the TF Lite team on a neural net bug. I think if you’re interacting with open source in a value-add way, Google support is often quite good. If you’re looking for support for integrating for sales or classic customer support it can be terrible to non-existent.
> If you’re looking for support for integrating for sales or classic customer support it can be terrible to non-existent.
> Maybe it's just that they're technical people, and my questions are usually very nerd-snipe-y, and get them hooked.
Integrating sales or classic customer support is boring.
I mean, I get that it pays the bills, but when I've got a million priorities, boring work that I don't really get credit for goes to the bottom of the pile.
no. there is a big difference between having support and having someone that is passionate about something helping you out.
support should be there and should be available for everything from the simplest issues to the most complicated things about A PRODUCT.
you will not get much traction if you ask those same things of the expert people on a mailing list.
I guess I've never needed "support" in that sense.
I almost always solve problems with the products/services we use myself — up to and including forking the vendor's codebase to fix their shit for them — because it's almost always the fastest way to do things. I've already been working with their product for a while, and I already know exactly what my own problem is. Provided I also know the language their code is written in, that translates to being able to code a patch myself, faster than I can get someone on their end to comprehend the problem I'm having.
That applies up until the point where there's a problem surface that's just plain inaccessible to me (i.e. the inside of a proprietary mobile app or SaaS service), at which point I have to reach out to tell them that it's broken / missing something on their end. (And even then, if I have a spare hour and access to the offending binary, I'll reverse-engineer it a bit to see if I can hotpatch it while waiting for them to get back to me.)
I suppose, for people who don't think this way, there can be value in "support." But IMHO there's more value in just hiring some DevOps engineers who do think that way. Then all the easy "support" requests get handled in-house, and so you'll only ever need the kind of "support" that involves direct bug reports to the engineers from the vendor who built the thing.
you are by definition a power user. if your product is for power users that’s fine.
if your product is targeted to everyone but only power users can figure it out when there is an issue... well you have a problem.
also, being able to figure something out != you should figure it out. your time is limited and the complexity of remembering all those things that you figured out (even if you have the time) will quickly overwhelm you. unless it’s literally your job to support the product you should care about the interface of the product and what guarantees it makes
re: hiring devops engineers. i’m sorry, what? if my email suddenly does not work I’m supposed to hire a devops engineer now?
I mean, if I know that some service is using e.g. Redis under the covers, and the problem is in Redis itself, then submitting a patch upstream to Redis; waiting for it to get upstreamed; and then telling the cloud host to update their Redis version to solve the problem — is usually a pretty reliable path.
But otherwise, like I said, that's when "the problem surface is inaccessible."
I read about a similar example this week. Some news orgs filed FOIA requests for Dr. Anthony Fauci's email and I was surprised at how many regular people just emailed him and got a response.
Apparently the guy answers about 1000 emails per day.
Google interviewing lore is that there once was a candidate who was asked if they were familiar with MapReduce and replied "MapReduce? Is that like Hadoop?"
Reportedly, this was also a major factor in Google's strategy shift to open-source a lot of their infrastructure (gRPC, Bazel, TensorFlow, LevelDB, etc.).
Like the featured article, HN threads - then and now - are pretty low-key with regard to participant introductions. Usernames are just text, with no flair or qualifiers, so the focus is on the speaker's content itself. `libria` is free to converse with `cperciva` as an equal peer, which is nice, because in most other IRL forums I'd be acutely aware that's not the case ;)
I like the part where someone calls cperciva's idea bad, he says what's bad about it, and then the founder of Dropbox replies saying they're just starting and it sounds like they're in the same space.
I don't agree with that phrasing. A lot of people on that thread seemed sure cperciva was some arrogant dickhead bound for failure, but tarsnap is going a lot stronger than many of them are. Also people in the thread were amazingly rude to him, while he seemed pretty polite to me.
I sent an email to a corporate Steam email address, asking whether I’d be allowed to post a screenshot from Half-Life in a computer graphics shading thesis. I ended up with a response from Gabe Newell, CEO, shortly after, and from another engineer who invented skeletal animation, all excited to talk about it with some random kid.
I first encountered bsdiff when working on reverse engineering Blizzard MPQs (their proprietary packaging format, long abandoned now in favour of other stuff). They started using it for mpq patches some time in … 2007 I want to say? I was 16, had just started to code.
10-ish years later, I'm doing other stuff but still working with Blizzard-related tooling sometimes. I was talking to a Battle.net engineer about their latest-and-greatest game update protocols. He tells me they're thinking of adopting this great thing called bsdiff for the next version. I giggled a bit.
I don't understand what bsdiff does, or is. I am a software developer and I frankly have no clue what I would ever use bsdiff for! I've read what it does (libraries for building and applying patches to binary files) and still don't really have a sense for what the purpose of this tool is.
What are some real life use cases for it? When does a developer need such a tool?
Implementing software updates where you don't want to ship the entire binary again, only the diff, would be one. In some video games the assets are also packed into massive binaries, so you don't want to ship gigabytes of data because you replaced one icon. Sadly, many games do this anyway nowadays.
There are other solutions to this problem that the game industry uses. Binary diff patching is slow, incremental, involves large diffs and has the possibility of corruption. It was used back in the mid 90's (RTPatch was the big name), but really isn't used anymore because of the drawbacks.
Games frequently use an override directory or file. The patch contains only the files that have changed and is loaded after the main index and replaces the entries in the index with the updated ones. This is the most common way of doing a patch if it's not just overwriting the original files.
Some games load their file as a virtual filesystem and then the patch just replaces the entries in the virtual store with new ones. Guild Wars 2 works this way. This is only common in MMOs though.
Games also use this because it's a straightforward way to almost guarantee a physical ordering of the files in the VFS, which is/was a common optimization strategy in the days of CDs and hard drives (profile what order the game needs files, then put them in the archive in exactly that order = tada, loading 4000 files behaves like a sequential read).
Another reason is that certain operating systems originating in the state of Washington have performance problems when you access small files or directories containing many files.
Circa 1995: we (my company) used RTPatch, as the users at the time were floppy-based via post but enjoyed BBSes (and Prodigy, etc.), since the nature of the software/industry made it a social community. We could upload small RTPatch-based updates and bugfixes to our tiny company BBS, and users could dial in and download a patch a lot faster than waiting for a floppy in the mail (besides avoiding the usual corrupt floppies that plagued the tech).
Fun fact: this is not a new concept. Doom used overrides with its WAD file format. Mod authors could release their mod files, replacing or adding content, without stealing the game's level files.
There may be prior art to that, but as a young coder that was the first time I’d seen it
Yes, overrides. I've heard talks on this at conferences with a couple big publishers. A lot of effort is put into it but obviously if we were distributing OS security updates like Apple it would be a whole different ballgame.
To my knowledge, most developers have gone back to binary patching for obfuscation purposes. Bethesda does this now (and ID, by extension), as well as many other developers I've seen.
For at least the Nintendo Switch (not sure about other modern consoles), the digital distribution infrastructure is built in terms of overlay packfiles. Games, updates, and DLC on disk are all single-file archives / filesystem images. The OS, when launching a game, mounts the game + its updates + its DLC together as a transparent overlay filesystem. The game just sees a unified representation of its newest version, with whatever DLC has been installed, sitting under (IIRC) /title.
I wouldn't be surprised if the other consoles also do things this way. It's a very sensible way to manage updates — especially when a game is running off of physical media but the updates are held in local storage. It also means there's no point where the update gets "merged in" to the base image, which means updates can be an atomic thing — either you have the whole update file downloaded + sig-checked (and thus it gets added to the overlay-list at boot) or you don't.
And, if all the consoles are doing it, I wouldn't be surprised if studios that do a lot of work on console don't just use that update strategy even on PC, for uniformity of QA, rather than for "obfuscation."
Games are directories/packfiles containing many individual files, mostly binary art assets, plus one executable that takes up a negligible proportion of the total size. When binary art assets in the directory/packfile are updated between versions, they don't really "change" in the sense that a source-code file might be changed in a git commit; instead, they get replaced. (I.e. every file change is essentially a 100% change.)
The "binary diff patching" you're talking about the game industry using, was just the result of xor-ing the old and new packfiles, and then RLE-encoding the result (so areas that were "the same" were then represented by an RLE symbol saying "run of zeros, length N"). For the particular choices being made, this is indeed much less bandwidth-efficient than just sending a new packfile containing the new assets, and then overlay-mounting the new packfile over the old packfile.
bsdiff isn't for directories full of files that get 100% rewritten on update. (There's already a pretty good solution to that — tar's differential archives, esp. as automated by a program like http://tardiff.sourceforge.net/tardiff-help.html .)
Instead, bsdiff is for updates to executable binaries themselves (think Chrome updates), or to disk images containing mostly executable binaries + library code (think OS sealed-base-image updates — like CoreOS; or, as mentioned above, macOS as of Catalina + APFS.)
In these cases, almost all the files that change, change partially rather than fully, often with very small changes. The patches can be much smaller if they're done on the level of, e.g., individual compiled functions that have changed within a library, rather than on the level of the entire library. (Also, more modern algorithms than xor+RLE can be used — and bsdiff uses them — but even xor+RLE would be a win here, given the shape of the data.)
There's also Google's Courgette (https://www.chromium.org/developers/design-documents/softwar...), which goes further in optimizing for this specific problem domain (diffing executable binaries) by having the diff tool understand the structure/format of executables well enough to be able to create efficient patches for when functions are inserted, deleted, moved around, or updated such that their emitted code changes size — in other words, at times when the object code gets rearranged and jumps/pointers must be updated.
The goal of tools like bsdiff or Courgette isn't to reduce an update from 1GB to 200MB for ~10k customers. The goal is to reduce an update from 10MB to 50KB for 100 million customers. At those scales, you really don't want to be sending even a 10MB file if you can at all help it. The server time required to crunch out the patch is more than paid off by your peering-bandwidth savings.
XOR+RLE is almost useless for binaries, because almost any change will cause instructions to be added or deleted, offsetting the entire binary after the first change, so the xor never lines back up. On top of that, these changes cause changes in addresses in the first part of the binary, so you end up with a zillion similar-looking xor deltas in the first part of the file that won't compress well with RLE.
In fact, if you use smarter compression than RLE, I wouldn't be surprised if the update was larger than the original binaries after the xor, as an offset xor will likely increase the chaos (entropy) in the file, making it compress worse than the original.
bsdiff was specifically designed to intelligently handle these situations, which is why it works.
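A quick way to convince yourself of this (a rough sketch; random bytes stand in for a compiled binary here, just to show the shift effect):

    # One inserted byte shifts everything after it, so the xor delta is
    # non-zero (and RLE-incompressible) for essentially the whole file.
    import os

    old = os.urandom(1 << 20)                  # stand-in for a compiled binary
    new = old[:100] + b"\x90" + old[100:]      # "insert one instruction" at offset 100

    xored = bytes(a ^ b for a, b in zip(old, new))
    changed = sum(1 for b in xored if b != 0)
    print(f"{changed / len(xored):.1%} of bytes differ after a 1-byte insertion")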
Just tested it on Chromium from my package server (90.0.4430.72 vs 90.0.4430.212):
Pedantic point: it's not "almost useless for binaries." It's almost useless for compiled, PIC binaries in modern executable formats like PE or ELF that allow for lots of load-time address-space rearrangement.
XOR-and-RLE works well for binaries from non-HLL languages (assembler, mostly) where — due mostly to early assemblers' lack of support for forward-referencing subroutine labels from the data section — subroutines tend to ossify into having defined address-space positions.
You can observe this in the fact that IPS-patchfile representations (which, while produced by a different algorithm, are basically equivalent to XOR-and-RLE in their results) of the deltas between different versions/releases of old game ROMs written in assembly are actually rather small relative to the sizes of the ROM images themselves. The v1.1 ROMs are almost always byte-for-byte identical in ROM-image layout to the v1.0 versions, except for where (presumably) explicit changes were made in the assembler source code. Translated releases are the same (sometimes, but not always, because they were actually done by the localization team bit-twiddling the original ROM, since they didn't have access to the original team's assembly code).
(This is also why archives that contain all the various versions/releases of a given game ROM, are highly compressible using generic compressors like LZMA.)
bsdiff is pretty similar to RTPatch, which is what the game industry used in the past. I'm unaware of what you're describing ever being used in practice, especially among large game houses.
That said, patches aren't really downloaded as standalone patches anymore because of Steam distribution. The way Steam handles it is documented, and if you're interested, it's available here: https://partner.steamgames.com/doc/sdk/uploading#Building_Ef...
But as an overview, Steam splits files into 1MB chunks and only downloads the 1MB chunks that have changed. The 1MB chunks are compressed in transit. Steam also dedups the 1MB chunks. I would assume that this works fine to manage the tradeoffs between size and efficiency.
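In the same spirit, here's a sketch of the general idea (not Steam's actual manifest format): split files into fixed-size chunks, identify each chunk by its hash, and only fetch the chunks the client doesn't already have.

    import hashlib

    CHUNK = 1 << 20  # 1 MiB

    def chunk_digests(path: str) -> list[str]:
        digests = []
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK):
                digests.append(hashlib.sha1(chunk).hexdigest())
        return digests

    def chunks_to_fetch(have: list[str], want: list[str]) -> set[str]:
        # chunks already present locally (in any position) are deduped away
        return set(want) - set(have)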
One of the reasons games do this is that the data is compressed, so a "patch" might be indistinguishable from a real update.
Also, as a dev, you have no idea what version your users are updating _from_. You need to generate some number of patches, one for every version users could be updating from, and figure out whether in any of those cases you should just have them download the whole thing again anyway.
The simplest option is to generate patches for recent versions, where "recent" can be years in the past. It is a linear operation, but you only run it on release, so it probably isn't a huge cost. You can also use some heuristics, such as: if the diff is >20% of the file, just stop and force users still on that version to do a full update.
A second option is using zsync[1]. zsync is basically a precomputed rolling checksum. The client can download this manifest and they download just the parts of the file they need. This way you don't care about the source, if there is any similarity they can save resources.
And of course these can be combined. Generate exact deltas for recent versions and a zsync manifest for fallback.
Side note: One nice thing about zsync is that the actual download happens from the original file using range requests. This is nice for caching as a proxy only needs to cache the new data once. Is there a diff tool that generates a similar manifest for exact diffs? So instead of storing the new data in the delta file it just references ranges of the new file.
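For reference, the core trick behind that precomputed manifest is a rolling weak checksum, so the client can cheaply test every byte offset of its local file against the manifest's block sums. A toy version of the rsync/zsync-style sums (a real implementation pairs this with a strong hash per block to rule out collisions):

    MOD = 65521

    def weak_sum(block: bytes) -> tuple[int, int]:
        a = sum(block) % MOD
        b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
        return a, b

    def roll(a: int, b: int, out_byte: int, in_byte: int, blocklen: int):
        # slide the window one byte without re-reading the whole block
        a = (a - out_byte + in_byte) % MOD
        b = (b - blocklen * out_byte + a) % MOD
        return a, b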
We usually don’t compress the data on disk; decompression would make loading and file access slower.
Instead, we just pack the uncompressed files together (frequently using normal zip in a no-compression mode) so that we can avoid needing to ask the OS to open and close files for us or examine the contents of a directory, both of which can be kind of startlingly slow (by video game standards) on some common OSes. We will generally cache the directory data from the zip file and just use that rather than going to disk.
(of course, the whole download/patch would all be compressed for network transfer, but files would then be decompressed during the installation process)
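A minimal sketch of that "zip as an uncompressed packfile" idea (Python's zipfile here, purely for illustration; real engines use their own pack formats): store members with ZIP_STORED, parse the central directory once, and serve all asset reads from the single open archive.

    import zipfile

    def build_pack(pack_path: str, assets: dict[str, bytes]) -> None:
        with zipfile.ZipFile(pack_path, "w", compression=zipfile.ZIP_STORED) as z:
            for name, data in assets.items():
                z.writestr(name, data)       # packed, not compressed

    class Pack:
        def __init__(self, pack_path: str):
            # the archive's directory is read once and kept in memory
            self._zip = zipfile.ZipFile(pack_path)

        def read(self, name: str) -> bytes:
            return self._zip.read(name)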
On the Switch the reads are so slow that the fastest loading requires at least mild compression, despite the weak CPU. At least it did when I was testing packaging for my latest Switch release.
The PS4 also did the compressed-packages-by-default thing, if I remember right. The upside there being ample CPU for decompression, such that leaving data uncompressed was never the fastest option.
I could see that keeping one big file would still be advantageous in that environment, too. An fopen on each of a set of small files, plus read, plus close, over and over, does add up in CPU time and memory slack, whereas treating it all as one giant packed backing store has speed advantages, at a cost of dev time. Even if you are compressing it as well it could be an advantage, but I would expect there to be some spot where it crosses over from being an advantage to a disadvantage; I suspect that would be with small files/objects. And that would just be for speed: you may have to optimize for size in some cases. In the end it is usually better to build some test code and find out. Sometimes the results are what you expect; sometimes you are spending time on things that do not matter anymore but were a big deal 20 years ago.
Oh, definitely! I haven’t worked on anything targeting a console past the PS3 generation, and it completely slipped my mind that the latest-gen consoles are architected specifically for streaming compressed data.
On the Windows/Mac/Linux title I’m working on now, I definitely measure a sizeable improvement to performance when loading from an uncompressed zip rather than from a compressed one. But even that could be down to the particular set of libraries I’m using to handle it.
> We usually don’t compress the data on disk; decompression would make loading and file access slower.
Did you actually benchmark this? It probably makes sense in your head, but on any vaguely modern hardware it's very unlikely to actually be true because of how exponential the memory hierarchy is.
Console hardware tends to have fast processors & cache but extremely slow RAM. Benchmarking a console's memory vs cache access tends to be one of the first things a team of principal game devs do and that information becomes bible for their titles.
IIRC in a bunch of scenarios compression makes loading and file access faster, as you're I/O limited and it's quicker to read less data and decompress. You do need to choose simple/quick/not-that-much-compressing compression algorithms for that.
I have worked on a number of embedded products which ran from a compressed root file system on eMMC. The overhead was a wash, because RAM is so much faster than eMMC: what you spent in decompression time was covered by reduced eMMC access time.
If you know a user is on version 3 and need to update to version 5, then why not just send out all the patches between 3 and 5? Why do you need to generate a new patch for each pair of versions.
It feels a bit egregious when I have to download a 100MB update just because a few characters were buffed or nerfed. More involved changes end up being over 1GB.
Because it's not just version 3 to version 5, it's version 3 to version 84.
Not all versions are made equal either - one might be a character buff, another might reorder assets in the "big huge binary blob file" for performance improvements. At a certain point, rather than downloading 30MB per update for 25 versions and applying each incrementally (remember that you have to do them in order too), just download the full 1GB once and overwrite the whole thing.
Microsoft made sure in Windows 10 that it's almost unusable without an SSD, so your big binary blob file has random r/w access anyway.
Most backup software has been able to do good binary deltas of arbitrary data for decades. Even dumb checkpointing resolves the problem of downloading 25 versions - you download the latest checkpoint and the deltas from there.
Don't excuse poor design and programming: when you know a file structure, creating a differential update should be a short task. With a tiny bit of algorithmic knowledge you could even optimize the process to only download the needed assets inside of your big binary blob - if an asset was changed 7 times during the last 25 versions, you only need to download the last one.
I'd personally like to see a company put a little thought into innovating how they store data on disk so patches can be quickly applied like with git while also not requiring a full source recompilation.
It can get worse - some cheap and badly designed Android phones download updates for every month from when you first buy the phone until the current month, so maybe 10+ updates, and they aren't deltas (diffs) but full images. Ridiculous on so many levels.
It’s because they only tested updates from one version to the next, and not every version to every newer version.
It is a complete image, but phones today have nontrivial state that may be a problem - e.g. your baseband processor might have its own rom with its own update protocol, which changed between image 2 and image 7, so image 10 after image 1 will be unable to update the baseband.
If it's a cheap phone, I'd rather they do something brute-force but reliable than try to be clever when they know they don't have the budget to QA it.
I honestly consider that a pretty reasonable trade-off.
> One of the reasons games do this is the data is compressed, so a "patch" might be indistinguishable from a real update.
Does this happen with more advanced compression algorithms? I've rsynced zip files of different versions of internal software and the diff was always much, much smaller than the entire package.
Zip files have all the metadata in a footer rather than a header. As a result, compressed files can be added and overwritten by appending to the file without disturbing already compressed data. Additionally, the "deflate" compression likely does not span across files, so files that did not change from version to version would have a similar compressed byte sequence, regardless of the order they were added to the archive.
I'd argue that zip is a relatively simple compressed archive format. Its simplicity is its charm and the reason it's so popular. More space-efficient algorithms would be less likely to be "patchable", as there would be less redundancy / structure in the compressed representation to exploit (the best compression would have properties similar to random data).
> Additionally, the "deflate" compression likely does not span across files
Clarification: .zip (unlike .tar.gz, for example, or "solid" .7z) compresses each file separately; that has nothing to do with the compression algorithm used. In addition, DEFLATE, the LZ77-based compression which is by far the most commonly used in .zip (and also by gzip), has a window size of 32kB (uncompressed). So yes, even if you used DEFLATE on a solid stream (e.g. zipped a .tar archive), it couldn't remove any cross-file redundancy once it has gone past the first 32kB of each file.
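You can see the 32kB window limit directly with a small experiment (zlib here as a stand-in for .zip's DEFLATE; exact sizes will vary):

    import os, zlib

    for size in (16 * 1024, 64 * 1024):
        blob = os.urandom(size)               # incompressible on its own
        two_streams = len(zlib.compress(blob)) + len(zlib.compress(blob))
        one_stream = len(zlib.compress(blob + blob))
        print(size, two_streams, one_stream)
    # For the 16kB blob, the single stream is roughly half the size: the second
    # copy sits inside the 32kB window and becomes back-references. For the
    # 64kB blob, the duplicate is out of reach and barely helps at all.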
On the other hand, the High Voltage SID Collection (HVSC) distributes a zipped zip:
Each file is 1k-20k, and there are 40,000 or so of them. But they are catalogued in directories 3-4 levels deep, so if you just zip them, the metadata takes up 30% or so of the zip.
But the metadata does compress very well, so they zip it again.
zstd strikes a nice balance here. It can inject markers in the bytestream to make it "rsync friendly", but one could just as well say "binary diff friendly".
zstd itself also has the (pretty new) ability to use a shared file as shorthand during compression. What that means in practice is that diffs can be REALLY tiny if you have the previous archive download.
In general yes. After the first difference the compressed streams will be basically random compared to each other. However there are numerous things that may avoid this.
For zip files each individual file is compressed independently. So unchanged files and prefixes don't need to be resent, even if once a file changes the entire tail end of it needs to be resent.
Sometimes compression algorithms "reset" periodically, for example the `gzip --rsyncable` patch. This basically resets the compression stream so that a change will only affect part of the compressed file. This does have a cost in terms of compressed size, because the compressor can't deduplicate across resets. However, if the resets are infrequent you can maintain fairly good delta transfer with little space overhead.
Additionally some delta transfer tools detect common compression and decompress the file "in transfer", performing the delta checks on the original file.
Kind of an aside from your question, but binary patches not being much smaller than the full thing might happen more with modern games?
Hearsay, but from what I've heard, modern games may ship multiple copies of some assets with different levels or features so they can be loaded as a sequential read off the disk. While a block-oriented compression algorithm might sync up more reliably, if you're packing 200MB of assets for a level and they're all compressed as one stream to take advantage of the fact that they'll be read sequentially, a change 25MB in could still mean shipping ~175MB of changes.
So far, in most of the world, even platter disks (which have really poor performance with modern Windows) are faster than the network. That means you can download a description of the difference and reorder the file locally much faster than downloading it. Yes, it needs the file to be made in a way that is update-friendly - most current compression algorithms can be configured like that. Yes, compression will be slightly lower, but you will save on both download size AND disk space, because right now most patches require you to download the patch and then apply it, requiring twice the space. If you have a 5% larger asset file but can patch it with a few memcpy calls on a mapped file, it is a win in every way imaginable.
It is just really poor programming, nothing more. And it's everywhere. If `find source >/dev/null` takes 6 seconds, there is no reason for Gradle to take 2 minutes on a rebuild. If the dev is used to that, why would they even think about patch optimisation?
Windows game devs just traditionally didn't treat things with such granularity as you might find in a *nix environment where every little thing is a file. Content is then managed as larger blobs and you have a database of offsets or mappings.
It is very unlikely that it was "continuous" compression anyway. Continuous (solid) archives give up the random-file-access property, and games require exactly that for assets. You can’t decompress a few GB on average just to fetch a sprite of a cat.
The reason games (and software in general) do full downloads instead of binary patches is purely overdefensive and/or stupid. Store software could just check checksums after a patch and re-download only if they fail.
Speaking from experience, AAA games have quite a bit of architecture behind them that can date back a decade or two. So you end up with some tradeoffs. The code may be well-tuned, resource efficient, and mostly crash proof, but some elements can be a bit dated relative to the size and scale of modern assets.
Ahh, thank you for the explanation. That's an awesome tool!
I actually could really see using it, now that I understand what it does.
I've worked on some firmware projects where we did OTA updates and were guilty of shipping the entire binary. Luckily, even the entire binary was rather small, but still it would have been very cool to be able to create a diff and ship only the diff!
The really cool TL;DR here is that Courgette "disassembles" the binary before diffing, basically turning internal references into symbolic references. This way, adding an extra instruction to a function won't affect all of the relative addresses in the surrounding code.
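A toy illustration of why that helps (not Courgette's actual representation, just the raw-offset vs. symbolic-label contrast):

    import difflib

    def edits(a, b):
        sm = difflib.SequenceMatcher(a=a, b=b)
        return sum(1 for tag, *_ in sm.get_opcodes() if tag != "equal")

    # Raw encoding: the jump target is a baked-in relative offset, so inserting
    # one instruction also forces the offset itself to be rewritten.
    old_raw = ["nop", "jmp +3", "nop", "nop", "ret"]
    new_raw = ["nop", "jmp +4", "nop", "nop", "nop", "ret"]

    # Symbolic encoding: the jump refers to a label, so only the insertion shows up.
    old_sym = ["nop", "jmp L_ret", "nop", "nop", "L_ret: ret"]
    new_sym = ["nop", "jmp L_ret", "nop", "nop", "nop", "L_ret: ret"]

    print(edits(old_raw, new_raw), edits(old_sym, new_sym))  # 2 edits vs. 1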
If you have a game where you have large packs of files (like a 2GB textures.pak), and in a patch you want to add 2 small things, you can just ship the difference between the old and new pack file, and skip transferring the rest.
They do this, but not as a binary diff. The new files are just shipped as an override of the old files, loaded after them. It's substantially faster than applying a binary diff, and file sizes really aren't a concern for the most part.
The game industry really hasn't used binary diffs and patching since the 90s when they used RTPatch.
> What are some real life use cases for it? When does a developer need such a tool?
Incrementally updating binary files, e.g. assets or executables, instead of having to re-send the entire thing on every update: think games, browsers, package managers, …
The standard diff/patch utilities are text-based and even when they do extend to binary data their algorithms and heuristics tend to be biased towards textual contents.
Bsdiff was built specifically with an eye towards executables.
Years ago, I interviewed a candidate for a role on my team. As usual, one of the ways I break the ice with candidates is to get them to talk "war stories".
The team he'd worked on had produced a tool that was only ever intended to be used by the team to solve a particular problem they had. It contained proprietary code.
Unknown to the team, word had spread about the tool, and others had started to use it, including solutions architects, who started shipping it to customers, who absolutely loved it.
That'd be fine except one of the core libraries it used was GPLv3 licensed, and there was non-open source proprietary code used in the tool.
The nightmare scenario he found himself in was having to rapidly re-architect the tool around a non-GPLv3-licensed library, without breaking any functionality, all the while having regular sync-up meetings with a furious CEO and Legal department (who, to be clear, were mad about the situation, not at this particular developer or his team, who weren't to blame).
I worked in support and came up with a quick script to check for a very specific issue. It was super simple and really just applied to one customer to find one bug they encountered. It really didn't do much more than look through a bunch of counters and a bunch of if statements ...
Next thing I knew someone had copied it and started running as a rule on every data set / customer they could get their hands on, and of course it was false positive city.
Finally, after lots of emails where I would just type "Don't use that script, it doesn't work," some engineer wandered up the stairs to support to talk to me.
They were fielding escalation after escalation for these false positives. Support would run the script, see some flags, and turn their brain off and escalate. Management was so scared of this bug / issue that they would do the same.
So he tells me to use a trick he used with a similar situation.
I announced a new and improved script and management militantly demanded everyone use it, and that all escalation using the 'outdated ' script would be rejected.
The new script just identified if it was the right customer for that script and set a bit if it wasn't. If that bit was set the engineer knew immediately they could ignore the output and would say that their analysis didn't find the problem in question and advised some basic troubleshooting next steps (copy and pasted mostly).
Support soon realized that just running that script wasn't getting them much more than a few minutes of breathing room away from the case. Management realized this too, saw all these next steps coming back, and the focus switched to 'hey, we should do these next steps all the time too'.
That was also one of the ways I started to understand how the engineering team worked and really helped start a good relationship with them.
> having to rapidly re-architect the tool around a non-GPLv3 licensed library
... or just go with it and have it be open source? The old version is already open and free for anyone to request the code of. No rush at that point, you can withhold updates for a little while while you rearchitect this or take the situation as it is and have the next few bugfix releases also fall under GPL until you get around to replacing the core component (iff one insists that the future additions must absolutely be proprietary).
Quickly removing the code doesn't change the previously released versions' license.
The license of the previous code didn't magically become the GPL altogether; instead, it by default became un-distributable. They were required to (a) stop distribution of the existing code, since it at best had no clear license, and (b) if they wanted, remedy the license going forward by clearly making it GPL or rewriting the dependency. Or even reach out to the library author and ask for an LGPL or other alternative - there is sometimes (often?) some flexibility there.
The built-in conflict resolution in the GPL is no-distribution.
Depends who had rights to the proprietary code. If it was someone else's (for example, if the internal version had used both GPLed code and some proprietary third-party library), open sourcing the whole thing just might not be possible.
> Quickly removing the code doesn't change the previously released versions' license.
But would $BIG_CORP publish source on request for a proprietary product just because they built one version with a GPL library by mistake and later fixed it? Has this been successful, ever?
Depends on the composition of the CORP. If it is composed of developers with strong respect for the OSS or FS community, they might not stand for non-disclosure. OTOH, plenty of orgs are made up mainly of folks without a strong opinion either way who would just follow executive direction.
How is that? The company added code to a GPL project; that means it is a derivative work and also comes with software freedoms, or at least that's how the story reads to me, since there is no mention of other claims or parties in the mix. That means the company owns the copyright to the added code and is free to comply with the contract (license).
> ... since there is no mention of other claims or parties to the mix [like company A]. That means the company [B] owns the copyright to the added code and is free to comply with the contract (license).
I'm also not sure whether the confidentiality clause would weigh heavier than a 'must provide source on request' clause, perhaps it could be resolved by not distributing the part that's covered by the confidentiality clause since another standalone library is clearly not a derivative work of the GPL-licensed library. Then only B has to distribute what they made for everyone's benefit.
> one of the core libraries it used was GPLv3 licensed
it's still untested in the real world whether GPLv3 "taints" derivative work that broadly.
Would they have had to open source the entire solution, or just the changes made to the library? Nobody knows, and there's a lot of FUD to promote BSD licenses instead of sitting down and properly defining the limits in a practical way.
My stories are very minor. I did my PhD in a reasonably well regarded mechanical engineering lab. My area is experimental fluid mechanics. I ended up writing a lot of Matlab code while there and even worked with a spin out company from the lab in the biotech sector for a while.
I'd get a lot of other students coming to me for coding help. Most just wanted me to do their job for them and I was too naive to say no. One wanted to count cells in a microfluidic device using image processing. I sat down with them for a couple of hours and walked them through a few methods they could look into to get started collecting all the examples in a script. Basic stuff so they wouldn't feel overwhelmed. A few months later I see he published my simple introduction as a paper with zero modifications. He had the good grace to at least thank me in the acknowledgments.
Several years later, while working at a $BIG_TECH lab, we interviewed a candidate from my old lab. They presented their work and had performed some data analysis of thermal camera images. It turns out they were using a script I'd written there, and it was still actively used to work with the thermal camera. Nobody ever modified or improved the code — many engineering students are terrified of code. I was annoyed because there was no interest in further developing the code; I don't think they even read it or understood it.
While at the spinout company I developed a software tool for the PCR optofluidics platform that was being developed. It was considerably faster and more robust than the hacked-together script they were using before, and it had a user-friendly UI that I built with feedback from the biologists on the team. A few years later the founder and one of their new students published a paper documenting their amazing tool without any reference or acknowledgment whatsoever. That one pissed me off.
There is a lot of ignorance around code authorship and respect for the developer in physical sciences research. Like I said, many are terrified by code but don't value the time and expertise it requires; once they have it they no longer think about its maintenance or acknowledging the author.
> I'd get a lot of other students coming to me for coding help. Most just wanted me to do their job for them and I was too naive to say no. One wanted to count cells in a microfluidic device using image processing. I sat down with them for a couple of hours and walked them through a few methods they could look into to get started collecting all the examples in a script. Basic stuff so they wouldn't feel overwhelmed. A few months later I see he published my simple introduction as a paper with zero modifications. He had the good grace to at least thank me in the acknowledgments.
That can't be the whole story, surely? You verbally suggested a couple of possibilities for what might work and wrote down a couple of lines of code, and then I imagine the student tried out all of those possibilities and reported what did and didn't work? I mean, what academic journal would want to publish half-working examples with unstudied properties?
Mind you, I agree about your broader point that (especially in academia) a lot of people don't really understand and respect code authorship.
IMO, academia is not better or worse -- often they do apply their citation culture to code.
The overall problem is that people tend to think only in terms of first order consequences. A little copying here and there might have minimal financial or reputational risk. But the second order consequences of that becoming the example and the norm for the junior ranks and the next generation causes larger organizational risk. So got to consider the bigger picture and nip the ethical lapses immediately.
I’ve checked the paper again and there is also some basic CFD analysis performed by a third author.
He had some sample images he had taken and I used them to demonstrate the basic code. The figures in the paper are the ones I generated in my sample code.
I had a friend who worked at Compaq on tools that did desktop imaging. There was one tool he used that showed a big splash screen naming the author when it started. Everyone was in awe of that guy, so he used the cachet of people knowing his name to move on to bigger and better roles. My friend decided to see if he could replicate that success, and when he wrote the next version of the tool he too had a big splash screen with his name up front. Sure enough it worked, and (for a certain subset of "everyone") everyone at Compaq knew his name, and he too was able to move on to what he considered a better role.
There were a couple of downsides; he got calls for years after asking for help with the tool, and one boss seemed envious which led to other issues.
I think it is very common for code to outlast its authors in academia for non-CS/EE fields, where programming is seen more as a means to an end rather than its own pursuit. For example, my wife in her Psych PhD inherited a giant hairball of Matlab code for interfacing with an eye tracker and scripting various experiments. She made her own modifications, and I helped too. The last we heard, the code is still being used. There are likely dozens of copies of it under various names, with sections commented out or added to the end.
I'm sorry to hear about you not getting credit; that's inexcusable, in the same way that not crediting the researchers who did the work in a paper or book (I have heard this story too often) is inexcusable.
Just remembered another recent one. A couple of years after I had left, my former employer closed the regional site. I saw the writing on the wall, I guess. A few colleagues from there have set up a new consultancy company; they’re going to build demos as a service, apparently. Several of the showcases on their website are my work.
A long time ago (early 2000s) I wrote a handful of tools to ease physical to virtual (P2V) migration for Windows Server. I was a systems administrator in the UK working for a large American firm.
I wrote the tools in my own time and unlike in other countries my employer had no ownership of them, just to clear that up at the start. I did use these tools at work but never developed them on company time or resources. They were released as open source.
Fast forward 6 months, and we had a meeting with a virtualisation consultancy trying to sell us some tools to assist in a wider P2V programme. We had a sales guy and a tech guy visit to show us their stuff. After half an hour of them talking up all they could offer, the tech guy fired up a tool and I instantly recognised it as my tool, but with some rebranding.
My manager looked at me slightly confused as he recognised it too. I let them continue for a few minutes to properly confirm my suspicions then mentioned that this tool was in fact a tool we already use. They were very confused until I loaded up my version on a remote system to show them.
Needless to say, the rest of the presentation was extremely awkward. I believe these two gentlemen genuinely thought these were their own tools, developed in-house. It turned out several of "their" tools were in fact rebranded versions of mine.
I would love to say there was some kind of exciting conclusion but in reality all that happened was they were clearly spooked by this as their/my tools were removed and never included again in their P2V toolkit.
I suspect the moment they left the meeting with us they called to report what happened and rather than risk me following up (not sure how I could do that to be perfectly honest, it was FOSS after all just not used the proper way) they decided it would be safer to just pull these non-critical tools. They were just "nice to haves" anyway.
My boss excitedly shared the experience with the team and we had a good laugh about it and about how the sales guy went from Mr Confident to stammering and stressed in a matter of seconds.
Recognition - yes, but money - why? It was open source.
This meeting does not sound at all awkward to me if I was the sales person. To me this sounds like how lots of companies work with open source to make a business.
They take some open source tools (maybe they built them themselves, maybe not), package them, and maybe sell some cloud service around them or simply just support.
If I was the sales person I would be delighted to meet the person who wrote a part of the package.
The only reason to be embarrassed is if it did not contain the correct attribution and recognition.
Edit: the fact that it happened in the early 2000s could have made it embarrassing, though. Many things have changed since then...
Yes, the tools were all open source. This was in 2004, and I just zipped up a compiled exe, the src directory, and a license.txt (GPL2) and put it on my website. Similar to NirSoft, which is what inspired me.
The reason behind it being awkward was for half an hour the sales guy had talked up how they were the only company developing tools like this, etc, etc. How the 'big players like VMware don't care about these pain points admins have to deal with' or words to that effect (which was true and why I made the tools in the first place).
Then the moment he finished the sales pitch and I see these 'one of a kind tools' they have been developing I respond with "That's my tool, see..."
I believe the sales guy, at least, believed all the tools he was talking about were made in-house. Open source wasn't a widely understood thing back then, with Microsoft talking about Linux and open source being a "cancer" and such. It really knocked him off course, as I guess he had never been in that situation before (I doubt many have?).
As for following up with the company: I did nothing. I was young, and while I knew what they did was wrong (they literally removed my name, the link to my website, etc. and put in their company name, but changed no functionality of the tools as far as I could see, and certainly no source code was available!), I didn't have the confidence (or desire, tbh) to chase up on some little tools I made to learn and to make my life a little easier at work. The company disappeared (I don't know why) sometime around 2010, iirc.
Honestly, it sounds like they might have been willing to purchase a license from you to redistribute the code. Had you reached out, you may have received a few bucks and proper attribution for your work (and your work would have reached a broader audience).
Possibly. Of course they could have contacted me first rather than modifying my program to pass it off as theirs.
I mean it was GPL2 so they could have just used it as is.
Of course this was the early 2000s where many people saw "open source" and felt like it meant they could just do whatever they want. I bet they never entertained the idea I would ever find out let alone be sitting in a sales pitch :)
> I've learned ... instead to simply say "I have a lot of experience with that technology" and leave it at that.
Shows Brendan's maturity.
I am not sure what should be the appropriate reaction or corrective measure in these situations. We should talk more about handling these unfair situations.
Someone else can become more successful building on top of one's open source project. On a resume, a top contributor and a minor contributor to an open source project might carry the same weight depending on how it's presented, making the situation unfair for a person dedicatedly working on a single project (quality) versus a minor contributor to multiple projects (quantity).
But deleting names and credits is wrong. An acknowledgement from the benefitting person (if not the recognition/reward) has a far more positive impact on a career than justifying to others that your work was stolen.
It was a bit strange to read some of the initial negative comments. I see Brendan being a sport. I would argue that reading the story as a report against unknown persons at Sun makes more sense. I don't see much sense in blaming the victim. And, in my opinion, the VIP had a good run, but he isn't the bad guy here.
Thanks. There was a time when many observability products were adding latency heat maps, and at one conference expo floor there were three companies with latency heat maps on their screen at the same time, pitching them as a flagship feature. If I walked near them they'd start trying to explain them to me, and I never figured out an appropriate response. If I said "hey, great to see you added them, I invented these back at Sun" I'd get funny looks.
I think it's a small world, and everything is software, so the chance you'll bump into someone who wrote software you are using I think is pretty high. I was once trying to get my head around Andi Kleen's pmu-tools, and I had the github repo open in my browser on my laptop I was carrying, when the guy sitting next to me on a bus says he's Andi Kleen. (Ok, it was a bus taking Linux conference attendees to an event, not a random bus, but I still found it remarkable timing -- I was studying pmu-tools at that exact time!)
Still, it must be quite rewarding to know that everyone, no matter how big, is using your tools. Before I knew anything about open source, I was somewhat surprised to see that even the giant that is Apple had open source licenses on their iPod. I assumed that Apple had enough resources to develop all their own software, but no, they, just like everyone else, pick off-the-shelf software.
Thank you for sharing some of what you've learned with everyone in everything that you've published. I've been reading the latest edition of your systems performance book the past few weeks and it is amazing. Your work is pretty awe-inspiring.
> If I said "hey, great to see you added them, I invented these back at Sun" I'd get funny looks.
I don't understand. What kind of funny looks were they? Disbelief? Distrust? Fear of your mental health? Realization of having been lied to by their bosses (oops it wasn't really an internal tool)?
Also, what was the impact of those funny looks? How did they make you feel? Were there any longer-term consequences of telling them you wrote the thing?
Disbelief and suspicion. And fear of my mental health I guess: What's wrong with this person?
Maybe I just don't look or dress or sound like what one would expect. But there's context here too: at the time, these things were flagship features and on the booth monitors, and the booth staff were explaining the virtues of these features to everyone they met. They were making a big deal of it at the time, so maybe that made it even more unbelievable that the inventor would wander by at that moment.
Now imagine what would happen if companies had a thanks page along with the other boilerplate pages (contact us, about us) on their website. If you're making millions from a thing, thank the original person for that thing. (I put thanks pages at the end of my slide decks, it's not hard.) These interactions would go a lot better -- "my name is on your company website" -- and could lead to fruitful discussions and collaboration instead of weird looks.
Years back I was at a deep learning conference and was reading Andrej Karpathy's blog during one of the talks. Demis Hassabis had come in slightly late and sat in the last free seat, which happened to be next to me.
He leaned over, asked if I liked the blog, and (slightly proudly, if I remember correctly) mentioned that DeepMind had hired Andrej for an internship starting soon.
> I am not sure what should be the appropriate reaction or corrective measure in these situations. We should talk more about handling these unfair situations.
Start recording; have them, a big multinational with a massive legal department, admit to violating and stripping a license from source code. Then sue them. They should know better, and they're making billions off of other people's work. That in itself is fair enough, if the license permits it, but removing the license is crossing the line.
Oh it needs to be redressed and some knuckles soundly rapped, maybe someone even fired depending on the situation, but suing is a last, last, last resort. "WARNING: Do not feed the Lawyers".
Most jurisdictions in the US are one-party-consent. I think the tech crowd tends to have a skewed perception of recording consent rules because California happens to be one of the relatively few two-party-consent states, but it's the exception rather than the rule.
Laws around recording typically also cover cases where an outside person, who isn't a party in the conversation, is recording. The idea is that there are three possibilities: all parties in the conversation consent to recording, one of the parties consents (almost certainly the person who wants the recording), and none of the parties consent (ie, someone is spying on the conversation). One-party consent is legal in a variety of countries and regions of countries, while zero-party consent is illegal pretty much everywhere I know.
> Some places only require one party (you the recorder) to consent.
That's what you said, and it's not true: without consent from at least one party being recorded, or being in a situation where recording is expected (TV etc.), it's illegal pretty much everywhere.
And I wrote 'most countries', which is a hint that there are some others. There is even a country where singing in the shower is forbidden; in MOST others it's not.
Sounds a bit more complicated than what you think it is:
>But the reality is that it is normally against the law to record a phone call without the other person’s consent.
>In fact, ‘covertly’ (secretly) using a listening device such as a mobile phone or digital recorder and publishing or otherwise distributing that material can amount to a criminal offence.
Recording private conversations:
>The laws only apply to ‘private conversations’, which is one where the parties may reasonably assume that they don’t want to be overheard by others.
>One of the exceptions to the prohibition against recording and/or publishing or distributing records of private conversations is where police officers have obtained what’s known as a ‘surveillance device warrant’ – also known as a ‘wire tap’ – which allows for the recorded material to be used for investigations and tendered in court provided, of course, that the material is relevant to the proceedings at hand.
Between jurisdictions:
>It is legal in all jurisdictions to record a phone call if ALL PARTIES to the phone call consent.
Yep. Here's a list of one-party recording consent states from [1]:
Alabama, Alaska, Arizona, Arkansas, Colorado, District of Columbia, Georgia, Hawaii, Idaho, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Michigan, Minnesota, Mississippi, Missouri, Montana*, Nebraska, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Virginia, West Virginia, Wisconsin, Wyoming
Consent by one party of the conversation. If you initiate a recording of a conversation, you yourself have plainly consented to recording it.
Note that I believe (and, IANAL) that if at least one party to the conversation resides in a "two-party consent" jurisdiction, you will need the consent from all such parties.
I didn't read maturity from this. I read timidness, conflict aversion, a lack of standing up for oneself. Someone touring the world making hundreds of thousands of dollars, demoing your own software and claiming it's their own? Violating your license, the foundation of OSS? I would have spelled it out as clearly as possible, including the legal implications, spelled out my assumption that this person was claiming they had worked hard on these tools when instead they did minimal stealing, and either talked about legal follow-up action or financial follow-up action. This is a time when anger, frustration, and being stern are justified.
>I am not sure what should be the appropriate reaction or corrective measure in these situations.
When it happens internally, i.e. I catch someone doing it, then either it is a "first time offense" of the clueless, or it is the act of an unethical person who will be unrepentant. For the clueless, it might be that they undervalue themselves, and therefore undervalue assigning credit. The unethical person, however, understands what they are doing and is simply untrustworthy. They will also likely have a lawyer, because they've done this before. So it can be pricey to get rid of them, but get rid of them you must. They are poison to your team.
A colleague of mine was attending a local tech conference just before Covid hit. I believe it was aimed at newcomers and various companies tried to show off exciting tech that interns/new coders could potentially work on.
She couldn't believe what she saw. A major govt.-backed logging company (which does do a lot of dev work themselves) was showing off one of our projects as theirs! We were using depth cameras to estimate the volume of wood loaded on a truck as it drives through a gate. They even used screenshots that I had made!
Now, they were involved in the project. But they were basically clients of our client. They provided us with a place to test the system as their trucks ran through. They didn't own the software, let alone do any work on it. Why they would present it as something interns could potentially work on is beyond me.
I would imagine they would say to an intern once they joined, "yeah you won't work on THAT project, but here's some other thing for you to do". It's like the estate agency putting a listing of a nice place on a website and then when it comes to viewing the place they say "well that one was just taken but let me show you something similar".
Maybe they wanted to show something that the interns could try to replicate. I mean, if everything your in-house dev team is working on is CRUD applications built on 90s tech, that's not very exciting.
I had a similar, smaller scale of that happen, in a sort of reverse direction a few years ago. My director dropped a resume on my desk for someone coming from a company I had worked at ten years prior, thinking I might have met them (it was a small organization). I didn't recognize the name, but skimmed through the resume quickly, and their primary claim on their work history was something I invented just six months before leaving the organization. I couldn't believe it. What were the odds of that resume ending on my desk? Basically zero, but what a huge mistake. And they didn't claim they had maintained and extended it, they claimed they had invented it! He didn't get called in for an interview.
Having conducted technical interviews at FAANG companies for over 10 years, I've gotten to the point where I never believe anything anyone claims on their resumes unless I can independently verify them. I also make it a point to ask probing questions about where they got the idea of "inventing" what they did and what alternatives to "making a brand new thing" they considered at the time.
However, one of my favorite questions to ask is, "What don't you like about what you invented?" A true creator of something is always acutely aware of its flaws. They're unsettled about its shortcomings and want to do better. In describing the flaws they demonstrate deep insight into the problem space, and they explain expertly how things don't go perfectly in certain corner cases or how the code could have been better organized.
The poser will almost always struggle to come up with criticisms of what they "invented" and try to pass it off as a spectacular feat of engineering brilliance.
Your favourite question has some cultural gaps, as in many countries people downplay weaknesses and flaws in interview settings. It's why a lot of weakness questions are often ineffective. Unless you are acutely aware of when a person is doing this BECAUSE it's an interview, you're going to get some answers that might lead you to reject good candidates.
I've made a lot of foul-ups in the course of my career. When people ask me about mistakes I've made and what I've learned from them during interviews it's generally an easy question to answer because I have this overstuffed mental file folder of examples.
I can't speak for the US but, in the UK, don't misrepresent your work in a job interview. I can't say you'll never miss out on an offer by being honest, though I don't believe I ever have, but would you really want to work for people who'd prefer you to lie or misrepresent mistakes you've made than be open and truthful about them?
To me that's something of a red flag: it's at least indicative of a culture where mistakes are likely to be covered up, leading to a lack of reflection, learning and improvement... and also quite possibly storing up bigger problems for later.
(FWIW, I started as a developer and am now the CTO of a mid-sized multinational market research and insight company. This is nowhere near as grand as it might sound, and isn't meant to be a boast, but hopefully illustrates that being honest doesn't appear to have done my career any long-term harm. Some things that have, if not derailed my career, caused me to take some fairly substantial detours: (i) taking things too personally, (ii) placing too much weight on others' assessments of me, (iii) and I say this as somebody who is wary of people who change jobs too often, but... staying in a job way past the point where there was anything else I could learn/give/progress. I am, of course, but a single data point.)
> Honesty is something that never goes down well in an interview when it comes to being critical. (At least in the US)
I wouldn't paint all tech companies in the U.S. with such broad strokes. In the interview loop for the job I have now in the U.S., every single interviewer asked me a question about "how things could have gone better." I talked about mistakes I made, lessons learned, and how I could do better next time.
I am told the feedback from that loop was across-the-board "outstanding."
Amazon thinks otherwise. This is their Earn Trust leadership principle:
"Leaders listen attentively, speak candidly, and treat others respectfully. They are vocally self-critical, even when doing so is awkward or embarrassing. Leaders do not believe their or their team’s body odor smells of perfume. They benchmark themselves and their teams against the best." [1]
There is a difference between being self-critical once employed (where I agree it's a useful practice) and being self-critical during an interview (which is often viewed as the process of selling yourself in order to get a job).
Amazon's "principles" are also just that: corporate spiel. 100% of Amazon employees, including Jeff Bezos, would fail if they were actually tested against their own dogma.
I thought this was going in the direction that I experienced. I had a recruiter try to recruit me for the position that I was already in (we were expanding the team, not replacing me).
Great article. Gives me flashbacks to the time someone sent me a link to a newly-released version of Minecraft, and it turned out to be using my own voxel engine. :D
(No credit of course, and the marketing copy around it made it sound like it was all their own code. Welcome to open source I guess!)
Annoyingly there's not much to tell. It was a minimal web version, released as a one-off marketing thing and then never updated, and Mojang didn't reply when I tried to get in touch (via contacts at Microsoft). So I never really found out anything about it.
The game is still live though: https://classic.minecraft.net - Originally it supported multiplayer, but that stopped working the day after the game launched.
> Minecraft Classic - official game from Mojang (I'm as surprised as you are)
lol. Microsoft should just release all the source for this since Minecraft isn't exactly state of the art these days. The brand is nearly all of the value.
Almost been there. I was once hired and given some sources to work on by a company, and those were the same exact sources I wrote at another company years before, although my name and all recognizable comments were stripped, save for a few almost invisible traces I left like my initials paired with reserved words to make them appear like directives, pragmas, etc.
The guy who had given me the "new" sources was without any doubt responsible, since he had been PM at the old company with access to everything, but he almost certainly had no right to use them outside the old company. It took me like 10 seconds to recognize them, as filenames, function names, network usage, variables and structures were all the same, so it went like: "Hah, no problems, I recall this very well, in fact there should be my name somewhere... oh rats, it got deleted somehow, no probs however, consider it done" :=).
Consider GP's position: they wrote some code for their old employer, and therefore had no copyright over it (I hate capitalism). Point is, they had very little stake, beyond personal pride and ethics.
So now they see that their code has been stolen from the previous company, and they're being asked to work on it. What will they do? Refuse to use the code and risk being fired for being too slow? Report the crime and risk being accused of attempting to cover their own tracks? Talk to HR and be hated by their closest bosses?
I see no way to really win here. At least, not reliably.
Yes, but that code wasn't mine either, as I wrote it at another company as their product. When I added my "hidden" traces I did that with no intention to claim ownership, but rather to leave a signature just in case. Technically the first company should have sued him, but I can't know the details for sure; the old project was dead by then, and he could have purchased the sources legally, although I doubt that.
I honestly don't know; I certainly couldn't foresee that some years later I would be called by others to work on the same sources. The work environment was really good and I wished the project had lasted longer, but alas, all good things must come to an end. Anyway, I had no explicit reason to leave traces, other than maybe some odd reasoning after listening to a ("toxic", as some of us realized later) colleague talking about bad experiences in other places and signing his sources in a similar way, but to me it was mainly a "Kilroy was here" thing.
I personally wouldn't work on code that I knew was stolen from a previous employer, especially if it's code I worked on directly. Ethics aside, if you ever get caught you are personally going to be subjected to entirely too many questions, and it's the kind of thing that could follow you around for the rest of your career even if you really did nothing wrong.
Thankfully the "DTrace expert" didn't turn out to be Bryan Cantrill. He's a big fan of Scott McNealy's engineering principles for Sun Microsystems, which includes "Don't Cheat", and the behaviour in this story very much seems not in the spirit of that principle. I wonder if this bit of 'cheating' ended up getting someone at Sun in trouble.
It definitely (obviously?) wasn't me. I had introduced Brendan to DTrace in 2004 (over e-mail) when I discovered his earlier psio work[0] (which was based on TNF, a tracing facility that predated DTrace and was better than nothing -- but not by much). I knew that he was going to be very excited to learn about DTrace, and it wasn't long before he was delighting us with some really creative scripts. (His shellsnoop.d[1], in particular, really got people's attention as to what one could do with DTrace!)
While he and I had corresponded a bunch, ironically the first time I met Brendan was in 2005, it was in Sydney, and I was on a tour of Australia talking about Solaris 10 -- but it followed this incident by several months (if I recall correctly). I was excited to meet Brendan, and he was excited to meet me -- especially so after his poor experience several months prior!
Given that Brendan and Bryan worked together closely and seemed to be very fond of each other, and the article mentions that "the VIP may not have known", I think it's safe to assume it was not Bryan.
Because if it had been Bryan, Brendan would have been more sure.
To the extent that the two of them have disagreed about things, they don't seem to have any issue saying so publicly, referring to the other by name: https://news.ycombinator.com/item?id=16382456
An analogous experience: I once chaired an American Bar Association committee that developed and published "fair and balanced" model software license provisions, extensively annotated. Twice in the following year, in working on client deals, lawyers for The Other Side proposed draft license agreements that were indisputably copied wholesale from the model provisions — but with certain fair-play features omitted.
I had a vaguely similar experience. I was working on virtual reality race car simulators that used sophisticated custom designed and built motion platforms. I had several responsibilities, the first was the software that controlled the motion platform. When it was done, and I proceeded to the sound effects system, my boss turned over my motion platform code to his good friend who he had brought on to the project with no interviews. A month later in a status meeting, the new guy said that he had to replace about 20% of my code to fix defects. I was curious, looked, and saw that there were zero code changes, but the comment blocks were changed replacing my name with his. Kind of sad for him. He literally had done no work.
There's a long industry history of stealing code, but I'm surprised to hear this story of Sun allegedly doing it.
My impression is that the reason for the stealing usually makes sense. For example, a key library that's hard to write is just copied into the source tree, ignoring licensing. Or an appliance developer didn't want to deal with licensing for Linux or BusyBox. Or an individual developer in over their head quietly copies code.
The time I heard an explainable incident happened with my code, was in mid/late-'90s. An acquaintance, who'd offered to be one of the testers for an unreleased Java desktop application I wrote, then reportedly ran it through a decompiler, and passed it off as his own code, in a demo to investors. He later acknowledged doing this, and said he'd send me a Sun (ha) workstation as compensation. I declined.
Then there are incidents for which the reason isn't obvious, like the one from the article. I speculate that sometimes the explanation might simply be that the perpetrator wasn't quite right in the head at the time, like in some famous cases of journalism fabrication.
An inexplicable one involving my code was when an open source developer took a substantial and novel package that I wrote, stripped out my name and license notices, including from the main file, and posted the package with themself identified as the author. There were also a couple of other incidents with that person that made it seem like they hadn't yet learned how to play well with others, in engineering or open source. I asked a mutual acquaintance, in confidence, what was going on with that person. The acquaintance checked, and was also baffled. In that case, I suppose that maybe the perpetrator was going through a difficult time, and not thinking clearly. Or maybe it was a combination of unlikely accidents that looked worse than the intent was (which happens).
In the article's story of the Sun incident, I'm a little surprised that (speculating) an engineer could do this despite all the other people in engineering who might be in a position to notice something funny going on. And Sun had been the dot in some dotcoms by 2005, so presumably they had some strong engineering processes around what goes into product.
Maybe the demo was something put together by a systems engineer, working as part of a small marketing/sales team, rather than under an engineering organization, so a lot fewer engineers were aware of it?
You are very compassionate towards people just literally stealing code. My experience is that a lot of folks just don't care about licenses: they steal and rebrand by any means necessary, I guess to accumulate reputation.
I wrote a few silly scripts in my lifetime, all small-time stuff (so trivial it never got me hired as a dev anywhere). No matter whether I marked things as GPL or BSD, I often found them copied on GitHub with my copyright notices stripped. In the case of the BSD license, that's literally the only thing you can't do!
In some cases it was utterly blatant: cloning from my own GitHub repo and then replacing authorship notices in the very first commit. I mean, come on guys, at least try to be smart; you can add your name just fine... When I politely asked to, y'know, respect license terms and reinstate the notices, some people just took down their repo rather than doing that. I bet they then re-uploaded it somewhere else.
I've seen blatant cases as well. There was a github repo where someone was sharing their notes on performance engineering, and there were tickets from people thanking them for their amazing work. Except these notes were literally copy-n-pasted pages from my systems performance book. It's weird to find all these messages from people thanking someone else for the work I did.
When I contacted them to ask how they thought it was ok to republish my work like this (I was giving them a chance to explain), they just took it all down.
I was in shock, and a bit furious at being ripped off, but didn't feel there was a lot I could do about it. I edited this bit out (I was trimming the post):
"You might wonder why I didn't talk about this first case publicly at the time. I'd already informed them of the problem privately and how to fix it, so there wasn't more to say. Also, Sun was the number one employer in town, and to be publicly critical could be career ending."
I don't know what happened to the US developer. But I didn't think it made sense to burn bridges with Sun about it. I was more worried about giving people a bad experience with DTrace by running older versions of my software.
> Is making and using multiple copies within one organization or company “distribution”?
> No, in that case the organization is just making the copies for itself. As a consequence, a company or other organization can develop a modified version and install that version through its own facilities, without giving the staff permission to release that modified version to outsiders.
> However, when the organization transfers copies to other organizations or individuals, that is distribution. In particular, providing copies to contractors for use off-site is distribution.
Please consult sources before making a strong statement.
> No, GPL FAQ specifically addresses internal distribution
Which couldn't be more irrelevant, because
a) This software doesn't appear to be licensed under the GPL.
b) The question here is not whether or not it's distribution, it's if it's copyright infringement. Making copies of copyrighted material is illegal (up to fair use and licenses) regardless of whether or not you distribute them.
c) The GPL also requires that you maintain the license header when you are making copies (GPLv2 term 1: "You may copy [...] the Program's source code as you receive it [...] provided that you [...] keep intact all the notices that refer to this License [...]"; other terms and conditions do apply, hence the ellipses).
d) The license (and the law, and court rulings) is the source material, not GNU's non-legally-binding FAQ.
TFA does say the code was under GPLv2 and CDDL. I would say it's entirely within your rights to distribute it in accordance with what the GPL FAQ considers okay, which includes internal distribution however you want. The license header is required when distributing externally.
Oh weird, I read over that line. That removes one of my bullet points but doesn't change the result. I agree you can distribute it internally, i.e. you can pass around the hard drive with the single unmodified copy. You can't make copies of it without the header, regardless of whether you keep it internal.
> Does the GPL require that source code of modified versions be posted to the public?
> The GPL does not require you to release your modified version, or any part of it. You are free to make modifications and use them privately, without ever releasing them. This applies to organizations (including companies), too; an organization can make a modified version and use it internally without ever releasing it outside the organization.
You're not free to make copies without the license. You are free to make copies with the license, because that's what the license granted you permission to do. You aren't free to do so without preserving the license, because you don't have a license to do that and that's copyright infringement.
You aren't required to publicly release derivative works or their source, simply because the license grants you the ability to produce derivative works (provided you keep the copyright information intact) without publicly releasing them. You are required to keep the copyright information intact, because the license does not grant you permission to produce derivative works if you do not.
The requirement to keep the copyright information attached is simply not connected to your choice to distribute it or not. It's a requirement any time you make a copy or make a derivative work, even if you're only making that copy or derivative work for yourself.
That was my understanding. If removing the license is a breach of its terms, so the GPL no longer applies, then it falls back to default copyright law. And no one would argue copying MS Windows 'internally' is fine.
In the sense that previously one computer was running the software at a time, and now 1000 computers are running it because I copied it onto all of their disks/memory banks/CPUs.
I still control all 1000 computers, but I'm doing something I fundamentally cannot do with a single copy.
The question is a bit murky, not because it's unclear I'm making copies, but because some copying is fair use. For example to execute a program I need to copy it from the hard drive to ram, and (pieces of it) from the ram to the CPU. That's generally considered to be legal, despite being copying, approximately because it's necessary to use the program. The exact line here is not well defined.
That’s a pretty interesting thought experiment. How likely is it to burn bridges with a multibillion dollar corp by taking them to court to protect your IP? They might even respect you for it. Most likely the people working there will move on eventually and especially if they are responsible for IP theft that results in a heavy legal bill. Companies sue each other over ip all the time and then still do business with each other. I’m also assuming OP’s bridges with the persons directly responsible are already burnt. So which bridges exactly are being protected?
I don't know much about law, but I wonder what the ballpark out-of-pocket legal costs might look like for an individual going after a large corporation with an IP suit... i.e. how likely is it the corporation could win simply by dragging out proceedings and exhausting your resources / burying you with legal fees?
You don't need to get into legal proceedings, if the facts are clear.
If you can prove it's your invention, and they're distributing it without your permission, they won't want to waste money defending a suit.
You may be a little guy, without enough dosh to launch a suit; but you could sell your interest to $COPYRIGHT_TROLL, then they'd be in trouble. So just ask them nicely to propose a settlement.
It's not like war, or anything; tech-company legal departments are there for just this kind of thing. Unless there are arseholes involved, it can all be friendly and business-like.
You're right, but it kind of renders copyright law another elitist construct. I wonder if there are lawyers or organisations who take up these types of cases to counter the behaviour you have outlined. Would the EFF, for example, find it in conflict with their mission?
Wouldn’t you have made significant bank if you had sued? To the extent that you never had to work for a paycheck ever? Just curious if a lawyer consult might have been useful.
Brendan has "made more bank" since then than he could have made in that one suit or since that suit if he'd filed it. Brendan's a pretty big shot nowadays and is almost certainly very well compensated, but when you demonstrate that you're too happy to sue you tend to get discriminated against, so if he had sued... it might have ended his career.
Sorry, should have said "tech" employer, and this is referring to Sun and its ecosystem. A few years earlier I gave a talk to a local University about the job market and asked the comp sci students to have a showing of hands as to whether they thought they'd work on Windows or Solaris when they graduated. Most hands went for Windows. Then I showed the local job statistics: Sun Solaris was number one.
How fantastically different to my experience. I graduated 2010 in Australia, and we were just entering into the pre-cloud age. A lot of us expected to either move to SV or just work at local consultancies for banks or insurance companies (which I did). 10 years earlier, I would have gone to work at Sun!
I come from Melbourne and know that there were several rather large multinational engineering centers in Melbourne around this time period. Ericsson, NEC and Robert Bosch come to mind.
I thought Solaris was being completely eclipsed by Linux.
Even Windows is surprising, because hardly anything except gaming and limited local clients runs on Windows. Granted, some people choose to develop on Windows.
Recently I was struggling to understand MotionLayout in Android and how to make it do what I wanted for my app. I had searched all over the internet reading blogs and forum posts trying to figure out what I was doing wrong. I ended up watching one particularly helpful presentation video (I think it was from a DroidCon) given by this guy named Jason Pearson. It helped a lot, (watched various parts of the presentation multiple times) but eventually I still got myself stuck again. I ended up dropping a question out on StackOverflow in desperation just to see if anyone might be able to point me in the right direction. Well, fast forward a bit and someone responds to my question with a really great answer that helps me understand what I was missing. Happened to look at the user who wrote the answer and saw his name was Jason Pearson... Then I had to double-check and sure enough, it was the same guy!
So, shout out to Mr. Pearson for a great presentation and a really helpful SO answer!
And I thought it was fun when a colleague who publicly belittled me on a somewhat regular basis for scripting (“we're not programmers!”) triumphantly showed a blog post he’d found to solve a problem he was having.
A blog post on my blog.
In fairness to him, it was still on a domain that didn’t identify me at all, though the “About” page had my name on it.
He then proceeded to escalate up the chain of command to get me in trouble for publishing company secrets on said blog.
I promised to only write it outside of work hours and not to directly copy any code (scripts, one-liners) I wrote at work, and that was the end of that.
There were other less-antagonistic instances of others in this large company finding solutions on my blog, which says something about the value to the company of posting generic stuff like this on publicly-searchable platforms instead of an internal-only SharePoint or Confluence instance, and definitely instead of the PDFs our senior management insisted on for all documentation.
I worked at a company that was interfacing with the AOL Instant Messenger network in the early 2000s. Finally the business people got a deal and AOL made us sign NDAs to get access to the official API docs. We were excited because we were using the docs that one of the open source clients had produced from their reverse engineering efforts.
AOL's docs were the same ones from the open source project, with the GPL message intact.
What does an NDA-encumbered GPL document mean in practice? You're allowed to spread the document, but not tell anyone where you got it from?
I once received a library under a modified Apache 2.0 license, where all the conditions under section 4 (attribution, etc.) were just inverted for a given number of years (effectively an NDA), after which the normal Apache 2.0 would apply. Which worked because it came straight from the original copyright owners, but here I assume AOL weren't.
It's a small world. Many, many years ago, I wrote a little help desk application for a one man ISP. The only requirement? "I have to be able to respond to support requests using WAP". This was, of course, before smart phones were a thing. Typing long replies on a phone with only a numpad was not pleasant. So I built something with a lot of prehashed replies and parts of replies, which could be accessed by entering a couple of digits (with some very rudimentary context awareness).
Fast forward 10 years or so, I'm consulting at a company. When I see one of their support engineers using a tool that looked vaguely familiar. Turns out the company had acquired the one-man ISP and had continued developing the little helpdesk application I'd made for WAP all those years ago.
I was fortunate enough to happen to be live at LISA13 where AFAIK, Brendan gave his first public flamegraphs demo. I'm glad I saw it. I love learning about areas (performance engineering) where I'm weaker. Great demos! I've tried to keep my personal live demos at a high quality too. Can't wait to see what he comes up with next!
Flame graphs completely blew my mind, but most people I talk to about them just don't seem to get it.
I'm a generalist, but I've been thinking lately if performance engineering is something I should be specialising in. I'd love to hear any advice from those in the field.
Specializing in anything is good if you have 5+ years of exp IMO. It's always good to stand out from the crowd of generic developers.
The other plus is, if you frame yourself as a specialist in X, it might be easier to explain not knowing Y. You can't simply know everything, especially if you invest heavily in specializing in other things.
Personally I've been a web performance (i.e. mostly JavaScript/HTML) guy lately and it's been fun. There are a million React devs in the world, but only a few dozen web perf guys with a web presence (Twitter / blogs), and I know almost all of them by name at this point.
Of course it depends what you want to do. Typically it's big corps who look for specialists. Small companies prefer generalists.
My problem with picking a niche is that ones that are easy to learn from a book are too crowded to be considered niches, and the rest can only be learned by doing in real production environments. Now there’s one where I have a foot in, and it’s one that I enjoy.
Yep, exactly, it's difficult to learn in isolation. One must try their luck applying to jobs they're underqualified for. And then seize the opportunities once in :)
Can someone elaborate on why he takes a weird detour in the middle of the post to discourage forks? Is there some particular issue with supporting bpf tooling forks?
As someone who has spent a lot of time in open source, forks are not a problem, they are indicative of a problem.
Don’t cry when people fork and go a different direction, try to figure out why and see if you’re willing to change the project to accommodate them. Dropping chastising “more wood behind fewer arrows” platitudes is pointless when half of the wood wants to break off in a different direction anyway.
Yes, there can be good reasons to fork (especially after making a fair effort to have things fixed), and bad reasons.
But also yes: There are two particular issues with bpf tooling forks. 1) They look deceptively simple, but those that are kprobes-based are really kernel-specific and brittle, and need ongoing maintenance to match the latest changes in the kernel. One ftrace(/kprobe) tool I wrote has already been ported a bunch of times, and I know it doesn't always work and one day I'll go fix it -- but how do I get all the ports updated? No one porting it has noticed it has a problem, and so the same problem is just getting duplicated and duplicated. Which is also issue 2) Unlike lots of other software, when observability tools become broken it may not be obvious at all! Imagine a tool prints a throughput that captures 90% of activity and no longer 100% (because there's now a fast-path taking 10%). So the numbers for some deep kernel activity are now off by 10%. It's hard to spot, and that increases the risk people keep deploying their old broken ports without realizing there's a problem.
> They look deceptively simple, but those that are kprobes-based are really kernel-specific and brittle, and need ongoing maintenance to match the latest changes in the kernel
It seems like there is a missing formal interface here if this is so brittle, no? If it’s hitting a bunch of internal kernel stuff shouldn’t this stuff just live with the kernel itself?
The formal interface is tracepoints. So tracepoints in theory aren't brittle (they are best-effort stable) and don't need so much expert maintenance (which is mostly the case). In theory, someone could port tracepoint-based tools and almost never need maintenance.
But kprobes is basically exposing raw kernel code that the kernel engineers bashed out with no idea that anyone might trace it. And they can change it from one minor release to another. And change it in unobvious ways: add a new codepath somewhere that takes some of the traffic, so gee, it seems like my tool still works but the numbers are a bit lower. Or maybe I measured queue latency and now there are two queues but the tool is only tracing the first one, or now there are no queues so my tool blows up as it can't find the functions to trace (that's actually preferable, since it's obvious that something needs fixing!).
I really don't like using kprobes if it can be avoided (instead use tracepoints, /proc, netlink, etc). But sometimes it's solve the problem with kprobes or not at all.
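To make the contrast concrete, here are two hypothetical bpftrace one-liners (these are illustrative examples, not tools from this thread; the probe names are just examples, and the args-> syntax is from older bpftrace releases, so details may vary by version):

    # Tracepoint: part of the kernel's best-effort stable tracing interface,
    # so this should keep working across kernel versions.
    bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'

    # kprobe: instruments an internal kernel function (vfs_open here); its name,
    # arguments, and the code paths that call it can change in any kernel release.
    bpftrace -e 'kprobe:vfs_open { @opens[comm] = count(); }'

The first tends to fail loudly if anything changes (the tracepoint either exists or it doesn't); the second can silently undercount if a new code path stops going through the probed function.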
Now, normally such code-specific, brittle things should indeed live with the code like you say, so normally I'd think about putting the tools in the kernel code. But we don't want to add so much user space to the kernel, and it also opens the door as to whether these should actually be tracepoints instead (which begins long discussions: maintainers don't want to be on the hook to maintain stable tracepoints if they aren't totally needed).
Another scenario where the tools should ship with the code base would be user space applications. E.g., if someone wrote a bunch of low-level tracing tools for the Cassandra database that used uprobes and were code specific, then they would be too niche for bcc, and would probably be best living in the Cassandra code base itself.
Thanks Brendan for creating the bpfcc-tools! I'm using it in magicmake [1], which is a tool to automatically find missing packages when compiling, based on file path accesses.
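I haven't looked at how magicmake implements this, but as a rough sketch of the general idea (hypothetical, not magicmake's code), watching a build for open() calls that fail with ENOENT shows which file paths the compile is missing. The bpftrace script below follows the same enter/exit pattern as the stock opensnoop tool; the args-> syntax is from older bpftrace releases:

    // hypothetical sketch: report file paths that failed to open during a build
    tracepoint:syscalls:sys_enter_openat
    {
        @fname[tid] = args->filename;
    }

    // args->ret == -2 means the open failed with -ENOENT: the path didn't exist
    tracepoint:syscalls:sys_exit_openat
    /@fname[tid] != 0 && args->ret == -2/
    {
        printf("%s missing %s\n", comm, str(@fname[tid]));
    }

    tracepoint:syscalls:sys_exit_openat
    {
        delete(@fname[tid]);
    }

A package search over the reported paths could then suggest which packages provide them.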
> As someone who has spent a lot of time in open source, forks are not a problem, they are indicative of a problem.
Forks because the main project is not responding to a need are not a problem. Forks just because some project or product believes it gives them more control and they don't interface with the original project usefully are a sort of a problem. Those aren't necessarily started because of a community need, but some business need or perceived business need which may have nothing to do with the reality.
I think that's the situation he's asking to avoid here, as he mentions observability products. He's saying please don't fork just to have it under your own repo and for no other reason, when it can be developed jointly.
Why? If someone wants more control their interests already aren’t aligned with the community. Developer resources aren’t just some faucet easily redirected.
It’s worse to have people trying to jam shit into the main project if they don’t actually care about the main project. That’s how you get contributions that are huge hacks and require more work to review and iterate on than is worth it to the community.
If some company forks for an observability product and doesn’t contribute back, they clearly don’t want to. Don’t try to force them.
git is an incredible tool for managing forks and this anti-fork mindset is right out of old school open source culture from the early 2000s.
Demanding people contribute to your project instead of forking just sounds like demands to pay homage more than anything.
> That’s how you get contributions that are huge hacks and require more work to review and iterate on than is worth it to the community.
And yet the author here is saying please work together. Presumably he or the community is okay with picking apart those huge hacks. I'm not sure why you should care what that community is willing to accept, unless you're part of it.
> If some company forks for an observability product and doesn’t contribute back, they clearly don’t want to. Don’t try to force them.
Who's forcing anyone? Did you not notice the "please" he starts that statement with?
> Demanding people contribute to your project instead of forking just sounds like demands to pay homage more than anything.
That's just a straw man, nobody demanded anything, and I'm not sure why you'd even try to insinuate he was when he literally says please.
If I had to paraphrase the part of the post you're critiquing, I would do so as "If you're using these tools, don't feel like you have to keep any stuff you do separate. We'd like to build something better for everyone, so feel free to contribute to the project rather than keeping it separate, and we can all build on each other's work." That's pretty standard open source ideals IMO.
I appreciate the discussion as I want to do a follow on post about this in more detail. It doesn't do the BPF community any good to have people running old broken versions of my tools (just as back in 2005, Sun was selling DTrace by distributing my half-finished socketsnoop.d). Now, some people are going to port anyway regardless of what's good for BPF. But if I can reach developers who may be on the fence about it, I can explain the pros of just building upon the existing tools. To start with, you get updates for free by kernel engineers at Facebook, Netflix, etc. Also, by pushing fixes back you'll find people will try to help you in return for when you need it (some startups, like Cilium, have done so much for BPF that I'd happily help them with anything they need (provided there isn't a clash with my current employment)).
> If I had to paraphrase the part of the post you're critiquing, I would do so as "If you're using these tools, don't feel like you have keep any stuff you do separate. We'd like to build something better for everyone, so feel free to contribute to the project rather than keeping it separate and we can all build on each other's work." That's pretty standard open source ideals IMO.
Sure, that’s just a completely different writing of what’s actually there. The original post is an instruction not to do something. Your interpretation is a much more open “pull requests accepted”.
His reply clarifies that he doesn't want broken shit out there, so your reading is incorrect. He does want to discourage forks, because they are subtle to get correct, and he doesn't want shit out there sullying BPF's reputation.
The criticism is of forks that are done in order to re-brand and re-sell a project, and then never contribute anything back to the original, and likely, never pull improvements from the originally forked project (which can include critical bug fixes) after that.
Serious question, how does that hurt the original project? This has never been a concern in the projects I’ve worked on. Getting mad every time some developer writes code that didn’t go to your project is a good way to be mad all of the time.
> Someone sells you BPF observability that's buggy and dissapointing, and you avoid BPF in the future. So in that case it's hurt the BPF community.
Ok, so someone sells me closed source tooling that uses BPF and it’s buggy and disappointing, so I avoid BPF in the future. This association problem exists regardless of closed source vs a fork vs an old release.
> Funding goes to a project that gives nothing back, instead of funding a project that does. Again, it hurts the community as funding isn't infinite.
This only hurts if it’s a closed/hidden fork. This also presumes that the fork isn’t just stripping out tons of upstream features (I’ve had to do this for clients because “security”).
My view is that forks of open source (GPL in particular) should be encouraged and done in public. Iterating on central projects only is just so slow and stifling.
Maybe your projects have very little churn so lots of people experimenting in parallel doesn’t make sense, but this has been my experience with larger infrastructure projects. Let a thousand flowers bloom.
Obviously it hurts if your project gets bad rep from an old unmaintained fork. If "everyone knows $your_project is broken, don't use it" it will spill back on you.
Somewhat like when PowerShell on Windows had aliases for wget and curl that were just shell aliases for a simple downloader, making some scripts work without those two installed, but also confusing anyone who wanted to do anything beyond "get one single file from a URL".
I worked with a tech from a software vendor who had a link to one of my Server Fault answers in their setup docs. We got to that bit of the setup and I mentioned that I wrote the answer his doc was referencing. I quite enjoyed it.
I never got a chance to see it again. I ended up spending more energy with another Sun project that wanted to switch the CDDL license to the GPL, only because they wanted to avoid going through the legal approval procedure to ship mixed-licensed products. Maybe I'll share that story one day: it was another eye-opening experience.
TL;DR: DTrace uses a minimalistic C dialect called D to write its probes, hence the name.
In order to trace arbitrary code, it has to inject these into the code site that you want inspected.
If you could just inject arbitrary C, you'd get the issue of potentially adding probes which change the behaviour of your code under test to a degree where new bugs/behaviours are introduced or old bugs/behaviours are masked.
DTrace solves this by using a C subset which helps you avoid such unwitting changes, by not including loops or other operations which could change the memory or timing behaviour of the existing system.
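As a rough illustration (a made-up example, not one of the scripts from the article), a complete D program is just probe descriptions, optional predicates, and action statements; there are no loops or user-defined functions to perturb the system being traced:

    #!/usr/sbin/dtrace -s
    /* print every file a shell opens; the predicate takes the place of an if statement */
    syscall::open*:entry
    /execname == "bash"/
    {
        printf("%s opened %s\n", execname, copyinstr(arg0));
    }

DTrace can check that a program like this is safe to load precisely because the language rules out unbounded execution.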
As an American, I found this and the other cultural anecdotes very interesting. We definitely do a lot of obnoxious things, or at least things that can seem obnoxious to other cultures. No arguments there. We should be better. =)
> To an Australian, introductions in the US can sound boastful, but they can also be useful as a quick way to share one's specialties.
But this one puzzles me a bit. Typically, we talk up the people we're introducing - I'm not sure I've ever heard anybody talk themselves up during an introduction!
"Sally, this is Bob. Bob's been doing some really cool stuff with XYZ lately. Sally, I know you have too!"
(At which point Sally and Bob often politely insist that no, they're nothing special at XYZ)
I've always thought of this as gracious and not boastful. I definitely agree it would be obnoxious to talk one's self up!
I recently had an experience kind of like this. I run a website about a kind of niche topic. I got a call from someone at a FAANG one day working in a similar space who wanted to chat about it. I get these calls pretty regularly and agreed, always happy to chat about it. While she was showing me some of what they were working on it started to look very familiar, and I realized that a lot of the ideas had been copied from my site. In some cases they had literally copy/pasted my content. I mentioned this to her and she got uncomfortable and admitted to something like "yeah we took a lot of inspiration from your work." Frankly I don't care that much, I put the info out there to be used and use it they did! But attribution (or an acquisition offer) would have been nice.
That's not true. I was there, and I was there at the time. Sun had a lot of problems, but software was not an afterthought, at least not within the Solaris org.
They also (in my limited experience) treated sales as an afterthought. It's the only time I've contacted a company to inquire about a product and had them actively dissuade me from buying the product (IAM).
Could anyone please shed some light on the "low-key" introduction issue mentioned in the article?
I've worked quite a few years in IT, and never, during any interview or meeting, have I been introduced as anything more than just an engineer. This must be a cultural gap, no doubt, but I'd feel weird if someone detailed my career in front of other participants. Of course I have nothing against filling in some details myself, but only if applicable in a given situation. Truth be told, I've never worked in Australia or the US, but I did some work in two EU countries and in Japan and, as said, never encountered a detailed introduction.
Great story. One of my open source bioinformatics tools was ripped off, with name AND GPL license stripped off, by a person working at the EMBL/EBI (European Molecular Biology Laboratory / European Bioinformatics Institute; a prestigious scientific institution).
Management, when alerted, made it right, but I think the point of my story is that this is perhaps more common than anyone realizes.
Wait but what happened? Did he do something about it after, like contact Sun saying "Stop! You have violated the law!" or whatever? If not, why on earth not?
What's the point of publishing with a copyleft license if you aren't going to do anything when someone literally walks into your office and says "we at Big Corp are selling your work without any attribution?"
I was hosting a research talk given by quasi-famous professor at a biotech startup that I worked at.
Quasi-famous prof was describing a gene (gene "xyz") being used as a tool in his lab, "but the specifics of what gene xyz is and what it does are not important. It's just a gene we use in these assays...."
Me: Do you know who has 2 thumbs and discovered xyz? This guy.
Any non-Americans want to chime in on what is a heavy American accent? I’m imagining heavy southern accent, but maybe this is something that can only be heard by non-Americans?
I'd say most people hear divergence from their local accent, and consider theirs to be (at least subconsciously) a normal accent. This results in English people hearing my accent as a bit Irish, and Irish people hearing the English parts. Neither acknowledge fully the shared parts. In truth, I am a bit of both.
So I'd say a strong American accent to an Australian is the most divergent one, not one that somebody in US might consider strong.
Living in upstate New York I also frequently heard "oh, you have an accent" to which I always tried to explain "so do you", and several times I got the response that their accent was either neutral or closer to generic English.
It is amazing the subjective differences in how people experience accents, and how they feel about their own.
I'm born and raised in Göteborg (Gothenburg), Sweden. My parents are originally from Karlskoga, 270km NE from here. Göteborg has a very distinct accent and I always get told by others that I have a very bland Swedish with no hint of that accent.
But one time I was at a wedding in Karlskoga and talked to someone I hadn't met before. I opened my mouth and managed to get half a sentence out before he interrupted with "Oh! You're from Göteborg!"
Same experience for me. I am from the Venice area in Italy, which is famous for a very strong and somewhat funny accent and dialect. However, my family is from central Italy. This resulted in me being considered to have a Southern, definitely non-local accent in Venice, and being considered Venetian by everyone else in Italy. Probably the truth is somewhere in the middle.
There was a month or so after I moved back to Germany from the Netherlands where my own accent when speaking German was super noticeable to me. Otherwise I don't even hear my own accent when speaking English even though it is undeniable there when I listen to recordings of me.
It's like your mouth was full of water - in contrast to British Received Pronunciation, which is like a mouth full of down feather. Also compare to "Hollywood Soviet English", which is like a dry mouth, but the tongue is filled with helium so it floats up, and then 'r' sounds like this: https://upload.wikimedia.org/wikipedia/commons/c/ce/Alveolar....
I have absolutely zero ability to take your descriptions and convert them into an understanding of what you're trying to convey. Like what could this possibly mean???
Speak through your nose and lean heavily on the R sound wherever you find it.
Peter Sellers called Americans "The Herns" [1]:
> Various American characters with the surname Hern or Hearn, often used for narration, outrageous announcements or parody sales pitches. The Goons referred to Americans as "herns", possibly because saying "hern hern hern...." sounded American to them, possibly because Sellers once said that a decent American accent could be developed simply by saying it in between sentences.
As a non-native English speaker I absolutely hate English accents where consonants absolutely disappear for no good reason.
Personally I'd rather have Americans "lean heavily on the R" than act like the letter doesn't exist (rhotic vs non-rhotic). I think it's another factor why American English is more popular than British English (besides the huge economic factor, the US economy being 5x the UK one), since their pronunciation is clearer and more explicit.
>Rhoticity in English is the pronunciation of the historical rhotic consonant /r/ in all contexts by speakers of certain varieties of English. The presence or absence of rhoticity is one of the most prominent distinctions by which varieties of English can be classified. In rhotic varieties, the historical English /r/ sound is preserved in all pronunciation contexts. In non-rhotic varieties, speakers no longer pronounce /r/ in postvocalic environments—that is, when it is immediately after a vowel and not followed by another vowel. For example, in isolation, a rhotic English speaker pronounces the words hard and butter as /ˈhɑːrd/ and /ˈbʌtər/, whereas a non-rhotic speaker "drops" or "deletes" the /r/ sound, pronouncing them as /ˈhɑːd/ and /ˈbʌtə/. When an r is at the end of a word but the next word begins with a vowel, as in the phrase "better apples", most non-rhotic speakers will pronounce the /r/ in that position (the linking R), since it is followed by a vowel in this case. (Not all non-rhotic varieties use the linking R; for example, it is absent in non-rhotic varieties of Southern American English.)
>The rhotic varieties of English include the dialects of South West England, Scotland, Ireland, and most of the United States and Canada. The non-rhotic varieties include most of the dialects of modern England, Wales, Australia, New Zealand, and South Africa. In some varieties, such as those of some parts of the southern and northeastern United States, rhoticity is a sociolinguistic variable: postvocalic r is deleted depending on an array of social factors such as the speaker's age, social class, ethnicity, or the degree of formality of the speech event.
My brothers and I did the ring road in Iceland a few years ago. One night we were eating dinner at a restaurant in a small village in North Iceland. Our server had a perfect North USA dialect of English. He sounded to us just like an American. We asked him if he had lived in the USA. He said he had never left his village and had never even been to Reykjavík. We asked him how he came to speak American English so perfectly and he said he learned it from watching movies and TV shows.
I'd include the US music industry in that. Songs are *super* important for learning a language, if you're constantly surrounded by songs of a certain language.
Plus... American multinationals are somewhat close behind. If you want to get a good, well paying job at an American multinational, you have to speak English at least a bit, and you have to know it well if you want to move up the ladder.
Recently we were doing interviews for a job. We had one gentleman from India applying for the position. We have hundreds of people with Indian English accents at our company, so it’s not like we are unfamiliar with the dialect. But this person… none of us could understand more than one word in ten from him. We persevered with the interview, each of us assuming that it was just us and everyone else could understand him. It took me nearly an hour to realize that he was pronouncing API as “ape-ee”.
I like the British accent for its aesthetics, but from a pragmatic point of view it's not even a contest (from the perspective of a non-native speaker), I agree.
It's not just the r's (in fact, some British accents are rhotic - around their South West, if I'm not mistaken?), there are all these glottal stops and whatnot.
But, from my observations at least, there are also big discrepancies related to social class. When I moved to the UK, I had no problem whatsoever talking to, say, a local librarian - but a plumber would be nearly impossible to understand for me, in the first months at least. I didn't really experience it in the US, certainly not to such an extent.
The thing is that UK dialects are virtually unknown outside the country; they can be very different from Received Pronunciation, but they diverge in ways that are still fundamentally predictable for a native English speaker (unless you wander into Scotland or Ireland).
American accents are more familiar because of Hollywood, so they tend to be less surprising; and likely because a lot of them were actually developed by people who learned English as a second language, they are often exaggerated in effect, very clear, and actually more regular (particularly on names, where UK "rules" are anything but).
This said, "deep south" US accents, when pushed hard, can become as inscrutable as certain UK dialects.
What does that even mean? There's nothing clearer or more explicit about either. Maybe you mean that it more closely matches the orthography?
> than act like the letter doesn't exist (rhotic vs non-rhotic).
Every language changes. Nobody is acting like "the letter doesn't exist", just occasionally that phoneme has changed or dropped in their dialect. Even in non-rhotic accents, an r in the orthography can indicate a change in vowel quality.
> I think it's another factor why American English is more popular than British English (besides the huge economic factor, the US economy being 5x the UK one),
I think the greater population, and the fact that Hollywood content has embedded itself globally, has resulted in more exposure to American content (of which there is simply more). Any dialect will sound clearer to you if you're exposed to it more often than others.
That's funny. As a non-native (German) English speaker, British pronunciation seems far clearer to me than American - barring strong regional accents, of course; maybe that is what you had in mind.
Yeah, but Received Pronunciation is basically an artificial creation, only a certain percentage of Brits actually use it.
Regular Brits use their regional accents, yes. And those can be much, much harder to understand than your average American accent. For precisely the same reason non-rhotic accents can cause issues, those regional accents tend to eat up sounds and sometimes entire syllables.
Imagine a British accent. Now imagine that that is what’s normal; everyone in your country speaks it. Suddenly American sounds very different.
Specifically in my limited experience stuff like pronouncing t’s like d’s and soft back-in-the-throat r’s. “Budder” vs. “buttah”, for example. American vowels tend to sound larger as well in my experience.
Well, that's a specific regional American accent, in the US called a Southern accent. Probably the only strong/widespread regional accent really left in the states (you could also argue for AAVE/ebonics but it's not regional in the same way).
It's a bit like saying that someone from rural Bavaria who speaks with a strong Bavarian accent has a heavy German accent while speaking German.
Yes, but I'd bet that the Bavarian accent in particular is considered by foreigners as "heavy German". Bavarian culture (Oktoberfest) is considered "typical German".
Really rhotic r's, relaxed prosody, tongue flaps instead of a t between vowels, indistinct schwas on unstressed syllables, etc.
I also perceive Canadians to have "heavier" North American accents than Americans do. Some Canadians speak as if from the backs of their throats, with leaden vowels and really round r's. They also often overcorrect /a/ to /æ/ so "drama" becomes "dramma" (like the first two syllables of "Dramamine"). And of course there was William Shatner's famous "sabotadge"...
Americans can identify accents from within the US. Any such accent would be identified by foreigners at least as American, and probably as coming from a coast, the central US, or the south.
I would read that as an accent that isn’t a softer New England accent, but also not one that non-Americans would so easily place more specifically (Southern, New York, maybe one or two others?)
I almost had a heart attack last week when I noticed that a library I've used for work over the last year wasn't open source, but rather source-available.
Thankfully, my employer had some licenses for the library without my knowledge, but it ain't fun to break licenses at work, especially when you don't notice until months later.
Definitely not -- see my earlier comment[0], but when I met Brendan in person later that year (and he relayed this incident to me), I didn't even recognize the name. Certainly, it wasn't someone who should have been claiming to be a DTrace expert! And honestly, by that time, any actual DTrace expert inside of Sun definitely knew of Brendan -- and likely vice versa.
there were many people inside Sun that were not Bryan Cantrill. In fact, almost all of them were not Bryan Cantrill. But there can only be one VIP and it can only be Bryan?
I'd love to hear stories from the other side. Behind each of these "someone else taking credit" war stories is the engineer or product owner at the other company who decided this was an ethical course of action. How did you justify this to yourself and to your boss, and what was the plan if/when you got caught? Doesn't your company have an internal process for vetting the licenses of software you use? It seems like these cases are failures at multiple levels in the company. It would make an interesting post-mortem. Use a throwaway account if you like!
I'm not sure if this is just my personal feeling, but I would say that stealing intellectual property was sort of common at that time. Open source was not widely known, knowledge was scarce, communities were just ramping up, and really anyone with a lack of principles could pretty much steal anything and get away with it.
It happened to me a few times with online content I wrote. Essentially tutorials, articles, etc. around Open Source. Once my own company sent me a newsletter which contained one of my articles signed by another employee from a different place. It felt pretty weird.
The article says the story was from 2005. Open source was very much widely known at that time. Linux was 13 years old by then, and Sun themselves open sourced both DTrace and Solaris that same year.
At the time, Sun were pushing it heavily as one of the two salvations for Solaris, so I'm not surprised.
I have to say that my limited experience in dealing with Sun as a customer mirrors Brendan's comments around a remarkable arrogance, and it probably played no small part in their downfall.
> At the time, Sun were pushing it heavily as one of the two salvations for Solaris, so I'm not surprised.
Yeah, I remember that. I kept thinking "this tool might be nice for C wizards but it does nothing for my day-to-day experience as smalltime Linux user / admin". The other big thing was ZFS, which was interesting, but they were extremely uncooperative with the license, basically ensuring it would never make it big.
This has me imagining a struggling engineer at Sun, trying to live up to the arrogance and falling short, who finds Brendan's work and decides to save his career by passing it off as his own. Then some high-level executive decides to make it one of the straws that will save the company...
This is a made-up story, so you are free to make up your own ending as to whether this was the world-travelling VIP, and if so, whether he had some partial or complete flashback on hearing Brendan's name. One thing we can be sure of: whoever 'carelessly' stripped the copyright notice out of Brendan's code had seen his name before (and it was, perhaps, the part of the code he was most familiar with!)
What rights does a developer have in these cases? Can you get some compensation/damages for the license/copyright violation even if you were giving away the software originally?
Can you get more money if they violate the OSS license if you offer the software under a commercial license as well?
It makes me sad that the way a developer is expected to react when they discover that the Valuable Thing that they gave away for free is being sold by a Big Company for Big Money is to try to ensure that the Valuable Things they give away in the future remain free.
I'd much prefer to see the people who build Valuable Things show more interest in capturing some of that value.
There's this overwhelming narrative revolving around Open Source that makes it seem shameful to profit from your work. It's maddening to watch. There's no reason we as developers need to be the low man on the totem pole getting tread on by business people. We just set ourselves up that way and socially punish anybody who doesn't.
If that's what you took from the article, then you got completely the wrong end of the stick. The problem was the removal of the author's attribution and illegal relicensing. He says himself he was glad when Apple later included his tools in macOS with correct attribution and licensing.
The problem he should have noticed was that the Sun was selling the code he wrote for hundreds of thousands of dollars and not passing any of that on to him.
Step one shouldn't have been to worry about putting his header comment back in place and getting them the latest version of his code to sell to their customers. It should have been negotiating a redistribution license for his code if they wanted to continue selling it.
No, the problem was the licensing change and removal of attribution. I would strongly suggest reading what Brendan says in the article.
On this topic, I work for Red Hat where we made $3.4 billion in revenues in the last published year (before being acquired), making exclusively open source software which you can download yourself for no cost.
This works in the other direction: how many developers are willing to pay for their dependencies? Would things like npm even exist if you had to pay invoices for every bit of code loaded?
Semi-related... I TA'ed a Comp Sci class a long time ago (back when they had to submit their code as print-outs). I read through the print-outs and noticed that the "look" of one of them seemed oddly familiar (blocks, line lengths, indenting etc). I went back through the others and found another one that was almost exactly the same.
Took it to the Prof and we agreed the ones that copied got 0 on the project, and the ones who allowed the copying got 50% of their mark. Honestly, they got off quite easy.
This reminds me of how the developer of MINIX, Andrew S. Tanenbaum, found that Intel put his operating system on millions of machines only after reading about it in the media: "I guess that makes MINIX the most widely used computer operating system in the world, even more than Windows, Linux, or MacOS. And I didn't even know until I read a press report about it."
Once I worked at company X; while there I saw a library for universally accessing things in the OS. It was written by our senior architect.
A year later I worked at another company, where there was a high-strung developer boasting about his own library. He had even put it up on a webpage - this was before GitHub. It was the same library from the company above, so I ratted him out to the developer who actually wrote it. I usually feel queasy about ratting, but for some reason I didn't feel the slightest bit of dissonance.
> It was something I and my consulting colleagues had run into before: The belief at Sun that only Sun could make good use of its own technologies, and anything created outside of Sun was trash.
Yep. See Sun's response to the Linux SPARC maintainers' technical critique of Solaris for SPARC. Meanwhile at Red Hat we started hiring all the ex-Sun people who had been pushing for x86 internally at Sun and gotten frustrated with the flip-flops (RHEL 3 on Xeon already demolished Solaris/SPARC).
A lot of us at Sun in the Solaris org had no love for SPARC. Sun made a very typical and terrible mistake: it sat on its laurels. Sun created some awesome things then tried to lock-in and milk customers. Sun (and later Oracle) did this with SPARC and J2ME, among others. All the products they did this with are as good as dead today in terms of market share -- surprise!
Vendor lock-in is not great for the customer, but when you try to milk the customers, you end up risking taking them to the FYO point [0], and that is catastrophic for the vendor, and the vendor never sees it coming and can't help themselves.
Sitting on your laurels is not good. Don't do it. Innovate. Then innovate some more. Then never stop innovating.
[0] Let me google that for ya: https://www.google.com/search?q=fyo+point
>The belief at Sun that only Sun could make good use of its own technologies, and anything created outside of Sun was trash.
As a former Sun employee, I can tell you that's true: people at Sun wouldn't even look at competing technologies because they were sure there was nothing useful to be learned from them.
That's why Bill Gates and Anders Hejlsberg were able to screw Sun so badly by examining and copying the good things about Java, and making something much better: C# and CLR.
While Sun totally failed to learn from any of the good things that Microsoft or Apple or anyone else did.
>We are better off with all the wood behind one arrow.
Nice reference to the old Sun slogan and 1999 April Fools Day prank, in which Sun employees put an enormous arrow through Scott McNealy's office.
All your wood notwithstanding, it also helps to choose an arrow that isn't flawed and doesn't totally miss the target. For what it's worth, Scott McNealy also put all his wood behind another arrow named Donald Trump.
Scott McNealy has long been one of Trump's few friends in Silicon Valley
Something I should be aware of: in Go programming, they sometimes encourage you to just copy some code instead of adding another dependency. However, if you copy code from a codebase with a certain license then, if everything is above board, you should include the license as well for that bit of code.
I mean in practice it's tiny utilities that I'm too lazy to reimplement in the exact same way, but still, it's something to keep in mind.
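As a rough illustration of what that can look like in practice (everything below is hypothetical: the package name, the imaginary upstream project, and the helper are made up for the example), the copied file keeps the upstream copyright notice and points at a copy of the full license text:

    // Package strutil vendors one tiny helper rather than pulling in a whole
    // dependency. Hypothetical example, not code from a real project.
    //
    // The function below is imagined as copied from an MIT-licensed upstream
    // repo, so its notice travels with it and the full license text is kept
    // alongside (e.g. in a file named LICENSE.strutil):
    //
    //   Copyright (c) 2020 Example Author
    //   Released under the MIT License; see LICENSE.strutil for the full text.
    package strutil

    import "strings"

    // SplitAndTrim splits s on sep, trims whitespace from each piece, and
    // drops empty entries. (Copied verbatim from the imaginary upstream.)
    func SplitAndTrim(s, sep string) []string {
        var out []string
        for _, piece := range strings.Split(s, sep) {
            if t := strings.TrimSpace(piece); t != "" {
                out = append(out, t)
            }
        }
        return out
    }

Whether you keep the notice inline like this or add an entry to a top-level NOTICE/THIRD-PARTY file is a matter of taste; the point is that the attribution and license terms travel with the copied code.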
Someone downloaded the code for GANs (generative adversarial networks) from Github, generated (sampled) a painting using it, slapped the GAN objective function as a signature, and sold the painting for over $400k!
Are we being fools? Do we only get ahead in the world if, instead of spending our precious time building things, we use it to cannibalize other people's work?
While I don't believe in karma, I do believe there are consequences to misbehavior, even if one is not caught:
The temerity and lack of ethics to appropriate a project that someone else has written and claim it as one's own leaks into other areas of one's life. That kind of behavior is not isolated to just open-source software. Perhaps the world-weary thief of the OP took his experience as a lesson and changed his ways, but he likely continued bumbling around, behaving dishonestly, losing the respect of his peers along the way. Perhaps even that of family and friends. Perhaps he miscounts the points in a boardgame, or cheats on his spouse, but it won't be limited to this.
Contrast that to the life and career arc of the OP himself.
So, no, the fools are those who steal, caught or not.
Once, someone tried to refute my argument using a post by a ghost user from github.
That ghost user was an older account of mine. Their interpretation of the post was wrong though.
I reacted to the situation by laughing slightly but didn't even bother explaining why. Because of the poor tone, I decided to let that person enjoy being wrong.
A very similar thing happened to me when I was pitching Gmail on anti-spam and realized they were running a forked version of Vipul’s Razor and claiming it was their own. They never contributed anything back... it’s unfortunately not uncommon.
As much as I think RMS himself is an utter embarrassment of a human being, stories like this are why I also believe the GPL is one of the most important contributions to computing, ever.
MIT/BSD-style licenses are practically begging large billion-dollar corporations to rip off your work wholesale and use it to generate profits while contributing only the occasional patch or two. I used to see this phenomenon on HN, where every time some distro had a new release the BSD folks would be in here reminding us that BSD runs Netflix and routers and Playstations and won't we please just donate? As if Sony and Netflix value these projects enough to use them for critical infrastructure but not enough to keep them financially solvent.
(The GPL is of course not a panacea; as TFA demonstrates, Sun would have got away with this, possibly forever, had the author not made his serendipitous discovery)
In this specific case, the files were under the CDDL, which is copyleft. The only thing Sun did that was not allowed by the license was to remove the copyright notice. Had the scripts been under the GPL, nothing would have been different. Anyone can sell GPL software, as long as they provide access to the source code and allow further modification and redistribution.
I think the GPL and FSF etc lost momentum with their GPLv3 push. I know that was the case for me. I really liked GPL (v2). Then GPLv3 came out and I was like huh? After that I became much more open to the lighter versions of things MIT / BSD.
GPLv3 is not really workable in terms of preserving developer freedom to do what they want with code (as long as they share their code back).
Obviously the powers that be disagreed and the GPL has been "upgraded" to v3, but I was never impressed, and I'm now much happier to contribute to MIT/BSD-licensed products (which do allow you, as the developer, to do what you want with the code).
I think the FSF appeared at a critical juncture and that the tools they built, that were totally free, helped to fix the direction that software was going. With the rise of Linux as the working GNU kernel an entire generation (or two) became aware and appreciative of open source which then went on to become the backbone of most of the internet and now those companies are some of (if not the) largest contributors to open source.
With that said you’re right, with GPLv3 they moved the fight into new domains that aren’t as obvious or really “as big of a deal” to a majority of people. Also with the rise of things like JavaScript GNU, under Stallman, became the old man crying at the children. Stallman hated the rise of “non-trivial” JavaScript and refuses to work with proprietary code so GNU could never have developed e.g. React or Tensorflow. We now have a new generation of open source tooling that was developed more in spite of the FSF than by it.
The current environment will in turn spark a new generation of tooling down the line that is even more removed from the likes of Stallman; he once responded to a request I sent about working on a JS library advertised on the GNU website by telling me only that I should call it F/LOSS instead of open source (and nothing else).
So while I respect the FSF for the work they did in creating what we have today, the time where they were leading the fight to save software is long passed. They won in some ways and the world is better for it, but they lost in others and I’m not sure the world is that worse because of it. Having trade secrets isn’t completely a bad thing as it allows competition and different implementations, though that part is simply my opinion I suppose.
I remember RMS saying that the license was based on these freedoms he enumerated, but after releasing software under GPL v2 he found out he had to explicitly add the freedom to actually RUN the software.
Remember that the idea of the GPL is the authors preserve and propagate the rights for the USERS.
He also once said "proprietary software subjugates people", which I thought was sort of over-the-top to say, but over time I think software in this era of dark patterns and privacy has unfortunately become very obvious.
The irony is the GPLv3 does more to address some of the problems we have with open source than most other licenses, but was rejected by many due to a lack of concern for these issues.
"The auto-update clause is optional" -> Yes, I confirm this is true.
The sample license header here [1] says:
> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
That is only a suggested header.
From Linux kernel here [2], you can see:
> under the terms of the GNU General Public License version 2 only
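To make the "optional auto-update" point concrete, here is a minimal sketch (hypothetical file headers written as Go comments; not the kernel's actual header text beyond the wording quoted above) of how a project keeps or drops the "any later version" clause:

    // Variant A: the FSF's suggested notice. The "(at your option) any later
    // version" wording is what lets recipients move the code to GPLv3 or later.
    //
    //   ...under the terms of the GNU General Public License as published by
    //   the Free Software Foundation; either version 2 of the License, or
    //   (at your option) any later version.
    //
    // Variant B: a "version 2 only" notice, in the spirit of the Linux kernel's
    // wording, which omits that option so the code stays under GPLv2 terms.
    //
    //   ...under the terms of the GNU General Public License, version 2 only,
    //   as published by the Free Software Foundation.
    package licensing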
Please try to refrain from personal attacks on people. It’s very disrespectful, especially when someone has clearly dedicated so much of their time to public works.
Where does this expectation come from that a rich organization or person using open source software should be paying for it? That’s against the entire spirit of the open source license in the first place.
From the same place taxes come, or more generally, quite a lot of social obligations. It's the expectation that, if you're benefiting from commons, you should also contribute something back to the commons.
A company using open source software to make money is making money off commons. Makes sense they should feel obliged to contribute something back, and since they have the surplus of the best form of contribution - money - it's reasonable to expect them to donate some of it.
Open source isn't a commons. That's your major mistake. Using it does not deprive others of anything or wear anything down.
> it's reasonable to expect them to donate some of it.
No, that’s actually quite ridiculous. “Reasonable” implies some level of reasoning behind it. There is no “reasonable” proposal of how much money should be given when it’s against the very spirit of the license to expect payment based on usage.
What percentage of profit should an individual or corporation contribute? Give me a concrete calculation of software usage and how much should go back to the project. Is it measured as the percentage of clock cycles spent executing that code across all of an entity's compute?
Presumably the IRS should also give a cut of all tax revenue collected by the US to the open source projects it uses too, right? If not, your beef seems to be purely with private enterprise being successful more than any fairness based billing.
The only great thing I get from the story: at least a multi-billion-dollar company thought your tools were so amazing that they were worth paying their own VIP to take on a world tour.
And no wonder Sun failed to compete. A cultural and management failure.
It's frustrating at times; at other times, it's a breath of fresh air to deal with engineers who just tell you what they're doing, without the overblown sales pitch.
Why do you find it cringe? In NZ I think it's even more true than Australia. It's a tiny, isolated country with a small population that isn't the hub of anything really.
Aussie and NZ compared to the US are completely different worlds in my eyes.
See, at least you realize you are a tiny country with a small population, and your ego has adapted to that.
I am French and we are a country with a great past - but not looking ahead. We still assume that we radiate across the world and that our voice counts.
It does not, and this has to be clear. There are countries that can influence events, but ours is not one of them.
A typical example is how angry we were after Lukashenko's recent act of piracy (Belarus forced an EU plane flying from one EU country to another to land while over its territory, in order to arrest an opponent of theirs). Our president said that there would be consequences, and there have been none.
This is not to spit on my country, but sometimes egos are much bigger than the reality. We are in good company there.
I am British but have lived in [redacted] for a while now and I feel the same as you about [redacted].
If Brexit showed us anything, it is that the UK is nowhere near as important as they like to think they are. Watching the UK and [redacted] at loggerheads has been a mix of frustration and amusement for someone like myself, clearly caught in the crossfire.
As an "outsider" in [redacted] my biggest complaint is the staunch opposition to change. Any change. As you say they simply cannot look ahead. It is as if they only know how to live in their past glories rather than working towards future ones.
The UK is sort of the opposite, but in a terribly executed manner. They have dreams of the future but do everything possible to make those dreams harder to achieve, due to the arrogance that they can "do it alone". Harking back to "the good old days of the Empire" and "Blitz spirit!" as if the Blitz was some wonderful time (wtf?).
The sad thing is the [redacted] could learn a lot from each other, but it seems both sides are too myopic to do so.
> I am French and we are a country with a great past - but not looking ahead.
I feel the same as you about both the British and the French.
Add India to the list.
The right wing in India is obsessed with our past and, worse, desperate to associate everything about it with "Hindu religion" or "Hindu culture", despite the huge influence of Buddhism, Islam and the imperialists (largely the British, who finally got the upper hand on our sub-continent).
"We were the richest country in the world till we were looted by Muslims and Christians."
"We had brilliant Hindu brahmin scientists who excelled in mathematics, medicine and astronomy / astrology.
"Look at these huge ancient temples built with extraordinary artistry that have survived for centuries."
These, and so on, are offered as proof of our "great past" that they believe should automatically earn us the respect of the world.
In their obsession with the past, they totally disregard the achievements of modern, independent India, just because our freedom movement was led by people like Gandhi and Nehru who opposed their idea of a theocratic-fascist state, and instead chose to create a secular state that treated everyone as an equal and gave every citizen equal rights.
For a country that won its independence in a non-violent manner from the most powerful empire in the world, that has lifted millions of its citizens out of poverty and is today self-sufficient in agriculture (one of the largest producers in the world), and that is one of the few countries with an active and self-sufficient space, nuclear and defence program, we have really made a lot of strides.
But for the right, India is not "respected" by the world because anti-Hindus chose secularism; and since "we Hindus" don't respect our "Hinduness", the world also ignores Hindu cultural achievement and denies us our true place.
Have you seen the coverage of the interview of Roman Protasevich? The visible injuries on his wrist and his recently adjusted attitude are alarming.
https://www.bbc.com/news/world-europe-57353413
TV/film - is there something new even remotely popular across the world that's French?
It was the case up to the 2000's, I think, but I can't even remember the name of a new French director since then. The last one I remember is Luc Besson. Kind of similar story for actors/actresses, are there some major French stars popular across the world?
I feel most of the influences are leftovers from a different era, folks like Depardieu.
> TV/film - is there something new even remotely popular across the world that's French?
Indian here who doesn't know French - just finished watching all four seasons of Dix Pour Cent (Call My Agent) and really enjoyed it. I also watched Lupin (after learning that it stars Omar Sy who I loved in the movie The Intouchables).
I think there are lower expectations of Aus/NZ because they're relatively small and out-of-the-way countries (Maybe less so for Aussie), so it's surprising how often they seem to excel in things.
The obvious examples that come to mind would be sports. NZ excels in Sailing, Cricket and Rugby against much larger countries.
Though it could be that in general, Aus/NZ have the same skill distribution as other countries, but they just stand out more because there are lower expectations or highly skilled people are rarer overall due to lower populations.
> The obvious examples that come to mind would be sports. NZ excels in Sailing, Cricket and Rugby against much larger countries.
Those are niche sports, very popular in that part of the world. They're not major sports, though.
Cricket is sort of a major sport, but it's also very culturally concentrated. Outside of UK & some former UK colonies, almost nobody plays it/watches it.
Does New Zealand have any famous footballers, basketball players, athletes, etc.?
It's not a tiny country... it's about average for country size and when you consider that it's an archipelago, the area of the world it controls through territorials waters and such is basically continent sized. New Zealand is actually smack-bang (#75) in the middle of the list: https://en.wikipedia.org/wiki/List_of_countries_and_dependen...
2. Large countries: anything below 1 up to probably about 500k sqkm.
3. Middle of the pack countries: everything from 500k down to about 100-200k sqkm.
4. Small countries. Everything below 100k sqkm (200k sqkm if you want to stretch it out).
The categories are somewhat fluid since for example Indonesia is not continent sized in landmass, but it's an archipelago that does stretch over the area of an entire continent, when you consider it end-to-end and include its territorial waters. Plus having a very high population for your group also moves you up. Germany is average in size but in population it's a large country. Same for Japan.
Have you ever been to the UK? That sort of mindset is widespread. In fact, there is basically an obligation that the country must, at all times, fight wildly above its level.
OK, maybe I'm too old or experienced with this type of thing to really enjoy this article. The author may honestly have been sincere as he wrote it, but I felt it was a bit overblown, coming from the negative feelings of being brushed off and, rightfully so, of being upset that his code was stolen. It could also just be that there are some cultural misunderstandings at play.
He mentioned this "VIP" is a "Developer and dtrace expert". But reading that and the other details, I think this is probably not the reality and maybe was communicated incorrectly to him. I really doubt this guy was a "VIP" as he says.
My guess is this "VIP" was actually a pretty normal member on the dtrace project, could be a little senior and got the opportunity to go around and talk about it. I am sure they had a team somewhere who put together most of the software, maybe he was involved a little bit, but probably he was just as confused as everyone else about using that open source software - he probably knew enough to teach it, and how it worked, but so many people work on these type of projects, unless they sent the lead engineer he probably didn't know it deeply except enough to evangelize and teach how it works.
He mentions being slighted by this guy a lot, saying things like "He wasn't impressed", "gave me a look like he didn't really believe me", etc. This might be true, but I suspect it's coming from his negative interpretation of the situation. This guy had just traveled all the way around the world, was super exhausted, and was possibly honestly confused about what was going on - I certainly have been in that situation before.
The author also mentions he felt it odd that he (the author) was producing more dtrace tools than Sun was. This almost sounds a bit like indirect boasting. Large companies are slow. A dedicated passionate developer who is working alone or with a small team will always run laps around huge companies. This isn't odd at all. Companies often get distracted, can't focus on what's important, or decide not to do what is important for a product due to other business reasons.
In fact, as he found out, some engineer somewhere just ripped off his stuff because it was faster and easier for them. Sun's team was not professional at all, possibly even breaking the law, which I think is the point of the article, but the descriptions of the DTrace guy whose job was to show DTrace around the world lessened my enjoyment of it.
I have said this elsewhere on this thread, but just to reemphasize: the person that Brendan met had absolutely nothing to do with DTrace -- to the point that when he told this story to me, I didn't even recognize the name. (And can't now remember it.) The DTrace team was very small (there were three of us), and the community of early DTrace users inside of Sun -- the earliest folks who could rightfully call themselves DTrace experts -- can be seen in the acknowledgements section of our 2004 USENIX paper.[0]
I am not saying it's factually incorrect. My point is that it includes a lot of Gregg's personal feelings (and maybe he was informed incorrectly about the situation), and I'm just not sold that the guy who was assigned to show off DTrace was the bad guy here.
In the article I included my guess about the real cause for this: Sun's assumption that any good work had to be from a Sun employee. I'd guess the sequence of events was:
- DTrace is the new hotness, we need it in our UI.
- Everyone's using Brendan's tools, let's add them (so far, so good).
- Oh, why do they say copyright Brendan? He made a mistake: Sun employees should be putting copyright Sun on them. (THIS is the mistake, as I wasn't a Sun employee).
- I'll just delete his name and stick copyright Sun on them all.
- Developer gets picked to go do a world tour (and may genuinely not know what happened).
As for how I was treated: I guessed why in the article as well, the low-key introduction as is the norm in Australia.
As for how you were treated - I don't think the low-key introduction can be fully blamed. The VIP should have known that smart people exist in various places around the world, and sooner or later one does bump into them. When you meet someone knowing absolutely nothing about them (and an introduction doesn't count), and then they start talking intelligently about a topic, then you have one data point (that they have talked intelligently), and you should draw an appropriate conclusion from that. It sounds like the VIP had serious preconception issues.
I worked for a government research lab and it was the same, only work coming from inside the lab was respected and contractor work was looked down upon.
That's really interesting, since you were so close to Sun they actually thought you were a Sun employee!
The example in the article is an exception, but companies (including Sun) are generally very careful about using open source, and using it without attribution would be the exception, not the rule.
Large companies are slow, indeed. In the late 90s I wrote a few operating system plugins (nss_ldap, pam_ldap, GSS SASL plugin for the Netscape directory server) which were eventually obsoleted by native Solaris equivalents. The Sun versions were on the whole better engineered, if less flexible, because their OS team had a depth of experience that I didn't have at the time.
How is the information in this blog related to the Wikipedia article on DTrace [1], where one can read "DTrace is ... originally created by Sun Microsystems" and "Original author(s): Bryan Cantrill, Adam Leventhal, Mike Shapiro (Sun Microsystems)"?
Quite simply, it is not. The author is not claiming to have written DTrace, but rather, tools that made use of DTrace. From the introduction: "Sun Microsystems had just released DTrace" and "I was busy writing and publishing advanced performance tools using DTrace".
Sun developed DTrace the kernel building blocks, Brendan Gregg became an expert on it and made scripts that actually did useful stuff with them. The VIP was selling a GUI around Brendan Gregg’s scripts.
Brendan was the most amazing and prolific user of DTrace, from very early on. Brendan did not create DTrace, but in a sense he "made DTrace" what it is. And not just DTrace, but eBPF.
What do you mean? Brendan didn't create DTrace, he created DTraceToolkit and this VIP took his work and presented it as his own new DTrace-based product.