It is totally awesome that function similarity is seeing the light of day. Google bought Zynamics way back when and their work has evolved a lot since then.
This sort of work has big implications for signature generation for malware samples: for clustering families of samples as well as for finding common functions to generate detections on. You couldn't necessarily throw this into a detection engine, because we don't have a fast (dedicated) function recovery tool for binaries, but you can absolutely use it to generate byte-based detections from a seed of a few samples.
Rather than having hash-based signatures, you could generate signatures that cover many samples (and likely new ones) in bulk. Normally a good signature like that requires manual effort from an analyst; this is a step toward machines doing it. As well, a central authoritative name database that could say "this is Petya" could force some sane naming convention on the industry (every AV wouldn't just be like "it's Zbot lol").
This stuff can even aid manual reverse engineering. You could build a function naming database that uses this. Maybe a new engine for Talos FIRST. [1] Then if you opened up a file without debug symbols this could match it to known functions and really speed up reverse engineering efforts.
I look forward to reading it in more detail tomorrow. Thanks to Halvar for putting this out.
> An efficient implementation of a hash function (based on SimHashing) which calculates a 128-bit hash from disassembled functions - and which preserves similarity (e.g. “distance” of two functions can be calculated by simply calculating the hamming distance between two hashes - which translates to two XOR and two POPCNT instructions on x64).
Huh! This obviously desirable property of a good hash function for this application is one of the classic undesirable properties of a good hash function for cryptography. I don't think that I was previously familiar with this sort of hash function, very cool.
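For anyone curious how cheap that comparison is, here's a rough sketch (plain Python, not the FunctionSimSearch API) of the distance computation the quote describes, with the 128-bit hash split into two 64-bit words:

    def hamming_distance_128(a_hi, a_lo, b_hi, b_lo):
        # Number of differing bits between two 128-bit hashes stored as two
        # 64-bit words each; on x64 this boils down to two XORs and two POPCNTs.
        # int.bit_count() needs Python 3.10+; older versions can use bin(x).count("1").
        return (a_hi ^ b_hi).bit_count() + (a_lo ^ b_lo).bit_count()

    # Similar functions give a small distance; unrelated ones land near 64.
    print(hamming_distance_128(0xDEADBEEF, 0x1234, 0xDEADBEEC, 0x1235))  # 3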
Are space-filling curves related to these hash functions?
Not all hashes are intended for cryptographic applications or want the Avalanche effect. Some are simply succinct digests of data and some attempt to retain important information like the hash described here.
Another example of this sort of hash function (where the hamming distance between the hashes of similar inputs will be small) is pHash[1]. I've been playing with it recently for deduping bit-different but perceptually-identical images, and it works pretty well and the low hamming distance property makes it dead simple to use.
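To make the dedup step concrete, here's a hypothetical sketch in Python of what I mean. It assumes the 64-bit perceptual hashes have already been computed (by pHash or similar), and the threshold of 10 is just an arbitrary pick for illustration, not anything pHash prescribes:

    def group_near_duplicates(hashes, threshold=10):
        # hashes: {filename: 64-bit perceptual hash}; greedily group files whose
        # hashes are within `threshold` bits of a group's representative.
        groups = []  # list of (representative_hash, [filenames])
        for name, h in hashes.items():
            for rep, members in groups:
                if bin(h ^ rep).count("1") <= threshold:  # perceptually close enough
                    members.append(name)
                    break
            else:
                groups.append((h, [name]))  # no close group found, start a new one
        return [members for _, members in groups]

    # "b.jpg" differs from "a.jpg" by one bit, so they end up in the same group.
    print(group_near_duplicates({"a.jpg": 0xF0F0, "b.jpg": 0xF0F1, "c.jpg": 0x0A0A}))
    # [['a.jpg', 'b.jpg'], ['c.jpg']]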
Space-filling curves would work in some situations, but wouldn't be as efficient as a purpose-designed hash. A space-filling curve, for example, would break down at 50+ dimensions, which is why you'd be using one of these hashes in the first place rather than a tree.
If you want to know more, I'd suggest researching Locality Sensitive Hashing. Flann[1] has a decent implementation.
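If it helps, here's a toy SimHash in Python to show the general LSH idea: each feature votes a pseudo-random ±1 per bit position and the majority sign becomes that bit, so inputs sharing most features agree on most bits. The word-token features here are purely an illustration; FunctionSimSearch extracts its features from disassembled functions instead.

    import hashlib

    def simhash(features, bits=64):
        counts = [0] * bits
        for feat in features:
            # Stable pseudo-random bit pattern per feature (md5 used as a cheap PRF).
            digest = int.from_bytes(hashlib.md5(feat.encode()).digest(), "big")
            for i in range(bits):
                counts[i] += 1 if (digest >> i) & 1 else -1
        # Majority vote per bit position.
        return sum(1 << i for i in range(bits) if counts[i] > 0)

    a = simhash("mov push call ret mov add".split())
    b = simhash("mov push call ret mov sub".split())
    # Distance is well below the ~32 expected for unrelated 64-bit hashes.
    print(bin(a ^ b).count("1"))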
Pardon me if I'm not well informed; that's why I ask. Is the problem similar to string search with multiple patterns? Could the Rabin-Karp algorithm be used for searching? It also uses hashing. Or could this algorithm be used instead of Rabin-Karp?
You can use bad hashing functions for a lot of applications. Take the traditional interview question of finding all anagrams in a list of words: you can sort the letters of each word as the hashing function and use a traditional hashmap that does conflict resolution with a linked list as the solution. Fun stuff
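A quick sketch of that trick, using Python's dict in place of the hand-rolled hashmap:

    from collections import defaultdict

    def group_anagrams(words):
        buckets = defaultdict(list)
        for word in words:
            # Sorted letters as the (deliberately collision-prone) hash key:
            # all anagrams of a word produce the same key and share a bucket.
            buckets["".join(sorted(word))].append(word)
        return list(buckets.values())

    print(group_anagrams(["listen", "silent", "enlist", "google", "banana"]))
    # [['listen', 'silent', 'enlist'], ['google'], ['banana']]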
Bad isn't the right word here. It's a bad cryptographic hash, sure, but a hash doesn't necessarily need to be cryptographic to be a hash. pHash for finding similar images; soundex/metaphone for similar English pronunciation; CRCs, CityHash, SipHash, &c for fast checksums and hashtables; and locality-sensitive hashes like SimHash for "close matches" in spam filtering and content aggregation all have a long history of being good and useful hashes without being cryptographically secure.
The zlib vulnerability was a huge wakeup call. Years ago I used to be in favor of static linking, to limit dependency creep. Disk space is cheap, so large binaries were never a concern.
My favorite approach was a single repository compiling optimized binaries with the specific versions of the libs I wanted (in case of weird regressions), then pushing these static binaries to the rest of the network.
After the zlib debacle, no more: I only use that approach for very specific mission-critical tools, where I do not trust Ansible or even Linux distributions.
The SQLite 0-day may have reignited the same fears in those too young to remember grepping for the various zlib signatures in binaries - not just your own (your central repository can easily push new versions to your network) but in the other tools you don't necessarily control.
Unfortunately, outside of Linux distros, static linking seems to be becoming more and more the new norm, especially if you include what is effectively "soft" static linking: bundling all dependencies, which suffers from the same problems. Rust is only suitable for static linking, .NET Core statically links and most .NET projects will have all sorts of out-of-date dependencies, and npm bundles huge dependency chains that may not be upgraded. Even in the Linux distro world there is movement toward tools like Docker, where dependencies will often not be patched.
Even the idea of stable versions of libraries with security patches seems to be a dying one.
It supports it terribly. Since there's no stable ABI you have to use the same compiler version. That or be limited by the C ABI and give up most of the safety features.
This isn’t unique to Rust, C++ has the same issues when it comes to using shared libraries. You either carefully control compiling and shared library versions or you drop down to C.
I don’t know any native language that can do this in a better way.
I feel like it's pretty easy to understand the hate. For example, the recent X.509 verification DoS patched in Go 1.11.3 required me to recompile, tag, and redeploy a bunch of services. And then also push an RPM to thousands of customer servers, which costs me CDN money :(.
I would have preferred for myself and my customers to just upgrade OpenSSL.
I am still a big fan of static binaries though, they more than make up for their downsides.
Where did I say I hated it? It just has different security characteristics, and I think static linking proponents are sometimes a bit hand-wavy about the implications of those.
I suppose my gut feeling is that it's more prone to issues of human error - why should I have to recompile my application if the security hole was found in a library I've linked to? With dynamic linking, assuming you're publishing your software for an OS with a conscientious approach to security patches (i.e. pretty much all the major ones), it's a solved problem. With static linking, if you forget to update your binary, well, it's your problem.
Very specific tools, sure. If you have the time, people, and skills to keep your own list of baked-in dependencies and monitor it for issues - great. But for a random person downloading something from the internet, it's a dangerous thing to default to.
Even FAANG companies rely on distributions and other companies to spot vulnerabilities, rebuild libraries, test and validate them.
Even if the company rebuilds everything, there's a huge benefit in knowing that you are using a well tested release instead of a less popular one or an internal fork.
Dynamic linking is also really bad for security. There exist whole classes of bugs that require it.
Library substitution is a big problem. If an attacker can get a library into a place where the executable looks for libraries, the attacker gains control.
Merely having the capability to load a dynamic library is an issue. Generally, this means it is possible to load code into the process.
Library ABI mismatches can be security bugs. There can exist two pieces of software that can be installed and used separately without trouble, but which have security bugs when both installed. This happens for example when the second piece of software to be installed brings along an updated library that isn't fully compatible (even bug-for-bug) with the one that the other software came with.
Hah, no. AV databases are just real dumb tries of whole-file MD5s, or a similar level of sophistication (i.e., not very). Usually combined with an unnecessarily privileged and unsandboxed parser for arbitrary weird file formats.
The article makes more sense if you consider that Google is notorious for statically linking (almost) all its production binaries, something that the Go toolchain supported early on for related reasons.
I’m guessing it’s also related to the recent SQLite zero-day. SQLite is public-domain, and so it’s not unusual for projects to simply include the code directly in their source distributions, which will either build it in or will provide a `./configure` option to use an already-built form (this is used by distro packagers). When the “build it in” option is used, then it’s statically linked.
I think Thomas Dullien has been working on this stuff for a very long time; his company, Zynamics, which Google acquired ages ago, was the author of BinDiff.
There are many things I disagree with Google about, but statically linking binaries is a good practice if you know what you are doing and have a central repository where you handle your own versions of everything.
You cannot always rely on a Linux distribution not to mess with a lib, especially if it's "obscure" with few users.
1. https://www.talosintelligence.com/first