Reverse Engineering Protobuf Definitions from Compiled Binaries (arkadiyt.com)
156 points by arkadiyt on March 9, 2024 | 12 comments


I've used protodump before, but have found that some applications don't embed the .proto data.

It may be bit-rotted by now, but a couple of years ago I hacked together a Python script that extracts a proto definition from Objective-C apps by scanning the assembly output and looking for the patterns of code generated by the protoc compiler. I put it on GitHub in case it is useful to somebody.

https://gist.github.com/dunhamsteve/224e26a7f56689c33cea4f0f...

Compiled Objective-C is a bit simpler than compiled C/C++: you can read the method invocations out of it. I haven't looked into how hard the output of Swift is to read.

I've also analyzed protobuf data (Apple Notes) by writing code that decodes the data in a generic fashion and outputs a guess at the schema. I would run it on about 100 samples to help distinguish binary data from sub-objects, to detect optional fields, and to detect 'repeated' fields. Then you have to go through and figure out what all of the fields are.

I succeeded, but later learned that the notes web app embedded the plain text .proto file, which would have made things a lot easier.
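
For anyone curious what that kind of generic decoding looks like, here is a minimal sketch, not the code I actually used. It relies only on the documented protobuf wire format (varint keys, wire types 0, 1, 2 and 5); the function names are made up for illustration, and groups and packed repeated fields are ignored.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift >= 70:
            raise ValueError("varint too long")


def iter_fields(buf: bytes) -> Iterator[Tuple[int, int, object]]:
    """Yield (field number, wire type, raw value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_no, wire_type = key >> 3, key & 7
        if field_no == 0:
            raise ValueError("field number 0 is invalid")
        if wire_type == 0:                      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type in (1, 5):               # fixed64 / fixed32
            end = pos + (8 if wire_type == 1 else 4)
            if end > len(buf):
                raise ValueError("truncated fixed-width field")
            value, pos = buf[pos:end], end
        elif wire_type == 2:                    # length-delimited
            length, pos = read_varint(buf, pos)
            end = pos + length
            if end > len(buf):
                raise ValueError("truncated length-delimited field")
            value, pos = buf[pos:end], end
        else:                                   # groups are not handled here
            raise ValueError("unsupported wire type")
        yield field_no, wire_type, value


def guess(value: bytes) -> str:
    """Guess what a length-delimited payload is: message, string or bytes."""
    if value:
        try:
            list(iter_fields(value))
            return "message"
        except ValueError:
            pass
    try:
        value.decode("utf-8")
        return "string"
    except UnicodeDecodeError:
        return "bytes"


def describe(buf: bytes) -> List[tuple]:
    """Decode one message into (field number, guessed kind, value) tuples."""
    names = {0: "varint", 1: "fixed64", 5: "fixed32"}
    out = []
    for field_no, wire_type, value in iter_fields(buf):
        if wire_type != 2:
            out.append((field_no, names[wire_type], value))
            continue
        kind = guess(value)
        if kind == "message":
            value = describe(value)
        elif kind == "string":
            value = value.decode("utf-8")
        out.append((field_no, kind, value))
    return out


def summarize(samples: List[bytes]) -> Dict[int, Dict[str, bool]]:
    """Across samples, flag fields seen more than once or missing somewhere."""
    counts = [Counter(f for f, _, _ in iter_fields(s)) for s in samples]
    fields = set().union(*counts)
    return {
        f: {"repeated": any(c[f] > 1 for c in counts),
            "optional": any(c[f] == 0 for c in counts)}
        for f in sorted(fields)
    }
```

The guess is only a guess: a short string such as b"hi" also parses as a valid message, which is exactly why running over many samples helps.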


Definitely a useful tool. Decoding protobuf (and message formats in general) can be such a pain and fun at the same time.

I’ve written ProtobufDecoder, which takes a different approach: it analyzes the structure of the actual messages to help you figure out the protobuf structure of a message.

https://github.com/sandermvanvliet/ProtobufDecoder


Back at Google there was a really nice extension to protobufs where servers had a side channel that let you query all the services on the endpoint along with their full proto descriptors. That's probably internal only though (but I haven't played with gRPC enough to know).


The reflection service is open-sourced (at least for some SDKs):

* https://github.com/grpc/grpc-go/blob/master/Documentation/se...

* https://chromium.googlesource.com/external/github.com/grpc/g...

You must explicitly enable it in code, and many people… well, just don't.


That was a Stubby CLI command, right? It was a special reflection proto service that was built into the C++ proto server. I think open source grpc_cli has it: https://grpc.github.io/grpc/core/md_doc_server_reflection_tu...


Yes, the grpc_cli tool uses essentially the same mechanism, except implemented as a gRPC service rather than as a Stubby service. The basic principle of both is implementing the C++ proto library's DescriptorDatabase interface with cached recursive queries of (usually) the server's compiled-in FileDescriptorProtos.

See also https://github.com/grpc/grpc/blob/master/doc/server-reflecti...

The primary difference between what gRPC does and what Stubby does is that gRPC uses a stream to ensure that the reflection requests all go to the same server, which avoids incompatible version skew and duplicate proto transmissions. With that said, in practice version skew is rarely a problem for grpc_cli-style "issue a single RPC" use cases. Even if requests do go to two or more different versions of a binary that might have incompatible proto graphs, it is very common for the request, the response, and the RPC to all be in the same proto file, so you only need to make one RPC in the first place, unless you're using an extension mechanism like proto2 extensions or google.protobuf.Any.
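
To make "cached recursive queries" concrete, here is a toy model in Python. It is a sketch of the idea only, not the gRPC or C++ protobuf API: the `fetch` callable is a hypothetical stand-in for one reflection round trip that returns a file's serialized descriptor plus the names of the files it imports, and the class name is invented.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for one reflection round trip: given a .proto file
# name, return (serialized descriptor bytes, names of imported files).
Fetch = Callable[[str], Tuple[bytes, List[str]]]


class CachingDescriptorResolver:
    """Resolve a file and its transitive imports, fetching each file once."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._cache: Dict[str, Tuple[bytes, List[str]]] = {}
        self.round_trips = 0  # how many times fetch was actually called

    def resolve(self, name: str) -> Dict[str, bytes]:
        """Return descriptors for name and everything it imports."""
        out: Dict[str, bytes] = {}
        self._visit(name, out)
        return out

    def _visit(self, name: str, out: Dict[str, bytes]) -> None:
        if name in out:          # already collected during this resolve call
            return
        if name not in self._cache:
            self.round_trips += 1
            self._cache[name] = self._fetch(name)
        data, deps = self._cache[name]
        out[name] = data
        for dep in deps:         # recurse into imports
            self._visit(dep, out)


# A fake "server" with three files, purely for illustration.
FAKE_FILES = {
    "svc.proto": (b"S", ["req.proto", "common.proto"]),
    "req.proto": (b"R", ["common.proto"]),
    "common.proto": (b"C", []),
}

resolver = CachingDescriptorResolver(lambda name: FAKE_FILES[name])
files = resolver.resolve("svc.proto")
```

When the request, response, and service all live in one file with no imports, a resolve like this costs a single round trip, which is the common case described above.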



Yup, .NET's gRPC tooling is one of the best, and somehow has lower ceremony than Go.


For at least 4 years protobuf has had decent support for self-describing messages (very similar to Avro) as well as reflection:

https://github.com/protocolbuffers/protobuf/blob/main/src/go...

Xgooglers trying to make do on the cheap will just create a union of all their messages and include the message def in a self-describing message pattern. Super-sensitive network I/O can elide the message def (empty buffer), and for any RecordIO clone, well, file compression takes care of the definition.

Definitely useful to be able to dig out old defs, but the protobuf maintainers have, surprisingly, added useful features so you don’t have to.

Bonus points tho for extracting the protobuf defs that e.g. Apple bakes into their binaries.


"At least 4 years"? I believe I originally wrote the header you linked to in 2007... maybe 2006. ;)


I have used this other tool to good effect to reverse engineer a file format based on Protobuf: https://github.com/mildsunrise/protobuf-inspector


Useful! Wish I had this when I started reverse engineering pbf map tiles a few months back.



