You can get notable extra compression by wrapping your already statically linked Go binary in a simple 7z self-extracting PE file. In some cases this saves another 10-50% of binary size. If fast cold-start time matters in your environment, I would skip it: the self-extractor has to unpack the binary before your app can start, which adds overhead you don't otherwise pay. If you need something smaller than that, you're looking at writing C and building for your target architecture.
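Before bothering with an SFX wrapper, it can help to get a rough idea of how compressible your binary is. Go's standard library has no 7z/LZMA encoder, so the sketch below uses `compress/gzip` (DEFLATE) purely as a stand-in. The real 7z ratio will differ (LZMA generally compresses tighter than DEFLATE), and the estimate ignores the size of the self-extractor stub itself. The program and function names are hypothetical.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"os"
)

// estimateRatio returns compressedSize/originalSize for data, using gzip at
// maximum compression as a rough stand-in for 7z's LZMA.
func estimateRatio(data []byte) (float64, error) {
	if len(data) == 0 {
		return 0, errors.New("empty input")
	}
	var buf bytes.Buffer
	zw, err := gzip.NewWriterLevel(&buf, gzip.BestCompression)
	if err != nil {
		return 0, err
	}
	if _, err := zw.Write(data); err != nil {
		return 0, err
	}
	// Close flushes the remaining compressed data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return float64(buf.Len()) / float64(len(data)), nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: estimate <path-to-binary>")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	ratio, err := estimateRatio(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("original: %d bytes, gzip: %.0f%% of original (~%.0f%% smaller)\n",
		len(data), ratio*100, (1-ratio)*100)
}
```

Run it as something like `go run estimate.go ./myapp`. If even gzip shrinks the binary substantially, a 7z SFX is likely worth trying; if it barely moves, the startup cost probably isn't worth it.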