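It's almost never worth using binary packers: whatever you save in on-disk size is paid back in extra memory usage and extra CPU time at startup. Packing also works against the kernel's memory management. An ordinary binary's code is mapped read-only straight from the file, so under memory pressure its pages can simply be dropped and re-read from disk, and it takes up no writable, swappable memory. A packed binary has to decompress itself into anonymous writable memory, which can only be reclaimed by writing it out to swap.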
You can make notable compression gains by wrapping your already statically linked Go binary in a simple 7z self-extracting PE executable. In some cases this yields a further 10-50% reduction in file size. Where fresh-start latency matters, I would skip it, because the extraction step adds overhead before your app comes to life. If you need something smaller than that, you're looking at writing C and building for your target architecture.