Self-Replace: A Utility For Self Replacing Executables (github.com/mitsuhiko)
58 points by asicsp on May 15, 2023 | hide | past | favorite | 26 comments


Two things:

1. Replacing the current executable is generally okay on Unix and Unix-likes, but it's not particularly well-defined. Linux handles it gracefully, others may ETXTBSY or similar depending on how the replacement is done.

2. The hack being used on Windows is awful: it involves patching the underlying C runtime, can fail in uncontrolled ways (including in an `atexit` handler), and is fundamentally racy. The state it leaves a failed operation in isn't self-healing, and surfaces confusing filesystem state to the user on failure.

If you want to replace a running executable on Windows, either ask the user (and have them affirmatively kill the running program) or do the "cooperative rename" trick[1]. It doesn't require any unsafety, much less patching the runtime underneath you.

[1]: https://social.msdn.microsoft.com/Forums/vstudio/en-US/07fb6...
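The graceful Unix path from point 1 is "write the new binary to a temporary file, then rename it over the old path": rename(2) is atomic on the same filesystem and sidesteps ETXTBSY, because the running process keeps its old inode mapped while the path now points at the new file. A minimal sketch in Rust (illustrative names, not this crate's actual code):

```rust
use std::fs;
use std::io;
use std::path::Path;

// Replace `target` by staging the new contents next to it and renaming over.
// Writing into the existing file in place would instead risk ETXTBSY on
// many Unixes if `target` is currently executing.
fn replace_file(target: &Path, new_contents: &[u8]) -> io::Result<()> {
    let dir = target.parent().unwrap_or(Path::new("."));
    let tmp = dir.join(".replace-tmp"); // hypothetical staging name
    fs::write(&tmp, new_contents)?;
    fs::rename(&tmp, target) // atomic swap of the path
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("replace_demo");
    fs::create_dir_all(&dir)?;
    let target = dir.join("app");
    fs::write(&target, b"old build")?;
    replace_file(&target, b"new build")?;
    assert_eq!(fs::read(&target)?, b"new build");
    Ok(())
}
```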


On #1, totally agree, and security systems can add an extra wrinkle...

With code-signed binaries on a Mac, macOS Gatekeeper caches signatures by inode. If the binary is replaced/re-written in a way which retains the old inode, and you try to execute it, the OS will SIGKILL the process immediately. Not fun to debug the first time you encounter it!
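The inode distinction is easy to observe on any Unix (the Gatekeeper SIGKILL itself is macOS-specific; this sketch just shows which replacement style reuses the inode): an in-place rewrite truncates the same inode, while write-new-then-rename swaps in a fresh one.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;

// Returns (inode before, inode after in-place rewrite, inode after rename-over).
fn inode_demo() -> io::Result<(u64, u64, u64)> {
    let dir = std::env::temp_dir().join("inode_demo"); // hypothetical path
    fs::create_dir_all(&dir)?;
    let bin = dir.join("app");

    fs::write(&bin, b"v1")?;
    let before = fs::metadata(&bin)?.ino();

    // In-place rewrite (open + truncate + write): the inode is reused,
    // which is the case that can trip a by-inode signature cache.
    fs::write(&bin, b"v2")?;
    let in_place = fs::metadata(&bin)?.ino();

    // Write-new-then-rename: the path now resolves to a fresh inode.
    let staged = dir.join("app.new");
    fs::write(&staged, b"v3")?;
    fs::rename(&staged, &bin)?;
    let renamed = fs::metadata(&bin)?.ino();

    Ok((before, in_place, renamed))
}

fn main() {
    let (before, in_place, renamed) = inode_demo().unwrap();
    assert_eq!(before, in_place); // same inode after in-place rewrite
    assert_ne!(before, renamed); // different inode after rename-over
}
```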


> it involves patching the underlying C runtime

Can you please clarify how this "patches the underlying C runtime"? What this crate is doing is not particularly out of the ordinary. You will find similar code in quite a few self updating windows executables.

> or do the "cooperative rename" trick[1]

I'm not entirely sure how this is much different? This is pretty much what this is doing. It spawns a copy of itself (with the exe being marked for self deletion) and then deletes the other executable.


> What this crate is doing is not particularly out of the ordinary.

"Out of the ordinary" and "good" are different qualifiers: patching a runtime initialization table might be ordinary (and even acceptable) in a leaf executable where the dependencies are carefully checked for conflicting initialization behavior, but it's pretty dangerous in a library (or crate) that might get mixed into all kinds of executables with different initialization expectations.

> This is pretty much what this is doing.

It's close, but you're relying on much racier and flimsier primitives. The trick that I'm most familiar with in Win32 applications is to create a named pipe or other IPC mechanism to communicate over, since the failure modes there are much better defined (and not subject to race conditions).


> patching a runtime initialization table might be ordinary (and even acceptable) in a leaf executable where the dependencies are carefully checked for conflicting initialization behavior, but it's pretty dangerous in a library (or crate) that might get mixed into all kinds of executables with different initialization expectations.

I'm not sure why that would be the case. Any C++ static constructor ends up in that table. Any Rust crate using #[ctor] ends up in there.

> It's close, but you're relying on much racier and flimsier primitives. The trick that I'm most familiar with in Win32 applications is to create a named pipe or other IPC mechanism to communicate over, since the failure modes there are much better defined (and not subject to race conditions).

As opposed to WaitForSingleObject? What exactly are you objecting to here? I'm also not sure which race condition you are referring to in particular. The main race this has (as does any other library that does something similar) is that it needs to spawn a utility process that inherits the handle of the copy/GC process. No named pipe will resolve this.


Thanks for the breakdown of why this is a bad idea. I was considering using this for a Windows Rust project I maintain, but now I know better. Ideally tradeoffs like these should be mentioned clearly in the readme.


I don't understand why people think this is a good idea to do at all. Yes, it makes updates easier, but every piece of software doing this has to run with the privileges to update itself, which is in my book a big security downside. The rights to install, update, and remove software should be completely separate from the rights to run and use it.

I admit this is not very convenient for private Windows users, but it is one of the strengths of Linux distributions.


Because Windows doesn't have a decent package management system.

The Linux way of "add a repo, upgrade every package from either the system repo or the app's own repo" is strictly superior, but Windows just doesn't have it (and thanks to that there is a multi-billion-dollar cottage industry filling those holes).


They're starting to get there with winget. It's not as comfortable as an actual Linux package manager, but it seems to be on the right track.


> But each software doing that has to run with the privileges to update itself. Which is in my book a big security downside.

But it does not have to run with those privileges under normal circumstances. It's easy enough to elevate privileges just for the update step.


A little-known fact about executables on Windows: while it's not possible to remove a running executable, it is possible to rename it.

I use this in Clyde [1]: on Windows, when Clyde needs to upgrade itself (which means uninstalling vN and installing vN+1), it renames itself from `clyde.exe` to `_clyde.exe`. I leave the old `_clyde.exe` around, but a nicer implementation could remove it on the next start.

[1]: https://github.com/agateau/clyde
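The rename-self trick above can be sketched roughly like this in Rust (not Clyde's actual code; on Windows the rename is allowed even while the executable is running):

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Move the (possibly running) executable out of the way by prefixing its
// file name with `_`, so the new version can be written at the old path.
fn rename_aside(exe: &Path) -> io::Result<PathBuf> {
    let name = exe
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "no file name"))?;
    let aside = exe.with_file_name(format!("_{}", name.to_string_lossy()));
    fs::rename(exe, &aside)?; // allowed on Windows even for a running .exe
    Ok(aside)
}

fn main() -> io::Result<()> {
    // A real updater would pass std::env::current_exe()?; a scratch file
    // keeps this sketch safe to run anywhere.
    let dir = std::env::temp_dir().join("clyde_demo");
    fs::create_dir_all(&dir)?;
    let exe = dir.join("clyde.exe");
    fs::write(&exe, b"vN")?;
    let aside = rename_aside(&exe)?;
    fs::write(&exe, b"vN+1")?; // install the new version at the old path
    assert!(aside.ends_with("_clyde.exe"));
    Ok(())
}
```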


Crazy idea, just make your EXE into a stub that loads your actual program, which is implemented in a DLL. When it's time to update, drop a newer DLL in there that the EXE will load instead. Hopefully you won't have to update the stub very often.


Or "launcher replaces the app, app updates the launcher".


This is interesting.

Google Chrome can patch binaries and the Linux kernel can be patched while running.

I would like this capability for general-purpose software: hot code reloading. You could make seamless, non-breaking changes to a live server without a full restart.

Would be useful for Function as a Service and WASM runtimes if you could live patch code running.

I generally rely on Kubernetes or CI to do a rolling deploy of software to switch to the new version, but if we could patch a server while running that would be interesting. It would have interesting interactions with socket connections using the old code, connections using the new code, and all the complicated states you can get into if your data structures are different.

How do you move data if you added a field to your data structure? Presumably if you added a field, there isn't enough memory allocated in the existing data structures... your malloc was for the wrong size.


> I would like this capability for general purpose software, hot code reloading. Can do seamless non-breaking changes to a live server without a full restart.

This is a very old-school style that fell out of favor for... whatever reason this industry likes to take two steps forward and then 1.9 steps back. It survives in the legacy of Smalltalk and Lisp. To a lesser degree, it's featured in Erlang, and I hear in C# these days.

> Would have interesting interactions with socket connections using the old code and connections using the new code and all the complicated states you can get into, if your data structures are different.

If you make a mental shift - treat the OS as the runtime and processes as functions (and the shell as a REPL, and environment variables as dynamic binding...) - then the ideas from Smalltalk and Lisp image-based programming apply directly. At the Kubernetes level, I suppose Erlang's take on connecting BEAM runtimes applies as well.

> How do you move data if you added a field to your data structure? Presumably if you added a field, there isn't enough memory allocates in the existing data structures... Your malloc is for the wrong size.

If this is C (or Rust, I suppose?), then you can't; you have to architect for it from the start. With garbage-collected runtimes... I'm not sure how hard it is with static typing, but with dynamically typed languages it's not bad.

If you want to see a system that embraces runtime modification of classes, check out Lisp (e.g. Common Lisp) or Smalltalk. In Common Lisp, for example, if you want to make a substantial runtime modification to some class definition, you supply an "old" -> "new" migration function, which is applied to all existing instances of the old class definition. Note: this is not a hack; it's a legitimate feature of the language/runtime and - like everything else - can be operated fully automatically.


The thing is, you have to test and debug that. Even if I'm working with Common Lisp, I don't necessarily want to go there.

Say there are five versions of your image out there, versioned 1 to 5. Users might be doing any of these upgrades:

        to
  from   1 2 3 4 5
     1     X X X X
     2       X X X
     3         X X
     4           X
Assuming nobody does downgrades, we have ten combinations to test.

Anything you do to reduce the testing comes at the cost of inconvenience to the user. You can support only upgrades by one; so someone going from 1 to 5 has to upgrade to 2, 3 and 4. You can cut off upgrade support for old versions; if a 1 user wants 6, they will have to suffer a stop and restart. That all just seems half-assed; you're not doing a perfect job of "no restarts ever", so why bother.

No-restart upgrades handle the volatile objects in RAM. Regardless of that, applications still have to handle the persistent data formats: objects written to storage by version 1, 2, 3 or 4 now have to be processed by 5.

People who understand and deal with upgrades to persistent data, and protocols, and whatnot, are not going to react kindly if you tell them, "hey, you can add to the complexity and amount of work by hot-loading code, and deal with upgrading across multiple versions of in-RAM volatile objects, too!"

Threads throw a monkey wrench into the works, too. If the application is multi-threaded, you will have some existing threads executing old versions of functions after they have been redefined, while other threads call the new versions. An old version of a method might not play nice with the new version of an object, and vice versa. You may need a partial stop and restart: the application image as such may keep running, but you put some threads into a quiescent state during the upgrade. Someone has to test all that.


> This is very old-school style, that fell out of favor for... whatever reason this industry likes to make two steps forward, and then 1.9 steps backwards. It survives in the legacy of Smalltalk and Lisp. To a lesser degree, it's featured in Erlang, and I hear C# these days.

The reason is very, very simple: It's hard to do right at any level of complexity.

Imagine a simple cache server: what about the cache? If the binary format changed, you either have to have some migration code or you have to throw the cache away. And once you've thrown the cache away, you might as well just restart the server; if you want availability you need 2+ servers anyway, so it buys you a few seconds of boot time at most.
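The "some migration code" option usually looks like tagging each cache record with a format version and migrating old records on read, instead of discarding the whole cache. A sketch in Rust (names and layout are made up for illustration):

```rust
// A cache record whose on-disk/in-memory layout has two historical versions.
#[derive(Debug, PartialEq)]
enum Record {
    V1 { key: String, value: String },
    V2 { key: String, value: String, ttl_secs: u64 },
}

// Migrate any old record to the current (V2) layout, filling the new field
// with a default. Code past this point only ever sees V2.
fn migrate(rec: Record) -> Record {
    match rec {
        Record::V1 { key, value } => Record::V2 { key, value, ttl_secs: 60 },
        v2 @ Record::V2 { .. } => v2,
    }
}

fn main() {
    let old = Record::V1 { key: "a".into(), value: "1".into() };
    let migrated = migrate(old);
    assert_eq!(
        migrated,
        Record::V2 { key: "a".into(), value: "1".into(), ttl_secs: 60 }
    );
}
```

This is exactly the shape of work the parent comment is pointing at: every field you add needs a default or a backfill, and every version you keep supporting adds a migration arm to maintain and test.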

If all you need to replace is a bunch of function calls, sure, that's doable. But what if the new function takes some extra arguments (or the struct it gets passed changes)?

If you have well-defined boundaries (like plugin APIs, or the ability to load the "logic" part as a separate thing) it can work, but otherwise it is a LOT of work and bug potential for not much gain. Better to focus on making updates seamless (for apps used directly by users), or, for "server" apps, to accept that you need redundancy anyway, code reload or not, and focus on a good deployment process.


When I had written my shell to the point where it could execute simple commands, this was the first thing that came to mind.

Execute the shell -> Modify the source code of the shell with vim -> Remake the shell -> Execute the newly built shell with the old shell

Just for fun :)
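The "old shell hands off to the newly built shell" step above is an exec() on Unix: on success the current process image is replaced by the new binary, so the call only ever returns an error. A sketch in Rust (the `./myshell-new` path is hypothetical):

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::Command;

// exec() replaces this process with `path` on success, keeping the same
// PID, open fds, and terminal; it only returns if the exec failed.
fn exec_replacement(path: &str) -> io::Error {
    Command::new(path).exec()
}

fn main() {
    // Hypothetical flow: after `make` succeeds, hand off to the new build.
    // Calling it on a missing path here just demonstrates the error return.
    let err = exec_replacement("./myshell-new");
    eprintln!("exec failed: {err}");
}
```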


The Windows implementation seems awfully complex. What's wrong with something like `_popen("find \"\" & del " + argv[0])`? The problem to solve is "we can't delete ourselves, so we need something else to do it for us", and there are a myriad of options to choose from; you don't have to write it in Rust. Hell, you've already got the dubious sleep(100) in there, so you could even write `_popen("ping 127.0.0.1 & del " + argv[0])`, as suggested by Stack Overflow.


I tried plenty of approaches on Windows; this was the one that seemed the most reliable and least problematic. It's also the approach other systems already use.


Not a Windows programmer, but doesn't Windows prevent a file from being deleted while it is being used/executed (at least without black magic)?


A stupid trick would be to copy the executable to a temporary file and run that file, then update the original, now-unused one. Though that might be weird enough to trigger AV.


How does it relate to KSplice [1], from a technical point of view?

The first time I heard about it was when Jessica McKellar [2] explained what it is and what it does, in a video right before KSplice, Inc. was acquired by Oracle.

[1] https://en.wikipedia.org/wiki/Ksplice

[2] https://en.wikipedia.org/wiki/Jessica_McKellar


It has an interface and two implementations, UNIX and Windows.


Is there anything like this for C/C++ applications?


Self-modifying code.



