My most horrible abuse of `make` was to write a batch job runner.
Most of the targets in the Makefile had a command to kick off the job and wait for it to finish (this was done with a Python script, since kicking off a job involved telling another application to run it), followed by a `touch $@` so that make would know which jobs it had successfully run. If a process had dependencies, these were declared as you'd expect.
The other targets in the Makefile lashed those together into groups of processes, all the way up to individual days and times. So "monday-9pm" might run "daily-batch", "daily-batch" would have "daily-batch-part-1" (etc), and each "daily-batch-part-..." would list individual jobs.
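For the curious, a minimal sketch of what one of these Makefiles could look like. The job names and the `run-job.py` wrapper are made up; only the kick-off-and-wait script, the `touch $@` stamps, and the grouping targets come from the description above (recipe lines are tab-indented, as make requires):

    .PHONY: monday-9pm daily-batch daily-batch-part-1

    # Leaf targets, one per job. The Python wrapper tells the other
    # application to run the job and polls until it finishes; the stamp
    # file left by `touch $@` tells make the job already ran successfully.
    job-a:
            ./run-job.py job-a
            touch $@

    job-b: job-a
            ./run-job.py job-b
            touch $@

    # Grouping targets: no commands of their own, just prerequisites.
    daily-batch-part-1: job-a job-b
    daily-batch: daily-batch-part-1   # ...-part-2, -part-3, etc.
    monday-9pm: daily-batch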
It was awful. It still is awful because it works so well that there's been no need to replace it. I keep having dreams of replacing it, but like they say there's nothing more permanent than a temporary solution.
All of this was inspired by someone who replaced the rc scripts in their init system with a Makefile in order to allow processes to start in parallel while keeping the dependencies in the right order.
> Airflow was started in October 2014 by Maxime Beauchemin at Airbnb. It was open source from the very first commit and officially brought under the Airbnb GitHub and announced in June 2015.
I believe I started building my tool somewhere around 2010, possibly 2011. The core mechanism has been completely unchanged in that time. If Airflow had been a thing at the time, I'd hopefully have looked into it. I looked at a handful of similar products and didn't find anything that was a good fit.
Based on a really quick skim of the Airflow docs it seems like it checks all of the boxes. Off the top of my head:
* LocalExecutor (with some degree of parallelism, assuming the dependencies are all declared properly) seems to do exactly what I want.
* I could write an Operator to handle the interaction with the system where the processes actually run. The existing Python script that does this interaction can probably get me 90% of the way there. Due to the nature of what I'm running, any job scheduler will have to tell the target system to do a thing then poll it to wait for the thing to be done. To do this without any custom code, I could just use BashOperator to call my existing script.
* It's written in Python, so the barrier to entry (for me) is fairly low.
* Converting the existing Makefile to an Airflow DAG is likely something that can be done automatically. We deliberately keep the Makefile very consistent, so a conversion program can take advantage of that.
I think my dream of replacing this might have new life!
There are a number of deficiencies with the current system that aren't showstoppers, but are pain points nonetheless. Off the top of my head:
* There's no reasonable way to do cross-batch dependencies (e.g., if process X in batch A fails, don't run process Y in batch B). I've got a few ideas on how I could add this in, but nothing has been implemented yet; a sketch of one possibility follows below.
* There's no easy way to visualize what's going on. Airflow has a Gantt view that looks very useful for this purpose, our business users would absolutely LOVE the task duration graph, and the visualization of the DAG looks really helpful too.
* Continuing a failed batch is a very manual process.
None of these are showstoppers because, as you said, this has been running fine for over a decade. These are mostly quality-of-life improvements.
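Purely as an illustration of that first point, and not something that exists in my system: within the stamp-file scheme, one conceivable way to express a cross-batch dependency is for batch B's job to list batch A's stamp file as a prerequisite. Every path and name here is hypothetical, and it assumes the batches can see each other's stamp directories:

    # In batch B's Makefile. process-y only runs if batch A's process-x
    # stamp exists; if it doesn't (say, because X failed), make has no rule
    # to build it and refuses to run process-y (run with -k to let the rest
    # of the batch continue).
    process-y: /var/batches/batch-a/stamps/process-x
            ./run-job.py process-y
            touch $@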
Ah, I understand. That makes sense. If you have business users, then it makes sense to go with something like Airflow because they do make it easier for less technical users to inspect jobs, kick them off, visualize them, etc. The UI makes all the difference for those use cases.
> All of this was inspired by someone who replaced the rc scripts in their init system with a Makefile in order to allow processes to start in parallel while keeping the dependencies in the right order.
Sometimes the most interesting thing is not the story itself, but the story behind the story.
This has my interest peaked. Is there anywhere else I can read about this?
Basically you can specify a `-j 24` (e.g.) option to make, and it will run as many as 24 build steps in parallel. If your Makefile is correct, that’s all you need.
Because make knows the dependency graph, it can correctly handle cases where some build steps have to be done serially, while others can be fully parallelized. E.g.,
    a: b;
    b: c;

in which builds of b and c are serial, versus

    x: y;
    x: z;

for which builds of y and z are parallel.
It’s quite a powerful capability and it feels great to see 24 build-steps start in parallel from one simple command.
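A toy example (my own illustration, nothing from an actual build) makes the effect easy to watch; the sleeps just stand in for real work:

    .PHONY: all a b c x y z

    all: a x y z

    # a -> b -> c must run serially because of the chain of dependencies...
    a: b
            @echo a; sleep 2
    b: c
            @echo b; sleep 2
    c:
            @echo c; sleep 2

    # ...while x, y and z have no ordering between them.
    x:
            @echo x; sleep 2
    y:
            @echo y; sleep 2
    z:
            @echo z; sleep 2

A plain `make` takes about twelve seconds; `make -j4` takes about six, because c, x, y and z start together while b and a still wait their turn.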
This was another thing that attracted me to `make` for this task. I figured that, as long as the dependencies were all declared properly, I should be able to run multiple jobs in parallel.
I didn't pursue this very far as there were other problems with doing it, but I'd like to pursue it again. The problems are all with the target system; `make` handles the parallel execution perfectly.
pique comes from (Vulgar) Latin piccare, which means "prick with a sword". The route to English is via French.
peak comes from Old English pīc, meaning just "peak" (e.g. of a mountain).
The two are completely different words that just sound similar.
pique may possibly go back to Proto-Germanic, and peak definitely does, but the two go back to two separate words (*pīkaz, *pikkāre), though both are related to sharp things and are possibly onomatopoeic.
There’s a reference to some sample Makefiles to start and stop some Linux services in parallel. It’s obviously not complete, but this (or something similar) was what inspired my system.
> All of this was inspired by someone who replaced the rc scripts in their init system with a Makefile in order to allow processes to start in parallel while keeping the dependencies in the right order.
Any sufficiently complicated init system contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of systemd.
I did the same: parallelization thanks to GNU make's `-j`, and recoverability (only the failed steps get rerun, not everything from scratch). If you use the remake fork of GNU make, you also get debugging and profiling for free.
My most horrible abuse of `make` was a distributed CI where I put a wrapper in the `MAKE` env var, so that recursive make executions would invoke my wrapper, which would enqueue jobs for remote workers to pick up.