My most horrible abuse of `make` was to write a batch job runner.
Most of the targets in the Makefile had a command to kick off the job and wait for it to finish (this was done with a Python script, since kicking off a job involved telling another application to run it), followed by a `touch $@` so that make would know which jobs it had successfully run. If a process had dependencies, these were declared as you'd expect.
The other targets in the Makefile lashed those together into groups of processes, all the way up to individual days and times. So "monday-9pm" might run "daily-batch", "daily-batch" would have "daily-batch-part-1" (etc), and each "daily-batch-part-..." would list individual jobs.
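For the curious, a minimal sketch of what one of these Makefiles could look like. The job names and the `run-job.py` wrapper are made up; only the kick-off-and-wait script, the `touch $@` stamps, and the grouping targets come from the description above (recipe lines are tab-indented, as make requires):

    .PHONY: monday-9pm daily-batch daily-batch-part-1

    # Leaf targets, one per job. The Python wrapper tells the other
    # application to run the job and polls until it finishes; the stamp
    # file left by `touch $@` tells make the job already ran successfully.
    job-a:
            ./run-job.py job-a
            touch $@

    job-b: job-a
            ./run-job.py job-b
            touch $@

    # Grouping targets: no commands of their own, just prerequisites.
    daily-batch-part-1: job-a job-b
    daily-batch: daily-batch-part-1   # ...-part-2, -part-3, etc.
    monday-9pm: daily-batch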
It was awful. It still is awful because it works so well that there's been no need to replace it. I keep having dreams of replacing it, but like they say there's nothing more permanent than a temporary solution.
All of this was inspired by someone who replaced the rc scripts in their init system with a Makefile in order to allow processes to start in parallel while keeping the dependencies in the right order.
> Airflow was started in October 2014 by Maxime Beauchemin at Airbnb. It was open source from the very first commit and officially brought under the Airbnb GitHub and announced in June 2015.
I believe I started building my tool somewhere around 2010, possibly 2011. The core mechanism has been completely unchanged in that time. If Airflow had been a thing at the time, I'd hopefully have looked into it. I looked at a handful of similar products and didn't find anything that was a good fit.
Based on a really quick skim of the Airflow docs it seems like it checks all of the boxes. Off the top of my head:
* LocalExecutor (with some degree of parallelism, assuming the dependencies are all declared properly) seems to do exactly what I want.
* I could write an Operator to handle the interaction with the system where the processes actually run. The existing Python script that does this interaction can probably get me 90% of the way there. Due to the nature of what I'm running, any job scheduler will have to tell the target system to do a thing then poll it to wait for the thing to be done. To do this without any custom code, I could just use BashOperator to call my existing script.
* It's written in Python, so the barrier to entry (for me) is fairly low.
* Converting the existing Makefile to an Airflow DAG is likely something that can be done automatically. We deliberately keep the Makefile very consistent, so a conversion program can take advantage of that.
I think my dream of replacing this might have new life!
There are a number of deficiencies with the current system that aren't showstoppers, but are pain points nonetheless. Off the top of my head:
* There's no reasonable way to do cross-batch dependencies (e.g., if process X in batch A fails, don't run process Y in batch B). I've got a few ideas on how I could add this in, but nothing has been implemented yet; a sketch of one possibility follows below.
* There's no easy way to visualize what's going on. Airflow has a Gantt view that looks very useful for this purpose, our business users would absolutely LOVE the task duration graph, and the visualization of the DAG looks really helpful too.
* Continuing a failed batch is a very manual process.
None of these are showstoppers because, as you said, this has been running fine for over a decade. These are mostly quality-of-life improvements.
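Purely as an illustration of that first point, and not something that exists in my system: within the stamp-file scheme, one conceivable way to express a cross-batch dependency is for batch B's job to list batch A's stamp file as a prerequisite. Every path and name here is hypothetical, and it assumes the batches can see each other's stamp directories:

    # In batch B's Makefile. process-y only runs if batch A's process-x
    # stamp exists; if it doesn't (say, because X failed), make has no rule
    # to build it and refuses to run process-y (run with -k to let the rest
    # of the batch continue).
    process-y: /var/batches/batch-a/stamps/process-x
            ./run-job.py process-y
            touch $@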
Ah, I understand. That makes sense. If you have business users, then it makes sense to go with something like Airflow because they do make it easier for less technical users to inspect jobs, kick them off, visualize them, etc. The UI makes all the difference for those use cases.
> All of this was inspired by someone who replaced the rc scripts in their init system with a Makefile in order to allow processes to start in parallel while keeping the dependencies in the right order.
Sometimes the most interesting thing is not the story itself, but the story behind the story.
This has my interest peaked. Is there anywhere else I can read about this?
Basically you can specify a `-j 24` (e.g.) option to make, and it will run as many as 24 build steps in parallel. If your Makefile is correct, that’s all you need.
Because make knows the dependency graph, it can correctly handle cases where some build steps have to be done serially, while others can be fully parallelized. E.g.,
    a: b;
    b: c;

in which builds of b and c are serial, versus

    x: y;
    x: z;

for which builds of y and z are parallel.
It’s quite a powerful capability and it feels great to see 24 build-steps start in parallel from one simple command.
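A toy example (my own illustration, nothing from an actual build) makes the effect easy to watch; the sleeps just stand in for real work:

    .PHONY: all a b c x y z

    all: a x y z

    # a -> b -> c must run serially because of the chain of dependencies...
    a: b
            @echo a; sleep 2
    b: c
            @echo b; sleep 2
    c:
            @echo c; sleep 2

    # ...while x, y and z have no ordering between them.
    x:
            @echo x; sleep 2
    y:
            @echo y; sleep 2
    z:
            @echo z; sleep 2

A plain `make` takes about twelve seconds; `make -j4` takes about six, because c, x, y and z start together while b and a still wait their turn.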
This was another thing that attracted me to `make` for this task. I figured that, as long as the dependencies were all declared properly, I should be able to run multiple jobs in parallel.
I didn't pursue this very far as there were other problems with doing it, but I'd like to pursue it again. The problems are all with the target system; `make` handles the parallel execution perfectly.
pique comes from (Vulgar) Latin piccare, which means "prick with a sword". The route to English is via French.
peak comes from Old English pīc, meaning just "peak" (e.g. of a mountain).
The two are completely different words that just sound similar.
pique may possibly go back to Proto-Germanic, and peak definitely does, but the two go back to two separate words (*pīkaz, *pikkāre), though both are related to sharp things and are possibly onomatopoeic.
There’s a reference to some sample Makefiles to start and stop some Linux services in parallel. It’s obviously not complete, but this (or something similar) was what inspired my system.
> All of this was inspired by someone who replaced the rc scripts in their init system with a Makefile in order to allow processes to start in parallel while keeping the dependencies in the right order.
Any sufficiently complicated init system contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of systemd.
I did the same: parallelization thanks to GNU make's `-j`, and recoverability (only the failed steps get rerun, not everything from scratch). If you use the remake fork of GNU make, you also get debugging and profiling for free.
My most horrible abuse of `make` was a distributed CI where I put a wrapper in the `MAKE` env var, so that recursive make executions would invoke my wrapper, which would enqueue jobs for remote workers to pick up.