Awk in 20 Minutes (2015)

danwills · 2024-12-05T03:30:04 1733369404

In my job in VFX we are dealing with sequences of files constantly, and sometimes being able to do things to parts of the frame range is extremely helpful, and awk is great for that!

Let's say U wanted to delete exr files named like filename.1055.exr where the frame number is not between 1001 and 1100, that could look like this:

ls -1 *.exr | awk -F "." '{if($2 < 1001 || $2 > 1100) print "rm -v "$0}'

This actually just prints the commands out which is nice to check that it looks reasonable, then one can redirect that back to the shell by adding "|sh" to the end to actually run them!

The -F "." bit sets the field separator to dot, so that $2 is the frame-number part of the filename.

This is just the tip of the iceberg! Once you're over the initial learning-hump, there's so much U can do with awk! Great stuff!

jazzyjackson · 2024-12-05T04:22:42 1733372562

Thanks for this, I never considered the workflow of generating statements and then redirecting them to sh

danwills · 2024-12-05T13:29:14 1733405354

"|sh" is such a great thing I completely agree! I'm so glad to have opened your eyes to it! I use it all the time! (And it's not even specific to awk at all really! (xargs/sed/grep/tr/etc are all so helpful here! Heck.. make a python script that prints out commands and whack "|sh" on the end! that can work too! (very slow in comparison to awk tho!))

At work there are complex shell-environment cases where this type of redirection doesn't work because it needs to run in the actual-current-shell, not just a new shell spawned for your user which is what "|sh" does.

To work around that you can instead redirect the output to a file and source it into the current shell instead! It is a totally easy thing to do as long as u have somewhere you can write a temporary script file to! I use a 'tmp' dir in my home directory to do this (via an alias of course), like:

command > ~/tmp/bonza.sh; source ~/tmp/bonza.sh

Hopefully you won't have to worry about that in the short-term but I think it's good to keep in mind as an option when things don't work with "|sh" in more complex shell environments... generally I think it's really quite easy to generate a runnable script-file using redirection-to-awk, when u know how to use it! Very well worth the effort to learn!!

mattpallissard · 2024-12-05T06:03:04 1733378584

While serving time in HPC I saw a set of ETL jobs that were written using awk and xargs that beat the pants off a lot of in house code on both a performance and a simplicity standpoint. You lose a bit of "safety" when you take that approach (yes, I know you can do it in awk), but if you tightly control the inputs you don't need as many guard rails.

It's both unsurprising and amazing to me that a well crafted shell script can be highly performant.

danwills · 2024-12-05T13:35:11 1733405711

What's an "ETL" job?

Totally get your point about being able to go faster with tight specification of the expected inputs!

And yeah awk is renowned for being able to crunch through things (especially huge volumes of text) at quite an astounding rate for a scripting language!

fuzztester · 2024-12-05T18:38:43 1733423923

https://en.m.wikipedia.org/wiki/Extract,_transform,_load

asicsp · 2024-12-05T03:11:18 1733368278

Previous discussion: https://news.ycombinator.com/item?id=23048054 (762 points | May 2, 2020 | 126 comments)

See also:

* https://backreference.org/2010/02/10/idiomatic-awk/

* https://earthly.dev/blog/awk-examples/

* https://learnbyexample.github.io/learn_gnuawk/ (my ebook)

lo_stronzo · 2024-12-05T17:16:38 1733418998

I work in HPC and mostly on the shell/console. Without a doubt, the combination of Bash (or another shell) and AWK is truly amazing. Being able to quickly generate statistics, filter out unnecessary information, generate pipelines, etc., is unmatched, and 99% of the time only requires a pipe redirect. Not to sound trendy, but using AWK really is the proverbial "if you know, you know".

One of my favorite use cases is based upon grep with extended regular expressions; there's always a need to search for strings while needing to exclude others, think of a basic example as "grep -E 'this|that' file |grep -Ev 'not(this|that)'". With AWK it's simple, "awk $0 ~ /(this|that)/ && $0 !~ /not(this|that)/' file". Or, if you're monitoring server load averages via a tool like sar, you can pick and choose which loads you want to monitor based upon a threshold, "uptime |awk '$10 ~ /[0-9]{3,}\.[0-9]{1,}/ || $12 ~ /[0-9]{3,}\.[0-9]{1,}/'". This will print matches if the 1 minute or 15 minute load averages are over 100.

Just because it's an "old" language doesn't mean it's obsolete or useless!

akagr · 2024-12-04T19:04:50 1733339090

This is a great resource! I’ve used awk for only the most basic field wrangling. This makes me more confident in doing fancier things that I’d otherwise use grep for.

danwills · 2024-12-05T03:34:10 1733369650

I learned from the article that one can assign to the dollar vars like:

$0="bye bye input line"

It seems like this could be a bit of a footgun!? I'm sure it must be helpful in some cases though! Has anyone used it? Could you share the use-case to help me understand this weird superpower?

disgruntledphd2 · 2024-12-05T18:54:51 1733424891

This is a terrible, no good, very bad idea, which is probably really useful. It's rather like using awk for stuff that covers more than 10+ lines unless you really, really have to.

danwills · 2024-12-07T11:16:47 1733570207

Thanks, hehe yeah it does seem a bit like writing over the inputs isn't really a great plan in general, except in rare cases where it's a total escape-hatch! (probably amazingly useful then!)

I think I mostly agree about the 10+ lines thing although I think I've written things nearly 20 lines (that aren't that crazy!) However there's that undergraduate AI course from 1997 that used it! They would've had loads of lines there I reckon!

The original url has rotted now, but here it is on archive.org:

https://web.archive.org/web/20140731042732/http://www.wra1th...

userb · 2024-12-05T02:01:28 1733364088

Is there any tutorial like this for 'sed'?

asicsp · 2024-12-05T03:07:56 1733368076

Check out https://unix.stackexchange.com/questions/112023/how-can-i-re...

For more detailed resources, see https://www.grymoire.com/Unix/Sed.html and my ebook https://learnbyexample.github.io/learn_gnused/

fuzztester · 2024-12-05T18:40:51 1733424051

the original awk book is available somewhere on the net by googling. it is very good.

jiehong · 2024-12-05T21:44:24 1733435064

Weirdly, I recently switched to Perl as a replacement for awk and sed, enjoying actual Perl regexps and very similar syntax for 1 liners.

I’ve had to deal with sed not accepting the same flags depending on systems and weird awk version too. At least it’s more consistent with Perl (which also happens to be installed by default).

Am I the only one?

marcusb · 2024-12-05T22:43:33 1733438613

The Awk Programming Language[0], while obviously much longer than the linked guide, is such a joy to read. One of the more interesting examples is the creation of a simple virtual machine in AWK.

0 - https://awk.dev/

somat · 2024-12-04T18:25:11 1733336711

I really like awk, I mean, it is not a perfect language, but it fits this really nice spot for a small language with enough features for useful everyday one off tasks. If I had to teach my mother one language awk is a strong contender.

I mean Look at the manual http://man.openbsd.org/awk it is like 5 pages. Admittedly, I grew up on obsd awk so that is my point of reference, gnu awk is considerably more complex. But I like how I can easily keep the whole language in my head.

serbuvlad · 2024-12-04T22:09:31 1733350171

> But I like how I can easily keep the whole language in my head.

I genuinely do not think this can be emphasized enough.

I love tools which just manipulate some problem space in a simple, effective way, with a small number of primitives.

If you can define a simple set of primitives that combine to cover your entire problem space, and explain them effectively to your user, you're golden.

If common operations require some combination of primitives you can define syntactic sugar/macros/functions/etc. that cover your common operations, and explain to your users what these functions do in terms of primitives.

But all too often today the exact opposite design happens: the tutorial/manual/etc. will start by explaining the common operations, which all look cute and simple, and get you started. But if you try to bend the behavior of the tool even slightly, to cover some uncommon case, you end up digging through "Advanced" section of manuals, trying to understand lofty abstractions and trying random combinations of knobs which break for hard to diagnose reasons.

So you, just wanting to get some work done, give up on tool X and pick up tool Y, which is designed the same way, but happens to include your use case in the "common" set of operations.

It's all a frustrating and disempowering experience.

Awk just tells you everything it can do. You can learn all of it in an afternoon. If it fits your use case it's a godsend. If it doesn't, you'll know to reach for some other tool.

dotancohen · 2024-12-05T13:56:45 1733407005

And for the times that I've started with awk, that other tool is 90% of the time either Perl or Python.

mobscenez · 2024-12-05T05:05:23 1733375123

Pretty clear and straightforward explanation for awk! That's a good article.