This is the big thing that disappointed and frustrated me after I had spent a bunch of time hacking on Lisp Machines and then switched to Unix: in the Unix world, everything but everything is character strings. On the LispM, when you called 'directory', you'd get back a list of pathname objects. All the system interfaces were like that; it was hardly ever necessary to parse anything -- and when you did have to, it would be in s-expression format, so all you'd have to do is call 'read' on it.
In contrast, Unix is a Babel of different syntaxes. Every basic command like 'ls' has its own output syntax; every configuration file is in a different syntax. (Command line parsing isn't standardized either, but that train wreck deserves another conversation.)
In the case of the LispM, all this was achieved by running the entire OS and all apps in a single address space; this obviously made passing objects between apps trivial, but at the price of a complete absence of security. Such a design would be a non-starter today. However, what you could do today would be to specify a standard system-wide serialization format, and give all the basic system commands an option to generate it. S-expressions would work great, but if you can't stand them, okay, use JSON. (Don't even think about using XML.)
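Bits of this exist piecemeal today (some tools grow --json flags, with jq as a generic consumer). As a minimal sketch of the idea — there's no standard 'ls --json', so python3 stands in for the hypothetical producer here:

```shell
#!/usr/bin/env bash
# Sketch of "piping objects": the producer emits a JSON array of filenames
# in the current directory, and the consumer parses that array rather than
# splitting raw text on newlines.
python3 -c 'import json, os; print(json.dumps(sorted(os.listdir("."))))' |
python3 -c 'import json, sys; print(len(json.load(sys.stdin)))'
```

Even a filename containing an embedded newline arrives intact, because it travels as a JSON string rather than as a raw line of text.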
The result would be, instead of just piping text strings from one app to another, you could, in effect, pipe objects. It's a far more powerful paradigm and would save you all this parsing pain.
All of this makes me happy that I use PowerShell, where my output isn't some text I need to carefully parse (avoiding edge cases), but a list of objects, each of which has properties for me to interrogate.
It's good to be aware of these pitfalls, but in practice they often don't arise. If you're parsing log files, or any other system-generated files with sane filenames (no spaces or other odd characters), you won't have an issue. Still, I normally would never attempt to parse 'ls' for this sort of thing. The preferred approach in a shell script is to use the shell's globbing capabilities.
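A minimal sketch of that approach in bash (nullglob is a bashism that makes an unmatched pattern expand to an empty list instead of to itself):

```shell
#!/usr/bin/env bash
# Operate on matching files via the glob itself — never by parsing 'ls'.
shopt -s nullglob                   # unmatched pattern -> empty list, not 'pj*'
for f in pj*; do
    printf 'processing: %s\n' "$f"  # "$f" stays intact: spaces, quotes, all of it
done
```

Each iteration sees one complete filename, no matter what characters it contains, because the glob never round-trips the names through a text stream.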
Every so often, I'll find myself frustrated with some bash script. For me, once a shell script gets to that point, it's best to rewrite it in Python. Translating a script into Python is almost fun, and I find the result much more maintainable. BTW, the original submission is great. I've written shell scripts over the years that have made this very mistake! For example, doing something like:
ls pj* | wc -l
This normally returns the number of pj* files, but, as the submission points out, it will fail for pathological file names.
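To make the failure concrete (on GNU and BSD systems, 'ls' prints raw names when its output is piped, so an embedded newline splits one entry across two lines), a small demo:

```shell
#!/usr/bin/env bash
# Why 'ls pj* | wc -l' miscounts: a newline inside a filename turns one
# entry into two lines. Counting the glob matches themselves is immune.
dir=$(mktemp -d)
cd "$dir"
touch pj1 pj2 "pj"$'\n'"3"      # three files; one name contains a newline
ls pj* | wc -l                  # line count: 4, not 3
files=(pj*)
echo "${#files[@]}"             # array length: 3, the true count
cd /
rm -rf "$dir"
```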