The "no performance cost" thing is interesting: my experience writing a similar profiler is that there are a couple of things that can affect performance a little bit:
1. You have to make a lot of system calls to read the memory of the target process, and if you want to sample at a high rate then that does use some CPU. This can be an issue if you only have 1 CPU.
2. You have two choices when reading memory from a process: you can either race with the program and hope that you read its memory (and get the function stack) before it changes what it's running (you're likely to win the race, because C is faster than Python), or you can pause the program briefly while taking a sample. py-spy has an option to choose which one you want: https://github.com/benfred/py-spy#how-can-i-avoid-pausing-th...
This method definitely has much lower overhead than a tracing profiler that instruments every single function call, and in practice it works well.
One thing I think is nice about this kind of profiler is that reading memory from the target process sounds like a complicated thing, but it's not: you can see austin's code for reading memory here, and it's implemented for 3 platforms in just 130 lines of C: https://github.com/P403n1x87/austin/blob/877e2ff946ea5313e47...
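To make that concrete, here's a rough Python sketch of the same idea on Linux (not Austin's actual implementation, which is the C code linked above). It reads raw bytes out of the target's address space via /proc/<pid>/mem without stopping it, i.e. the "race with the program" option; the thread_state_addr parameter and the decoding step are placeholders for the genuinely hard part of locating and parsing the interpreter's structures:

```python
import time

def read_process_memory(pid: int, address: int, size: int) -> bytes:
    # /proc/<pid>/mem exposes the target's virtual memory as a file:
    # seek to a virtual address and read whatever bytes live there.
    # Requires ptrace permission (root, or same user with ptrace_scope
    # relaxed); unmapped addresses raise OSError.
    with open(f"/proc/{pid}/mem", "rb", buffering=0) as mem:
        mem.seek(address)
        return mem.read(size)

def sample_forever(pid: int, thread_state_addr: int, interval: float = 0.001) -> None:
    # thread_state_addr is hypothetical: a real profiler first has to
    # find the interpreter's thread-state structures in the target.
    while True:
        raw = read_process_memory(pid, thread_state_addr, 64)
        # ... decode `raw` into a stack of function names here ...
        time.sleep(interval)  # 1 ms interval ~ 1000 samples/second
```

This is also where the syscall cost in point 1 shows up: every sample is at least one seek+read into /proc, so at high sampling rates the profiler itself burns measurable CPU.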
> you can either race with the program and hope that you read its memory (and get the function stack) before it changes what it's running (you're likely to win the race, because C is faster than Python), or you can pause the program briefly while taking a sample.
Somewhat interestingly, this problem doesn't seem to occur with Ruby - and rbspy can get away without pausing the target program with only minor errors seen when profiling a similar function. I suspect this is because of differences between how the Ruby and Python interpreters store call stack information, but haven't had a chance to dig into the specifics.
Also, this kind of profiler is great because you can use it on any running Python program, which is pretty magical and very useful. (especially when it's an application you didn't write)
But it's not right for every use case: by design austin/py-spy can only really profile the whole program, and if you want to profile a specific function or endpoint in your program, something like PyInstrument https://github.com/joerick/pyinstrument (which includes Django middlewares & Flask decorators) is a lot more useful.
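For a sense of that workflow, here's a minimal sketch of profiling one specific function with pyinstrument's Python API (pip install pyinstrument; see its README for the Django/Flask integrations mentioned above):

```python
from pyinstrument import Profiler

def expensive_endpoint():
    # stand-in for the specific code path you care about
    return sum(i * i for i in range(1_000_000))

profiler = Profiler()
profiler.start()
expensive_endpoint()
profiler.stop()

# Print an indented call-tree report of where the time went.
print(profiler.output_text(unicode=True, color=True))
```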
All good points. The "no performance cost" is indeed more like "negligible performance cost". That's because multicore architectures are ubiquitous these days and standard Python applications are single-process, so the profiler can usually run on a spare core. For a multi-process Python application, a busy profiler would certainly steal a good chunk of a core, so the impact might be noticeable in that case.
As for the race conditions: Austin does not introduce any pauses. Even if it did, there would be no guarantee that it paused at a "good" point, so pausing brings no real accuracy benefit. Error rates are quite low anyway, so the actual win comes from not pausing at all.
Running Python is not a performance cost. The point of "no performance cost" is that a tool like this is unlikely to impact the performance of the application being profiled. The fact that Python is not a "fast" programming language is a different matter.
The point is that it's only possible in the first place because Python leaves so much performance on the table. You can't snoop inside a C++ program in the same way - unless you throw a bunch of sleep() calls everywhere to slow it down, and then hey presto you can!
Or, to put it another way, taking Python from 500x SlowerThanCee to 501x is "negligible", but taking C from 1x to 2x slower isn't.
Note that tracing profilers don't need to be high overhead - Python is slow enough that efficient tracing can be mostly hidden. For example, https://functiontrace.com tends to have <10% overhead when tracing.
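For contrast with the sampling approach above, a tracing profiler asks the interpreter to call it back on every function call, which is exactly where the overhead comes from. A minimal sketch using the standard library's sys.setprofile hook (this only counts calls; a real tracer would also record timestamps on call/return events):

```python
import sys
from collections import Counter

call_counts = Counter()

def profile_hook(frame, event, arg):
    # The interpreter invokes this on every call, return, and C call.
    if event == "call":
        code = frame.f_code
        call_counts[f"{code.co_filename}:{code.co_name}"] += 1

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

sys.setprofile(profile_hook)
try:
    fib(10)  # the workload being traced
finally:
    sys.setprofile(None)  # always uninstall the hook

for location, count in call_counts.most_common(5):
    print(count, location)
```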
Both use the terminal, but in different ways. A CLI uses the basic terminal functionality of writing and perhaps reading lines, or even simply accepts flags and runs non-interactively. A TUI uses the terminal's advanced features, treating the terminal as the basis for a kind of GUI, generally occupying its full area.
Vim, Midnight Commander, and htop have TUIs. They rely heavily on the terminal's 'control character' features to accomplish this. apt-get has a CLI, as its interactive IO is handled with printing lines and having the user submit lines (even if it's just the letter y).
Somewhere in the middle are interfaces like bash and zsh which make light use of the terminal's advanced features for things like auto-complete, but which don't take over the whole terminal area.
I'd count non-interactive applications like gcc and sort as command-line applications, although strictly speaking you could just as well use a graphical interface to configure their flags and run commands.
God these explanations suck. Reminds me of the "two boats meet in an ocean" explanation of IRC.
A CLI (command-line interface) just uses the standard input and output streams to show and take in data.
A TUI uses escape codes or proprietary console API calls (the latter on Windows, the former on most other systems) to take control of the entire console window and display a user interface (hence Text User Interface, or TUI). Mouse input is usually handled on modern systems too.
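As a toy illustration of those escape codes (standard VT100-style ANSI sequences, which work in most terminals and which libraries like curses emit for you):

```python
import sys
import time

CLEAR = "\x1b[2J"            # erase the whole screen
HOME = "\x1b[H"              # move the cursor to row 1, column 1
GOTO = "\x1b[{row};{col}H"   # move the cursor to an arbitrary cell
INVERSE = "\x1b[7m"          # inverse-video attribute
RESET = "\x1b[0m"            # reset all attributes

# Take over the screen, draw something at a fixed position, then restore.
sys.stdout.write(CLEAR + HOME)
sys.stdout.write(GOTO.format(row=3, col=10))
sys.stdout.write(INVERSE + " hello from a tiny TUI " + RESET)
sys.stdout.flush()
time.sleep(2)  # a real TUI would read key/mouse input and redraw here
sys.stdout.write(CLEAR + HOME)
```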
CLI refers to interacting with a program by typing commands. Commands need not come from a terminal (a terminal need not even exist). They might be just passed in from a different program.
TUI refers to interacting with a program via the terminal—the terminal is the program's user interface (think 'nano', which you can't really run without a terminal). The terminal need not interact via commands, although many TUI programs also accept commands.
One question: in the first XML example (minimal-view.xml), why is the root element a <aui:MinimalView ...> but then the last line ends it with </aui:MiniTop> ? Is it a typo or is there something else going on?
Can Austin attach to a Python process running in a Docker container from the host system?
`sudo austin-tui -Cp <pid>`
I used pyflame for that at some point.
One benefit of Austin-based tools is that they don't require any instrumentation/extra configuration. The low overhead means that you can just attach to an application that is running in production.
- py-spy: https://github.com/benfred/py-spy (written in Rust)
- pyflame: https://github.com/uber-archive/pyflame (C++, archived and no longer maintained)
The "no performance cost" thing is interesting: my experience writing a similar profiler is that there are a couple of things that can affect performance a little bit:
1. You have to make a lot of system calls to read the memory of the target process, and if you want to sample at a high rate then that does use some CPU. This can be an issue if you only have 1 CPU.
2. you have two choices when reading memory from a process: you can either race with the program and hope that you read its memory to get the function stack before it changes what function it's running (and you're likely to win the race, because C is faster than Python), or you can pause the program briefly while taking a sample. py-spy has an option to choose which one you want to do: https://github.com/benfred/py-spy#how-can-i-avoid-pausing-th...
Definitely this method is a lot lower overhead than a tracing profiler that instruments every single function call, and in practice it works well.
One thing I think is nice about this kind of profiler is that reading memory from the target process sounds like a complicated thing, but it's not: you can see austin's code for reading memory here, and it's implemented for 3 platforms in just 130 lines of C: https://github.com/P403n1x87/austin/blob/877e2ff946ea5313e47...