Even though it's built with Tauri, this application takes around 120MB on my M3 Max just by doing these things:
- it sets an icon in the menu bar
- it displays a window where I can choose which model to use
It's truly astonishing how modern desktop apps that are essentially doing nothing still consume so many resources.
I feel the same astonishment!
Our computers today are surely faster, more powerful, and smaller than yesterday's, but has that really translated into something tangible for the user?
I feel that aside from boot-up, which got quick thanks to SSDs rather than gigahertz, nothing is really any faster.
It's as if all this extra power gets used to the maximum, for good and bad reasons, but never focused on making 'it' faster.
I'm a bit puzzled as to why my Mac can freeze for half a second when I hit cmd+A in a folder with 1000+ files.
Why doesn't Excel appear instantly, and why is it 2.29GB now when Excel 98 for Mac was... 154.31MB?
Why is a LAN transfer between two computers still as slow as in 1999, ~10MB/s, when both can simultaneously download at >100MB/s?
And I'm not even starting on GB-memory-hoarding browser tabs; when you think about it, that part is managed well as a whole, holding 700+ tabs without complaining.
And what about logs?
This is a new branch of philosophy: open Console and witness the era of hyperreal siloxal, where computational potential expands asymptotically while user experience flatlines into philosophical absurdity.
It takes me longer to install a large Mac program from the .dmg than it takes to download it in the first place. My internet connection is fairly slow and my disk is an SSD. The only hypothesis that makes sense to me is that MacOS is still riddled with O(n) or even O(n^2) algorithms that have never been improved, and this incompetence has been made less visible by ever-faster hardware.
A piece of evidence supporting this hypothesis: rsync (a program written by people who know their craft) on MacOS does essentially the same job as Time Machine, but the former is orders of magnitude faster than the latter.
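A minimal sketch of what I mean, assuming a backup volume mounted at /Volumes/Backup (the paths and the "latest" symlink are my own convention, not anything rsync or Time Machine ships with):
# hypothetical incremental snapshot with rsync, roughly the Time Machine idea
src="$HOME/"
dest="/Volumes/Backup/snapshots"
new="$dest/$(date +%Y-%m-%d-%H%M%S)"
# --link-dest hard-links unchanged files against the previous snapshot,
# so each snapshot looks complete but only changed files take new space
rsync -a --delete --link-dest="$dest/latest" "$src" "$new" \
  && ln -sfn "$new" "$dest/latest"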
You can make this app yourself in an hour if you're on Linux and can do some scripting. Mockup below for illustration, but this is the beating heart of a real script:
#!/bin/sh
# whisper-live.sh: run once and it listens (blocking), run again and it stops listening.
if ! test -f whisper.quit ; then
  # First run: create the flag file, which also serves as ffmpeg's stdin below.
  touch whisper.quit
  notify-send -a whisper "listening"
  m="/usr/share/whisper.cpp-model-tiny.en-q5_1/ggml-tiny.en-q5_1.bin"
  # Record the default PulseAudio source, pipe the wav into whisper-cli,
  # then flatten the transcript onto a single trimmed line.
  txt="$(ffmpeg -hide_banner -loglevel -8 -f pulse -i default -f wav pipe:1 < whisper.quit \
    | whisper-cli -np -m "$m" -f - -otxt -sns 2>/dev/null \
    | tr \\n " " | sed -e 's/^\s*//' -e 's/\s\s*$//')"
  rm -f whisper.quit
  notify-send -a whisper "done listening"
  # Type the result into the currently focused window (Wayland).
  printf %s "$txt" | wtype -
else
  # Second run: write 'q' into the flag file; ffmpeg reads it on stdin and stops recording.
  printf %s q > whisper.quit
fi
You can trivially modify it to use wl-copy to copy to clipboard instead, if you prefer that over immediately sending the text to the current window. I set up sway to run a script like this on $mod+Shift+w so it can be done one-handed -- not push to listen, but the script itself toggles listen state on each invocation, so push once to start, again to stop.
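Concretely, the clipboard variant is a one-line change at the end of the script, and the sway side is a single binding (the script path below is just an example of where you might keep it):
# clipboard instead of typing into the focused window:
printf %s "$txt" | wl-copy
# and in the sway config:
#   bindsym $mod+Shift+w exec ~/bin/whisper-live.sh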
The tech industry has inefficiencies like this nearly everywhere. There's no good explanation for how an AI model that knows so much can be smaller than a typical OS installation.
I once optimized a solution to produce a more-than-500x improvement. I can't write about how that came about, but it was much easier than initially expected.
In theory, Handy could be developed by hand-rolling assembly. Maybe even binary machine code.
- It would probably be much faster and smaller, and use less memory. But...
- It would probably not be cross-platform (Handy works on Linux, MacOS, and Windows)
- It would probably take years or decades to develop (Handy was developed by a single dev in single digit months for the initial version)
- It would probably be more difficult to maintain. Instead of re-using general purpose libraries and frameworks, it would all be custom code with the single purpose of supporting Handy.
- Also, Handy uses an LLM for transcription. LLMs are known to require a lot of RAM to perform well, so most of the RAM is probably being used by the transcription model. An LLM is basically a big auto-complete, so you need a lot of RAM to store all the mappings from inputs to outputs. So the hand-rolled assembly version could still use a lot of RAM...
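As a rough back-of-the-envelope sketch (generic Whisper figures, not measurements of Handy): model memory is roughly parameter count times bytes per weight, before any working buffers.
# rough model-memory estimate: params * bytes per weight (ignores working buffers)
params=244000000      # Whisper "small" has roughly 244M parameters
bytes_per_weight=2    # fp16 weights
echo "$(( params * bytes_per_weight / 1024 / 1024 )) MiB"   # prints "465 MiB"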
But do you start onnx and whisper.cpp on a fresh install / start? I did nothing; I literally just installed the app and started it without selecting a model.
Oh interesting. I totally misread the original comment; I didn't realize you were talking about RAM usage. 120MB is quite a lot. This surprises me too. There's really nothing fancy going on until the model is chosen.
- it sets an icon in the menu bar
- it displays a window where I can choose which model to use
That's it. 120MB for doing nothing.