I seem to remember this was a big point of contention when threaded Apache (vs. just forking a billion processes) appeared: if you went from 20 processes to 4 processes of 5 threads each, you could hit the per-process ulimit.
But ... that's a bad memory from long ago and far away.