Yes, he's a pencil-pushing fraud for the most part (he might even admit it in private). I don't know how old he is, but assuming he's been a professional for 10 years, his big contribution is the few hundred/thousand lines of VW (he's probably done other work, but let's assume this is a significant fraction of it). Where did the rest of his time go? If he had just decided to bang VW out a decade ago, he'd have reached its current state within a month of starting. VW is only useful because it's fast and it works (and that's not due to any theory). Its theoretical considerations are useful only for drawing that one month out into years or decades.
I'll ask him whether he considers himself mostly a pencil-pushing fraud when I see him in December. I'll also see what he thinks of the claim that VW's usefulness is "not due to any theory". I think you'll be surprised, given that he's written a blog post on exactly this topic titled "Use of Learning Theory" (http://hunch.net/?p=496) at his blog which, by the way, is called "Machine Learning (Theory)".
It's easy to dismiss what you don't understand, but you should consider the possibility that it is significantly more difficult than you assume to develop algorithms like those in VW and to "bang out" implementations of them that are fast and correct.
First of all, "learning" is a made up word for parameter search, it's kind of a trick to fool funders to think you're doing cool stuff. Second, his entire blog post is about theory being useful only in a crude way (which basically means not useful) and he's outlined useful rules of thumb (that probaly come from experiment). Is time best spent on gathering data and running experiments to show practical usefulness of different algorithms or on pencil pushing? That isn't made clear.
Firstly, I agree that machine learning is effectively parameter search, but the name is an artefact of history and we both appear to understand what it means, so I don't see how this adds to the argument.
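(If it helps to make "parameter search" concrete, here is a toy sketch, entirely made up for illustration: a brute-force scan over a single slope parameter to minimise squared error on a handful of points. Real learners, VW included, search far more cleverly, e.g. with online gradient steps, but the goal is the same.)

    # Toy "learning as parameter search": find the slope w minimising squared error.
    data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # made-up (x, y) observations

    def loss(w):
        # squared error of the predictor y ~ w * x on the toy data
        return sum((y - w * x) ** 2 for x, y in data)

    # brute-force search over a grid of candidate slopes from -5.00 to 5.00
    best_w = min((k / 100 for k in range(-500, 501)), key=loss)
    print(best_w)  # roughly 2.0 for this data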
Secondly, no, John didn't say learning theory is "useful only in a crude way", he said, "learning theory is most useful in it’s crudest forms". Big difference. And besides, he says right at the beginning of the post that he believes "learning theory has much more substantial applications [than generating papers]".
To be convincing, theory needs to be precise: if you are not careful about what you are talking about, it is easy to believe things that are not true. However, what I think John is saying is that the value of theory comes from afterwards abstracting away the precision and understanding the main message (i.e., the "crude form") of a theoretical result. In general, I don't think it is possible to get to a convincing "main message" or "crude form" without someone having grappled with the details.
No matter how many experiments you run, you will only ever show that an algorithm works well or not in a typically small, finite number of cases. What theory does is look at those cases and ask something like, "It seems that every time X is true of my problem, an algorithm with property Y works really well. I wonder if that is always true?" That type of question gets carefully formalised and then (hopefully) answered. The process of formalisation (i.e., defining things carefully) can yield new ways of thinking about things (e.g., overfitting and bias-variance trade-offs), and having an answer to the general question means you can be assured that future uses of your algorithm will behave as expected.
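To make the bias-variance point concrete (standard textbook material, nothing specific to John's post): assuming the data come from y = f(x) + noise, with zero-mean noise of variance sigma^2, and \hat{f} is an estimator fit on a random training sample, the expected squared error decomposes as

    \mathbb{E}\big[(y - \hat{f}(x))^2\big]
      = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
      + \underbrace{\operatorname{Var}\big[\hat{f}(x)\big]}_{\text{variance}}
      + \underbrace{\sigma^2}_{\text{noise}}

i.e., error splits into squared bias, variance, and irreducible noise. The decomposition itself is exactly the kind of "crude form" that only falls out once someone has defined the terms precisely.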
You seem to have an unshakeable opinion that mathematics/theory/"pencil-pushing" is inherently a waste of time. That's a real shame. Why do you believe the pursuit of precision, insight, and proof is somehow inferior to running experiments? I find both valuable and the interplay between the two extremely rewarding.