
> Jeff Dean is clearly one of the greatest software developers/engineers ever

Based on what? I've heard all the Chuck Norris-type jokes, but what has Jeff Dean actually accomplished that is so legendary as a software developer (or as a leader)?

Per his Google bio/CV, his main claims to fame seem to be his work on large-scale infrastructure projects such as BigTable, MapReduce, Protobuf and TensorFlow, which seem more like solid engineering accomplishments than the stuff of legend.

https://research.google/people/jeff/

Seems like he's perhaps being rewarded with the title of "Chief Scientist" rather than necessarily being suited to it, but I guess that depends on what Sundar is expecting out of him.



Jeff was very early to the "just scale up the big brain" idea, perhaps as early as 2012 (Andrew Ng training networks on 1000s of CPUs). This vision is sort of summarized in https://blog.google/technology/ai/introducing-pathways-next-... and fleshed out more in https://arxiv.org/abs/2203.12533, but he had been promoting it internally since before 2016.

When I joined Brain in 2016, I thought the idea of training billion/trillion-parameter sparsely gated mixtures of experts was incredibly naive and a huge waste of resources. But it turns out he was right, and it would take ~6 more years before that was abundantly obvious to the rest of the research community.
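To make the idea concrete, here's a toy numpy sketch of what sparse gating buys you (this is just my illustration, nothing resembling the actual Brain code; all names and sizes are made up): a learned gate scores every expert but only the top-k actually run, so parameter count can scale with the number of experts while per-token compute stays roughly flat.

    import numpy as np

    def moe_layer(x, gate_w, expert_ws, k=2):
        # x: (d,) one token's activations; gate_w: (d, n_experts);
        # expert_ws: list of n_experts matrices, each (d, d) -- all hypothetical
        logits = x @ gate_w                          # score every expert
        top = np.argsort(logits)[-k:]                # keep only the k best
        probs = np.exp(logits[top] - logits[top].max())
        probs /= probs.sum()                         # softmax over the chosen k
        # only the selected experts are evaluated; the rest are skipped entirely
        return sum(p * (x @ expert_ws[i]) for p, i in zip(probs, top))

    rng = np.random.default_rng(0)
    d, n = 16, 8
    y = moe_layer(rng.normal(size=d), rng.normal(size=(d, n)),
                  [rng.normal(size=(d, d)) for _ in range(n)])

The gate is what makes it "sparse": most of the parameters sit idle on any given token, which is also why it can look like a waste of resources at first glance.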

Here's his Scholar page (h-index of 94): https://scholar.google.com/citations?hl=en&user=NMS69lQAAAAJ...

As a leader, he also managed the development of TensorFlow and the TPU. Consider the context / time frame - the year is 2014/2015 and a lot of academics still don't believe deep learning works. Jeff pivots a >100-person org to go all-in on deep learning, invests in an upgraded version of Theano (TF) and then gives it away to the community for free, and develops Google's own training chip to compete with Nvidia. These are highly non-obvious ideas that show much more spine & vision than most tech leaders have. Not to mention he designed & coded large parts of TF himself!

And before that, he was doing systems engineering on non-ML stuff. It's rare to pivot as a very senior-level engineer to a completely new field and then do what he did.

Jeff certainly has made mistakes as a leader (failing to translate Google Brain's numerous fundamental breakthroughs into more ambitious AI products, and failing to consolidate the redundant big-model efforts within Google Research), but I would consider his high-level directional bets to be incredibly prescient.


OK - I can see the early ML push as obviously massively impactful, although by 2014/2015 we were already a couple of years after AlexNet, and other frameworks such as Theano and Torch (already 10+ yrs old at that point) existed, so the idea of another ML framework wasn't exactly revolutionary. I'm not sure how you'd characterize Jeff Dean's role in TensorFlow given that you're saying he led a 100-person org, yet coded much of it himself... a hands-on technical lead perhaps?

I wonder if you know any of the history of exactly how TF's predecessor DistBelief came into being, given that this was during Andrew Ng's time at Google - whose idea was it?

The Pathways architecture is very interesting... what is the current status of this project? Is it still going to be a focus after the reorg, or is it too early to tell?


Jeff was the first author on the DistBelief paper - he's always been big on model parallelism + distributing a neural network's knowledge across many computers (https://research.google/pubs/pub40565/). I really have to emphasize that model parallelism of a big network sounds obvious today, but it was totally non-obvious in 2011 when they were building it out.
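For anyone who hasn't seen it spelled out, here's a toy numpy sketch of the model-parallel idea (my own illustration, nothing to do with DistBelief's actual code; the sizes and the 4-way split are made up): a layer's weight matrix is too big for one machine, so each worker owns a column shard, computes its slice of the output, and the slices get gathered back together.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=512)                # one input activation vector
    full_w = rng.normal(size=(512, 4096))   # a layer too large for one box

    shards = np.split(full_w, 4, axis=1)    # each "machine" owns 1024 columns
    partials = [x @ w for w in shards]      # computed in parallel on 4 workers
    y = np.concatenate(partials)            # gathered over the network

    assert np.allclose(y, x @ full_w)       # same result as the unsharded layer

The obvious-in-hindsight part is that the math doesn't care where the columns live; the hard part back then was making the partitioning, communication and fault tolerance work across thousands of commodity machines.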

DistBelief was tricky to program because it was written all in C++ and Protobufs IIRC. The development of TFv1 preceded my time at Google, so I can't comment on who contributed what.


Interesting - thanks!


Thanks for this insightful perspective.

1. What was the reasoning behind thinking billion/trillion-parameter models would be naive and wasteful? Perhaps parts of that reasoning were right and could inform improvements today.

2. Can you elaborate on the failure to translate research breakthroughs, of which there are many, into ambitious AI products? Do you mean commercializing them, or pursuing something like AlphaFold? This question is especially relevant: everyone is watching to see if the recent changes can bring Google to its rightful place at the forefront of applied AI.


> large scale infrastructure projects such as BigTable, MapReduce, Protobuf and TensorFlow

If you have initiated and successfully landed large-scale engineering projects and products that transformed the entire industry, more than ten times over, that qualifies as being a "legend".


Only if you did it at a company like Google, where it gets talked about and you've got that large a user base. Inside most of corporate America, internal infrastructure / modernization efforts get little recognition.

I wrote an entire (Torch-like - pre-PyTorch) C++-based NN framework myself, just as a hobbyist effort. It ran on CPU as well as GPU (CUDA). For sure it didn't compete with TensorFlow in terms of features, but it was complete enough to build and train things like ResNet. A lot of work to be sure, but hardly legendary.


> Only if you did it at a company like Google where it's being talked about and you've got that large a user base

Google has lots of folks who had access to a similar level of resources, and no one but Jeff and Sanjay pulled it off. Large-scale engineering is not just about writing some fancy infra code; it's a very rigorous effort to convince thousands of people to onboard, which typically requires them to rewrite a significant fraction of their production code - often referred to as "replacing the wheels on a running train". You need lots of evidence, credibility and vision to make them move.


> "replacing wheels on a running train"

Yeah - I just finished migrating a system of 100+ Linux processes, all inter-communicating via CORBA, to use RabbitMQ instead. It's a production system with 24x7 uptime, and the migration was spread over more than a year with ongoing functional releases at the same time. I prefer to call it changing the wheels on a moving car.

No doubt it's worse at Google, but these types of infrastructure projects are going on everywhere, and nobody is getting medals.




