Machine Learning with PyTorch and Scikit-Learn (sebastianraschka.com)
149 points by sbbq on Feb 25, 2022 | 38 comments


Does anyone buy packt books?

I view them as below free-tier content that you nonetheless have to pay for. Since there is basically zero quality control, they're often out of date, full of errors, or either incredibly specific (here's one example of one thing) or just copy-pasted API documentation.

I mean, just seeing Packt as the publisher is enough for me to think: I'm not really interested in this.

You can make good content and publish it without Packt (e.g. the fastai book).

If this book is actually good, why did you involve Packt? They're the enemy of high-quality technical documentation.


Author here. I agree with what you said. I wrote my first book with Packt back when I was a student and was like: "cool, a book deal!" Of course, I didn't know about the caveats :P. Yeah, there was very low (/no) quality control. In fact, they introduced a lot of typos during layout (apparently, they re-typed the equations by hand!). However, despite all of that, the book was quite successful, so for the subsequent editions, they gave my book much more attention. Personally, I also got much more flexibility regarding deadlines, etc.

Long story short, yeah, there are definitely issues with quality control, and it's really up to the author to make sure that the content is correct and sound. For this particular book, I must say that I worked with a great layout editor who paid a lot of attention to detail this time. Also, with their new layout, they no longer had to re-type the equations, and the typesetting looks so much better now. I am pretty happy with how it turned out this time :)


One thing I noticed a couple of times now is that they just blatantly copy other publishers' bestseller titles, probably in the hope that people will accidentally buy their books after reading a recommendation for the original title.

Example:

    Hands-On Machine Learning with Scikit-Learn and Tensorflow, O'Reilly, 2017

    Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits, Packt, 2020

Really makes me want to filter them out as spam on Amazon etc.


I usually don't, but Raschka's first book was particularly accessible.


This new book is really good. I don't know why he went with Packt, who do have a lot of low quality titles - but this one is not low quality.


I made the mistake of buying one once. It was basically the repackaged language documentation without anything of value added. If they'd publish that, they will publish anything.


Yeah, I also see Packt as a "do not buy"-stamp.


Yes, they aren't that bad. I think you're right about them being lower quality than others, but I've seen some good ones. I think an earlier version of this Python ML book was quite good; I remember reading some of it and being happy with it.


I didn’t like that this was a Packt book, but the blog article was so well written and the author bios are impressive, so I just bought it as a Kindle book.

Right now, I am retiring from a lead machine learning job to devote more time to caring for my wife, who has some grim health problems. Before that, I managed a deep learning team at Capital One.

I think that the field of deep learning blows away any other tech right now because it can be used to greatly improve everything else (bio tech, financial tech, medicine, corporate to corporate data communications, optimizing sales, etc., etc.)

Anyway, I am glad I dropped by HN this morning, saw this book and bought it. I enjoy going back to basics and relearning things, and I expect to enjoy learning new tools (this book uses PyTorch and Scikit-Learn; I have done 99% of my deep learning work in TensorFlow).


A tip for anyone who suffers from the slow training times of the sklearn logistic regression: you can rewrite it with skorch in no time and get _much_ faster training times.

I wonder if sklearn will have a pytorch backend one day
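
A minimal sketch of the skorch approach, assuming skorch and PyTorch are installed (the module and hyperparameters are illustrative, not from the comment):

    # Logistic regression as a one-layer PyTorch module, wrapped by skorch
    # into a scikit-learn-compatible estimator.
    import torch
    from torch import nn
    from skorch import NeuralNetClassifier

    class LogReg(nn.Module):
        def __init__(self, n_features=20, n_classes=2):
            super().__init__()
            self.linear = nn.Linear(n_features, n_classes)  # outputs raw logits

        def forward(self, X):
            return self.linear(X)

    net = NeuralNetClassifier(
        LogReg,
        criterion=nn.CrossEntropyLoss,  # takes logits, applies softmax internally
        max_epochs=20,
        lr=0.1,
        device='cuda' if torch.cuda.is_available() else 'cpu',
    )
    # net.fit(X.astype('float32'), y) and net.predict(...) then behave like
    # any scikit-learn estimator (pipelines, grid search, etc.).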


GPyTorch also absolutely crushes the Scikit implementation for Gaussian processes in my experience. Scikit is a treasure, but maybe not my first choice for performance.



Consider also GPU accelerating the whole thing if you have a GPU around. cuML matches the sklearn API https://github.com/rapidsai/cuml/. Pays off very quickly if you have large datasets.
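
A hedged sketch of that drop-in property (requires an Nvidia GPU and the RAPIDS libraries; details may vary by version):

    from sklearn.datasets import make_classification
    # from sklearn.linear_model import LogisticRegression   # CPU version
    from cuml.linear_model import LogisticRegression        # GPU version

    X, y = make_classification(n_samples=100_000, n_features=40,
                               random_state=0)
    clf = LogisticRegression().fit(X, y)  # same estimator API as sklearn
    preds = clf.predict(X)                # accepts NumPy, cuDF, or CuPy input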


I think RAPIDS AI's cuML tried to go in this direction (essentially scikit-learn on the GPU): https://docs.rapids.ai/api/cuml/stable/api.html#logistic-reg.... For some reason it never really took off, though.

Btw., going on a tangent, you might like Hummingbird (https://github.com/microsoft/hummingbird). It allows you to convert trained scikit-learn tree-based models to PyTorch. I watched the SciPy talk last year, and it's a super smart & elegant idea.
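
A minimal sketch of that conversion, based on Hummingbird's documented convert() entry point (version details may differ):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from hummingbird.ml import convert

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    skl_model = RandomForestClassifier(n_estimators=50).fit(X, y)

    # Compile the fitted forest into tensor operations.
    torch_model = convert(skl_model, 'pytorch')
    preds = torch_model.predict(X)   # same predict() interface as sklearn
    # torch_model.to('cuda') moves inference to the GPU.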


A drop in replacement for a large part of sklearn for Intel CPUs: https://github.com/intel/scikit-learn-intelex
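
The drop-in works via the documented patching call, which has to run before the estimator imports (a small sketch):

    from sklearnex import patch_sklearn
    patch_sklearn()  # swap in the Intel-optimized implementations

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=10_000, n_features=20,
                               random_state=0)
    clf = SVC().fit(X, y)  # now runs the accelerated SVC on Intel CPUs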


> one of the big changes is that we transitioned the code examples of the deep learning chapters from TensorFlow to PyTorch

thank god


Why is that such a big deal?


PyTorch has become the de facto standard for research (meaning cutting edge models often have their sole implementation in PyTorch), and TF has had a much more unstable and uh, baroque API. It should be mentioned that while some of that API mess probably could have been avoided, other parts of it are a consequence of TF's view of computational graphs as static rather than dynamic, which gives it a fundamental performance advantage over PyTorch. (You can think of this as vaguely equivalent to why compiled languages can run faster than interpreted languages.)
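
To illustrate the dynamic-graph point: in PyTorch the graph is rebuilt on every forward pass, so ordinary Python control flow can depend on the data (a toy sketch, not from the comment):

    import torch
    from torch import nn

    class DynamicNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(8, 8)

        def forward(self, x):
            # Data-dependent loop count: trivial here, but a static graph
            # would need special graph-level control-flow ops for this.
            for _ in range(int(x.abs().mean().item() * 3) + 1):
                x = torch.relu(self.layer(x))
            return x

    out = DynamicNet()(torch.randn(2, 8))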


> fundamental performance advantage over PyTorch

It's funny: yes, you could expect that, but in reality TF is almost never faster than PyTorch.


I take it you've never had to depend upon a Google library before.

Just kidding, but yeah - my understanding is that Tensorflow V1 was annoying to use and that Tensorflow V2 sort of "pytorch-ified" everything, but it a) introduced breaking changes and b) came too little, too late.

Now Google has JAX (they JIT numpy so it will run well on their TPUs, plus autograd), which is coming along nicely and takes a different approach from PyTorch. I'll still be using PyTorch or a wrapper for pytorch-isms, however.
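
The JAX approach in miniature: NumPy-style code that can be JIT-compiled and differentiated (a small sketch):

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        pred = jnp.dot(x, w)              # plain NumPy-style math
        return jnp.mean((pred - y) ** 2)

    grad_loss = jax.jit(jax.grad(loss))   # compiled gradient function

    w = jnp.zeros(3)
    x = jnp.ones((5, 3))
    y = jnp.ones(5)
    print(grad_loss(w, x, y))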


From the book's Amazon page:

"PyTorch is the Pythonic way to learn machine learning, making it easier to learn and simpler to code with. This book explains the essential parts of PyTorch and how to create models using popular libraries, such as PyTorch Lightning and PyTorch Geometric."



Tensorflow 2 came out in 2019 and so that blog post really does not answer why PyTorch is better than Tensorflow from a technical standpoint. From the author's point of view, it makes clear sense because of the popularity of PyTorch over Tensorflow.


The general complaints about the API's design and style have not changed so much even with the advent of eager execution by default, etc.


How would one self-study enough about ML to be able to move into an ML engineering role without an academic background on it? Any recommended paths out there?


I would recommend joining an ML-centric company in an ML infrastructure role, and get real-world experience that way. (Example industries to look at: self driving car companies, spam & abuse departments of major companies, data labeling firms, ML consulting firms). Ideally you want to work with as many ML engineers as you can.

Classes and independent study are great, but a lot of these companies want to hire experienced folks for ML roles, so once you have picked up some basics from independent study it's helpful to get an _ML adjacent_ role to help you start moving laterally toward the ML engineer position you want.


I corresponded with another user here who accomplished this and wrote about it here[1]

[1] http://karlrosaen.com/ml/


Thanks! Interesting career history btw - AgTech is a cool field :)


Nice, yeah I should clarify that this was written by the fellow I corresponded with


Start simple: take some scikit-learn tutorials and start building predictive models. Look for some real problems, maybe from an area that interests you. Sports, weather, the stock market. Set a goal for what you want to predict, look for data, and then Google and go with trial and error.

You don’t really need an academic degree on the topic to do productive work with these tools.

Also spend time learning the related non-ML tools like pandas and some NumPy. Understanding pandas is (IMHO) key to getting anything done, and unfortunately it is not that simple to pick up.
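
A toy end-to-end sketch in the spirit of this advice; the CSV file and column names are made up for illustration:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import GradientBoostingRegressor

    df = pd.read_csv("games.csv")  # hypothetical sports dataset
    X = df[["home_rating", "away_rating", "rest_days"]]
    y = df["home_score_margin"]

    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        random_state=0)
    model = GradientBoostingRegressor().fit(X_train, y_train)
    print(model.score(X_test, y_test))   # R^2 on held-out games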


The fast.ai videos are good, though very long. Expect to spend a lot of time on the exercises, and you will need an Nvidia GPU-equipped computer unless something has changed recently. Those are available as cloud rentals, of course.


I would find a few things that you are interested in and write deep learning models around those areas of interest (it is OK to clone similar Keras or PyTorch examples and modify/extend them).

Then, bravely apply for deep learning jobs, admit to no professional experience but point to your models on GitHub and be ready to be able to carefully explain what you did, how you worked around problems, etc.

Be willing to take an entry level position.

Also, learn to be flexible about spending a lot of effort learning whatever domain your future employers work in. Learn their businesses, how they operate, what data they have, etc.


> write deep learning models around those areas of interest

I've read tons of articles on how to assemble models using PyTorch etc, but very little on how to design them.

Like which overall architecture to use for which problems, how to condition the input data, how many layers to use, and how wide the layers should be at each depth.

I haven't read this book yet, so maybe it covers some of that, but are there any other decent resources that go a bit into this? At least something that gives some overall pointers; I'm guessing this can get quite involved quite fast.


You might look through the Keras examples https://keras.io/examples/

Find ones similar to your application.


I suggest starting with Andrew Ng's Deep Learning Specialization set of courses. It is a very decent overview.


My issue with learning ML isn't TensorFlow, but that I'd need to figure out how to load the models into a mobile device running on C#... which I have no idea how to do.


One way to do that for C# is to build your model in PyTorch, then transform it to ONNX, which Microsoft has a bunch of C# tooling support for: https://docs.microsoft.com/en-us/windows/ai/windows-ml/train....
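
A hedged sketch of the PyTorch-to-ONNX step (the model and file names here are illustrative):

    import torch
    from torch import nn

    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
    model.eval()

    dummy_input = torch.randn(1, 10)  # example input fixing tensor shapes
    torch.onnx.export(model, dummy_input, "model.onnx",
                      input_names=["input"], output_names=["output"])
    # The .onnx file can then be loaded from C# via ONNX Runtime /
    # Windows ML, per the link above.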

I am curious though, what mobile device are you trying to run on?


Does anyone here have a comparison of this with Trax?



