Where does this 'best practice is “don’t”' idea come from? I’ve seen it a couple of times when the topic of scraping comes up. I think it is somewhat hypocritical, and actually works against one’s own good and even the good of the internet as a whole, because it artificially limits who can do what.
Why are there entities that are allowed to scrape the web however they want (entities that got into their position by scraping the web), while the regular Joe is discouraged from doing so?
In my book, “not best practice” doesn’t imply “never do”, but web scraping should be your option of last resort. Doing it well takes ages, and the time spent doing it will often distract you from your goal.
As I said, in this case “learning data science” likely doesn’t require web scraping; it just requires a suitable data set.
The OP claimed in another comment that no such data set exists, but (s)he doesn’t say what data (s)he’s looking for, so that’s impossible to check.