I have been working in bioinformatics for many years now, and I often see this question asked by beginners. People who insist on using just a single language are crippling themselves and their work. You can and should learn both, and more. Programming languages are just tools; the more tools you have, the better prepared you will be to solve whatever problems arise.
It’s not a really good article in that way. There should be a decision tree per application that guides you to the language choice.
What are the R packages and what is available in Python? If the Python or R packages are just calling a C library after all, do memory or single threading even matter?
What about Julia? Can you express your thoughts and equations better in Julia, and is that what matters most for your project?
Do you want to work in notebooks or build ‘production’ code?
Python is not a terrible language. It's widely used in reputable organizations by highly informed and capable software engineers to great effect. Generic language criticism in 2024 is an anti-pattern that intellectually hinders new engineers. There are certainly preferences for different use cases, and highlighting those differences makes for productive discussion.
For example: one of the most common technical concerns to raise with a new engineer is the Global Interpreter Lock (GIL), which in CPython lets only one thread execute Python bytecode at a time.
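The GIL point can be observed with a small sketch like the one below. It assumes a standard CPython build (not a free-threaded one); the function names (count_primes, run_serial, run_threaded, timed) and the workload sizes are made up for illustration. The same CPU-bound, pure-Python function is run serially and then across threads, and under the GIL the threaded run usually takes about as long as the serial one. Exact timings depend on the machine, so none are claimed here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound, pure-Python work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_serial(limits):
    """Run every task one after another on the calling thread."""
    return [count_primes(limit) for limit in limits]


def run_threaded(limits, workers=4):
    """Run the same tasks on a thread pool; the GIL still serializes the bytecode."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    limits = [30_000] * 4  # illustrative workload, not a benchmark
    serial_result, serial_s = timed(run_serial, limits)
    threaded_result, threaded_s = timed(run_threaded, limits)
    print(f"serial:   {serial_s:.2f}s")
    print(f"threaded: {threaded_s:.2f}s")
    print("same results:", serial_result == threaded_result)
```

This is also where the "is it just calling a C library?" question matters: the sketch only says something about pure-Python loops, not about code that spends its time inside compiled extensions.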
Research on PL design is alive and well in 2024, and Python is nowhere near the cutting edge. I never said it didn't get the job done, or that people shouldn't learn or use it.
It's just not as elegant or simple a PL design as people sometimes seem to think it is. It's hard to look at something like Clojure or Haskell next to Python and come to the conclusion that Python is particularly elegant.
I think my main point got lost on people who took offense to my dislike for python: my main point is that you should also learn languages that are very different / that are in an altogether different branch.
You: If these tools are not ideal, what is a better tool?
Parent: I won't tell you. I just simply criticize online and provide no technical guidance, instead I gatekeep my definition of quality tooling. I think being a good senior expert does not include showing best practices I have developed through my years of experience.
You: Oh. Then I don't want to listen to you because I'm more interested in making personal progress on my professional journey than listening to uninitiated opinions provided without context.
Languages aren't just good or bad in the abstract. It's about problem-tool fit. The author mentions a specific field and evaluates the fit of R and Python. Do you believe there is a language with better fit to Bioinformatics? I'd like to hear your recommendation.
> Languages aren't just good or bad in the abstract.
I don't know about that. There's certainly no "perfect" language, but to claim that Python is a brilliant or elegant language design is to not know a whole lot about PL design.
I think I'm mainly getting downvoted because Python is popular, and people think that means it must be inherently good.
Packages are installed into your system. Many bio packages (Bioconductor, etc.) are difficult to run in any way other than a system install. So, short of running everything in a container, it is difficult to maintain project-level dependency versions.
By default, no regard is given to installing specific versions of a package. The tools just install the latest. So your build from 2 years ago will almost certainly be different, sometimes in important ways, from your build today, even if you used the same packages.
This says nothing as to R's suitability to help maintain bio-scientists' accuracy through the process. Oftentimes they will just dump data to R data files, which are opaque to version control and difficult to read outside of the R environment. The data files often contain references to types defined in packages, so to decode the data you have to have the correct R packages installed. This makes reading it in an external environment infeasible.
R has many useful packages that just exist and work. But the state of verification, versioning, and reproducibility makes it, to me, something to avoid.
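To make the version-drift complaint concrete, here is a minimal sketch of the bookkeeping that "install the latest" skips. It is written in Python rather than R, uses only the standard library's importlib.metadata, and the helper names (snapshot_versions, to_pinned_lines, drift) are hypothetical. It records the exact versions installed today so that a build two years from now can be compared against them.

```python
from importlib import metadata


def snapshot_versions():
    """Return {distribution name: version} for the current environment."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with missing metadata
            versions[name] = dist.version
    return versions


def to_pinned_lines(versions):
    """Render a snapshot as sorted 'name==version' lines, one per package."""
    ordered = sorted(versions.items(), key=lambda item: item[0].lower())
    return [f"{name}=={version}" for name, version in ordered]


def drift(old, new):
    """Map each package that changed to (old version, new version).

    A package missing from one side shows up with None on that side.
    """
    changed = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changed[name] = (old.get(name), new.get(name))
    return changed


if __name__ == "__main__":
    today = snapshot_versions()
    print("\n".join(to_pinned_lines(today)))
```

Saving the output of to_pinned_lines next to the analysis code, and running drift against it later, at least shows which packages moved even when nothing pins them.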
R has been part of my stack professionally for around 16 years, and in that time I'd estimate maybe 15% of my code had anything to do with statistics. I've used it for numerical modelling, data viz, ETL pipelines, deploying APIs, and building dashboards, among many other things. To reduce R to statistics is doing it a disservice.
Curious what troubles you had with R packages? I find R packages the easiest to use of any language: install.packages("<package name>") and library() and you're off to the races. I occasionally have library issues with Python/Ruby, and many more still when bundling JavaScript. By contrast, R rarely gives me grief.
- R packages are much more likely to include compilation of C libraries, which can cause grief if you're not experienced enough to install specific libs that might be newer than, e.g., what apt provides
- library(package) imports the full package into the global namespace (like from package import * in Python), which is fine for small projects but scales poorly
Anaconda largely handles the first problem if you can constrain your package use to its ecosystem, though.
In my anecdata, the worst R package wasn't any worse than a Python package that needed GDAL, but I had to deal with these problems easily 5 times more often.
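The namespace point above (library() behaving like from package import *) can be sketched in Python with two standard-library modules that happen to export the same names. This is only an illustration of the shadowing problem, not a claim about any particular R package; the variable names are made up.

```python
# Star imports: every public name lands in the global namespace.
from math import *
from cmath import *  # silently rebinds sqrt, log, exp, ... to the complex versions

shadowed = sqrt(4)  # resolves to cmath.sqrt, because cmath was imported last

# Qualified imports: the origin of each call stays visible at the call site.
import math
import cmath

explicit_real = math.sqrt(4)
explicit_complex = cmath.sqrt(4)

print(type(shadowed).__name__, type(explicit_real).__name__)  # complex float
```

With two packages this is easy to spot. With a dozen attached packages, which function a bare name refers to depends on load order, and that is the part that scales poorly.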
There's also the 'reticulate' package for R, which allows you to import Python libraries and exchange objects between the two languages. It includes a proper Python engine for knitr (with matplotlib support), letting you use both R and Python code blocks in the same R Markdown document.
I'd say compare the ecosystems. I don't know enough about either when it comes to bioinformatics, but I have a feeling that Python has a bigger ecosystem.
I'm not coming from bioinformatics, but we had a similar question regarding time series modelling a while ago. We used somewhat modified ARIMA models, and that was difficult to do in Python. For the rest we used Python, especially for the ML stuff, and tried to keep the R part rather small.
TL;DR: R for classical stats, Python for machine learning.