R or Python for Bioinformatics? (divingintogeneticsandgenomics.com)
13 points by sebg on June 8, 2024 | 30 comments


I have been working in bioinformatics for many years now, and I often see beginners ask this question. People who insist on using just a single language are crippling themselves and their work. You can and should learn both, and more: programming languages are just tools, and the more tools you have, the better prepared you will be to solve any problems that arise.


It’s not a very good article in that respect. There should be a decision tree per application that guides you to the language choice.

Which R packages exist for your task, and what is available in Python? If the Python or R packages are just calling a C library after all, does memory use or single threading even matter?

What about Julia? Can you better express your thoughts and equations in Julia, and is that what matters most for your project?

Do you want to work in notebooks or build ‘production’ code?

Do you need to put your work on the web?


Agreed. But both are also really terrible languages, so I’d encourage people to broaden their horizons even more fearlessly.


Python is not a terrible language. It's widely used in reputable organizations by highly informed and capable software engineers to great effect. Generic language criticism in 2024 is an anti-pattern that intellectually hinders new engineers. There are certainly preferences for different use cases, and highlighting those differences makes for productive discussion.

For example: one of the most common technical concerns to raise with a new engineer is the Global Interpreter Lock (GIL), which in CPython allows only one thread at a time to execute Python bytecode.
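A minimal sketch of what that means in practice, assuming a standard CPython build with the GIL enabled. The `count_gc` function and the toy sequence are hypothetical, chosen only because the loop is pure Python and CPU-bound. On such a build the threaded version typically runs no faster than the serial one, because the threads take turns executing bytecode rather than running in parallel.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count_gc(seq: str) -> int:
    """Count G and C bases with a pure-Python loop (CPU-bound bytecode)."""
    n = 0
    for base in seq:
        if base in "GC":
            n += 1
    return n

def serial(chunks: list) -> int:
    return sum(count_gc(c) for c in chunks)

def threaded(chunks: list, workers: int = 4) -> int:
    # Threads share one interpreter; on a GIL build only one runs bytecode at a time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_gc, chunks))

if __name__ == "__main__":
    chunks = ["ACGT" * 100_000] * 4  # hypothetical toy data: 4 chunks of 400k bases
    for fn in (serial, threaded):
        start = time.perf_counter()
        total = fn(chunks)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: {total} GC bases in {elapsed:.2f}s")
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` (same interface, separate interpreter processes) is the usual way around this for CPU-bound pure-Python work, at the cost of pickling arguments and results between processes.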


> Generic language criticism in 2024

Research on PL design is alive and well in 2024, and Python is nowhere near the cutting edge. I never said it didn't get the job done, or that people shouldn't learn or use it.

It's just not as elegant or simple a PL design as people sometimes seem to think it is. It's hard to look at something like Clojure or Haskell next to Python and conclude that Python is particularly elegant.

I think my main point got lost on people who took offense at my dislike of Python. My main point is that you should also learn languages that are very different, from an altogether different branch.


Are you getting paid for broadening your horizons or for getting sh*t done?

Stroustrup was spot on: "There are only two kinds of languages: the ones people complain about and the ones nobody uses."


Grandparent: programming languages are tools. The more tools you can wield, the better you can choose the right one for the job.

Parent: yeah, but R and python are shit tools; learn how to use better tools too.

You: are you paid to learn how to use better tools, or are you paid to solve problems with shit tools?


What you are really doing:

You: If these tools are not ideal, what is a better tool?

Parent: I won't tell you. I just criticize online and provide no technical guidance; instead, I gatekeep my definition of quality tooling. I don't think being a good senior expert includes sharing the best practices I have developed through my years of experience.

You: Oh. Then I don't want to listen to you, because I'm more interested in making personal progress on my professional journey than in listening to unsubstantiated opinions provided without context.


> You: If these tools are not ideal, what is a better tool?

I am sure the word Rust is in there somewhere ;-) After all, HN has been raving about Polars.


Languages aren't just good or bad in the abstract. It's about problem-tool fit. The author mentions a specific field and evaluates the fit of R and Python. Do you believe there is a language with a better fit for bioinformatics? I'd like to hear your recommendation.


> Languages aren't just good or bad in the abstract.

I don't know about that. There's certainly no "perfect" language, but to claim that Python is brilliant or elegant language design is to not know a whole lot about PL design.

I think I'm mainly getting downvoted because Python is popular, and people think that means it must be inherently good.


It’s not mentioned in the article, but R struggles in a production environment.

Things like security and TLS support are afterthoughts.

If you're planning on scaling up, or working with confidential data then get as much of your pipeline into Python as possible.
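To make the contrast concrete on the Python side, a minimal standard-library sketch: `ssl.create_default_context()` returns a context that loads the system CA certificates and verifies both the certificate chain and the hostname by default, and `urllib.request.urlopen` accepts it via its `context=` argument. The `fetch` helper and the URL are illustrative placeholders, and this says nothing specific about how any particular R package handles TLS.

```python
import ssl
import urllib.request

# The default client context loads the system CA store and enables
# certificate-chain and hostname verification.
ctx = ssl.create_default_context()
print(ctx.check_hostname)                    # True
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True

def fetch(url: str) -> bytes:
    """Hypothetical helper: fetch over HTTPS, raising if verification fails."""
    with urllib.request.urlopen(url, context=ctx, timeout=10) as resp:
        return resp.read()

# Example (placeholder URL, needs network access):
# data = fetch("https://example.org/")
```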


Making bioinformatics R code reproducible is difficult, and impossible to do casually.

The coupling between the undeclared environment and the code that runs is even worse than in Python.

The versioned CRAN snapshot service is no longer available.

For this reason alone, use something that understands and takes this seriously.


I use Nix (flakes) almost religiously now for exactly that reason, for both Python and R.


What? What’s your issue with making R reproducible?


Packages are installed into your system. Many bio packages (Bioconductor, etc.) are difficult to run any other way than as a system install. So, short of running everything in a container, it is difficult to maintain project-level dependency versions.

No regard is given (by default) to installing specific versions of a package; it just installs the latest. So your build from two years ago will almost certainly differ, sometimes in important ways, from your build today, even if you used the same packages.
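On the Python side, the usual answer is an isolated virtual environment per project (`python -m venv`) plus exact version pins in a `requirements.txt`. A minimal sketch of the pinning half using only `importlib.metadata`, assuming the names passed in are distribution names as installed by pip; `pin_versions` and `to_requirements` are hypothetical helpers, and in practice `pip freeze` does the same job for a whole environment.

```python
from importlib import metadata
from typing import Dict, List, Optional

def pin_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Look up the installed version of each distribution (None if absent)."""
    pins: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            pins[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pins[name] = None
    return pins

def to_requirements(pins: Dict[str, Optional[str]]) -> str:
    """Render exact pins in requirements.txt syntax, skipping missing packages."""
    return "\n".join(
        f"{name}=={ver}" for name, ver in sorted(pins.items()) if ver is not None
    )

if __name__ == "__main__":
    # Hypothetical dependency list; replace with your project's own.
    pins = pin_versions(["numpy", "pandas", "biopython"])
    print(to_requirements(pins))
    missing = [n for n, v in pins.items() if v is None]
    if missing:
        print("not installed:", ", ".join(missing))
```

Committing that output and reinstalling with `pip install -r requirements.txt` in a fresh environment covers the Python packages themselves; compiled system libraries still sit outside it, which is where containers or Nix come in.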

This says nothing of R's suitability for helping bioscientists maintain accuracy throughout the process. Oftentimes they will just dump data to R data files, which are opaque to version control and difficult to read outside of the R environment: the files often contain references to types defined in packages, so decoding the data requires having the correct R packages installed. That makes reading them in an external environment infeasible.
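Python has the same pitfall with `pickle`, which stores an object's class by module and name, so loading the file later requires that same class to be importable. A minimal sketch contrasting that with a plain-text TSV written via the standard `csv` module; the `Variant` record type and its values are made up for illustration.

```python
import csv
import io
import pickle
from dataclasses import asdict, dataclass, fields

@dataclass
class Variant:
    """Hypothetical record type, standing in for a class defined by some package."""
    chrom: str
    pos: int
    ref: str
    alt: str

variants = [Variant("chr1", 12345, "A", "G"), Variant("chr2", 67890, "C", "T")]

# Binary route: pickle stores a reference to Variant (module + class name),
# so unpickling later needs that same class importable, much like R data
# files that need the right packages installed.
blob = pickle.dumps(variants)

def to_tsv(records: list) -> str:
    """Plain-text route: one header row plus one tab-separated row per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=[f.name for f in fields(Variant)],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()

print(to_tsv(variants))
```

The TSV diffs cleanly under version control and opens in any language or spreadsheet without importing the code that produced it.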

R has many useful packages that just exist and work. But its verification, versioning, and reproducibility story makes it, to me, something to actively avoid.


Regarding versioning: it sounds like 80% of your issues exist because of the lack of a project environment.

Have you ever checked out `renv`? It should work quite well with Bioconductor.

(Dumping data into binaries is… unfortunate, but this sounds more like a training issue than anything else.)


R should just transition to being a statistics package for the Python ecosystem.


R has been part of my stack professionally for around 16 years and in that time I'd estimate maybe 15% of my code had anything to do with statistics. I've used it for numerical modelling, data viz, ETL pipelines, deploying APIs, building dashboards, amongst many other topics. To reduce R to statistics is doing it a disservice.


“The R Project for Statistical Computing”


Regardless of the official title, it long ago spread far beyond statistics.


Definitely Python. I only use R when I have to (working with other people). R packages are a nightmare.


Curious what trouble you had with R packages? I find R packages the easiest to use of any language: install.packages("<package name>") and library() and you're off to the races. I occasionally have library issues with Python/Ruby, and many more still when bundling JavaScript. By contrast, R rarely gives me grief.


Not OP but my experience is

- R packages are much more likely to include compilation of C libraries, which can cause grief if you're not experienced enough to install specific libs that might be newer than, e.g., what apt provides

- library(package) imports the full package into the global namespace (from package import *) which is fine for small projects but scales poorly

Anaconda largely handles the first problem, though, if you can constrain your package use to its ecosystem.

In my anecdata, the worst R package wasn't any worse than a Python package that needed GDAL, but I had to deal with these problems easily five times more often.
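The second point, the analogy to `from package import *`, is easy to demonstrate in Python itself: names brought in by a star import silently shadow earlier ones with the same name. A small sketch using the standard `math` and `cmath` modules, which both define `sqrt`; the R counterpart of the namespaced style is calling `package::function()`.

```python
import cmath
import math

# A star import copies every public name into the current namespace.
# Both modules define sqrt, so the second import silently replaces the first.
from math import *
from cmath import *

print(sqrt is cmath.sqrt)  # True: the later star import won
print(sqrt(-1))            # 1j: a complex result where math.sqrt would raise

# Namespaced calls keep the origin of every function explicit.
print(cmath.sqrt(-1))      # 1j
try:
    math.sqrt(-1)
except ValueError as err:
    print("math.sqrt(-1) raised ValueError:", err)
```

In a small script the shadowing is obvious; across dozens of attached packages it is exactly the kind of conflict that scales poorly.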


There's also the 'reticulate' package for R, which allows you to import Python libraries and exchange objects between the two languages. It includes a proper Python engine for knitr (with matplotlib support), letting you use both R and Python code blocks in the same R Markdown document.

https://cran.r-project.org/web//packages//reticulate/index.h...


I'd say compare the ecosystems. I don't know enough about either when it comes to bioinformatics, but I have a feeling that Python has a bigger ecosystem.


You’d be surprised, R is huge in many niches.


Be like me: learn Python, then ask ChatGPT to translate your Python into R every time you have to use R for a project.


I'm not coming from bioinformatics, but we had a similar question about time series modelling a while ago. We used somewhat modified ARIMA models, and they were difficult to implement in Python. For everything else, especially ML, we used Python and tried to keep the R part fairly small.

TL;DR: R for classical stats, Python for machine learning.


To save you a click, a TL;DR from the article:

> Python and R both have their own pros and cons. If you can, learn both and use one that is suitable for the task at hand.



