I have a PhD in CS, specifically in Programming Languages and Parallel Programming, and I believe I do get scientific computing.
Most people doing it did not have a formal CS education. They are biology, physics, mathematics, or chemistry majors who have had one or two courses on programming, taught by other scientific programmers.
There are two main families. One comes from the Fortran background and still writes programs like it did in the '80s, with almost no new tooling. Programs are written over some period, then scheduled on clusters that spend months computing whatever it is.
The other family of scientific programmers, which I believe is the majority, uses a tool like Matlab, or more recently R, to interactively inspect and modify data (RStudio is a friendly, Matlab/Mathematica-like environment for this task), relying on libraries written by more proficient programmers to perform some kind of analysis (machine learning, DNA segmentation, plotting, or just basic statistics).
Most of these programmers know one or two languages (perhaps plus Python and Bash for basic scripting). They write relatively small programs, and the chances of anyone else ever using that code are low. Deadline pressure is high, so code maintainability is not a priority.
For a non-CS programmer, learning a new programming language is close to impossible: they are used to one way of doing things and one set of libraries. They take much longer to adjust to a new language because they do not see its logical structure the way anyone who has taken a basic compilers course does.
Given this context, web apps, REST APIs, and all the other trending tech in IT are not commonly used in scientific programming, because scientists typically do not need them (when they do, they learn them). Datasets are retrieved and stored as CSV and processed in one of those environments (or even in Julia or Python/pandas).
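That CSV-in, compute, CSV-out loop is simple enough to sketch. This is a minimal illustration using only the Python standard library (the column names and values are made up, not from any real dataset):

```python
import csv
import io
import statistics

# Hypothetical measurements; in practice this would be a file on disk
# opened with open("data.csv", newline="").
raw = """sample,expression
A,1.2
B,3.4
C,2.2
"""

with io.StringIO(raw) as fh:
    rows = list(csv.DictReader(fh))

values = [float(r["expression"]) for r in rows]
print(round(statistics.mean(values), 2))  # mean expression level
```

In practice the same shape of script gets written with pandas or R data frames, but the workflow — load a CSV, compute a summary, print or write it back out — is the same.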
You're painting an awfully dark picture of scientists' skills. Having been on both sides, I believe the deciding factor is simply the availability of libraries.
If you're doing web development you have an insane number of languages to choose from, because after String, Array, and File are implemented, HTTP is next. Having done a bit of web development, I'd also say a typical project uses a surprisingly small subset of libraries.
Scientific computing is quite different: a paper in structural biology (my former stomping grounds) can easily require a few dozen algorithms that each once filled a 10-page paper. These could easily be packaged as libraries, but it's a niche, so it rarely happens. Newer languages quite often don't even have a robust numerics library. Leave the beaten track and your workload just increased by an order of magnitude.
That's also why science, unlike "general purpose" programming, often uses a workflow that connects five or so languages: a Java GUI, Python for network/string/file I/O, maybe R for the heavier computations, all held together by a (typically too long) shell script.
But these workflows are getting better. There's a build tool that formalizes the pipeline somewhat (I forgot the name), and APIs are surprisingly common. The reason CSV will never die is that data fetched from APIs is usually more static than in a typical web app (hence the need for a local cache), and that scientists often work with data that just isn't a good fit for a database. Postgres doesn't offer anything that enriches a 15 MB gene sequence.
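The "fetch once, then read from a local CSV cache" pattern described above can be sketched in a few lines of stdlib Python. The fetcher is passed in as a function so this example needs no real network endpoint; all names here are illustrative:

```python
import csv
import tempfile
from pathlib import Path

def cached_rows(cache_path, fetch):
    """Return rows from a local CSV cache; call fetch() and write
    the cache only on a miss. fetch() is assumed to return a
    non-empty list of dicts (e.g. parsed from a JSON API)."""
    path = Path(cache_path)
    if path.exists():
        with path.open(newline="") as fh:
            return list(csv.DictReader(fh))
    rows = fetch()
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
    return rows

# Fake "API" standing in for a real endpoint.
calls = []
def fake_fetch():
    calls.append(1)
    return [{"gene": "BRCA1", "length": "81189"}]

tmp = Path(tempfile.mkdtemp()) / "genes.csv"
first = cached_rows(tmp, fake_fetch)
second = cached_rows(tmp, fake_fetch)  # served from the CSV cache
print(len(calls))  # the remote fetch ran only once
```

Because the upstream data rarely changes, a flat file like this is often all the "database" a scientific pipeline needs.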
Yes, scientists' programming skills (averaged over the population) suck. Factor 1: programming is neither credited in itself nor reviewed in the publishing process. Factor 2: often little education in, or focus on, programming relative to the wall-clock time spent doing it.
But I don't think that is fixed only by more education and by making scientists behave more like programmers. To change things, one also needs far better alternatives than the options available today, so that people are genuinely encouraged to switch. Somehow, these must be written by people who know their CS and can write compilers, yet engage with why scientific computing is a mess on the tooling side too, rather than dismiss it as laziness.
I started out as a programmer, I have contributed to Cython, and the past two years have been pure web development at a startup. So I know very well why MATLAB sucks. Yet the best tool I have found for doing numerical computing is a cobbled-together mess of Fortran, pure C, C/assembly code generated from Python/Jinja templates, Python/NumPy/Theano...
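For readers who haven't seen the template-generated-C trick mentioned above: the idea is to use a text template engine to stamp out specialized low-level code (e.g. unrolled loops) from Python. This sketch uses the stdlib `string.Template` as a dependency-free stand-in for Jinja; the function and template names are made up for illustration:

```python
from string import Template

# A tiny template that expands into a C function with an
# unrolled loop body, specialized for a fixed vector length n.
c_template = Template(
    "void axpy_$n(double a, const double *x, double *y) {\n"
    "$body}\n"
)
line = Template("    y[$i] += a * x[$i];\n")

def render(n):
    """Generate C source for y += a*x, unrolled to length n."""
    body = "".join(line.substitute(i=i) for i in range(n))
    return c_template.substitute(n=n, body=body)

print(render(3))
```

The generated source would then be compiled and linked in a separate build step; Jinja adds loops, conditionals, and includes on top of this basic substitution, which is why it shows up in real code-generation pipelines.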
The scientific Python and Julia communities have been making great progress, but oh, how far there is left to go.
I agree; this is also one of the things that drives me away from C and toward saner programming languages.
Because the majority of programmers in areas where software isn't the core product being sold don't spend one second thinking about code quality.
As such, tooling that is forgiving and allows fast prototyping, yet at the same time enforces some kind of guidelines, is probably the way to improve current workflows.
I've spent large parts of my career floating on the edges of academia and have had to interface with code written by academics many times, and: oh jesus, it's almost always a huge mess.