> As Python 2.x is still the default Python on many systems and there are a fair number of research codes out there based on Python 2, we will use Python 2.x in this book.
This is so unfortunate. Scientific computing is riddled with technical debt, and starting with Python 2 today is fairly irresponsible. If you're already invested in Python 2 and have code/training written up, fine. But if you're learning it just now, as the book's audience obviously is, picking Python 3 should be a no-brainer.
I'm a hardcore proponent of Python 3 (because it's the only Python I've ever used/learned), but I'm OK with this book accepting the realities and at least committing to writing 3.x-compatible code:
> However, we will write code that is as much as possible in the Python 3 style (and understood by Python 2). The most prominent example is that in Python 2.x, the print command is special whereas in Python 3 it is an ordinary function.
I don't specialize in the purported domain of this book, but if the author thinks that `print` will be the most prominent differentiator, then I'm guessing there aren't a ton of situations where it'll be hard to make his examples 3.x compatible (for this domain, I'm assuming the behavior of the division operator will be another prominent, but easy-to-fix, difference).
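For what it's worth, both differences the book names can be papered over with the __future__ imports; a minimal sketch (mine, not from the book):

    # Sketch: these imports make Python 2.7 accept the Python 3 style
    # for exactly the two differences named above.
    from __future__ import print_function, division

    print("ratio:", 3 / 4)   # prints 0.75 on both 2.7 and 3.x ("true" division)
    print(3 // 4)            # prints 0; floor division is explicit on both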
> The clean break of 3 and the sanity it brings to the language are undeniable.
Well, I can't deny they broke it, but sanity is pretty deniable. Python has always been a dynamic language, and one of the core mantras was "there should be one obvious way to do it" (in contrast to Perl). All the new type annotation stuff and the multiple ways to handle string formatting are steps in very weird directions. Maybe they should make a Python version 4 to clean some of that mess up.
As far as I know, it was simply because few people used it correctly and Guido thought it would help to tuck it out of the way in a module. That doesn't feel like a massive chasm to me. Most of the 2/3 moves and renames were cosmetic.
In a lot of scientific computing, backwards compatibility is essential. Knowing Fortran in that world is still helpful; even a basic understanding helps with common packages like LAPACK and ARPACK. In science, the science comes first and coding second. Coding is just the tool used to get the job done. Most people in the community adopted Python 2 because it was easier to use than C/Fortran in a lot of cases where speed wasn't critical. I cannot tell you how much easier it is to read scientific Python code than scientific C or Fortran code. So even though a lot of the main packages support 3, a lot of university and custom software doesn't. When someone hands you code and asks you to improve on it, or make a model work, you don't rewrite the entire thing; you use what already works and build from there.
I very much understand the bit about science being first and code second. I live by that same principle at work, and we actually have a lot of (terrible) Fortran and C/C++ code around. And we're trying to adopt Python for large parts of it.
But I'm not sure I follow your argument. I'm not talking about rewriting existing code. I'm talking about building new things. And they should be built in Python 3, because that's the standard that's meant to be used and supported in the future. Learning Py3 doesn't mean you can't maintain existing Python 2 code. It just means that whatever new thing you write is in a future-proof setting.
(Btw, as much as people hate on Fortran, there's one thing where it beats most other languages: take a 20- or 30-year-old codebase and there's a good chance you can compile and run it on a modern machine. We can't count on that with Python, but Py3 gives us a slightly better chance.)
Well, I guess the point of my argument boils down to the problem of changing key functions in established code. Even when Fortran adopted new standards, old code kept working, for the most part. The problem with Python is that it got really popular in the scientific community, because it is relatively fast and extremely easy to write in, but some of the packages were slow to adopt 3. I think this created a weird scenario where I'm not sure Python 2 will ever go away.

The scientific world fell in love with Python because it was like a free MATLAB, but more useful. And by the time 3 came out there was a lot of code developed that wouldn't port. If we get down to it, I think this is more a problem of porting than anything else. The scientific community is already deeply invested in 2, and I will admit that we are really slow to adopt. Because of this I think 2 will stay for quite some time. Myself, I don't take the time to learn 3, because if I wrote in it I'd just confuse my team. I literally can't write in 3 because it'd be detrimental to my job. This is true for a lot of the scientific computing world.

There is always an inherent danger in adopting new standards for a language, and Python shows where it can turn bad: it got really popular, and even though the change isn't that disruptive for typical Python users, it makes a huge difference in the scientific world.
I will mention, as a more middle-ground coder and physicist: scientists are horrible programmers. Horrible. In Python I don't see a lot of function definitions, so there are A LOT of globals. I don't think I'm a great programmer, but there is definitely a "programming is a tool" focus in our sphere. So there is no real care about being "future proof" or any of that. The real concern, especially in academia, is "can I get this done". Really the only people that care about future-proofing are the national labs that are building libraries for mass use, something like PETSc. But you even get problems there, because things like ARPACK have issues since it depends on LAPACK (many might not know, but there is a compatibility issue there with the newest versions).
I myself get upset about this. But what are you going to do? I don't disagree with the science-first, coding-second attitude. But there needs to be a conscious effort to make things at least somewhat more future-proof. The problem is no one pays us for that. We get paid for results. We do not get paid for verification, mind you, and that is an EXTREMELY important part of science. And we don't usually get paid for software development, at least primarily (we do if it leads to results). So it is convoluted, and this turns into a large argument about a lot of things. But it is nowhere near as simple as "Python 3 is supposed to be the new standard, therefore we should write in it." Coding being a tool, we will always work off of a previous code base, and we will always learn from what the writer (or HOPEFULLY someone who knows a semblance of what the code does) was trying to do with the code/library.
It's not that it helps in a particular way (though more on that below); it's that starting with Python 2 now necessarily leads to rewrites later on. While you can still see packages not supporting Python 3 (there are fewer and fewer of those), in the future you will see the opposite: some projects are already announcing end dates for their Py2 support (IPython, to give an example).
I have to stress that I used 2to3 to tackle the vast majority of conversion issues. It was on a small codebase, but still, it worked rather well. While I feel the transition has not gone terribly smoothly, I truly believe Python 3 is a better language, and as I noted in a different thread [0], the Unicode support alone is worth it (for me). While not significantly beneficial in the vast majority of computational sciences, it is helpful in some areas, linguistics to give an example.
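To give a feel for the mechanical changes 2to3 makes, here is a toy before/after (my own example, not that codebase):

    # Python 2 original:
    #     counts = {'a': 1, 'b': 2}
    #     for key, value in counts.iteritems():
    #         print key, value

    # Output of `2to3 -w script.py` (valid Python 3):
    counts = {'a': 1, 'b': 2}
    for key, value in counts.items():   # .iteritems() -> .items()
        print(key, value)               # print statement -> print() function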
> It's not that it helps in a particular way (though more on that below); it's that starting with Python 2 now necessarily leads to rewrites later on. While you can still see packages not supporting Python 3 (there are fewer and fewer of those), in the future you will see the opposite: some projects are already announcing end dates for their Py2 support (IPython, to give an example).
So, there's no benefit to Python 3, but we should all migrate to it anyway? You think Python 2.7 will die, but trust me, as soon as the PSF abandons it, someone will swoop in and become the new de facto supporter. For many of us, stability is a feature, and the fact that 2.7 won't change in gratuitous ways is super attractive.
Exactly. This is what I have always believed. Google and Dropbox alone have too much py2 code to drop it or do a wholesale conversion.
Nothing is going to be EOLed. It's going to be business as usual, and the Python foundation will never agree to killing Python 2 in the next decade.
The only way forward is through six (https://pypi.python.org/pypi/six) or something like it. It's well worth building and funding a Python 2 compatibility layer in Python 3... and then moving to the py3 runtime.
I'm actually surprised that someone like Google is not throwing some funding towards building a compatibility layer.
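For the curious, a rough sketch of what writing against such a layer looks like today (the six names are real APIs; the rest is a made-up example):

    # Illustrative only: code that runs unchanged on 2.7 and 3.x via six.
    import six
    from six.moves import range  # iterator semantics (xrange) on both versions

    def describe(value):
        if isinstance(value, six.string_types):  # str on 3; str/unicode on 2
            return "text: " + value
        return "other: " + repr(value)

    for i in range(3):
        print(describe(u"item %d" % i))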
> I'd be happy to hear examples of how Python 3 helps us in computational sciences
Sir/Madam, I am here to make you happy then. Python 3's multiprocessing library is leaps and bounds "better" than Python 2's. By "better", I mean faster (in my workloads) by 30-40%. If time is money, then that feature alone saves you both.
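A minimal sketch of the API for reference (the with-statement form is 3.3+; the 30-40% figure above is my measurement, not something this toy demonstrates):

    from multiprocessing import Pool

    def simulate(seed):
        # stand-in for an expensive, independent computation
        return sum(i * i for i in range(seed * 1000))

    if __name__ == "__main__":
        with Pool(processes=4) as pool:   # Pool as a context manager: 3.3+
            results = pool.map(simulate, range(8))
        print(results)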
If you really want to run faster, you should try a new language. 30-40% for multiprocessing is nowhere near the benefit you'd get from a single-threaded implementation in C, C++, Java, Rust, OCaml, Go, or even JavaScript or LuaJIT. Those run between 10 and 1000 times faster (1000% - 100000%) than Python.
> Those run between 10 and 1000 times faster (1000% - 100000%) than Python.
They're faster than pure Python in the general case, but once you throw numpy, numba, numexpr and Cython into the mix and focus on numeric workloads, most of that difference disappears.
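A small sketch of why the gap narrows (illustrative, not a benchmark):

    # The "loop" in the numpy version runs in compiled C, not the interpreter.
    import numpy as np

    x = np.random.rand(10**6)

    total_slow = sum(v * v for v in x)  # pure Python: per-element overhead
    total_fast = np.dot(x, x)           # one vectorized call into BLAS/C

    assert abs(total_slow - total_fast) < 1e-6 * total_fast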
For scientific computing I can't see many people making use of Python's multiprocessing library. A lot of the time you're writing scripts that you might run a handful of times at most, or, if speed is an issue, you write it in C++.
The assertion is also not necessarily accurate: Python 2 and 3 are both included on Ubuntu 14.04 (and maybe earlier). My understanding is that different system tools require different versions.
Python 2 is /usr/bin/python
Python 3 is /usr/bin/python3
If you're on Windows and have both Python 2.7 and Python 3.x installed, just use the py command to automatically select the correct runtime. I think you may need to specify the shebang for this to work (i.e. "#!/usr/bin/env python" for a 2.x script and "#!/usr/bin/env python3" for a 3.x script).
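In other words, a script like this (a sketch of the mechanism as I understand it):

    #!/usr/bin/env python3
    # On Windows, "py script.py" reads the line above and launches a 3.x
    # interpreter; "#!/usr/bin/env python" would select 2.x instead.
    import sys
    print(sys.version)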
Despite some recent discussions lamenting slow adoption, everyone I know, myself included, writes all new code in 3.5.x and only falls back to 2.7 for legacy code or when a needed module is not yet 3.x compatible. The problem is I work at a small company with about 10 devs. If a 2.x script works, we have zero incentive to port it to 3.x because we have such a huge backlog of work items.
IMO, Python 3's support for matrix multiplication using the @ operator is itself worth the price of admission. Much of technical computing is just implementing algorithms that use linear algebra extensively, and if you're coming from MATLAB, littering your code with dot(dot(X, Y), Z) is a real pain.
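A quick sketch of the difference (@ is PEP 465, Python 3.5+; X, Y, Z here are made-up data):

    import numpy as np

    X, Y, Z = (np.random.rand(3, 3) for _ in range(3))

    old_style = np.dot(np.dot(X, Y), Z)  # the nested-call style from MATLAB ports
    new_style = X @ Y @ Z                # reads like the underlying math

    assert np.allclose(old_style, new_style)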
Everyone I know does matrix multiplication using numpy or pandas and conveniently abstracts away the need to do a language conversion.
Same thing with asyncio vs gevent and tons of other features. I say this again: there are industrial-strength packages and libraries for py2 that give all the power and convenience you think you get in py3. Perhaps even better tested and used in production.
I find @ nice. But still, even in Python 2, if you use np.matrix (instead of np.array), plain * does matrix multiplication. And if not, the .dot() method is much clearer than the dot() function.
It is absolutely being more explicit. It's being explicit that (expt b 5) is a unit that forms a single argument in the + form.
This is implicit in
    a + b ** 5

according to a precedence rule between the ** and + operators, which is hidden in the parser's implementation and in the documentation thereof.
Well, the whitespace in
    a + b**5
suggests it. But the suggestions produced by insignificant whitespace can be mere wishful thinking:
    int* x, y; // two C pointers? not!
Speaking of whitespace, prefix expressions also have the advantage that there are multiple ways to split an expression across lines, all conforming to a very clear, simple formatting rule:
    (+ a
       (* 4 (- x y))
       (expt b 5))

    (+ a
       (* 4
          (- x y))
       (expt b 5))
Fully expanded, with every term on a separate line:
    (+ a
       (* 4
          (- x
             y))
       (expt b
             5))
In this manner, we can write complex expressions that would be quite unreadable in infix, requiring break-up into intermediate temporaries.
We almost have a circuit diagram now, with "gates" for the operations: a three-input + gate, etc.:
           ____
          /    |- a
          |    |        ____
        --| +  |-------/    |- 4
          |    |       | *  |       ____
          \____|-      |    |------/    |- x
                `      \____|      | -  |
                |                  \____|- y
                |        _____
                `-------/     |- b
                        | expt|
                        \_____|- 5
For me it is one of the signs that the Python 2/3 split was a failure. In data science, everything is Python 2 first, then (usually, but almost always with some delay) ported to Python 3 (think: Spark, TensorFlow, etc.).
If there is a new package (say, on HN), very often it does not work (well) on Python 3.
I use Python 3 (usually), but it makes data science harder, as I often need to work around some small (but nasty) issues.
I teach Python 3. But no, it is not a no-brainer.
What kind of issues have you run into with Python 3 that make data science more difficult? I use Python 3 as well, and I don't think I've run into anything that fits your description...
Described above (and I gave examples of two mainstream libraries; with smaller ones it's much more serious). If there is a new package, it's likely to be either totally broken on Python 3 or undertested. So there is a delay before I can use something with Python 3. (Nothing about Python 3 itself, but about its ecosystem.)
At the same time, I hardly see any strong points in which Python 3 is much better for data science. (Examples?)
This was an introduction to Python. I went through every example while reading this book and rarely found a point where translating to Python 3 was any trouble.
It looks wonderful! Just yesterday I ran a quick workshop introducing neuroscientists to scientific Python, in the Jupyter Notebook environment (very rough version here: https://github.com/stared/python-neuroaspects-2016, before updates). I will definitely send the participants a link to your book.
But one small question: do you plan, by any chance, to upload your notebooks to GitHub (or any other place where one can easily see a rendered version)? BTW, one more selling point of Jupyter: the ease of sharing.
Even if you say "well, it'll just be forked," you don't really know how many forks and how much pain there will be. Maybe python2 will be like LibreOffice, or maybe it will be like OpenOffice.
This is a very good guide. I thought I knew the Python ecosystem well, and I still found something new for myself (the 'visual' package for 3D illustrations).
I am wondering if there are guides for the "reverse direction": I already know how to program, but I want to learn a new scientific domain that interests me, e.g. materials science, climate modeling, etc. Something like what Rosalind[1] does for bioinformatics.
Well, if you want to learn the science, you can just pick up a science book; I know many users here have good suggestions. But if you are specifically looking for scientific computing, those guides generally cover code basics. There are also numerical programming books. If you're trying to get into those fields, I suggest going through a numerics book, because the techniques will be similar regardless of the code. But you'll have to spend time learning the non-coding parts as well. In science, coding is just a tool.
I just finished working through this book and I really enjoyed it. I went from next to no Python knowledge to writing programs to analyze raw vibration data in a few days. I had previous experience with Matlab and this book was very useful in bridging the gap between the two systems.
I'm dubious about the worth of this book, as I lived with several Soton engineering students as they went through the associated course (whilst I did CS as part of the other 'Engineering' faculty, ECS), though a lot may have been due to lack of engagement with this particular course (copying and memorisation were sufficient to get you through it). I was always so frustrated that my friends couldn't benefit from some of the great lecturers in my own faculty.
That said I know a few Engineering students there now who are awesome programmers (though Aero Engineers at heart, so I can't hire them :( ).
It does. It emphasizes that this particular guide encourages the use of Python 3 instead of holding onto legacy, which means the author understands programming language basics, or at least knows which semantic unifications were made to render the language less inconsistent, hence more beautiful.