> As Python 2.x is still the default Python on many systems and there are a fair number of research codes out there based on Python 2, we will use Python 2.x in this book.
This is so unfortunate. Scientific computing is riddled with technical debt, and starting with Python 2 today is fairly irresponsible. If you're already invested in Python 2 and have code/training written up, fine. But if you're learning it just now, as the book's audience obviously is, picking Python 3 should be a no-brainer.
I'm a hardcore proponent of Python 3 (because it's the only Python I've ever used/learned), but I'm OK with this book accepting the realities and at least committing to writing 3.x-compatible code:
> However, we will write code that is as much as possible in the Python 3 style (and understood by Python 2). The most prominent example is that in Python 2.x, the print command is special whereas in Python 3 it is an ordinary function.
I don't specialize in the purported domain of this book, but if the author thinks that `print` will be the most prominent differentiator, then I'm guessing there aren't a ton of situations where it'll be hard to make his examples 3.x compatible (for this domain, I'm assuming the behavior of the division operator will be another prominent, but easy-to-fix, difference).
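For what it's worth, both differences the book names can be papered over with the __future__ imports; a minimal sketch (mine, not from the book):

    # Sketch: these imports make Python 2.7 accept the Python 3 style
    # for exactly the two differences named above.
    from __future__ import print_function, division

    print("ratio:", 3 / 4)   # prints 0.75 on both 2.7 and 3.x ("true" division)
    print(3 // 4)            # prints 0; floor division is explicit on both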
> The clean break of 3 and the sanity it brings to the language are undeniable.
Well, I can't deny they broke it, but sanity is pretty deniable. Python has always been a dynamic language, and one of the core mantras was "there should be one obvious way to do it" (in contrast to Perl). All the new type annotation stuff and the multiple ways to handle string formatting are steps in very weird directions. Maybe they should make a Python version 4 to clean some of that mess up.
As far as I know, it was simply because few people used it correctly and Guido thought it would help to tuck it out of the way in a module. That doesn't feel like a massive chasm to me. Most of the 2/3 moves and renames were cosmetic.
In a lot of scientific computing, backwards compatibility is essential. Knowing Fortran in that world is still helpful; even a basic understanding helps with common packages like LAPACK and ARPACK. In science, the science comes first and coding second. Coding is just the tool used to get the job done. Most people in the community adopted Python 2 because it was easier to use than C/Fortran in a lot of cases where speed wasn't critical. I cannot tell you how much easier it is to read scientific Python code than scientific C or Fortran code. So even though a lot of the main packages support 3, a lot of university and custom software doesn't. When someone hands you code and asks you to improve on it, or make a model work, you don't rewrite the entire thing; you use what already works and build from there.
I very much understand the bit about science being first and code second. I live by that same principle at work, and we actually have a lot of (terrible) Fortran and C/C++ code around. And we're trying to adopt Python for large parts of it.
But I'm not sure I follow your argument. I'm not talking about rewriting existing code. I'm talking about building new things. And they should be built in Python 3, because that's the standard that's meant to be used and supported in the future. Learning Py3 doesn't mean you can't maintain existing Python 2 code. It just means that whatever new thing you write is in a future-proof setting.
(Btw, as much as people hate on Fortran, there's one thing where it beats most other languages: take a 20- or 30-year-old codebase and there's a good chance you can compile and run it on a modern machine. We can't count on that with Python, but Py3 gives us a slightly better chance.)
Well, I guess the point of my argument boils down to the problem of changing key functions in established code. Even when Fortran adopted new standards, old code kept working, for the most part. The problem with Python is that it got really popular in the scientific community, because it is relatively fast and extremely easy to write in, but some of the packages were slow to adopt 3. I think this created a weird scenario where I'm not sure Python 2 will ever go away.

The scientific world fell in love with Python because it was like a free MATLAB, but more useful. And by the time 3 came out there was a lot of code developed that wouldn't port. If we get down to it, I think this is more a problem of porting than anything else. The scientific community is already deeply invested in 2, and I will admit that we are really slow to adopt. Because of this I think 2 will stay for quite some time. Myself, I don't take the time to learn 3, because if I wrote in it I'd just confuse my team. I literally can't write in 3 because it'd be detrimental to my job. This is true for a lot of the scientific computing world.

There is always an inherent danger in adopting new standards for a language, and Python shows where it can turn bad: it got really popular, and even though the change isn't that disruptive for typical Python users, it makes a huge difference in the scientific world.
I will mention, as a more middle-ground coder and physicist: scientists are horrible programmers. Horrible. In Python I don't see a lot of function definitions, so there are A LOT of globals. I don't think I'm a great programmer, but there is definitely a "programming is a tool" focus in our sphere. So there is no real care about being "future proof" or any of that. The real concern, especially in academia, is "can I get this done". Really the only people that care about future-proofing are the national labs that are building libraries for mass use, something like PETSc. But you even get problems there, because things like ARPACK have issues since it depends on LAPACK (many might not know, but there is a compatibility issue there with the newest versions).
I myself get upset about this. But what are you going to do? I don't disagree with the science-first, coding-second attitude. But there needs to be a conscious effort to make things at least somewhat more future-proof. The problem is no one pays us for that. We get paid for results. We do not get paid for verification, mind you, and that is an EXTREMELY important part of science. And we don't usually get paid for software development, at least primarily (we do if it leads to results). So it is convoluted, and this turns into a large argument about a lot of things. But it is nowhere near as simple as "Python 3 is supposed to be the new standard, therefore we should write in it." Coding being a tool, we will always work off of a previous code base, and we will always learn from what the writer (or HOPEFULLY someone who knows a semblance of what the code does) was trying to do with the code/library.
It's not that it helps in a particular way (though more on that below); it's that starting with Python 2 now necessarily leads to rewrites later on. While you can still see packages not supporting Python 3 (there are fewer and fewer of those), in the future you will see the opposite: some projects are already announcing end dates for their Py2 support (IPython, to give an example).
I have to stress that I used 2to3 to tackle the vast majority of conversion issues. It was on a small codebase, but still, it worked rather well. While I feel the transition has not gone terribly smoothly, I truly believe Python 3 is a better language, and as I noted in a different thread [0], the Unicode support alone is worth it (for me). While not significantly beneficial in the vast majority of computational sciences, it is helpful in some areas, linguistics to give an example.
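To give a feel for the mechanical changes 2to3 makes, here is a toy before/after (my own example, not that codebase):

    # Python 2 original:
    #     counts = {'a': 1, 'b': 2}
    #     for key, value in counts.iteritems():
    #         print key, value

    # Output of `2to3 -w script.py` (valid Python 3):
    counts = {'a': 1, 'b': 2}
    for key, value in counts.items():   # .iteritems() -> .items()
        print(key, value)               # print statement -> print() function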
> It's not that it helps in a particular way (though more on that below); it's that starting with Python 2 now necessarily leads to rewrites later on. While you can still see packages not supporting Python 3 (there are fewer and fewer of those), in the future you will see the opposite: some projects are already announcing end dates for their Py2 support (IPython, to give an example).
So, there's no benefit to Python 3, but we should all migrate to it anyway? You think Python 2.7 will die, but trust me, as soon as the PSF abandons it, someone will swoop in and become the new de facto supporter. For many of us, stability is a feature, and the fact that 2.7 won't change in gratuitous ways is super attractive.
Exactly. This is what I have always believed. Google and Dropbox alone have too much py2 code to drop it or do a wholesale conversion.
Nothing is going to be EOLed. It's going to be business as usual, and the Python foundation will never agree to killing Python 2 in the next decade.
The only way forward is through six (https://pypi.python.org/pypi/six) or something like it. It's well worth building and funding a Python 2 compatibility layer in Python 3... and then moving to the py3 runtime.
I'm actually surprised that someone like Google is not throwing some funding towards building a compatibility layer.
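For the curious, a rough sketch of what writing against such a layer looks like today (the six names are real APIs; the rest is a made-up example):

    # Illustrative only: code that runs unchanged on 2.7 and 3.x via six.
    import six
    from six.moves import range  # iterator semantics (xrange) on both versions

    def describe(value):
        if isinstance(value, six.string_types):  # str on 3; str/unicode on 2
            return "text: " + value
        return "other: " + repr(value)

    for i in range(3):
        print(describe(u"item %d" % i))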
> I'd be happy to hear examples of how Python 3 helps us in computational sciences
Sir/Madam, I am here to make you happy then. Python 3's multiprocessing library is leaps and bounds "better" than Python 2's. By "better", I mean faster (in my workloads) by 30-40%. If time is money, then that feature alone saves you both.
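A minimal sketch of the API for reference (the with-statement form is 3.3+; the 30-40% figure above is my measurement, not something this toy demonstrates):

    from multiprocessing import Pool

    def simulate(seed):
        # stand-in for an expensive, independent computation
        return sum(i * i for i in range(seed * 1000))

    if __name__ == "__main__":
        with Pool(processes=4) as pool:   # Pool as a context manager: 3.3+
            results = pool.map(simulate, range(8))
        print(results)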
If you really want to run faster, you should try a new language. 30-40% for multiprocessing is nowhere near the benefit you'd get from a single-threaded implementation in C, C++, Java, Rust, OCaml, Go, or even JavaScript or LuaJIT. Those run between 10 and 1000 times faster (1000% - 100000%) than Python.
> Those run between 10 and 1000 times faster (1000% - 100000%) than Python.
They're faster than pure Python in the general case, but once you throw numpy, numba, numexpr and Cython into the mix and focus on numeric workloads, most of that difference disappears.
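A small sketch of why the gap narrows (illustrative, not a benchmark):

    # The "loop" in the numpy version runs in compiled C, not the interpreter.
    import numpy as np

    x = np.random.rand(10**6)

    total_slow = sum(v * v for v in x)  # pure Python: per-element overhead
    total_fast = np.dot(x, x)           # one vectorized call into BLAS/C

    assert abs(total_slow - total_fast) < 1e-6 * total_fast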
For scientific computing I can't see many people making use of Python's multiprocessing library. A lot of the time you're writing scripts that you might run a handful of times at most, or, if speed is an issue, you write it in C++.
The assertion is also not necessarily accurate: Python 2 and 3 are both included on Ubuntu 14.04 (and maybe earlier). My understanding is that different system tools require different versions.
Python 2 is /usr/bin/python
Python 3 is /usr/bin/python3
If you're on Windows and have both Python 2.7 and Python 3.x installed, just use the py command to automatically select the correct runtime. I think you may need to specify the shebang for this to work (i.e. "#!/usr/bin/env python" for a 2.x script and "#!/usr/bin/env python3" for a 3.x script).
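In other words, a script like this (a sketch of the mechanism as I understand it):

    #!/usr/bin/env python3
    # On Windows, "py script.py" reads the line above and launches a 3.x
    # interpreter; "#!/usr/bin/env python" would select 2.x instead.
    import sys
    print(sys.version)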
Despite some recent discussions lamenting slow adoption, everyone I know, myself included, writes all new code in 3.5.x and only falls back to 2.7 for legacy code or when a needed module is not yet 3.x compatible. The problem is I work at a small company with about 10 devs. If a 2.x script works, we have zero incentive to port it to 3.x because we have such a huge backlog of work items.
IMO, Python 3's support for matrix multiplication using the @ operator is itself worth the price of admission. Much of technical computing is just implementing algorithms that use linear algebra extensively, and if you're coming from MATLAB, littering your code with dot(dot(X, Y), Z) is a real pain.
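A quick sketch of the difference (@ is PEP 465, Python 3.5+; X, Y, Z here are made-up data):

    import numpy as np

    X, Y, Z = (np.random.rand(3, 3) for _ in range(3))

    old_style = np.dot(np.dot(X, Y), Z)  # the nested-call style from MATLAB ports
    new_style = X @ Y @ Z                # reads like the underlying math

    assert np.allclose(old_style, new_style)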
Everyone I know does matrix multiplication using numpy or pandas and conveniently abstracts away the need to do a language conversion.
Same thing with asyncio vs gevent and tons of other features. I say this again: there are industrial-strength packages and libraries for py2 that give all the power and convenience you think you get in py3. Perhaps even better tested and used in production.
I find @ nice. But still, even in Python 2, if you use np.matrix (instead of np.array), plain * does matrix multiplication. And if not, the .dot() method is much clearer than the dot() function.
It is absolutely being more explicit. It's being explicit that (expt b 5) is a unit that forms a single argument in the + form.
This is implicit in
    a + b ** 5

according to a precedence rule between the ** and + operators, which is hidden in the parser's implementation and in the documentation thereof.
Well, the whitespace in
    a + b**5
suggests it. But the suggestions produced by insignificant whitespace can be mere wishful thinking:
    int* x, y; // two C pointers? not!
Speaking of whitespace, prefix expressions also have the advantage that there are multiple ways to split an expression across lines, all conforming to a very clear, simple formatting rule:
    (+ a
       (* 4 (- x y))
       (expt b 5))

    (+ a
       (* 4
          (- x y))
       (expt b 5))
Fully expanded, with every term on a separate line:
    (+ a
       (* 4
          (- x
             y))
       (expt b
             5))
In this manner, we can write complex expressions that would be quite unreadable in infix, requiring break-up into intermediate temporaries.
We almost have a circuit diagram now, with "gates" for the operations: a three-input + gate, etc.:
           ____
          /    |- a
          |    |        ____
        --| +  |-------/    |- 4
          |    |       | *  |       ____
          \____|-      |    |------/    |- x
                `      \____|      | -  |
                |                  \____|- y
                |        _____
                `-------/     |- b
                        | expt|
                        \_____|- 5
For me it is one of the signs that the Python 2/3 split was a failure. In data science, everything is Python 2 first, then (usually, but almost always with some delay) ported to Python 3 (think: Spark, TensorFlow, etc.).
If there is a new package (say, on HN), very often it does not work (well) on Python 3.
I use Python 3 (usually), but it makes data science harder, as I often need to work around some small (but nasty) issues.
I teach Python 3. But no, it is not a no-brainer.
What kind of issues have you run into with Python 3 that make data science more difficult? I use Python 3 as well, and I don't think I've run into anything that fits your description...
Described above (and I gave examples of two mainstream libraries; with smaller ones it's much more serious). If there is a new package, it's likely to be either totally broken on Python 3 or undertested. So there is a delay before I can use something with Python 3. (Nothing about Python 3 itself, but about its ecosystem.)
At the same time, I hardly see any strong points in which Python 3 is much better for data science. (Examples?)
This was an introduction to Python. I went through every example while reading this book and rarely found a point where translating to Python 3 was any trouble.
It looks wonderful! Just yesterday I ran a quick workshop introducing neuroscientists to scientific Python, in the Jupyter Notebook environment (very rough version here: https://github.com/stared/python-neuroaspects-2016, before updates). I will definitely send the participants a link to your book.
But one small question: do you plan, by any chance, to upload your notebooks to GitHub (or any other place where one can easily see a rendered version)? BTW, one more selling point of Jupyter: the ease of sharing.
Even if you say "well, it'll just be forked," you don't really know how many forks and how much pain there will be. Maybe python2 will be like LibreOffice, or maybe it will be like OpenOffice.
This is a very good guide. I thought I knew the Python ecosystem well, and I still found something new for myself (the 'visual' package for 3D illustrations).
I am wondering if there are guides for the "reverse direction": I already know how to program, but I want to learn a new scientific domain that interests me, e.g. materials science, climate modeling, etc. Something like what Rosalind[1] does for bioinformatics.
Well, if you want to learn the science, you can just pick up a science book; I know many users here have good suggestions. But if you are specifically looking for scientific computing, those guides generally cover code basics. There are also numerical programming books. If you're trying to get into those fields, I suggest going through a numerics book, because the techniques will be similar regardless of the code. But you'll have to spend time learning the non-coding parts as well. In science, coding is just a tool.
I just finished working through this book and I really enjoyed it. I went from next to no Python knowledge to writing programs to analyze raw vibration data in a few days. I had previous experience with Matlab and this book was very useful in bridging the gap between the two systems.
I'm dubious about the worth of this book, as I lived with several Soton engineering students as they went through the associated course (whilst I did CS as part of the other 'Engineering' faculty, ECS), though a lot may have been due to lack of engagement with this particular course (copying and memorisation were sufficient to get you through it). I was always so frustrated that my friends couldn't benefit from some of the great lecturers in my own faculty.
That said I know a few Engineering students there now who are awesome programmers (though Aero Engineers at heart, so I can't hire them :( ).
It does. It emphasizes that this particular guide encourages the use of Python 3 instead of holding onto legacy, which means the author understands programming language basics, or at least knows which semantic unifications were made to render the language less inconsistent, hence more beautiful.