The sponsoring organization is NNSA rather than NASA. NNSA is the National Nuclear Security Administration, hence the association with Lawrence Livermore Labs.
I work for NNSA actually. There's a ton of codes in the nuclear engineering sector that are written in Fortran and it's the lingua franca of the industry, though C++ and Python are taking over for new projects. One of the biggest codes in the industry (MCNP) is an enormous chunk of Fortran with a lineage going back over 60 years.
It'd be nice to see an LLVM frontend for Fortran; Intel has a stranglehold with ifort due to their excellent vectorization. Hopefully this project can provoke some competition.
NASA engineer here. Title typo aside, that actually sounds a lot like our legacy codebase. Though Python has been making a lot of inroads, thanks partly to f2py.
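For anyone who hasn't seen it, this is roughly what the f2py workflow looks like - a hedged sketch, where the module name `flib` and the `saxpy` routine are made up for illustration and the build step is assumed to have already been run:

    # Hedged sketch of calling legacy Fortran from Python via f2py.
    # Assume a file saxpy.f90 containing something like:
    #
    #   subroutine saxpy(n, a, x, y)
    #     integer, intent(in) :: n
    #     real, intent(in)    :: a, x(n)
    #     real, intent(inout) :: y(n)
    #     y = a*x + y
    #   end subroutine saxpy
    #
    # built once from the shell with:  f2py -c -m flib saxpy.f90
    import numpy as np
    import flib  # the extension module f2py generates (name is hypothetical)

    x = np.ones(5, dtype=np.float32)
    y = np.arange(5, dtype=np.float32)
    flib.saxpy(2.0, x, y)  # f2py infers n from the array shape and makes it optional
    print(y)               # -> [2. 3. 4. 5. 6.]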
I weep to think of all the cycles wasted on expensive taxpayer-funded hardware due to using Python over much faster Intel Fortran. We are talking orders of magnitude.
I once ported a fellow undergrad's astronomy program from Python to C+CUDA and it ran on their personal workstation in less than a day, when it had previously been taking a week on the department cluster :/
People often hate on Fortran because it's old and creaky, when it's often as fast as or faster than C...
Python isn't necessarily replacing the hardcore number crunching code (at least in my industry). It's more becoming the glue code to wrap those libraries and make them more accessible. That and data processing. I don't see people running massive simulations on clusters using mpi4py or whatever (thank god).
Even then, in a lot of cases programmer time is infinitely more expensive than CPU time, so it often still makes sense if Python is the more accessible choice.
>Even then, in a lot of cases programmer time is infinitely more expensive than CPU time, so it often still makes sense if Python is the more accessible choice.
I think this is generally true outside of scientific computing, but often not the case here. Esp. if you consider the low salaries of scientists in the public service compared to private industry.
Even in the private sector I have routinely made things an order of magnitude faster, saving the need to massively scale out. People also forget about total cost of ownership: running 100 servers costs much more than 10x what 10 servers cost to operate when you consider cooling, networking, power, part replacement, etc.
It's true in scientific computing too... there's plenty of times when a quick Python script that takes 10 minutes to bang out is better than writing a chunk of C++ in an hour. The lost time isn't necessarily in the salary of the scientist, it's in the time spent. A lot of the scientists working in the big government research labs have one-of-a-kind knowledge and their time is at a premium in terms of other stuff they could be working on.
But I agree, if you're writing a massively parallel simulation in Python and running it on a supercomputer it's a net waste of money. There's a big trend in the government projects to build libraries like Trilinos or MOOSE, which work kind of along the same philosophy as NumPy - let specialists implement the parts that really matter (in terms of computational expense) in a compiled language and let the scientists glue together those parts in a higher level language. It's a good strategy IMO, especially for technologies like CUDA where there's a huge benefit but also a steep learning curve.
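As a deliberately tiny illustration of that philosophy (nothing to do with Trilinos or MOOSE specifically, just the same idea scaled way down): in the sketch below all the heavy lifting happens inside compiled LAPACK routines, and Python is only the glue.

    # "Compiled kernels, high-level glue": scipy hands the factorization and
    # solve off to compiled LAPACK code; Python just wires it together.
    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(0)
    A = rng.standard_normal((2000, 2000))
    b = rng.standard_normal(2000)

    lu, piv = lu_factor(A)        # LAPACK getrf under the hood
    x = lu_solve((lu, piv), b)    # LAPACK getrs
    print(np.allclose(A @ x, b))  # glue-level sanity check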
I use MCNP very often. I'm curious whether there are any goals or plans to transition it to something more maintainable? I once attempted to make my own modification to output more information but could not grok any of that code... It looked completely alien coming from a C++ background.
MCNP is a very typical old-school Fortran code. It takes practice to get a feel for reading that stuff, but it has its own beauty once you get the hang of it.
SCALE is mid-way into migrating to C++. AFAIK though MCNP isn't going to change.
Yup. I'm working with weather code myself; in my code a matrix mult is ~25% faster on Intel versus gfortran due to this. (edit: this is still within the range of "viable competition" to me, though).
GFortran maintainer here. Please don't judge GFortran by the speed of the MATMUL intrinsic. The matmul implementation is really a bare-bones unoptimized version. If you want performance use the -fexternal-blas option and link in a tuned BLAS library such as MKL or OpenBLAS. The exception is if you have very small fixed-size matrices, then the code is inlined and should be faster than BLAS.
Thanks! I should mention that these tests were for a specific (highly tuned for the specific job) production HPC machine, so I don't know how this compares to a more generic setup - unfair of me to generalize the results to every matrix multiplication. We developed off the production machine in GFortran, and it worked very very well for us - didn't do timing tests there, but anecdotally, in development the difference was unnoticeable.
Oh cool! Thanks for your time maintaining Gfortran. I know I said Intel has a stranglehold, but personally I prefer using Gfortran where I can as I prefer open source. So thanks!
Actually, Intel's BLAS (in MKL) is probably an even better product. [On Intel hardware] MKL makes an enormous, borderline hard-to-believe magic level improvement even over the other tuned BLAS libraries like ATLAS.
Gfortran is OK and ifort isn't insurmountably more advanced, but Intel definitely has the mindshare in the HPC market locked down and they do a lot to promote their technologies in that community. That is what really gives them the vice grip on the market. IBM is really the only other company with the same cachet in that market.
Any pointers you can give to resources on this please?
I'm dabbling in satellite image processing and have it working in Python (numpy et al) on a single core, but want to try splitting the work across multiple cores as 80 minutes per image is a long time to wait!
Need more details on what you're doing, but many simple operations used in satellite image analysis are embarrassingly parallel. You could, for example, read in chunks or strips using GDAL, distribute them across multiple threads or cores, and get some speed-up. If much of the time is spent inside numpy rather than in pure Python, those calls should release the GIL, so it won't get in your way.
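To make that concrete, a rough sketch of the strip-reading approach - `process_strip` is a placeholder for whatever per-pixel work you're doing, "scene.tif" is a made-up filename, and GDAL's Python bindings are assumed to be installed:

    # Hedged sketch: read a raster in horizontal strips with GDAL and process
    # them on a thread pool. Works best when process_strip spends its time in
    # numpy/scipy calls that release the GIL.
    from concurrent.futures import ThreadPoolExecutor
    import numpy as np
    from osgeo import gdal

    def process_strip(strip):
        # placeholder for the real work (cloud masking, haze transform, ...)
        return strip.astype(np.float32) * 0.0001

    ds = gdal.Open("scene.tif")  # hypothetical input file
    band = ds.GetRasterBand(1)
    xsize, ysize, strip_rows = ds.RasterXSize, ds.RasterYSize, 512

    strips = [band.ReadAsArray(0, y0, xsize, min(strip_rows, ysize - y0))
              for y0 in range(0, ysize, strip_rows)]

    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(process_strip, strips))

    out = np.vstack(results)  # reassemble the full image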
It's a friend's research project I'm tinkering with, so if I don't sound like I know what I'm doing I don't... :)
The first step is masking out clouds (morphological erosion + more), and then applying a haze-optimized transform to the image data. I'm confused about making code parallelisable when the transform of a pixel relies on neighbouring pixels - how do you remove boundary errors (take extra pixels, then crop off each strip before recombining, I guess)?
There must be something in place to deal with boundary errors around the edge of the image, but so far I haven't grasped it!
Numpy, Scipy, and particularly scikit-image are where the time is spent - my dabbling was originally to try and speed things up, but I've also become interested in understanding what is going on.
Yeah, when I need neighbouring pixels I extract overlapping chunks. As far as the edge of the scene - there aren't any perfect solutions that I know of. You could try to fudge it by modifying the algorithm to handle the edges with fewer pixels, or perhaps use data from different scenes as an approximation. Or you may simply lose the data there.
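A small sketch of the overlap-then-crop idea for a filter with a known radius (here a 1-pixel halo for a hypothetical 3x3 operation; `uniform_filter` just stands in for the real transform):

    # Hedged sketch of "take extra pixels, process, then crop before recombining"
    # for a neighbourhood filter with a 1-pixel halo (any 3x3 operation).
    import numpy as np
    from scipy.ndimage import uniform_filter

    HALO = 1      # filter radius: 1 pixel for a 3x3 neighbourhood
    STRIP = 256   # rows of "real" output per chunk

    def process(chunk):
        return uniform_filter(chunk, size=3)  # stand-in for the real 3x3 filter

    img = np.random.rand(1000, 800)
    pieces = []
    for y0 in range(0, img.shape[0], STRIP):
        y1 = min(y0 + STRIP, img.shape[0])
        lo = max(0, y0 - HALO)             # grab extra rows above...
        hi = min(img.shape[0], y1 + HALO)  # ...and below the strip
        out = process(img[lo:hi])
        pieces.append(out[y0 - lo : y0 - lo + (y1 - y0)])  # crop the halo back off

    result = np.vstack(pieces)
    # away from the image edges, the stitched result matches filtering the
    # whole array in one go
    full = uniform_filter(img, size=3)
    assert np.allclose(result[HALO:-HALO], full[HALO:-HALO])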
This is where you get into the nitty-gritty (and stupidly difficult) type of parallel algorithms. When you have e.g. cells that are dependent on their neighbors you can no longer use the trivial divide-and-conquer strategy.
Usually the strategy is to break the domain into chunks, with each chunk being multiple cells. Inside the chunk, the processor already has the data for adjacent cells so it can just read it. On the boundaries where some of the adjacent cells are in another chunk and on a different processor, it has to ask for the data... the vast majority of scientific codes here use OpenMP (not so bad, shared memory) or MPI (absolutely miserable to use IMO, but powerful distributed memory) to communicate. There's other tools for the job that are nicer than MPI, but they don't get much attention from HPC people (who tend to be very conservative with new technology, especially "high-level" abstractions).
Inter-processor communication is orders of magnitude slower than querying local memory (especially if that processor is, say, on another machine), so on top of that there are lots of clever methods for what is essentially figuring out how to slice up the problem domain; examples include red-black schemes, domain decomposition, and wavefront propagation.
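For what it's worth, here's roughly what the halo (ghost cell) exchange looks like for a 1-D row decomposition in mpi4py - a hedged sketch, not production code, and admittedly exactly the kind of boilerplate that makes MPI tedious:

    # Hedged sketch of a 1-D domain decomposition with halo exchange in mpi4py.
    # Run with e.g.:  mpiexec -n 4 python halo.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    local_rows, cols = 100, 64
    u = np.zeros((local_rows + 2, cols))  # +2: one ghost row above and below
    u[1:-1, :] = rank                     # fill the owned rows with something recognizable

    up   = rank - 1 if rank > 0 else MPI.PROC_NULL
    down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    # send my top owned row up, receive my upper ghost row from the rank above
    comm.Sendrecv(sendbuf=u[1, :].copy(), dest=up, sendtag=0,
                  recvbuf=u[0, :], source=up, recvtag=1)
    # send my bottom owned row down, receive my lower ghost row from the rank below
    comm.Sendrecv(sendbuf=u[-2, :].copy(), dest=down, sendtag=1,
                  recvbuf=u[-1, :], source=down, recvtag=0)

    # a stencil (e.g. a 5-point Laplacian) can now be applied to the owned rows
    # u[1:-1, :], using the freshly filled ghost rows as the neighbours' data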
To answer your question specifically on boundary conditions (and here I'm talking about the edges of the whole image where there's literally nothing in some cells nearby, not boundaries between partitioned regions like I was above)... you just have to choose something. There are three common approaches, which I'll illustrate by imagining you're doing a blur that decides how to blur a pixel by looking at the 8 pixels that surround it and that you're on the far right edge of the image (so the right three neighboring pixels would lie outside the image):
1. Zero - treat the three pixels as black/zero when calculating. Equivalently, treat them as whatever arbitrary value you find works best (e.g. white/1).
2. Periodic (wrap-around) - treat the three pixels as whatever value the pixels on the left side at the appropriate positions would be. That is to say, wrap around to the other side.
3. Symmetric - treat the right three pixels as if they were the same as the left three (or the center three if you like).
This is a common problem also when solving something like a differential equation on a discretized spatial grid. In this case, the edge values are determined by your boundary conditions: method 1 corresponds to vacuum boundaries, 2 corresponds to periodic boundaries, and 3 is a specific type of extrapolated boundary.
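In NumPy terms the three options map almost directly onto `np.pad` modes; a quick sketch with a 1-pixel border, as you'd use for a 3x3 kernel:

    # The three edge treatments above, expressed as numpy padding modes.
    import numpy as np

    img = np.arange(12.0).reshape(3, 4)

    zero     = np.pad(img, 1, mode="constant", constant_values=0)  # 1: zero / vacuum
    periodic = np.pad(img, 1, mode="wrap")                         # 2: wrap to the other side
    mirror   = np.pad(img, 1, mode="symmetric")                    # 3: mirror the edge pixels
    # (mode="reflect" is the "center three" variant that excludes the edge pixel itself)

    # Pad once, apply the filter everywhere, then crop the 1-pixel border off
    # the result - no special cases left at the edges.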
Referring to Fortran programs as "codes" seems to be a common, possibly generational, way of talking about old software in the aerospace industry. I used to hear it around the office a lot. Some of the "codes" date back to the late 1950s. It would be interesting to know more about how the language we use to speak about software has changed over time, if anyone feels like sharing.
Not sure it's generational as I'm under 30 so I'm not that old :\.
As far as I hear it used, "code(s)" is the standard term for what is loosely any large simulation package amongst people in the numerical community (which includes mathematicians and computer scientists alike, so it's not just an engineering thing).
The other fun one is people who refer to the input files for the simulation packages as "input decks" (aka punched card decks). I occasionally get a funny look from someone outside the computational science community when I use that term. And I'm in my 30s - punched cards were history by the time I started using computers myself, so this kind of terminology isn't really generational; it seems to come from specific communities.
And "tty" = "teletype terminal". We use "glass ttys", which emulate the older mechanical ones.
My favorite old term, which is still used at times, is 'false drop'. (E.g., see http://www.researchinformation.info/features/feature.php?fea... .) The term comes from the days of manual punched cards, specifically edge-notched cards, where you would stick a needle through a given position in a deck of cards and lift. The cards with a notch at that position would fall out. Those with a hole would stay. Hence, the cards you want "fall out".
But if you used multi-punch encoding (eg, encode two rare terms to a single location) in order to decrease the number of holes required, then you might get some "false drops".
Nowadays we mostly say 'false positives' for this case.
I once heard one of the guys from Cray give a presentation where he told a story about playing a prank on one of the Army research guys. He said it was right after they had installed a shiny new (washing machine sized) hard drive. They had already duplicated his (many thousands of punch cards long) program onto the drive.
They had a glass room overlooking the actual computer room where you'd go after handing one of the techs your cards to be entered. They brought the guy up there and he was waiting after handing off his cards and they had pre-arranged with the technician to trip and dump the cards everywhere. He said the Army guy turned white as a sheet of paper and looked like he was going to cry (this would have necessitated many hours putting the cards back in order). Then he told the guy that it was fine and they already had a copy loaded in on the drive.
The usual solution was to take a marker and swipe the deck along the top side, in a diagonal, so they are much easier to sort.
You might recall that Fortran ignored everything beyond column 72. This was where you put the 'sequence number', aka line number, so when you drop the deck you stick the cards in the sorting machine and voila, it's ordered again.
> You may have heard the story of the operator who dropped a whole box of cards. Wanting to put things right as quickly as possible, he sorted the cards, without consulting the user. As it turned out, that was the worst possible response. Up until that point, the box had contained a sample of random numbers.
–Ted Powell, Dec 2006
If this is true, I could also compile scientific software to JavaScript with emscripten, which would then be able to handle the Fortran libraries BLAS and LAPACK.
I like to have one-click live demos of such software. Speed and memory don't matter for demos.
Yes, by using f2c. For BLAS and LAPACK, this is a big effort and doesn't run out of the box. The cblas libraries are too old and don't contain new functions.
LLVM has a huge momentum now and it's negatively affecting GCC. GCC has been in trouble before but they got out of it by being the best free compiler available. I don't know how much longer they'll be able to sustain without making some changes.
> Between 2000 and 2004, [g95 fortran] front-end was coupled to the rest of the infrastructure of the GNU Compiler Collection. This was not trivial (just as it will not be trivial to couple the PGI front-end to the LLVM infrastructure).
They really don't get that LLVM's modularity is a strength...
I'm not at all familiar with Fortran. I was under the impression that it's an old language living on in legacy systems. I believe there are other good alternatives, so why still use Fortran for new code?
It's still faster than the alternatives. Things like SciPy/NumPy are more alternatives to Matlab, as is R. Julia isn't fast enough yet. C/C++ doesn't have the built-in matrix/vector support. The latter is what I ran into when I was coding for my Master's thesis, so I just gave up and went with Fortran. For my PhD, I used Matlab, as speed wasn't an issue.
There's also issues with wanting to use the old Fortran code in new projects and if the new alternatives don't have a good interface for Fortran, it's a bit of a pain to integrate.
At least for me, Julia's fast enough in most cases. I switched from C/C++ to Julia for all of my scientific work, and it runs fine, even on a supercomputer like Titan (barring something that requires thousands of nodes at once — MPI is not yet integrated into Julia). You just have to be careful in the tight inner loops and check the assembly that it outputs using code_native().
Granted, there's plenty of fast libraries written in C/Fortran that don't have a Julia equivalent yet, and depending on the overhead of ccall(), you may wish to stick with writing the rest of your code in the language of the library.
Good to know. I've been looking at it, but not seriously. It's been more keeping an eye on it. I was hoping that Fortress would go somewhere, but it died.
My other problem is code generation. I can't generate Julia code yet.
No, I'm talking about generating Julia code from elsewhere. I'll derive some math in another program (e.g., Maple, Mathematica, SymPy) and output code that represents that math. I can't do that with Julia right now.
I see. Indeed, Julia's ecosystem is still quite young. (possibly of interest: a project building a Julia-integrated CAS called Nemo, at http://nemocas.org/)
True, but the problem I had (which I will admit was back in '99) is that there's no standard one. So, if you need to solve a system of ODEs and a linear system, you're invariably needing to convert between the two.
This is still true. This library uses Eigen, that one uses Boost, the other one uses a home grown one. And then you want to use another library, it uses Eigen (yay!), but an old version of Eigen and it won't compile against the current version.
C++ really crippled itself in this regard by never making a standard. I never bought Stroustrup's claims of "but you can write your own" for these reasons.
At least from one report I read some time ago, scientists at Los Alamos National Lab were looking at alternatives to Fortran for their future generations of climate models. C/C++ is on their radar. One reason for the switch is to better deal with unstructured grids; they are moving away from the traditional rectilinear grids. This is more in line with many FEM applications, where C++ dominates.
Last I checked, some things that can be vectorized, but aren't in LAPACK/BLAS weren't particularly fast. There's even one example from a few years ago on Stack Overflow where it was at least an order of magnitude slower than Octave.
FORTRAN doesn't share the mess that is aliasing in C/C++, which makes it much easier to optimize for numerical-heavy code, so your linear algebra libraries are typically all written in FORTRAN.
That said, I believe most of the post-MPI HPC frameworks are based on C/C++ with little support for FORTRAN.
Pretty sure a lot of Fortran code gets compiled or wrapped with a C interface when I install numpy/scipy. I just don't remember which module exactly.
Not just LA, pretty much all numerical methods. Fortran is still a very natural and surprisingly pleasant language for numerics these days. It's just terrible at everything else... which is why it doesn't get much attention in the broader programming community I think.
Fortran is in a somewhat unique niche: basically anybody who uses a computer uses a library written in Fortran (via BLAS), but very few people are aware of it, especially that it has continued to develop and is actively in use. There are plenty of places (mostly related to parallel evaluation) where Fortran is state of the art. But speaking as someone mildly competent in it, it isn't something everybody needs to care about. It has its niche and is pretty awful at everything else, best left to its little island of programmers. Haskell is my favorite language and C++ or Python are what I use in a professional setting, but I won't hesitate to break out the good old `IMPLICIT NONE` if it is the right tool.
+1. The extent of my use of LA in Fortran is pretty much BLAS/LAPACK. But for pretty much everything else numerical it still wins handily, at least for my uses (numerical PDEs, FFTs).
It likely seems niche to most programmers, but array slicing is such a godsend to certain areas of software that there really is no more natural language to write a code in. There is Matlab, but its purpose is for prototyping. In fact it is a major credit to Matlab that the conversion is very natural.
This niche is why Fortran exists and has such a strong following. And the more modern features are like candy sprinkled on the core features. I could write a sonnet about modern fortran.
It's not your happiness that makes me cringe but the borderline cultish comment that there's no "single place on the internet as valuable as this community". Just reinforces the stereotype that this community is full of itself. It would be great if we could all be a bit more grounded.
What on earth are you talking about? I have literally never expressed my happiness over this community before. But just because I mention something once, after having been here for I don't know how many years, am I suddenly cultish?
It sounds more like a cultish anti-sentiment if anything.
Or, alternatively, Nvidia is doing this because their current CUDA compiler works with LLVM. And a possible reason they built their CUDA infrastructure on LLVM is that the people making GCC privileged ideology over technical merit and made it much harder for tools (even open source or libre ones) to integrate with the compiler. Remember, when you're inevitably burned by this, that it was a predictable outcome.
Doing a rewrite seems to me to always result in a cleaner codebase faster than the incremental refactoring. That said, GCC today is a lot easier to interop with than the GCC of pre-LLVM times.
But, no amount of technical merit can save you when you don't have a free compiler anymore. Don't say Stallman didn't warn you, because he did, just like he warned you about all the other crap.
Remember, when you're getting burned by this, that you chose to make snarky comments on the Internet instead of realizing what the moral path forward was and doing the ethical thing.
GCC includes a Java frontend too, but do any serious Java shops use it? It's not just about having a feature, it's about implementing that feature well and making it easy for others to contribute their improvements.