I find it very telling that both memory hotspots were explicit loops, and that the fixes were to use implicit looping constructs (range and a list comprehension). I wonder whether the underlying problem was the loop itself or the constant append() calls.
That's an interesting point - there are competing factors here. The list comprehension is usually a bit more efficient because it doesn't have to look up the list's append method and make a function call on every iteration, I think. Take a look at this (albeit a bit old) article:
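To make that concrete, here's a quick microbenchmark sketch comparing the two styles. The filter x % 3 and the sizes are just stand-ins, not anything from the original code:

```python
import timeit

def with_loop(n):
    result = []
    for x in range(n):
        if x % 3:            # keep values not divisible by 3 (arbitrary filter)
            result.append(x) # attribute lookup + call on every kept element
    return result

def with_comprehension(n):
    # Same logic; the comprehension's append is a specialized bytecode op,
    # so there's no per-iteration method lookup.
    return [x for x in range(n) if x % 3]

assert with_loop(1_000) == with_comprehension(1_000)

loop_t = timeit.timeit(lambda: with_loop(10_000), number=200)
comp_t = timeit.timeit(lambda: with_comprehension(10_000), number=200)
print(f"loop: {loop_t:.3f}s  comprehension: {comp_t:.3f}s")
```

The exact gap depends on the interpreter version, and under PyPy the JIT can erase most of the difference anyway.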
I imagine that lookup overhead would hit on every iteration, while the list add only happens when the conditional evaluates to true - so the append seems a less likely culprit. Unless, that is, the cost of growing the list (which can eventually bottom out in a syscall to the allocator) is high enough to outweigh its infrequency.
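One mitigating detail: in CPython, lists over-allocate, so most append calls don't touch the allocator at all. A quick sketch that makes the growth pattern visible (CPython-specific; PyPy reports sizes differently):

```python
import sys

lst = []
sizes = []
for _ in range(100):
    lst.append(None)
    sizes.append(sys.getsizeof(lst))

# The reported size only jumps when the backing array is actually
# reallocated; in between, appends reuse the over-allocated slack.
reallocations = sum(1 for a, b in zip(sizes, sizes[1:]) if b != a)
print(f"100 appends triggered only ~{reallocations + 1} reallocations")
```

So the amortized cost per append is small, even if an individual reallocation is occasionally expensive.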
Generators would help with this too: instead of materializing the filtered list and prepending via concatenation (i.e. [2] + [x for x in s if x]), you could do something like itertools.chain([2], (x for x in s if x)). Then you avoid ever allocating a massive list -- provided that the rest of your code avoids that too.
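A minimal sketch of the two versions side by side, with s as a hypothetical input sequence containing some falsy entries:

```python
import itertools

s = [0, 3, 0, 5, 7, 0, 11]  # hypothetical input; zeros get filtered out

# Eager: builds the filtered list, then a second list for the concatenation.
eager = [2] + [x for x in s if x]

# Lazy: nothing is materialized; values are produced one at a time
# as the consumer iterates over the chain.
lazy = itertools.chain([2], (x for x in s if x))

assert list(lazy) == eager
print(eager)  # → [2, 3, 5, 7, 11]
```

The caveat in the comment above is real, though: if the consumer ends up calling list() or sorted() on the chain, the full allocation just moves downstream.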
Well, the fact that they're running it in PyPy makes all of this that much more challenging to debug - PyPy's tracing JIT compiles hot loops to machine code at runtime, so you'd have to inspect the traces it actually generates to identify the real costs.