Python: Overlooked core functionalities (erikvandeven.medium.com)
237 points by erikvdven on July 24, 2023 | 181 comments


I'm annoyed at the reason that any/all have to be on this list. If they (and map, filter, …) were methods, you could just write `foo.` and your IDE could show you what methods are available. Postfix would make things easier to read too:

    bar.baz()\
       .filter(some_filter)\
       .map(some_op)\
       .min()\
       .foo()
Data/control flows from top to bottom. One operation per line. But with freestanding functions:

    min(map(some_op, filter(some_filter, bar.baz()))).foo()
To follow the flow of data/control, you start in the middle, go right, then skip left to filter, read rightwards to see which filter, skip left to map, read rightwards to see what map, go left to min, then skip all the way to the right. Just splitting it into multiple lines doesn't help, you need to introduce intermediate variables (and make sure they don't clobber any existing ones) and repeat yourself whether they clarify things or not. The same issue exists for list/dict/set comprehensions.
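For reference, a rough sketch of the intermediate-variable version (the names here are made up):

    filtered = filter(some_filter, bar.baz())
    transformed = map(some_op, filtered)
    smallest = min(transformed)
    smallest.foo()

Each line is readable on its own, but now you have three throwaway names to invent and keep out of the way of the surrounding scope.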


Here you go:

    class WrappedList:
        _fns = [map, filter, min, max, all, any, len, list]

        def __init__(self, it):
            self.it = it

        def __getattr__(self, name):
            for fn in self._fns:
                if name == fn.__name__:
                    def m(*args, **kwargs):
                        result = fn(*args, self.it, **kwargs)
                        if hasattr(result, '__iter__'):
                            return self.__class__(result)
                        else:
                            return result
                    return m
            raise AttributeError(name)  # unknown names should still raise, not silently return None

        def unwrap(self):
            return self.it

This allows you to do stuff like

    WrappedList([1, 2, 3, 4]).filter(lambda x: x % 2 == 0).map(lambda x: x * 3).list().unwrap() # [6, 12]
    WrappedList([1, 2, 3, 4]).map(lambda x: x >= 5).any() # False
Deciding whether or not this is something you should do, rather than just something you can do, is left as an exercise for the reader.


Python debugging is most sane when the code just tries to keep it simple. After thousands of pdb sessions I can say most people should not be allowed to do this kind of thing in real code!


With a small reminder that the pythonic way to do filter and map is even more readable - but it's limited in scope:

   [x * 3 for x in [1, 2, 3, 4] if x % 2 == 0]

with that said, I still love the general concept of chaining, and I use that style a lot where it is already convenient and popular - in pandas code.


And herein we see a weakness of Python: There is no way to get rid of the lambda lambda lambda without actually naming things using def. Even though we are defining a pipeline of steps, we still have to put up with syntactic clutter. Compare with threading/pipeline in other languages.
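To illustrate (a rough sketch reusing the hypothetical WrappedList from above): the usual workarounds are def-naming every step or reaching for operator/functools, and both add their own ceremony.

    from functools import partial
    from operator import mul

    def is_even(x):
        return x % 2 == 0

    triple = partial(mul, 3)  # stands in for lambda x: x * 3

    WrappedList([1, 2, 3, 4]).filter(is_even).map(triple).list().unwrap()  # [6, 12]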


You could improve the speed a little bit (maybe) by doing something like:

    self._fns = {fn.__name__: fn for fn in [...]}
This may not be faster since the list is so short, but worth checking into
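A sketch of what that might look like (same toy WrappedList as above, with a dict lookup and an explicit AttributeError for unknown names):

    class WrappedList:
        _fns = {fn.__name__: fn for fn in [map, filter, min, max, all, any, len, list]}

        def __init__(self, it):
            self.it = it

        def __getattr__(self, name):
            try:
                fn = self._fns[name]
            except KeyError:
                raise AttributeError(name) from None

            def m(*args, **kwargs):
                result = fn(*args, self.it, **kwargs)
                # re-wrap iterable results so the chain can continue
                return self.__class__(result) if hasattr(result, '__iter__') else result

            return m

        def unwrap(self):
            return self.it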


This reads so much better


If you actually think this code is better there's a real library that does this: https://github.com/EntilZha/PyFunctional.


One puzzling thing is that it uses backslash continuation in its examples. The most favoured style, IMO, is to use ()'s for line continuation; maybe the author just doesn't know about those.


Both your suggestions are awful.

The reason they are bad is that intermediate results are never named (and thus are never explained). In simple situations, it's possible to infer from the context what the author's intention was, but in more complicated cases, if you want to understand someone's code, especially if it's written in the way you did, you'd have to "disassemble" it into simpler operations, name the variables (after investigating or guessing the purpose of each operation) and then try to come up with the full picture of what's going on.

Also, as a style suggestion: avoid using backslashes. In your situation, you could just put dots at the end of the line and that would be enough to not need the backslashes. Backslashes add noise to your code, i.e. characters that add no meaning, just sort of a "scaffolding" to hold your code together.


In my own python toolbox (specifically for the list-of-dictionaries use-case) I inject .log() calls into the pipeline as needed to show what the actual intermediate values are.

Naming intermediates is fine (and encouraged) if there are actually meaningful names to be given. But sometimes the expression itself is the shortest meaningful name for the expression.

Re backslashes, you can also just wrap the expression in parentheses.


A perhaps more appropriate name for your 'log' would be 'peek'. 'Log' suggests logging, which usually does not return a value, but writes to stdout or a file or similar.


But that's exactly what log does: it is a pipeline filter that logs the content to standard output but is otherwise a passthrough.
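Roughly what such a passthrough stage can look like as a plain generator (a sketch; the real .log() here is presumably a method on the pipeline object):

    def log(items, label="log"):
        for item in items:
            print(f"{label}: {item!r}")  # write the intermediate value out...
            yield item                   # ...and pass it through unchanged

    total = sum(log((x * 3 for x in [1, 2, 3]), label="tripled"))
    # prints "tripled: 3", "tripled: 6", "tripled: 9"; total == 18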


You wouldn’t really write it as you have in the second example though. The Pythonic way of writing something like this is to use list comprehensions or generator expressions, for example:

    min(some_op(item) for item in bar.baz() if some_filter(item)).foo()
Or decomposed a little for clarity:

    processed_items = (some_op(item) for item in bar.baz() if some_filter(item))
    min(processed_items).foo()
This is pretty readable – a natural language description of the first line is “do some_op for each item in bar.baz that matches some_filter”, which corresponds 1:1 with the code.


This feels somehow even worse. I don’t think it’s more readable than Op’s example, but now it’s also weird to write.

The chaining example in OP’s first example is way better


Both examples seem pretty contrived and I think this comes down to just what language you are used to. Their other code seems very JS-y.

I work in both python & js. The python reads like natural language:

    Processed items is a set of some transformation of each item in bar.baz() where something is true for that item.
    then Foo the smallest in that list.
It reads like english.

JS-y stuff doesn't read like natural language, but I do think it's more concise and fits the IDE function discovery workflow better.

Both models can be made into horrid messes or elegant solutions. Both are highly readable.

Now I like the python one because I find it natural to attach contextual "whys" or "because" comments to them.

    Processed items is a set of some transformation of each item in bar.baz() where something is true for that item.
    then Foo the smallest item.
    # because foo is a slow function and we don't want to foo every bar and baz


And if you have nested list comprehensions this totally stops making sense somehow.


any, all, map, filter, min, max, for loops, zip, list, tuple, reduce, list comprehensions, cycle, repeat, islice, and so on in python work on iterables, and iterable is a protocol, not a class. it would certainly be interesting to program in a language where conforming to a protocol (perhaps one that nobody had thought up yet when you wrote your class) would give your class new methods, or where all iterables had to derive from a common base class, but it would be a very different language from python

incidentally in your example, though data does flow from top to bottom, control does not, assuming the filter and map methods are lazy as they are in python; it ping-pongs back and forth up and down the sequence in a somewhat irregular manner, sometimes reaching as far as .min() before going back up, and other times turning around at .filter(...)
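you can see the ping-ponging by putting prints in the steps (a toy sketch with made-up data):

    def some_filter(x):
        print('filter', x)
        return x % 2 == 0

    def some_op(x):
        print('map', x)
        return x * 3

    min(map(some_op, filter(some_filter, [3, 2, 5, 4])))
    # filter 3
    # filter 2
    # map 2
    # filter 5
    # filter 4
    # map 4
    # result: 6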

i wonder if you could implement the ide functionality you want with a 'wrap' menu of popular functions that are applicable to the thing to the left of your cursor, so when you had

    filter(some_filter, bar.baz())|
(with | representing your cursor) you could select `map` or `min` or whatever from the wrap dropdown and get

    min(filter(some_filter, bar.baz()))|
for any given cursor position in python there are potentially multiple expressions ending there, in cases like

    "y: %s" % y|
but maybe that's not such a hard problem to solve


> it would certainly be interesting to program in a language ... where all iterables had to derive from a common base class, but it would be a very different language from python

You mean Ruby? :P

(All Ruby iterables mix in Enumerable, which is baaaaaaaasically inheritance.)


Or Rust! Everything that implements the Iterator trait gets access to all of Iterator’s goodies, like map, filter, reduce, etc. Implementing iterator just requires adding a single next(&mut self) -> Option<Item> method on your type.

Lifetimes and async are a massive pain in rust. But the trait system is a work of art.


I like Rust's struct + traits approach, because they avoid inheritance and encourage composition. I am sure people have built bad workarounds though to do inheritance anyway.


ruby is closer to what i meant because you can't add methods to rust's iterator, can you? but people add stuff to enumerable all the time


You can!

    trait MyIterHelpers: Iterator {
        fn dance(&self) {
            println!("wheee");
        }
    }
    
    // And tell rust that all Iterators are also MyIterHelpers.
    impl<I: Iterator> MyIterHelpers for I {}
The one caveat is that using it in a different context will need a use crate::MyIterHelpers; line, so the namespace isn't polluted.


neat, i didn't know that was possible


Or its inspiration, Smalltalk.


> i wonder if you could implement the ide functionality you want with a 'wrap' menu of popular functions that are applicable to the thing to the left of your cursor

This is already implemented in IntelliJ for Java - they call it "Postfix Completion". For example you can type ".cast" after an expression to wrap what's before the cursor in a cast expression, so type "a + b.cast", then pick cast to "float", and pick how large a preceding expression you want to cast, and you can end up with "(float)(a + b)" and go from there. They have postfix completion that can extract expressions into variables, create if-statements and switch-statements from expressions, and so many more things that I wish I had when doing non-trivial Python coding in my IDE of choice (which is not by Jetbrains)...


> it would certainly be interesting to program in a language where conforming to a protocol (perhaps one that nobody had thought up yet when you wrote your class)

Not automatic, but you could use a decorator + the protocol as type annotation, I think
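Something along these lines, perhaps (a sketch with typing.Protocol; the class never has to name the protocol):

    from typing import Protocol, runtime_checkable

    @runtime_checkable
    class SupportsQuack(Protocol):
        def quack(self) -> str: ...

    class Duck:                        # conforms without inheriting anything
        def quack(self) -> str:
            return "quack"

    def make_noise(d: SupportsQuack) -> None:
        print(d.quack())

    make_noise(Duck())                 # type-checks statically
    isinstance(Duck(), SupportsQuack)  # True at runtime, thanks to the decorator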


Inspired by others here, I tried hacking something together myself

  from functools import partial
  
  class Pipeable:
      def __init__(self, fn):
          self.fn = fn
  
      def __ror__(self, lhs):
          return self.fn(lhs)
  
  def pipeable(fn):
      return lambda *args: Pipeable(partial(fn, *args))
  
  filter = pipeable(filter)
  map = pipeable(map)
  list = pipeable(list)
  sum = pipeable(sum)
  min = pipeable(min)
  max = pipeable(max)
  any = pipeable(any)
  
  # Usage:
  
  range(1, 100) | filter(lambda x: x < 50) | max()
  # 49
  
  [1, 2, 3, 4] | filter(lambda x: x % 2 == 0) | map(lambda x: x * 3) | list()
  # [6, 12]
  
  [1, 2, 3, 4] | map(lambda x: x >= 5) | any()
  # False


In my mind this is a holdover from when Python was much more procedural/C-like and as a Python developer it's one of my pet peeves. (I can't count how many times I've started writing the name of a list, had to backtrack to stick a `len` in front, and then tap tap tap arrow keys to get back to the front.)

I suppose we really ought to blame Euler for introducing the f(x) notation 300 years ago... Very practical when the function is the entity you want to focus on, often less useful in (procedural) programming, where we typically start with the data and think in terms of a series of steps.

Some languages like D and Nim have "UFCS", uniform function call syntax, where all functions can be called as methods on any variable. Basically, it decouples the implicit association between method dispatch and namespacing/scoping semantics. Rust also has something they call UFCS, but it only goes one way (you can desugar methods as normal functions, but you can't ... resugar? arbitrary functions as methods). Python couldn't implement this without breaking a lot of stuff due to its semantics, but it is definitely a feature I'd like to see more of.
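Python does have the "desugar" direction, for what it's worth; it's the "resugar" direction that's missing:

    "hello".upper()      # 'HELLO'
    str.upper("hello")   # 'HELLO': the method used as a plain function
    len("hello")         # 5, but there is no "hello".len()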


> In my mind this is a holdover from when Python was much more procedural/C-like

That never existed. Or if it did, it was long before any trace exists, and there's a trace from quite a way back: e.g. the first commit in which I can find the len() builtin (https://github.com/python/cpython/commit/c636014c430620325f8...) also has calls to file.read and list.append, and the first python-level methods are created just a few commits later (https://github.com/python/cpython/commit/336f2816cd3599b0347...). Though there may be missing commits, this is 30 commits in, back when Python was an internal CWI thing (although nearly a year in, according to the official timelines of the early days).

This was years before magic methods were even added (https://github.com/python/cpython/commit/04691fc1c1bb737c0db...).

So no, I don't think it's a "holdover from" anything. Rather seems like it's GvR's sensibilities.


Thanks for the thorough correction. I think I was making that assumption due to the semantics of the language, which suggests classes and methods being somewhat "bolted onto" a dict-based core. Unfortunately for me, it makes me all the more dissatisfied with the choice.



Thanks. I may have already read that post (or I just correctly backtracked the reasoning), as I was pretty much convinced namespacing conflict (the second bit of rationale) was a factor for the dunder-ing of methods, but I had no source so ultimately decided not to put it in.


> and then tap tap tap arrow keys to get back to the front.)

Learn a better editor, and this will stop being a problem.


Or just use any text editor ever and use Ctrl+arrow to jump word-wise. The most common efficiency issue in editing is editor literacy, not editor featureset.


Good programming editors are designed with the idea that as you master the program, you become more precise in telling it what to do. When editing programs, the author usually applies several navigational schemes to interpret the text of the program: by structure, by syntactical elements, by geography of the screen.

To expand on this: examples of navigating by structure include moving by token / expression / definition. Examples of moving by syntax would be the search or "jedi" navigation (i.e. navigation where you enter a special mode requiring you to type characters that iteratively refine your search results). Finally, simply moving up / down / left / right by a certain number of characters is the "screen geography" way.

There's no way to tell which method is better, because they apply better in different situations; however, the "screen geography" method usually ends up being the worst, because it's the most labor-intensive and requires the author to dedicate a lot of attention to achieve precision (i.e. moving exactly N spaces to the left and then exactly M spaces down is very easy to get wrong, and with larger N and M it becomes really tedious).

Navigation by word is only slightly better than navigation by character, and often falls into the "screen geography" kind of navigation. It's easy to learn, it's quite universal and doesn't require understanding of the structure of the program or mastering better techniques (eg. "jedi jump"). That's not to say that it should be excluded from the arsenal -- quite the opposite, but a master programmer (in the sense of someone who writes programs masterfully) would be the one who's less reliant on this kind of navigation.


> If your pavement has potholes, just learn to jump over them.


No. That's a wrong analogy. There's no way around having to navigate the text of the program back and forth, by character, by word, by statement, by definition and so on. This is bread and butter of people who write code.

If you complain about doing this, this is because you don't know how to perform the basic functions necessary to write code. Heuristically, this is because you are either using a bad editor or didn't learn how to use a decent one.

I.e. your complaint is more comparable to Amazon reviews coming from people who don't know how to use the product and then write something asinine, like that one about a loo brush that feels too rough when used in the capacity of toilet paper (though I believe that one was actually a joke inspired by similarly stupid but less funny reviews).


How I eventually resolve this kind of problem:

    minimum = float("inf")
    for b in bar.baz():
        if not some_filter(b):
            continue
        b = some_op(b)
        minimum = min(minimum, b)
    foo(minimum)
Yes, plain old procedural python. Data flows from top to bottom. It allows `print` debugging, very useful for debugging when some_filter or some_op is broken.


With python I'd decompose that one-liner into several variables for readability. That probably ends up using more memory than it would otherwise but I generally don't work on systems where that matters much.

Scala was really nice for this syntax when I used it for Spark.


Map and filter don't actually consume anything until they're used later, they produce iterables. So if you pulled them into their own lines they wouldn't consume (much) extra memory. Taking the original:

  min(map(some_op, filter(some_filter, bar.baz()))).foo()
An alternative is also to use a generator comprehension that's identical to the inner part (in effect):

  min(some_op(item) for item in bar.baz() if some_filter(item)).foo()
Which could still be pulled out to a pair of lines for clarity:

  items = (some_op(item) for item in bar.baz() if some_filter(item)) # or some better name given a context
  min(items).foo()


Makes me wish python had a pipe operator like Julia's |> and R's %>%



There is a niche use-case for the reverse order `(foo min map filter baz bar)`, which is solving typed holes (you could refine the hole like `_.foo()`, although that wouldn't be interoperable with things like next token prediction).

But that's more of a math thing than an everyday coding thing, where dot chaining usually reads nicer.

Mixed is definitely the worst, like you said.


Would that really work? You can chain those functions because they return the same type. For example, filtering a list returns a subset of the list.

Any/all return a Boolean, so the chain would stop there.

I also personally think

    any(x % 5 for x in range(y))
Is more clear than

    range(y).any(lambda x: x % 5)


Your point about ordering and readability really rang true for me. My way around this in Python is to separate the map and the reduce: do the map in one part with a list comprehension and the reduce in a second part on a new line.

I’ll wrap the whole thing in a named function as a way of describing what I’m doing and make it a closure if it’s used only once:

  def f(bar):
    def smallest_baz():
      bazs = (
        some_op(b)
        for b in bar.baz()
        if some_filter(b)
      )
      return min(bazs)

    return smallest_baz().foo()


it's interesting I completely agree with you and it's a big reason I find Python irritating to write (compared to Groovy, Kotlin, Ruby, etc). However there do seem to be a lot of people that dislike this method chaining style and will assert that functional style is better in every way. But I just can't fundamentally agree that writing these as functions is as readable.

Even if you go far out of your way to format it similarly, it still forces you to do a lot of mental work to see the inner most starting point and then deduce what the sequence of operations that happens is backwards, eg:

   foo(
       min(
           map(lambda x: ...,
               filter(lambda y: ...,
                      baz(bar)
               )
           )
       )
   )
(and of course, the python linters are typically configured to hate this so you can't realistically write it this way even if you want to)


Or, you know, you could write a good ol' for loop and use multiple statements, instead of having a gigantic expression


for my smooth brain `map(this, to_that)` makes better sense than `to_this.map(that)`

same with give me `min(of_this)` instead of `of_this_want.min()`


Agree to a big extent. Rust has lots of methods, because their traits work best or most habitually with methods. So I see a comparison of Rust x.min(y) vs Python min(x, y).

The Rust x.min(y) to me is so asymmetric. min(x, y) conveys the symmetry of the operation much better, x and y are both just elements. (And the latter is how it can be used in Python. In Rust, you can call Ord::min(x, y) to get the symmetry back, but it is less favoured right now for some reason.)


This is the same mistake that golang did.


those \ are super ugly though


I would not recommend the default arguments hack. Any decent linter or IDE will flag that as an error and complain about the default argument being mutable (in fact, mutable default arguments are the target of many beginner-level interview questions). It's much easier to decorate a function with `functools.cache` to achieve the same result.
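For reference, a minimal sketch of that (assuming the article's example was a memoized fibonacci):

    from functools import cache  # on 3.8, functools.lru_cache(maxsize=None) is equivalent

    @cache
    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    fib(100)  # 354224848179261915075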


Or, if you need a "static" variable for other purposes, the usual alternative is to just use a global variable, but if for some reason you can't (or you don't want to) you can use the function itself!

    def f():
        if not hasattr(f, "counter"): 
            f.counter = 0
    
        f.counter += 1
        return f.counter

    print(f(),f(),f())

    > 1 2 3


I didn’t realize that the function was available in its own scope. This information is going to help me do horrible things with pandas.


This is very important for self-recursion.


Is there something that isn't "self-recursion"?


Mutual recursion. Horrible example, don’t use this:

  even 0 = true
  even n = not (odd n-1)
  odd 0 = false
  odd n = not (even n-1)


That should be

  even 0 = true
  even n = odd n-1

  odd 0 = false
  odd n = even n-1
I fed a C version of this (with unsigned n to keep the nasal daemons at bay) to clang and observed that it somehow manages to see through the mutual recursion, generating code that doesn't recurse or loop.


You are correct, I don't know why I put the nots in there. Either way, demonstrates mutual recursion.


This is very important for self-recursion.


This is very important for self-recursion.


RecursionError: maximum recursion depth exceeded



This is very important for self-recursion.


In Python you'd maybe think, smart, then my counter is a fast local variable. But you look up (slow) the builtin hasattr and the module global f anyway to get at it. :)

I looked at python dis output before writing this, you can look at how it specializes in 3.11. But there are also 4 occurrences of LOAD_GLOBAL f in the disassembly of this function; all four self-references to f go through module globals, which shows the kind of "slow" indirections Python code struggles with (and can still be optimized, maybe?)

You could scratch your head and wonder: even inside itself, why does the reference to the function go through globals? In the case of a decorated or otherwise monkeypatched function, it still has to refer to the same name.


More concretely, one of the classic Python bugs is to use `[]` as a default argument and then mutate what "is obviously" a local variable.


I think it's even more safe/preferable to use non-mutable `None`s as a default and do:

  def myfunc(x=None):
      x = x if x is not None else []
      ...


In some cases you can also do:

  x = x or []
Your method is best when you might get falsy values but if that’s not an issue the `or` method is handy.


I tend to dislike this method as it's unclear what or returns unless you already know that or behaves this way. x if x is not None else default is cleaner in my opinion


I'm learning python, and I hit this milestone about a week ago!


What's it do?


When you set an object as a default, that object is the default for all calls to that function/method. This also holds true if you create the object right there in the signature, like that empty list: it is created once, when the function is defined. So in this case, every call that uses the default argument is using the same list.

    def listify(item, li=[]):
        li.append(item)
        return li

    listify(1) # [1]
    listify(2) # [1, 2]


I would hate to get an interview question where the very premise of it is wrong. Python does have mutable arguments, but so does Ruby.

    def func(arr=[])
      # Look ma we mutated it.
      arr.append 1
      puts arr
    end
Why calling this function a few times outputs [1], [1],... instead of [1], [1, 1],... isn't because Ruby somehow made the array immutable and hid it with copy-on-write or anything like that. It's because Ruby, unlike Python, has default expressions instead of default values. Whenever the default is needed, Ruby reevaluates the expression in the scope of the function definition and assigns the result to the argument. If your default expression always returned the same object you would fall into the same trap as Python.

The sibling comment is wrong too -- it is a local variable, or as much of one as Python can have, since all variables, local or not, are names.


Just as a demo of what you're saying:

If you were to do (the following is from memory, probably has typos):

  def func(arr=[]):
    print(locals())
You'd see `arr` there. The `[]` value lives in `func.__defaults__`:

  def func(arr=[]):
    print(locals())
    print(func.__defaults__) # will print: ([],)
If you assign to `arr` nothing changes with defaults:

  def func(arr=[]):
    print(locals())
    arr = 10
    print(func.__defaults__) # will still print: ([],)
But since lists are mutable, calling a mutating function on the list referenced by `arr` will cause a mutation of the list stored in defaults:

  def func(arr=[]):
    print(locals())
    arr.append(10)
    print(func.__defaults__) # will print: ([10],)
But only when `func` is called without something to assign to `arr`:

  # if pristine and it has not been run before
  def func(arr=[]):
    print(locals())
    arr.append(10)
    print(func.__defaults__) # will print: ([],)
  func([])


Agreed, I found that example very confusing.


Why does that issue only come up with default arguments?

Why not other places?


Default arguments are evaluated and created when the function definition is evaluated, not when the function is called. This means that the default argument lives as long as the function object does, not just for a single invocation. This is what throws people off.


functools.cache is pretty new; py3.8 is still supported for another year and a bit.


functools.cache is basically `functools.lru_cache(maxsize=None)`. `lru_cache` was added in py3.3, which is widely available.


The big missing item from the list: generators!

Using "yield" instead of "return" turns the function into a coroutine. This is useful in all sorts of cases and works very well with the itertools module of the standard library.

One of my favorite examples: a very concise snippet of code that generates all primes:

  from collections import defaultdict
  from itertools import count

  def primes():
      ps = defaultdict(list)
      for i in count(2):
          if i not in ps:
              yield i
              ps[i**2].append(i)
          else:
              for n in ps[i]:
                  ps[i + (n if n == 2 else 2*n)].append(n)
              del ps[i]
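Since it yields primes forever, you consume it lazily, e.g. with itertools again (a usage sketch):

  from itertools import islice

  list(islice(primes(), 10))
  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]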


And this is a presentation explaining why generators may be extremely useful for all kind of data pipelines: https://www.dabeaz.com/generators/Generators.pdf

If you don't know it already, it is really worth looking into. I am a python dev with nearly a decade of experience and I knew generators, and yet this was still an eye opener.


Note that despite this being a python-specific slide deck, generators and iterators are also present in many other languages, including but not limited to Rust and JS.

The concepts matter more than the chosen language in this deck.

I learned a lot! Looks like I can apply this to a PHP trace/profile parser project, especially the pipelined parsing and the query language idea.


Wow, thanks for that -- that's an excellent slide deck.


But wait, there's more, you can send data back to the function! (Will be returned as the yield output)

https://stackoverflow.com/questions/20579756/passing-value-t...

And don't forget "yield from" (same as yielding all values in a list, but keeps the original generator! You can send data back to the list if it is itself another generator!)


Anyone have good examples of how/when to actually use this? I've personally never interacted with or written a generator that expects to receive values.


I actually had a great use case for this last week. Needed to flatten a list of nested dicts, e.g.:

  [
    {"name": "/dev/loop0"},
    {"name": "/dev/loop1"},
    {"name": "/dev/loop2"},
    {
      "name": "/dev/sda",
      "children":
        [
          {
            "name": "/dev/sda1",
            "children":
              [{"name": "/dev/mapper/lubuntu--vg-root"}, {"name": "/dev/mapper/lubuntu--vg-swap_1"}],
          },
        ],
    },
    {"name": "/dev/sdb", "children": [{"name": "/dev/sdb1"}, {"name": "/dev/sdb2"}]},
    {"name": "/dev/sdc", "children": [{"name": "/dev/sdc1"}, {"name": "/dev/sdc9"}]},
  ]
Wound up writing a recursive generator (with some help from #python on IRC):

  def flatten(items):
      for item in items:
          yield {k:v for k,v in item.items() if k != 'children'}
          if 'children' in item:
              yield from flatten(item['children'])
which results in:

  [{'name': '/dev/loop0'},
   {'name': '/dev/loop1'},
   {'name': '/dev/loop2'},
   {'name': '/dev/sda'},
   {'name': '/dev/sda1'},
   {'name': '/dev/mapper/lubuntu--vg-root'},
   {'name': '/dev/mapper/lubuntu--vg-swap_1'},
   {'name': '/dev/sdb'},
   {'name': '/dev/sdb1'},
   {'name': '/dev/sdb2'},
   {'name': '/dev/sdc'},
   {'name': '/dev/sdc1'},
   {'name': '/dev/sdc9'}]


I see your function and "yield" (pun definitely intended) the following:

    def flatten(children=[], **other):
        if other: yield other
        for child in children: yield from flatten(**child)


That's pretty brilliant to use `children` as the keyword name, thanks!


Thanks for the example, but I was more looking for something that uses "generator.send(...)". I definitely agree that yielding items out of generators is extremely useful, but not so sure on examples of generators that are sent values.


This is the basis of most older async frameworks (see: Tornado, Twisted). A while ago I put together a short talk on how to go from this feature -> a very basic version of Twisted's @inlineCallbacks decorator.

https://github.com/ltavag/async_presentation/tree/master


Anything with feedback control. Updating a priority queue's weights, adaptive caching, adaptive request limiting, etc. Ironically it looks like HN itself rate limited me the first time I tried to reply lol


I am a python noob and this is going to take me some time to process.


Best way to think about it is that a generator can throw some questions back to the caller. It always looks a bit messy though.

    question_bank={'1+1' : '2', '2+3' : '5'}

    def Quiz():
        for question, correct_answer in question_bank.items():
            answer = yield question
            if answer == correct_answer:
                print('Correct!')
            else:
                print('Wrong.')
        yield 'Finished!'
                
    question = Quiz()
    q = next(question)
    while q != 'Finished!':
        q = question.send(input(q))


I like using generators when querying APIs that paginate results. It's an easy way to abstract away the pagination for your caller.

  def get_api_results(query):
    params = { "next_token": None }
    while True:
      response = requests.get(URL, params=params)
      json = response.json()
      yield from json["results"]
      if json["next_token"] is None:
        return
      params["next_token"] = json["next_token"]
  
  for result in get_api_results(QUERY):
    process_result(result)  # No need to worry about pagination


Thanks! I tried to add mostly the stuff I don't encounter that often in blogs/tutorials etc. But I guess you are right. Generators, or at least the 'yield' keyword, are often misunderstood, and we can't emphasize them enough


Just to clarify, I don't mean your article is bad or incomplete -- quite the contrary, I enjoyed it a lot. Generators are one of my favorite Python features and they're kind of underused, mostly because people simply don't know about them.

A couple more along the same lines:

- Metaclasses and type. (This is admittedly dark magic, but useful in library code, less so in application code)

- Magic methods! Everyone knows about __init__, but you can override all sorts of behaviors (see: https://docs.python.org/3/reference/datamodel.html)

My favorite example (I have a lot of favorite examples :)) is __call__, which emulates function calling and is the equivalent of C++'s operator().

Why is it my favorite? Because as the old adage goes, "a class is a poor man's closure, a closure is a poor man's class":

  class C:
      def __init__(self, x):
          self.x = x
      def __call__(self, y):
          return self.x + y
 
  >>> a = C(2)
  >>> a(3)
  5


Thanks a lot! Really appreciate it. Love the example! Haven't used the dunder __call__ yet (like many magic methods I guess), but that's a nice one!

I didn't have to use Metaclasses, either, though I have read about them, especially in Fluent Python. But I guess I belong to the 99% who haven't had to worry about them, yet :P


I find that __call__ is very confusing, but maybe because I'm not used to seeing if often.

What is the benefit compared to having a method named "add" that also explains the behavior?


If an object is callable you can use it in places that might conventionally expect functions. The utility of that is very situational, though. I've only used it a handful of times myself over the years I've known and used Python.

It may also give you a "clearer" (in quotes because subjective) presentation for something you're trying to do.


I see it a lot in HuggingFace, and use it myself for classes that are used like a function, especially when the obvious method name is the verb form of the class name

    processor = SomeProcessor.load("path/to/config")

    # with __call__
    processed_inputs = processor(inputs)

    # less awkward than
    processed_inputs = processor.process(inputs)
The only benefit is to the human, same as @property or even @dataclass.


Thanks for writing that up! I disagree though, I prefer the processor.process for clarity, and for not adding another way of doing things that regular methods already do.


I think I figured out that count(2) is from itertools? I'm new to python.

I think you could simplify the rest like so:

    def primesHN():
        from collections import defaultdict
        from itertools import count
        yield(2)
        ps = defaultdict(list)
        for i in count(3,2):
            if i not in ps:
                yield(i)
                ps[i**2].append(2*i)
            else:
                for n in ps.pop(i):
                    ps[i + n].append(n)


> I think I figured out that count(2) is from itertools?

It is. Itertools is a masterpiece of a module. It has a lot of functions that operate on iterators and will work both on standard iterables (lists, tuples, dicts, range(), count() etc.) and on your own generators. It forms a sort of "iterator algebra" that makes working with them very easy.

> I think you could simplify the rest like so:

Sounds good, but with a caveat: you do need to call "del" at the end for memory deallocation purposes. The garbage collector isn't smart enough to know you won't be using those dictionary entries any longer. Technically the code still works, but keeping everything in memory defeats the purpose of writing a generator.


> you do need to call "del" at the end

The garbage collector doesn't understand "pop"? That seems...dumb? ¯\_(ツ)_/¯


can you explain how generators work with multiprocess (Thread based pool) ?

is ps internal variable unique for each Thread or same?

is it safe to execute your primes() from different threads?


> can you explain how generators work with multiprocess

The best way to think of a generator is as an object implementing the iteration protocol. They don't really interact with concurrency, as far as multiprocess is concerned, they're just regular objects. So the answer is that it depends on how you plan to share memory between the processes.

> is ps internal variable unique for each Thread or same?

ps is local to the generator instance.

  def f():
      x = 0
      while True:
          yield (x := x + 1)
 
  >>> f()
  <generator object f at 0x10412e500>
  >>> x = f()
  >>> y = f()
  >>> next(x)
  1
  >>> next(x)
  2
  >>> next(y)
  1
> is it safe to execute your primes() from different threads?

For this specific generator, you would run into the GIL. More generally, if you're talking about non CPU-bound operations, you need to synchronize the threads. It's worth looking into asyncio for those use cases.


Calling a function that contains a yield will simply return a generator object, which contains information about the next value to produce and how to continue the function's execution. That's why you need to use functions that yield things inside loops or with list(...).

If you run it from different threads I guess it will be the same as calling the function multiple times, it will return a new started-from-the-top generator.

    def sum():
        yield 1
        yield 2
    print(repr(sum()))
    print(next(sum()))
    print(next(sum()))
Prints

    <generator object sum at 0x7fc6f14823c0>
    1
    1


So a Thread-based pool will have the same instance of the generator, while a Process-based pool will have a unique instance of the generator?


In this example, calling sum() creates a generator and returns it. Say g = sum(). If you share g between threads, they will all use the same generator object! If you call sum() separately per thread, they will be different generators.

If you try to send g to a different process, you will get an error, because it doesn't serialize.


I don't know if a generator can be shared across threads, but in that case ... I have no idea :/

You'll need to search, or try!


I know it's a really minor point, but in a blog post about Python (rather than just one that is using Python), it kind of bothers me to see "non-Pythonic" code style,

    if(x > 0): ...
vs

    if x > 0: ...
but probably just OCD kicking in.


Since Python 3.7

  import pdb
  pdb.set_trace()
can be written as just

  breakpoint()


That's not entirely true, because `breakpoint` is a more general hook, `pdb.set_trace` is just its default behaviour.

This is, if anything, better. Because that way you can e.g. replace stray `breakpoint()` calls by warnings rather than break production :D
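A minimal sketch of that kind of override (the hook can also be selected with the PYTHONBREAKPOINT environment variable):

    import sys
    import warnings

    def soft_breakpoint(*args, **kwargs):
        warnings.warn("stray breakpoint() call reached production code")

    sys.breakpointhook = soft_breakpoint  # breakpoint() now warns instead of dropping into pdb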


I was told that at my job, but my fingers are so used to typing `pdb` and emacs template-replacing it that I can't change.


Configure Emacs to template-replace it with breakpoint()


And this also works with Debugpy so you can actually use a proper debugger and not pdb which is frankly terrible.


Thanks for the tip! :)


This is a bit of bikeshedding, but I think

  if not n in memo:
is more naturally written as

  if n not in memo:


I agree. The first is order of operations dependent. Without looking, is that `(not n) in memo` or `not (n in memo)`?

The second can only be interpreted one way.


As someone learning Python, but having worked with other languages, I think your second example is better as it reads more like English. I think that simplicity actually ends up much more rewarding when it comes to reading code.


Linters agree with you with the default config, and will warn on "if not x in y".


Agree. Using "not in" can also theoretically make certain checks faster (e.g. testing negative presence in a hash-based data structure can bail out without walking the collision chain if the initial hashed location does not have an element).


I would absolutely point this out in a code review. It's not even that pedantic, it's the kind of code that causes a double take b/c.


  - none of these functionalities are "overlooked", this is pretty basic python
  - for fibonacci you have a decorator for memoization (functools cache / lru_cache)
  - you don't need to use parentheses for a single line "if"


You are very much right, a lot of it is pretty basic knowledge. From my experience though, a lot of python developers don't take the python docs or tutorial as their first resource, and quite a few developers I've met lacked some of the knowledge I mentioned in the article.

You are right about the fibonacci example, I thought I did refer to another article where I mention lru_cache as well :) But I'll double check.

Good one about the parentheses! I'll post an update soon


You consider these 'basic' python? Just curious, I'd say it's a bit below intermediate.


At the point we're disagreeing about 'basic' vs. 'bit below intermediate'.. idk we at least have to agree how many levels the model has.

Fwiw I also thought it was pretty regular stuff, and then arcane library functions you've either needed or you haven't. Also, that's a generator, not a list comprehension.


One man's basic is another man's low intermediate... but I agree that none of these seem overlooked to me. They are pretty basic things once you get past the first few chapters of your first python book.


Overlooked by whom?

Even though most of the stuff OP writes about is worthless, I see the older features used a lot and the newer ones less so, but still enough otherwise...

This article reads to me as if it was written by someone learning Python, perhaps in their 3rd-4th month, when they finally decided to open the documentation / some existing project code instead of implementing calculators and animal class hierarchies...


A couple people already pointed out that you can write `breakpoint()` instead of using `pdb.set_trace()`.

Here's one more trick: you can use `pdb` to run scripts! `python -m pdb foo.py` will run `foo.py` but trigger a breakpoint on the first error.


That is definitely a neat one! If you are ok with it, I might add that one. I just updated the article already with some of the great comments and tips I received over here.


Oh! Thats a really nice one!


> The underscore _ can be used as a throwaway variable to discard unwanted values:

So can any other variable, using underscore is just a convention to make it obvious that you're not planning to re-use it (it doesn't get GCed more aggressively or anything).

Similarly, private methods being prefixed with an underscore is also just a convention, you can access them from anywhere.

However, double underscores are used for magic attributes and name mangling for class attributes, which are interpreted differently! (See: https://stackoverflow.com/a/1301369)
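For example, a quick sketch of the mangling:

    class A:
        __secret = 1          # stored as _A__secret on the class

    A()._A__secret   # 1
    A().__secret     # AttributeError: mangling only happens inside class bodies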


Many linters are also configured to ignore '_' for many tests (such as any 'unused variable' warnings)


In the interactive environment _ automatically holds the value of the last expression evaluated.

  >>> 1+1
  2
  >>> _ * 3
  6


Pretty good list. Two corrections:

The `first, *middle, last` trick doesn't work if your list only has one element:

  first, *middle, last = [1]
  ValueError: not enough values to unpack (expected at least 2, got 1)
And the last title has a typo:

> Separater for Large Numbers


I don't think the point about splat unpacking really is a correction. Unpacking always requires that the iterable has enough values to assign to the specified variables, this has nothing to do with the use of *middle.


To be clear, this is only a problem if your array can ever have less than two values, or if you require that `first` and `last` be different values.

The mistake is suggesting the unpacking without caveats, because it'll fail in situations that the naive solution doesn't:

    items = [1]
    first = items[0]
    middle = items[1:-1]
    last = items[-1]
This will still work as long as you have any elements.


> Python arguments are evaluated when the function definition is encountered

This is a giant pain. Easy to miss. Sometimes forces you to deal with Optional[Something] instead of just Something.

Compare with Julia where default arguments are evaluated ... very late:

    julia> f(a, b, c, d = a * b * c) = d
    f (generic function with 2 methods)
    
    julia> f("hello", " ", "world")
    "hello world"
that's really neat.


Probably one of the benefits I gained from writing JavaScript before ES5 (although I have worked with many languages, I've only used a few that were dynamic - PHP, JS, and old VB). I write my functions as early as possible, having remembered hoisting rules from JavaScript (and trying to only rely on OOP with Python where it naturally makes sense).

Looking at your Julia example, this seems much more friendly and less surprising and error-prone.


    first, _, last = [1, 2, 3, 4, 5]
I guess this is a typo, it should be

    first, *_, last = [1, 2, 3, 4, 5]
(As explained above!)

Other than that, nice list of python tricks, I love not-so-known features because it can make code shorter and prettier!


Sharp! Updated that line. And thank you for the compliment :)


> Python arguments are evaluated when the function definition is encountered. Good to remember that!

I would never try to exploit this behavior to achieve some kind of benefit (avoiding max recursion). Any tricks you try to do with this are almost definitely going to cause bugs that are very difficult to track down. So don't be too clever here.


Yeah. I was really surprised to see this as a feature to be used rather than a gotcha. I've seen it more as gotchas, as in actual bugs introduced because of this behavior, and never as a feature until now. I can see why he thinks it's useful though and, maybe within his specific context, it is. That said, even for his example, I think he would have been better off using https://docs.python.org/3/library/functools.html#functools.c...


You are right about that, perhaps it is good to mention it as a "gotcha". Or I could have used a better title. I do think, though, it is good practice to know this stuff. About the cache decorator: I did link to another article where I discuss lru_cache and cache :)


Before I saw your comment, I had "overlooked" that these were presented as beneficial features, rather than just curiosities. As someone just learning Python, but familiar with other languages, I can only hope that if I start using Python in production with other developers they take the most obvious route (or use a comment as to why they would be relying on this type of behavior).

I chose to learn Python because it seemed to be the easiest to read, which to my mind meant working in a team would lead to easier discovery and understanding. Then I see articles like this, and wonder if I'll have a lot of footguns to watch out for where the code isn't as clear as it seems.


I fully agree on this! Like the Zen of Python says: explicit is better than implicit. You should not expect your teammates to know all the "special" behavior of the language, and if you can write it more straightforwardly, you should in that case. I guess Kyle mentions the same about JavaScript in his books. You are right about that. Perhaps that would be a nice addition to the post. I do believe, though, it is good to be familiar with this behavior in case you ever come across such a situation.


Would add:

* For dicts, learn .setdefault() vs. .get() vs. defaultdict() (quick sketch below)

* .sort(key=sortingkey)

* itertools groupby, chain

* map, filter, reduce
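A quick sketch of the first point (with throwaway keys):

  from collections import defaultdict

  d = {}
  d.get("k", 0)                      # 0, and d is left unchanged
  d.setdefault("k", []).append(1)    # inserts the default, returns it: d == {"k": [1]}

  counts = defaultdict(int)          # the default is created lazily on first access
  counts["x"] += 1                   # counts == {"x": 1}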


Perhaps for another article? :) But thanks, definitely a nice list! My purpose for now was to keep the article 'digestible' as well.


Multiple context managers in a single with statement is something I didn’t know!
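For anyone else who hadn't seen it, a tiny sketch (hypothetical file names):

    with open("input.txt") as src, open("output.txt", "w") as dst:
        dst.write(src.read())

In newer versions (3.10+) you can also wrap the whole list of context managers in parentheses to split it over several lines.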


Great list, thanks, I'll be sure to use some of these.

Here's the obvious question: how many more unknown-but-useful features are hidden away in other similar articles.


Too many I bet. There is always stuff to learn I guess, but reading Fluent Python gets you pretty far. @qsort also mentioned some nice extras.


A couple details worth noting:

- `repr` often outputs valid source code that evaluates to the object, including in the post's example: running `datetime.datetime(2023, 7, 20, 15, 30, 0, 123456)` would give you a `datetime.datetime` object equivalent to `today`.

- Using `_` for throwaway variables is merely a convention and not built into the language in any way (unlike in Haskell, say).


> Because the language is so easy to learn, many practitioners only scratch the surface of its full potential, neglecting to delve into the more advanced and powerful aspects of the language which makes it so truly unique and powerful

We have definitely found this to be true in hiring. Many people’s Python knowledge seems to just be surface deep.


Some of these features lie on the border of the uncanny valley where languages like Ruby and "vanilla Javascript" live, and are not compatible with the principle of least surprise or even the Zen of Python. I don't write too much Python anymore, but when I do I keep it simple and explicit.


I find a lot of python like that. It's a simple language to get started in but an incredibly complex language to try and get across more than skin deep. Maybe not C++ complex but more than I expected.

It has some wild features and crazy syntax and if you know it, it's probably awesome, but I too like to keep it mostly simple and obvious.


I agree. Someone else here also mentioned that they prefer code that is easy to read over code that uses a lot of "unfamiliar functionality," let's call it that. And I do agree; Kyle mentions the same thing if I remember correctly when it comes down to JavaScript. It is better not to expect your colleagues or other developers to know the ins and outs of the language as well. If one way is 10x easier to understand, just stick with that.

But as you said: if you know it, it's probably awesome. In my opinion, it never gets boring to discover new things in Python, and it does make you a better Python developer. Knowing what and when to apply certain knowledge is where your experience comes in.


Re unpacking with *: one I use often is when you have a list of tuples of coordinates you want to plot, i.e.

  # z = [(x0,y0), (x1,y1) ...]
You can do

  import matplotlib.pyplot as plt
  plt.plot(*zip(*z))
I spent years doing

  x = [t[0] for t in z] # etc
before I realized this.


I'd argue that your original approach is actually better than your new approach.

Using a list comprehension, such as your original approach, is pretty easily understood by anyone writing python and is easy to follow, it is also quite terse.

Your recursive unpacking zip thing is much harder to understand and read. This reminds me of the type of stuff you find in the codebase years later when the person who wrote it is long gone and you find a comment next to it that says:

# No idea why this works, but don't touch it

One of the problems I have with python is that there are a million super creative ways to do stuff, especially using less known parts of the language. People love to get super creative with it, but usually the simplest solution is actually the best one, especially when working on a team.

In your example above, you aren't even saving any real space. Both approaches can be done inline, the list comprehension is maybe a few extra characters. You're not really saving anything, just making it harder to read and maintain by others.

When I moved from a company that wrote in Python to one that wrote in Golang, I found that the restrictions that Golang offers is a huge benefit in a team. Because you don't have access to all these crazy language components that python has, the code written in Go would be almost identical regardless of who wrote it. Of course everything in Golang is far far more verbose than Python, but I actually found it 100x more maintainable.

In the python codebase it was very easy to tell who wrote different parts of a codebase without looking at the git blame, because there was almost a "voice" with the style of writing python. But Golang was more restrictive, which meant that the entire codebase was more cohesive and easier to jump around in.


The need actually comes up a lot to transpose a list of lists. That zip can do it is not hard to visualize and it's an idiom worth learning. If it still seems unclear, you can name things to help:

    columns = zip(*rows)
or

    def transpose(list_of_lists): return zip(*list_of_lists)
But anyway, yeah, tastes differ, it's fine if we disagree. I do agree that Python has gotten uncomfortably complex. But this is a very old feature from simpler times and does not add any syntax or metaprogramming features, it's just an already needed function.


Not saying I disagree with you, but I do want to note that the specific example of unzipping a list using `zip` has been in the official zip docs [1] as long as I can remember, and as such, should be commonly understood by Python developers.

[1] https://docs.python.org/3/library/functions.html#zip


Hmm, I encountered or used all of these somewhere, but 4 days ago I learned something else: python natively supports complex numbers.

    a=1+3j
    b=a+4j
I encountered this when a friend noticed some weird syntax for a numpy meshgrid (via mgrid):

    np.mgrid[-1:1:5j]


Numpy has a lot of these shortcuts that are quite opaque. For example np.r_ and np.c_

This one can be explained as "equivalent to np.linspace(-1, 1, 5)", i.e 5 evenly spaced points between -1 and 1. Normally the step size is an integer but with a complex "step" it switches the meaning from step size to number of equidistant points.


That parses to me as "given the list np.mgrid, return indexes -1 through 1, step by 5j". I know it's not, but that's what it looks like to me.


Repr prints source code that will (often) give you an equivalent object. I would be highly surprised if it got you the same object instance. Equal (==) but not identical (is).


  import random
  some_value = 9 # return a number between 0 and, including, 100

  if below_ten := some_value < 10:
    print(f"{below_ten}, some_value is smaller than 10")

Random isn't used in this example, but more importantly, why would you assign the value to below_ten if the point is just to print it? Why not just print some_value?

Even in the next example of the walrus operator - it is extremely contrived:

    if result := some_method(): # If result is not Falsy
        print(result)
Why not just:

    if some_method():
        print(True)


Mutable default arguments is widely regarded to be a footgun. I agree.


The most surprising feature I learned about Python's core functions was that enumerate() has a "start" parameter. I wrote countless +1 offsets.
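i.e., a trivial sketch (`lines` being whatever you're iterating over):

    for lineno, line in enumerate(lines, start=1):
        print(lineno, line)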


In newer versions of Python, pdb.set_trace() is automatically aliased to the top level breakpoint() function. You no longer need to import pdb.


"Overlooked core functionality" is an interesting way to spell "massive footgun".


I've been coding for most of my life and I can't believe some people would choose some of these tricks when python has much simpler syntax for most of these.

The first, *_, last trick for example would be particularly obnoxious to encounter. The first element is my_list[0], last is my_list[-1]. Dead simple, way easier to understand at a glance.


And does not work at all on iterators.


The walrus operator isn't overlooked imo. It's more that many still haven't updated to >3.8


3.7 was released in 2018 and is already EOL. Those folks should probably start considering an upgrade...


Even if the environment updates and you can use new stuff, it takes time to rediscover these features after upgrading and start using them.


Articles like this are gold. Thanks!


Thank you so much! And discussions about these articles over here are even more valuable :)


Slice notation can be used on the left-hand side of assignment.
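For example:

    lst = [1, 2, 3, 4, 5]
    lst[1:4] = [20, 30]   # replaces a slice and can change the length: [1, 20, 30, 5]
    lst[::2] = [0, 0]     # extended slices need exactly the right number of items: [0, 20, 0, 5]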


as an alternative to pdb I like to use

`import code; code.interact(local=locals())`

Drops you into the interpreter and is sufficient for a lot of debugging problems.


Use all this and you've got yourself a poor man's Ruby.



