Python Type Hints – *args and **kwargs (2021) (adamj.eu)
267 points by ekiauhce on Aug 27, 2023 | 149 comments



For typing **kwargs there are TypedDicts https://peps.python.org/pep-0692/
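
A minimal sketch of the PEP 692 style (MovieKwargs is an illustrative name; typing.Unpack needs Python 3.11+, older versions can import it from typing_extensions):

    from typing import TypedDict, Unpack

    class MovieKwargs(TypedDict, total=False):
        title: str
        year: int

    def make_movie(**kwargs: Unpack[MovieKwargs]) -> None: ...

    make_movie(title="Alien", year=1979)  # OK
    make_movie(director="Scott")          # rejected by a PEP 692-aware checker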

If your function just wraps another you can use the same type hints as the other function with functools.wraps https://docs.python.org/3/library/functools.html#functools.w...
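
For instance (a small sketch; greet and shout_greet are made-up names):

    import functools

    def greet(name: str, excited: bool = False) -> str:
        return f"Hello, {name}{'!' if excited else '.'}"

    @functools.wraps(greet)
    def shout_greet(*args, **kwargs):
        # __annotations__, __name__, __doc__ etc. are copied over from greet,
        # so tools that read annotations see greet's signature here.
        return greet(*args, **kwargs).upper()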


While functools.wraps does propagate __annotations__ by default, be aware that not all IDE-integrated type checkers handle that properly. It's easy in PyCharm, for example, to use functools.wraps such that the wrapper function is treated by the IDE as untyped.

Underneath, this is because many (most?) type checkers for Python aren't actually running the code to access annotation information; instead they parse it "from the outside" using complex and fallible static-analysis techniques. That said, it's a testament to JetBrains' excellent work that PyCharm's checker works as well as it does, given how heavily metaprogrammed even simple Python often turns out to be.


Pycharm has the worst type checker that exists today. It may have been the best a few years back, but others have surpassed it considerably.

I recently switched from Pycharm to vscode, which uses pyright, and it's night and day in the number of type errors it catches. It considerably improved the quality of my code and my confidence during refactoring.

And to add insult to injury Pycharm doesn't even have a pyright plugin and the mypy plugin is extremely slow and buggy.


It’s sad to see this happen, to be honest. It seems like JetBrains is getting distracted from their core value proposition: good IDEs. If Electron-based IDEs are becoming more responsive and performant than their “native” IDEs, they have major priority problems.


What even distracts them? IDEs are supposedly the only thing they do. Well, maybe except for Kotlin. And it's not like their IDEs are very cheap either. Not so cheap that I'd like the idea of being too mentally invested in something that barely competes with a free source-code editor, let alone lags behind it.


I don't actually know, but my assumption is that they're working with a very old codebase based on the "bespoke parsing and plugins for every language" paradigm that served them well for decades. Meanwhile, e.g., VSCode uses the "language server with treesitter and queryable compiler generically integrated over a standard API" model that has only recently become widespread.

When I first learned about LSPs it was immediately clear to me they would run circles around the "traditional" IDEs. I'd given up on using IDEs because I found them too finicky and error prone, but LSPs have been a total game changer.


I'm pretty sure they make most of their money from TeamCity build agents.

IntelliJ (the Java+ IDE) has always had a community edition that is open source. I can vouch that it is truly free and not crippleware. For most Java programmers, this edition is sufficient.


There is Fleet, which they purport to be their next-gen IDE. I haven't even tried it, though I am an avid pycharm user, so maybe it's not getting the results they hope for?


Any examples? I don't write Python that much nowadays, and while I'm sure its type checker doesn't do everything, I kinda never felt disappointed by what it does. Maybe, a considerable part of that is that I still don't really think of Python as a type-checked language, so everything an IDE does for me still feels like quite a bit of an improvement over how I used to write code in Python for a long, long time. But really, "night and day on the amount of type errors"?..


Well, on Pycharm 2022.3, which is what I still have installed, even this simple function doesn't show any error.

  def foo() -> int:
      pass
I sure hope they improved the type checker in later versions...


On the other hand, Pycharm will show an error if you replace the pass keyword with any statement. Seems like a deliberate behavior - it declines to typecheck a stub.


Um lol, what do you think the error is here? This is widely accepted syntax for a stub. So yes, it does return None if run, but it's not expected to ever be run. So it's pretty ironic that you would blame pycharm (which is indeed excellent) for your own misunderstanding.

https://mypy.readthedocs.io/en/stable/stubs.html#using-stub-...


> Um lol what do you think the error is here? This is widely accepted syntax for a stub.

No it isn't. The accepted stub syntax is an ellipsis (...).

A competent type checker like pyright will error on this code.

  Function with declared return type "int" must return value on all code paths
    Type "None" cannot be assigned to type "int"
And if you need more proof that pycharm is useless as a type checker:

  def foo(b) -> int:
      if b:
          return 1
A classic error where you forget to return a value from all paths and pycharm is silent.


I linked directly to the section of mypy docs that shows you are wrong - all you have to do is click the link to understand that you are wrong:

>(Using stub file syntax at runtime). You may also occasionally need to elide actual logic in regular Python code... You can also elide default arguments as long as the function body also contains no runtime logic: the function body only contains a single ellipsis, the pass statement, or a raise NotImplementedError(). It is also acceptable for the function body to contain a docstring. For example:


And why would I care about mypy docs? pyright does the correct thing by showing an error.

In any case, the second example is definitely not a stub.


this is the weirdest "head in sand" moment; you literally called out mypy as a desirable alternative

>And to add insult to injury Pycharm doesn't even have a pyright plugin and the mypy plugin is extremely slow and buggy.

i.e. you recognize that aligning with mypy is desirable.


That’s not a type error in python. All types in python (afaik) accept ‘None’ as a value.

For example, try:

   a : int = None 
It will succeed. This is done (I think) so you can tell whether optional arguments are defined, declare variables before use (e.g. if you have a conditional with two branches both setting a different value for a variable, since python blocks aren’t expressions), and that kind of thing.


It's not recommended anymore. Optional types should be explicit. https://peps.python.org/pep-0484/#union-types
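
For example:

    from typing import Optional

    a: Optional[int] = None   # explicit, per PEP 484
    b: int | None = None      # PEP 604 union syntax, Python 3.10+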


Each type checker can implement rules as strict as it wants. And pyright gives the correct answer here:

  Expression of type "None" cannot be assigned to declared type "int"
    Type "None" cannot be assigned to type "int"


There definitely seem to be a few areas since the Fleet announcement that have given me pause on JetBrains.

Their Python support hasn't kept up with other tools, as noted. I've seen a similar decline in their ability to keep up to date with things like Svelte, Vue, Astro etc too.

They need to embrace the LSP


There is also typing.ParamSpec when the purpose is to write a generic wrapper:

https://docs.python.org/3/library/typing.html#typing.ParamSp...
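
A minimal sketch of a ParamSpec-based wrapper (the logged name is made up; requires Python 3.10+):

    import functools
    from typing import Callable, ParamSpec, TypeVar

    P = ParamSpec("P")
    R = TypeVar("R")

    def logged(func: Callable[P, R]) -> Callable[P, R]:
        @functools.wraps(func)
        def inner(*args: P.args, **kwargs: P.kwargs) -> R:
            # The wrapper keeps the wrapped function's exact parameter list.
            print(f"calling {func.__name__}")
            return func(*args, **kwargs)
        return inner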


I think PEP 612 is trying to make the ergonomics better for the 'forwarding' / pass-through case (when .wraps isn't appropriate):

https://peps.python.org/pep-0612/


Interesting, looks like they ended up having to introduce typing.Unpack, to resolve the ambiguity between the TypedDict referring to the type of all the kwargs together, vs each kwarg value being the TypedDict (i.e. Mapping[str, TypedDict]).

Not ideal but not too bad either.
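
A short sketch of the two readings (Movie is a made-up name; same Unpack version caveat as elsewhere, 3.11+):

    from typing import TypedDict, Unpack

    class Movie(TypedDict):
        name: str
        year: int

    def f(**kwargs: Unpack[Movie]): ...  # kwargs collectively are Movie's fields
    def g(**kwargs: Movie): ...          # each kwarg value is itself a Movie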


In this section, what is this slash in the function definition for the second foo()?

https://peps.python.org/pep-0692/#keyword-collisions


I actually created a library for this!

Forge: forge (python signatures) for fun and profit

https://python-forge.readthedocs.io/

https://github.com/dfee/forge


The ability of **kwargs to leave behind no proper documentation and to silently swallow any invalid arguments has made us remove them entirely from our codebase. They're almost entirely redundant when you have dataclasses.


Yea, really only useful imho for proxy functions that then just pass the arguments along to something that DOES properly type every arg.


But doesn't this break type checking for the users of the proxy functions?


You can write the proxy/decorator to preserve typing info using a typevar.

    from typing import Callable, TypeVar

    F = TypeVar("F", bound=Callable)
    def wrapper(f: F) -> F: ...


What about decorators, or wrappers around third-party code whose contracts change frequently (or even second party code when interacting with functions provided by teams that don't follow explicit argument typing guidelines, if you have that sort of culture)?


Usually the solutions range from a culture of "just don't" to tests/mypy that have become increasingly strict over the years; every time, we've come a step further up the ladder. But I admit, it has taken quite some bridging to get there.

Moving to static Python in most places has dramatically improved the code and language.


As someone that works on a Python compiler, this is a very limited view of reality…


Can you explain a bit more?


Those are better handled by typing.ParamSpec; it should keep track of the unwrapped function's arguments.


/me cries in Django

Kwargs everywhere, often only defined for a type at runtime by spooky voodoo action at a distance metaclass shenanigans...


Hello, I am pytest. I heard y'all are talking about magic and kwargs fudging?


What do you do when inheriting from a base class with a defined __init__ ?


For everybody reading this and scratching their head why this is relevant: Python subclassing is strange.

Essentially, super().__init__() will resolve to a statically unknowable class at run-time, because super() refers to the next class in the MRO. Which class you will call is essentially unknowable as soon as you accept that either your provider class hierarchy may change or you have consumers you do not control. And probably even worse, you aren't even guaranteed that the class calling your constructor will be one of your subclasses.

Which is why, for example, calling super().__init__() is pretty much mandatory as soon as you expect that your class will be inherited from. That applies even if your class inherits only from object, which has an __init__() that is guaranteed to be a nop. Because you may not even be calling object.__init__() but rather some sibling's.

So the easiest way to solve this is: Declare everything you need as keyword argument, but then only give **kwargs in your function signature to allow your __init__() to handle any set of arguments your children or siblings may throw at you. Then remove all of "your" arguments via kwargs.pop('argname') before calling super().__init__() in case your parent or uncle does not use this kwargs trick and would complain about unknown arguments. Only then pass on the cleaned kwargs to your MRO foster parent.
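
A minimal sketch of the pattern just described (class names are hypothetical):

    class Base:
        def __init__(self, **kwargs):
            self.base_arg = kwargs.pop('base_arg', None)  # remove "our" argument first
            super().__init__(**kwargs)  # may land on object.__init__ or a sibling

    class MixinA(Base):
        def __init__(self, **kwargs):
            self.a = kwargs.pop('a', None)
            super().__init__(**kwargs)

    class MixinB(Base):
        def __init__(self, **kwargs):
            self.b = kwargs.pop('b', None)
            super().__init__(**kwargs)

    class Child(MixinA, MixinB):
        pass

    # Inside MixinA.__init__, super() resolves to MixinB here, not Base:
    c = Child(a=1, b=2, base_arg=3)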

So while using **kwargs seems kind of lazy, there are good arguments why you cannot completely avoid it in all codebases without major rework of pre-existing class hierarchies.

For the obvious question "Why on earth?": these semantics allow us to resolve diamond dependencies without forcing the user to use interfaces or traits, or throwing runtime errors as soon as something does not resolve cleanly (all of which would not fit well into the Python typing philosophy).


FWIW, I've come to regard this (cooperative multiple inheritance) as a failed experiment. It's just been too confusing, and hasn't seen adoption.

Instead, I've come to prefer a style I took from Julia: every class is either (a) abstract, or (b) concrete and final.

Abstract classes exist to declare interfaces.

__init__ methods only exist on concrete classes. A concrete class should be thought of as unsubclassable, so concerns about inheritance and diamond dependencies etc just don't exist.

(If you do need to extend some functionality: prefer composition over inheritance.)


This is why I hate Python, absolutely none of this is obvious from the design of the language


At an even more basic level, the lack of static typing seems like such a bad tradeoff: an incredibly huge nuisance in readability, plus stupid runtime bugs that shouldn't be a thing, in exchange for a feature that's rarely useful.

Granted, I'm primarily an embedded developer. Can any Python experts explain to me a highly impactful benefit of dynamic typing?


For small programs, dynamic typing can be faster to write (not read). As soon as your program grows: "uh oh". Once you add maintenance into the cost equation, dynamic typing is a huge negative.

To be fair: 15 years ago, people were writing a lot of Java code that effectively used dynamic typing by passing around Object references, then casting to some type (unknowable to the reader) at the point of use. (C#: same.) It was infuriating, and also very difficult to read and maintain. Fortunately, most of that code is gone now in huge enterprise code bases.


I'm not sold on this. Often I type the output I want to get and work the code backwards to get there, and that's faster because it's now all autocompleting.


Interesting point. What language?


That's been my experience with powershell and typescript. To a lesser extent python, because its type hints are a bit crap.

Though I can see why you might not agree after trying an extreme like Rust. Sometimes I want to run a broken program to stop the debugger and see what I'm dealing with and rust won't do that.


To add to your list: During string concatenation, there is no automatic conversion to string. It results in an exception. It is infuriating.

This code:

    "abc" + 123
... will raise this exception:

    TypeError: can only concatenate str (not "int") to str
I have wasted so many hours fixing this same bug, over and over again.


I strongly disagree. Not converting the int to a string automatically is absolutely the right decision. In all code I write, this TypeError would catch an actual error, because concatenation of strings is just not the right tool for creating "abc123" from "abc" and 123, so I would not use it for that. Hence, if this exception occurs, it indicates that I probably mixed up variables somewhere. Use one of the (admittedly too) many string formatting tools that Python offers, for example an f-string like f"abc{123}". (Also, if you have enough type annotations in your code, the type checker will warn you about these, so you can fix them before they hit testing or even production.)


Interesting. 100% of the times I encountered this TypeError, I actually wanted to create the concatenated string. It never caught an actual error.

Now, I guess I'm not against an explicit cast, and I can imagine how the error could catch an actual bug. It's painful when the error stops execution when the string concatenation was intended, but it's not really an issue anymore with the possibility to type check before execution.

> concatenation of strings is just not the right tool for creating "abc123" from "abc" and 123

Why? This sounds like an opinion to me. String interpolation or formatting features are nice, but I find them quite clunky in such simple cases.

Of course when you have to be careful to call str(val), it's arguably as clunky...


Of course it’s an opinion. But in my experience, I almost exclusively have some sort of “template” with some parts to fill out, and string interpolation represents this better (in my opinion). This is especially true if the different parts to fill in are of different types.

As I wouldn’t use string concatenation for this purpose, it’s impossible for me to run into a situation where I wanted the concatenated string. (And even if I did, I would be glad for the reminder to change this into an f-string.)

And the bugs that it catches are of the form: I took some user input, expecting it to be a number, but forgot to convert it into one. Then I passed it to a function expecting a number, and it thankfully crashed instead of turning everything else into strings as well.

Maybe this is also a question that informs your view on this: Do you expect "abc" or 123 to be the “variable” part of that expression?

- If "abc" is a literal in the code with 123 coming from a variable, wanting 123 to turn into a string as well is somewhat unterstandable. - However, if 123 is the literal part of the code and "abc" the value of some variable, I would expect to mostly run into this in cases where I am actually doing some math and that the variable is a string just is some accidentally unparsed input.

In what I do, the second case would be more common.


> So the easiest way to solve this is: Declare everything you need as keyword argument, but then only give **kwargs in your function signature to allow your __init__() to handle any set of arguments your children or siblings may throw at you. Then remove all of "your" arguments via kwargs.pop('argname') before calling super().__init__() in case your parent or uncle does not use this kwargs trick and would complain about unknown arguments. Only then pass on the cleaned kwargs to your MRO foster parent.

The easiest way is to not put your arguments into kwargs in the first place. If you declare them as regular named parameters (probably with a default value, so they stay optional like the rest of kwargs), then the python runtime separates them from the rest when it builds kwargs and you don't have to do the ".pop()" part at all.
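
A sketch of that alternative (class names hypothetical):

    class Base:
        def __init__(self, **kwargs):
            super().__init__(**kwargs)

    class Child(Base):
        def __init__(self, my_arg=None, **kwargs):
            self.my_arg = my_arg        # captured directly, never enters kwargs
            super().__init__(**kwargs)  # the rest is forwarded untouched

    Child(my_arg=1)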


Thank you for explaining this; there are a lot of comments here suggesting trivial code style improvements for use cases where **kwargs wasn’t actually needed. The more interesting question is how to improve the use case you describe, which is how I’ve usually seen **kwargs used.


Having used Python a lot, I was never glad for multiple inheritance. I’d prefer traits.


Are traits and mixins the same? If not, can you please provide a trivial example. It would be useful to better understand what you mean. When I was very young, learning C++, I thought multiple inheritance was so cool. Now, I know it is like sleeping in the open jaws of a saltwater croc.


Traits as in the original Smalltalk research, Rust traits or Haskell type classes are like interfaces, but only when in scope. So until you import a trait, its implementation on various types isn’t visible.

This makes it possible to safely make a new trait and implement it on a built in type (like giving int a method) without the chance of another unrelated use of the type accidentally using what this trait provides.


I agree - it is convenient to use at first but it sure makes it hard to use an unfamiliar codebase!


They are a necessity for subclasses though, especially when subclassing from an external library that will likely change underneath you.


Seems pretty important for something like a plotting function where you want to be able to pass any tweaks to any subplots.


This does restrict all of your keyword arguments to the same type. If you have keyword arguments of different types, you're right back to no type safety.


Well, if you want to type your kwargs and use newer versions of python, you can use Unpack with typed dicts to achieve that. But the footgun there is that you can't redefine fields when extending them, so no Partial<SomeType> for you.


True, but there are a couple of mitigations available: you can express the types of selected kwargs (by leaving them out of the ** residual), and you can use typing.Union/| to express union types for the values in the residual as well.
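
A small sketch of both mitigations (the plot name and its parameters are invented; `int | str` needs Python 3.10+):

    # 'title' and 'color' are typed explicitly; everything else lands in
    # **options, whose values must be int or str:
    def plot(title: str, *, color: str = "black", **options: int | str) -> None:
        ...

    plot("losses", color="red", linewidth=2, linestyle="dashed")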


That seems obvious? If you want a variable number of arguments of arbitrary type you have to specify the common supertype, commonly top itself.

To do otherwise would require some form of vararg generics which is uncommon.


It's extremely common for Python programmers to write code with kwargs of different types. Look at subprocess.run() for example.
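
For instance, a typical call mixes bool, float, and str keyword arguments:

    import subprocess

    subprocess.run(["ls", "-l"], capture_output=True, timeout=5.0,
                   cwd="/tmp", check=True)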


Why do people not just type everything they want passed?

    def variable(n: str, nn: str, nnn: str, *, a: int, b: int, c: int): ...

Anything after the `*,` is a kwarg.


It is used when the number of arguments can vary, like:

    def sum(*args: int) -> int:
        if len(args) == 0:
            return 0
        return args[0] + sum(*args[1:])


That is an entirely different use-case than a function signature allowing arbitrary keyword arguments. Arbitrary keyword args are different than arbitrary positional args like you have in your example.

GP is suggesting that one should only ever use explicit keyword-only args (anything listed after `*,` in the signature) versus arbitrary keyword args implicit via `**kwargs`.

e.g. (omitting type hints for clarity):

    def sum(*args, **arbitrary_kwargs):
        ...
vs

    def sum(*args, some_keyword_only_arg):
        ...
In my opinion, if one finds themselves writing code that uses arbitrary kwargs, they've got a design problem.


It seems altogether surprising that with an empty list or tuple a, a[1] results in index error, yet a[1:] quietly returns an empty list or tuple.


> It seems altogether surprising that with an empty list or tuple a, a[1] results in index error, yet a[1:] quietly returns an empty list or tuple.

`a[1:]` returns the sequence of elements that start at index 1. If there is no such element, the list is empty. I don’t see any good reason why this should throw an error.


Both cases are an index error. It's just that, in the case of the slice, the error is for some reason represented by an empty object, and it's left to the user to handle the result.

This could easily conceal the indexing error unless the caller code explicitly checks the length of the returned slice.


> This could easily conceal the indexing error unless the caller code explicitly checks the length of the returned section.

An empty returned section doesn’t mean the index was out of bounds (`a[0:0]`); if you want to make sure you have to check the length before slicing, like in Go.


Here it is. It only underscores the ambiguity of this handling of slice indexes. Flagging the index error would make it less ambiguous/silent.


Then why doesn’t a[1] return None?

I understand the logic behind both decisions, but it’s not surprising that people find it inconsistent and unintuitive.


> Then why doesn’t a[1] return None?

Because there would be no way to distinguish between "a[1] contains None" and "a[1] doesn’t exist".


And with a[1:] returning the empty list there’s no way to distinguish between a is empty and a only has one element.

These are, in the end, relatively arbitrary language design decisions.


When you slice a list, you get a list. When you see there is nothing inside the returned list, you know that means the end of the list: it contains zero elements. Slicing and indexing return objects at different levels.


Slicing a list, when the first index is invalid for that list, could easily throw an exception instead.


This should signal an explicit error, which an invalid index indeed is. If the user believes for some reason that the invalid indexing is ok, then it could be caught and handled. No ambiguity.


I think it is consistent; it works a bit like filtering an element from a mathematical set.

Given a set of sheep, "let x be the five-legged sheep" is inconsistent because we know neither the existence nor the uniqueness of such a sheep, so it raises an exception.

Given a set of sheep, "let x be the subset of five-legged sheep" is the empty set, because there is no such sheep.

But this may also just be because I internalised Python's behavior.

Some languages have a specific value to denote the first thing, for example:

   ["a", "b", "c"][4]

gives `undefined` in JavaScript, but it differs from `null`, which would be the equivalent of `None` in Python (and I don't think Python has such a concept).


a[1] has to raise an IndexError because there's no return value it could use to otherwise communicate the item doesn't exist. Any such value could itself be a member of the sequence. To behave otherwise, Python would have to define a sentinel value that isn't allowed to be a member of a sequence.

When using slice notation, the return value is a sequence, so returning a zero-length sequence is sufficient to communicate you asked for more items than exist.

It may be surprising, but it almost always leads to more ergonomic code.

https://discuss.python.org/t/why-isnt-slicing-out-of-range/
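
A quick demonstration of the asymmetry:

    a = []
    print(a[1:])   # [] -- an out-of-range slice quietly yields an empty list
    try:
        a[1]
    except IndexError as e:
        print(e)   # list index out of range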


You should use `Iterable`


I'm not sure

    print(firstname, lastname)
for example is more readable than

    print((firstname, lastname))
especially since I would then have to write

    print((surname,))
to just print a single string.

Variadic functions are rather classic; I think Go, Rust, C and JavaScript also have them.


FWIW Rust does not have variadic functions. The closest thing would be either macros, which are variadic, or trait methods, which are not variadic but can look like they are.


Oh yeah, that’s right! Thanks for the correction


How is it more "readable"? The two are just as readable.

What do you do with your first example if you have a list (generated at runtime, not a static one) to pass to the function? This wouldn't work (imagine the first line is more complicated):

    l = (1,2,3)
    print(l)


that's what the splat operator is for - it unpacks a list into separate arguments. in this case e.g.

   xs = (1, 2, 3)
   f(*xs)
is equivalent to f(1, 2, 3), not f((1, 2, 3))


And that's readable?


It's more readable, for my brain at least, because there is less distracting syntax cruft lying around.


You're distracted by the parentheses? Really?


Absolutely!

Something I keep relearning is that people are different, sometimes in ways hard for me to comprehend :)


Your example has a fixed number of names. What if you wanted to accept any number of names, like Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso? Really, though, Iterables make more sense for monadic types.


We would force broad changes in human society to conform to the assumptions of our database scheme, same as we always have.


I knew a Chinese girl whose parents, surnamed 吕 and 郎, wanted to give her the combined surname 吕郎. This was not allowed, so formally she was surnamed 吕 and given a three-syllable personal name starting with 郎.

There are a couple funny things about this:

1. A personal name of three syllables is stranger than a surname of two.

2. Double-syllable surnames are unusual, but definitely not unheard of. This girl told me that she hadn't been allowed to receive the double surname 吕郎, because it was too long. I asked what would have happened if her double surname had been 司马 instead. "That's different!"

(If the government of China tried to pick a legitimacy fight with the name 司马, it would lose, and everyone knows this.)

So this almost looks like an example of the kind of thing you're referring to, except that the database scheme has nothing to do with it. A surname that was nontraditional but within the technical norms was rejected in favor of a personal name that was both nontraditional and well outside the technical norms.


It's pretty common when wrapping a function that has a large number of config options.

The wrapper is usually some shorthand for building a handful of those args or adding some side-effect, while still allowing the caller access to the remaining config options via kwargs.

Here's one example of that in the wild https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot....
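
A sketch of that pattern (plot_with_defaults is a made-up name; matplotlib is just the library from the linked example):

    import matplotlib.pyplot as plt

    def plot_with_defaults(x, y, **kwargs):
        # Shorthand that fills in house style, while leaving every other
        # matplotlib option reachable through **kwargs:
        kwargs.setdefault("linewidth", 2)
        kwargs.setdefault("color", "tab:blue")
        return plt.plot(x, y, **kwargs)

    plot_with_defaults([0, 1], [0, 1], linestyle="--")  # overrides still pass through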


Your signature requires exactly 3 positional[0] and 3 keyword arguments. The OP allows any number of either.

[0] actually 3 positional-or-keyword which is even more widely divergent


But why would you want that? Doesn't that make for a more confusing API? Would it not be better to just have everything as a kwarg? You would get better types that way.


I think what GP is saying is that with explicit kwargs you can't express variadic signatures, i.e. "this function takes one int positional, and then any number of key/value pairs where the values are lists". The variable length is the important bit.

It's certainly debatable whether doing that is better than passing a single argument whose value is a dict with that same type, but many people do prefer the variadic args/kwargs style.


I genuinely don’t understand what you are asking.


If you have enough arguments that the signature becomes obscure to read, you need a dataclass to pass into the function instead.

I would rather:

    from dataclasses import dataclass

    @dataclass(frozen=True, slots=True)
    class VarThings:
        n: int
        ...

    def variable(a: VarThings):
        ...
Than a million args


I usually start with a namedtuple unless I need the additional features provided by a dataclass.


Why? Dataclasses are vastly better: more typesafe, less hacky, etc.


The new syntax is basically the same as dataclasses:

    from typing import NamedTuple

    class Employee(NamedTuple):
        name: str
        id: int


> Anything after the `*,` is a kwarg.

A required keyword-only argument, as you’ve done it. It’s closer to an optional kwarg if you expand the type declaration to also allow None and set a None default.

But there are times when you want to leave the number and names of kwargs open (one example is a dynamic wrapper: a function that wraps another function that can be different across invocations).


In my experience it's generally because Python developers make functions with an insane number of keyword arguments, and then wrap those functions. They don't want to type them all out again so they use kwargs.

subprocess.run() is an example of that. Also check out the functions in manim.

The inability to properly statically type kwargs using TypedDict is probably the biggest flaw in Python's type hint system (after the fact that hardly anyone uses it, of course).



PEP 612 made this much better FWIW.

https://peps.python.org/pep-0612/


> In the function body, args will be a tuple, and kwargs a dict with string keys.

This always bugs me: why is `args` immutable (a tuple) but `kwargs` mutable (a dict)? In my experience it’s much more common to have to extend or modify `kwargs` than `args`, but I would find it more natural to have an immutable dict for `kwargs`.


Yeah, that is odd. Python still has no immutable dict type, except it kinda does: https://adamj.eu/tech/2022/01/05/how-to-make-immutable-dict-...


> This always bugs me: why is `args` immutable (tuple) but `kwargs` mutable (dict)?

Because python didn’t (still doesn’t, but at this point even if it did backward compatibility would mean it wouldn’t be used for this purpose) have a basic immutable mapping type to use.

(Note, yes, MappingProxyType exists, but that’s a proxy without mutation operations, not a basic type, so it costs a level of indirection.)


In Python, except for mutability, is there any difference between tuple and list? In my experience: Pure Python people get so excited about tuples ("oh, it's so Pythonic"); others: much less.


> In Python, except for mutability, is there any difference between tuple and list? In my experience: Pure Python people get so excited about tuples ("oh, it's so Pythonic"); others: much less.

In my experience, people who don’t care about tuples are people who don’t understand them. It’s not so much about being Pythonic or not (they exist in other languages) but rather about choosing the right data structure for your problem. Tuples are much more than a way of making immutable lists: they offer a type-safe and serializable representation of pairs and triplets, something you can’t have with a list. If you don’t use them yet, I really encourage you to read up on them (and again, in general, not just in Python), because you’re missing something.
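
A small illustration of the distinction:

    # A tuple is a fixed-arity, possibly heterogeneous record:
    point: tuple[float, float] = (1.0, 2.0)
    lookup = {point: "start"}   # hashable, so usable as a dict key; a list isn't
    x, y = point                # fixed-shape unpacking

    # A list is a variable-length, homogeneous collection:
    readings: list[float] = [1.0, 2.0, 3.0]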


Generally tuples are an antipattern.


Hot take! Can you explain more? It might foster some good discussion. Also, how do you feel about the comments from the other "sister" post? They seem to be very keen about list vs tuple.


#TIL. Also cool to know is pydantic's @validate decorator: https://docs.pydantic.dev/latest/usage/validation_decorator/... and, in case you were thinking it's superfluous given mypy, it's not (https://docs.pydantic.dev/latest/usage/validation_decorator/...).


Is it just me or are Python type hints like..goofy?


As someone who has written Python for nearly 20 years now, and also has plenty of experience with strongly and statically typed languages (including a fair bit of Haskell), I think type hints in Python should at most remain just that: hints.

A language being statically typed or dynamically typed is a design decision with implications for what the language can do. There are benefits to each method of programming.

Trying to strap type checking onto Python is born out of some misplaced belief that static typing is just better. Using Python as a dynamically typed language allows for certain programming patterns that cannot be done in a statically typed language. There are some great examples in SICP of Scheme programs that could not exist (at least with as much elegance) in a typed language. Dynamic typing allows a language to do things that you can't do (as easily/elegantly) in a statically typed language.

Some may argue that these types of programming patterns are bad for production systems. For most of these arguments I strongly agree. But that means that for those systems Python is probably a poor choice. I also think metaprogramming is very powerful, but also a real potential footgun. It would be ridiculous to attempt to strip metaprogramming from Ruby to make it "better"; just use a language that depends less on metaprogramming if you don't like it.

This is extra frustrating because in the last decade we've seen the options for well designed, statically typed languages explode. It's no longer Python vs Java/C++. TypeScript and Go exist, have great support, and are well suited for most of the domains that Python is used for. Want types? Use those languages.


Have to disagree with this. Choosing a language is not just about its features but also its ecosystem. I chose Python for my current project because it has great libraries that don't exist in other languages.


"my current project" type problems are where Python is great. Types remain "nice to have" (if you like them) and aren't really essential compared to the ease of prototyping new ideas and building a PoC. You're choosing Python because the benefit of libraries outweighs your personal preference for types.

Most of my work is in machine learning/numeric computing, so I'm very familiar with the benefits of Python's ecosystem. Basically all of AI/ML work is about prototyping ideas rapidly, where access to libraries and iterating fast greatly trumps the need for type safety.

At nearly every place I've worked, Python is the tool for building models quickly but shipping them to production and integrating them with the core product almost always involves another language, typically with types, better suited for large engineering teams working on a large code base where you really want some sort of type checking in place. Most of the companies I know that do serious ML in production typically take models from python and then implement them in either C++ or Scala for the actual production serving.

It's worth pointing out that the vast majority of those libraries you use were initially developed without any consideration, or need, for types. Great, reliable, software can be written without types. Dynamic typing is a great language choice, and there's no need to fight the language itself by trying to bolt types on.

Where types are important is where you have a complex, rapidly changing code base with a large number of developers of differing skill levels releasing frequently. If that's the environment you're in, I would strongly recommend against using Python in prod, even if it means you have to implement the features of some libraries internally.


You keep saying how typing prevents some "elegant" things (are they really prevalent?) or how "iterating fast greatly trumps the need for type safety". But in my experience, anything more than half a dozen modules/classes can turn into a bloody minefield very fast. And the supposed speed of iteration is negated and reversed by having to dig through the untyped codebase, trying to very inefficiently determine the types manually, with my own eyes, so as not to screw up. What are those supposed super-benefits of ditching type safety?


> anything more than half a dozen modules/classes can turn into a bloody minefield very fast

I don't disagree with this, but it's worth noting that there are many data scientists I've worked with who have never written a python class or module, and yet produced large amounts of valuable work.

For quick prototyping and exploratory work, both domains where Python sees a lot of success, it's often the case that you don't really know what types you're working with and are iterating very quickly, such that not being able to quickly change all of the types you're working with can be a time sink.

So I think you're imagining writing software that starts out more well-defined than most optimal use cases for Python.

> And the supposed speed of iteration is negated and reversed by having to dig through the untyped codebase trying to very inefficiently determine the types manually

I do understand this feeling, but this is applying the logic/patterns of statically typed programming to dynamic programming. The dynamic answer to "how do I write robust code" is not to keep the types in your head, it's to write tests. Well-written tests in turn become documentation for your code.

Again, this style of programming works well when you don't really even know how you want your program to behave. You naturally write tests when writing this kind of code, even if it's just manual tests. Get in the habit of starting your manual tests as unit tests and you're already doing TDD.

Dynamic types works best when you are writing code in a very interactive and exploratory way.

To be clear: I'm not advocating for dynamic languages over statically typed ones. I do believe, whenever possible, production software should be written in strongly typed languages. I think anytime you know the behavior of your program before you start writing you should use statically typed languages.


Last thing I want is Go types tbh. Trying to figure out what implements a given interface is difficult without a sufficiently clever tool.

Funnily enough, Python also has Go-esque types, Protocols, and they have the exact same issue. I only use them when I really, really need structural typing to reuse code in a typesafe way.


Call me crazy, but I just use a statically typed language where static types are required.


I agree. It adds all the inconvenience of static typing with none of the benefits.


> none of the benefits

Autocomplete and type-checking are massive boons to writing "type-correct" code, fast. It doesn't guarantee that your code won't explode at runtime or is logically correct (that's what tests are for), but it does help eliminate an entire class of bugs, and, again, speeds up development a massive amount when dealing with very large codebases.


They're quite limited in some ways, obscenely powerful in others, and have a fairly strange syntax, yeah.


Big time. Getting better very quickly however


Alternatively, use an `@overload` in a `.pyi` file and specify your types there.

This means that you will have 2^N combinations, doubling every time you accept a new argument.
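
A sketch of why the count explodes (foo, p1, and p2 are hypothetical):

    from typing import overload

    @overload
    def foo() -> None: ...
    @overload
    def foo(*, p1: int) -> None: ...
    @overload
    def foo(*, p2: str) -> None: ...
    @overload
    def foo(*, p1: int, p2: str) -> None: ...
    def foo(**kwargs): ...  # the single runtime implementation

    # Two optional arguments already need 2^2 = 4 overloads.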

If that is not good enough, then simply use a `TypedDict` with everything optional instead of `**kwargs`. Your call will then become `foo(SomeTypedDict(p1=p2,...))`.


That article promulgates a misunderstanding about immutability. To my way of thinking, python is already an interpreted language, and I can enforce tropes in code more cleanly and effectively than people taking something five levels up at face value and trying to figure out what sticks when they throw it against the wall: no wonder they end up frustrated, and it's a frustrating situation.

Given:

    def foo(*args):
        print(args)
        return

    class Thing(object):
        def __init__(self,a,b):
            self.a = a
            self.b = b
            return
        
        def foo_style(self):
            return (self.a, self.b)
args is not required to refer to a tuple:

    >>> foo(*[31,42])
    (31, 42)
I can have objects construct parameters conforming to the specifications for a signature:

    >>> foo(*Thing(3,91).foo_style())
    (3, 91)
Consider that a counterexample.


Within the function, args is a tuple, as your output demonstrates.


What if the second argument is a float?


Nothing. Type hints are only paint on the bike shed. By default, the language does nothing with them; e.g., it won't raise an exception for an incorrect type.


Although these two come in handy, people have been using them wrong. Often in scientific open source packages, they slap **kwargs into a function definition without documentation. How am I supposed to know what to pass in?

https://qiskit.org/ecosystem/aer/stubs/qiskit_aer.primitives...


Especially when they don't even leave a docstring, so you're forced to track down the package's documentation online just to interact with certain interfaces.

I work in a large python codebase; we have almost no usage of `**kwargs` beyond proxy methods, because of how they obfuscate the real interface for other developers.


The worst is when someone puts **kwargs at the base of a class hierarchy, not only necessitating its use in subclasses (if you want to be strict about types) but also swallowing errors for no good reason. Fortunately I think this style is fading out as type hints become more popular.


When I was first starting out, a then senior engineer told me: "friends don't let other friends use kwargs".

That always stuck with me.


I once worked on a code base where we had **kwargs passed down 4 or 5 layers deep (not my idea). It was a true joy.


This is literally me. It is a math program that can evaluate equations and generate code: 6 layers of heterogeneous data structures, where a math operation acting on the 1st layer has effects down to the 6th layer. Temporarily using **kwargs to make it work, but still thinking about the proper way to do it right.


Out of interest, what sort of company/industry do you work in where you're able to work on this kind of thing?


Can you organize the data structures into classes or dataclasses?


Already doing this. The problem is there are 5 layers in between. Copying and pasting the same docstring into all layers is doable but does not seem smart.


Sadly a problem with any wrapper function is that it nullifies this kind of information. Use functools.wraps.


PyCharm usually figures this out if it's not too complex. I often wrap session.request() with some defaults/overrides, and autocomplete usually shows me the base arguments as well.


My question is: can @wraps wrap more than one function?

Maybe in some use case people need to merge 2 functions into 1; I don't know if it can handle this situation.


I'm not sure what it means to "merge two functions into one", can you elaborate?

If you are referring to a type signature for a function that passes through its arguments to one of two inner functions, each of which has a different signature, such that the outer signature accepts the union of the two inner signatures, well ... you could achieve that with ParamSpecs or similar, but it would be pretty hard to read and indirected. Better, I'd say, to manually express appropriate typing.Union (|) annotations on the outer function, even if that is a little less DRY.


> I'm not sure what it means to "merge two functions into one", can you elaborate?

I'm not OP, but I see this pattern often enough:

    def foo(**kwargs):
        pass

    def bar(**kwargs):
        pass

    def wrapper(**kwargs):
        foo(**kwargs)
        bar(**kwargs)


Yup, this exactly.


What would you have wraps() do for two functions? Concatenate the docstrings and union the annotations? Perhaps you want only the latter; that seems like it could easily be its own decorator: `@functools.annotate_intersection(f1, f2, …)`


That makes sense. If you have two functions with identical type signatures, then you should be able to invoke @functools.wraps (or its underlying functools.update_wrapper) such that the annotations are propagated correctly. In this case, that might be as simple as functools.wraps(f, assigned=('__annotations__',)).

The rest of what "wraps" does isn't really applicable to what you're trying to do, though: propagating __doc__, __name__ or __module__ etc. isn't really meaningful when combining two functions with different names, as toxik points out.

Coming at it from the other side, you can use typing.ParamSpec to write a higher-order "wrapper()" that wraps 2 arbitrary functions with identical signatures, like this:

    import typing

    S = typing.ParamSpec('S')
    R = typing.TypeVar('R')

    def wrapper(func1: typing.Callable[S, R],
                func2: typing.Callable[S, R]) -> typing.Callable[S, typing.Tuple[R, R]]:
        def inner(*args: S.args, **kwargs: S.kwargs) -> typing.Tuple[R, R]:
            return func1(*args, **kwargs), func2(*args, **kwargs)
        return inner

You can additionally use typing.Concatenate to indicate additive modifications to those signatures if the two inner signatures aren't the same.

However, what I thought you meant originally is the common problem of deduplicating large/complex function type signatures so you don't have to write them out multiple times or risk drift. In the parent post, consider what would happen if the signatures of "wrapper", "foo", and "bar" were all a) large and b) the same/very similar. Python's answers to that problem are much less good:

1. Duplicate them and write a unit test that ensures that the __annotations__ property of the functions that should have the same signatures remain in sync (or sub/supersets of each other, or whatever you'd prefer). This addresses drift, but requires a testing system and doesn't save the duplicate code.

2. A hack: use object constructors for functions. Write the in-common parts of the signature in the constructor of a parent class, and then put the body of "foo" and "bar" in the __init__ methods of child classes whose instances are not used/are useless, taking advantage of the superclass relationship to indicate to typecheckers that the parameters are all shared. For example, many typecheckers will do the right thing when handed code like this:

    class _Super:
        def __init__(self, arg1: SomeType, arg2: SomeOtherType, ...):
            self.arg1, self.arg2 = arg1, arg2


    class foo(_Super):  # Bizarre casing is intentional, this is not meant to be used as an instance
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            ...  # Business logic of old "foo()" method goes here, using object fields self.arg1 etc. instead of named variables.

    class bar(_Super):  # Bizarre casing is intentional, this is not meant to be used as an instance
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            ...  # Business logic of old "bar()" method goes here, using object fields self.arg1 etc. instead of named variables.

    class wrapper(_Super):
        def __init__(self, *args, **kwargs):
            foo(*args, **kwargs)
            bar(*args, **kwargs)
This does solve the duplication problem, and you can use dataclasses with __post_init__ methods for your "foo"/"bar" business logic to smooth out some of the boilerplate and weirdness there, but it remains a very bizarre coding style which substantially trades away readability/familiarity (and performance, if this is on a very hot path) in return for type-checker-friendliness.

3. Use a databag object (ideally an immutable slotted class, dataclass, typing.NamedTuple, [c]attrs, or something of that sort) to encompass all the data that would previously go in your argument signature, so that "wrapper", "foo", and "bar" all end up taking a single such object as their sole argument and accessing fields of that argument to do their work. This is probably (maybe? Lots of Scotsmen in this area...) the most traditionally Pythonic of these options, but is still a far cry from the convenience of something like functools.wraps.


OT, but this is my number one peeve with code documentation: going to the effort to write a doc comment, taking up at least 6 lines, probably using some special syntax, cluttering up the code, but then adding no information that can't be derived from the signature.

If you're not going to document something (which I totally respect), at least don't make the code worse while doing it.


Even better when the docstring doesn't even match the signature.


I have been annoyed by this too! I like how seaborn handles it now in documentation: https://seaborn.pydata.org/generated/seaborn.barplot.html?hi...



