Do you get upset when you see one of `create_engine("postgresql://")`, `create_engine("mysql://")`, or `create_engine("sqlite://")`?
The author is just adding a new backend for a protocol that's common in the data world. Pandas, Polars, Dask, DuckDB, etc. all support it, and the kind of person who wants to access a dedicated CSV data archive would much rather keep their current client API and add a new connection URI string than adopt an entire client API for a single data source (or deal with making requests themselves and feeding the response into the dataframe).
I didn't say it was uncommon, just that it's a bad idea.
There's no need whatsoever for a separate client API. There could be a convention like:
import pandas as pd
from csvbase import loader as csvloader

df = pd.read_csv(csvloader("calpaterson/onion-vox-pops"))
The user wouldn't need to know anything beyond what to import, and it's explicit about where the data comes from. There's also less risk of mistyping the URL string ("oops, I just typed cvsbase and accidentally loaded a list of CVS drugstores"), and code completion can tell you that `csvloader()` fetches things through the csvbase module.
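A loader like that could be a few lines. Here's a sketch of what I mean; the export URL format, the host constant, and the function name are my assumptions for illustration, not the real csvbase client:

```python
from urllib.parse import quote

CSVBASE_ROOT = "https://csvbase.com"  # assumed host, illustrative only


def csvloader(ref: str) -> str:
    """Map a table reference like "calpaterson/onion-vox-pops" to the
    URL of its CSV export.

    pandas' read_csv already accepts plain URLs, so returning a string
    is enough -- no bespoke URI scheme or client API required.
    """
    user, table = ref.split("/", 1)
    return f"{CSVBASE_ROOT}/{quote(user)}/{quote(table)}.csv"
```

The point being: the explicit import makes the dependency visible, while pandas stays completely unaware of csvbase.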
You're missing the point. It's not about the use of bespoke URIs. This is the "convention over configuration" debate revisited. Conventions are cute and always look clever while you're reading the documentation. But they're a pain during maintenance, because they force anyone who doesn't know them to go read the docs first, instead of simply inferring how all the wires connect from previously researched, clearly stated declarations.
When I see `pd.read_csv("csvbase://")` during debugging, I wonder how pandas knows to speak to csvbase (as the article anticipates). Nothing is imported. Nothing is configured. Things just speak to one another. So can I also call `pd.read_csv("other_csv_server://")` like this? If I replace pandas with koalas, will `koalas.read_csv("csvbase://")` also work? How the wires connect between pandas and csvbase is hidden. Unless you know that the two obey some implicit lower layer (the fsspec standard), it's a mystery. Mysteries are the last thing you want when debugging.
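To be fair, the wiring is less magical than it looks, even if I agree it's hidden: fsspec keeps a registry mapping URL schemes to filesystem implementations, and backend packages add themselves to it at install time (via entry points), which is exactly why nothing shows up in your imports. A toy version of that dispatch mechanism, as I understand it (this is my own miniature, not fsspec's actual code):

```python
from urllib.parse import urlparse

# Toy scheme registry. Real fsspec maps "csvbase" -> a filesystem
# class; packages register via entry points at install time, so the
# connection between pandas and csvbase never appears in user code.
_registry = {}


def register(scheme, opener):
    """Associate a URL scheme with a callable that handles it."""
    _registry[scheme] = opener


def open_url(url):
    """Dispatch a URL to whichever handler claimed its scheme."""
    scheme = urlparse(url).scheme
    if scheme not in _registry:
        raise ValueError(f"no handler registered for {scheme}://")
    return _registry[scheme](url)


# A backend package would do this on import/install:
register("csvbase", lambda url: f"would fetch {url}")
```

So `open_url("csvbase://calpaterson/onion-vox-pops")` works only because something, somewhere, called `register()` — which is precisely the "mystery" being complained about.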
I don't know which `create_engine()` function you're alluding to. The one I know and have used comes from SQLAlchemy, and how it works has always been obvious. I've never seen any mention of fsspec there. I looked at SQLAlchemy's code and it's predictably just a convenient syntax for specifying connection information in a single string: the string is parsed to extract connection attributes, which are then relayed to the underlying DBAPI driver. There's no mystery involved.
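To illustrate that point: the connection string decomposes into ordinary attributes with nothing but stdlib URL parsing (SQLAlchemy has its own `make_url`, but the idea is the same):

```python
from urllib.parse import urlparse

# A DB URL is just structured connection info packed into one string.
# The parsed pieces are what get handed to the DBAPI driver.
url = urlparse("postgresql://alice:secret@db.example.com:5432/sales")

dialect = url.scheme            # "postgresql" -> which driver to load
user = url.username             # "alice"
password = url.password         # "secret"
host, port = url.hostname, url.port  # "db.example.com", 5432
database = url.path.lstrip("/")      # "sales"
```

Nothing implicit connects two libraries here; the URL is consumed entirely inside the one library you explicitly imported.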