At this stage I feel that the natural evolution for SQL is instead to use englis...

munk-a · on Feb 28, 2024

Does this really appeal to developers? SQL is an extremely expressive language, and while there are things in it I would change, they are mostly minor points. Writing a query in English, where the actual SQL it translates to depends on the specific version of the LLM, seems dangerous. Forcing the query to be expressed in a language that not everyone speaks, either as a first language or at all, also seems to make code less accessible.

When properly styled, with good indentation and syntax habits, SQL is extremely readable.

alfalfasprout · on Feb 28, 2024

It's good for the case where you want a somewhat complex analytical query generated on the spot.

But this is only for initial generation. After that, you should be using pure SQL.

towelpluswater · on Feb 29, 2024

I’ve never understood why people want a more verbose version of SQL.

I think what people really want is business rules and data cleaning and schema discovery.

If you had to use English against multiple source systems and tons of joins, the sentence would be paragraphs long.

Where I think there’s value is in using something like a data catalog to label business rules against a data warehouse, tied to dashboard queries and other common ones.

But that’s a hard problem, with a model that is unique to every customer and always changing.

staticautomatic · on Feb 29, 2024

Combining schema discovery and data catalog seems like it might be a hard problem requiring a lot of LLM prompt engineering gymnastics but maybe I underestimate the state of the art.

importantbrian · on Feb 28, 2024

We might eventually get a good LLM-to-SQL tool, but my experience with them is that they make slick sales demos yet are worse than useless in the real world.

You have to know SQL to use them. They produce a lot of code that looks correct and produces correct-looking results with subtle errors. So you can't just hand it to someone who doesn't know SQL and let them query the database, but that's the use case where something like this would be valuable. You have to be experienced with SQL and know all the peccadillos of the DB you're working with to check the query and output for correctness.
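To make the "looks correct, subtly wrong" failure mode concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` and `shipments` tables and their rows are hypothetical, made up purely for illustration. The first query reads plausibly but double-counts, because the join repeats an order once per shipment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE shipments (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (1, 100), (2, 50);
    -- Order 1 shipped in two parcels, order 2 in one.
    INSERT INTO shipments VALUES (10, 1), (11, 1), (12, 2);
""")

# Reads like "total revenue for shipped orders", and returns a plausible number.
# The join fans order 1 out into two rows, so its amount is summed twice.
naive_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o
    JOIN shipments s ON s.order_id = o.id
""").fetchone()[0]

# EXISTS keeps one row per order, so nothing is double-counted.
correct_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o
    WHERE EXISTS (SELECT 1 FROM shipments s WHERE s.order_id = o.id)
""").fetchone()[0]

print(naive_total, correct_total)
```

Both queries run without error and both return a believable total, which is why someone who does not already know the data model has little chance of spotting the difference.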

For someone like me who is experienced with SQL, I can write simple queries just as fast as I can figure out how to prompt the LLM to get what I want. Where a tool like this would be really helpful is if it could help me write more complex queries more quickly. However, it is non-trivial to get the LLM to generate complex queries that take into account all the idiosyncrasies of your specific data model. So again it ends up being much faster for me to just write the query myself and not involve the LLM.

Where I think LLMs go wrong with SQL is that to write good SQL you have to have a deep knowledge of the underlying data model, and the LLMs aren't good at that yet.

apoorvnandan · on Feb 29, 2024

Strongly agree. LLMs need something more than just the DDL of the tables and an instruction to write useful SQL in the real world. However, I've had decent success by (1) integrating heavily with a semantic layer on top of your database, and (2) going the agent approach where the LLM is allowed to run different queries and explore the data before writing the final query.
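As a rough sketch of what "more than just the DDL" could look like, the snippet below gathers the DDL plus the distinct values of low-cardinality columns, which is the kind of exploration an agent might do before writing the final query. It uses Python's built-in `sqlite3`; the `build_context` helper, the `subscriptions` table, and its data are all hypothetical, and no actual LLM call is made.

```python
import sqlite3


def build_context(conn, max_distinct=5):
    """Collect each table's DDL plus distinct values of low-cardinality columns."""
    lines = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for name, ddl in tables:
        lines.append(ddl)
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        for col in conn.execute(f'PRAGMA table_info("{name}")').fetchall():
            col_name = col[1]
            values = [
                row[0]
                for row in conn.execute(
                    f'SELECT DISTINCT "{col_name}" FROM "{name}" '
                    f"ORDER BY 1 LIMIT {max_distinct + 1}"
                )
            ]
            # Only report columns with few distinct values (likely enums or flags).
            if len(values) <= max_distinct:
                lines.append(f"-- {name}.{col_name} distinct values: {values}")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, status TEXT, plan TEXT);
    INSERT INTO subscriptions (status, plan) VALUES
        ('active', 'pro'), ('ACTIVE', 'pro'), ('cancelled', 'free');
""")
context = build_context(conn)
print(context)
```

Here the profile surfaces that `status` contains both `'active'` and `'ACTIVE'`, something the DDL alone would never reveal and that a generated `WHERE status = 'active'` would silently get wrong.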

staticautomatic · on Feb 29, 2024

Asking the LLM if the query can be optimized a few times and then checking it in a planner works surprisingly well for me but YMMV.
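For anyone who hasn't checked a query in a planner before, a minimal sketch with Python's built-in `sqlite3` and SQLite's `EXPLAIN QUERY PLAN` is below. The `orders` table, the index name, and the `plan` helper are hypothetical; other databases have their own `EXPLAIN` variants with different output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
)

query = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"


def plan(conn, sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


# Without an index, the planner has to scan the whole table.
before = plan(conn, query, (42,))

# After adding an index on the filtered column, it can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, query, (42,))

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should mention a scan of `orders` and the second should mention `idx_orders_customer`.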

caust1c · on Feb 28, 2024

We built LLM-to-SQL before this at RunReveal, and while it's useful and gets queries mostly correct 80% of the time, 20% of the time it's way off or requires nontrivial manual intervention.

We're still fairly bullish on the LLM-to-SQL front, but in the meantime PQL is a good bridge.

munk-a · on Feb 28, 2024

As a company that's invested in this, would you mind talking about why you don't want to use raw SQL? Are there particular deficiencies you've found in it?

__mharrison__ · on Feb 28, 2024

See tools like pandas and Polars. These dataframe libraries are abstractions that give you a spray of SQL functionality. I prefer using these libraries because it feels much more intuitive (and works with the Python/Arrow ecosystem). (I'm also biased since I make a portion of my living off of pandas training material.)

ejcx · on Feb 28, 2024

We do use raw SQL, but we're a security business, which tends to have heavy reliance on other languages that have a similar syntax to PQL.

jodrellblank · on Feb 28, 2024

Here is a long blogpost "against SQL" which lists many deficiencies of it: https://www.scattered-thoughts.net/writing/against-sql/

In short, it has a longer spec than famously-complex C++ while making a much less expressive language out of it.

rgmerk · on Feb 28, 2024

A thought experiment:

Rather than an LLM, you can send your request for an SQL query directly to Donald D. Chamberlin, one of the original designers of SQL. Furthermore, he gets an ERD for your database.

What are the odds you get back a query that gives you correct answers?

mritchie712 · on Feb 28, 2024

We (https://www.definite.app/) do this.

The SQL generation works well out of the box and works better as you update the semantic layer. The semantic layer includes things like joins and measures (e.g. aggregate functions) that you'd want standard definitions for. For example, you don't want an LLM creating a definition for MRR on the fly. All the semantic definitions are plain SQL.
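A rough sketch of the general idea, not a description of how Definite actually implements it: measures live in a dictionary of vetted plain-SQL expressions, and the generator (LLM or otherwise) only gets to pick names from it. Everything here, including `MEASURES`, `DIMENSIONS`, `compile_query`, and the `subscriptions` table, is hypothetical.

```python
import sqlite3

# Hypothetical semantic layer: each measure is a reviewed plain-SQL expression.
MEASURES = {
    "mrr": "SUM(CASE WHEN status = 'active' THEN monthly_price ELSE 0 END)",
    "active_customers": "COUNT(DISTINCT CASE WHEN status = 'active' THEN customer_id END)",
}
DIMENSIONS = {"plan": "plan"}


def compile_query(measure, dimension):
    """Build SQL from named definitions only; unknown names raise KeyError."""
    expr = MEASURES[measure]
    dim = DIMENSIONS[dimension]
    return (
        f"SELECT {dim} AS {dimension}, {expr} AS {measure} "
        f"FROM subscriptions GROUP BY {dim} ORDER BY {dim}"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (
        customer_id INTEGER, plan TEXT, status TEXT, monthly_price INTEGER
    );
    INSERT INTO subscriptions VALUES
        (1, 'pro', 'active', 50),
        (2, 'pro', 'cancelled', 50),
        (3, 'free', 'active', 0),
        (4, 'pro', 'active', 50);
""")

# The generator only chooses "mrr" and "plan"; it never writes the aggregate itself.
rows = conn.execute(compile_query("mrr", "plan")).fetchall()
print(rows)
```

Because the definition of MRR is looked up rather than generated, a request for a measure that has not been defined fails loudly with a `KeyError` instead of producing an improvised formula.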

quick demo: https://www.loom.com/share/a0d3c0e273004d7982b2aed24628ef40?...

dhosek · on Feb 28, 2024

SQL is part of a generation of languages which attempted to have, within strict syntactical constraints, a natural-language-sounding approach to begin with (another one that comes to mind is AppleScript).