Show HN: Vanna AI – Open-sourced text-to-SQL in Python

npace12 · on Sept 8, 2023

Looks nice, but I'm confused why it's advertised as if it's training anything. It's just doing retrieval from a vector DB and generating a prompt with OpenAI/LLM.

zainhoda · on Sept 8, 2023

We use "train" in a colloquial sense because you're right -- it's RAG. We've tried to think of a different term, but "train" seems to be the one that resonates most with data analysts.

ban-lan-gen · on Sept 8, 2023

Just curious what kinds of models are used in your project?

https://arxiv.org/abs/2306.08891 One paper seems to suggest that a trained, specialized model can outperform an LLM on some tasks.

zainhoda · on Sept 8, 2023

Thanks for the link! I'll check it out.

In the meantime, this post we wrote might be interesting for you:

https://vanna.ai/blog/ai-sql-accuracy.html

crazy_marksman · on Sept 9, 2023

Impressive! Quick question - is it possible to generate SQL for a slight variant of SQL? My project augments standard SQL with a few new constructs.

zainhoda · on Sept 9, 2023

It probably could but it would require adjusting the prompt. You’d have to override the generate_prompt function and tell it that you’re using a variant of SQL and describe the differences.
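
For illustration only, here is a rough sketch of what such an override could look like. It is self-contained and does not import the package: `BasePromptBuilder`, its `generate_prompt(question, examples)` signature, and the dialect note are all assumptions made up for this example, not the library's actual class or signature.

```python
class BasePromptBuilder:
    """Hypothetical stand-in for the class that owns generate_prompt."""

    def generate_prompt(self, question, examples):
        # Build a plain prompt from example SQL statements and the question.
        lines = ["Generate a SQL query that answers the question."]
        for sql in examples:
            lines.append(f"Example SQL: {sql}")
        lines.append(f"Question: {question}")
        return "\n".join(lines)


class DialectPromptBuilder(BasePromptBuilder):
    """Prepends a description of the SQL variant to the base prompt."""

    # Placeholder text: replace with a description of your own constructs.
    DIALECT_NOTES = (
        "The target database uses a variant of standard SQL. "
        "Differences from standard SQL: <describe the extra constructs here>."
    )

    def generate_prompt(self, question, examples):
        base = super().generate_prompt(question, examples)
        return f"{self.DIALECT_NOTES}\n\n{base}"


builder = DialectPromptBuilder()
prompt = builder.generate_prompt("How many orders last week?", ["SELECT 1"])
print(prompt)
```

Adding a few example queries that use the new constructs would probably help as much as the description itself.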

crazy_marksman · on Sept 10, 2023

Is there a token limit for the train step? Like length of documentation, number of example SQL queries?

zainhoda · on Sept 11, 2023

There isn't technically a limit on the storage side but it's generally better if you keep documentation to a manageable length.

You call vn.train(sql=...) on each individual SQL statement that you have.

What'll happen under the hood is the package will use the 10 most relevant SQL statement examples, 10 most relevant pieces of documentation, and 10 most relevant DDL statements.

If using 10 examples exceeds the (approximate) token limit for the model, it'll pare down to a smaller number that'll fit into the context limit.
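
A minimal sketch of that paring-down step, to make the idea concrete. This is not the package's code: the function names, the 4-characters-per-token estimate, and the budget value are assumptions for illustration. It takes items already ranked by relevance, keeps up to 10, and drops the least relevant ones until the rough token count fits.

```python
def approx_tokens(text: str) -> int:
    # Crude estimate (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4)


def fit_to_budget(ranked_items, max_items=10, token_budget=1000):
    """Keep the most relevant items that fit within token_budget.

    ranked_items must be ordered from most to least relevant.
    """
    selected = list(ranked_items[:max_items])
    # Drop the least relevant item until the estimate fits the budget.
    while selected and sum(approx_tokens(s) for s in selected) > token_budget:
        selected.pop()
    return selected


# 12 ranked examples of 40 characters each (about 10 tokens apiece).
examples = [f"SELECT col_{i:02d} FROM some_table_name_xx;"[:40].ljust(40) for i in range(12)]
print(len(fit_to_budget(examples, token_budget=1000)))
print(len(fit_to_budget(examples, token_budget=35)))
```

The same routine would be applied separately to SQL examples, documentation, and DDL statements.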

jesuslop · on Sept 8, 2023

These results are impressive, guys. I'm saving a pointer to dig deeper and glean the details. Tip of the hat for making it open.

tzm · on Sept 8, 2023

Open source, but requires an API key?

zainhoda · on Sept 8, 2023

API key is optional -- that's only if you want to use the hosted vector database. When running locally, no Vanna API key is necessary. Instead, you can put in an OpenAI API key as shown here: https://vanna.ai/docs/local.html

If you want to use a locally-hosted LLM, that's also possible by implementing the necessary abstract methods: https://vanna.ai/docs/vanna.html#open-source-and-extending
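
As a rough picture of the pattern (not the package's real interface), the sketch below uses a made-up abstract base class with one abstract method, `submit_prompt`, and a subclass that forwards prompts to whatever local model you run. `TextToSQLBase`, `submit_prompt`, `LocalLLM`, and `fake_local_model` are hypothetical names; check the linked docs for the methods you actually need to implement.

```python
from abc import ABC, abstractmethod
from typing import Callable


class TextToSQLBase(ABC):
    """Hypothetical base class: the LLM call is left abstract."""

    @abstractmethod
    def submit_prompt(self, prompt: str) -> str:
        """Send the prompt to an LLM and return its text response."""

    def generate_sql(self, question: str) -> str:
        prompt = f"Write a SQL query that answers: {question}"
        return self.submit_prompt(prompt).strip()


class LocalLLM(TextToSQLBase):
    """Routes prompts to a locally hosted model via a user-supplied callable."""

    def __init__(self, complete: Callable[[str], str]):
        self._complete = complete

    def submit_prompt(self, prompt: str) -> str:
        return self._complete(prompt)


def fake_local_model(prompt: str) -> str:
    # Stand-in for a real local model call; always returns the same query.
    return "SELECT COUNT(*) FROM orders;\n"


vn_local = LocalLLM(fake_local_model)
print(vn_local.generate_sql("How many orders are there?"))
```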

ashishsingal · on Sept 8, 2023

oh how exciting :)