* Don't use a single embedding per content item; use multiple to increase retrieval quality
Can you share some specific examples of what you mean by this? How would you process specific info types (eg: news article, or web page, or product catalogue data) this way, and how would you handle retrieval that makes the quality "better"?
*Edit: Thanks for all replies so far - yes, I am aware of splitting or chunking the data, but I'm interested in a good write-up of techniques and the pros/cons of each with examples. Eg: chunking sentences vs. paragraphs, providing context around the embedding result, asking GPT to generate questions for chunks and embedding those instead, combining interaction data (eg: purchases or clicks after search queries) with actual content data before embedding, embedding attributes around data, and so on.
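One concrete version of the "generate questions per chunk" idea, as a rough sketch assuming the pre-1.0 openai Python client (model choice and prompt wording are just placeholders):

```python
import openai

def questions_for_chunk(chunk: str, n: int = 3) -> list[str]:
    # Ask the model which questions this chunk answers; those questions get
    # embedded instead of (or alongside) the raw chunk text, so user queries
    # phrased as questions land closer in embedding space.
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Write {n} short questions that the following text answers, "
                       f"one per line:\n\n{chunk}",
        }],
    )
    return [q.strip() for q in resp.choices[0].message.content.splitlines() if q.strip()]

def embed(texts: list[str]) -> list[list[float]]:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return [d["embedding"] for d in resp["data"]]
```

Each chunk then gets several vectors in the index (one per generated question), all pointing back to the same chunk id, which is one way to read "multiple embeddings per content item".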
Another approach is HyDE, where you ask the LLM to come up with a plausible (but likely wrong) answer and use the embedding of that wrong answer to find the appropriate chunk. Pretty clever.
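In code, HyDE is only a couple of calls. A rough sketch, again assuming the pre-1.0 openai client (model names are placeholders):

```python
import openai

def hyde_query_vector(question: str) -> list[float]:
    # 1. Ask the LLM for a plausible (possibly wrong) answer.
    hypothetical = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Answer briefly: {question}"}],
    ).choices[0].message.content
    # 2. Embed the hypothetical answer instead of the question; answers tend to
    #    sit closer to the relevant chunks in embedding space than questions do.
    return openai.Embedding.create(
        model="text-embedding-ada-002", input=[hypothetical]
    )["data"][0]["embedding"]
```

The returned vector is then used exactly like a normal query embedding for the similarity search.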
Presumably, they're referring to chunking up the data into discrete semantic units—smaller vectorizable subsections (e.g., paragraphs) more precisely capturing different parts of the data.
I'm curious about more sophisticated answers to this question, but the obvious approach would be to split the article or web page into sentences and do an embedding per sentence.
When I was playing around with search via embeddings (as a test I was using Vampire the Masquerade V5 sourcebooks, and asking rules questions), I got the best results -- in terms of correct answers -- by using sentence embeddings. I'd search the query against the sentence embeddings, and then retrieve more context surrounding the winning sentence(s). That context would be passed to the LLM.
It wasn't perfect, though. I'm tempted to try the avenue of having an LLM generate questions for each passage and then use those embeddings, but it sounds a bit expensive to set up given the length of the books.
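A minimal sketch of that sentence-level search plus surrounding context, assuming the sentence vectors were already computed in document order with whatever embedding API you use for the query (window size and k are arbitrary):

```python
import numpy as np

def retrieve_with_context(query_vec, sent_vecs, sentences, window=2, k=3):
    # Rank individual sentences by cosine similarity to the query...
    sims = sent_vecs @ query_vec / (
        np.linalg.norm(sent_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    passages = []
    for i in np.argsort(-sims)[:k]:
        # ...but hand the LLM the winning sentence plus its neighbours, so it
        # sees enough surrounding context to actually answer from.
        lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
        passages.append(" ".join(sentences[lo:hi]))
    return passages
```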
Sentence embeddings + retrieving context makes sense.
By any chance, have you done any work with indexing code using embeddings? I'd like to do something similar there, but there's no obvious notion of "sentence", especially across languages.
Probably the closest analogue is just lines of code, but breaking on newlines might split an expression in the middle, removing meaning from both halves.
I was planning on trying indexing overlapping groups of lines but haven't had time yet.
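For what it's worth, the overlapping-window idea is only a few lines; window and overlap sizes here are arbitrary and probably want tuning per language:

```python
def chunk_code(source: str, window: int = 20, overlap: int = 10):
    # Slide a fixed-size window over the file with overlap, so an expression
    # cut at one chunk boundary is still intact in the neighbouring chunk.
    lines = source.splitlines()
    step = window - overlap
    for start in range(0, max(len(lines) - overlap, 1), step):
        chunk = "\n".join(lines[start:start + window])
        if chunk.strip():
            yield start, chunk  # keep the start line so you can show context later
```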
Read the README and the code. This is a breathless pronouncement of a thin wrapper around some other wrappers. We really need to watch the hype, lest we kill the actually important stuff going on in ML.
To be fair, the author mentions in the README that the project was built to validate the idea. I think if you're just trying to validate an idea, it's totally fine to be using thin wrappers.
I think you need a small video and much clearer, simpler marketing material. For example, these don't mean anything to me: "Boxcars, CableReady, and StimulusReflex".
This may be a naive question. But is there a way to embed a chat bot like this that only queries the data we feed it, and not the universe of other stuff in gpt? Like I don’t want people using the chatbot in our product to query a good strawberry shortcake recipe. We just want them to query about data we allow it to query which is native to our business.
When you're building applications on top of LLMs, there are a number of central problems that you're trying to solve, and this is one of them. Solutions are numerous and widely variable, everything from basic regex parsing to fine-tuning validator models to new programming/modeling languages. Here are some examples:
There’s no foolproof way to do what you want at this point. You could have a separate model trained to infer whether a query is relevant to your product, and then reject the query if it’s predicted to be irrelevant. That’s not 100%, though.
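A cheap zero-shot stand-in for that idea (not a trained classifier, just asking a small model to gate the query; prompt wording and model are placeholders):

```python
import openai

def is_on_topic(query: str, product_description: str) -> bool:
    # Reject anything the cheaper model judges unrelated to the product,
    # before the query ever reaches the main answering chain.
    verdict = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[{
            "role": "user",
            "content": f"Product: {product_description}\nQuery: {query}\n"
                       "Answer YES if the query is about this product, otherwise NO.",
        }],
    ).choices[0].message.content
    return verdict.strip().upper().startswith("YES")
```

As noted above, it isn't foolproof; it just raises the effort needed to get the strawberry shortcake recipe out of it.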
I don't think you can really prevent it 100%. Even OpenAI has issues with GPT responding in undesired ways (google "DAN", where people craft instructions to get responses the OpenAI team doesn't want). However, I think you can make it more difficult (as in, the person trying to misuse your chatbot will need to put in some effort to get the strawberry shortcake recipe, if you have instructed it beforehand to only give information about your product).
Look into https://github.com/NVIDIA/NeMo-Guardrails. Specific to your question, it has "topical rails" to ensure the conversation stays on a set of topics you've greenlighted.
Also takes care of jailbreaks and allows custom conversation flow templates.
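If I remember the Python API correctly (worth double-checking against the repo), wiring it in looks roughly like this; the config directory holding the config.yml and Colang files that define your greenlighted topics is assumed:

```python
from nemoguardrails import LLMRails, RailsConfig

# "./config" is assumed to contain a config.yml plus Colang files defining
# the topical rails (allowed topics and canned off-topic responses).
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

reply = rails.generate(messages=[
    {"role": "user", "content": "What do you think about the election?"}
])
print(reply["content"])  # ideally steered back to the allowed topics
```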
I'm curious how that works, as the documentation is a little under-specified. It seems like it requires specifying exact "utterances" from the user, but I don't think that can be the case -- wouldn't it be flatly useless that way? But it's not clear how to use it to, for example, disallow talking about politics. Or to disallow talking about topics unrelated to the dev's product, for that matter.
This is somewhat possible. I've created a way to chat with our company's material publicly. We used a lot of prompt engineering and custom guardrails to achieve this. However, it severely limited the length of the conversation a user can have.
This is a great idea and would love to see something like this succeed!
If I understand how all of these OpenAI-dependent apps work, none of them actually host the LLM or do any kind of heavy processing themselves. AFAIK, they're all packaging your data, submitting it to OpenAI on every request, and then repackaging the output. There's no real indexing, no real tangible thing; you have to start from scratch every time. So it's likely going to be very expensive and super slow.
For most applications, packaging all the data and submitting it to OpenAI won't be feasible due to the limited token window size.
I think the most common design pattern nowadays goes like this:
1. Chunk all your data (e.g. per paragraph of content)
2. Generate an embedding for each chunk
3. Index embeddings in a vector database
4. When a query comes in, find chunks relevant to the query (based on embedding similarity) and ONLY send the relevant chunks + query to an LLM to formulate the answer
Quickly glancing through the repository from this post, I can see that it also follows this pattern. It uses OpenAI's embedding API for step 2 and Pinecone for step 3.
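For anyone who wants the shape of it in code, here's a toy end-to-end version of those four steps, assuming the pre-1.0 openai Python client and a plain in-memory numpy index standing in for Pinecone:

```python
import numpy as np
import openai

def build_index(chunks: list[str]) -> np.ndarray:
    # Steps 1-3: chunks in, one embedding per chunk out. A real setup would
    # store these in a vector DB (Pinecone, pgvector, ...) rather than in memory.
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=chunks)
    return np.array([d["embedding"] for d in resp["data"]])

def answer(query: str, chunks: list[str], index: np.ndarray, k: int = 4) -> str:
    # Step 4: embed the query, pick the k most similar chunks, and send only
    # those plus the question to the LLM.
    q = np.array(openai.Embedding.create(
        model="text-embedding-ada-002", input=[query]
    )["data"][0]["embedding"])
    sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    context = "\n\n".join(chunks[i] for i in np.argsort(-sims)[:k])
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return resp.choices[0].message.content
```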
I've seen this described as the common approach and have argued for it, but with my limited knowledge I have difficulty countering the argument that it would be best to just fine-tune the model on your own data.
I don't think it is so much the context window size, because you would chunk your data anyway. I think the counterargument is either that fine-tuning is limited by the risk of overfitting and catastrophic forgetting, or that it is cost-prohibitive. I think it is more the former. Am I on the right track with these arguments?
Another point to consider: the vector DB contains an exact version of your data and returns it verbatim, whereas a fine-tuned model will only be able to answer vaguely or by paraphrasing.
Once I dug into the fine-tuning APIs [1], I realized that the phrase "training the model on your docs" often doesn't make sense for the use case people are trying to solve. You provide hundreds of input examples and tell the model how it should complete those prompts. Fine-tuning has a lot of use cases, but "keeping the LLM generally grounded in the facts of my website" is not one of them.
> Fine-tuning has a lot of use cases, but "keeping the LLM generally grounded in the facts of my website" is not one of them.
Yes, that's what everyone says, and it makes total sense to me. I'm looking for (technical, but not too technical) arguments for why it is not possible. I'm not so much interested in the "grounded in the facts of my website" point as in the similar "take the data from my large private knowledge base into consideration" point.
In other words I don't want to restrict the knowledge the model has or the answers it gives. I want to add a considerable amount of my own knowledge. This seems not to be possible without training from scratch. The question is "Why?"
This seems to be mainly a wrapper around the OpenAI API. From the repo, they want to integrate open-source LLMs in the future too.
Lately I feel GPT-4 is superb in performance, but locked up. Using a weaker model feels better because I can just spin up a server and run it on my own. Recent Twitter/Reddit changes are a reminder that relying on others can be a bad thing.
Mentioned this in a previous reply, but this was something Convostack wanted to solve by allowing anyone to integrate their Langchain agent with a production-ready chatbot. It's completely open-source and also has pre-built React UI components. As a disclaimer, I helped work on the project, but I'm curious to hear what you guys think: https://github.com/ConvoStack/convostack
If I understand what this tool is doing, there is an important security caveat.
>providing PDF files, websites, and soon, integrations with platforms like Notion, Confluence, and Office 365.
This means that anything you feed this chatbot gets turned into data that's uploaded to OpenAI. So if you're using an internal Confluence, consider all of that data public now. We've already seen intranet pages show up in ChatGPT/OpenAI in the past.
> Use of Content to Improve Services. We do not use Content that you provide to or receive from our API ("API Content") to develop or improve our Services. We may use Content from Services other than our API ("Non-API Content") to help develop and improve our Services. You can read more here about how Non-API Content may be used to improve model performance. If you do not want your Non-API Content used to improve Services, you can opt out by filling out this form. Please note that in some cases this may limit the ability of our Services to better address your specific use case.
What was seen was intranet sites being cited by OpenAI. So maybe someone copied and pasted them into the web interface. Maybe some big multinational firms had their intranets exposed to the internet. I don't know how the data got there; heck, no one may know.
I went ahead and installed it in a Proxmox container; it was fairly easy on x64 (ARM support would be nice).
One suggestion: it would be nice to have a short-term memory - a la ChatGPT. With the token limit at 4-8k for GPT-4, it would be nice to take advantage of that with both the "long-term memory" (vector store) but also a "short-term" one (as in, sending the previous questions/answers for context).
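Something like a rolling window over past turns would probably cover it; a rough sketch where count_tokens is a placeholder (e.g. tiktoken) and the budget is arbitrary:

```python
def build_messages(history, retrieved_context, question, count_tokens, budget=3000):
    # Long-term memory: the chunks pulled from the vector store for this question.
    system = {"role": "system",
              "content": f"Answer from this context:\n{retrieved_context}"}
    # Short-term memory: previous user/assistant turns, oldest dropped first
    # whenever the running token count exceeds the budget.
    turns = list(history)
    while turns and count_tokens([system] + turns) > budget:
        turns.pop(0)
    return [system] + turns + [{"role": "user", "content": question}]
```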
"Support offline open-source models (e.g., Alpaca, LLM drivers)" is already on the roadmap, which is great! There's just so much cool stuff we can try with LLMs now…
What's the best discussion forum to exchange ideas, experiences and to collaborate on using and customizing (local) LLMs for (indy) gaming and other cool projects?
This was the goal with ConvoStack, which allows people to implement our IAgent Express.js interface with any custom Langchain agent. We saw this as an issue with other chatbot platforms, which were limiting for developers. It comes with pre-built React components, plus Redis for caching, so you can easily have a production-ready chat interface. It's completely open-source too, so it can be self-hosted and modified to your liking. As a disclaimer, I helped develop this, but I would love for you to check it out and see if it's something you're looking for.
https://github.com/ConvoStack/convostack
I've scanned the codebase and IMO the statement is mostly misleading. I think LangChain agents (which this is a thin wrapper around) do have some default compression/reranking behavior to help fit relevant context into the context window, but it's very, very, very far from actually infinite short-term memory.
I think the framing where the statement could be considered true is if you assume "memory = persistent storage", which is IMO not what most people will do.
Note: I'm not trying to imply the authors are being intentionally misleading.
A few thoughts:
* Allow custom endpoint URLs; this way people can use open-source LLMs with a fake OpenAI API backend like basaran [2] or llama-api-server [3] (rough sketch after the links below)
* Look into better embedding methods for info retrieval, like InstructorEmbeddings or Document Summary Index
* Don't use a single embedding per content item; use multiple to increase retrieval quality
[1] https://github.com/underlines/awesome-marketing-datascience/...
[2] https://github.com/hyperonym/basaran
[3] https://github.com/iaalm/llama-api-server
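On the custom endpoint suggestion above: with the pre-1.0 openai Python client that's just overriding api_base, since basaran and llama-api-server present an OpenAI-compatible API (URL and model name below are placeholders):

```python
import openai

# Point the client at a self-hosted OpenAI-compatible server instead of api.openai.com.
openai.api_base = "http://localhost:8000/v1"
openai.api_key = "not-needed-locally"

resp = openai.Completion.create(model="local-model", prompt="Hello", max_tokens=32)
print(resp.choices[0].text)
```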