
It seems like the article talks a lot about what can be done with the new model, but doesn't say how the language model (ChatGPT) is integrated with the code interpreter, how they exchange data, and how the data is represented as tokens for the language model. Can someone please explain? I understand how a language model works and how it consumes and predicts tokens; I just don't understand how it can run code and process the output, or how all of this fits into the token limit.


It's trained to emit special commands when it wants to do things. Think of it like being a wizard: speaking magical words of power causes things to happen. So it might say:

    Sure, I can make a Shepard tone for you.

    >>> COMMAND { "action": "run_python", "program": "import blah\n...." }
and the driver (the program running the inference loop; AI people, what do you actually call this?) recognizes that the AI "predicted" this escape sequence. When the command has finished being "predicted", the driver runs it and appends the result to the prompt. The AI can then see the result and use it.
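
In pseudo-Python, the whole thing is just a loop like the sketch below (the COMMAND sentinel, run_python, and generate() are names I made up for illustration, not any real API):

    import json, subprocess, sys, tempfile

    SENTINEL = ">>> COMMAND "

    def drive(prompt, generate):
        # generate() stands in for whatever runs the model and returns
        # its next chunk of predicted text.
        while True:
            output = generate(prompt)
            prompt += output
            if SENTINEL not in output:
                return prompt  # no command emitted, the answer is finished
            # Assume the command JSON is the last thing in the chunk.
            command = json.loads(output.split(SENTINEL, 1)[1])
            if command["action"] == "run_python":
                # Run the generated program (naively, in a subprocess;
                # the real thing would use a sandbox).
                with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                    f.write(command["program"])
                result = subprocess.run([sys.executable, f.name],
                                        capture_output=True, text=True, timeout=30)
                # Append the result to the prompt so the model "sees" it
                # on the next pass and can keep going.
                prompt += "\n>>> RESULT " + json.dumps(
                    {"stdout": result.stdout, "stderr": result.stderr})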

Re: token limit. A token can be a whole word and there can be thousands of them in the context window simultaneously. So it can look back quite a long way. GPT-4 is rumored to have a token limit in the tens of thousands, although the true number is apparently not public. So that's a lot of code and results.
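
You can see the text-to-token mapping for yourself with OpenAI's tiktoken library (cl100k_base is the encoding the GPT-3.5/GPT-4 family uses):

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("Sure, I can make a Shepard tone for you.")
    print(len(ids))                        # a handful of tokens, roughly one per word
    print([enc.decode([i]) for i in ids])  # most tokens are whole words or word pieces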

That said, if you asked for a program that emitted a million-line CSV file and then expected the AI to read it back, it would indeed get very confused and go wrong, because it would lose track of what it was trying to do. But that's no different from asking a human to mentally digest a million-line CSV.


As I understand it, you don't even need to train the model; you can just tell it in plain English how to use the plugin, i.e. how to format the data, and it will do that if it sees fit.


The ability to reliably follow such instructions given a JSON manifest likely comes from training on a (small) dataset of these "plugin manifests".


I guess some additional training doesn't hurt and could make it more deterministic and reliable. But it's impressive how you can already tell it to create simple JSON structures from naturally worded descriptions, so I'm convinced it would already work reasonably well without additional training.


It can but it'd make up its own schema. The driver is deterministic logic and needs a completely predictable token sequence.


Not if you give it the correct schema in the prompt


In my experience it still sometimes makes up its own schema, or outright outputs chatty plain-text explanations instead of JSON, even if I give it the correct schema. That happened about once in every 15-20 attempts, and 5% is still too high to be considered reliable :(

I've tuned temperature and added a logit_bias to heavily prefer the `{` token; this helped with the plain-English-vs-JSON issue, but didn't help with hallucination. I guess I really need API access to {role: "tool"}.
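
For reference, here's roughly what that setup looks like with the pre-1.0 openai Python client, plus tiktoken to find the `{` token id (the prompt wording and the bias value are just placeholders, not a recommendation):

    import openai
    import tiktoken

    # openai.api_key = "sk-..."  # set your key here

    enc = tiktoken.get_encoding("cl100k_base")
    brace_id = enc.encode("{")[0]  # token id for "{"

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Reply with ONLY a JSON object matching the given schema, no prose."},
            {"role": "user",
             "content": 'Schema: {"action": "run_python", "program": "<source>"}. Task: ...'},
        ],
        temperature=0,               # tune down randomness
        logit_bias={brace_id: 5},    # nudge sampling toward the "{" token
    )
    print(response["choices"][0]["message"]["content"])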


That uses up context window, and from what I understand it isn't as reliable as fine-tuning. My guess is it's not just stuff in the prompt; it's been fine-tuned (re-trained) on examples.


If I asked a human to digest a million-line CSV, I'd expect them to sample some of the first rows and read through the headers, then maybe pull it into Excel and do some summaries or make some graphs.

Not try to read all of the lines.
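
Which, in pandas terms, is only a handful of lines (the filename here is made up):

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("million_lines.csv")  # hypothetical file
    print(df.columns.tolist())   # read through the headers
    print(df.head())             # sample the first few rows
    print(df.describe())         # per-column summary statistics
    df.plot()                    # quick graphs, roughly the Excel workflow
    plt.show()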


So it can create and run programs, not just call some pre-defined set of utilities? That's impressive.


Yes, one-time-use Python: use it and throw it away. Similarly, diffusion images are mostly throwaways and personal stuff, one-time-use art. Waiting for the one-time-use UIs.



