
It seems like the article talks a lot about what can be done with the new model, but doesn't say how the language model (ChatGPT) is integrated with the code interpreter, how they exchange data, and how the data is represented as tokens for the language model. Can someone please explain? I understand how a language model works and how it consumes and predicts tokens; I just don't understand how it can run code and process the output, or how all of this fits into the token limit.


It's trained to emit special commands when it wants to do things. Think of it like being a wizard: speaking magical words of power causes things to happen. So it might say:

    Sure, I can make a Shepard tone for you.

    >>> COMMAND { "action": "run_python", "program": "import blah\n...." }
and the driver (the program running the inference loop; AI people, what do you actually call this?) recognizes that the AI "predicted" this escape sequence. When the command has finished being "predicted", the driver runs it and appends the result to the prompt. The AI can then see the result and use it.
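
In pseudo-Python, the whole thing is just a loop like the sketch below (the COMMAND sentinel, run_python, and generate() are names I made up for illustration, not any real API):

    import json, subprocess, sys, tempfile

    SENTINEL = ">>> COMMAND "

    def drive(prompt, generate):
        # generate() stands in for whatever runs the model and returns
        # its next chunk of predicted text.
        while True:
            output = generate(prompt)
            prompt += output
            if SENTINEL not in output:
                return prompt  # no command emitted, the answer is finished
            # Assume the command JSON is the last thing in the chunk.
            command = json.loads(output.split(SENTINEL, 1)[1])
            if command["action"] == "run_python":
                # Run the generated program (naively, in a subprocess;
                # the real thing would use a sandbox).
                with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                    f.write(command["program"])
                result = subprocess.run([sys.executable, f.name],
                                        capture_output=True, text=True, timeout=30)
                # Append the result to the prompt so the model "sees" it
                # on the next pass and can keep going.
                prompt += "\n>>> RESULT " + json.dumps(
                    {"stdout": result.stdout, "stderr": result.stderr})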

Re: token limit. A token can be a whole word and there can be thousands of them in the context window simultaneously. So it can look back quite a long way. GPT-4 is rumored to have a token limit in the tens of thousands, although the true number is apparently not public. So that's a lot of code and results.
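
You can see the text-to-token mapping for yourself with OpenAI's tiktoken library (cl100k_base is the encoding the GPT-3.5/GPT-4 family uses):

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("Sure, I can make a Shepard tone for you.")
    print(len(ids))                        # a handful of tokens, roughly one per word
    print([enc.decode([i]) for i in ids])  # most tokens are whole words or word pieces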

That said, if you asked for a program that emitted a million-line CSV file and then expected the AI to read it back, it would indeed get very confused and go wrong, because it would lose track of what it was trying to do. But that's no different from asking a human to mentally digest a million-line CSV.


As I understand it, you don't even need to train the model; you can just tell it in plain English how to use the plugin, i.e. how to format the data, and it will do that if it sees fit.


The ability to reliably follow such instructions given a JSON manifest likely comes from training on a (small) dataset of these "plugin manifests".


I guess some additional training doesn't hurt and could make it more deterministic and reliable. But it's impressive how you can already tell it to create simple JSON structures from naturally worded descriptions, so I'm convinced it would already work reasonably well without additional training.


It can but it'd make up its own schema. The driver is deterministic logic and needs a completely predictable token sequence.


Not if you give it the correct schema in the prompt


In my experience it still sometimes makes up its own schema, or outright outputs chatty plain-text explanations instead of JSON, even if I give it the correct schema. That happened about once in every 15-20 attempts, and 5% is still too high to be considered reliable :(

I've tuned temperature and added a logit_bias to heavily prefer the `{` token; this helped with the plain-English-vs-JSON issue, but didn't help with hallucination. I guess I really need API access to {role: "tool"}.
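
For reference, here's roughly what that setup looks like with the pre-1.0 openai Python client, plus tiktoken to find the `{` token id (the prompt wording and the bias value are just placeholders, not a recommendation):

    import openai
    import tiktoken

    # openai.api_key = "sk-..."  # set your key here

    enc = tiktoken.get_encoding("cl100k_base")
    brace_id = enc.encode("{")[0]  # token id for "{"

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Reply with ONLY a JSON object matching the given schema, no prose."},
            {"role": "user",
             "content": 'Schema: {"action": "run_python", "program": "<source>"}. Task: ...'},
        ],
        temperature=0,               # tune down randomness
        logit_bias={brace_id: 5},    # nudge sampling toward the "{" token
    )
    print(response["choices"][0]["message"]["content"])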


That uses up context window, and from what I understand it isn't as reliable as fine-tuning. My guess is it's not just stuff in the prompt; it's been fine-tuned (re-trained) on examples.


If I asked a human to digest a million-line CSV, I'd expect them to sample some of the first rows and read through the headers, then maybe pull it into Excel and do some summaries or make some graphs.

Not try to read all of the lines.
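
Which, in pandas terms, is only a handful of lines (the filename here is made up):

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("million_lines.csv")  # hypothetical file
    print(df.columns.tolist())   # read through the headers
    print(df.head())             # sample the first few rows
    print(df.describe())         # per-column summary statistics
    df.plot()                    # quick graphs, roughly the Excel workflow
    plt.show()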


So it can create and run programs, not just call some pre-defined set of utilities? That's impressive.


Yes, one-time-use Python: use it and throw it away. Similarly, diffusion images are mostly throwaways and personal stuff, one-time-use art. Waiting for the one-time-use UIs.



