isaacfung's comments | Hacker News

Some models use vector quantized variational autoencoders to discretize images into sequences of discrete symbols from a fixed codebook.

https://grok.com/share/bGVnYWN5LWNvcHk%3D_572b4955-6265-4210...
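Roughly, the quantization step works like this (a minimal numpy sketch with made-up codebook and grid sizes, not any particular model's code): each encoder output vector is replaced by the index of its nearest codebook entry, so the image becomes a sequence of integers.

    import numpy as np

    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(512, 64))     # 512 learned code vectors of dim 64 (illustrative sizes)
    latents = rng.normal(size=(16 * 16, 64))  # stand-in for encoder outputs on a 16x16 latent grid

    # Nearest-neighbour lookup: squared L2 distance to every codebook entry.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = dists.argmin(axis=1)              # shape (256,): one discrete symbol per grid cell

    print(codes[:10])                         # the image is now a sequence of ints in [0, 512)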


What AI tools have you used recently? Have you verified if they all use models trained on copyrighted material with permission?


Ah, that's a classic. "How can you criticize Big Oil and at the same time drive a car!" and voila, the case is closed.

I am allowed to criticize things without having to live like a hermit. I make moderate use of ChatGPT, yet at the same time I think that its training does not fall under fair use, and that creators should get compensated. If OpenAI's business model does not allow for this, then it should fail, and that's fine by me. I lived without ChatGPT, and I can live without it again.


Something similar happened to Google Maps in Hong Kong recently. A dozen schools had their names changed by some pranksters. It's surprisingly easy.

https://www.scmp.com/news/hong-kong/society/article/3279201/...


You may find the Hugging Face diffusion course more approachable:

https://huggingface.co/learn/diffusion-course/en/unit0/1



How do you keep table structure?


They should probably keep tables and lists and strip most of the rest.


The text is converted to embeddings after tokenization. The neural network only sees vectors.

Imagine the original question is posed in English but it is translated to Chinese and then the LLM has to answer the original question based on the Chinese translation.

It's a flaw of the tokenization scheme we choose. We could train an LLM using individual letters instead of tokens as the base units, but that would be inefficient.
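A toy sketch of what "only sees vectors" means (a made-up two-token vocabulary and a random embedding table, purely illustrative):

    import numpy as np

    # The model never receives the string "strawberry", only the embedding
    # vectors of whatever tokens the tokenizer splits it into.
    vocab = {"straw": 0, "berry": 1}          # hypothetical BPE-style split
    emb = np.random.default_rng(0).normal(size=(len(vocab), 8))

    token_ids = [vocab[t] for t in ["straw", "berry"]]   # [0, 1]
    vectors = emb[token_ids]                              # shape (2, 8): all the network sees

    print(token_ids, vectors.shape)

Nothing in those two vectors directly says how many letters (or how many 'r's) the original word contained; the model has to learn that indirectly.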


By that definition the LLM literally does not see anything. LLMs predict tokens. That's it.


The LLM sees tokens, and predicts next tokens. These tokens encode a vast world, as experienced by humans and communicated through written language. The LLM is seeing the world, but through a peephole. This is pretty neat.

The peephole will expand soon, as multimodal models come into their own, and as the models start getting mixed with robotics, allowing them to go and interact with the world more directly, instead of through the medium of human-written text.


It sees embeddings that are trained to encode semantic meaning.

The way we tokenize is just a design choice. Character-level models (e.g. Karpathy's nanoGPT) exist and are used for educational purposes. You can train one to count the number of 'r's in a word.

https://x.com/karpathy/status/1816637781659254908?lang=en
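With a character-level vocabulary (the same design choice as nanoGPT's char-level example, though this isn't its code), each letter is its own token, so counting letters becomes trivial:

    text = "strawberry"
    vocab = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(vocab)}

    token_ids = [stoi[ch] for ch in text]    # one token id per character
    print(token_ids)
    print(token_ids.count(stoi["r"]))        # 3: each 'r' is visible as its own token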


The possibilities seem to go far beyond gaming (given enough computational resources).

You can feed it videos of any software being used, or real-world footage recorded by a GoPro mounted on your shoulder (with body motion measured by sensors, though the action space would be much larger).

Such a "game engine" can potentially be used as a simulation gym environment to train RL agents.
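Sketched very roughly (a hypothetical Gymnasium wrapper around some learned world model; the `model` interface here is invented for illustration, not taken from any real system):

    import numpy as np
    import gymnasium as gym

    class WorldModelEnv(gym.Env):
        """Wrap a learned video/world model as an RL environment.

        `model` is assumed to expose reset() -> frame and
        step(frame, action) -> (next_frame, reward, done).
        """

        def __init__(self, model, num_actions=10, frame_shape=(64, 64, 3)):
            self.model = model
            self.action_space = gym.spaces.Discrete(num_actions)
            self.observation_space = gym.spaces.Box(0, 255, frame_shape, dtype=np.uint8)
            self.frame = None

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            self.frame = self.model.reset()
            return self.frame, {}

        def step(self, action):
            self.frame, reward, done = self.model.step(self.frame, action)
            return self.frame, reward, done, False, {}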


When people say "LLMs are not capable of innovation", what exactly do they consider innovation? If LLMs are not capable of innovation on their own, what if we augment them with means to interact with the environment so they can obtain new training data?

e.g. the Minecraft bot Voyager can explore the game environment and extend its skill library (stored as a vector database); is that considered innovation? There are also systems like LeanDojo/AlphaProof that discover new proofs and use LLMs in non-trivial ways (not just naively predicting the next token in one shot). Reinforcement learning systems like AlphaGo/AlphaZero use self-play and Monte Carlo tree search to learn to outperform humans. You can similarly use LLMs to generate actions and estimate state values (see the language agent tree search paper).

Most people use LLMs by prompting them with some additional context (chat history and data retrieved from a database), but there is nothing stopping us from continuously improving an LLM (either by modifying its weights or by augmenting it with an external database) by asking it to evaluate the task outcome/error message and feeding that back to the LLM. We can also ask it to keep generating new tasks to experiment with the environment/internet and gain new knowledge.
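A rough sketch of that feedback loop (just illustrative Python; `call_llm` and `run_in_sandbox` are placeholders the caller would supply, not any specific API):

    def solve_with_feedback(task, call_llm, run_in_sandbox, max_attempts=5):
        history = [f"Task: {task}\nWrite a Python solution."]
        for _ in range(max_attempts):
            code = call_llm("\n".join(history))    # generate a candidate solution
            ok, output = run_in_sandbox(code)      # execute it against the environment
            if ok:
                return code                        # keep the working program
            # feed the error back so the next attempt can correct it
            history.append(f"Previous attempt failed with:\n{output}\nFix the code.")
        return None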


> what if we augment them with means to interact with the environment so they can obtain new training data?

So far, the current generation of LLMs in widespread use do not have that ability, as far as I am aware. To actually do it to a degree that would rival human learning, they would need access to a lot more of the environment than you might be thinking of.

Sure, for programming, the most basic environment would be the platform to run the code on and the output of the code. But choices in programming are made based on many more things: performance, load impact, behavior in production environments, interaction with other applications, platform logging, adjacent application logging. Or even before that, using previous experience to judge specifications for an application, which takes into account things like the expected user base, costs, etc.

The real world is a lot more complex than a Minecraft world or a game of Go. Which is not to say that it is impossible; I am sure research is ongoing to do exactly that.

But the LLMs that are currently disrupting communities like Stack Overflow are not doing any of that. Given how complex it is to plug in all the relevant stimuli, and the fact that for now they can get by without doing any of this, I think things are more likely to get worse before they potentially get better.


bro, it's not just about the money. He's promoting himself, building up a portfolio, learning new skills and entertaining himself.


If you work for money you build up a portfolio and probably learn new skills - it is difficult not to.

I have considered doing this before, contests etc., but I'm not sure I can break down the actual cost of doing it, the benefit, and the risk well enough to see what one actually gets from the time devoted.

Not to say that I might not do other things that cannot be quantified easily either, but in the contest case I want more quantification; I guess it just seems more likely to be a waste of time to me.


We have translators. Doesn't mean we can't replace them with a cheaper, more accessible tool. That's the whole point of automation.

Reasoning models are not useless. They demonstrably (according to benchmarks) improve performance on coding and math related tasks.

