isaacfung's comments | Hacker News

Some models use vector quantized variational autoencoders to discretize images into sequences of discrete symbols from a fixed codebook.

https://grok.com/share/bGVnYWN5LWNvcHk%3D_572b4955-6265-4210...
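Roughly, the quantization step works like this (a minimal numpy sketch with made-up codebook and grid sizes, not any particular model's code): each encoder output vector is replaced by the index of its nearest codebook entry, so the image becomes a sequence of integers.

    import numpy as np

    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(512, 64))     # 512 learned code vectors of dim 64 (illustrative sizes)
    latents = rng.normal(size=(16 * 16, 64))  # stand-in for encoder outputs on a 16x16 latent grid

    # Nearest-neighbour lookup: squared L2 distance to every codebook entry.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = dists.argmin(axis=1)              # shape (256,): one discrete symbol per grid cell

    print(codes[:10])                         # the image is now a sequence of ints in [0, 512)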


What AI tools have you used recently? Have you verified if they all use models trained on copyrighted material with permission?


Ah, that's a classic. "How can you criticize Big Oil and at the same time drive a car!" and voila, the case is closed.

I am allowed to criticize things without having to live like a hermit. I make moderate use of ChatGPT, yet at the same time I think that its training does not fall under fair use, and that creators should get compensated. If OpenAI's business model does not allow for this, then it should fail, and that's fine by me. I lived without ChatGPT, and I can live without it again.


Something similar happened to Google Maps in Hong Kong recently. A dozen schools had their names changed by some pranksters. It's surprisingly easy.

https://www.scmp.com/news/hong-kong/society/article/3279201/...


You may find the Hugging Face diffusion course more approachable:

https://huggingface.co/learn/diffusion-course/en/unit0/1



How do you keep table structure?


They should probably keep tables and lists and strip most of the rest.


The text is converted to embeddings after tokenization. The neural network only sees vectors.

Imagine the original question is posed in English but it is translated to Chinese and then the LLM has to answer the original question based on the Chinese translation.

It's a flaw of the tokenization scheme we choose. We could train an LLM using individual letters instead of tokens as the base units, but that would be inefficient.
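A toy sketch of what "only sees vectors" means (a made-up two-token vocabulary and a random embedding table, purely illustrative):

    import numpy as np

    # The model never receives the string "strawberry", only the embedding
    # vectors of whatever tokens the tokenizer splits it into.
    vocab = {"straw": 0, "berry": 1}          # hypothetical BPE-style split
    emb = np.random.default_rng(0).normal(size=(len(vocab), 8))

    token_ids = [vocab[t] for t in ["straw", "berry"]]   # [0, 1]
    vectors = emb[token_ids]                              # shape (2, 8): all the network sees

    print(token_ids, vectors.shape)

Nothing in those two vectors directly says how many letters (or how many 'r's) the original word contained; the model has to learn that indirectly.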


By that definition the LLM literally does not see anything. LLMs predict tokens. That's it.


The LLM sees tokens, and predicts next tokens. These tokens encode a vast world, as experienced by humans and communicated through written language. The LLM is seeing the world, but through a peephole. This is pretty neat.

The peephole will expand soon, as multimodal models come into their own, and as the models start getting mixed with robotics, allowing them to go and interact with the world more directly, instead of through the medium of human-written text.


It sees embeddings that are trained to encode semantic meaning.

The way we tokenize is just a design choice. Character-level models (e.g. Karpathy's nanoGPT) exist and are used for educational purposes. You can train one to count the number of 'r's in a word.

https://x.com/karpathy/status/1816637781659254908?lang=en
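With a character-level vocabulary (the same design choice as nanoGPT's char-level example, though this isn't its code), each letter is its own token, so counting letters becomes trivial:

    text = "strawberry"
    vocab = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(vocab)}

    token_ids = [stoi[ch] for ch in text]    # one token id per character
    print(token_ids)
    print(token_ids.count(stoi["r"]))        # 3: each 'r' is visible as its own token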


The possibilities seem to go far beyond gaming (given enough computational resources).

You can feed it videos of any software being used, or real-world footage recorded by a GoPro mounted on your shoulder (with body motion measured by sensors, though the action space would be much larger).

Such a "game engine" can potentially be used as a simulation gym environment to train RL agents.
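Sketched very roughly (a hypothetical Gymnasium wrapper around some learned world model; the `model` interface here is invented for illustration, not taken from any real system):

    import numpy as np
    import gymnasium as gym

    class WorldModelEnv(gym.Env):
        """Wrap a learned video/world model as an RL environment.

        `model` is assumed to expose reset() -> frame and
        step(frame, action) -> (next_frame, reward, done).
        """

        def __init__(self, model, num_actions=10, frame_shape=(64, 64, 3)):
            self.model = model
            self.action_space = gym.spaces.Discrete(num_actions)
            self.observation_space = gym.spaces.Box(0, 255, frame_shape, dtype=np.uint8)
            self.frame = None

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            self.frame = self.model.reset()
            return self.frame, {}

        def step(self, action):
            self.frame, reward, done = self.model.step(self.frame, action)
            return self.frame, reward, done, False, {}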


When people say "LLMs are not capable of innovation", what exactly do they consider innovation? If LLMs are not capable of innovation on their own, what if we augment them with means to interact with the environment so they can obtain new training data?

e.g. the Minecraft bot Voyager can explore the game environment and extend its skill library (stored as a vector database); is that considered innovation? There are also systems like LeanDojo/AlphaProof that discover new proofs and use LLMs in non-trivial ways (not just naively predicting the next token in one shot). Reinforcement learning systems like AlphaGo/AlphaZero use self-play and Monte Carlo tree search to learn to outperform humans. You can similarly use LLMs to generate actions and estimate state values (see the language agent tree search paper).

Most people use LLMs by prompting them with some additional context (chat history and data retrieved from a database), but there is nothing stopping us from continuously improving an LLM (either by modifying its weights or by augmenting it with an external database) by asking it to evaluate the task outcome/error message and feeding that back to the LLM. We can also ask it to keep generating new tasks to experiment with the environment/internet and gain new knowledge.
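A rough sketch of that feedback loop (just illustrative Python; `call_llm` and `run_in_sandbox` are placeholders the caller would supply, not any specific API):

    def solve_with_feedback(task, call_llm, run_in_sandbox, max_attempts=5):
        history = [f"Task: {task}\nWrite a Python solution."]
        for _ in range(max_attempts):
            code = call_llm("\n".join(history))    # generate a candidate solution
            ok, output = run_in_sandbox(code)      # execute it against the environment
            if ok:
                return code                        # keep the working program
            # feed the error back so the next attempt can correct it
            history.append(f"Previous attempt failed with:\n{output}\nFix the code.")
        return None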


> what if we augment them with means to interact with the environment so they can obtain new training data?

So far, the current generation of LLMs in widespread use do not have that ability, as far as I am aware. To actually do it to a degree that would rival human learning, they would need access to a lot more of the environment than you might be thinking of.

Sure, for programming, the most basic environment would be the platform to run the code on and the output of the code. But choices in programming are made based on many more things: performance, load impact, behavior in production environments, interaction with other applications, platform logging, adjacent application logging. Or even before that, using previous experience to judge specifications for an application, which takes into account things like the expected user base, costs, etc.

The real world is a lot more complex than a Minecraft world or a game of Go. Which is not to say that it is impossible; I am sure research is ongoing to do exactly that.

But the LLMs that are currently disrupting communities like Stack Overflow are not doing any of that. Given how complex it is to plug in all the relevant stimuli, and the fact that for now they can get by without doing any of this, I think things are more likely to get worse before they potentially get better.


bro, it's not just about the money. He's promoting himself, building up a portfolio, learning new skills and entertaining himself.


If you work for money you build up a portfolio and probably learn new skills - it is difficult not to.

I have considered doing this before, contests etc., but I'm not sure I can break down the actual cost of doing it, the benefit, and the risk well enough to see what one actually gets from the time devoted.

Not to say that I might not do other things that cannot be quantified easily either, but in the contest case I want more quantification; I guess it just seems more likely to be a waste of time to me.


We have translators. Doesn't mean we can't replace them with a cheaper, more accessible tool. That's the whole point of automation.

Reasoning models are not useless. They demonstrably (according to benchmarks) improve performance on coding and math related tasks.

