Of course you could perceive it, measure it and record it. The entropy of your body or your brain is not necessarily increasing, nor is the entropy of your computer or other information storage systems.
Entropy only statistically tends to increase in closed systems, and neither your computer nor your brain is a closed system. Both are constantly taking in energy from an external power source and in turn dispersing previously consumed energy out into their environment.
And yet you still manage to perceive things just fine... in fact your perception of the world is unlikely to change whether the entropy in your brain increases or decreases by some bounded amount (of course too much of either will destroy your brain).
Your claim about remembering an event, which likely alludes to Laplace's demon [1], requires an overall increase in entropy in the system as a whole, but does not require an increase in entropy in the specific part of the system that is recording the event.
Every time your computer calls something like memset(dst, 0, size), or sorts a list, or arranges data into some kind of structured binary tree, it is decreasing its own internal entropy by taking a statistically likely arrangement of bits and transforming it into a very unlikely one. That decrease in internal entropy is more than offset by an increase in global entropy, but that global entropy radiates out into the cosmos and has no impact on your computer's ability to register information.
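To make that concrete with a toy example (this is the informational entropy of the byte values, not a thermodynamic calculation, and purely illustrative): zeroing a buffer of random bytes takes its empirical byte distribution from roughly 8 bits per byte down to 0.

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <random>
#include <vector>

// Shannon entropy (bits per byte) of the empirical byte-value distribution.
double shannon_bits_per_byte(const std::vector<std::uint8_t>& buf) {
    std::size_t counts[256] = {};
    for (std::uint8_t b : buf) ++counts[b];
    double h = 0.0;
    for (std::size_t c : counts) {
        if (c == 0) continue;
        double p = static_cast<double>(c) / buf.size();
        h -= p * std::log2(p);
    }
    return h;
}

int main() {
    std::mt19937 rng(42);
    std::vector<std::uint8_t> buf(1 << 20);
    for (auto& b : buf) b = static_cast<std::uint8_t>(rng());

    std::printf("random buffer: %.3f bits/byte\n", shannon_bits_per_byte(buf)); // ~8.000
    std::memset(buf.data(), 0, buf.size()); // "erase" the buffer
    std::printf("zeroed buffer: %.3f bits/byte\n", shannon_bits_per_byte(buf)); // 0.000
}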
There's a perfectly good explanation for why, though; in fact the explanation is what motivated the formalism of entropy to begin with. There are vastly more ways for the energy contained within a closed system to spread throughout that system than there are ways for it to condense, so if you observe the state of a system at two different moments in time, you should expect it to evolve towards the statistically more likely arrangement rather than the statistically less likely one.
And from first principles, that's what entropy is: a measure of how energy is dispersed throughout a system. Once you have that first-principles understanding of entropy, you can come up with more rigorous formalisms to properly quantify what it means for energy to be distributed throughout a system, such as counting the number of microstates that correspond to a macrostate, and various other formalisms that are more or less equivalent to one another... but fundamentally they all start from this basic principle.
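For concreteness, the microstate-counting version is Boltzmann's formula, and its information-theoretic analogue is Shannon's:

S = k_B \ln \Omega

H = -\sum_i p_i \log_2 p_i

where \Omega is the number of microstates consistent with the observed macrostate and p_i is the probability of microstate i.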
Strictly speaking, modern cosmology does not treat the Big Bang as the beginning of all of existence; it's what you get when you take observations about large-scale cosmology and run them backwards in time.
Based on the information we have available about our universe, we can't make predictions or formally model anything prior to a certain point in time, so it's convenient to treat that moment as the earliest point at which physics as we know it makes any sense. While there may have been some kind of existence prior to the Big Bang, we have no way to make sense of it even at a conceptual level. Given that, we may as well treat this special point in time as the beginning of the universe as we understand it and can explain it with physics, as opposed to some absolute beginning of all of existence.
I used to think the models got worse over time as well but then I checked my chat history and what I noticed isn't that ChatGPT gets worse, it's that my standards and expectations increase over time.
When a new model comes out I test the waters a bit with some more ambitious queries and get impressed when it handles them reasonably well. Over time I take that for granted, expect it to handle ever more complex queries, and get disappointed when I hit a new limit.
>Note that almost every exchange outside the US has been flat or negative for decades.
As someone who works in finance this struck me as a remarkable claim. Upon inspection it turns out to be spectacularly incorrect. It's actually the opposite: after adjusting for inflation, the vast majority of countries have seen their own version of the S&P 500 grow over a 30 year period, not stagnate or decline. Developing countries, particularly those in Asia, have seen incredible returns over a 30 year period, albeit with a great deal of volatility along the way.
Our neighbor to the north, Canada, has seen gains slightly below those of the U.S., while our neighbor to the south, Mexico, has seen about the same growth as our own, once again after accounting for Mexico's own inflation.
Europe has also experienced a great deal of growth, with many European countries, Germany for example, growing even more than the U.S.
While there are examples of decline, they are in countries that are both poor and have unstable governments. Countries that are merely poor but don't suffer from instability have for the most part seen growth rather than stagnation.
So I don't know exactly what led you to believe your claim that "almost every exchange" has been flat or negative, but it's certainly not correct.
> Developing countries, particularly those in Asia, have seen incredible returns over a 30 year period, albeit with a great deal of volatility involved.
The level of the MSCI China index 30 years ago was HKD 70 and it's HKD 75 today. It's kind of incredible but not in a good way. Total return is less than 3% p.a. MSCI Thailand is even worse.
MSCI Korea has a total return of 7% (not bad, 4% above inflation) but it doesn't do better than developed countries.
Of course they look much better if we start right after the 1997 Asian financial crisis but, hey, it was you who talked about "a 30 year period".
I don't understand your follow-up: are you still maintaining your claim that almost every other exchange in the world has been stagnant or declining?
Worth noting that a stock exchange becoming defunct is not the same as the value of the index associated with the stocks listed on that exchange going to zero.
For example, numerous US stock exchanges have also gone defunct. Nevertheless, the value of the stocks that traded on them was unaffected. It's not like Google and Microsoft would declare bankruptcy if NASDAQ went out of business tomorrow.
This is true; in general LLMs are not great at counting because they don't see individual characters, they see tokens.
Imagine you spoke perfect English, but you learned to write it using Mandarin characters, basically picking the closest-sounding Mandarin characters to spell out English words. Then someone asks you how many letter o's are in the sentence "Hello how are you?". Well, you don't read English characters, you read Mandarin characters, so you read it as "哈咯,好阿优?" because, using Mandarin characters, that's the closest-sounding way to spell "Hello how are you?"
So now if someone asks you how many letter o's are in "哈咯,好阿优?", you don't really know... you are conceptually aware that the letter o exists, you know that if you spelled the sentence in English it would contain the letter o, and you can maybe make an educated guess about how many there are, but you can't actually count them out because you've never seen actual English letters.
The same thing goes for LLMs: they don't see characters, they only see tokens. They are aware that characters exist, and they can reason about their existence, but they can't see them, so they can't really count them either.
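As a rough illustration (the token split and IDs below are made up purely for the example; real tokenizers choose different splits and IDs), here is the difference between the character-level view we have and the integer-ID view the model gets:

#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

int main() {
    // Character-level view: trivially countable.
    std::string word = "strawberry";
    long rs = static_cast<long>(std::count(word.begin(), word.end(), 'r'));
    std::printf("characters: %ld r's\n", rs); // 3

    // Token-level view (hypothetical IDs for "str" + "aw" + "berry").
    // The model only receives this list of integers, so the individual
    // letters are simply not visible to it.
    std::vector<int> tokens = {496, 675, 15717};
    std::printf("the model sees %zu opaque token ids\n", tokens.size());
}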
I see this claim repeated over and over, and while it seems plausible, it should be an easily testable hypothesis, right? You don't even need a _large_ model for this, because the hypothesis being tested is whether transformer models [possibly with chain of thought] can count up to some "reasonable" limit (maybe it can be modeled in a TCS sense as something to do with circuit complexity), and you can easily train on synthetic strings. Is there any paper that proves or disproves that transformer networks using single-character tokenization can count?
Forget single character tokens, you can just go on OpenAI's own tokenizer website [1] and construct tokens and ask ChatGPT to count how many tokens there are in a given string. For example hello is a single token and if I ask ChatGPT to count how many times "hello" appears in "hellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohello" or variations thereof it gets it right.
Be careful to structure your query so that every "hello" lands in its own token, because otherwise the first or last hello can inadvertently get chunked together with the text just before or just after it.
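In case anyone wants to build longer variations and know the expected answer ahead of time, the non-overlapping count is easy to compute directly; a minimal sketch:

#include <cstdio>
#include <string>

// Count non-overlapping occurrences of `needle` in `haystack`.
std::size_t count_occurrences(const std::string& haystack, const std::string& needle) {
    std::size_t count = 0;
    for (std::size_t pos = haystack.find(needle); pos != std::string::npos;
         pos = haystack.find(needle, pos + needle.size()))
        ++count;
    return count;
}

int main() {
    std::string s;
    for (int i = 0; i < 21; ++i) s += "hello"; // 21 repeats, an arbitrary choice
    std::printf("%zu\n", count_occurrences(s, "hello")); // prints 21
}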
Neat finding, but does it generalize to larger samples? Someone should randomly generate a few thousand such strings, feed them to 4o or o3, and get some accuracy numbers, then compare against the accuracy of counting individual letters in random strings.
I find there's a lot of low-hanging fruit: claims about LLMs that are easily testable but for which no benchmarks exist. E.g. the common claim that LLMs are "unable" to multiply isn't fully accurate; someone did a proper benchmark and found a gradual decline in accuracy as digit length increases past 10 digits by 10 digits. I can't find the specific paper, but I also remember there was a way of training a model on increasingly hard problems at the "frontier" (GRPO-esque?) that fixed this issue, giving very high accuracy up to 20 digits by 20 digits.
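For the counting experiment suggested above, generating the test set along with ground-truth answers is straightforward; a rough sketch (the repeat range and target letter are arbitrary choices):

#include <algorithm>
#include <cstdio>
#include <random>
#include <string>

// Emit tab-separated prompt/answer pairs for both variants of the experiment:
// counting whole-token repeats of "hello" and counting letters in random strings.
int main() {
    std::mt19937 rng(123);
    std::uniform_int_distribution<int> repeats(5, 60);
    std::uniform_int_distribution<int> letter('a', 'z');

    for (int i = 0; i < 1000; ++i) {
        // Variant A: "hello" repeated n times; ground truth is n.
        int n = repeats(rng);
        std::string hellos;
        for (int j = 0; j < n; ++j) hellos += "hello";
        std::printf("A\t%s\t%d\n", hellos.c_str(), n);

        // Variant B: a random lowercase string; ground truth is the number of 'o' characters.
        std::string chars(static_cast<std::size_t>(repeats(rng)), ' ');
        for (char& c : chars) c = static_cast<char>(letter(rng));
        std::printf("B\t%s\t%ld\n", chars.c_str(),
                    static_cast<long>(std::count(chars.begin(), chars.end(), 'o')));
    }
}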
Oh that's fair. I am not actually an LLM expert, so I could have some misunderstanding here. I remember this explanation being given for why previous ChatGPT models failed to answer "How many "r"s are in strawberry?", but perhaps it was an oversimplification.
Right, that's the explanation I've heard too (and I think Karpathy even said it, so it's not some fringe theory). I wasn't dismissing the hypothesis, just asking out of genuine curiosity, since this feels like something that can easily be tested on "small" large language models. There are lots of little experiments like this that can be done with small-ish models trained on purely synthetic data (the digit-multiplication work was done on a GPT-2 scale model IIRC). Can models learn to count? Can they learn to add? Can they learn to copy text verbatim accurately? Can they learn to recognize regular grammars, or even context-free grammars (this one has already been done, and the answer is yes)? And if the answer to one of these turns out to be no, then we'd better find out sooner rather than later, since it means we probably need to rethink the architecture a bit.
I know there's a lot of theoretical CS work deriving upper bounds on these models from a circuit-complexity point of view, but since architectures are revised all the time it's hard to tell how much of it is still relevant. Nothing beats having a concrete, working example of a model that correctly parses CFGs as a rebuttal to the claim that models just repeat their training data.
It seems like people think dereferencing a disengaged optional results in a crash or exception, but ironically you are actually less likely to get a crash from an optional than from dereferencing an invalid pointer. This snippet, for example, is not going to crash in most cases; it will simply return a garbage, unpredictable value:
auto v = std::optional<int>();  // v is empty: it holds no value
std::cout << *v << std::endl;   // undefined behavior: unchecked dereference of an empty optional
While both are undefined behavior, you are actually more likely to get a predictable crash from the code below than from the code above:
int* v = nullptr;
std::cout << *v << std::endl;   // also undefined behavior, but typically faults on the null page and crashes
I leave it to the reader to reflect on the absurdity of this.
What is the desired behavior? I see at least 3 options: panic (abort at runtime, predictably), a compiler error (force handling both the Some and Nothing cases; needs language support, otherwise annoying), or exceptions (annoying to handle properly). There is too much undefined behavior already.
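For what it's worth, std::optional already lets you choose between unchecked access and the exception option at each call site: operator* is the unchecked (undefined behavior) access, while .value() is the checked access and throws std::bad_optional_access. A small sketch:

#include <cstdio>
#include <optional>

int main() {
    std::optional<int> v; // disengaged: holds no value

    // *v here would be undefined behavior; .value() throws instead of silently misbehaving.
    try {
        std::printf("%d\n", v.value());
    } catch (const std::bad_optional_access& e) {
        std::printf("caught: %s\n", e.what());
    }
}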
I understand what undefined behavior is; I just don't dereference pointers or optionals without first checking them against nullptr or nullopt (respectively). In fact, I generally use the .has_value() and .value() interface on the optional, which, to my point in the comment above, is a very similar workflow to using an optional in Rust.
I think if you adopted a more defensive programming style, where you check your values before dereferencing them and handle all your error cases, you might find C++ is not so scary. I would also recommend not using auto, as it makes the types less clear.
std::optional<int> v = std::nullopt;
if (v == std::nullopt) {
    return std::unexpected("Optional is empty.");
}
std::println("{}", *v);
If you are dereferencing things without checking that they can be dereferenced I don't know what to tell you.
There is no reason that ambiguities or paradoxes can't be expressed analytically and formally. Math and computer science are full of such things and they are celebrated.
Being hard to communicate is precisely why it's important to communicate rigorously and formally.
We have a ton of examples of great mathematicians who also happened to be great philosophers and vice-versa. Some philosophers also tried to incorporate mathematical symbols and such into their work. We value both their philosophical works and mathematical works. They were smart people and chose different mediums to express different concepts.
How can you try to explore the ego, consciousness, the unconscious, dreams, suffering, life's purpose, subjective beauty, symbolism, truth, religion, god, ethics and whatever else is not easily formalized? We might very well arrive at a formal and unambiguous description of these sometime in the far future, so are we not supposed to at least try to talk about these concepts now? You use different tools for different concepts, and science and philosophy are just two of those tools. At the end of the day philosophy undeniably changed the world, so there is at least some value to it. Philosophy is not anti-logic, it is very much for logic.
It is a practical question. Sometimes we need to have or choose a hard answer to make a decision. It inevitably isn't going to be solved formally.
Any more than the question of where the dividing line between a chair and a not-chair lies. Many patterns we encounter have fuzzy, non-formal edges.
Perfect consensus is impossible, but any consensus is valuable. So we invent and argue about the "best" way to "understand" these things.
These arguments are partly objective, partly subjective, partly emergent, and partly just farmed out to favorite "authorities" or social pressure. But important and unavoidable.
--
At the highest level, even how we perceive reality is important. It impacts our values, our motivations, our ethics, how we cope with events, etc., trickling down to everyday choices.
"What is real?" ends up being an important question, no matter how lacking in formal rigor the answers we each have are.
Most languages you mention let you import the individual names that you want, and you can resolve potential clashes at the point of import. In C++ you basically `#include` a file and with it comes a flood of names that is often unpredictable.
C++ has had namespaces for a very long time (since the beginning? I'm not sure, I wasn't using computers when it was created), just like those other languages. The name-collision issue it has is the same as theirs.
And with those C++ namespaces you can do `using some_namespace::some_specific_name;`. Which resolves the issue just like in those other languages.
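A minimal sketch of that (the namespace and function names here are made up):

#include <cstdio>

namespace lib_a { void greet() { std::puts("lib_a::greet"); } }
namespace lib_b { void greet() { std::puts("lib_b::greet"); } }

// Pull in just the one name you want; the other stays fully qualified.
using lib_a::greet;

int main() {
    greet();        // lib_a::greet, no ambiguity
    lib_b::greet(); // still reachable under its own namespace
}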
There is no such thing as a slow photon; photons always travel at the speed of light.
When light enters a medium there are two mostly (but not entirely) equivalent ways to think about what happens. One is to view light as a purely electromagnetic wave that interacts with atoms and causes them to oscillate. That oscillation produces its own electromagnetic wave, which interferes with the original one. The result of this interference is an electromagnetic wave with the same frequency and amplitude, travelling in the same direction as the incoming light, but shifted backwards, and it's that backwards shift that gives the appearance of light slowing down.
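In the wave picture, that backwards shift is what the refractive index n summarizes: the superposition inside the medium propagates with phase velocity

v_{\text{phase}} = \frac{c}{n}

which is why the light appears to travel slower even though no individual photon does.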
That explanation is pretty good and accounts for almost everything except for the latency of light through a medium.
If that's what you want to model, then it's better to think of light as made up of photons rather than a wave. When photons enter a material they no longer exist as independent particles; through a process of absorption and re-emission by electrons in the material they become quasiparticles called polaritons. Polaritons do have an effective mass and hence travel slower than the speed of light.
Neither of these explanations is perfect, but the full explanation is ridiculously complicated and there's no suitable metaphor for it. If you are interested in the edge latency of light through a medium, the polariton explanation is appropriate. If you want the "bandwidth" picture of light through a medium, the wave explanation is appropriate.
[1] https://en.wikipedia.org/wiki/Laplace%27s_demon