It's actually quite fascinating if you watch it for 5 minutes. Some models are overall bad, but others nail it in one minute and butcher it in the next.
It's perhaps the best example I have seen of model drift driven by just small, seemingly unimportant changes to the prompt.
> model drift driven by just small, seemingly unimportant changes to the prompt
What changes to the prompt are you referring to?
According to the comment on the site, the prompt is the following:
Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting.
Presumably the time is replaced with the actual current time at each generation. I wonder if they are actually generated every minute, or if all 6480 permutations (720 distinct minutes on a 12-hour clock face * 9 LLMs) were generated up front and are just shown on a schedule.
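For what it's worth, here is a rough sketch of what the "generate everything up front" option could look like. It assumes the site fills `${time}` with an "H:MM" string and that a 12-hour dial gives 720 distinct minute values. The model names, `clock_times`, and `build_prompts` are hypothetical placeholders, and none of this is confirmed as how the site actually works.

```python
from string import Template

# Prompt text as quoted in the thread; ${time} is the only placeholder.
PROMPT = Template(
    "Create HTML/CSS of an analog clock showing ${time}. "
    "Include numbers (or numerals) if you wish, and have a CSS animated second hand. "
    "Make it responsive and use a white background. "
    "Return ONLY the HTML/CSS code with no markdown formatting."
)

# Hypothetical stand-ins for the nine models; the real identifiers are not known here.
MODELS = [f"model-{i}" for i in range(1, 10)]


def clock_times():
    """Yield every distinct minute on a 12-hour dial as 'H:MM' (12:00 through 11:59)."""
    for hour in [12] + list(range(1, 12)):
        for minute in range(60):
            yield f"{hour}:{minute:02d}"


def build_prompts():
    """Return a {(model, time): prompt} mapping for every model/time pair."""
    return {
        (model, t): PROMPT.substitute(time=t)
        for model in MODELS
        for t in clock_times()
    }


prompts = build_prompts()
print(len(prompts))  # 720 times * 9 models = 6480
```

If the outputs were cached like this, the same clock would reappear for a given model every 12 hours, which would be one way to tell the two approaches apart from the outside.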
It is really interesting to watch them for a while. QWEN keeps outputting some really abstract interpretations of a clock, KIMI is consistently very good, GPT5's results line up exactly with my experience with its code output (overly complex and never working correctly)
We can't know how much is about the prompt though and how much is just stochastic randomness in the behavior of that model on that prompt, right? I mean, even given identical prompts, even at temp 0, models don't always behave identically... at least, as far as I know? Some of the reasons why are, I think, still a research question, but I think it's a fact nonetheless.
Kimi seems the only reliable one which is a bit surprising, and GPT 4o is consistently better than GPT 5 which on the other hand is unfortunately not surprising at all.
When I checked this a year or so ago, I might have gotten the impression that it was cheaper. Now, it costs the same as what Perplexity charges for search-grounded queries, which is the same as Google charges for Gemini queries with search.
So basically, one player sets a price, and everyone is anchored on that as the pricing for the entire category? I'm just genuinely interested in why every offering in this space is priced like this.
It seems a bit misaligned with how pure LLM queries are priced.
I have a product that would benefit from search grounding, but this pricing wouldn't work with my volume of queries.
This is wild, but many studies have reached the same conclusion.
I remember reading somewhere that heart transplant recipients have random memory flashes that are not their memories, and sometimes they develop new personality traits.
A theory I have seen is that we tend to mix up cause and effect.
So, for example, a dangerous situation causes stress and stress causes the heart to beat faster, all normal. But make the heart beat faster through external means and it will also cause stress. So it is not clear which one is the cause and which one is the effect, probably some weird combination, with all sorts of feedbacks. Life is messy.
So get a heart that isn't yours and it will not beat in a familiar way, which, in turn, may be interpreted as changing emotions. And even if memories are entirely contained within the brain, what if the heartbeat is part of these memories? With a heart that reacts differently, the meaning of these memories may change.
For a tech analogy, in order to record a video game session, it is common to only record player input. If the game is deterministic, you just need to run the game with the recorded inputs and the session will be faithfully reproduced. It is much more compact than something like a video. Now imagine we change the game engine so that it responds slightly differently to inputs. When replayed, the game will now appear different. If we imagine memories are "replays" and the engine is our body, then altering our body will also alter our memories.
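A toy sketch of that replay idea, in case it helps. The "engine" here is just a function that moves a player along a line, and the names `run`, `engine_v1`, and `engine_v2` are made up for illustration. Real games record far richer input, but the principle is the same: store only the inputs, and the result depends on the engine that replays them.

```python
def run(inputs, step, state=0):
    """Replay a list of recorded inputs through a deterministic step function."""
    for key in inputs:
        state = step(state, key)
    return state


def engine_v1(position, key):
    """Original engine: each key press moves the player one unit."""
    return position + {"left": -1, "right": 1}.get(key, 0)


def engine_v2(position, key):
    """Altered engine: moving right is now twice as fast."""
    return position + {"left": -1, "right": 2}.get(key, 0)


# The "memory" is only the recorded inputs, not the resulting positions.
recorded = ["right", "right", "left", "right"]

original = run(recorded, engine_v1)          # the session as it was played
replayed_same = run(recorded, engine_v1)     # same engine: identical outcome
replayed_changed = run(recorded, engine_v2)  # changed engine: different outcome

print(original, replayed_same, replayed_changed)  # 2 2 5
```

Same recording, different engine, different "memory" of where the player ended up.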
> I remember reading somewhere that heart transplant recipients have random memory flashes that are not their memories, and sometimes they develop new personality traits.
Wild. Doesn't necessarily surprise me too much that the body stores some memories outside the brain, but it seems _very_ surprising that another body/brain can read and understand ones created by another. I'd assume that the whole mind and memory system is one big correlated mess, not essentially composed of data files in a ~standard encoding.
It would be hasty to assume that any memories would be transferable in such a way. If your hypothesis is that transplant recipients can have their memories altered by interpreting information carried by foreign organ cells, start by assuming they're reading junk data that they cannot decipher. Brains are great at turning junk data into something that feels real.
I would probably ascribe it to the procedure itself. Like I imagine if you put someone under, opened up their chest, took their heart out and then... put it back in - that the stress of that whole thing would be enough to seriously mess with your head.
That was my followup question, are the memories accurate (even as much as normal memories are), or are they nonsense? Or even better, it'd be fun if they're not completely nonsense, but corrupted in some understandable way (like people/places are substituted for instance). There's no way at all that memories are encoded as essentially mpeg files, so _something_ has to be wrong with them.
But yeah, you're right, odds seem good that they're just nonsense, but even then it just feels weird that the body can even interpret them as memories in the slightest.
Maybe it's all about encoding and it IS pretty standard? The brain can decode vision through tongue nerves [1] as long as it looks like vision data and is correlated with head movements. There were experiments with other senses sent through different means, or with whole new senses (magnetic [2] and echolocation [3]). It looks like the brain is so flexible that anything resembling sensible information will be decoded.
> In addition to changes in preferences, some recipients describe new aversions after receiving a donor heart. For example, a 5-year-old boy received the heart of a 3-year-old boy but was not informed about his donor’s age or cause of death. Despite this lack of information, he provided a vivid description of his donor after the surgery: “He’s just a little kid. He’s a little brother like about half my age. He got hurt bad when he fell down. He likes Power Rangers a lot I think, just like I used to. I don’t like them anymore though” (p. 70, [8]). Subsequently it was reported that his donor had died after falling from an apartment window while trying to reach a Power Ranger toy that had fallen onto the window ledge. After receiving his new heart, the recipient refused to touch or play with Power Rangers
This is the most fascinating thing I've read in a long time. Thanks for the link
There’s a similar story I’ve read before in a different paper about an organ donor who drowned, after which the recipient developed an extreme aversion to water.
I don’t recall what the exact title or link to the article was though.
This is a very sneaky, ethically gray company. Their app is not only of terrible quality but also full of dark patterns. I'm convinced that any revenue they make comes from people who can't figure out how to cancel. Stay away from it.
I'm a designer. I built brainglue.ai without Figma, a design system, or a UI library. I just went directly to code (react+tailwind) and let a style organically emerge.
I'm not saying that I'm a unicorn and that my idea-to-code-to-design execution is flawless, but I certainly believe that in this situation, if I hadn't done it this way, I wouldn't have done it at all. However, doing this would be wasteful or dumb in almost every other situation that requires my design output.
People pay for Figma precisely because it's a middle ground. It was a middle ground before, and it will continue to be unless something fundamental changes.
Small suggestion, I would preload the contents of each tab and the images in the circle after the main content is loaded. There was a good two second lag loading the images here in Australia.
AppleTV+ is a tiny business. It's nowhere near generating enough revenue to cover a $20B hole in content production costs.
Yes, Apple generates LOTS of revenue overall, but that doesn't justify bleeding cash on a business line that hasn't produced material returns and has no significant positive trajectory in sight.
It's clear that Apple saw this as their Prime Video bet on their services strategy, but that hasn't worked out. Just look at AppleTV+ market share. It's hilariously minuscule.
Apple can afford to play the long game here though.
TV+ is a nice value add in their bundled subscription package, so it may be driving more people to opt for that. I know it was a major factor in my decision, and now I am playing Apple Arcade games and using Apple Music as my primary music service.
Operating TV+ as a halo or loss leader product to get people to try other services within the ecosystem could be a winning strategy for them. Also likely drives some hardware sales.
The hardware sales I was talking about are for their streaming box, which is in no way dominant but is really nice. Also, we have pretty much reached peak iPhone or close to it. These subscription services help keep people in the ecosystem, as they raise switching costs a bit.
The article is implying that TV+ is losing money. I don’t know that it is but my point is that for Apple it is likely still worth keeping and growing the service even if they are losing money on it at the moment.
It's a "tiny" part of a services business with a billion subscribers, that generates $20B of revenue in a quarter. $20B in production costs over 4.5 years works out to less than $5B/year in costs, against competitors like Netflix and Disney+ that are spending $20B a year.
I actually haven't seen any numbers on their current market share, but I'll give you that they aren't anywhere near their competitors. I don't think the problem is that they're spending too much money.
I own a Model 3, and I honestly think it is the best car I have ever owned.
Despite that, I know people who have ruled out owning a Tesla because they believe the brand mirrors Elon Musk's public persona. They flat-out reject any Tesla product because the brand's visible face is someone they believe doesn't represent their values.
I'm unsure he understood the implications of becoming such a polarizing figure. It was totally unnecessary, yet that was his choice.
I don't think it was a conscious or deliberate choice.
Hubris made him think he was the real Tony Stark, genius playboy philanthropist bullshit, but as talented as he is in some areas, his flaws are clearly visible.
But at the end of the day, I think that if Tesla was making valuable cars, it would not matter that much. The thing is that those cars also have many flaws, and a lot of undelivered promises around self-driving...
It does not take much business acumen to see that taking a political stance is going to anger someone. Best keep your mouth shut unless you are prepared for the consequences.
I doubt Elon’s personal views are much different from any other billionaire CEO. Yet the majority of them are not in the spotlight and drawing attention to themselves.
I don't doubt that a portion of potential buyers are turned off by Musk's politics and refuse to buy his cars. But a much, much larger group (in my opinion) are people who like gas-powered cars and have zero interest in EVs, no matter how large the subsidies are, how advanced the tech is, or how many chargers are out there. Ignoring this group, and focusing on those caught up with Musk's politics, is missing the big picture, I believe.
So the "truth" would be his primary base of personal wealth is utterly derived from selling cars to...checking notes...the worst people in the country who deserve to be bullied and ridiculed? And that he has no choice but to educate the world about this truth? Even if it means destroying his personal reputation and businesses?
Or - and this is amply documented now - perhaps the "truth" is he's doing way too many drugs (WSJ), making horrible decisions (Supercharger), and per the deeply persuasive data in the linked thread, is killing the company?
> The coverage metric as a goal writing style doesn't work for TDD
Coverage is not a goal of TDD, but in practice you will have 100% coverage by following TDD, as you would never have reason to write code that isn't covered by a test.
Ultimately, the purpose of coverage tools is to let you know what you might have forgotten to clean up during a refactor, to help you remove what you missed.
I think productivity is lower in the winter, so I'm not sure about quality per se, but intuitively it makes sense that anything written in the winter months is less verbose.
> Poe lets you ask questions, get instant answers, and have back-and-forth conversations with AI. Gives access to GPT-4, gpt-3.5-turbo, Claude from Anthropic, and a variety of other bots.
I'm not sure I would call Poe a rip-off at all? Sounds like a bundled ChatGPT product.