There's a grain of truth to it — Apple has learned from Microsoft's history that making the whole browser shitty is too obvious and annoys users. Apple was smart enough to keep the user-visible parts of the browser in good shape, while also dragging their feet on all the Web platform features that could endanger the App Store cash cow.
I don't want web apps on my phone (or, in an ideal world, anywhere else) so that's also a good thing. If they're not viable, it forces developers to make real apps or else just make a web page instead of whatever awful-UX nonsense they were planning.
>I don't want web apps on my phone (or, in an ideal world, anywhere else) so that's also a good thing. If they're not viable, it forces developers to make real apps or else just make a web page instead of whatever awful-UX nonsense they were planning.
Well what you personally want is irrelevant to the law and what regulators judge to be unlawful, so that's the real good thing.
>If they're not viable, it forces developers to make real apps or else just make a web page instead of whatever awful-UX nonsense they were planning.
They are perfectly viable and it has nothing to do with UX, but you have already exposed your bias and made clear that you are arguing in bad faith by spreading misinformation in your other comments.
This is tautological. If you keep instructions dumbed-down enough for AI to work well, it will work well.
The problem is that AI needs to be spoon-fed overly detailed dos and don'ts, and even then the output can't be trusted without carefully checking it. It's easy to reach a point where breaking down the problem into pieces small enough for AI to understand takes more work than just writing the code.
AI may save time when it generates the right thing on the first try, but that's a gamble. The code may need multiple rounds of fixups, or end up needing a manual rewrite anyway, after wasting time and effort on instructing the AI. The ceiling of AI capabilities is very uneven and unpredictable.
Even worse, the AI can confidently generate code that looks superficially correct, but has subtle bugs/omissions/misinterpretations that end up costing way more time and effort than the AI saved. It has an uncanny ability to write nicely structured, well-commented code that is just wrong.
I made an STT tool (guess who wrote it for me) and have a bluetooth mic. I spend 10 minutes pacing and telling the AI what I need it to build, and how to build it. Then it goes off and builds it, and meanwhile I go to the next Claude Code instance on a different project, and do the same thing there. Then do the same for a third, and maybe by that time the first is ready for more direction. Depending on how good you are with context switching and quickly designing complex systems and communicating those designs, you can get a whole lot done in parallel. The problems you're describing can be solved, if you're careful and detailed.
It's a brave, weird and crazy new world. "The future is now, old man."
Young man, software is often more than 50 lines of code that merely merge basic examples from two libraries. That stuff is useful too, but that's a 0.5x intern, not a 10x developer.
I've told the same Claude to write me unit tests for a very well-known, well-documented API. It was too dumb to deduce what edge cases it should test, so I also had to give it a detailed list of what to test and how. Despite all of that, it still wrote crappy tests that misused the API. It couldn't properly diagnose the failures, and kept adding code for non-existing problems. It was bad at applying fixes even when told exactly what to fix. I've wasted a lot of time cleaning up crappy code and diagnosing AI-made mistakes. It would have been quicker to write it all myself.
I've tried Claude and GPT-4o for a task that required translating imperative code that writes structured data to disk field by field into explicit schema definitions. It was an easy but tedious task (I had many structs to convert). The AI hallucinated a bunch of fields, and got many types wrong, wasting a lot of my time on diagnosing serialization issues. I really wanted it to work, but I've burned over $100 in API credits (not counting subscriptions) trying various editors and approaches. I've wasted time and money managing context for it: giving it enough of the codebase to stop it from hallucinating the missing parts, but also carefully trimming it to avoid distracting it or causing rot. It just couldn't do the work precisely. In the end I had to scrap it all and do it by hand myself.
I've tried GPT-4o and 4-mini-high to write me a specific image processing operation. They could discuss the problem with seemingly great understanding (referencing academic research, advanced data structures). I even got Python code with correct syntax on the first try! But the implementation had a fundamental flaw that caused numeric overflows. The AI couldn't fix it itself (it kept inventing stupid workarounds that didn't work or even defeated the point of the whole algorithm). When told step by step what to do to fix it, it kept breaking other things in the process.
I've tried to make AI upgrade code using an older version of a dependency to a newer one. I provided it with relevant quotes from the docs (I knew the new version was newer than its knowledge cutoff), and even converted parts of the code myself, so it could just follow the pattern. The AI couldn't properly copy-paste code from one function to another. It kept reverting things. When I pointed out the issues, it kept apologising, saying what new APIs it was going to use, and then used the old APIs again!
I've also briefly tried GH copilot, but it acted like level 1 tech support, despite burning tokens of a more capable model.
It turns out that Deflate can be much faster when implemented specifically for PNG data, instead of general-purpose compression (while still remaining 100%-standard-compatible).
Note that he also expects worse compression as a tradeoff. I think he implements RLE in terms of zlib:
[...]Deflate compressor which was optimized for simplicity over high ratios. The "parser" only supports RLE matches using a match distance of 3/4 bytes, [...]
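As a rough illustration of the idea (my own sketch in Python, not the code from the post; the function name and token format are made up), a match finder restricted to a single distance equal to the pixel stride is enough to pick up runs of identical pixels in filtered scanline data, with no hash chains or window search at all:

    def rle_only_tokens(data: bytes, distance: int, min_match: int = 3):
        """Emit LZ77-style tokens, allowing matches only at one fixed distance."""
        tokens = []          # ("lit", byte) or ("match", length, distance)
        i = 0
        while i < len(data):
            run = 0
            # Count how many upcoming bytes repeat the byte `distance` back.
            while (i + run < len(data)
                   and run < 258                  # Deflate's maximum match length
                   and i + run >= distance
                   and data[i + run] == data[i + run - distance]):
                run += 1
            if run >= min_match:
                tokens.append(("match", run, distance))
                i += run
            else:
                tokens.append(("lit", data[i]))
                i += 1
        return tokens

    # A run of identical RGB pixels becomes one literal pixel plus one match.
    print(rle_only_tokens(b"\x10\x20\x30" * 6, distance=3))

The actual Huffman/bit-level Deflate encoding is omitted; the point is just that the per-byte question becomes "does this byte equal the one a pixel back?", which is why it can be so much faster than a general-purpose match search, at the cost of ratio.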
Bringing the check immediately is associated with fast food, and overcrowded touristy places that are rushing customers to leave. Places that want to be fancy act like you're there to hang out, not to just eat and leave.
It is sometimes absurd. In the UK there's often an extra step of "oh, you're paying by card? let me go back and bring the card reader". Some places have just one reader shared among all the waiting staff, so you're not going to get it faster unless you tip enough to make the staff wrestle for it.
I like the Japanese style the best — there's a cashier by the exit.
Even with the best intentions, the implementation is going to have bugs and quirks that weren't meant to be the standard.
When there's no second implementation to compare against, then everything "works". The implementation becomes the spec.
This may seem wonderful at first, but in the long run it makes pages accidentally depend on the bugs, and the bugs become a part of the spec.
This is why Microsoft has a dozen different button styles, and sediment layers of control panels all the way back to 1990. Eventually every bug became a feature, and they can't touch old code, only pile up new stuff around it.
When you have multiple independent implementations, it's very unlikely that all of them will have the same exact bug. The spec is the subset that most implementations agree on, and that's much easier to maintain long term, plus you have a proof that the spec can be reimplemented.
Bug-compatibility very often exposes unintended implementation details, and makes it hard even for the same browser to optimize its own code in the future (e.g. if pages rely on order of items you had in some hashmap, now you can't change the hashmap, can't change the hash function, can't store items in a different data structure without at least maintaining the old hashmap at the same time).
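A toy illustration of how that kind of accidental contract forms (Python, purely illustrative; the point is the order, not the language):

    # If an API ever exposes the raw iteration order of an unordered container,
    # callers start depending on it. Python salts string hashes per process,
    # so this order can change between runs, yet code in the wild will still
    # happily rely on whatever order it saw during development.
    features = {"geolocation", "webgl", "bluetooth", "midi"}
    print(list(features))   # order is an implementation detail, not a contract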
Is that so bad though? It's essentially what's already the case, and as you said, developers already have an incentive to avoid making such bugs. Most developers are only going to target a single browser engine anyway, so bug or not, any divergence can cause end users problems.
Regulations are like the code of a program. They're the business logic of how we want the world to be.
Like all code, it can be buggy, bloated and slow, or it can be well-written and efficiently achieve ambitious things.
If you have crappy unmaintainable code that doesn't work, then deleting it is an obvious improvement.
Like in programming, it takes a lot of skill to write code that achieves its goals in a way that is as simple as possible, but also isn't oversimplified to the point of failing to handle important cases.
The pro-regulation argument isn't for naively piling up more code and more bloat, but for improving and optimizing it.
Motion vectors in video codecs are the equivalent of a 2D projection of 3D motion vectors.
In typical video encoding, motion compensation of course isn't derived from real 3D motion vectors; it's merely a heuristic based on optical flow and a bag of tricks. But in principle the actual game's motion vectors could be used to guide the video's motion compensation. This is especially true when we're talking about a custom codec, and not reusing the H.264 bitstream format.
Referencing previous frames doesn't add latency, and limiting motion to just displacement of the previous frame would be computationally relatively simple. You'd need some keyframes or gradual refresh to avoid "datamoshing" look persisting on packet loss.
However, the challenge is in encoding the motion precisely enough to make it useful. If it's not aligned with sub-pixel precision, it may make textures blurrier and make movement look wobbly, almost like PS1 games. It's hard to fix that by encoding the diff, because the diff ends up having high frequencies that don't survive compression. Motion compensation should also be encoded with sharp boundaries between objects, as otherwise it causes shimmering around edges.
>Motion vectors in video codecs are the equivalent of a 2D projection of 3D motion vectors.
3D motion vectors always get projected to 2D anyway. They also aren't used for moving blocks of pixels around; they are floating-point values that get used along with a depth map to re-rasterize an image with motion blur.
They are used for moving pixels around when used in Frame Generation. P-frames in video codecs aim to do exactly the same thing.
Implementation details are quite different, but for reasons unrelated to motion vectors — the video codecs that are established now were designed decades ago, when the use of neural networks was in its infancy, and hardware acceleration for NNs was way outside the budget of HW video decoders.
Third, optical flow isn't moving blocks of pixels around by an offset and then encoding the difference; it is creating a floating-point vector for every pixel and then re-rasterizing the image into a new one.
You've previously emphasised the use of blocks in video codecs, as if it were some special distinguishing characteristic, but I wanted to explain that's an implementation detail, and novel video codecs could take different approaches to encoding P-frames. They don't have to code a literal 2D vector per macroblock that "moves pixels around". There are already more sophisticated implementations than that. It's an open problem of reusing previous frames' data to predict the next frame (as a base to minimize the residual), and it could be approached in very different ways, including the use of neural networks that predict the motion. I mention NNs to emphasise how different motion compensation can be from just copying pixels on a 2D canvas.
Motion vectors are still motion vectors regardless of how many dimensions they have. You can have per-pixel 3D floating-point motion vectors in a game engine, or you can have 2D-flattened motion vectors in a video codec. They're still vectors, and they still represent motion (or its approximation).
Optical flow is just one possible technique of getting the motion vectors for coding P-frames. Usually video codecs are fed only pixels, so they have no choice but to deduce the motion from the pixels. However, motion estimated via optical flow can be ambiguous (flat surfaces) or incorrect (repeating patterns), or non-physical (e.g. fade-out of a gradient). Poorly estimated motion can cause visible distortions when the residual isn't transmitted with high-enough quality to cover it up.
3D motion vectors from a game engine can be projected into 2D to get the exact motion information that can be used for motion compensation/P-frames in video encoding. Games already use it for TAA, so this is going to be pretty accurate and authoritative motion information, and it completely replaces the need to estimate the motion from the 2D pixels. Dense optical flow is a hard problem, and game engines can give the flow field basically for free.
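To make it concrete, here's roughly what that projection looks like (my own sketch; it assumes a simple pinhole camera, ignores the NDC y-flip, and takes per-pixel world-space position and motion as the engine would have them):

    import numpy as np

    def screen_space_motion(world_pos, world_motion,
                            view_proj_prev, view_proj_curr, width, height):
        """2D motion vector (in pixels) for one surface point, from 3D data."""
        def project(p, vp):
            clip = vp @ np.append(p, 1.0)               # world -> clip space
            ndc = clip[:2] / clip[3]                    # perspective divide
            return (ndc * 0.5 + 0.5) * (width, height)  # NDC -> pixel coords
        curr = project(world_pos, view_proj_curr)
        prev = project(world_pos - world_motion, view_proj_prev)
        return curr - prev   # sub-pixel precise, no estimation from 2D pixels

Doing this per pixel gives a dense, authoritative flow field straight from the renderer, with none of the ambiguity of estimating it from pixels.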
You've misread what I've said about optical flow earlier. You don't need to give me Wikipedia links, I implement codecs for a living.
The big difference is that if you are recreating an entire image, and there isn't going to be any difference information against a reference image, you can't just move pixels around; you have to get fractional values out of optical flow and move pixels by fractional amounts that potentially overlap in some areas and leave gaps in others.
This means rasterizing and taking a weighted average of the moved pixels, treated as points, with a kernel that has width and height.
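In toy form, that re-rasterization could look like this (my sketch; bilinear kernel assumed, no handling of which surface should win where moved pixels overlap):

    import numpy as np

    def splat(image, flow):              # image: HxW, flow: HxWx2 as (dx, dy)
        """Forward-splat each pixel by its fractional flow vector."""
        h, w = image.shape
        acc = np.zeros((h, w))
        weight = np.zeros((h, w))
        for y in range(h):
            for x in range(w):
                tx, ty = x + flow[y, x, 0], y + flow[y, x, 1]
                x0, y0 = int(np.floor(tx)), int(np.floor(ty))
                fx, fy = tx - x0, ty - y0
                # Spread the pixel over its 4 nearest destinations (bilinear).
                for dy, dx, wgt in ((0, 0, (1 - fx) * (1 - fy)),
                                    (0, 1, fx * (1 - fy)),
                                    (1, 0, (1 - fx) * fy),
                                    (1, 1, fx * fy)):
                    yy, xx = y0 + dy, x0 + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc[yy, xx] += wgt * image[y, x]
                        weight[yy, xx] += wgt
        out = np.zeros((h, w))
        covered = weight > 0
        out[covered] = acc[covered] / weight[covered]
        return out, covered   # `covered` is False where gaps remain

Overlaps average out through the accumulated weights, and the uncovered mask is exactly where disocclusions need to be filled from somewhere else.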
Optical flow isn't one technique; it's just a name for getting motion vectors in the first place.
I've started this thread by explaining this very problem, so I don't get why you're trying to lecture me on subpel motion and disocclusion.
What's your point? Your replies seem to be just broadly contrarian and patronizing.
I've continued this discussion assuming that maybe we talk past each other by using the term "motion vectors" in narrower and broader meanings, or maybe you did not believe that the motion vectors that game engines have can be incredibly useful for video encoding.
However, you haven't really communicated your point across. I only see that whenever I describe something in a simplified way, you jump to correct me, while failing to realize that I'm intentionally simplifying for brevity and to avoid unnecessary jargon.
The opposite is true. ASCII and English are pretty good at compressing. I can say "cat" with just 24 bits. Your average LLM token embedding uses on the order of kilobits internally.
You can have "cat" as 1 token, or you can have "c" "a" "t" as 3 tokens.
In either case, the tokens are a necessary part of LLMs. They have to have a differentiable representation for the model to be trainable effectively. High-dimensional embeddings are differentiable and are able to usefully represent the "meaning" of a token.
In other words, the representation of "cat" in an LLM must be something that can be gradually nudged towards "kitten", or "print", or "excavator", or other possible meanings. This is doable with the large vector representation, but such operation makes no sense when you try to represent the meaning directly in ASCII.
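A toy version of what "gradually nudged" means (the numbers and the 8 dimensions are made up; real models use thousands):

    import numpy as np

    vocab = {"cat": 0, "kitten": 1, "print": 2}
    emb = np.random.randn(len(vocab), 8)   # toy 8-dim embedding table

    x = emb[vocab["cat"]]                  # differentiable representation of "cat"
    step = emb[vocab["kitten"]] - x        # a direction a gradient could point in
    x_nudged = x + 0.1 * step              # "cat" drifts a little toward "kitten";
                                           # there's no analogue of this in raw ASCII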
True, but imagine an input that is ASCII, followed by some layers of NN that result in an embedded representation and from there the usual NN layers of your LLM. The first layers can have shared weights (shared between inputs). Thus, let the LLM solve the embedding problem implicitly. Why wouldn't this work? It is much more elegant because the entire design would consist of neural networks, no extra code or data treatment necessary.
This might be more pure, but there is nothing to be gained. On the contrary, this would lead to very long sequences for which self-attention scales poorly.
No, an LLM really uses __many__ more bits per token.
First, the embedding typically uses thousands of dimensions.
Then, the value along each dimension is represented with a floating point number, which takes 16 bits (though it can be smaller with heavier quantization).
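To put numbers on it (4096 dimensions and 16 bits per value are assumptions, but in the typical range):

    bits_per_token = 4096 * 16   # = 65,536 bits for one token's embedding
    bits_for_cat = 3 * 8         # = 24 bits for "cat" in ASCII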
But we can feed humans ASCII, whereas LLMs require token inputs. My original question was about that: why can't we just feed the LLM ASCII, and let it figure out how it wants to encode that internally, __implicitly__? I.e., we just design a network and feed it ASCII, as opposed to figuring out an encoding in a separate step and feeding it tokens in that encoding.
> But we can feed humans ASCII, whereas LLMs require token inputs.
To be pedantic, we can't feed humans ASCII directly, we have to convert it to images or sounds first.
> My original question was about that: why can't we just feed the LLM ASCII, and let it figure out how it wants to encode that internally, __implicitly__? I.e., we just design a network and feed it ASCII, as opposed to figuring out an encoding in a separate step and feeding it tokens in that encoding.
That could be done, by having only 256 tokens, one for each possible byte, plus perhaps a few special-use tokens like "end of sequence". But it would be much less efficient.
Because each byte would be an embedding, instead of several bytes (a full word or part of a word) being a single embedding. The amount of time an LLM takes is proportional to the number of embeddings (or tokens, since each token is represented by an embedding) in the input, and the amount of memory used by the internal state of the LLM is also proportional to the number of embeddings in the context window (how far it looks back in the input).
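A back-of-the-envelope comparison (toy numbers; ~4 bytes per subword token is a rough average for English text):

    text_bytes = 4000                    # a ~4 KB prompt
    byte_tokens = text_bytes             # one token per byte
    subword_tokens = text_bytes // 4     # assumed ~4 bytes per subword token
    # Self-attention work grows roughly with the square of sequence length.
    print((byte_tokens / subword_tokens) ** 2)   # ~16x more attention compute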
This didn't need Microsoft's teeth to fail. There isn't a single "Linux" that game devs can build for. The kernel ABI isn't sufficient to run games, and Linux doesn't have any other stable ABI. The APIs are fragmented across distros, and the ABIs get broken regularly.
The reality is that for applications with visuals better than vt100, the Win32+DirectX ABI is more stable and portable across Linux distros than anything else that Linux distros offer.
I would like CPUs to move to the GPU model, because in CPU land, adoption of wider SIMD instructions (without manual dispatch/multiversioning faff) takes over a decade, while in GPU land it's a driver update.
To be clear, I'm talking about the PTX -> SASS compilation (which is something like LLVM bitcode to x86-64 microcode compilation). The fragmented and messy high-level shader language compilers are a different thing, in the higher abstraction layers.