I was actually in the feedback form a few days ago (just to leave a quick rant about what an absolute mess this 'liquid glass' iOS update is, after I sadly chose to update...)
And look what happens: it turns out that if you switch app focus and go back to the app (which is now immediately focused on the text input), your on-screen keyboard becomes pretty much unreadable (white on light grey...)
The feedback is probably going straight to the bin, but I couldn't help myself and filed a second bug with a screenshot of the first feedback submission showing the broken keyboard.
Screenshot of this absolute disaster: https://imgur.com/a/CyXiVy2
Same here. While it's a bit jarring the first time you see it, I now consider this a feature instead of a bug.
Maybe it could be styled a bit differently so the search bar is more prominent and in the center of the screen, but just having a search bar without any distractions is a fantastic feature.
If you want a pure search, you can get there from your address bar without even visiting the site first. Or if you really want the search bar on the site, you can go to youtube.com/search for a nice blank page.
You don't need the obnoxious refusal to show videos on the front page.
In real life, if someone with an administrative job were to punch 50 * 3,000 into a calculator and not notice that the answer 1,500,000 is wrong (a typo, presumably an extra zero in the input; the correct answer is 150,000), I would consider them most definitely at fault. Similarly, I know some structural engineers who will notice that something went wrong with the input if an answer is not within the expected range.
A calculator can be used to do things you already know how to do _faster_, imho, but in most jobs it still requires you to at least somewhat understand what is happening under the hood. The same principle applies to using LLMs at work: you can use them to do stuff you know how to do faster, but if you don't understand the material there's no way you can evaluate the LLM's answer, and you will be at fault when there's AI slop in your output.
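To make the 'answer not within a given range' idea concrete, here's a minimal sketch of that kind of order-of-magnitude sanity check, applied to the 50 * 3,000 example above (my own illustration, not anyone's actual workflow; the function name and the factor of 10 are arbitrary):

    # Illustrative sketch only, not from the original comment.
    def plausible(result, rough_estimate, factor=10):
        # Flag the result as suspicious if it is more than `factor` times
        # larger or smaller than the back-of-the-envelope estimate.
        return rough_estimate / factor < result < rough_estimate * factor

    estimate = 50 * 3_000                    # 150,000 -- easy enough to do in your head
    typo_result = 50 * 30_000                # 1,500,000 -- one extra zero in the input

    print(plausible(150_000, estimate))      # True: matches the mental estimate
    print(plausible(typo_result, estimate))  # False: off by 10x, worth re-checking the input

The same habit carries over to LLM output: you need enough of a mental model of the expected answer to notice when the result is off by an order of magnitude.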
eta: Maybe it would be possible to design labs with LLMs in such a way that you teach students how to evaluate the LLM's answer? This would require them to have knowledge of the underlying topic. That's probably possible with specialized tools / LLM prompts, but it's not going to help against them using a generic LLM like ChatGPT or a cheating tool that feeds into a generic model.
> Maybe it would be possible to design labs with LLMs in such a way that you teach students how to evaluate the LLM's answer? This would require them to have knowledge of the underlying topic. That's probably possible with specialized tools / LLM prompts, but it's not going to help against them using a generic LLM like ChatGPT or a cheating tool that feeds into a generic model.
What you are describing is that they should only use the LLM after they already know the topic. A dilemma.
Yeah, I kinda like the method siscia suggests downthread [0], where the teacher grades based on the questions students ask the LLM during the test.
I think you should be able to use the LLM at home to help you better understand the topic (they have endless patience, and you can usually keep asking until you actually grok it), but during the test I think it's fair to expect that basic understanding to be there.
I know a teacher who basically only uses open questions, but since everything is digital nowadays students just use tools like Cluely [0] that run in the background and provide answers.
Since the testing tool they use does detect and register 'paste' events, they've resorted to simply assigning 0 points to every answer that was pasted.
A few of us have been telling her to move to in-class testing etc., but as you also noted, everything in the school organization pushes for teaching productivity, so this requires convincing management / the school board etc., which is a slow(er) process.
Oxide and Friends recently had a podcast episode [0] with Michael Littman about this, for anyone who's curious.
This topic has been an interesting part of the discourse in my group of friends over the past few weeks, because one of us is a teacher who has to deal with this on an almost daily basis. She is struggling to get her students to not cheat, and the options available to her are limited (yes, physical monitoring would probably work, but it requires concessions from school management etc.; it's not something with an easy or quick fix).
I was evaluating Codex vs Claude Code over the past month, and GPT-5.1 Codex being slow was just the default experience I had with it.
The answers were mostly on par (though different in style which took some getting used to) but the speed was a big downer for me. I really wanted to give it an honest try but went back to Claude Code within two weeks.
While it's of course a good thing to be critical, the author did provide some more context on the why and how of doing it with LLMs on the Hard Fork podcast today [0]: mostly as a way to see how these models _can_ help them with these tasks.
I would recommend listening to their explanation; maybe it'll give more insight.
Disclosure: after listening to the podcast and looking up and reading the article, I emailed @dang to suggest it for the HN second-chance pool. I'm glad more people enjoyed it.
Do you have a source for this claim of multiple past breaches? The only one I know of is the Okta breach.
For me they're still firmly in the 'one of the best options out there' category, because the cross-platform usability is incredibly good imho. I will admit it's been quite a while since I migrated from KeePass, so maybe these other options have improved too.
The best I can tell you (from working with LLMs) is that... it's complicated.
There are moments where spending 10 min on a good prompt saves me 2 hrs of typing, and it finishes in the time it takes me to go make myself a cup of coffee (~10 min). Those are the good moments.
Then there are moments where it's more like 30 min savings for 10 min of prompting. Those are still pretty good.
Then there are plenty of moments where spending 10 min on a prompt saves me about 15 min of work. But I have to wait 5 min for the result, so it ends up being a wash, except with the downside that I didn't really write it myself, so the actual details of the solution aren't fully internalized.
There are also plenty of moments where the result at first glance looks good or even great, but once I start reviewing and fixing things it still ends up being a wash.
I actually find it quite difficult to judge the quality of the result, because at first glance it always looks pretty decent. Sometimes, once I start reviewing, it really is decent; other times I'm like "well, it needs some tweaking" and subsequently spend an hour tweaking.
Now I think the problem is that the response is akin to gambling / conditioning, in a sense. Every prompt has a smallish chance of triggering a great result, and since the average result is still about 25% faster (my gut feeling, based on what I've 'written' over the last few months working with Claude Code), it's just very tempting to pull that slot-machine lever, even on tasks that I know I would most likely type faster than I can prompt.
I did find a place where (to me, at least) it almost certainly adds value: I find it difficult to think about code during meetings (I really need my attention in the meetings I attend), but I can send a few quick prompts for small stuff during a meeting without really having to context switch. That alone is a decent productivity booster. Refactorings that would've been a 'maybe, one day' can now just be kicked off. Best case, I spend 10 minutes reviewing and accept it. Worst case, I just throw it away.