radioactivist's comments

This is a comparison between a new, interactive medium (+ slides, mind maps, etc.) and a static PDF book as a control. How do we know that a non-AI-based interactive book wouldn't give similar (modest) increases in performance, without any of the personalization AI enables?


Thank you for this comment; it is exactly my impression of all of this as well.


At one point this states:

> Claude was also able to create a list of leaders with the Department of Energy Title17 credit programs, Exim DFC, and other federal credit programs that the team should interview. In addition, it created a list of leaders within Congressional Budget Office and the Office of Management and Budget that would be able to provide insights. See the demo here:

and then there is a video of them "doing" this. But the video basically just shows Claude responding with "I'm sorry, I can't do that, please look at their website/etc."

Am I missing something here?


It happens again in the next video. It says:

> The team came up with a use case the teaching team hadn’t thought of – using AI to critique the team’s own hypotheses. The AI not only gave them criticism but supported it with links from published scholars. See the demo here:

But the video just shows Claude giving some criticism and then telling them to go look at some journals and talk to experts (it doesn't give any references or specifics).


That was really weird. I did try this with ChatGPT 4o and it seemed to do a good job of creating this list. But I don't know anything about this field, so I can't say how accurate it is.


I'm not the person you're replying to, but in my subfield (scientist is such a broad term) I would say at least half of the key problems listed in the article are basically non-issues. Things really are quite different from field to field.


And in many subfields a preprint is freely available on arXiv during those three months.


OpenAI o4-mini-high

   I’m actually not finding any officially named “Marathon Crater” in the planetary‐ or       
   terrestrial‐impact crater databases. Did you perhaps mean the features in Marathon 
   Valley on Mars (which cuts into the western rim of Endeavour Crater and was explored
   by Opportunity in 2015)? Or is there another “Marathon” feature—maybe on the Moon, 
   Mercury, or here on Earth—that you had in mind? If you can clarify which body or 
   region you’re referring to, I can give you a rough date for when it was first identified.


Amazon's Nova Pro:

   Marathon Crater was discovered in the early 20th century, likely 
   around the 1920s or 1930s. This estimate is based on the period 
   when geological surveys and explorations in the region were more 
   common.

https://kagi.com/assistant/8f270eab-cba3-4f39-976c-113ea4392...


Likely around the 1920s or 1930s... Bro, not only is a crater discovery in those decades unlikely, but the date certainly wouldn't have gone unnoted.


> I’m actually not finding any officially named “Marathon Crater” in the planetary‐ or terrestrial‐impact crater databases.

So it lied about checking its databases? Or does that have search enabled? Maybe o4 has databases for this and isn't just a single model?


Raw model, or the chat product? The ChatGPT app has this integrated with search.


With the raw model I got:

   Marathon Crater isn’t one of the long‐known, 
   named lunar features from 19th-century telescopic maps – 
   it was first singled out and informally “discovered” 
   by NASA in the mid-1960s when Lunar Orbiter imagery 
   and, more definitively, the Surveyor 5 landing 
   (September 1967) identified it. So, very roughly, 
   think “around 1966–67.”


Small correction: the meow meow beans episode of Community aired in 2014, and the Nosedive episode of Black Mirror aired in 2016. So the Community episode came first.


Most of their categories have straightforward interpretations in terms of students using the tool to cheat. They don't seem to want to/care to analyze that further and determine which are really cheating and which are more productive uses.

I think that's a bit telling on their motivations (esp. given their recent large institutional deals with universities).


Indeed. I called out the second-top category, but you could look at the top category as well:

> We found that students primarily use Claude to create and improve educational content across disciplines (39.3% of conversations). This often entailed designing practice questions, editing essays, or summarizing academic material.

Sure, throwing a paragraph of an essay at Claude and asking it to turn it into a 3-page essay could have been categorized as "editing" the essay.

And it seems pretty naked the way they lump "editing an essay" in with "designing practice questions," which are clearly very different uses, even in the most generous interpretation.

I'm not saying that the vast majority of students do use AI to cheat, but I do want to say that, if they did, you could probably write this exact same article and tell no lies, and simply sweep all the cheating under titles like "create and improve educational content."


I think this comment is significantly more dismissive of science and scientists than the original comment was of AI.


Some hard problems have remained unsolved in basically every field of human interest for decades/centuries/millennia -- despite the number of intelligent people and/or resources that have been thrown at them.

I really don't understand the level of optimism that seems to exist for LLMs. And speculating that people "secretly hate LLMs" and "feel threatened by them" isn't an answer (frankly, when I see arguments that start with attacks like that, alarm bells start going off in my head).

