I am struggling to imagine the frame of mind of someone who, when met with all this LLM progress in standardized test scores, infers that the tests are inadequate.
These tests (if not individually, at least in summation) represent some of society’s best gate-keeping measures for real positions of power.
This has been standard operating procedure in AI development forever: the instant a system passes some test, the goalposts move and people suddenly begin claiming it was a bad test all along.
You’ve added a technical constraint. I didn’t say arbitrary. Standardised tests are standard. The point is that a simple lookup is all you need. There are lots of interesting aspects to LLMs, but their ability to pass standardised tests means nothing for standardised testing.
You think that it’s being fed questions that it has a lookup table for? Have you used these models? They can answer arbitrary new questions. This newest model was tested against tests it hadn’t seen before. You understand that that isn’t a lookup problem, right?
The comment I replied to suggested that the author was fearful of what LLMs meant for the future because they can pass standardised tests. The point I’m making is that standardised tests are literally standardised for a reason: to test information retention in a standard way. They do not test intelligence.
Information retention and retrieval is a long-solved problem in technology; you could pass a standardised test using technology in dozens of different ways, from a lookup table to Google searches.
The fact that LLMs can complete a standardised test is interesting because it’s a demonstration of what they can do, but it has not one iota of impact on standardised testing! Standardised tests have been “broken” for decades; the tests and answers are often kept under lock and key because simply having access to the test in advance can make it trivial to pass. A standardised test is, in the end, just a list of questions.
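To make the lookup-table argument concrete, here is a minimal sketch (in TypeScript; the questions and answers are hypothetical placeholders, not real test content): given advance access to the test, “passing” reduces to pure retrieval.

```typescript
// Minimal sketch of the lookup-table argument: if the questions are
// known in advance, passing a standardised test reduces to retrieval.
// The questions and answers below are hypothetical placeholders.
const answerKey = new Map<string, string>([
  ["Question 1: ...", "B"],
  ["Question 2: ...", "D"],
]);

function takeTest(questions: string[]): string[] {
  // No reasoning involved: each question is answered by exact retrieval.
  return questions.map((q) => answerKey.get(q) ?? "no answer on file");
}

console.log(takeTest(["Question 1: ...", "Question 2: ..."]));
```

This is, of course, exactly why the tests are kept under lock and key: the whole scheme collapses once the question list leaks.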
I have no idea what you are talking about now. You claimed to be able to write a program that can pass the LSAT. Now it sounds like you think the LSAT is a meaningless test because it... has answers?
I suspect that your own mind is attempting to do a lookup on a table entry that doesn't exist.
The author of the original comment I replied to is scared for the future because GPT-4 passed the LSAT and other standardised tests — they described it as “terrifying”. The point I am making is that standardised tests are an invention to measure how people learn, through our best attempt at a metric: information retention. You cannot measure technology in the same way, because it’s an area where technology has been beating humans for decades — a spreadsheet will outperform a human on information retention. If you want to beat the LSAT with technology you can use any number of solutions; an LLM is not required. I could score 100% on the LSAT today if I were allowed to use my computer.
What’s interesting about LLMs is their ability to do things that aren’t standardised. The ability of an LLM to pass the LSAT is orders of magnitude less interesting than its ability to respond to novel questions, or to appear to engage in logical reasoning.
If you set aside the arbitrary meaning we’ve ascribed to “passing the LSAT”, then all the LSAT is, is a list of questions… some of the most practiced and most answered questions in the world. More has been written and read about the LSAT than about most other subjects, because there’s an entire industry dedicated to producing the perfect answers. It’s like celebrating Google’s ability to provide a result for “movies” — completely meaningless in 2023.
Standardised tests are the most uninteresting and uninspiring aspect of LLMs.
Anyway good joke ha ha ha I’m stupid ha ha ha. At least you’re not at risk of an LLM ever being able to author such a clever joke :)
If a person with zero legal training were to sit down in front of the LSAT, with all of the prep material and no time limit, are you saying that they wouldn’t pass?
We’re rapidly approaching problems (AP Calculus BC, etc) that are in the same order of magnitude of difficulty as “design and implement a practical self-improving AI architecture”.
Endless glib comments in this thread. We don’t know when the above prompt leads to takeoff. It could be soon.
And funnily enough, with the AI community’s dedication to research publications being open access, it has all the content it needs to learn this capability.
Since when was "design and implement a practical self-improving AI architecture" on the same level as knowing "the requisite concepts for getting Transformers working"?
This is such garbage logic. The semantics of that comment are irrelevant. Creating and testing AI node structures is well within the same ballpark. And even if it weren't, the entire insinuation of your comment is that the creation of AI is a task too hard for AI, or for any AI we can create anytime soon: a refutation of the feedback hypothesis. Well, that's completely wrong. On all levels.
We can't predict what is coming. I think it probably ends up making the experience of being a human worse, but I can't avert my eyes. Some amazing stuff has come, and will continue to come, from this direction of research.
I passed Calculus BC almost 20 years ago. All this time I could have been designing and implementing a practical self-improving AI architecture? I must really be slacking.
In the broad space of all possible intelligences, those capable of passing calc BC and those capable of building a self-improving AI architecture might not be that far apart.
Hey, I'm very concerned about AI and AGI and it is so refreshing to read your comments. Over the years I have worried about and warned people about AI, but there are astonishingly few people to be found who actually think something should be done, or even that anything is wrong. I believe that humanity stands a very good chance of saving itself through very simple measures. I believe, and I hope that you believe, that even if the best chance we had at saving ourselves was 1%, we should go ahead and at least try.
In light of all this, I would very much like to stay in contact with you. I've connected with one other HN user so far (jjlustig) and I hope to connect with more, so that together we can effect political change around this important issue. I've formed a Twitter account to do this, @stop_AGI. Whether or not you choose to connect, please do reach out to your state and national legislators (if in the US) and convey your concern about AI. It will be more valuable than you know.
That's a pretty unfair comparison. We know the answers to the problems in AP Calculus BC, whereas we don't even yet know whether answers are possible for a self-improving AI, let alone what they are.
Yeah, I'm not sure if the problem is moving goalposts so much as everyone has a completely different definition of the term AGI.
I do feel like GPT-4 is closer to a random person than that random person is to Einstein. I have no evidence for this, of course, and I'm not even sure what evidence would look like.
We also don't have much real research into actually trying it. And it doesn't have to be all-the-way self-replicating. It's more like using local materials to build the heavy parts of robots. Maybe those robots then couldn't build another one of themselves.
While ice is a somewhat extreme example, the idea of bringing over the electronics and using local materials to put together whatever structural components are needed isn't that crazy. It'd save a lot of mass, and things like 3d printers can produce them with reasonable precision. There already is a decent amount of research into the prospect of 3d printing structures with Lunar or Martian regolith, so structural components for robots or machines don't seem too crazy.
I don't remember the details, and I don't remember exactly where I heard an interview with somebody who works on this stuff.
They wouldn't use pure ice. But in cold places, ice mixed with some other materials can actually make quite a good material. Consider that in most places gravity is much lower than on Earth, so it doesn't need to be carbon fiber to be useful.
I know what you said; I just used carbon fibre to make a point: you don't always need the best materials to have something useful.
Yes, things still have mass, but if you are building a robot that moves around, there is a big difference in the quality of structural materials you need for the robot to be viable.
Part of the research that would go into such a project would be to look at what the local resources are, and how to turn them into useful materials. For example, ice in combination with some filler material has been shown to be quite usable at cold temperatures.
The exact materials you would use depend on where you want to deploy this kind of system. Maybe in the far future these kinds of systems would look around, analyse the environment, and make smart choices about what materials to use to build themselves.
You can use Flutter as an imperative framework. Just ignore widgets (the reactive top layer of Flutter) and use the underlying imperative elements directly.
Well, I'd say the main value proposition of Qt/Delphi/wxWidgets/WinForms/etc. is a rich set of widgets/controls, including their event handling and layout mechanisms. Is such stuff available in the imperative impl of Flutter?
1. Humans have not been eating cows produced by modern methods for millions of years. Cow diet, genetics, and lifecycle are quite unnatural at this point. (This is the overwhelming bulk of cow consumption; of course, your local micro farm produces cows more similar to ancient cows.)
2. Humans have never before in history eaten cow (or meat generally) on this scale.
3. Even despite the above, we genuinely don’t know to what extent baseline human health is dependent on traditional diet. It’s not impossible that there exists a modern radical diet that greatly improves health and longevity without including any “natural” foods.
Our product works by taking a screenshot using a headless Chrome instance. In this case, it's helpful because we can look at not just the status code of the HTTP request to the page itself, but also any resources the page may fetch. This is particularly useful for SPAs, since they may return a 200 for the page itself, but an API call they make might return a non-200 when logged out.
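As a rough sketch of how this kind of check can be done with Puppeteer (the product's actual implementation may well differ, and the URL below is a placeholder): load the page in headless Chrome, record the status of every response it triggers, and take a screenshot.

```typescript
// Rough sketch: load a page in headless Chrome, record the status of
// every response it triggers, and screenshot it. Uses Puppeteer; the
// URL passed in below is a placeholder, not a real endpoint.
import puppeteer from "puppeteer";

async function checkPage(url: string) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const failures: { url: string; status: number }[] = [];
  page.on("response", (response) => {
    const status = response.status();
    // A 200 on the document itself can hide a failing API call,
    // so track every sub-resource, not just the main request.
    if (status >= 400) {
      failures.push({ url: response.url(), status });
    }
  });

  await page.goto(url, { waitUntil: "networkidle0" });
  await page.screenshot({ path: "snapshot.png" });
  await browser.close();
  return failures;
}

checkPage("https://example.com/app").then((failures) => {
  console.log(failures.length ? failures : "all resources returned OK");
});
```

Listening on the response event is what catches an SPA's background API calls, so a logged-out page that renders with a 200 can still register as a failure.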
Discussion about the relevance of Liu Cixin’s The Three-Body Problem trilogy to US defense strategy with respect to Chinese perspectives.
Two things I found interesting from reading this journal article:
1. There exists an academic community of US defense strategists that publishes open-access material regarding strategy. I didn't know such a community existed.
2. That community sees value in understanding potential Chinese perspectives regarding military strategy through published popular fiction.