
I'd like to see this for Gemini Pro 1.5 -- I threw the entirety of Moby Dick at it last week, and at one point every book Byung-Chul Han has ever published, and in both cases it was able to return the single part of a sentence that mentioned or answered my question verbatim, every single time, without any hallucinations.


A number of people in my lab do research into long-context evaluation of LLMs for works of fiction. The likelihood is very high that Moby Dick is in the training data, which is why people in my lab have instead turned to recently published books to avoid these issues.

See BooookScore (https://openreview.net/forum?id=7Ttk3RzDeu), which was just presented at ICLR last week, and FABLES (https://arxiv.org/abs/2404.01261), a recent preprint.


I suppose the question then is - if you finetune on your own data (eg internal wiki) does it then retain the near-perfect recall?

Could be a simpler setup than RAG for slow-changing documentation, especially for read-heavy cases.


"if you finetune on your own data (eg internal wiki) does it then retain the near-perfect recall"

No, that's one of the primary reasons for RAG.


I think you are misunderstanding. This post is about new capabilities in GPT-4o. So the existing reasons for RAG may not hold for the new model.

Unless you have some evals showing that the previous results justifying RAG also apply to GPT-4o?


I'm not involved in the space, but it seems to me that having a model, in particular a massive model, exposed to a corpus of text like a book in the training data would have very minimal impact. I'm aware that people have been able to return data 'out of the shadows' of the training data, but to my mind a model being mildly influenced by the weights between different words in this text hardly constitutes hard recall; if anything, it now 'knows' a little of the linguistic style of the author.

How far off am I?


It depends on how many times it had seen that text during training. For example, GPT-4 can reproduce ayats from the Quran word for word in both Arabic and English. It can also reproduce the Navy SEAL copypasta complete with all the typos.


Poe's "The Raven" also.


Brothers in username.. :-)


Remember, it's also trained on countless internet discussions and papers on the book.



But this content is presumably in its training set, no? I'd be interested if you did the same task for a collection of books published more recently than the model's last release.


To test this hypothesis, I just took the complete book "Advances in Green and Sustainable Nanomaterials" [0] and pasted it into the prompt, asking Gemini: "What absorbs thermal radiations and converts it into electrical signals?".

It replied: "The text indicates that graphene sheets present high optical transparency and are able to absorb thermal radiations with high efficacy. They can then convert these radiations into electrical signals efficiently.".

Screenshot of the PDF with the relevant sentence highlighted: https://i.imgur.com/G3FnYEn.png

[0] https://www.routledge.com/Advances-in-Green-and-Sustainable-...
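If you want to reproduce this kind of test yourself, here's a rough sketch of stuffing a whole book into a single Gemini prompt (assuming the google-generativeai Python SDK; the API key, file path, and prompt wording are placeholders, not exactly what I ran through the UI):

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder
    model = genai.GenerativeModel("gemini-1.5-pro")

    # The entire book as plain text; a few hundred thousand tokens fits
    # comfortably inside the 1M-token context window.
    book_text = open("book.txt", encoding="utf-8").read()

    question = "What absorbs thermal radiations and converts it into electrical signals?"
    response = model.generate_content([
        book_text,
        "Answer the following question using a verbatim quote from the text above: " + question,
    ])
    print(response.text)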


Ask it what material absorbs “infrared light” efficiently.

To me, that’s useful intelligence. I can already search text for verbatim matches, I want the AI to understand that “thermal radiations” and “infrared light” are the same thing.


> Answer the following question using verbatim quotes from the text above: "What material absorbs infrared light efficiently?"

> "Graphene is a promising material that could change the world, with unlimited potential for wide industrial applications in various fields... It is the thinnest known material with zero bandgaps and is incredibly strong, almost 200 times stronger than steel. Moreover, graphene is a good conductor of heat and electricity with very interesting light absorption properties."

Interestingly, the first sentence of the response actually occurs directly after the latter part of the response in the original text.

Screenshot from the document: https://i.imgur.com/5vsVm5g.png

Edit: asking it "What absorbs infrared light and converts it into electrical signals?" yields "Graphene sheets are highly transparent presenting high optical transparency, which absorbs thermal radiations with high efficacy and converts it into electrical signals efficiently." verbatim.


Fair point, but I also think something that's /really/ clear is that LLMs don't understand (and probably cannot). They're doing highly contextual text retrieval based on natural-language processing of the query; they're not understanding what the paper means and producing insights.


Honestly I think testing these on fiction books would be more impressive. The graphene thing I'm sure shows up in some research papers.


Gemini works with brand new books too; I've seen multiple demonstrations of it. I'll try hunting one down. Side note: this experiment is still insightful even using model training material. Just compare its performance with the uploaded book(s) to without.


I would hope that Byung-Chul Han would not be in the training set (at least not without his permission), given that he's still alive; not only is the legal question still open, it's also definitely rude.

This doesn't mean you're wrong, though.


It's pretty easy to confirm that copyrighted material is in the training data. See the NYT lawsuit against OpenAI, for example.


Part of that back-and-forth is the claim that "this specific text was copied a lot all over the internet, making it show up more in the output", which means it's not a useful guide to cases where a single copy was added to The Pile and not removed when training the model.

(Or worse, that Google already had a copy because of Google Books and didn't think "might training on this explode in our face like that thing with the Street View WiFi scanning?")


Just put the 2500 example linked in the article through Gemini 1.5 Flash and it answered correctly ("The tree has diseased leaves and its bark is peeling."): https://aistudio.google.com/
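If you'd rather script this kind of spot check than paste into AI Studio, here's a rough sketch (assuming the google-generativeai Python SDK; the filler passages and question are simplified stand-ins for the article's actual needle-in-a-needlestack prompts):

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder
    model = genai.GenerativeModel("gemini-1.5-flash")

    # Bury one "needle" fact among thousands of distractor passages.
    needle = "The tree has diseased leaves and its bark is peeling."
    haystack = [f"Filler passage {i}: nothing of interest happens here." for i in range(2500)]
    haystack.insert(1250, needle)  # drop the needle mid-context

    prompt = "\n".join(haystack) + "\n\nBased only on the text above, what is wrong with the tree?"
    response = model.generate_content(prompt)
    print(response.text)  # should mention the diseased leaves and peeling bark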


Interesting!


Wow. Cool. I have access to that model and have also seen some impressive context extraction. It also gave a really good summary of a large code base that I dumped in. I saw somebody analyze a huge log file, but we really need something like this needle-in-a-needlestack benchmark to help identify when models might be missing something. At the very least, it could give model developers a way to analyze their proposed models.


Funnily enough, I ran a 980k-token log dump through Gemini Pro 1.5 yesterday to investigate an error scenario. It found a single incident of a 429 error returned by a third-party API provider and reasoned that "based on the file provided and the information that this log file is aggregated of all instances of the service in question, it seems unlikely that a rate limit would be triggered, and additional investigation may be appropriate". It turned out the service had implemented a block against AWS IPs, breaking a system that loads press data from said API provider and leaving the affected customer without press data -- we didn't even notice or investigate that, and Gemini just randomly mentioned it without being prompted for it.


That definitely makes it seem like it's noticing a great deal of its context window. Impressive.


Man, we are like 2-5 years away from being able to feed in an ePub and get an accurate graphic novel version in minutes. I am so ready to look at four thousand paintings of Tolkien trees.


If I had access to Gemini with a reasonable token rate limit, I would be happy to test Gemini. I have had good results with it in other situations.


What version of Gemini is built into Google Workspace? (I just got the ability today to ask Gemini anything about emails in my work Gmail account, which seems like something that would require a large context window)


Such tasks don't need a large context window. Just good RAG.
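"Good RAG" here just means embedding the mailbox once, retrieving the handful of relevant emails per question, and putting only those into an ordinary-sized prompt. A minimal sketch of the retrieval step, with the embedding call left as a placeholder since any embedding model would do:

    import numpy as np

    def embed(text: str) -> np.ndarray:
        """Placeholder: call whatever embedding model you prefer (Gemini,
        OpenAI, a local sentence-transformer, ...) and return its vector."""
        raise NotImplementedError

    def build_index(emails: list[str]) -> np.ndarray:
        # Embed every email once, up front, and keep the matrix next to the raw texts.
        return np.vstack([embed(e) for e in emails])

    def retrieve(query: str, emails: list[str], index: np.ndarray, k: int = 5) -> list[str]:
        # Rank emails by cosine similarity to the query and keep the top k.
        q = embed(query)
        sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q) + 1e-9)
        return [emails[i] for i in np.argsort(sims)[::-1][:k]]

    # The few retrieved emails then go into a normal prompt, e.g.
    # "Answer the question using only these emails: ..." -- no giant context window needed.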


I operate several instances of magnetico with 150m+ torrents (several years' worth). Is there a way I could run Tribler on a server without a UI?


You could try https://www.coveapp.info/. It has a superior DHT indexer built in. It is intended as an end-user product though.


Maybe they needed a German company to receive money from the BND for their user data without the US knowing :-D

But in all seriousness, I’ve been a subscriber ever since they started and I’m an ultimate subscriber still, and I’d be sad if they went bankrupt due to mismanagement of the funds.


and here's the announcement by the Human Brain Project: https://www.humanbrainproject.eu/en/follow-hbp/news/2023/08/...


Sounds like Sidekick for Binary Ninja.


We're using HTMs for time series in our quant algorithms and they're performing pretty well; it's a shame that the approach is mostly ignored by ML scientists.


Oh interesting, Jeff Hawkins's HTM?


Are you hiring?


Did you remove Wirecard from your LinkedIn?


Are you thinking their post is indicating that they were part of the Wirecard audits? They're saying they've undergone similar audits.


No, since I did not work for Wirecard. But my resume is so bad, I wish I had Wirecard on it.


> Did you remove Wirecard from your LinkedIn?

[Not the person you replied to.]

Did the person you replied to work for Wirecard? What's the context for this question?


I was just curious and visited the LinkedIn profile that's linked to from the ctone.ws website (in KingOfCoders's profile) and was wondering why Wirecard was omitted.


Why on earth would you assume the OP was affiliated with Wirecard?


I believe the term is “hallucinations”. LLMs have fewer of them than many humans.


> As the responsible manager for IT (usually CTO - internal SOX was a different matter) I have been "asked"

I misread this as "responsible manager for IT [at Wirecard]", hence the question.


This is so embarrassing to read.


All of the PWAs on my iPhone running 17.4 now open in Safari instead of in fullscreen, and iOS itself warned me the first time I opened a PWA from the home screen after installing 17.4 that iOS will now open all "linked websites" in the "configured default browser".

They're obviously trying to prevent companies from bypassing their extortion proposal in response to the DMA by simply offering users a PWA that works around the "core tech fee".


Pretty sure that’s exactly what Steve Jobs decided [0] :-)

[0] https://www.wired.com/2010/04/steve-jobs-porn/


It’s open source software.

MacPaw lists Russian-developed software as a risk because the government can access your data at any time — this is self-hosted open-source software though.

The FSB can’t just access your local server with an arbitrary court order.

Therefore this doesn't feel like a legitimate concern but more like Russophobia, which I understand but also think is uncalled for, as I know firsthand how much Russian developers are suffering from the stupidity of their government.


You're swapping out your DNS for a Russian-controlled DNS service. Seems dumb IMO.


Russian-controlled? It runs on your network and it's open source. Where is the "Russian control" in this?

