
I'd like to see this for Gemini Pro 1.5 -- I threw the entirety of Moby Dick at it last week, and at one point every book Byung-Chul Han has ever published, and in both cases it was able to return the single part of a sentence that mentioned or answered my question verbatim, every single time, without any hallucinations.


A number of people in my lab do research into long-context evaluation of LLMs for works of fiction. The likelihood is very high that Moby Dick is in the training data, which is why people in my lab have instead turned to recently published books to avoid these issues.

See BooookScore (https://openreview.net/forum?id=7Ttk3RzDeu), which was just presented at ICLR last week, and FABLES (https://arxiv.org/abs/2404.01261), a recent preprint.


I suppose the question then is - if you finetune on your own data (eg internal wiki) does it then retain the near-perfect recall?

Could be a simpler setup than RAG for slow-changing documentation, especially for read-heavy cases.


"if you finetune on your own data (eg internal wiki) does it then retain the near-perfect recall"

No, that's one of the primary reasons for RAG.


I think you are misunderstanding. This post is about new capabilities in GPT-4o. So the existing reasons for RAG may not hold for the new model.

Unless you have some evals showing that the previous results justifying RAG also apply to GPT-4o?


I'm not involved in the space, but it seems to me that having a model, in particular a massive model, exposed to a corpus of text like a book in the training data would have very minimal impact. I'm aware that people have been able to return data 'out of the shadows' of the training data, but to my mind a model being mildly influenced by the weights between different words in this text hardly constitutes hard recall; if anything, it now 'knows' a little of the linguistic style of the author.

How far off am I?


It depends on how many times it had seen that text during training. For example, GPT-4 can reproduce ayats from the Quran word for word in both Arabic and English. It can also reproduce the Navy SEAL copypasta complete with all the typos.


Poe's "The Raven" also.


Brothers in username.. :-)


Remember, it's also trained on countless internet discussions and papers on the book.



But this content is presumably in its training set, no? I'd be interested if you did the same task for a collection of books published more recently than the model's last release.


To test this hypothesis, I just took the complete book "Advances in Green and Sustainable Nanomaterials" [0] and pasted it into the prompt, asking Gemini: "What absorbs thermal radiations and converts it into electrical signals?".

It replied: "The text indicates that graphene sheets present high optical transparency and are able to absorb thermal radiations with high efficacy. They can then convert these radiations into electrical signals efficiently.".

Screenshot of the PDF with the relevant sentence highlighted: https://i.imgur.com/G3FnYEn.png

[0] https://www.routledge.com/Advances-in-Green-and-Sustainable-...
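If you want to reproduce this kind of test yourself, here's a rough sketch of stuffing a whole book into a single Gemini prompt (assuming the google-generativeai Python SDK; the API key, file path, and prompt wording are placeholders, not exactly what I ran through the UI):

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder
    model = genai.GenerativeModel("gemini-1.5-pro")

    # The entire book as plain text; a few hundred thousand tokens fits
    # comfortably inside the 1M-token context window.
    book_text = open("book.txt", encoding="utf-8").read()

    question = "What absorbs thermal radiations and converts it into electrical signals?"
    response = model.generate_content([
        book_text,
        "Answer the following question using a verbatim quote from the text above: " + question,
    ])
    print(response.text)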


Ask it what material absorbs “infrared light” efficiently.

To me, that’s useful intelligence. I can already search text for verbatim matches, I want the AI to understand that “thermal radiations” and “infrared light” are the same thing.


> Answer the following question using verbatim quotes from the text above: "What material absorbs infrared light efficiently?"

> "Graphene is a promising material that could change the world, with unlimited potential for wide industrial applications in various fields... It is the thinnest known material with zero bandgaps and is incredibly strong, almost 200 times stronger than steel. Moreover, graphene is a good conductor of heat and electricity with very interesting light absorption properties."

Interestingly, the first sentence of the response actually occurs directly after the latter part of the response in the original text.

Screenshot from the document: https://i.imgur.com/5vsVm5g.png

Edit: asking it "What absorbs infrared light and converts it into electrical signals?" yields "Graphene sheets are highly transparent presenting high optical transparency, which absorbs thermal radiations with high efficacy and converts it into electrical signals efficiently." verbatim.


Fair point, but I also think something that's /really/ clear is that LLMs don't understand (and probably cannot). They're doing highly contextual text retrieval based on natural-language processing of the query; they're not understanding what the paper means and producing insights.


Honestly I think testing these on fiction books would be more impressive. The graphene thing I'm sure shows up in some research papers.


Gemini works with brand new books too; I've seen multiple demonstrations of it. I'll try hunting one down. Side note: this experiment is still insightful even using model training material. Just compare its performance with the uploaded book(s) to without.


I would hope that Byung-Chul Han would not be in the training set (at least not without his permission), given that he's still alive; not only is the legal question still open, it's also definitely rude.

This doesn't mean you're wrong, though.


It's pretty easy to confirm that copyrighted material is in the training data. See the NYT lawsuit against OpenAI, for example.


Part of that back-and-forth is the claim that "this specific text was copied a lot all over the internet, making it show up more in the output", which means it's not a useful guide to cases where a single copy was added to The Pile and not removed when training the model.

(Or worse, that Google already had a copy because of Google Books and didn't think "might training on this explode in our face like that thing with the Street View WiFi scanning?")


Just put the 2500 example linked in the article through Gemini 1.5 Flash and it answered correctly ("The tree has diseased leaves and its bark is peeling."): https://aistudio.google.com/
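If you'd rather script this kind of spot check than paste into AI Studio, here's a rough sketch (assuming the google-generativeai Python SDK; the filler passages and question are simplified stand-ins for the article's actual needle-in-a-needlestack prompts):

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder
    model = genai.GenerativeModel("gemini-1.5-flash")

    # Bury one "needle" fact among thousands of distractor passages.
    needle = "The tree has diseased leaves and its bark is peeling."
    haystack = [f"Filler passage {i}: nothing of interest happens here." for i in range(2500)]
    haystack.insert(1250, needle)  # drop the needle mid-context

    prompt = "\n".join(haystack) + "\n\nBased only on the text above, what is wrong with the tree?"
    response = model.generate_content(prompt)
    print(response.text)  # should mention the diseased leaves and peeling bark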


Interesting!


Wow. Cool. I have access to that model and have also seen some impressive context extraction. It also gave a really good summary of a large code base that I dumped in. I saw somebody analyze a huge log file, but we really need something like this needle-in-a-needlestack benchmark to help identify when models might be missing something. At the very least, it could give model developers a way to analyze their proposed models.


Funnily enough, I ran a 980k-token log dump through Gemini Pro 1.5 yesterday to investigate an error scenario. It found a single incident of a 429 error returned by a third-party API provider and reasoned that "based on the file provided and the information that this log file is aggregated of all instances of the service in question, it seems unlikely that a rate limit would be triggered, and additional investigation may be appropriate". It turned out the service had implemented a block against AWS IPs, breaking a system that loads press data from said API provider and leaving the affected customer without press data -- we didn't even notice or investigate that, and Gemini just randomly mentioned it without being prompted for it.


That definitely makes it seem like it's noticing a great deal of its context window. Impressive.


Man, we are like 2-5 years away from being able to feed in an ePub and get an accurate graphic novel version in minutes. I am so ready to look at four thousand paintings of Tolkien trees.


If I had access to Gemini with a reasonable token rate limit, I would be happy to test Gemini. I have had good results with it in other situations.


What version of Gemini is built into Google Workspace? (I just got the ability today to ask Gemini anything about emails in my work Gmail account, which seems like something that would require a large context window)


Such tasks don't need a large context window. Just good RAG.
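"Good RAG" here just means embedding the mailbox once, retrieving the handful of relevant emails per question, and putting only those into an ordinary-sized prompt. A minimal sketch of the retrieval step, with the embedding call left as a placeholder since any embedding model would do:

    import numpy as np

    def embed(text: str) -> np.ndarray:
        """Placeholder: call whatever embedding model you prefer (Gemini,
        OpenAI, a local sentence-transformer, ...) and return its vector."""
        raise NotImplementedError

    def build_index(emails: list[str]) -> np.ndarray:
        # Embed every email once, up front, and keep the matrix next to the raw texts.
        return np.vstack([embed(e) for e in emails])

    def retrieve(query: str, emails: list[str], index: np.ndarray, k: int = 5) -> list[str]:
        # Rank emails by cosine similarity to the query and keep the top k.
        q = embed(query)
        sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q) + 1e-9)
        return [emails[i] for i in np.argsort(sims)[::-1][:k]]

    # The few retrieved emails then go into a normal prompt, e.g.
    # "Answer the question using only these emails: ..." -- no giant context window needed.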


I operate several instances of magnetico with 150m+ torrents (several years' worth). Is there a way I could run Tribler on a server without a UI?


You could try https://www.coveapp.info/. It has a superior DHT indexer built in. It is intended as an end-user product though.


Maybe they needed a German company to receive money from the BND for their user data without the US knowing :-D

But in all seriousness, I’ve been a subscriber ever since they started and I’m an ultimate subscriber still, and I’d be sad if they went bankrupt due to mismanagement of the funds.


and here's the announcement by the Human Brain Project: https://www.humanbrainproject.eu/en/follow-hbp/news/2023/08/...


Sounds like Sidekick for Binary Ninja.


We're using HTMs for time series in our quant algorithms and they're performing pretty well; it's a shame that the approach is mostly ignored by ML scientists.


Oh interesting, Jeff Hawkins's HTM?


Are you hiring?


Did you remove Wirecard from your LinkedIn?


Are you thinking their post is indicating that they were part of the Wirecard audits? They're saying they've undergone similar audits.


No, since I did not work for Wirecard. But my resume is so bad, I wish I had Wirecard on it.


> Did you remove Wirecard from your LinkedIn?

[Not the person you replied to.]

Did the person you replied to work for Wirecard? What's the context for this question?


I was just curious and visited the LinkedIn profile that's linked to from the ctone.ws website (in KingOfCoders's profile) and was wondering why Wirecard was omitted.


Why on earth would you assume the OP was affiliated with Wirecard?


I believe the term is “hallucinations”. LLMs have fewer of them than many humans.


> As the responsible manager for IT (usually CTO - internal SOX was a different matter) I have been "asked"

I misread this as "responsible manager for IT [at Wirecard]", hence the question.


This is so embarrassing to read.


All of the PWAs on my iPhone running 17.4 now open in Safari instead of in fullscreen, and iOS itself warned me the first time I opened a PWA from the home screen after installing 17.4 that iOS will now open all "linked websites" in the "configured default browser".

They're obviously trying to prevent companies from bypassing their extortion proposal in response to the DMA by simply offering users a PWA that works around the "core tech fee".


Pretty sure that’s exactly what Steve Jobs decided [0] :-)

[0] https://www.wired.com/2010/04/steve-jobs-porn/


It’s open source software.

MacPaw lists Russian-developed software as a risk because the government can access your data at any time — this is self-hosted open-source software though.

The FSB can’t just access your local server with an arbitrary court order.

Therefore this doesn't feel like a legitimate concern but more like Russophobia, which I understand but also think is uncalled for, as I know firsthand how much Russian developers are suffering from the stupidity of their government.


You're swapping out your DNS for a Russian-controlled DNS service. Seems dumb IMO.


Russian-controlled? It runs on your network and it's open source. Where is the "Russian control" in this?

