This is a big aha moment for me. If Gemini can do semantic chunking at the same ...

wussboy · 2025-02-05T23:19:50 1738797590

Could it do exactly the same with a web page? Would this replace something like beautiful soup?

eitally · 2025-02-06T05:27:04 1738819624

I don't know exactly how or what it's doing behind the scenes, but I've been massively impressed with the results Gemini's Deep Research mode has generated, including both traditional LLM freeform & bulleted output, but also tabular data that had to come from somewhere. I haven't tried cross-checking for accuracy but the reports do come with linked sources; my current estimation is that they're at least as good as a typical analyst at a consulting firm would create as a first draft.

fallinditch · 2025-02-05T20:53:51 1738788831

If I used Gemini 2.0 for extraction and chunking to feed into a RAG that I maintain on my local network, then what sort of locally-hosted LLM would I need to gain meaningful insights from my knowledge base? Would a 13B parameter model be sufficient?

jhoechtl · 2025-02-05T21:55:28 1738792528

Ypur lovalodel has littleore to do but stitch the already meaningzl pieces together.

The pre-step, chunking and semantic understanding is all that counts.

yeahwhatever10 · 2025-02-06T00:36:22 1738802182

Do you get meaningful insights with current RAG solutions?

fallinditch · 2025-02-06T02:47:24 1738810044

Yes. For example, to create AI agent 'assistants' that can leverage a local RAG in order to assist with specialist content creation or operational activities.

potatoman22 · 2025-02-05T19:38:30 1738784310

Small point but is it doing semantic chunking, or loading the entire pdf into context? I've heard mixed results on semantic chunking.

panarky · 2025-02-05T19:52:37 1738785157

It loads the entire PDF into context, but then it would be my job to chunk the output for RAG, and just doing arbitrary fixed-size blocks, or breaking on sentences or paragraphs is not ideal.

So I can ask Gemini to return chunks of variable size, where each chunk is a one complete idea or concept, without arbitrarily chopping a logical semantic segment into multiple chunks.

thelittleone · 2025-02-05T21:26:38 1738790798

Fixed size chunks is holding back a bunch of RAG projects on my backlog. Will be extremely pleased if this semantic chunking solves the issue. Currently we're getting around an 78-82% success on fixed size chunked RAG which is far too low. Users assume zero results on a RAG search equates to zero results in the source data.

refulgentis · 2025-02-05T21:46:34 1738791994

FWIW, you might be doing it / ruled it out already:

- BM25 to eliminate the 0 results in source data problem

- Longer term, a peek at Gwern's recent hierarchical embedding article. Got decent early returns even with fixed size chunks

thelittleone · 2025-02-05T21:53:56 1738792436

Much appreciated.

For others interested in BM25 for the use case above, I found this thread informative.

https://news.ycombinator.com/item?id=41034297

mediaman · 2025-02-05T22:10:22 1738793422

Agree, BM25 honestly does an amazing job on its own sometimes, especially if content is technical.

We use it in combination with semantic but sometimes turn off the semantic part to see what happens and are surprised with the robustness of the results.

This would work less well for cross-language or less technical content, however. It's great for acronyms, company or industry specific terms, project names, people, technical phrases, and so on.

jacobr1 · 2025-02-05T23:43:54 1738799034

Also consider methods that are using reasoning to potentially dispatch additional searches based on analysis of the returned data

nnurmanov · 2025-02-06T04:09:13 1738814953

This is my problem as well; do you have lots of documents?

Tostino · 2025-02-05T22:54:17 1738796057

I wish we had a local model for semantic chunking. I've been wanting one for ages, but haven't had the time to make a dataset and finetune that task =/.

hattmall · 2025-02-06T02:32:29 1738809149

It's cheap now because Google is subsidizing it, no?

vrosas · 2025-02-06T02:45:22 1738809922

Spoiler: every model is deeply, deeply subsidized. At least Google's is subsidized by a real business with revenue, not VC's staring at the clock.

panarky · 2025-02-06T04:05:52 1738814752

It's cheap because it's a Flash model, far smaller and much less compute for inference, runs on TPUs instead of GPUs.