It sounded like there would be a big value unlock. Depends on your circumstances...

bandrami · 2026-06-01T11:10:16

The big manual task we haven't automated is going through documents and determining "is this sensitive enough to warrant information controls?" We may just be stuck with that in the way of things.

yunusabd · 2026-06-01T14:13:43

Just out of curiosity, why would the LLM need network access for this? I.e. feeding the doc to an LLM and asking "is this sensitive information according to these criteria: [...]" should get you there most of the way, no? Probably need a handful of (carefully designed) tool calls and a human in the loop somewhere, but it seems achievable.

bandrami · 2026-06-01T14:20:50

Because it needs to look up ITAR and NATO rules as well as current unilateral export restrictions and departmental guidance.

lazide · 2026-06-01T11:20:26

How would you expect an LLM to produce reasonable decisions on that anyway?

bandrami · 2026-06-01T12:22:14

"Do these documents contain models or descriptions of (list of devices redacted for HN), or personally identifying information?" would be a great question to be able to automate, since it sucks up a lot of time that could be more profitably spent doing other things. There are costs to both Type I and Type II errors, so deterministic filters only get us so far (which isn't very far).
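When Type I errors (benign documents flagged) and Type II errors (sensitive documents missed) have different costs, one common approach is to pick the decision threshold from those costs rather than using 50%. A minimal sketch, assuming the classifier returns a *calibrated* probability that a document is sensitive (an LLM's stated confidence often is not calibrated), and using hypothetical cost values and a hypothetical `review_band` that sends borderline scores to a human, in the spirit of the human in the loop suggested upthread:

```python
from dataclasses import dataclass


@dataclass
class Costs:
    # Hypothetical relative costs; real values depend on the organization.
    false_positive: float  # Type I: benign document flagged as sensitive
    false_negative: float  # Type II: sensitive document released without controls


def flag_threshold(costs: Costs) -> float:
    """Probability above which flagging has lower expected cost than releasing.

    Flag when p * c_FN > (1 - p) * c_FP, i.e. p > c_FP / (c_FP + c_FN).
    """
    return costs.false_positive / (costs.false_positive + costs.false_negative)


def route(p_sensitive: float, costs: Costs, review_band: float = 0.05) -> str:
    """Route one document given an estimated probability that it is sensitive.

    Scores within `review_band` of the threshold go to a human reviewer;
    the band width is an arbitrary illustrative choice.
    """
    t = flag_threshold(costs)
    if abs(p_sensitive - t) <= review_band:
        return "human_review"
    return "flag" if p_sensitive > t else "release"


if __name__ == "__main__":
    # Assumed: missing a sensitive document is 9x worse than a false alarm,
    # which puts the flagging threshold at 0.1 instead of 0.5.
    costs = Costs(false_positive=1.0, false_negative=9.0)
    for p in (0.01, 0.12, 0.5):
        print(p, route(p, costs))
```

Making misses more expensive pushes the threshold down, so more documents get flagged or reviewed; the review band is where a human's time buys the most.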

crisnoble · 2026-06-01T14:09:40

If it were incorrect 10% of the time, would it still be of help?

bandrami · 2026-06-01T14:21:50

Our pre-LLM system does better than that, but any improvement would help us do more lucrative things with our labor hours.

crisnoble · 2026-06-01T14:37:59

I'm left wondering: if it's such a critical task, how would even a 1% error rate let you avoid human review of every output?

lazide · 2026-06-01T16:42:02

Humans, of course, will screw up at least 1% of the time, at least when judged retroactively.

The fun part is, if you have non-trivial inputs, even if you don’t change anything, you’ll likely get a different 1% set of errors each time, no matter how perfect your judges are.

10% seems pretty high, but it really all depends on what you’re evaluating. If it’s all weird edge cases….
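To see why two runs at the same 1% error rate can miss mostly different items, here is a toy simulation. It is a sketch under a strong simplifying assumption: every item independently has the same 1% chance of being misjudged on each run. Under that assumption the expected overlap between two runs is only about 1% of 1% of the items. The function name `error_set` and the parameters are hypothetical.

```python
import random


def error_set(n_items: int, error_rate: float, rng: random.Random) -> set[int]:
    """Indices a judge gets wrong on one run.

    Assumes each item is misjudged independently with probability
    `error_rate` on every run (a simplifying assumption).
    """
    return {i for i in range(n_items) if rng.random() < error_rate}


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 100_000
run_a = error_set(n, 0.01, rng)
run_b = error_set(n, 0.01, rng)
overlap = run_a & run_b

# Each run misses roughly 1,000 items, but only a handful are shared.
print(len(run_a), len(run_b), len(overlap))
```

Real errors are usually correlated rather than independent: the weird edge cases tend to trip up every run, so the overlap in practice sits somewhere between this independent case and two identical error sets.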