Joshua Foust, a former Defense Intelligence Agency analyst, pointed out that the NSA performs about 240 million database searches per year. Noting that the agency reported 2,776 violations of privacy rules in a recent one-year period, he calculated an error rate of "about 0.001156666667 percent."
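Foust's figure is just the violation count divided by the search count, expressed as a percentage. A quick check in Python, using only the two numbers quoted above:

```python
# Figures as quoted in the article above.
SEARCHES_PER_YEAR = 240_000_000
VIOLATIONS_PER_YEAR = 2_776

def error_rate_percent(violations: int, searches: int) -> float:
    """Violations per search, expressed as a percentage."""
    return violations / searches * 100

rate = error_rate_percent(VIOLATIONS_PER_YEAR, SEARCHES_PER_YEAR)
print(f"{rate:.12f} percent")
```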
This sounds very nice...
One U.S. official, for example, told reporters on a conference call that about 56,000 communications of Americans were inadvertently intercepted each year before Bates shuttered the program. The official called that a "relatively small number."
Until you realise that just one of those reported violations could be talking about 56,000 records.
If I were to perform the same sort of crimes against statistics as Mr Foust, I could multiply his error rate by the 56,000 that the other official calls a relatively small number, but that would give an error rate of over 64%.
So either that number of 56,000 is actually quite big compared to most violations, or there are far more database lookups than reported, or the NSA are spending most of their time looking at the largest source of data, or all of these figures are nonsense as they come from an organisation that is, pretty much by definition, not allowed to give out accurate figures.
Note: edited to fix a mistake by my stupid idiot brain. I confused 56,000 with 65,000 halfway through and ended up with 75% instead of 64%. Not that it matters that much.
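The tongue-in-cheek step above treats each of the 56,000 intercepted communications as if it scaled Foust's per-search rate. A sketch of that (admittedly invalid) arithmetic, which also shows where the mistyped 65,000 leads:

```python
# Foust's per-search error rate, in percent (2,776 violations / 240 million searches).
FOUST_RATE_PERCENT = 2_776 / 240_000_000 * 100

def scaled_rate_percent(records: int) -> float:
    """Scale the per-search rate by a record count.

    This deliberately reproduces the "crime against statistics": it multiplies
    a per-search rate by a per-record count, so the result has no real meaning.
    """
    return FOUST_RATE_PERCENT * records

print(f"{scaled_rate_percent(56_000):.2f}%")  # the figure used in the comment
print(f"{scaled_rate_percent(65_000):.2f}%")  # the mixed-up figure from the edit note
```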
Not necessarily - it could also mean 2,776 database searches, each returning an average of just over 20 communications per violation. (Correct me if my math is wrong.)
(2,776 violations per year) / (.00001156666667 violations per search) = 240,000,000 total searches per year
(56,000 communications) / (2,776 searches) = average 20.172910663 records returned per search
(240,000,000 total searches) * (20.172910663 records per search) = 4,841,498,559 total results per year
This ignores things like duplicate results and upper/lower bounds on the number of search results. Theoretically, the duplicate results would only factor into the non-violating result set, since any two queries that returned a US communication would both be a violation (although that might inflate the upper bound on the search result set). It also ignores the possibility that a violating search could return both violating and non-violating results; in that case the average number of records per search would be higher, but the total number of American communications would not, resulting in a larger set of total results.
Assuming that those numbers are somewhere in the ballpark, 56 thousand records of American communications out of at least 4.8 billion total records is a far cry from the 64% error rate that you proposed.
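The three steps and the final comparison can be reproduced in a few lines of Python, using only the figures quoted in the thread; following the comment, the search total is rounded to 240,000,000 before it is reused. Note that the final share comes out at Foust's original rate, because the 56,000 cancels: 56,000 / (240,000,000 × 56,000 / 2,776) = 2,776 / 240,000,000.

```python
VIOLATIONS = 2_776
RATE = 0.00001156666667   # violations per search, as a fraction (not a percent)
AMERICAN_COMMS = 56_000

# Step 1: back out the number of searches from the rate (rounded, as in the comment).
total_searches = round(VIOLATIONS / RATE)                # 240,000,000
# Step 2: average records per violating search.
records_per_violation = AMERICAN_COMMS / VIOLATIONS     # ~20.17
# Step 3: scale up, assuming every search returns that many records on average.
total_results = total_searches * records_per_violation  # ~4.84 billion

# Share of all returned records that were American communications.
share_percent = AMERICAN_COMMS / total_results * 100

print(f"{total_searches:,} searches")
print(f"{records_per_violation:.9f} records per search")
print(f"{round(total_results):,} total results")
print(f"{share_percent:.6f}% American communications")
```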
I think Foust's larger point is fair. Based on what we know so far, the NSA's excesses appear to be mostly inadvertent or perhaps careless. I don't mean to minimize it -- even a single instance of violating someone's Constitutional rights is a big deal... but it's still a different beast than Watergate-era programs specifically designed to infiltrate and surveil American anti-war activists.
One current federal prosecutor learned how agents were using SOD tips after a drug agent misled him, the prosecutor told Reuters. In a Florida drug case he was handling, the prosecutor said, a DEA agent told him the investigation of a U.S. citizen began with a tip from an informant. When the prosecutor pressed for more information, he said, a DEA supervisor intervened and revealed that the tip had actually come through the SOD and from an NSA intercept.
"I was pissed," the prosecutor said. "Lying about where the information came from is a bad start if you're trying to comply with the law because it can lead to all kinds of problems with discovery and candor to the court." The prosecutor never filed charges in the case because he lost confidence in the investigation, he said.
When they say "queries" -- are they referring to those initiated by humans, or to those run algorithmically by systems?
I'd bet that there are vastly more informational transactions that happen algorithmically than the ones they refer to, and thus all these numbers are pointless.
Wouldn't any algorithmic queries ultimately also be initiated by a human? Unless we're dealing with Skynet, some human or group of humans would have had to come up with a question time-consuming enough to require automated analysis, code up the algorithm to process the data, run the algorithm and ultimately review the results. Any violations should be attributable to an actual human.
>Wouldn't any algorithmic queries ultimately also be initiated by a human?
Not necessarily; assume you have a flow that comes in. It will be parsed for innumerable phrases, phonemes, etc., and in the case of voice it will be transcribed to text and stored, and that text will also be parsed for triggers.
Thus, it is conceivable that every single ingress communications flow to the NSA is evaluated algorithmically, and not necessarily viewed, read or heard by a human.
So while their figure of some N erroneous searches may be accurate, it is irrelevant, as 100% of all ingress comms are already being parsed at some level.
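A minimal sketch of the kind of pipeline described above, assuming simple case-insensitive substring matching on message text (or on a voice transcript). The `Message` type, the `TRIGGERS` set and both functions are hypothetical illustrations, not anything the NSA has described. What it shows: `scanned` counts every message that arrives, while only flagged messages would ever reach a human, so a count of human-initiated "queries" says nothing about how much was scanned.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Message:
    sender: str
    body: str  # message text, or a speech-to-text transcript for voice

# Hypothetical trigger phrases, purely for illustration.
TRIGGERS = {"example trigger", "another phrase"}

def scan(message: Message, triggers: Iterable[str]) -> list[str]:
    """Return the trigger phrases found in a message (case-insensitive)."""
    text = message.body.lower()
    return sorted(t for t in triggers if t in text)

def process_stream(messages: Iterable[Message], triggers: Iterable[str]):
    """Evaluate every incoming message automatically.

    No human issues a query here: each message is scanned as it arrives,
    and only messages with a hit are flagged for possible human review.
    """
    triggers = list(triggers)
    scanned = 0
    flagged = []
    for msg in messages:
        scanned += 1
        hits = scan(msg, triggers)
        if hits:
            flagged.append((msg, hits))
    return scanned, flagged

stream = [Message("a", "Nothing to see"), Message("b", "this has an Example Trigger in it")]
scanned, flagged = process_stream(stream, TRIGGERS)
print(scanned, len(flagged))
```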