Keeping “Free Law” Free (benedelman.org)
68 points by thinkcomp on Jan 25, 2018 | 32 comments


https://www.aallnet.org/Blogs/spectrum-blog/45407.html

“In 2008, computers at the Library for the U.S. Courts of the Seventh Circuit in Chicago provided free access to PACER (Public Access to Court Electronic Records), which normally cost eight cents per page. Swartz loaded a script onto a library computer, which automatically downloaded PACER records every three seconds and uploaded them to a cloud server. Over a couple of weeks, he downloaded about 20% of the PACER database. [Aaron] Swartz provided the PACER documents to Public.Resource.Org. The FBI's investigation of this incident eventually ended without charges. Swartz continued to promote free public access to PACER documents by working with RECAP and PlainSite.”


Do note, with PACER, making a search counts as a page. So does clicking on the second page of results - another page. So does clicking the print button. Another page.


> One, I don’t know that RECAP’s expenses need to be particularly high. Open source software development often does not entail paying developers anything at all.

It's incredibly hard to run an open source project like this with zero paid developers. I've done it! And others have too, but it's nearly impossible to pull off in the long term and having paid staff with some kind of recurring revenue is almost essential to keeping these kinds of projects alive and growing.

Sure, you can launch that kind of project and run it for a few years, but over time you have customer support queries to deal with, obnoxious bugs that take hours to ferret out, and uptime and DevOps challenges. Without paid staff, all of that ultimately falls on one volunteer who has a day job and outside commitments and who gets sick, bored, or run down.

> With the right motivation including public praise, some people may be inclined to donate their skills. Certainly RECAP needs new features and improvements from time to time, but most such improvements should last indefinitely once built, reducing RECAP’s ongoing expenses.

What's the model here for having an organization with zero expenses run this kind of broad-based public good?

It's very easy to underestimate what it takes for projects like RECAP to operate, or to think that open source contributions can bring costs close to zero, but the reality is that these projects are very hard to keep running on the cheap, which is why we keep seeing small projects, even ones that get a lot of praise and acclaim, shutter.

And relatively speaking, RECAP is an incredibly cheap project already: It's got one full-time staffer, and in 2016 its expenses were under $100,000:

https://free.law/pdf/taxes/2016-990-EZ.pdf

RECAP is still making its data broadly accessible for free, while also trying to find some kind of sustainability model beyond a hope and a prayer. Donations and grants come and go, and while open source contributions can be helpful they can almost never keep a centralized service like this running for the long-term.


The model is RECAP before Free Law Project took over. It was backed by Princeton (which didn't spend much), had far fewer expenses (certainly no one got paid $90K a year; it was a grad student project), and moved forward code-wise via the coding challenge my foundation sponsored which cost one-fifth as much. See https://free.law/2013/04/02/two-recap-grants-awarded-in-memo....

The problem here is that RECAP may need money, but it is not being transparent about how it gets its money and is misleading users as a result.

Mike has made deliberate choices about how he runs FLP. I run Think Computer Foundation and PlainSite, which does the same thing as CourtListener. PlainSite is financially self-sustaining. CourtListener is not. There are alternatives to FLP's new "model."


To follow up, look at the percentage of contributions to CourtListener from just one individual vs. other volunteers:

https://github.com/freelawproject/courtlistener/graphs/contr...

"The community will keep it running for free" sounds great until you've actually tried managing that community.


This change makes a ton of sense if RECAP wants to make these documents accessible. Aggregate the data and make it easy for wholesale users. Soon enough, numerous sites/services will crop up, likely funded by ads, subscription fees or non-profit status, competing on usability for retail consumers. That will drive down profits and promote UI development and curation. Compare, e.g., ERISA and EDGAR data.


I'm the author of the article linked at the start of the thread. Replying to try to focus the discussion on the specific change I was writing about.

Piker, can you say more about how "the change" makes the documents "accessible" (or more accessible)? They were already at the Internet Archive just fine. Several sites already copied the documents from IA and added their own presentation, cross-linking, notifications, and other services on top. I don't see the proposed changes as helping with this. Indeed, by sending the latest data only to CourtListener but not to IA, the proposed changes stand in the way of the other sites and services you envision -- as it seems they'll now have to license the data from FLP/CL (on a paid basis), rather than get it free directly from IA. These are the general concerns I was trying to present in my article.


>> FLP also proposed to upload litigation materials to IA in only machine-readable formats compressed into enormous multi-gigabyte tarballs, ending the human-readable individual HTML files that have for years made it easy for normal users with standard web browsers to see court records.

Perhaps the "only" is telling here. Were they previously also uploading the tarballs? No wholesale user would want to scrape the thousands of extra pages of HTML to download the content. So if they weren't already uploading the tarballs, this is actually a beneficial change.


Previously FLP was uploading files that users can read with a web browser -- HTML, PDF, and also XML with metadata. I could and did link directly to HTML and PDFs, including circulating these materials with coauthors and research assistants and members of the press.

If FLP begins uploading only huge tarballs, and not the individual constituent files, I won't be able to do any of that.


"Soon enough, numerous sites/services will crop up, likely funded by ads, subscription fees or non-profit status, competing on usability for retail consumers."

Will they compete with the Internet Archive, which does not need to serve ads or charge fees?

The author's question remains unanswered. Is there any reason for the data to now be withheld from the Internet Archive?

This comment appears to be an appeal for "competition" that involves eliminating a competitor: the Internet Archive.

But consider that there are some users who prefer the Internet Archive for usability. Perhaps this is why the author writes about this on his blog. Anyone wishing to compete on usability can copy the data from Internet Archive and reformat it as they wish.

In the same way, there are some users who prefer EDGAR for SEC filings versus alternatives in terms of usability. Anyone can copy the data from EDGAR, repackage it and then compete on usability.

The existing source of the public data may present the data in a format free from Javascript tracking, third party advertising, paywalls and "free APIs" that would allow access to be limited and optionally denied (which contradicts the stated objective of "making these documents accessible" and instead limits access). For some users, being free from these impediments makes the data highly accessible.

With respect to those users, any new proposed source must compete with the existing source. Not to mention the potential it allows for any others who may wish to reformat/repackage the data.

When the people behind the new proposed source call for the discontinuation of the original source, this raises a red flag.

Eliminating a competitor is not a prerequisite for "competition".

Finally, the suggestion of "non-profit status" is interesting.

Let's say someone who enjoys programming wants to start a project/company that repackages donated public information in a way that she believes is more usable than the alternatives.

Let's assume the costs of doing this are not that much, mainly just her time.

If she charges fees to users or advertisers for access, her income from this effort might exceed her costs.

She might reinvest the surplus into the project. She might pay herself a salary.

Is there a limit on how much she can pay herself while the business still remains tax-exempt?


In fairness to free.law which appears to provide unrestricted bulk access and uses open formats, what stops anyone, including the blog author, from downloading the bulk data, and then uploading to the Internet Archive?

To some users, bulk access to raw data is actually preferable to thousands of individual HTML pages on a web server, because it may be less work to transform the raw data into some other format (e.g., thousands of pages of HTML) than it is to scrape all those HTML pages from a web server, process them, and store them in a searchable format.

Consider raw text versus PDF. PDF may look great but text is more flexible and more easily searchable. While it's easy to convert text to PDF for reading, converting from PDF to text for searching is fraught with difficulty and a high margin for error.

If one accepts

   more difficult: PDF -> text
   less difficult: text -> PDF

then it stands to reason that text is the preferable format to start with, because it's both easy to search and easy to convert to PDF for aesthetics and reading.
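The bulk-to-individual-files direction described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `unpack_readable_files`, the default suffix list, and the idea that a bulk archive contains `.html` or `.pdf` members are all hypothetical, and nothing here reflects the actual layout of FLP's or IA's uploads.

```python
import shutil
import tarfile
from pathlib import Path


def unpack_readable_files(tar_path, dest_dir, suffixes=(".html", ".pdf")):
    """Copy members with the given suffixes out of a tarball into dest_dir.

    Returns the list of paths written. The suffix filter is an assumption
    about what a bulk archive might contain, not a description of real data.
    """
    dest = Path(dest_dir).resolve()
    written = []
    # "r:*" lets tarfile auto-detect gzip/bz2/xz compression.
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:  # iterate member by member rather than loading a full listing
            if not member.isfile() or not member.name.lower().endswith(suffixes):
                continue
            target = (dest / member.name).resolve()
            # Skip entries whose path would land outside dest (Python 3.9+).
            if not target.is_relative_to(dest):
                continue
            source = tar.extractfile(member)
            if source is None:
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            with source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            written.append(target)
    return written
```

Going the other way (collecting thousands of individual pages from a web server into one archive) means one request per page, which is the scraping cost the comment is pointing at.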


There is: A reasonable salary that's decided upon by an independent board of directors or an independent compensation committee based on evidence that that salary is in line with market rates for similar work.

Some details: http://blueavocado.org/content/how-much-pay-executive-direct...


The misleading thing about the “free law” angle is that PACER does not record “the law.” It’s a system for accessing parties’ legal filings. Opinions rendered by courts, which are “law” are generally posted on the courts’ websites: http://www.nysb.uscourts.gov/judges-info/opinions. PACER is a service that’s primarily used by litigants that’s value is primarily to litigants. Litigants who need PACER access but can’t afford it are given free access.

It’s not unreasonable for the government to charge a user fee to access it, like all the other kinds of user fees the government charges for public services. (Indeed, the government charges substantial filing fees for availing oneself of the courts in the first place.)


It would be nice if the world were so simple.

I run PlainSite, which indexes about 10 million dockets and is mentioned in the OP's post. I have experience in this field.

PACER is of value not only to litigants, but also to journalists, academics, employers, and average citizens who are interested in any given topic, individual or company.

Litigants who need PACER access but can't afford it are almost never given free access. When you file on CM/ECF, the e-filing side of PACER, you are given one free bite at the apple for documents in your own case. Legal research involving other cases is not free. Second attempts to view documents in your own case are not free. In forma pauperis status does not make your access free. PACER waivers are not granted to anyone but select academics, and then, rarely.

Many opinions are not properly tagged as opinions since it is each individual judge's responsibility to handle the tagging. PACER charges for them anyway, even though they are supposed to be free. This applies to tens or hundreds of thousands of documents on the system. See https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3034399 and https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3026779.

It is rayiner's opinion that it's not unreasonable for the government to charge an access fee per page for PACER, but it is the law (the E-Government Act of 2002) that doing so is inappropriate and illegal. Recently, the authors of that Act, Darrell Issa and Joe Lieberman, filed an amicus brief explaining their position on that matter yet again in the ongoing litigation. See https://www.plainsite.org/dockets/download.html?id=246610557....

The marginal cost to the government of filing a new suit is substantial; hence the filing fee. The marginal cost of providing one more PDF to the public is zero. That's an important difference.


So, how does your project obtain data?


We no longer have access to RECAP data due to FLP's actions. Neither do other public access sites. One just sold itself. Another is about to shut down.


> The misleading thing about the “free law” angle is that PACER does not record “the law.”

This is pretty much wrong. There are a variety of things on PACER that would be considered "the law" from opinions (although a lot of them have recently been made available) to court orders and judgments.

> It’s a system for accessing parties’ legal filings.

... as well as rulings made by the court and filings by the government. All of which are public records.

>Opinions rendered by courts, which are “law” are generally posted on the courts’ websites: http://www.nysb.uscourts.gov/judges-info/opinions.

Only some and this is a recent development.

> PACER is a service that’s primarily used by litigants that’s value is primarily to litigants.

What is your basis for this assertion? The information on PACER is used by journalists, historians, litigants, lawyers, scholars and others. Why do we need to be concerned with who or why citizens want to access public information held by the government?

> Litigants who need PACER access but can’t afford it are given free access.

Why should a government's citizens have to 'afford' access to the public information of said government?

> It’s not unreasonable for the government to charge a user fee to access it, like all the other kinds of user fees the government charges for public services. (Indeed, the government charges substantial filing fees for availing oneself of the courts in the first place.)

PACER charges are not cost-based fees, unless you take the view that the "cost basis" is all of the technology infrastructure of the judicial branch. Rather, they fund what is essentially a technology-oriented slush fund that the AO of the US Courts doesn't want to let go of.


This cannot be emphasized enough.

Accessing PACER from a courthouse is free. Accessing PACER for a case you are involved in is free. Getting judicial opinions from PACER is free. If you convince a judge that the cost of PACER is a burden to you, it is free. And finally, PACER doesn't actually charge you anything until you owe them more than $15 in a 3-month period - it is not cumulative: if your balance is less than $15, it goes to 0.


> Accessing PACER from a courthouse is free.

... other than the half day you had to take off work, transportation to the courthouse and fees for printing out an electronic document -- and that is if you are local. What if I want to get a 'free' document in a court that is 2,000 miles away?

> Accessing PACER for a case you are involved in is free.

This is false. If you are an ECF user, you theoretically get "one free view" of each document as it is filed in a case to which you are a party; however, this rarely works, so you end up having to pay the PACER fee anyway. You still have to pay PACER fees any time you view the docket, search, or view any document in a "case you are involved in".

> Getting judicial opinions from PACER is free.

... a relatively recent development, and it only covers some opinions.

> If you convince a judge that the cost of PACER is a burden to you, it is free.

Really? You have a citation for that? I'm sure I can convince at least one federal judge my several thousand dollar a month pacer bill is a burden. That would be great.


This is true, but on the other hand, anyone who has ever been involved in litigation knows that it takes about one day to exceed 150 pages ($15) worth of documents in legal research.
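The arithmetic behind that figure can be sketched as follows. This is a rough model under stated assumptions: the 10 cents per page rate is inferred from "150 pages ($15)", the waiver rule follows the upthread description ("more than $15 in a 3-month period"), and the function name `quarterly_bill_cents` is made up for illustration. It ignores any other billing rules PACER may apply.

```python
PER_PAGE_CENTS = 10             # assumption: implied by "150 pages ($15)"
WAIVER_THRESHOLD_CENTS = 1500   # the $15 quarterly threshold described upthread


def quarterly_bill_cents(pages_billed: int) -> int:
    """Return the amount owed for one quarter, in cents.

    Every billable action (a search, a results page, a document page)
    is assumed to count as one page, as described earlier in the thread.
    """
    accrued = pages_billed * PER_PAGE_CENTS
    # Charges are assumed to be waived unless they exceed the threshold.
    return accrued if accrued > WAIVER_THRESHOLD_CENTS else 0


print(quarterly_bill_cents(150))  # at the threshold, so waived
print(quarterly_bill_cents(151))  # one page over, so the full balance is owed
```

Under these assumptions the waiver is a cliff rather than a discount: the 151st page turns a bill of nothing into a bill for all 151 pages.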


This is a narrow and problematic view.

While it's true that non-opinion parts of dockets aren't "the law" in the sense of issuing precedent that courts will be expected to follow in other cases, it's also true that these other parts of dockets contribute to important kinds of legal knowledge that we ought to care about in a democracy governed by law.

They include, for example, information about the actions of immensely powerful corporations and individuals. The fact that court dockets are public records allows us to know who has sued Donald Trump, or Google. They also provide information about the actions of the government, including courts, police, prosecutors, again in cases of extreme public concern.

Sorry to get on my high horse, but I literally wrote a book about the notion of the rule of law[1] --- and one of the key ideas of the rule of law is that the people need to have access to information about what the powerful do in order to collectively hold them accountable for their behavior.

For comparison: there's a Federal Register and a Code of Federal Regulations. The latter just contains the stuff that would be "the law" in rayiner's sense---the generally applicable results of rulemaking. The former also contains public notices, proposed rules, executive orders, and other executive branch actions. A world in which we had access to the CFR but not the Federal Register would be worse, from the standpoint of the freedom of law, than a world in which we have access to both (thankfully, the actual world).

[1] http://rulelaw.net


> The misleading thing about the “free law” angle is that PACER does not record “the law.” It’s a system for accessing parties’ legal filings.

I'm 100% certain that PACER has opinions, and those are law.


>PACER is a service that’s primarily used by litigants that’s value is primarily to litigants. Litigants who need PACER access but can’t afford it are given free access.

But everyone needs it, because everyone is a potential litigant. If that were not so, it wouldn't be "the legal system", but "a social club no one has to care about".


Very interesting topic, hadn't ever considered the possibility that the availability of legal proceedings / documents might be something other than a taxpayer-subsidized venture. It sure seems like that's the appropriate source for funds to accomplish this.


When I heard about this the first time I also was surprised that this stuff is not available for free. In a country governed by laws it should be one of the most important things that the law is easily accessible to all people.


> one of the most important things that the law is easily accessible to all people

As rayiner points out [1], court filings are not case law. Decisions, the only component of a court's output that can properly be considered law, are generally freely and publicly available.

[1] https://news.ycombinator.com/item?id=16230803


That's fine. I still think that if openness is a good thing then the legal system should be one of the first places. Everything should go into a central and open database.


> I still think that if openness is a good thing then the legal system should be one of the first places

And I would tend to agree with you. But make that argument, not something about the laws of the land being locked away. The “free law” pitch comes across as dishonest to anyone remotely familiar with the system.


I mean, they do already. The database is just the court records and you can go to the courthouse and access them. That's how it's been for centuries.


I think it's time to take the records online.


Another reply claimed that it occurs that decisions are wrongly tagged and thus accidentally not available for free: https://news.ycombinator.com/item?id=16231087


See "District Court Opinions that Remain Hidden Despite a Longstanding Congressional Mandate of Transparency – The Result of Judicial Autonomy and Systemic Indifference" by Peter Martin at Cornell. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3034399



