Been using Django in production since 2008, and so happy with my choice. Absolutely amazing being able to keep the same knowledge and workflow for my whole career so far, and still have a modern, maintained piece of software as a base. The Django Admin still makes my life better all these years later.
Props to the Fellows who are keeping these releases running on time and getting better every year. Boring software FTW!
Yep -- our story here: https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse... (quoted in the OP) -- everyone I know has a similar story who is running large internet infrastructure -- this post does a great job of rounding a bunch of them up in 1 place.
I called it when I wrote it: they are just burning their goodwill to the ground.
I will note that one of the main startups in the space worked with us directly, refunded our costs, and fixed the bug in their crawler. Facebook never replied to our emails, the link in their User Agent led to a 404 -- an engineer at the company saw our post and reached out, giving me the right email -- which I then emailed 3x and never got a reply.
AI firms seem to be leading from a position that goodwill is irrelevant: a $100bn pile of capital, like an 800lb gorilla, does what it wants. AI will be incorporated into all products whether you like it or not; it will absorb all data whether you like it or not.
Yep. And it is much more far-reaching than that. Look at the primary economic claim offered by AI companies: to end the need for a substantial portion of all jobs on the planet. The entire vision is to remake the world into one where the owners of these companies own everything and are completely unconstrained. All intellectual property belongs to them. All labor belongs to them. Why would they need goodwill when they own everything?
"Why should we care about open source maintainers" is just a microcosm of the much larger "why should we care about literally anybody" mindset.
> Look at the primary economic claim offered by AI companies: to end the need for a substantial portion of all jobs on the planet.
And this is why AI training is not "fair use". The AI companies seek to train models in order to compete with the authors of the content used to train the models.
A possible eventual downfall of AI is that the risk of losing a copyright infringement lawsuit is not going away. If a court determines that the AI output you've used is close enough to be considered a derivative work, it's infringement.
I've pointed this out to a few people in this space. They tend to suggest that the value in AI is so great this means we should get rid of copyright law entirely.
That value is only great if it's shared equitably with the rest of the planet.
If it's owned by a few, as it is right now, it's an existential threat to the life, liberty, and pursuit of a happiness of everyone else on the planet.
We should be seriously considering what we're going to do in response to that threat if something doesn't change soon.
Yep. The "wouldn't it be great if we had robots do all the labor you are currently doing" argument only works if there is some plan to make sure that my rent gets paid other than me performing labor.
It depends if you're the only one out of a job. If it really is everyone then the answer will likely be some variant of metaphorically or literally killing your landlord in favor of a different resource allocation scheme. I put these kinds of things in a "in that world I would have bigger problems" bucket.
And that's the ultimate fail of capitalist ethics - the notion that we must all work just so we can survive. Look at how many shitty and utterly useless jobs exist just so people can be employed on them to survive.
This has to change somehow.
"Machines will do everything and we'll just reap the profits" is a vision that techno-millennialists have been repeating since the beginning of the Industrial Revolution, but we haven't seen it happen anywhere.
For some strange reason, technological progress always seems to be accompanied by an increase in human labor. We're already past the 8-hour, 5-day norm and things are only getting worse.
> And that's the ultimate fail of capitalist ethics - the notion that we must all work just so we can survive. Look at how many shitty and utterly useless jobs exist just so people can be employed on them to survive.
This isn't a consequence of capitalism. The notion of having to work to survive - assuming you aren't a fan of slavery - is baked into things at a much more fundamental level. And lots of people don't work, and are paid by a welfare state funded by capitalism-generated taxes.
> "Machines will do everything and we'll just reap the profits" is a vision that techno-millennialists have been repeating since the beginning of the Industrial Revolution, but we haven't seen it happen anywhere.
They were wrong, but the work is still there to do. You haven't come up with the utopian plan you're comparing this to.
> For some strange reason, technological progress always seems to be accompanied by an increase in human labor.
No it doesn't. What happens is that not enough people are needed to do a job any more, so they go find another job. No one was opening barista-staffed coffee shops on every corner back when 30% of the world was doing agricultural labour.
Yes, it is. The fact that we have welfare isn't a refutation of that; it's proof. Welfare is a band-aid over the fundamental flaws of capitalism. A purely capitalist system is so evil it is unthinkable. The people currently on welfare would, in a fully free labor market, die and rot in the street. We, collectively, decided that's not a good idea and went against it.
That's why the labor market, and truly all our markets, are not free. Free markets suck major ass. We all know it. Six year olds have no business being in coal mines, no matter how much the invisible hand demands it.
You have a very different definition of free than I do. Free to me means that people enter into agreements voluntarily. It's hard to claim a market is free when its participants have no other choice...
You are correct, but the real problem is that copyright needs complete reform.
Let's not forget the basis:
> [The Congress shall have Power . . . ] To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.
Is our current implementation of copyright promoting the progress of science and useful arts?
Or will science and the useful arts be accelerated by culling back the current cruft of copyright laws?
For example, imagine if copyright were non-transferable and did not permit exclusive licensing agreements.
AI is going to implode within 2 years. Once it starts ingesting its own output as training data it is going to be at best capped at its current capability and at worst even more hallucinatory and worthless.
The mistake you make here is to forget that the training data of the original models was also _full_ of errors and biases — and yet they still produced coherent and useful output. LLM training seems to be incredibly resilient to noise in the training set.
That's a talking point for bros looking to exploit it as their ticket.
"The upside of my gambit is so great for the world, that I should be able to consume everyone else's resources for free. I promise to be a benevolent ruler."
That's not how conservatism works. AI oligarchs are part of the "in" group in the "there are laws that protect but do not bind the in group, and laws that bind but do not protect the out group" summary. Anyone with a net worth less than FOTUS is part of the "out" group.
AI is worthless without training data. If all content becomes AI generated because AI outcompetes original content then there will be no data left to train on.
When Google first came out in 1998, it was amazing, spooky how good it was. Then people figured out how to game pagerank and Google's accuracy cratered.
AI is now in a similar bubble period. Throwing out all of copyright law just for the benefit of a few oligarchs would be utter foolishness. Given who is in power right now I'm sure that prospect will find a few friends, but I think the odds of it actually happening before the bubble bursts are pretty small.
Are we not past critical mass though? The velocity at which these things can outcompete human labor is astonishing; any future human creation or original content will already have lost the battle the moment it goes online and gets cloned by AI.
OK. To be clear, that wasn't about the OP, but rather the alleged people promoting the abolition of copyright... which would significantly hurt open source.
The people agitating for such things are usually leeches who want everything free and do, in fact, hold an infantile worldview that doesn't consider how necessary remuneration is to whatever it is they want so badly (media pirates being another example).
Not that I haven't "pirated" media, but this is usually the result of it not being available for purchase or my already having purchased it.
I'm curious what will happen when someone modifies a single byte (or a "sufficient" number of bytes) of AI output, thereby creating a derivative work, and then claims copyright on that modified work.
> The AI companies seek to train models in order to compete with the authors of the content used to train the models.
When I read someone else’s essay I may intend to write essays like that author. When I read someone else’s code I may intend to write code like that author.
AI training is no different from any other training.
> If a court determines that the AI output you've used is close enough to be considered a derivative work, it's infringement.
Do you mean the output of the AI training process (the model), or the output of the AI model? If the former: yes, if a model actually contains within it copies of data, then it's a copy of that work.
But we should all be very wary of any argument that the ability to create a new work which is identical to a previous work is itself derivative. A painter may be able to copy van Gogh, but neither the painter's brain nor his non-copy paintings (even those in the style of van Gogh) are copies of van Gogh's work.
If you as an individual recognizably regurgitate the essay you read, then you have infringed. If an AI model recognizably regurgitates the essay it trained on, then it has infringed. The AI argument that passing original content through an algorithm insulates the output from claims of infringement because of "fair use" is pigwash.
> If an AI model recognizably regurgitates the essay it trained on, then it has infringed.
I completely agree — that’s why I explicitly wrote ‘non-copy paintings’ in my example.
> The AI argument that passing original content through an algorithm insulates the output from claims of infringement because of "fair use" is pigwash.
Sure, but the argument that training an AI on content is necessarily infringement is equally pigwash. So long as the resulting model does not contain copies, it is not infringement; and so long as it does not produce a copy, it is not infringement.
> So long as the resulting model does not contain copies, it is not infringement
That's not true.
The article specifically deals with training by scraping sites. That does necessarily involve producing a copy from the server to the machine(s) doing the scraping & training. If the TOS of the site incorporates robots.txt or otherwise denies a license for such activity, it is arguably infringement. Sourcehut's TOS for example specifically denies the use of automated tools to obtain information for profit.
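For what it's worth, honoring those opt-outs is trivial on the crawler side. Here's a minimal sketch (the bot names and the robots.txt content are made up for illustration) of how a well-behaved crawler can consult robots.txt before fetching, using only Python's stdlib `urllib.robotparser`:

```python
# Hypothetical sketch: a well-behaved crawler checking robots.txt
# before fetching, using only the Python standard library.
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content for an imaginary site that bars a
# hypothetical AI crawler entirely and everyone else from /private/.
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
# parse() accepts the file as an iterable of lines; in a real crawler
# you'd fetch https://example.com/robots.txt and feed it in the same way.
rp.parse(robots_txt.splitlines())

# The AI crawler is barred from the whole site...
print(rp.can_fetch("ExampleAIBot", "https://example.com/docs/index.html"))  # False
# ...while other agents are only barred from /private/.
print(rp.can_fetch("SomeOtherBot", "https://example.com/docs/index.html"))  # True
print(rp.can_fetch("SomeOtherBot", "https://example.com/private/data"))     # False
```

Whether ignoring robots.txt rises to infringement is the legal question above, but complying with it is a few lines of code, so "too hard" is not a credible excuse.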
I'm curious how this can be applied with the inevitable combinatorial exhaustion that will happen with musical aspects such as melody, chord progression, and rhythm.
Will it mean longer and longer clips are "fair use", or will we just stop making new content because it can't avoid copying patterns of the past?
> I'm curious how this can be applied with the inevitable combinatorial exhaustion that will happen with musical aspects such as melody, chord progression, and rhythm.
They did this in 2020. The article points out that "Whether this tactic actually works in court remains to be seen" and I haven't been following along with the story, so I don't know the current status.
More germane is that there will be a smoking gun for every infringement case: whether or not the model was trained on the original. There will be no pretending that the model never heard the piece it copied.
> AI training is no different from any other training.
Yes, it is. One is done by a computer program, and one is done by a human.
I believe in the rights and liberties of human beings. I have no reason to believe in rights for silicon. You, and every other AI apologist, are never able to produce anything to back up what is largely seen as an outrageous world view.
You cannot simply jump the gun and compare AI training to human training like it's a foregone conclusion. No, it doesn't work that way. Explain why AI should have rights. Explain if AI should be considered persons. Explain what I, personally, will gain from extending rights to AI. And explain what we, collectively, will gain from it.
I have this line of thought as well, but then I wonder: if we are all out of jobs and out of substantial capital to spend, how do these owners make money ultimately? It's a genuine question and I'm probably missing something obvious. I can see a benevolent/post-scarcity spin to this, but the non-benevolent one seems self-defeating.
"Making money" is only a relevant goal when you need money to persuade humans to do things for you.
Once you have an army of robot slaves ... you've rendered the whole concept of money irrelevant. Your skynet just barters rare earth metals with other skynets and your robot slaves furnish your desired lifestyle as best they can given the amount of rare earth metals your skynet can get its hands on. Or maybe a better skynet / slave army kills your skynet / slave army, but tough tits, sucks to be you and rules to be whoever's skynet killed yours.
That's part of the "rare earth metals" synecdoche: hydroelectric dams, thorium mines, Great Lakes heat sinks; they're all things for skynets to kill or barter for as expedient.
I don’t think you’re missing anything, I think the plan really is to burn it all down and rule over the ashes. The old saw “if you’re so smart, why aren’t you rich?” works in reverse too. This is a foolish, shortsighted thing to do, and they’re doing it anyway. Not really thinking about where value actually comes from or what their grandchildren’s lives would be like in such a world.
Capitalism is an unthinking, unfeeling force. The writing is on the wall that AI is coming, and being altruistic about it doesn’t do jack to keep others from the land grab. Their thinking is, might as well join the rush and hope they’re one of the winners. Every one of us sitting on the sidelines will be impacted in some way or the other. So who’re the smart ones, the ones who grab shovels and start digging, or the ones who watch as the others dig their graves and do nothing?
Sure, maybe in 50 years. At the moment, it's a productivity tool. Strangely, by the look of the down votes, the HN community doesn't quite understand this.
The job market is formed by the presence of needs and the ability to satisfy them. AI does not reduce the ability to satisfy needs, so the only situations where you won't be able to compete are either that socialists seize power and ban competition, or that all needs are met in some other way. In any other situation there will be a job market and people will compete in it.
> there will be a job market and people will compete in it
Maybe there will be. I'm sure there is also a market for Walkmans somewhere; it's just exceedingly small.
The proclaimed goal is to displace workers on a grand scale. This is basically the vision of any AI company and literally the only way you could even remotely justify their valuations given the heavy losses they incur right now.
> The job market is formed by the presence of needs and the ability to satisfy them
The needs of a job market are largely shaped by the overall economy. Many industrial nations are largely service-based economies with a lot of white collar jobs in particular. These white collar jobs are generally easier to replace with AI than blue collar jobs because you don't have to deal with pesky things like the real, physical world. The problem is: if white collar workers are kicked out of their jobs en masse, it also negatively affects the "value" of the remaining people with employment (exhibit A: the tech job market right now).
> either that socialists seize power and ban competition,
I am really having a hard time understanding where this obsession with mythical socialism comes from. The reality we live in is largely capitalistic, and striving towards a monopoly - i.e. a lack of competition - is basically the entire purpose of a corporation, which is only kept in check by government regulations.
>The proclaimed goal is to displace workers on a grand scale.
It doesn't matter. What you need to understand is that at the root of the job market are needs, the ability to meet those needs, and the ability to exchange those abilities with one another. None of those are hindered by AI.
>Many industrial nations are largely service-based economies with a lot of white collar jobs in particular.
Again: at the end of the day it doesn't change anything. You still need a cooked dinner, a built house, and everything else, so someone must build a house and exchange it for cooked dinners. That's what's happening (white collar workers and international trade balances included), and that's what the job market is. AI doesn't change the nature of those relationships. Maybe it replaces white collar workers, maybe even almost all of them; that only means they will go satisfy other unsatisfied needs of other people in exchange for satisfying their own. The job market won't go anywhere; if anything, the number of satisfied needs will go up, not down.
>if white collar workers are kicked out of their jobs en masse, it also negatively affects the "value" of the remaining people with employment
No, it doesn't. I mean it would if they were simply kicked out, but that's not the case: they would be replaced by AI. So society gets all the benefits they were creating, plus an additional labor force to satisfy previously unsatisfied needs.
>exhibit A: the tech job market right now
I don't have the stats at hand, but aren't blue collar workers doing better now than ever before?
>I am really having a hard time understanding where this obsession with mythical socialism comes from
From the history of the 20th century? I mean, it's not an obsession, but we are discussing scenarios for the disappearance (or significant shrinking) of the job market, and the socialists are the most (if not the only) realistic cause of that at the moment.
>The reality we live in is largely capitalistic, and striving towards a monopoly
Yes, and this monopoly, the monopoly, is called "socialism".
>corporation, which is only kept in check by government regulations.
Generally a corporation is kept in check by the economic freedom of other economic agents, and it is government regulation that protects monopolies from the free market. I mean, why would a government regulate in the other direction? A small number of big corporations is far easier for a government to control and extract personal benefits from.
> In the end of the day you need a cooked dinner, a built house and everything else. So someone must build a house and exchange it for a cooked dinners.
You should read some history. This view is so naive and overconfident.
My views on this issue are shaped by history. From crop production and plowing to book printing, conveyor belts, and microelectronics, creating tools that increase productivity has always led to increased availability of goods, and the only thing that has led to decreased availability is whatever hinders the ability to create and exchange goods.
I started a borderline smug response here pointing out that bullshit white collar and service jobs* were in deep shit but folks who actually work for a living would be fine. I scrapped it halfway through when it occurred to me that if everyone's broke then by definition nobody's spending money on stuff like contractors, mechanics, and other hardcore blue collar trades. Toss in AI's force multiplication of power demands in the face of all of the current issues around global warming and it starts to feel like pursuing this tech is fractally stupid and the best evidence to date I've seen that a neo-luddite movement might actually be a thing the world could benefit from. That last part is a pretty wild thought coming from a retired developer who spent the bulk of his adult life in IT, but here we are.
Neo-Luddism is less stupid when you remember that the Luddites weren't angry that looms existed. Smashing looms was their tactic, not their goal.
Parliament had made a law phasing in the introduction of automated looms; specifically so that existing weavers were first on the list to get one. Britain's oligarchy completely ignored this and bought or built looms anyway; and because Parliament is part of that oligarchy, the law effectively turned into "weavers get looms last". That's why they were smashing looms - to bring the oligarchy back to the negotiating table.
The oligarchy responded the way all violent thugs do: killing their detractors and lying about their motives.
>if everyone's broke
>nobody's spending money on stuff like contractors, mechanics, and other hardcore blue collar trades.
Why would this happen? Money is simply a medium of exchange for the value that these contractors, mechanics, and other hardcore blue collar trades are creating. How can they be broke if AI doesn't disturb their ability to create value and exchange it?
Customers who have funds available to purchase the services you offer, and who are willing to actually spend that money, are a hard requirement for maintaining any business. If white collar and service industries are significantly disrupted by AI, this necessarily reduces the number of potential customers. The thing is, you don't have to lay off that many people to bankrupt half the contractors in the country; a decent 3-5 year recession is all it takes. Folks stop spending on renovations and maintenance work when they're worried about their next paycheck.
Money means nothing. It is simply a medium of exchange. The question is: is there anything to exchange? And the answer is yes, and the position of white collar workers doesn't affect the availability of things for exchange. There's no reason for a recession; there is nothing that can hinder the ability of blue collar workers to create goods and services, all the things that together are called "wealth".
Don't think in the meaningless category of "what set of digits will be printed on the piece of paper called a paycheck?". Think in the terms that are implied: "what goods and services can blue collar workers not afford for themselves?". Then it becomes clear that the set of goods and services unaffordable to blue collar workers will shrink when white collar workers are replaced with AI, because it doesn't hinder their ability to create those goods and services.
You think so? Give me the contents of your checking, savings, and retirement accounts and then get back to me on that.
> the position of white collar workers doesn't affect the availability of things for exchange.
You appear to be confused about the concept of consumers, so let me help. Consumers are the people who buy things. When there are fewer consumers in a market, demand for products and services declines. This means fewer sales. So no, you don't get to unemploy big chunks of the population and expect business to continue thriving.
>When there are fewer consumers in a market, demand for products and services declines.
No, demand is unlimited and defined by the amount of production.
>You don't get to unemploy big chunks of the population and expect business to continue thriving.
I mean, generally, replacing workers with tools is the main way for a business (and society) to thrive. In other words: what goods and services will become less affordable to blue collar workers?
When white collar workers [researchers, programmers, managers, salespeople, translators, illustrators, ...] lose their income/jobs to AIs, they lose their ability to buy products/services and at the same time try to shift en masse to doing some kind of manual work. Do you think that would not affect the incomes of those who are the current blue collar class?
I mean, yes, the prices of consumed goods will decrease, so blue collar workers will be able to consume more. That's exactly what is called an increase in income.
My gut is telling me you're being intentionally obtuse but I'm going to give you the benefit of the doubt. To reiterate in detail:
AI is poised to disrupt large swaths of the workforce. If large swaths of the workforce are disrupted this necessarily means a bunch of people will see their income negatively impacted (job got replaced by AI). Broke people by definition don't have money to spend on things, and will prioritize tier one of Maslow's Hierarchy out of necessity. Since shit like pergolas and oil changes are not directly on tier 1 they will be deprioritized. This in turn cuts business to blue collar service providers. Net result: everyone who isn't running an AI company or controlling some currently undefined minimum amount of capital is fucked.
If you're trying to suggest that any notional increases in productivity created by AI will in any way benefit working class individuals either individually or as a group you are off the edge of the map economically speaking. Historical precedents and observed executive tier depravity both suggest any increase in productivity will be used as an excuse to cut labor costs.
>This in turn cuts business to blue collar service providers.
No, it doesn't. Where does that come from?
I mean, look at the situation from the perspective of blue collar service providers: what exactly are the goods and services that they were able to afford for themselves but that AI will make unaffordable? Pretty obviously, there are about none. So, in the big picture, the whole process you described doesn't lead to any disadvantage for blue collar workers.
I literally described the mechanism to you twice and you're still acting confused. I'm not sure if we have a language barrier here or what but go check out a Khan Academy course on economics or maybe try running a lemonade stand for an afternoon if you still don't get it.
I think the obvious thing you are missing is just b2b. It doesn’t actually matter if people have any money.
Similar to how advertising and legal services are required for everything but have ambiguous ROI at best, AI is set to become a major “cost of doing business“ tax everywhere. Large corporations welcome this even if it’s useless, because it drags down smaller competitors and digs a deeper moat.
Executives large and small mostly have one thing in common though.. they have nothing but contempt for both their customers and their employees, and would much rather play the mergers and acquisitions type of games than do any real work in their industry (which is how we end up in a world where the doors are flying off airplanes mid flight). Either they consolidate power by getting bigger or they get a cushy exit, so.. who cares about any other kind of collateral damage?
Money is a proxy for control. Eventually humans will become mostly redundant and slated for elimination except for the chosenites of the managerial classes and a small number of technicians. Either through biological agents, famines, carefully engineered (civil?) wars and conflicts designed to only exterminate the non-managerial classes, or engineered Calhounian behavioral sinks to tank fertility rates below replacement.
Why should we care if they make money? Owning things isn't a contribution to society.
Building things IS a contribution to society, but the people who build things typically aren't the ultimate owners. And even in cases where the builders and owners are the same, entitling the builders and all of their future heirs to rent seek for the rest of eternity is an inordinate reward.
You don't. It's like Minecraft. You can do almost everything in Minecraft alone and everything exists in infinite quantity, so why trade in the first place?
This goes both ways. Let's say there is something you want but you're having trouble obtaining it. You'd need to give something in exchange.
But the seller of what you want doesn't need the things you can easily acquire, because they can get those things just as easily themselves.
The economy collapses back into self sufficiency. That's why most Minecraft economy servers start stagnating and die.
What people say is not the same as what people do. In other words, what is spoken in public repeatedly is not representative of actual decision flows.
Money is only a bookkeeping tool for complex societies. The aim of the owner class in a worker-less world would be accumulation of important resources to improve their lives and to trade with other owners (money would likely still be used for bookkeeping here). A wealthy resource-owner might strive to maintain a large zone of land, defended by AI weaponry, that contains various industrial/agricultural facilities producing goods and services via AI.
They would use some of the goods/services produced themselves, and also trade with other owners to live happy lives with everything they need, no workers involved.
Owners may let the jobless working class inhabit unwanted land, until they change their minds.
With what and against what? There will be spy satellites and drones and automated turrets that will turn you to pulp if you come within, say, 50KM of their compound borders.
The non-benevolent future is not self-defeating; we have historical examples of depressingly stable economies with highly concentrated ownership. The entirety of the European dark ages was the end result of (western[0]) Rome's elites tearing the planks out of the hull of the ship they were sailing. The consequence of such a system is economic stagnation, but that's not a consequence that the elites have to deal with. After all, they're going to be living in the lap of luxury, who cares if the economy stagnates?
This economic relationship can be collectively[1] described as "feudalism". This is a system in which:
- The vast majority of people are obligated to perform menial labor, i.e. peasant farmers.
- Class mobility is forbidden by law and ownership predominantly stays within families.
- The vast majority of wealth in the economy is in the form of rents paid to owners.
We often use the word "capitalist" to describe all businesses, but that's a modern simplification. Businesses can absolutely engage in feudalist economies just as well, or better, than they can engage in capitalist ones. The key difference is that, under capitalism, businesses have to provide goods or services that people are willing to pay for. Feudalism makes no such demand; your business is just renting out a thing you own.
Assuming AI does what it says on the tin (which isn't at all obvious), the endgame of AI automation is an economy of roughly fifty elite oligarchs who own the software to make the robots that do all work. They will be in a constant state of cold war, having to pay their competitors for access to the work they need done, with periodic wars (kinetic, cyber, legal, whatever) being fought whenever a company intrudes upon another's labor-enclave.
The question of "well, who pays for the robots" misunderstands what money is ultimately for. Money is a token that tracks tax payments for coercive states. It is minted specifically to fund wars of conquest; you pay your soldiers in tax tokens so the people they conquer will have to barter for money to pay the tax collector with[2]. But this logic assumes your soldiers are engaging in a voluntary exchange. If your 'soldiers' are killer robots that won't say no and only demand payment in energy and ammunition, then you don't need money. You just need to seize critical energy and mineral reserves that can be harvested to make more robots.
So far, AI companies have been talking of first-order effects like mass unemployment and hand-waving about UBI to fix it. On a surface level, UBI sounds a lot like the law necessary to make all this AI nonsense palatable. Sam Altman even paid to have a study done on UBI, and the results were... not great. Everyone who got money saw real declines in their net worth. Capital-c Conservative types will get a big stiffy from the finding that UBI did lead people to work less, but that's only part of the story. UBI as promoted by AI companies is bribing the peasants. In the world where the AI companies win, what is the economic or political restraining bolt stopping the AI companies from just dialing the UBI back and keeping more of the resources for themselves once traditional employment is scaled back? Like, at that point, they already own all the resources and the means of production. What makes them share?
[0] Depending on your definition of institutional continuity - i.e. whether or not Istanbul is still Constantinople - you could argue the Roman Empire survived until WWI.
[1] Inasmuch as the complicated and idiosyncratic economic relationships of medieval Europe could even be summed up in one word.
[2] Ransomware vendors accidentally did this, establishing Bitcoin (and a few other cryptos) as money by demanding it as payment for a data ransom.
And how could they possibly base their actions on good when their technology is more important than fire? History is depending on them to do everything possible to increase their market cap.
> The entire vision is to remake the entire world into one where the owners of these companies own everything and are completely unconstrained.
I agree with you in the case of AI companies, but the desire to own everything and be completely unconstrained is the dream of every large corporation.
In the past, you had to give some of your spoils to those who did the conquering for you, and to the laborers after that. If you can automate and replace all work, including maintaining and training the robots that do it, you no longer need to share anything.
In my view it's the same thing, same trajectory -- with more power in the hands of fewer people further along the trajectory.
It can be better or worse depending on what those with power choose to do. Probably worse. There has been conquest and domination for a long time, but ordinary people have also lived in relative peace gathering and growing food in large parts of the world in the past, some for entire generations. But now the world is rapidly becoming unable to support much of that as abundance and carrying capacity are deleted through human activity. And eventually the robot armies controlled by a few people will probably extract and hoard everything that's left. Hopefully in some corners some people and animals can survive, probably by being seen as useful to the owners.
On the bright side, armies of robot slaves give us an off-ramp from the unsustainable pyramid scheme of population growth.
Be fruitful, and multiply, so that you may enjoy a comfortable middle age and senescence exploiting the shit out of numerous naive 25-year-olds! If it's robots, we can ramp down the population of both humans and robots until the planet can once again easily provide abundance.
Sure, the problem though is it won't be "we" deciding what the robots do, it will most likely be a few powerful people of dubious character and motivations since those are the sort of people who pursue power and end up powerful.
That's why even though technology could theoretically be used to save us from many of our problems, it isn't primarily used that way.
But presumably petty tyrants with armies of slave robots are less interested than consensus in a long-term vision for humanity that involves feeding and housing a population of 10 billion.
So after whatever horrific holocaust follows the AI wars the way is clear for a hundred thousand humans to live in the lap of luxury with minimal impact on the planet. Even if there are a few intervening millennia of like 200 humans living in the lap of luxury and 99,800 living in sex slavery.
The thing is that this will be their destruction as well. If workers don't have any money (because they don't have jobs), nobody can afford what the owners have to sell.
They are also gutting the profession of software engineering. It's a clever scam, actually: to develop software, a company will need to pay utility fees to A"I" companies, and since their products are error-prone, voilà, use more A"I" tools to correct the errors of the other tools. Meanwhile software knowledge will atrophy, and soon, à la WALL-E, we'll have software "developers" with 'soft bones' floating around on hover-chairs, slurping 'sugar water', getting fat, and not knowing even how to tie their software shoelaces.
Yes, like the Pixel camera app, which mangles photos with AI processing, and users complain that it won't let people take pics.
One issue was a pic with text in it, like a store sign. Users were complaining that it kept asking for better focus on the text in the background, before allowing a photo. Alpha quality junk.
That's pretty much what our future would look like -- you are irrelevant. Well I mean we are already pretty much irrelevant nowadays, but the more so in the "progressive" future of AI.
Rules and laws are for other people. A lot of people reading this comment having mistaken "fake it til you make it" or "better to not ask permission" for good life advice are responsible for perpetrating these attitudes, which are fundamentally narcissistic.
I think the logic is more like “we have to do everything we can to win or we will disappear”. Capitalism is ruthless and the big techs finally have some serious competition, namely: each other as well as new entrants.
Like why else can we just spam these AI endpoints and pay $0.07 at the end of the month? There is some incredible competition going on. And so far everyone except big tech is the winner so that’s nice.
> One crawler downloaded 73 TB of zipped HTML files in May 2024 [...] This cost us over $5,000 in bandwidth charges
I had to do a double take here. I run (mostly using dedicated servers) infrastructure that handles a few hundred TB of traffic per month, and my traffic costs are on the order of $0.50 to $3 per TB (mostly depending on the geographical location). AWS egress costs are just nuts.
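The gap between those two price points is stark. A back-of-envelope comparison, using only the figures quoted in this thread:

```python
# Rough comparison of the egress costs mentioned above.
aws_cost_per_tb = 5000 / 73                 # ~$68.5/TB from the 73 TB incident
dedicated_high = 3.00                       # high end of the dedicated-server range
markup = aws_cost_per_tb / dedicated_high   # multiple vs. the *expensive* end
print(round(aws_cost_per_tb, 2), round(markup, 1))  # prints 68.49 22.8
```

So the cloud egress in that incident cost roughly 23x the high end of dedicated-server pricing, and over 100x the low end.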
I think the uncontrolled price of cloud traffic is a real fraud and a way bigger problem than some AI companies ignoring robots.txt. One time we went over the limit on Netlify or something, and they charged over a thousand dollars for a couple of TB.
> I think the uncontrolled price of cloud traffic is a real fraud
Yes, it is.
> and a way bigger problem than some AI companies ignoring robots.txt
No, it absolutely is not. I think you underestimate just how hard these AI companies hammer services - it is bringing down systems that have weathered significant past traffic spikes with no issues, and the traffic volumes are at the level where literally any other kind of company would've been banned by their upstream for "carrying out DDoS attacks" months ago.
>I think you underestimate just how hard these AI companies hammer services
Yes, I completely don't understand this, and I don't understand comparing it to DDoS attacks. There's no difference from what search engines are doing, so how is it in some way worse? It's simply scraping data; what significant problems can it cause? Cache pollution? And that's it? I mean, even when we're talking about ignoring robots.txt (which search engines often do too) and calling costly endpoints: what's the problem with adding captchas or rate limiters to those endpoints?
Yeah, you have a point. Hmm, I wish there were a way to generate that garbage with minimal bandwidth. Something like: I send you 256 very compressed bytes of data which expands to something like 1 megabyte.
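That ratio is roughly achievable with plain gzip, since crawlers routinely accept gzip-encoded responses. A quick sanity check (a sketch, not a hardened "bomb"):

```python
import gzip

# Highly repetitive data compresses extremely well, so a tiny gzip payload
# can expand to about a megabyte on the client's side.
payload = b"A" * 1_000_000                        # ~1 MB of identical bytes
compressed = gzip.compress(payload, compresslevel=9)
print(len(compressed) < 2000)                     # the wire-size is a few KB at most
```

Serving the pre-compressed bytes with a `Content-Encoding: gzip` header costs you only the compressed size, while the decompressing client pays for the full megabyte.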
there is -- but instead of expanding garbage data, add several delays within the response so that delivering the data takes extraordinarily long
Depending on the number of simultaneous requesting connections, you may be able to do this without a significant change to your infrastructure. There are also ways to do it that don't exhaust your available (IP, port) pairs, if that is an issue.
Then the hard part is deciding which connections to slow, but you can start with a proportional delay based on the number of bytes per source IP block or do it based on certain user agents. Might turn into a small arms race but it's a start.
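The delay idea above can be sketched in a few lines (all names here are hypothetical, not from any particular framework): split the response into tiny chunks and stall between them, so an abusive crawler ties up its own connection for minutes per page at almost no cost to you.

```python
import time
from typing import Callable, Iterator

def tarpit_chunks(body: bytes, chunk_size: int = 16) -> Iterator[bytes]:
    """Yield the response body a few bytes at a time."""
    for i in range(0, len(body), chunk_size):
        yield body[i:i + chunk_size]

def serve_slowly(send: Callable[[bytes], object], body: bytes, delay: float = 2.0) -> None:
    """Call `send` (e.g. a socket's sendall) once per chunk, sleeping in
    between. Scale `delay` per source IP block or user agent to build the
    proportional penalty described above."""
    for piece in tarpit_chunks(body):
        send(piece)
        time.sleep(delay)
```

With a 16-byte chunk and a 2-second delay, even a 1 KB page takes over two minutes to deliver, while your server mostly sleeps.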
It does not even have to be dynamically generated. Just pre-generate a few thousand static pages of AI slop and serve that. Probably cheaper than dynamic generation.
I kind of suspect some of these companies probably have more horsepower and bandwidth in one crawler than a lot of these projects have in their entire infrastructure.
Thanks for writing about this. Is it clear that this is from crawlers, as opposed to dynamic requests triggered by LLM tools, like Claude Code fetching docs on the fly?
Along with having block lists, perhaps you could add poison to your results that generates random bad code that will not work, and that is only seen by bots (display: none when rendered); the bots will use it, but a human never would.
Hi, and thanks for the feedback! One of my top priorities right now is learning as much as possible about online book club experiences from others, so I really appreciate you sharing the blog post!
Interesting to hear. Personally, finding a book was usually the easiest part (probably because we have dozens waiting on our 'to-read' lists). I’ll do my best to make the book recommendation experience as smooth as possible.
Yea we all have a lot of books to read, but trying to figure out how the group wants to learn something and discuss is hard. Especially when the goal is having some business benefit, not just reading for fun.
There’s a lot of “paper reading” clubs which might also be interesting to look in to.
That's a really interesting angle, I've never thought about paper reading clubs before. They seem much more fast-paced and could probably benefit from a shared notes side feature! Thank you for the food for thought!
The minor difference is that :q! quits without saving but returns zero as the exit code, while :cq quits with a nonzero exit code. Git interprets a nonzero exit code as "editing failed", following the Unix convention that zero means success. If you didn't save the commit message while working on it, :q! will send the empty template back to Git, which Git is smart enough not to commit. But if you accidentally save your work partway through, :q! will still commit the message you wanted to abandon.
That only works if the edit buffer is blank or only has commented out lines. In other cases, such as when you're trying to cancel a `git commit --amend` that loads up the last commit message in your buffer, :q! will not cancel the commit, but :cq will.
I've been working in this space for a long time, where I'm one of the co-founders of Write the Docs: https://www.writethedocs.org/ -- we focus more on software docs.
The view from the industry is basically that STC was a bit behind the times, and was slowly dwindling in terms of reach and value. They still had some active chapters, a magazine, and an academic journal that provided value for folks, but membership wasn't as valuable as it had been.
They have been around a long time, and had a wider purview than WTD, focusing on many different types of technical writing. They had members in industries like Automotive, Engineering, and Aerospace, as well as Software.
The best way to think about them is as something like the ACM of the software industry. They have been overtaken by more current community approaches in various areas (e.g. PyCon), but are still doing some more traditional stuff that adds value but isn't as relevant to day-to-day practitioners.
I don't know if the ACM is comparable. They run so many academic conferences that the list gets its own Wikipedia page [0]. Many are heavy hitters in their respective fields, like SIGGRAPH (graphics), SOSP (systems), and PLDI (programming languages). I think of them more as an academic publisher and organizer than as a professional association.
This is a beautiful vision, but I think it would be hard to implement in practice. I'm trying to imagine how a pitch like this might work. Do you offer the customers/members some kind of profit sharing? A discount on future services?
Given that customers often want to avoid lock-in on any purchasing decision, it seems hard to build a service that has a larger up front psychological and legal commitment. I love the idea of getting bonus points in life for building structures with collaborative ownership, but realistically most people and businesses only want a simple “buy a service that I can cancel” relationship.
That said, I'd love to see someone try it! I think it could work well in a niche environment, or something like a Kickstarter where people feel they helped bring something into being.
I think my (and other) Community Supported Agriculture (CSA) farms would disagree with you.
I buy shares in the farm per my needs. The CSA takes my money and buys seeds, fertilizer, etc. I get discounted (100% :-) products from them throughout the growing season. They also sell their goods at farmers' markets, do deliveries, etc. My CSA has been growing for years. They're a part of a larger co-op org that spans the NE US, IIUC.
So yes, the "beautiful vision" can be, and is, implemented. Even in tech; I'm sure you've heard of neighbors getting together and building their own local networks because the local ISP won't service them.
And credit unions, and mutual insurance companies, and on and on. I love this commenter who cannot imagine a customer having an ownership stake in a business, or why anyone would want that. No time to think, too much uninformed posting to be done.
We’re talking specifically about a SaaS app in this post. I’m well aware of this working at small local levels, and even mentioned in my comment that it might work in niche environments, but I have a hard time believing it would work for a disconnected SaaS app, where there isn’t some larger form of Community bonding people together.
Read the Docs is pretty much all open source, including the billing code (https://github.com/readthedocs/readthedocs.org/blob/main/rea...), but we are structured like a normal company, with some custom bylaws that protect the OSS codebase if ownership changes hands. We haven't found anyone else setting up a competing instance or anything, but that might also be because the product is kind of niche.
I kinda love the idea of having people in the community that use the service have some kind of ownership over the platform. It would likely lead to longer term loyalty of the userbase, which would help keep the project sustainable and avoid the enshittification cycle.
We've played around with sharing ad revenue that we generate on documentation pages split with the projects, which is partially a win/win way of sharing in the upside of success.
Anyway, I don't have a great answer here, but wanted to say hi, and give a bit of context from our place in the world.
PS: You might also talk with the folks here: https://zebrasunite.coop/ -- they are structured like a co-op and mostly come from the tech/design community.
wow GOOD JOB!!! Were they relatively decent about it, is that why? I feel like normal businesses that are not super shady should be able to accept this kind of conversation and deal with the mistakes and issues they caused for you.
Good job pursuing it tho, that's fantastic. (ps, big fan of your product, great work on that too!)
In the early 2000s I was working at a place that Google wanted to crawl so badly that they gave us a hotline number to call if their crawler was giving us problems.
We were told at that time that the "robots.txt" enforcement was the one thing they had that wasn't fully distributed, it's a devilishly difficult thing to implement.
It boggles my mind that people with the kind of budget that some of these companies have are still struggling to implement crawling right 20 years later, though. It's nice those folks got a rebate.
One of the reasons people are testy today is that you pay by the GB with cloud providers; about 10 years ago I kicked out the sinosphere crawlers like Baidu because they were generating something like 40% of the traffic on my site, crawling it over and over again and not sending even a single referrer.
I have found Google severely declining in engineering quality. On January 8th, 2025, they stopped accepting JCB credit cards, and emailed customers that their payment info was invalid and would be suspended (search Twitter for examples in Japanese).
Seems it was a bug, with no explanation offered to the customers who received the notification; opening a ticket resulted in it being closed immediately while being lied to (my only guess is they wanted to boost their metrics). How was this not quality checked in the first place?
I guess google has the policy of recording the chat transcript (where lies are recorded), but it means nothing when the company doesn't care.
I don't like it, but aws seems the next logical place to move business to. As far as I can tell, the support there is real.
Serious question - if robots.txt is not being honored, is there a risk of a class action from tens of thousands of small sites against both the companies doing the crawling and the individual directors/officers of those companies? Seems there would be some recourse if this is done at a large enough scale.
If I have a "no publicity" sign in my mailbox and you dump 500 lbs of flyers and magazines by my door every week for a month and cause me to lose money dealing with all the trash, I think I'd have a reasonable ground to sue even if there's no contract saying you need to respect my wish.
End of the day the claim is someone's action caused someone else undue financial burden in an way that is not easily prevented beforehand, so I wouldn't say it's a 100% clear case but I'm also not sure a judge wouldn't entertain it.
I didn't say no one could sue, anyone can sue anyone for anything if they have the time and the money. I said I didn't think someone could sue over non-compliance with robots.txt and win.
If it were possible, someone would have done it by now. It hasn't happened because robots.txt has absolutely no legal weight whatsoever. It's entirely voluntary, which means it's perfectly legal not to volunteer.
But if you or anyone else wants to waste their time tilting at legal windmills, have fun ¯\_(ツ)_/¯.
You don't even need to mention robots.txt, there's plenty of people that have been sued for crawling and had to stop it and pay damages, just lookup "crawling lawsuits".
Your verbs, “sue” and “win”, are separated by ~16 words of flowery language. It’s not surprising that people gave up partway through and reacted to just the first verb.
The "well, everyone can sue anyone for anything" line is a non-serious gotcha answer anyway. If someone asks "can I sue XY because of YZ", they always mean "and have a chance of winning". Just suing someone without any chance of winning isn't very interesting.
Hey man, I wanted to say good job on read the docs. I use it for my Python project and find it an absolute pleasure to use. Write my stuff in restructured text. Make lots of pretty diagrams (lol), slowly making my docs easier to use. Good stuff.
Edit 1: I'm surprised by the bandwidth costs. I use hetzner and OVH and the bandwidth is free. Though you manage the bare metal server yourself. Would readthedocs ever consider switching to self-managed hosting to save costs on cloud hosting?
This would be my elegant solution, something like an endless recursion with a gzip bomb at the end if I can identify your crawler and it’s that abusive. Would it be possible to feed an abusing crawler nothing but my own locally-hosted LLM gibberish?
But then again, if you're in the cloud, egress bandwidth is going to cost you for playing this game.
Better to just deny the OpenAI crawler and send them an invoice for the money and time they've wasted. It's an interesting form of data warfare against competitors and non-competitors alike; the winner will have the longest runway.
It wouldn’t even necessarily need to be a real GZip bomb. Just something containing a few hundred kb of seemingly new and unique text that’s highly compressible and keeps providing “links” to additional dynamically generated gibberish that can be crawled. The idea is to serve a vast amount of poisoned training data as cheaply as possible. Heck, maybe you could even make a plugin for NGINX to recognize abusive AI bots and do this. If enough people install it then you could provide some very strong disincentives.
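A sketch of what such dynamically generated gibberish could look like (everything here is hypothetical, including the URL scheme): pages of deterministic pseudo-random text seeded by the URL path, each linking to more generated pages. Nothing is stored server-side, the same URL always yields the same "unique" page, and the small word pool keeps the output highly compressible.

```python
import random

# A tiny word pool makes the output cheap to compress while still looking
# like new text to a crawler that only hashes or samples pages.
WORDS = ["data", "model", "cloud", "agent", "vector", "token", "graph", "cache"]

def slop_page(path: str, n_words: int = 500, n_links: int = 10) -> str:
    """Generate a fake HTML page for `path`, with links to more fake pages."""
    rng = random.Random(path)          # deterministic: same path, same page
    text = " ".join(rng.choice(WORDS) for _ in range(n_words))
    links = "".join(
        f'<a href="/slop/{rng.randrange(10**9)}">more</a>' for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p>{links}</body></html>"
```

A crawler that follows the links just walks an endless, self-similar maze of generated pages, at near-zero CPU and storage cost to the server.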
An easy way is to give the model the URL of the page so it can value the content based on the reputation of the source. Of course the model doesn't know future events, but gibberish is gibberish, and that's quite easy to filter, even without knowing the source.
Judging by how often these scrapers keep pulling the same pages over and over again I think they're just hoping that more data will magically come into existence if they check enough times. Like those vuln scanners which ping your server for Wordpress exploits constantly just in case your not-Wordpress site turned into a Wordpress site since they last looked 5 minutes ago.
While Phi is a good example of this technique, Phi as a model is very anemic. It was recently part of a CTF hosted by Microsoft where other models were also included (I assume MS was looking to test Phi's performance against the competition), and Phi performed the worst: its outputs were easier to predict, making it quicker to construct injection attacks and jailbreaks, even though all models utilized the same defenses. Having also trained and fine-tuned models using synthetic data, I have seen this approach increase determinism and predictability. Some might see this as a good thing, but I think it depends. On one hand it opens the model to several adversarial attacks such as jailbreaking, extraction, etc.; on the other hand, some consumers may prefer less random outputs.