> it's the resulting content that matters, not how it's presented
What a wild thing to say. If you had a coworker who was brilliant and taught you many great things, but only screamed instead of talking, would you feel the same way?
> If a person anthropomorphizes an LLM in their mind (rather than just in their speech patterns), then they probably have pre-existing mental problems.
Correct, and that's why these tools should be built responsibly, under the assumption that people with mental problems are going to use them. It's clear from the article I linked (and my wording when linking to it) that these tools can exacerbate issues for people. ChatGPT told him that he was sane and that his mom was trying to kill him. He didn't understand what an LLM actually was.
I'm not claiming the purpose of this prompt is to get better information. Yes, it's just a prompt.
You're asserting quite a lot of bias when you say "what most people want are useful results." Maybe in our circles of software engineers or lawyers, but many people are using AI for companionship. Even for those not seeking companionship, unless you have a very clear understanding of how LLMs work, it's very easy to get caught up thinking that the chatbot you're talking to is "thinking" or "feeling". I feel companies that offer chatbots should be more responsible about this, as it can be very dangerous.
Can someone with an actual, fundamental understanding of LLMs explain to me why they think it's perfectly legal to train models on copyrighted material? I don't know enough about this. Please don't answer by asking ChatGPT.
Consider how commercial search engines are permitted to show text snippets, thumbnails, and site caches.
AI developers will most likely rely on a fair use defense. I think this has a reasonable chance of success since, while the use of a given copyrighted work may affect the market for that work (in this case, NYT's articles), it can be argued to be highly transformative. As in Campbell v. Acuff-Rose Music: "The more transformative the new work, the less will be the significance of other factors", with "transformative" defined as "whether the new work merely 'supersede[s] the objects' of the original creation [...] or instead adds something new".
There's also potential for an "implied license", as in Field v. Google Inc., which concerned rehosting a snapshot of a site: "Google reasonably interpreted absence of meta-tags as permission to present 'Cached' links to the pages of Field's site". As far as I can tell, in this case NYT's robots.txt of the time was obeyed, and it permitted automated processing of all but one specific article, for some reason.
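For context on how that works mechanically: a well-behaved crawler fetches robots.txt first and skips any path the file disallows. Here's a minimal sketch using Python's standard urllib.robotparser; the domain and article path are hypothetical stand-ins, not NYT's actual file:

```python
# Minimal sketch of a crawler honoring robots.txt before fetching a page.
# The domain and paths are hypothetical, not NYT's actual robots.txt.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()  # fetch and parse the live robots.txt

# A crawler that honors the protocol only fetches what the file allows.
page = "https://www.example.com/2006/some-article.html"
if rp.can_fetch("MyCrawler", page):
    print("allowed to fetch", page)
else:
    print("disallowed by robots.txt; skipping", page)
```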
Why do you think it is legal to train students on copyrighted material? Copyright is supposed to protect against unauthorized reproduction, not unauthorized learning. That the NY Times is able to show some verbatim reproduction is a real legal issue, but that should not be extended to training generally.
Students are humans. LLMs are not. Machine "learning" is a metaphor, not what's actually happening. Stop anthropomorphizing, and show some loyalty to your species.
The loyalty argument does sound somewhat bizarre, but I think the overarching point is about whether technology use benefits humans in society or not. We should not implicitly grant LLMs owned by corporations the same rights as humans. Without some form of legislation, LLMs look like they will benefit corporations that are salivating at the profits and the prospect of reducing or eliminating the number of creative workers they need.
Why would I want to quantify it? The burden of proof is on the thief.
I have a gadget that will, with some probability, steal your life's savings. It operates through a process that is analogous to a human chewing. When engineering it, we just say for simplicity that the gadget "chews". Of course, that's only a metaphor -- machines can't chew.
But (and here's where your argument gets ridiculous), unless you can quantify the fact that my gadget can't chew, I will steal your savings. Good luck.
I think your question is incorrect. It's very likely no one thinks it's perfectly legal. There are probably many people who think it's not a big deal, though. Try coming up with a dataset that doesn't have any copyrighted material in it. Like, seriously, try. You can't use pretty much anything newer than a century old. Everything is copyrighted by default. Very few new things are explicitly in the public domain or licensed in a way that would allow usage. Now imagine LLMs trained on early 20th century newspapers, books, and letters. Do you think it would be good at generating code, or hip copy for the homepage of your next startup?
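To make the scale of that problem concrete, here's a toy sketch of the filter involved, assuming a corpus tagged with publication years; the cutoff constant and corpus entries are made-up illustrations, not a real pipeline:

```python
# Toy illustration: keep only documents old enough that their US
# copyright has plausibly expired. Cutoff and corpus are assumptions.
PUBLIC_DOMAIN_CUTOFF = 1929  # roughly "anything newer than a century old" is out

corpus = [
    {"title": "1921 newspaper column", "year": 1921},
    {"title": "2019 blog post on web frameworks", "year": 2019},
    {"title": "2023 startup landing page", "year": 2023},
]

usable = [doc for doc in corpus if doc["year"] < PUBLIC_DOMAIN_CUTOFF]
print(usable)  # only the 1921 column survives the filter
```

Nothing that survives such a filter could teach a model to write modern code or marketing copy, which is the point.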
> Now imagine LLMs trained on early 20th century newspapers, books, and letters. Do you think it would be good at generating code, or hip copy for the homepage of your next startup?
Not sure about the rest of the world, but at least for US content I don't think any company would publish that LLM.
That's like 40 years before the civil rights movement, and right about the time of the Tulsa massacre.
It's right around when women got the right to vote.
Trying to get it to not say anything horrible under modern standards seems fraught with issues. I don't know if it would even understand something like "don't be racist", given the context it was trained on.
Exactly. Copyright terms are so long that most material with expired copyright is not useful for modern uses of LLMs, and looking for modern non-copyrighted material is too hard to do quickly, with unclear usefulness. So people who grew up with the Internet, and are used to making memes with copyrighted material, are not exactly averse to doing it on a bigger scale.
1. Training an LLM is akin to human learning. It is legal to read a textbook about music to learn music, and later to write a book about music which likely includes some of the concepts you earlier learned.
2. Neither the LLM nor the output text contain sufficient elements of the copyrighted work to qualify for copyright protection. Just like if you turned old library books into compost and sold the compost, you wouldn't expect to pay authors of those books a royalty for the compost sales.
> Training an LLM is akin to human learning. It is legal to read a textbook about music to learn music
If you learn a little too hard, though, and reproduce the original textbook in its entirety, you'll get in trouble.
My guess is that courts will determine that the training itself is not illegal, but that either the AI companies or the users will be found liable for reproducing copyrighted work in the output, and no one will want to hold liability for that.
If the work goes beyond fair use, it is a copyright violation. It doesn't matter if it was created by a person or an AI.
Technology that makes copyright violations easier/quicker has typically been found legal if "the technology in question had significant non-infringing uses".
This makes sense. It was allowed for the content to be read and used in certain ways (e.g. search engines or as references) without substantial reproduction. The NYT would then have to flag specific generated content as infringing a specific work which could then be judged as fair use or not on a case-by-case basis. If a particular site/company was repeatedly and/or primarily using substantial content then perhaps it could be 'delisted' as search engines do for links to pirated copies of works.
It really hinges on "substantially similar". If I copy Harry Potter and change every instance of Harry Potter to Michael Rose, surely it's infringing. If I write a coming-of-age story set in a magical land, I'm probably OK. Which do you think LLMs produce?
It's likely not capable of literally giving you Harry Potter. If you specify your prompt narrowly enough that the output qualifies as fan fic, it's probably exactly what you were going for. After all, your word processor is capable of producing infringing works but is not itself an infringing work.
Fair use, probably. How many news pieces have you read that amount to "The New York Times reports..." followed by a summary of the Times' article? It's not illegal to use copyrighted works as a source, as inspiration, or to guide style.
Surely. Remember when the VCR came out and some parties absolutely freaked out? Jack Valenti said:
"I say to you that the VCR is to the American film producer and the American public as the Boston strangler is to the woman home alone."
Then we invented, from whole cloth, reasons why VCRs were perfectly OK: there was a ton of money to be made, and everyone would actually be better off if the VCR was a thing. Everyone knew it, too, because the case ended up being argued after millions of VCRs were already in households.
Read about the 'fair use' doctrine and put yourself in the shoes of someone who is training a model, and see if you can argue, from their perspective, why it should be allowed.
I hold a degree from a small, regional university and I am better for it IMO. That wasn't my point... I'm more curious where and what you can actually study.
Aren't a lot of unaccredited places, like for-profit institutions, generally lower quality and more expensive? They're trying to squeeze you for student loan bucks.