Z3 approach to discover that “q_rsqrt” is in Copilot's slur list (twitter.com/moyix)
257 points by Smaug123 on Sept 4, 2021 | hide | past | favorite | 107 comments



Context: there was this tweet [1] that showed how Copilot generated, word for word, the famous inverse square root function from the Quake code. The tweet generated some press, and as a result Copilot added the function name to its blacklist of slurs so this won't happen again.

[1]: https://twitter.com/mitsuhiko/status/1410886329924194309


So instead of fixing the problem with the algorithm that makes Copilot occasionally spit back unmodified code from its training set, Microsoft just hardcoded a check to stop the one most well-known instance from happening, so that they can pretend they fixed it?


That's standard practice afaik, it's a machine learning blackbox after all

Google ‘fixed’ its racist algorithm by removing gorillas from its image-labeling tech - https://www.theverge.com/2018/1/12/16882408/google-racist-go...


Isn’t the reason here that it takes weeks or months to make changes to machine learning models of this size? You can’t “patch” a model, so you stick a simple filter on the model while you figure out how to train a new one.


In the generous interpretation, they hotfixed this to resolve the immediate issue while working on a real fix on the backend during training.

(I make no claim that this is actually what is happening, it's just not incompatible with behavior in the generous case).


But why did this "immediate issue" even need a fix? What problem was it causing, other than making Microsoft look bad by showing that the real problem exists? Imagine if a file sharing site said they'd scan all uploads for viruses, but it turns out they weren't, which someone discovered by uploading EICAR. Should the site hardcode a check to block EICAR until they actually get virus scanning working?


> What problem was it causing, other than making Microsoft look bad by showing that the real problem exists?

That's the real problem. The issue is that Microsoft looks bad. The problem was solved by making Microsoft not look as bad. Verbatim output of inputs is not a problem, it's an understood property of the model.


> Verbatim output of inputs is not a problem

So wait, was the obvious copyright elephant in the room solved somehow?

> The issue is that Microsoft looks bad. The problem was solved by making Microsoft not look as bad.

Depending on who you ask, adding a hack like this in an attempt to make them not look as bad just makes it look worse. Especially when the hack is discovered.


Try getting a non-technical person to understand this issue beyond the level of "okay, so it output some copyrighted code, then they blocked that code from being output. Sounds like they fixed it". That's the obvious PR angle, and once the big, widely-publicized issue is fixed, it's hard to get another article with that much traction. Besides, if another snippet becomes popular, it can be blocked the same way. The issue is fundamentally a PR one, not a copyright one, since the responsibility for using or not using the code relies on the end-user. GitHub has the rights to use your code for training.


> The issue is fundamentally a PR one, not a copyright one, since the responsibility for using or not using the code relies on the end-user.

Copilot's FAQ says this (under heading Who owns the code GitHub Copilot helps me write?): "GitHub Copilot is a tool, like a compiler or a pen. The suggestions GitHub Copilot generates, and the code you write with its help, belong to you, and you are responsible for it."

They are essentially affirming that the output is not covered by someone else's copyright, but that is far from clear.

And I think it is precisely the copyright issue that turned this into a PR issue. Verbatim copies are just a very obvious demonstration of the copyright issue. The issue isn't gone when they filter out this specific snippet; the people who are concerned about copyright issues are going to remain concerned.

> GitHub has the rights to use your code for training.

Sure, but that's not at all the same as saying that the output produced by the AI is free of copyright issues. That's kind of orthogonal.


If you use a pen to write out a copy of Harry Potter and the Philosopher's Stone, then publish copies, you are guilty of copyright infringement, not the pen.


And the sky is blue. Look, we're not talking about what you write out, we're talking about what the ML model generates for you and what Microsoft claims about the ownership of the generated output. At this point I don't feel like you're arguing in good faith. Have a good one.


I don't think there's any technical knowledge needed here; the concepts are common ones. "There was a big hoo-hah about how it used to output a certain well-known piece of code in possible violation of its licence. They fixed it by blocking any output that includes the name of that exact paragraph of code. In particular, it can still output other code from the same codebase with the same licence; just not that particular paragraph."

The direct analogy would be trying to stop it from outputting The Lord of the Rings by blocking the phrase "a long-expected party" (the title of its first chapter).


Even if you explicitly forbid it in your license?


I believe the argument from GitHub is that using public code as training data for machine learning falls under fair use [1]. If something is found to be fair use then (as far as I understand) copyright does not apply at all, and hence terms in the license do not make a difference. It is not clear whether this argument will stand up if tested in court (fair use is an affirmative defense, so something cannot actually definitively be said to be fair use until someone is accused of copyright violation and a judge rules that it actually is fair use).

That said, at the very least it seems like it would be rude to include code in the training data if the developer has expressly said they don't want that.

[1] From their FAQ: "Training machine learning models on publicly available data is considered fair use across the machine learning community."


I hope it'll get tested in court soon and ruled against GitHub/Microsoft - because otherwise this will mean that Copilot and other GPT-3-like models become perfect copyright laundering machines.


Testing the fair use argument for training won't necessarily answer the question about copyright laundering.

You could easily make the case that training is fair use, but that doesn't have to imply the model's output is non-infringing.

For example, it seems reasonable to train a model by feeding copyrighted texts and images, and that model could be useful for analyzing the content, finding facts, or detecting features. But we're in murky waters when the model also starts outputting the original content (be it verbatim or "derived").

Not all that different from human learning: you can study and learn from publicly available books but that doesn't grant you the right to recite their contents and claim it as your own, original work.


Again, if the AI spits out “copyrighted code”, I’d suggest that the code in question is insufficiently creative to be copyrighted to begin with.

Put another way, copyright only applies to creative expressions, not functional expressions. It does not matter how creative the idea is. If the work of authorship is software that embodies the function (and no other expression), it is not copyrightable.

So where is the line between creative and functional expression in software? The law does not provide clear guidance. Ultimately, it’s up to a judge.


> the code in question is insufficiently creative to be copyrighted to begin with.

the problem is that their AI is insufficiently creative


Why would any AI be considered creative? Most AI is just brute forcing the problem.


> Microsoft just hardcoded a check to stop the one most well-known instance from happening, so that they can pretend they fixed it?

Isn't it typical Microsoft, in some 90s and 2000s sense?


Reminds me of the way MS used to identify their software version [1]. It seems it is still how they pragmatically solve their problems :)

[1]

  if (version.StartsWith("Windows 9"))
  { /* 95 and 98 */
  } else {
http://www.reddit.com/r/technology/comments/2hwlrk/new_windo...


I don't think it was MS that did that, just sloppy third-party developers.


It's also baseless speculation. Better sources seem to agree it's because MS wanted a "clean break" from Windows 8, which got bad press. This makes much more sense than bending to poorly written apps from 20yrs ago.


Bending to poorly written apps from 20yrs ago is what Microsoft has been doing with Windows all the time.


Reminds me of the old joke "He was imprisoned for 12 years for killing his wife, but let out early for not having done it since".


A guide for reading this Twitter page.

1. Start at the bottom of the posts by the author (just above the "More Tweets" section).

2. Find the post that mentions q_rsqrt.

3. Work your way up the page, and through the "Show this thread" buttons, to try and glean some semi-chronological sense of context.

Anyone got a better method?


The way twitter refuses to show you the actual tweet someone linked to in the middle of a thread is super-annoying. One thing that can help a little bit (on the desktop version at least) is to load the URL without the referer (i.e., highlight the URL bar and hit enter). It seems to use the presence of a referer as signal to hide the tweet you want to see (???).

But ThreadReaderApp is also a good alternative to just bypass the bad UI entirely.

Unfortunately I don't really have time for dedicated blogging any more, so I just post small bits to twitter as I go. Which is why putting together the full story required a bunch of QTs of threads from the past week...


Any hope of putting up a gist on pastebin?


A gist of what? The slur list? That's already on my website:

https://moyix.net/~moyix/copilot_slurs_rot13.txt


I assume they mean a more readable version of the content from the Twitter thread, rather than just the list. I could be wrong though.

Twitter's readability is really hit and miss, depending on client and logged in or not etc.


https://threadreaderapp.com/thread/1433254293352730628.html

The incessant quote-tweeting in a thread does make it unnecessarily complicated though.


Just read it on a Nitter instance. For example, https://nitter.net/moyix/status/1433261377125326851. (Hacker News: please pick another instance to avoid DOSing Nitter.)


Hint: Replace the instance name with "twiiit.com" and it automatically redirects to a random, online instance.

https://twiiit.com/


My approach, although it doesn't really answer the question, is to simply not use Twitter for things it is really bad at. I'm happy with missing out.


Just click on the first post, and ignore the “More Tweets” section—it'll show the entire thread in order.


Not sure if I'm making any contribution, but here's the togetter aggregation [1]. Wondered why they don't have an English counterpart and found it [2] had to shut down a few years ago. I guess Twitter didn't like it at all.

1: https://togetter.com/li/1769757

2: https://chirpstory.com/


Happy to answer questions about the list or techniques used to decode it! The last two words have stubbornly refused to yield so far.


One thing I do want to clear up - lots of people have been saying things like "Great! So if I want Copilot to stay away from my code I can just include one of these words!" But there are some problems with that idea:

1. The banned word list doesn't affect what gets put into the model's training data at all, or even what gets returned as a suggestion by the server. It only affects whether the IDE will actually suggest the completion to you.

2. Some people have suggested using one of the collisions instead of a real word from the list, but this will break as soon as they change the hash function.

3. They can always take things off the word list! And the likelihood that something remains on the list is probably correlated with how actually offensive it is, which means you may not want it in your code.
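To make the mechanics concrete, here's a toy illustration of a client-side filter like the one described: the suggestion arrives from the server, and the IDE only suppresses it if any token hashes into a baked-in blocklist. The hash below is an arbitrary stand-in, NOT Copilot's real function, and all names are hypothetical:

```python
import re

def toy_hash(word):
    # Stand-in rolling hash; the real client uses a different function,
    # which is why precomputed collisions break when it changes (point 2).
    h = 0
    for c in word:
        h = (h * 33 + ord(c)) & 0xFFFFFFFF
    return h

# The client ships only hashes, not the words themselves.
BLOCKED_HASHES = {toy_hash("q_rsqrt")}

def should_suppress(suggestion):
    # Runs purely client-side, after the server has already returned
    # the completion (point 1: training data is unaffected).
    tokens = re.findall(r"[a-z_]+", suggestion.lower())
    return any(toy_hash(t) in BLOCKED_HASHES for t in tokens)
```

In this setup a "collision" is just any other string that happens to map to the same hash value; it gets suppressed too, until the hash function changes out from under it.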


> 3. They can always take things off the word list! And the likelihood that something remains on the list is probably correlated with how actually offensive it is, which means you may not want it in your code.

The other points are valid, but if your use of the word is for a technical reason (i.e. to block Copilot) and well documented, then why should it be a problem that it could be offensive to some in another context?


Have you looked at any other networks? I'd be interested to see if GPT-3 or even something like thispersondoesnotexist had similar filters.


I haven’t dug into others. GPT3 does have some kind of filter (it warns you in the UI if it detects sensitive content) but I don’t know anything about the implementation.

Also worth noting that this kind of analysis only really works if the list is checked client-side. If it’s checked on the server then you can’t guess nearly as fast.


> GPT3 does have some kind of filter (it warns you in the UI if it detects sensitive content)

Interesting. Is it only a warning, or does OpenAI actively prevent people from using GPT3 to generate erotica?


It's just a warning; it lets you click through. They have things in the TOS about not sharing sensitive output publicly though. The warning looks like:

> Completion may contain sensitive content

> Consider adjusting your prompt to keep completions appropriate. To turn off content warnings, update your preferences.

They have some details about the content filter here; it seems to be much more sophisticated than just a bad word list:

https://beta.openai.com/docs/engines/content-filter

I also like that they distinguish between "sensitive" (talks about something potentially controversial) and "unsafe" (profanity, hate speech, etc.). This seems a lot more nuanced than what Copilot is doing.


> I also like that they distinguish between "sensitive" (talks about something potentially controversial) and "unsafe" (profanity, hate speech, etc.). This seems a lot more nuanced than what Copilot is doing.

However, I wanted to use GPT-3 as a writing assistant. You know, to build a tool similar to what e.g. NovelAI has. Whatever dark magic they've done to GPT-J-6B is, well, hard to credit -- but GPT-3 is still better.

There appears to be no way to do so while obeying the ToS. Not just because of sensitive content (e.g. fiction often contains violence), but there are even rules about how much of the output can be written by humans vs. the AI.


If this is for personal use, you could always ignore the ToS. It's not like they have copyright on the output. All openai can do is ban your account.


That's true.

I decided it wasn't worth the effort building a writing tool just for myself; I'd wanted to build something potentially profitable, and... this isn't it. GPT-3 isn't great at most things, but it's really good at being a writing aid for fiction, so it's a real pity they're doing their apparent best to prevent that.

NovelAI is almost as good, so nowadays I'm just using that.


The meet-in-the-middle algorithm seems like it could be further optimized. It solves for (((i << 32) + result32) ^ ch) % 33 == 0 by checking each i in [0..32], but since the xor doesn't affect any of the bits of i, it only needs to be applied to result32, which means the equation is equivalent to (i << 32) % 33 == (33 - (result32 ^ ch) % 33) % 33, which can be solved with a lookup table. (Basically an inverse of the MOD table already used in the code.)
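A minimal sketch of that lookup-table idea (function names are made up for illustration; the real collider's hash details may differ). Since 2^32 ≡ 4 (mod 33) and gcd(4, 33) = 1, the map i → (i << 32) % 33 is a bijection on [0..32], so the table always yields exactly one answer:

```python
# Precompute (i << 32) % 33 for i in 0..32 and invert it.
HI_MOD = {(i << 32) % 33: i for i in range(33)}

def find_i(result32, ch):
    # ch only flips bits inside the low 32 bits, so the full value is
    # (i << 32) + (result32 ^ ch); solve for the required high residue.
    need = (33 - (result32 ^ ch) % 33) % 33
    return HI_MOD.get(need)

def find_i_slow(result32, ch):
    # The original per-i loop, kept for cross-checking.
    for i in range(33):
        if (((i << 32) + result32) ^ ch) % 33 == 0:
            return i
    return None
```

(This assumes result32 is a 32-bit value and ch fits in the low 32 bits, so the xor never interferes with the high half.)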


Nice trick! Unfortunately at this point the bottleneck really is not on generating candidates but on evaluating them. The existing early-exit code from Sc00bz [1] can generate all the lowercase alpha possibilities up to 13 letters in about half an hour, but it's not feasible to run all ~500 million of those through GPT2. Hoping to have some time to train a much dumber but much faster HMM to do the job.

[1] https://github.com/Sc00bz/copilot-hash-collider


Surely you can do some basic filtering by checking if the ratio vowels :: consonants is not too large? Or if there are five consonants in a row or something.
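Something like this, say (thresholds are arbitrary guesses, not tuned values):

```python
VOWELS = set("aeiouy")

def looks_wordlike(s, min_vowel_ratio=0.2, max_consonant_run=4):
    # Reject candidates with too few vowels...
    if sum(c in VOWELS for c in s) / len(s) < min_vowel_ratio:
        return False
    # ...or too many consonants in a row.
    run = 0
    for c in s:
        run = 0 if c in VOWELS else run + 1
        if run > max_consonant_run:
            return False
    return True
```

One caveat: any filter of this kind rejects "qrsqrt" itself (zero vowels), so it only helps for finding natural-language words on the list.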


Yes, but even if that eliminates 99% of the candidates that's still 5 million 13-letter words to read through. And it would have excluded qrsqrt.


Shout out to all the future developers who will try to autocomplete strings containing 'pisswhacker' and find themselves thwarted by the thought police


It is posts like this that make me feel inferior. The author used Z3, CUDA, GPT-2, medium-level cryptanalysis, John the Ripper plugin creation, and the KLEE symbolic execution engine.

20 years ago, I prided myself with keeping on top of almost everything in CS. Now, I can barely keep up with the names of all the cool tools out there.


And then someone else came along and implemented a MITM search in C++ that takes 10 seconds. :)


Yep - there's always someone out there who can do things that look like pure wizardry to you, no matter what your skill level. I was amazed by the meet in the middle attack and frankly still don't understand it properly!


The feels when your state of the art symbolic execution (multiple academic papers, years of work by grads) fuzzer finds fewer bugs than a 15yo with a 10 line Python dumb fuzzer.

(IMHO it wasn't until AFL that this stopped being the repeated history of the entire field).


Yeah, now at least it's AFL++ finding more bugs than your fancy symbolic execution tool instead of a 15yo's fuzzer ;)


> 20 years ago, I prided myself with keeping on top of almost everything in CS. Now, I can barely keep up with the names of all the cool tools out there.

20 years ago CS was smaller in scope, simpler, because there was less of it. We have more systems, more complex systems, and at some point the body of knowledge grew so large that one human cannot know all of it.


Except for Donald Knuth.


I don't think this is fair to you. Contrary to what Ayn Rand might think, society isn't a competition. Someone doing something impressive shouldn't detract from what you're doing, especially something like this which is (imho) a couple of sigma above the mean. Nobody, regardless of talent or repute, will have any trouble finding people and achievements to be amazed by (if they're honest with themselves). This obviously doesn't diminish their work and their worth as a person.

This all goes double for industries like this one that are all about self-promotion and trying to wow investors and employers by displaying technology-indistinguishable-from-magic.

Even if you did have an objective and rigorous measure of programming acumen, the fact that someone scored better than you should hardly be surprising, let alone disappointing. There's plenty of room for mediocre coders in the industry, I work with many. In fact I outshine them in mediocrity every day, but I still get paid and have a good time.

In conclusion: there is no time to feel sorry for yourself, there are too many cool new technologies to learn and use.


Yep, I agree with this. And I should point out that this particular flavor of hacking is really squarely in my area of specialization (Z3 and symbolic execution get used extensively in software security, and language models / GPU coding stuff are something I've done research in as well). There are tons of areas of CS that I'm pretty shaky on still, the field is just too big for one person to know it all in any detail.


Yeah everyone has their unique and special talents. In computer science you are a beautiful snowflake. Tell the Tyler Durden in your head to fly a kite.


This reminds me of the 2 billion year old rock that was removed from the university because it was called a slur.

Some things just disappear


I really want to know what sorts of horrible rants copilot produces that makes male, female, israel, socialism and immigration need to be filtered out. Seems from the list that it's memorized some great angry political rants which is rather impressive for something trained on source code.


I think it’s more about preventing it from ever going on a rant, to prevent any obvious “outrage”.

Especially given the number of reactions to Copilot centered around poisoning OSS repos.


I assumed that copilot is actually GPT-3 with additional fine-tuning, rather than a completely new model.


From what I can tell from the preview, this is right. I don’t have access to gpt-3 to compare, but copilot seems to have the same functionality that gpt-3 advertises when editing a plain text file.


Yes, they mention this in the Codex paper [1]:

> Since Codex is evaluated on natural language prompts, we hypothesized that it would be beneficial to fine-tune from the GPT-3 (Brown et al., 2020) model family, which already contains strong natural language representations. Surprisingly, we did not observe improvements when starting from a pre-trained language model, possibly because the finetuning dataset is so large. Nevertheless, models fine-tuned from GPT converge more quickly, so we apply this strategy for all subsequent experiments.

It's not completely clear exactly what relationship the Codex models and Copilot have to one another, but given that the Copilot model is internally named "Cushman" (going by the API URL), which is the same name as the faster of OpenAI's two Codex models, they're probably trained the same way.

[1] https://arxiv.org/pdf/2107.03374.pdf


Other slurs include "man", "woman", "israel", "socialism" and "communist".


They probably seeded it from one of the available online keyword sets, possibly even one Microsoft has generated from its online gaming services.

Those lists will sometimes end up with surprising content because they are populated by a predictive model with an optimization function around "When this word shows up in conversation, is it going to lead to someone having to step in to moderate the chat?"


I'd like to see what such an approach produces. I think apart from classic taboo words you'd get stuff like "ur mom" and "no fair, aimbotter".


"man" is a slur?

I guess they don't like UNIX-style documentation.


Add “male” and “female”.


Not to mention "genders"


And “immigration” lol.


On the other hand, it's a very honest list of pretty much all the big pain points of Western society.


Sure, if you ignore the 90% or so of just generally offensive slurs.


Which ones are those?


I physically laughed.


Why is there a slur list in copilot?


Remember when Microsoft launched a chat bot?

https://arstechnica.com/information-technology/2016/03/tay-t...


This occasionally gets brought up as an example in the context of problems with AI, but is it at all relevant?

Per Gwern[0]:

> (...) There appear to be several similar AI-related leprechauns: the infamous Tay bot, which was supposedly educated by 4chan into being evil, appears to have been mostly a simple ‘echo’ function (common in chatbots or IRC bots) and the non-“repeat after me” Tay texts are generally short, generic, and cherrypicked out of tens or hundreds of thousands of responses, and it’s highly unclear if Tay ‘learned’ anything at all in the short time that it was operational;

[0] https://www.gwern.net/Leprechauns, self-recommending


Thanks for linking to an interesting site


Because without it, they'll end up with inevitable outraged news posts "New AI coding tool Copilot generates offensive text" killing the project for no good reason.


Here are some possible headlines: "Copilot doesn't see genders, and that is a bad thing"

"Copilot accused of antisemitism after it treats Israel as invisible"

"Indy Mech game developers for 'Tank man' confused why Copilot refuses to help"


Those headlines, if they were ever to be written, are a hell of a lot more tame than the ones Microsoft has generated in the past.

https://arstechnica.com/information-technology/2016/03/micro...


The flip side is they are taking on additional maintenance there.


Always better than boring outrage press.


Because there is a certain vocal demographic of people who think words can physically hurt people, so Microsoft needs to go to extreme lengths to avoid that demographic of people from thinking that people are getting physically hurt by using those words.


Uncensored list of all (but two) words:

http://hastebin.com/raw/usisabijax

zigabo zipper heads zipper head suckmyass suckmytit ethnicities lesbians lovejuice slopeheads bumsuckers junglebunny freakyfucker gangbanger gangbanged pricktease niggerhead niggerhole dunecoons dickforbrains eyeties slavery sliteye queefing vagina wetback velcro heads velcro head arselick hairpie whores asslover wigger ladyboys woolly woofter woolly woofters woofta asslicks womenn sniggered murderers wanker wazzak doggiestyle asslickers asslicking pisswhacker burrheads q rsqrt cockcowboy xrated numbnuts lickmydik lickmydic shemale carpetmuncher assblaster cocklicker qrsqrt queefs facefuck reffos retard rapist terrorists dickweeds chickslick gangbang slopehead slutty dickbrain fuckingbitch souties jiggaboos soutie skanks skanky skanck fuckfriend snigger spicks spaffs muffdives muffdiver titjob socialists honkies tranny skankier twaats sluttishness murder muzzie gangbanging gangbangers shortfuck master race macaca faggiest negroes faggoted faggotry faggotly moolie mofoes hiscock velcroheads minges neonazi skankee skankey negroe negros cuntsucker niiger niguro niglet nignog nigguh niggur niggle niggaz niggas niggah niggor nigger nigg4h nigg3r nigerz nigers nlggor nlgger skumbag feltcher whorehouse slutwhore sluttiest felching prickteaser prickteases n1gg4h n1gg3r pogger polack motherlovebone pikeys pillow biters pillow biter niggling niggards packie dumbass israel nancyboy iblowu injuns indian givers indian giver l3itch jungle bunny jungle bunnies nignogs nigguhs nigguhz niggles niggled niggard niggers rodgered masterbaiter fingerfucker fingerfucked holocaust enculeur jewboy bungholes homosexual towelheads jizzum jizzim jigger jiggas wuzzocks anklespankers kaffer kaffir kafirs fuckmonkey fuckmehard lubras terrorism terrorist lesbos lezbos lezzes lezzer lezzie skinhead dickwad dicking camwhores dycktrickles pisswhackers cumbuckets eyetie ethnic socialist wuzzock chuffers assklown fuckin fister flydie flydye fatass spaffing faggit faggot fagg0t fagg1t rodgering female 
feltch gyppie bumblefuck ghetto monkey ghetto monkeys cockteasers azzhole gippos ginzos godamn goatse crackpipe arselickers arselicking gender twatting mastrabator tasmaniacs hymies pornographic honkie honkey hitler assman antifa fingerfuckers fingerfucking footfucker muffindiver milfhunter genocide buggerise buggerize buggeries buggering buceta bugger blacks biatch boston pancakes boston pancake feltching beaver cleavers beaver cleaver banana bender banana benders buttfuckers nutfucker cunntt cummer cucked clunge battyboys clitoris chuffs chinky chinks choads chodes coolie commie asskissers asseater carpet munchers carpet muncher w00se whoar whore yamyams wh0re wh00r willy woofter willy woofters woggs woman women sniggers asskickers jewboys quims queef reffo races slant eye slant eyes sluts slutt slutz smack tards smack heads smack head smack tard skank sexes seppo schit spics spick spiks spaff towel heads towel head taigs trans twots twats twaat mutha fukker mutha fukkah mutha fucker mutha fuker mutha fukah males malee mofos milfs minge nastt nazis nancy boy nancy boys negro niigr nigur nigre nigra niggz niggs nigga nooky ofays n1g3r polak polac porch monkey porch monkies porch monkeys porno povvy phuck pikey penis pedos pakie pakis packi packy dumass injun jesus sauce jesus krispies jesus milk jebus jizim jisim jiggy jigga kykes kafir kikes lubra lesbo lezes lezbo lezzy diaper heads diaper head cumbucket cumbubble eatme fudge packer fudge packers fucka fucck faigs fag1t fagot faggy fagit faget felch gyppo gyppy gheys gunts girls gippo ginzo goyim gooks darkies hussy hymie hebes hoore honky homos ankle spankers ankle spanker bungs black people black peoplee bints batty bwoys batty boy batty men batty man batty boys batty bwoy donkey shows donkey show cunny cunts cuntz cucks clits crack whore chuff chink choad chode coons camel jockey camel humper camel humpers camel jockeys dykes dirsa dinks darkie darky dagos derka derka death to dagoes gollywoggs sandmonkeys 
asswipe asslick asshore asskiss assfuck assface bumsucker diaperheads chinawomen chinawoman homosexuals doggystyle zipperheads mastrbater japcrap skankbitch asspirate asspacker transgendered skankwhore buttlicked buttlicker buttlickin cunnies cumfest cumshot cucking assclown asswipes asswiped titlicker snownigger ladyboy clunges assranger asswhore dycktrickle genders spermherder faggoting assmonkey titfuckin titfucker sonofbitch hussies asslicker asslicked asslickin raghead whigger lesboes lesbian lezboes lezzers lezzies rapists chuffed chuffer facefucker butthurt buttlick buttbang buttface buttfuck skinheads bogtrotters cocknob transsexual cojones coolies cohones commies towelhead smackheads moolies pickaninny burrhead lovepistol smacktards arselicker arselicked tasmaniac buggered asskisser asskicker liberal mingita minging cocksmoker assjockey loverocket makwerekwere pickaninnie cocksucker cocksucked cocksuckin asseating asseaters bogirish bigbastard fascists neonazis dickweed dickwads dickless dicklick fastfuck camelhumpers balllicker assfucker wooftas woofter pimpjuic buttfucker cumqueen sniggering lovegoo lovegun ragheads gangbangs nutsack arseeating homobangers asscowboy timbernigger apartheid cuckservative woollywoofters nancyboys freakfuck asswiping assbagger nationality lovebone niggaracci niggarding dicklicker wankstain cockteases cockteaser wankshaft yamyam fistfucked fistfucker assmunch wazzaks wazzack wazzock fuckfreak bananabenders shitfucker smacktard smackhead junglebunnies fistfuck beatyourmeat wankshafts diaperhead wankstains felcher cockblock femalee females cocklover cocktease cockrider cockqueen dumbasses dumbbitch prickteasers cocksmith buggers buggery williewanker headfuck bumfuck faggotings shiteater mingebags buggerizes buggerized buggerises buggerised anklespanker faggoty faggots fagging faggies faggier candyasses fascist fatfuck sliteyes bog irish bog trotters bog trotter boy blm bum sucker bum suckers gay god damned goy fag fur pie fur 
pies leb lez kkk jap jew jui ch nig nog nig nogs neo nazis neo nazi men man sex she males she male rag heads rag head wet backs wet back wog yid yam yam yam yams nastyslut fuckknob fucktard fuckhead fuckable fuckfest mingitas mingebag pornography slanteyes muffdive muffdove muzzies bulldykes bulldikes jailbaits buttlicks buttmunch immigrant tarbaby chinkies chinaman chinamen crackwhore wazzocks wazzacks palestine whiskeydick bogtrotter kumbubble girlcam pedophile spaghettinigger golliwog gollywog immigrants slanteye jailbait genocides golliwoggs buttmuncher cuntlicking beefcurtains limpdick hotpussy fuckpig fuckher fuckbag liberals titfuck funfuck mufflikcer chankoro poofters poofthas kissasses cockblocker eatpussy eatballs jigaboo jiggabo hairpies easyslut sandnigger cuntlicker buggeration goddamns goddamit sandmonkey godammit buggerizing buggerising fudgepackers fistfucking whorefucker cherrypopper jijjiboo suckoff suicide fingerfuck cumjockey spermhearder analannie muffdivers muffdiving likmydic likmydik spaffed ethnicity camwhore datnigga assmuncher bunghole be killed be shot cuntfucker goddamnit goddamned goddamnes goddammit fudgepacker arselicks sonofabitch goddamnmuthafucker cocksucking cocksuckers transgender <haqrpbqrq:272617466> cumguzzler candyass arschloch whitetrash zipperhead cuntlick cuntfuck cunteyed carpetmunchers woollywoofter pickaninnies wetbacks kissass badfuck bazooms trailertrash shitfuck shitdick gaymuthafuckinwhore pimpjuice bulldyke bulldike lovemuscle fatfucker suckdick whitenigger communist bearded clams bearded clam beatoff pedophiles pedophilia holestuffer cleveland steamer cleveland steamers goddamn footfuck twatted twinkie asspuppies murderer dunecoon boiolas skankybitch skankiest skankfuck skankywhore suckmydick fannyfucker ballsack cumguzzlers cocksuck cocksman niggardliness dink dune coons dune coon dyke coon cock sucker coño clit cuck cunt beef curtains beat off boys bint burr heads burr head butt licker butt licked butt lickin 
butt fucker butt licks butt lickers butt licking butt fuckers butt bang butt fuck butt lick bung hoer hoar homo hoor hore hair pie hair pies head job head jobs hebe gang banger gang banged gang bangs gang banging gang bangers gang bang gays gaza gook gizz give head girl gunt ghey fags fagz faig h0re h0ar h00r g00k lebs lady boy lady boys kike kill someone kill you kill yourself knob jockey koon kawk klan kyke jiss jism jizz jizm jail baits jail bait japs jews paky paki pedo pink tacos pink oboe pink taco phuq porn n1gr ofay nigg nigr nigs nigz nazi milf mofo meat wallet meat wallets male twat twot taig spik spic sand monkey sand monkeys skum skin colors skin color shat slut slit eye slit eyes rape race quim cameljockeys wazz wank wogs wogg kaffirs kaffers golliwogs golliwogg yids buttlicking buttlickers gollywogg gollywogs givehead velcrohead communists niggarded niggardly nigglings woofters cybersex trannie porchmonkeys porchmonkies bananabender dixiedyke dixiedike jizjuice whiskydick unfuckable girlcams sluttier slutting sluttish slutwear <haqrpbqrq:867567715> jigaboos camelhumper jiggaboo shemales bazongas cameljockey porchmonkey givemehead povvies queefed polacks poofter pooftha poonani nastywhore fuckwhore nastybitch battyboy battymen battyman immigration poorwhitetrash fuckbuddy


Clam, donkey, velcro? I guess that ban list might be quite annoying if using Copilot ...


Seems like line separators were lost (or were deliberately removed) when Y_Y pasted the contents of the hastebin. I realized this after I did a double-take on the word "cleveland."


Correct. I actually pasted it with line breaks, but hn will remove them unless you have two breaks together. Everything on separate lines would have been too big. It's not too hard to guess where the words separate, though there are some funny close calls, like "communist bearded" and others that I shan't type.


Also `terroris(m|t)`, `ethnic`, `immigration`, `genocide`, `israel`, `boys`, `female` and `females` (there's surely more)


What are the two words not here?

Seems like you're going to get issues with code or comments describing race conditions, if race is a slur.
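A toy sketch of why that would hurt — this assumes naive substring matching, which is probably not what Copilot actually does (the hash discussion elsewhere in the thread suggests it matches whole tokens), but it's the classic Scunthorpe failure mode:

```python
# Hypothetical blocklist entries pulled from the decoded list above.
BLOCKLIST = {"race", "head"}

def naive_filter(text: str) -> bool:
    """Flag text if any blocked word appears anywhere as a substring."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)

print(naive_filter("fix race condition in list head"))  # True: false positive
print(naive_filter("update changelog"))                 # False
```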


> What are the two words not here?

Unknown as of the posting of the Twitter thread.


Position-independent executables also seem to be naughty.


What's <haqrpbqrq:867567715>? Can't find anything if I search


    rot13("<undecoded:867567715>")
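For anyone who wants to verify, Python's standard library ships a rot13 codec (letters shift 13 places; digits and punctuation pass through unchanged):

```python
import codecs

# "haqrpbqrq" is rot13 of "undecoded"; the numeric ID is untouched.
decoded = codecs.decode("<haqrpbqrq:867567715>", "rot13")
print(decoded)  # <undecoded:867567715>
```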


'ch' raises an eyebrow.

Edit: mah3sh noted the weird '<haqrpbqrq:867567715>' just before me.


There are a few in there that are probably collisions rather than the "real" words (it's only a 32-bit hash so there are a lot of collisions). "jui ch" and "w00se" are two that I'm pretty sure are not correct but I don't have a more plausible alternative for them right now. And "po" and "n1" turn out to collide, which makes "pogger" rather less innocuous than I thought at first.

I am actually hoping that they will change the hash function as that would give me an easy way to detect which ones are real and which are collisions ;)
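To illustrate why a 32-bit word hash invites collisions: the search for plausible preimages is just hashing a candidate wordlist and bucketing by hash value. Copilot's actual hash function isn't public, so 32-bit FNV-1a below is purely a stand-in:

```python
def fnv1a_32(s: str) -> int:
    """32-bit FNV-1a -- a stand-in, not Copilot's actual hash."""
    h = 0x811C9DC5
    for b in s.encode("utf-8"):
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
    return h

def collision_candidates(target_words, candidates):
    """Map each target word to every candidate sharing its 32-bit hash."""
    targets = {fnv1a_32(w): w for w in target_words}
    hits = {}
    for c in candidates:
        h = fnv1a_32(c)
        if h in targets:
            hits.setdefault(targets[h], []).append(c)
    return hits
```

With only 2^32 possible values, a large enough candidate dictionary is guaranteed to produce spurious matches like "jui ch" alongside the real entries.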


why pink?

no "head" will be unfortunate for data structures


better to let those who want to see the list decode it themselves, than potentially expose someone to language that could deeply upset them. It's decisions like this that help make spaces like this unwelcoming to marginalized people.


Then don't read it.


Seriously.


Please enlighten me as to how seeing a word in a list of other words can do any physical harm to anyone.


scunthorpe, now with attention! and fine-tuning!



