Ask HN: Captcha Alternatives?
192 points by ev1 on Aug 31, 2020 | 145 comments
TLDR: I help with a gaming community-related site that is being targeted by a script kiddie; they are registering hundreds of thousands of accounts on our forums to 'protest' a cheating (aimbot) ban. They then post large ASCII art spam, giant shock images (the first one started after we blocked new accounts from posting [img]), the usual.

Currently we use a simple question/answer addon at registration time - it works against all untargeted bots and is just a little "what is 4 plus six" or "what is the abbreviation for this website" type of question. It's worked fine for years and we don't really get general untargeted spam.

I am somewhat ethically disinclined to use reCAPTCHA, and there are some older members that can't reasonably solve hcaptcha easily. Same for using heavy fingerprinting or other privacy invading methods. It's also donation-run, so enterprise services that would block something like this (such as Distil) are both out of budget and out of ethics.

Is there a way I can possibly solve this? Negotiation is not really an option on the table, the last time one of the other volunteers responded at all we got a ~150Gbps volumetric attack.

I've tried some basic things, like requiring cookie and JS support via middleware; they moved from a Java HTTP-library script to some kind of Selenium equivalent afterward. They also use a massive amount of proxies, largely compromised machines being sold for abuse.



* Allow new accounts, but hide messages from them until their posts are verified manually and the accounts are either approved or shadow-banned.

* Don't delete banned accounts and don't notify them in any way, but tag their IPs and cookies to auto shadow-ban any sock puppets, so that these don't even make it into an approval queue.

* Use heuristics to automate the approval process, e.g. if they looked around prior to registering, or if they took time to fill in the form, etc.

* Add a content filter for messages, including heuristics such as ASCII art as a first post, and shadow-ban based on that.

* Hook it up to StopForumSpam to auto shadow-ban known spammers by email address / IP.

* Optionally, check for people coming from Tor or VPN IPs, and act on that.

Basically, make it so that if they spam once, they will need both to change the IP and to clear the cookies to NOT be auto shadow-banned. You'd be surprised how effective this trivial tactic is.

All in all, the point is not to block trolls and tell them about it, but to block them quietly - to discourage and to frustrate.
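
A minimal sketch of the tag-on-ban idea above, with invented names and an in-memory store standing in for real persistence:

    // Sketch only: bannedIps/bannedCookieTags would live in your DB.
    const bannedIps = new Set<string>();
    const bannedCookieTags = new Set<string>();

    function onShadowBan(ip: string, cookieTag: string): void {
      // Remember both identifiers; the account itself stays "active".
      bannedIps.add(ip);
      bannedCookieTags.add(cookieTag);
    }

    function registrationVerdict(ip: string, cookieTag: string): "queue" | "shadow" {
      // Sock puppets sharing EITHER identifier never reach the
      // approval queue; the troll must change both IP and cookies.
      return bannedIps.has(ip) || bannedCookieTags.has(cookieTag)
        ? "shadow"
        : "queue";
    }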


I don't think cookies will do much against a large-scale automated attack, but everything else in this list is solid:

Hide posts until legitimacy of the poster has been verified. Allow them to post and respond, but don't show it to anyone yet, except to the moderators. If they're posting something sensible, unhide them. If it's spam, shadow-ban them. Don't let the user know. Let them guess why nobody is responding to the spam.

For that reason, it may also be a good idea to post an announcement that nobody should respond to this spam. That way the spammer won't know if he's being ignored manually or auto-hidden. Let him waste time and frustration on that.

Only use this against people who are this malicious. For regular hot-headed people who accidentally break forum rules but do want to meaningfully contribute to the community, always remain open and honest. Give people the opportunity to learn. Reserve the shadow ban for people who are determined not to learn and to remain purely destructive.

Of course once they catch on, they'll probably start making new accounts with some legitimate posts, and once people start responding, they go back into spamming mode. This is tricky. Ideally, it'd be nice if you had a system that could automatically detect that sort of spam. If someone suddenly starts posting ASCII art, big images, all caps, or anything like that, or goes on a rapid posting spree, automatically put them back on probation with hidden posts requiring approval, so you can check this change in posting behaviour.
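
The probation trigger could be a handful of cheap heuristics; a rough sketch, with made-up thresholds:

    // Returns true if a post looks like a spam burst and the account
    // should be flipped back to "posts need approval" mode.
    function looksLikeSpamBurst(post: string, postsLastMinute: number): boolean {
      const letters = (post.match(/[a-zA-Z]/g) ?? []).length;
      const caps = (post.match(/[A-Z]/g) ?? []).length;
      const symbols = (post.match(/[^a-zA-Z0-9\s]/g) ?? []).length;
      return (
        symbols / Math.max(post.length, 1) > 0.4 || // ASCII art is symbol-heavy
        caps / Math.max(letters, 1) > 0.8 ||        // ALL-CAPS rants
        post.split("\n").length > 50 ||             // giant block posts
        postsLastMinute > 5                         // rapid posting spree
      );
    }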


We already tag/drop all Tor and VPN traffic. Cookies don't make much sense because this looks like browser automation, not someone just swapping VPNs by hand repeatedly.

For IP bans, they are now using illegitimately acquired or fraudulent IP space (the guy is not intelligent enough for this, he's almost certainly just buying proxies with in-game gold or some BS - but there is a criminal element here), including what might involve significant hijacking of AT&T, CenturyLink, Level3, and Windstream network resources.

(if you work at one of those places and are clueful, I would be very interested in asking about this)


If you're seeing connections from random residential IPs, they're probably using a reverse-proxy service like Luminati or 911.re. IP blocklists won't catch these. These proxies originate from (basically) compromised computers -- people who install "free" browser extensions and the like: https://www.trendmicro.com/vinfo/hk-en/security/news/cybercr...

With a troll this persistent (and willing to spend money on it), your best bet is definitely shadow bans and moderation queues.


It isn't just random residential IPs, but what appears to be BGP hijacking of/onto residential networks on top of that.


Can you elaborate?

If the troll is using residential proxies, you might try abuse@ the handful of services that offer such things. There aren't that many. I don't know if they actually take abuse seriously, but it can't hurt.

Luminati has an abuse form: https://luminati.io/report-abuse


What's the basis for assuming BGP hijacking? That would be a very sophisticated attack just to troll some random website.


The skid is not the one doing it. They are buying from semi-professional "proxy sellers" that do it and then sell you some form of authenticated squid proxy that further makes the request.


Can't he fake the source IP address by sending raw packets? Are they sanitized in any way?


That wouldn't work for HTTP(S) or anything else that works over TCP since the reply would go towards the fake source IP address, thus the attacker couldn't even get past the 3-way TCP handshake.


Try the basics first, i.e. hiding posts from new accounts, plus IP and cookie blacklisting. See if it works.

Then move on to more advanced stuff, e.g. browser fingerprinting and behavioral pattern matching. If it's automated, there'll be a pattern. But the basic stuff can go a very long way.

As a bonus, you can put all new accounts into a single group, so that they would see each other's posts, but without making these posts visible to the approved accounts.


By the way, how do you detect VPN traffic? For Tor we just pick up the list of exit nodes, but we've had trouble identifying VPNs without using a 3rd-party API.


You can pick up a lot of VPNs by checking the ASN. If it's a hosting company, chances are it's a VPN or a bot.

Doesn't work for the VPNs that get you on residential IPs though.
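
A sketch of the ASN check, assuming you have some IP-to-ASN lookup available (e.g. a local GeoLite2-ASN database); asnFor() and the ASN list here are illustrative, not exhaustive:

    // Hypothetical helper standing in for a real IP-to-ASN lookup.
    declare function asnFor(ip: string): number;

    // Hand-maintained hosting/cloud ASNs (examples only).
    const HOSTING_ASNS = new Set([
      16509, // Amazon
      14061, // DigitalOcean
      16276, // OVH
    ]);

    function probablyVpnOrBot(ip: string): boolean {
      // Note: residential-proxy traffic will NOT be caught by this.
      return HOSTING_ASNS.has(asnFor(ip));
    }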


I know a site that simply bans any IP with 443/tcp open. That catches web servers and possibly SSL-VPN servers, but it causes false positives.


We are a stealth startup founded by former PayPal and Coinbase engineers. We have a device intelligence product that can detect accesses from proxies/VPNs without using any IP list. We can also detect the true OS that someone is using - useful for detecting emulators and script kiddies. Happy to chat if of interest: info AT sardine.ai


MaxMind has an anonymous IP DB. It costs money.


Use a JS fingerprinting library and ban on the fingerprints. Blacklist the whole provider if the traffic came from a hosting provider. You'll end up blacklisting real users, so you'll need the ability to whitelist. WebRTC used to be able to pierce proxies to get the real IP via STUN; you could try that. You could also check the TTL on the TCP packets to see if the traffic is going through a proxy or the client is lying in the User-Agent header.

There's more you can do on the client to prevent attackers from just hitting your APIs, forcing them to run a full client by having the client solve some sort of problem whose solution it must provide to the API. You can also detect headless browsers in JavaScript with a few open-source bot-detector JavaScript libraries.


Shadow banning is for technically illiterate trolls, not spammers. I would not advise automatic shadow banning, as it can be psychologically destructive for false positives in a community. The technically illiterate will never understand that they are shadow-banned and will keep posting in vain, thinking they are being ignored.

I would advise sending a registration token as a clear picture to the user's email (a really short one, like four letters). Then the spammer needs to do pattern recognition, and if he can't program he will not bother.


What if an actual user's blind?


Maybe include a URL to request manual approval and make the text white so only screen readers notice it?


I've seen a site like that, a Captcha + a manual form. I signed up through the manual one a couple years ago. I'm still waiting to hear from them.


Yeah, well, if the moderation doesn't work properly, no manual review will work, including whitelisting users' first posts.

The number of legally blind users should be low enough that there can be an ad hoc queue of some sort. I guess any spammer can spam down that queue too, though?

Maybe there could be a simple sound captcha for blind people, with links to some audio files with letters in them, to listen to in order. If the script kiddie figures out how to match byte streams, add white noise to make it a DSP problem.


what's the problem if your posts don't show up to the public at first?

Also, with them using Selenium, I doubt that they can't put together some rudimentary pattern generation...


I have no problem with requiring manual validation of e.g. the first post or whatever; it's the shadow banning I don't like.

It adds complexity for the users, and it is hard for a user to know what is happening, since it is a "secret" that someone's messages don't show up. When the moderation process breaks down or there is a bug, the user won't know how to appeal, etc. It's the Google way of user interaction.

Just add a "This message will not be shown to others before your account has been approved" or whatever, and it is not a shadow ban anymore.

The spam program might be shipped with Selenium. If the spammer can solve a trivial OCR task, then OP knows that the spammer is a programmer, not a "script kiddie".

Imagine a picture with four rows of letters and a rectangle around four of them. The rectangle moves around between registrations. Surely that would require hand-written OCR to solve, even if the spam app ships with some boilerplate OCR? And it would be trivial for humans with bad eyesight to solve.


> Allow new accounts, but hide messages from them until their posts are verified manually

But they're registering hundreds of thousands of accounts. And, I assume, many of those accounts are creating posts. Moderating that many posts doesn't seem like a solution.


If it's a targeted attack, they'll just make one clean account to check if they're shadowbanned. The rest of the suggestions are great.


Circumventing shadow bans is fairly annoying to do correctly; you have to get a KNOWN clean account (which, if it gets shadow-banned, becomes useless without notice) and verify other posts with it. You have to wait some time after your post to avoid race conditions, and there are other tricky timing issues that mean this will heavily slow down any post (I mean, you already have to check whether your post went through; that's at least one additional connection).

This can also be easily spotted and combatted by using a half-shadow-ban for accounts that are seen repeatedly browsing threads that just had a shadow-banned post made; such a user will not see random shadow-banned posts, making detection even more difficult, as you now have to cross-reference multiple accounts.

You can make it arbitrarily hard to circumvent shadow banning.
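
One way to read the half-shadow-ban idea, as a sketch; isSuspectedChecker() is hypothetical bookkeeping that flags accounts repeatedly loading threads right after a shadow-banned post lands in them:

    declare function isSuspectedChecker(viewerId: string): boolean;

    // Tiny stable hash so the same (viewer, post) pair always gets
    // the same answer; reloading the page reveals nothing new.
    function hash(s: string): number {
      let h = 0;
      for (const c of s) h = (h * 31 + c.charCodeAt(0)) | 0;
      return Math.abs(h);
    }

    function showShadowBannedPost(viewerId: string, postId: string): boolean {
      if (!isSuspectedChecker(viewerId)) return false; // normal users never see them
      // Suspected checkers see an arbitrary-looking subset, so one
      // clean account can no longer give a reliable yes/no answer.
      return hash(viewerId + ":" + postId) % 3 === 0;
    }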


It's for the better if they discover that there's a shadow-ban mechanism in place. This acts as a deterrent.


Other posters have given good advice on technical aspects. I'd like to add my experience from moderating a large subreddit.

Focus on making it not fun to troll. Never acknowledge the disruption. Make all your countermeasures as silent as possible. Never address the script kiddie directly. Don't accidentally make a "leaderboard" or similar by publishing counts of bans/deleted posts, etc.

Eventually it just becomes a waste of time to scream into nothingness and they will go elsewhere.


A method a German community once used was the troll throttle. The basic idea is that troll and spam content compresses better than average content.

So you run various compression algorithms over your community's content to establish statistical baselines (average, median, and so on). Try both compressing each post individually and compressing it as part of one giant text corpus, counting the size growth a post generates by being added. These are your measurement points.

An incoming poster must solve a captcha to be able to post; however, the likelihood of the captcha being accepted is tied to the compressibility of the post.

A compressible post is likely to be spam or ASCII art. The captcha fails even if the answer was entered correctly. IIRC I used a relationship of 'min(1, sqrt(1/compress_factor)-1.05)'.

An incompressible post is not only likely to pass the captcha; it might pass even if the answer was actually wrong.

The entire point is that it shifts balances. Trolls will have to submit their posts a few times and re-solve captchas, which slows them down. Making content that does not compress well across a variety of compression algorithms, especially if you also account for the existing text corpus, is a very hard problem to solve. They'd have to start adding crap to the post to bloat it up, at which point you can counter with the next weapon.

Repeat all of the above, except instead of compression, you estimate entropy. High entropy blocking means you can block messages containing compression decoys.
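
A sketch of the compression half of this in Node; the quoted formula is as the author remembered it, so the pass-probability curve below is just an illustrative clamped variant:

    import { deflateSync } from "zlib";

    // ~1 for noise, larger for repetitive spam and ASCII art.
    function compressFactor(post: string): number {
      const raw = Buffer.byteLength(post, "utf8");
      return raw / deflateSync(Buffer.from(post, "utf8")).length;
    }

    // Probability that a *correctly solved* captcha is accepted.
    function acceptProbability(post: string): number {
      const p = 2 * Math.sqrt(1 / compressFactor(post)) - 0.4;
      return Math.max(0, Math.min(1, p)); // prose ≈ 1, ASCII art → 0
    }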


The best part about this is that it's difficult to figure out how the detection mechanism works in order to counter it, and I really like the idea, but isn't it easy to just add random noise to a post to give it the right amount of entropy?


It would be quite an art to generate a legible post that contains both ASCII art and a consistent amount of entropy without containing too much entropy or being too compressible.

And since posting it adds it to the compression dictionary, it's likely only going to work for a while before it gets compressed better.

You'd be fighting yourself more than the captcha. Plus, to figure out whether a post will work or not, an attacker would now have to attempt to compress the post with X compression algorithms (plus, of course, the unknown text corpus of the long-term compressor) as well as estimate entropy via Y algorithms. They'd be wasting a lot of CPU on these tasks.

Meanwhile, humans have no issue, because while English does compress somewhat, when we express unique ideas it's less compressible than when we do not.


> it's likely only going to work for a while before it gets compressed better

Unlikely, random noise does not compress. It's practically always unique.


But that triggers the entropy detector.


We're a UK company, and we had an incredibly persistent spammer. He'd also send us threatening emails. His persistence and nastiness were draining, and quite frankly it was impressive how much time he was putting into it all.

I don't know if it was a coincidence, but after some sleuthing we found his real name and filled in the online FBI tip-off form about his emails to us. He had a bad history and may have been on bail.

Stopped pretty promptly after that - guessing he got a phone call.


You didn't mention email confirmation in the first place, but I figured I'd mention this for others. I recently ran into a similar situation and had the idea of registrants emailing ME a secret code I give them, instead of confirming they received it. Still technically automatable, but it would definitely throw a curveball to the bots. I confirmed with an Ask HN that this is a secure method: https://news.ycombinator.com/item?id=24116530


Not the answer you're looking for, but reCaptcha is probably your best option.

I attempted half a dozen mitigation strategies to prevent spam on one forum I ran. I tried honeypots, questionnaires, other captchas, and proxying services to block bots. They slowed the bots at best, but when there's a torrent of bad actors it really doesn't matter if you slow them down 50%.

I finally installed reCaptcha and it solved the problem instantly. Not a single bot has signed up in 6 months. I started getting suspicious that signups were just broken, but I tested it and it was fine.

After that experience, I'm very much on team reCaptcha. I tried hCaptcha as well (on a different project), but found it was much harder to solve.


Also, after you hit a certain number of users the "bad actors" sometimes have people behind them manually adjusting their bots' algorithms to match your tricks.


My wife is in her 30s and she often fails reCaptcha and gives up on sites that require it.


HCaptcha founder here. I am sorry you had trouble solving captchas. Perhaps your older members might have luck with https://www.hcaptcha.com/accessibility.


Thanks for responding! Do you know if this is linked anywhere within the captcha itself, or is it + Privacy Pass something that the user has to find and look up for themselves?

I don't recall seeing an accessibility link anywhere on an actual captcha HTML block, just a refresh / terms of service button.


That's a great point. It isn't. We just blog about it. Let me bring it up with the team.


Gosh, your captchas are really hard to solve, and I'm not even 25 years old yet. Sometimes I can't understand the question or what I should select, because the images show only a small part of the objects, and it keeps going and going; I must solve 5-7 questions before I can access the target website.


For me it's the opposite. hCaptcha questions are usually at least solvable, even if you get some parts wrong. Google usually wants me to solve 4-5 of the image questions, and sometimes I just get fully blocked without explanation and can't solve further captchas.

And that is on top of not having to give more data to Google.


I experienced the same issue, and hCaptcha does not support IPv6. I was baffled when Cloudflare decided to use an IPv4-only service.


hCaptchas are a godsend compared to Google's captcha. Google always puts me in a loop and asks me to solve 8-10.


In the same boat. I had to put it on the easiest setting, but I still fail sometimes, and you often have to cross out like 90% of the images given, whereas with competitors like reCAPTCHA it's often just 2 or 3 images.


A client-side, GPU-bound challenge (the user doesn't do anything but wait for a spinner to load; the JavaScript has to solve an NP-hard problem).

It won't block all spammers, but it will increase the attacker's compute cost (even with Selenium) to the point where they'd have to get GPU instances, which will be too expensive for a script kiddie.

This is sort of what Cloudflare is doing when it says "verifying your browser".
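
In practice the cheap-to-verify version of this is a hash puzzle rather than a true NP-hard instance. A sketch (the solve loop would run in the visitor's browser; it's shown as plain Node here for brevity):

    import { createHash } from "crypto";

    const DIFFICULTY = 20; // required leading zero bits; tune the cost

    function leadingZeroBits(digest: Buffer): number {
      let bits = 0;
      for (const byte of digest) {
        if (byte === 0) { bits += 8; continue; }
        bits += Math.clz32(byte) - 24; // clz32 counts from bit 31
        break;
      }
      return bits;
    }

    function sha256(s: string): Buffer {
      return createHash("sha256").update(s).digest();
    }

    // Client burns CPU finding the nonce...
    function solve(challenge: string): number {
      for (let nonce = 0; ; nonce++) {
        if (leadingZeroBits(sha256(challenge + ":" + nonce)) >= DIFFICULTY) {
          return nonce;
        }
      }
    }

    // ...while the server verifies it with a single hash.
    function verify(challenge: string, nonce: number): boolean {
      return leadingZeroBits(sha256(challenge + ":" + nonce)) >= DIFFICULTY;
    }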


This is one of the best methods to rate limit attackers. Something like https://news.ycombinator.com/item?id=7944540


I've seen this on some forums; you suddenly get a lot of CPU usage. At first I thought a crypto miner was getting through my external script blockers, but after checking the source and grokking the minified code, it dawned on me what it was.


How will this stop a bot that has an embedded browser in it? For the bot it's just a thread waiting for the "Cloudflare verification" to finish. Not that hard to bypass.


It will cost the attacker compute resources. It's not just waiting; it needs to actually solve the problem.


Unless you make it very expensive (which would turn legitimate users away, because they don't want to wait for a minute to sign up), that won't help, I think. You tie up a core for 5 seconds, but the server has 12 of those, so they can create 12 accounts every 5 seconds => still too much to handle in moderation.


It's 12 accounts per second instead of 600 per second if it only takes 1 second. Or 1200 if it takes half a second. That is still an improvement.


Hey, my game will be in a similar situation. I'm looking into building a CAPTCHA that works by taking submissions from r/notinteresting, r/mildlyinteresting, and r/interestingasfuck, and asking the user to take an image and classify the image into not interesting, mildly interesting, and very interesting. We can distort, crop, and recolor the image to defeat reverse image search. That should be enough of a stopgap to stop them. Contact me via email (in my profile page) if you want to work together on that project.


So my script has a 1 in 3 chance of getting the CAPTCHA correct each time, with just a random guess?

A regular 6 character alphanumeric CAPTCHA has a 1 in 56,800,235,584 chance as a comparison…


reCAPTCHA works in a similar way, they just ask you to determine which photos in a set of 9-12 contain a particular object (2^12 at best). If you just asked people to do the Reddit classification 7 or 8 times you'd get the same chance of random guesses passing. The trick is to rate limit attempts.

Personally my problem with this is that even with the basic categories reCAPTCHA asks for I find it difficult to figure out whether certain edge cases should count. I feel it could be more frustrating to have to guess whether someone on Reddit found a particular image interesting or not.


Some CDN services provide bot detection (as well as other DDoS mitigation options).

https://www.cloudflare.com/products/bot-management/

https://www.akamai.com/us/en/products/security/bot-manager.j...

Edit: I didn't see your comment about budget. I expect Akamai may be out of reach; not sure about Cloudflare's options. Most bot detection is going to need to fingerprint the behavior of interactions with the site (captchas as well). If that data is handled correctly (not sold or made available to a third party, destroyed after use), I believe it can be done ethically. Obviously my ethics are not yours.


"Bot Management" is enterprise only on CF, and Akamai is out of reach.


Just FYI - bot management is basically snake oil. Distil can be circumvented by automating Chrome headful. Akamai doesn't even require browser automation. Can't speak to CF, but it's probably not different.

If your script kiddie is automating browsers and using residential proxies, he's probably sophisticated enough to get around this stuff. You're not missing anything.


Hey there, I work for ipinfo.io and we have a privacy detection API (https://ipinfo.io/developers/privacy) that includes VPN/proxy/Tor/hosting (bots, scrapers, etc.) flagging, which may help your predicament. You can test how accurate we are by appending any nefarious IPs to our URL and seeing if we flag them correctly, e.g. https://ipinfo.io/5.62.18.115.

Not sure if that will be your magic bullet, but I'd be happy to have a chat and riff on this. I've talked to customers that are using our privacy detection API, and I do remember a good subset using it for your sort of use case. At the very least, I can pass on some learnings, strategies, etc. to combat it.

Available anytime at ken@the-ipinfo-domain.


$50/mo for geolocation + ASN lookup?


That's our paid plan, but we do have a free tier! If you have something in mind and an interesting use case, hit me up and I can organise access for you.


You could try Bot Fight Mode, which is "if it's a bot, block" using the same enterprise tech. However, this is an extremely unreliable bet, since you have no tuning options if it doesn't block the bots, or if it blocks some of your users who happen to be using something like a text-based browser.


Email me: matthewatcloudflaredotcom.


A more extreme approach that may or may not work for you is to make the community invite-only.

Track the network of invites and shadow-ban linked accounts when you detect the spammer popping up. The spammer will eventually run out of invitees.

You can combine this with "no invitation required" short periods, where you make changes to the signup flow, spam detection, etc. and make the window short enough for the spammer to not have the time to adjust their bots.
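
The invite tree makes the cleanup side cheap, too; a sketch with in-memory maps standing in for real storage:

    const invitees = new Map<string, string[]>(); // inviter -> invited accounts

    function recordInvite(inviter: string, invitee: string): void {
      const list = invitees.get(inviter) ?? [];
      list.push(invitee);
      invitees.set(inviter, list);
    }

    // When a spammer pops up, their entire invite subtree goes with them.
    function shadowBanSubtree(accountId: string, shadowBan: (id: string) => void): void {
      shadowBan(accountId);
      for (const child of invitees.get(accountId) ?? []) {
        shadowBanSubtree(child, shadowBan);
      }
    }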


If you can detect them as spammers, then instead of banning them, shadow-ban them, making their posts invisible to others, and slow down the server's responses for them.

There are also alternatives to reCAPTCHA that might be more ethical, for example https://www.phpcaptcha.org/ - there are some image-matching ones too, but I don't know any specific ones.


Perhaps embed some recursive/infinite XSL in an iframe or data src attribute. There are also ways to create memory intensive HTML documents. This will cripple their Selenium or other headless browser instance.


Maybe try stealing Wikipedia's IP ban list - Wikipedia gets a massive amount of spam, which makes it an easy resource for getting a list of evil IP addresses.

Their list is a combo of https://en.wikipedia.org/wiki/Special:GlobalBlockList and https://en.wikipedia.org/wiki/Special:BlockList?wpTarget=&wp... and TOR (which is handled automatically) [there is also an api version in json format]


Not sure why it's not mentioned but, in addition to technical mitigation, if you know the attacker's general info, then maybe you can also try other avenues such as law enforcement or legal claims.

It's more work as well, but when you whois some of the attacking machines you can find out what their abuse@ email is and contact them. That can put the provider on notice if you later also pursue legal action.


Is there a reason to not use hcaptcha for signup only? Older members are already members, so all you are doing is applying it to the new people.

Or add 2FA with a text message for sign up. That is a lot harder to automate and unless he is willing to spend a ton of money on extra phone numbers, he should run out of them quickly.


How many people are you registering a day normally? I’m wondering if you shut off signups for a while + handle the inevitable attack & they can’t get back in they might move on. How much money and time do you think they (or you) are willing to commit though, what a crappy tale :-(


I'm working on https://spamscanner.net, which will be useful very soon for this with a public and free API (which will store zero logs and adhere to same privacy as https://forwardemail.net).


In my experience, JavaScript filters work very well against spambots. For example, you could have two honeypot fields, one with a certain value and one empty. In JavaScript you switch their values, and on the server side it should validate that way. Most spambots don't run JavaScript (yet). Another one could be a simple timer: again, two fields with starting values; you count one down and the other up, and on server-side validation there should be a difference of more than 1.
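
A sketch of the swap trick; the field names and values here are invented for illustration:

    // Server renders: <input id="hp_a" value="x"> and <input id="hp_b" value="">
    // Client side - bots that never run JS never perform the swap:
    window.addEventListener("DOMContentLoaded", () => {
      const a = document.querySelector<HTMLInputElement>("#hp_a")!;
      const b = document.querySelector<HTMLInputElement>("#hp_b")!;
      [a.value, b.value] = [b.value, a.value];
    });

    // Server side, on submit - only the swapped state is valid:
    function passesHoneypot(fields: Record<string, string>): boolean {
      return fields["hp_a"] === "" && fields["hp_b"] === "x";
    }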

For an example, check a WordPress plugin I made 2 years ago: https://wordpress.org/plugins/la-sentinelle-antispam/

There is also the slider thing on Ali Express, that you could check out. I haven't looked into it, not sure how it exactly works.


These are targeted spambots, so they will run through it once by hand and inspect the request.

AliExpress uses heavy, extreme amounts of fingerprinting, including port-scanning your device and your internal network via <img onerror> tags and WebSockets. The slider part is the least of it.


Yeah, this is the trouble with client-side solutions. If it's worth their time (for example, if there's a credit card field or something), the bad actor will first take a look at the request as it's sent to the server and then they will make requests that look similar. You can do some aggressive stuff with fingerprinting like this example but honestly at a certain point captchas are just going to save you a ton of hassle and the alternatives start to become increasingly invasive too. And I say this as a person who strongly dislikes captchas from both a privacy perspective and an end-user perspective.


Try requiring new accounts' first few posts to be manually approved. Then he'll have to make enough quality posts to build up credibility first. This is very difficult, especially for a script kiddie.

Alternatively, you can take away the instant gratification by adding a cooldown of, say, three days for each created account. Then he'll have to register them in bulk and hope the humans don't spot the patterns.

You could also try using Bayesian filtering, but you'd have to block the ASCII art first.


How about a simple out-of-band confirmation requirement for every account signup?

“Thank you for registering. Please send an SMS to number XXX with code YYY to activate your account.”

Kind of like a reverse 2FA.


I think that would kill signup rates, mostly because it's so different from other solutions, but damn is it an interesting approach.

You may need to add something that prevents the same phone number from being used for multiple signups.


No way I am going to use that, even with some well-known sites. It looks like a scam.


That's a very interesting approach.


You could also try detecting Selenium, but that could be cat and mouse as well:

https://stackoverflow.com/questions/33225947/can-a-website-d...

Remember, the goal is to flag accounts for cheap bulk rejection, without telegraphing to the attacker.
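
A few of the classic client-side tells, as a sketch; all of these can be spoofed, so treat the result as a signal, not a verdict:

    function automationScore(): number {
      let score = 0;
      if (navigator.webdriver) score += 2;                      // WebDriver flag
      if (navigator.plugins.length === 0) score += 1;           // headless often has none
      if (/HeadlessChrome/.test(navigator.userAgent)) score += 2;
      if (window.outerWidth === 0 && window.outerHeight === 0) score += 1;
      return score;
    }
    // Don't block on a high score; quietly tag the account for
    // cheap bulk rejection later, per the advice above.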


> without telegraphing to the attacker.

To extend this thought: I recommend not rolling out single mitigations one at a time anymore. Debugging is always confounded by having multiple simultaneous errors causing the same problem. Rolling out one at a time just lets him ladder up with you. Knock out some rungs.

Always do pairs or triplets from now on if you can.


We at NetToolKit have been working on related problems for years and might have two products that directly address what you are looking for.

We launched Shibboleth (a CAPTCHA service) about a year ago, and you can select from a variety of different CAPTCHA types (including some non-traditional types; different types have different strengths and fun factors): https://www.nettoolkit.com/shibboleth/demo There are a variety of options that you can set, and you can also review user attempts to solve CAPTCHAs to see if you want to make the settings more or less difficult.

Recently, we launched Gatekeeper ( https://www.nettoolkit.com/gatekeeper/about ) which competes against Distil and others, but without fingerprinting. Instead, site operators can configure custom rules and draw on IP intelligence (e.g. this visit is coming from Amazon AWS or this IP address has ignored ten CAPTCHAs in two minutes), and Gatekeeper will indicate to your website how it should respond to a request based on your rules. There's also other functionality built in, such as server-side analytics. Some light technical integration is required, but we're happy to help with that if need be.

As with all NetToolKit services, we have priced both of these services very economically ($10 for 100,000 credits, each visit or CAPTCHA display using one credit).

We would very much appreciate a conversation, even if it is only for you to tell us why you think our solutions don't fit what you are looking for. I would be happy to talk to you over the phone if you send me your phone number via our contact form: https://www.nettoolkit.com/contact


Yep, unfortunately usage-based billing is not possible for us. We can't use any usage-based cloud services at all due to abuse and attacks - we can't even host a simple avatar or button image on S3 without someone trying to curl it in an infinite loop to abusively blow through our budget. On top of that, if you're going to reverse proxy the site, your service will probably be hit repeatedly with 300G+ attacks.

Do you have an email (ideally one that doesn't pipe into a ticket system)? Maybe I can share some possible/creative attacks we've seen that you can improve your service with, even if it's out of budget for us.

As a comparison note, Stackpath does 1mil requests for $10/m.


Our contact form does not pipe into a ticketing system (it goes into support@[our domain, available via our profile link], which is just a G Suite email account that you can use to contact us directly).

I'd very much appreciate hearing your thoughts about attacks and understanding what an effective solution would be. Thanks also for your note about Stackpath -- we aren't a CDN, but Gatekeeper could help reduce bandwidth usage by denying requests.


I mean that is the price for Stackpath WAF (captcha, rate limiting, etc) :)


This may run afoul of your "no privacy invading methods", but are you able to implement email verification before new users can post? Then once they get bored of trying to attack the site you can go and purge all accounts created in the last n days that haven't been verified yet.

I run a gaming community with several thousand members and we regularly have to fend off attacks on both the community (spam bots in Discord) and the game servers themselves (targeted DDOS attacks usually in the 200-300Gbps range.)

From my experience, they tend to get bored and move on rather quickly so often times whatever we have to implement is more temporary in nature and doesn't really affect the existing community much if at all.


Email verification is already required and always has been.

He's cycling through handfuls of oddball throwaway/disposable providers, some catch-alls. We block all known temporary email providers, but there are a few that are obscure/blackhat/let you point an MX record from any free dynamic DNS provider to enable abuse.

Another interesting thing is that after we blocked all known VPN provider space, he switched to more "darknet" proxy providers that pretend to be legitimate by having random eastern european dirty IP blocks announced on Comcast/Verizon AS.

A human eyeball can detect them - they're all pretty obviously following a pattern like NameNameName or random letters - but I'm unsure how I'd write something to catch this in an automated fashion.

Oddly, this actually started over ~2 months ago, and it just started again this week after a few weeks of no activity or attempts at all. Our complete VPN block resulted in no successful activity for 9 days.

He also periodically tries to re-register from the same home IP once a month, claiming to be a new user and asking why he's getting banned, etc.


You could whitelist the email providers, and require "strange" email providers to be approved by mods. The workflow would look like this:

1. Sign up with Gmail 2. Verify email 3. Account is instantly approved

1. Sign up with sharklasers 2. Verify email 3. "You're using a weird email provider. Mods will look at your account and see if it looks OK. If so, we'll approve it"


Don’t telegraph that information, I think. Better perhaps for the automatic approval to look like a fast human and the manual approval to look like a slower one. A process doesn’t have to be manual to look manual. The goal here is to reduce the cost per request.


If you are sure that's their home IP (and that's the same person triggering the spam), and they are in your country, you should consider getting a lawyer involved.

We had a similar issue and got one involved to get the process started (I think he used CFAA abuse). The attacker stopped as soon as we mentioned lawyers (he happened to also be in the US). We would have pressed it further but the lawyer was racking up billable hours and we were not in a position to afford it.


If all you want is for the abuse to stop, you might reach out to the ISP's abuse contacts. All this abuse is certainly against their terms, although they may or may not consider it if it doesn't happen from their IPs.

Getting your internet cut off, even if it's only temporary, can lead to a large change in behavior.


> He also periodically tries to re-register from the same home IP once a month, claiming to be a new user and asking why he's getting banned, etc.

I'd be tempted to try to trick them into telling you their personal information if they're doing that. Create a page that pops up only for that IP that asks for name/address for a prize give away or something.


You could try checking the MX records on registration and build up a list of banned MX handlers instead of banned email domains.
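
A sketch of that in Node; the banned-MX list is illustrative:

    import { resolveMx } from "node:dns/promises";

    const BANNED_MX = new Set(["mx.disposable.example"]); // your observed list

    async function mxAllowed(emailDomain: string): Promise<boolean> {
      try {
        const records = await resolveMx(emailDomain);
        // New throwaway domains pointing at a known bad mail host get
        // caught even though the domain itself has never been seen.
        return !records.some((r) => BANNED_MX.has(r.exchange.toLowerCase()));
      } catch {
        return false; // no MX at all: refuse or queue for manual review
      }
    }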


Sorry to hear you're dealing with this. I'm not in the field, but this is a case where I would abstractly be tempted to use javascript blockchain mining or similarly require some amount of useless computation by the browser during signup.


Some good answers in the stuff others have posted (Especially the accessibility one).

You don't provide many details of what you do and do not have at your disposal in terms of skills, tech stack, access to log files, etc., so this is a non-expert cut-and-paste from SO [1]. Yeah, I know (StackOverflow), and it doesn't even relate directly to your problem... but if you read the long bit below it might give you a bit of blue-sky thinking.

>> The next is determining what behavior constitutes a possible bot. For your stackoverflow example, it would be perhaps a _certain number of page loads in a given small time frame from a single user (not just IP based, but perhaps user agent, source port, etc.)_

Next, you build the engine that contains these rules, collects tracking data, monitors each request to analyze against the criteria, and flags clients as bots. I would think you would want this engine to run against the web logs and not against live requests for performance reasons, but you could load test this.

I would imagine the system would work like this (using your stackoverflow example): The engine reads a log entry of a web hit, then adds it to its database of webhits, aggregating that hit with all other hits by that unique user on that unique page, and record the timestamp, so that two timestamps get recorded, that of the first hit in the series, and that of the most recent, and the total hit count in the series is incremented.

Then query that list by subtracting the time of the first hit from the time of the last for all series that have a hit count over your threshold. Unique users which fail the check are flagged. Then on the front-end you simply check all hits against that list of flagged users, and act accordingly. Granted, my algorithm is flawed as I just thought it up on the spot.

If you google around, you will find that there is lots of free code in different languages that has this functionality. The trick is thinking up the right rules to flag bot behavior. <<

[1] https://stackoverflow.com/questions/6979285/protecting-from-...


I've had a lot of luck with variants of a honeypot. Add a visually hidden field and any time it is submitted with content, block the post. Super simple and with some creativity, it's hard for the bots to keep up.


Even better: take a current field, hide it, and put a replacement in for it (i.e. hide FirstName and add an FName field). It's more likely to be triggered.


You have received some great suggestions so far.

One of the forums that I frequent has a "newbie" section, which is not visible to full members or guests (who are not logged in). Whoever registers to the website needs to get a predefined set of "Likes" on their posts. Not every post gets a "like" - only those that contribute to the discussion do (not everyone needs to agree, debates are welcome as long as they are civil).

This helps maintain the quality of the forum to an outside viewer and cuts out a large amount of spam.


Couldn't edit, so replying to my comment:

The newbie section comes with a limited number of posts per day - so for example, I sign up today, I can post a maximum of 5 posts per day, and this limit goes up as I accumulate "likes".

I'm not sure if this will help if there are no community moderators trying to share a bit of the workload though.


Be sure you do email verification before users are able to post. Block domains of temporary email services (there are lists floating around GitHub; Google is your friend). Only allow one account per address. Figure out what domains the spammer is using to make email accounts. If you can, block them entirely; if not, require manual approval just for those domains. Use the other suggested techniques, like shadowbanning etc. Consider requiring or allowing social login or phone number verification.


If you need a reliable & good API to do email verification, check out https://removebounce.com/


It happened to me a long time ago. He was not only spamming and using SQL injection to try to destroy my community, but also advertising his own competing website.

When I also started to build scripts and destroyed his own website, he basically realized the harm he was doing, apologized and stopped.

Reminds me of the good old days when you could trap script kiddies on MSN.

"I think you are lying and not capable of hacking my computer. I'm waiting for you, my IP is 127.42.196.8"


Cloudflare perhaps with a firewall rule that blocks bots over a certain threshold? It may fall under fingerprinting if you need to know that.


I mentioned quite a few alternatives to ReCAPTCHA that often work in situations like yours here: https://nearcyan.com/you-probably-dont-need-recaptcha/

Some of the best solutions include very minimal/quick captchas, or simple checks for things like javascript


Thanks for the article! In this case, this is custom spam from someone who has spent a few hours looking at the network tab in devtools. The bad actor started running headless browsers after we started doing basic cookie and JS checks; we added some additional JS checks for basic things like whether it's an 800x600 window or similar - this stopped the spam for a few days until he figured it out.


That's good; at least if they're not too skilled, the ratio of time spent playing cat and mouse with them isn't too bad, since it takes them a long time to bypass measures.

You could also look into things on other layers rather than the application level, for example, maybe the IPs they're using all come from similar VPN providers or services?


Why not send all the data you need to the server and do the checks on the server? They can't see what you do on the server.



I think we might need more creative solutions to the problem of spam <everything>, including online reviews, posts, and phone calls. Some kind of PKI ID verification system that every user has to sign up with. Sure, that will turn off a lot of users, but the trade-off is that only the true enthusiasts will participate.


Can you enable hcaptcha with a whitelist for known-good accounts? Not ideal but might annoy them enough to give up.


hCaptcha founder here. You should be able to do this with some backend code and a small client side change. Happy to help make that happen. What framework are you using?


Unfortunately, all (OK, the only) hCaptcha plugins available basically just let you enter a sitekey and nothing else; no other configuration is possible.


Simply don't add the hCaptcha to the page for approved accounts, or ignore the absence of a solved captcha in the submitted data on the server. If you can't whitelist approved accounts, then you have a problem.


See the comment by the hCaptcha founder above.


> Negotiation is not really an option on the table, the last time one of the other volunteers responded at all we got a ~150Gbps volumetric attack

that's hilarious, have you tried trolling them back like just enjoying their company? start saying pool's closed and things like that


Could new accounts not have their posts approved automatically - instead, all new accounts must get manually approved? Or is that not feasible due to the volume you have? I would make sure that the first few posts by a new account are not automatically approved/shown to anyone.


Temporarily charge a dollar to create a new account?

Also, do you have cloudflare in front of you?


Do you know if they are using headless browsers? If so, we can block them without fingerprinting and without captchas. https://methodmi.com


> MMI’s primary implementation is through the use of a JavaScript tag which interrogates the device to which the ad was delivered for the presence of a graphical processing unit (GPU). The JavaScript creates a Canvas Element in HTML5, which allows access to WebGL (Web Graphics Library), a JavaScript API for rendering interactive graphics within any compatible web browser without the use of plug-ins. An IVT classification decision is made based on the results of the rendering capability of the device.

How can this not be emulated by a headless browser?


Try using a proof-of-work system. It won’t differentiate between humans and bots but it’ll significantly slow down the pace they can register accounts, and it is completely transparent to legitimate users.


Are you sure they’re not posting directly to the registration endpoint and bypassing the signup form? We just had this problem with spammers in China and QQ emails. Adding a nonce helped dramatically.


We have a number of required fields that are generated in client-side JS and not present in the underlying HTML. Our best guess based on logs is someone walking through signup with devtools open and then trying to replicate the requests. We get a day or two of reprieve whenever some funky change is made, like making the form element default to action="/a/url/that/bans/the/person" and then, when the JS loads and detects more than 1000 pixels' worth of mousemoves, replacing the form action with the real signup endpoint.

It's a standard forum software with some plugins, all <form> elements are CSRF protected with a random value, etc.
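
The decoy form-action trick reads roughly like this on the client (the paths and pixel threshold are illustrative):

    // The markup ships with the decoy: <form id="signup" action="/a/url/that/bans/the/person">
    let moved = 0;
    document.addEventListener("mousemove", (e) => {
      moved += Math.abs(e.movementX) + Math.abs(e.movementY);
      if (moved > 1000) {
        // Only a client that ran JS and moved a mouse gets the real endpoint;
        // replayed raw requests keep hitting the decoy.
        document.querySelector<HTMLFormElement>("#signup")!.action = "/real/signup";
      }
    });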


I wonder if you could rotate mitigations to keep him off balance. Using different ones for different IP ranges may complicate his ability to analyze the code.


Damn, well good luck. Dealing with spammers is such a hair pulling thing to do.


A JS tarpit (not sure this is the right name) might help

You add a JS snippet that does some work, but if you detect a bot you make it do increasingly more work. Think bitcoin mining but not actually that


naive Q: how damaging would it be for you to stop accepting new accounts temporarily? or put differently, how many legitimate accounts are created on a daily basis? (single digits?)


Do you have any problems with hCaptcha other than its accessibility? It sounds like that is a much easier problem to solve than everything else people are suggesting in this thread.


You could try Google reCAPTCHA v3, which does not require any user input and runs in the background. Not 100% sure it helps in your case, but I think it should.


This falls under fingerprinting and privacy-invasive methods.


Recaptcha it and wait them out. He can't do it forever. Then unrecaptcha. Just do a month. You'll lose some sign-ups and then you can go back to the old thing.


Aren't there captcha services that are less creepy than Google's?


hCaptcha[1] springs to mind, not that I know if they're more or less creepy than Google.

[1] https://www.hcaptcha.com/


Add e-mail verification and introduce a random 30-60 minute delay before sending the verification e-mails. Then you can cut down on disposable e-mail domains and so on.


Might be slightly unorthodox, but email the first post to a Gmail account from a random address and see if it is marked as spam, only display the post if not.


Sorry, but that will backfire badly. The source IP / email / domain will be blacklisted in general very quickly. Also, lots of targeted forum spam would not count as "random email" spam.


Can't he just do that 150Gbps thing in response to anything you do? Also, can't you allow aimbots? You could keep them in a separate space and put them on a separate leaderboard.



The only winning move against spam is to make it easier to clean up than it is to spam.

Captchas only benefit Google and the like, who couldn't care less about the community or content. Captchas make honest content (and spam cleanup) more expensive than the spam! It's a losing proposition that only looks good when you consider it without looking at the whole situation.

Make honest content (and spam) easy, but make cleanup easier. Things like: every user can flag something; after a certain number of flags, also automatically remove other content from the same IP (or the same bundle of users within a close registration time window). And of course a feature for admins to automatically ban, and erase content from, users who wrongly flag honest content.

It's harder than captcha, but it is an actual solution. Captcha is lazy and ableist.


This is exactly why I do not want to do captcha. I don't want more third party analytics, tracking, or making our players solve endless captchas.

We are a gaming community that also has some older folks whom we'd like to accommodate. Most of the cheapo PHP captcha libraries don't support any form of accessibility, or if they do, it's a vulnerability that allows instant solving.

Part of the problem is that we can moderate just fine, but we can't moderate or look through hundreds of thousands of new registrations - we just need something to get rid of 99% of the garbage automatically.


Hi, hCaptcha founder. For anyone who has trouble with captchas I'd recommend checking out https://www.hcaptcha.com/accessibility . For the privacy obsessed (like me) our privacy pass option is pretty great as well https://www.hcaptcha.com/privacy-pass.


Oh, I didn't even touch the tracking/privacy issues in my comment! That is a whole other can of worms.

But thankfully, if you do not fall for the fallacy that captcha does any good at all, then you don't even have to worry about this aspect ;)


And what do you do when your forum has thousands of bots posting spam images every second, completely overwhelming user posts? Captchas may be imperfect, but the alternative is forcing your website's user base to manually flag every spam item, which is fine if there are only a couple, but without a captcha there will be more than a couple.


> manually flag every spam item

That's my point: make that easier than "manually clicking a dozen random images to train Google's AI".

E.g. ban all similar text/posts from the same IP when a couple of valid long-time users flag the same post.



