ok but how is all this implemented? is it a machine learning algorithm that classifies latencies with some pre-trained model? or did the library creator set some manual threshold like latency < 200ms = no VPN else VPN?
No. It works in a very simple way. The server running the test pings the visitor's IP address. At the same time, the client opens a websocket connection to the server and exchanges a few messages, and latency is measured that way, too. The detector exploits the fact that most privacy-oriented VPNs use NAT, so the TCP/IP ping test measures the latency to the VPN server, while the websocket test measures the latency to the actual client device. A difference between the two thus indicates a VPN. In practice, it is also triggered by a mobile connection while roaming.
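To make the comparison concrete, here is a minimal sketch of the idea, not the library's actual code. It assumes the two sets of round-trip times have already been collected. The function names and the `min_gap_ms` noise margin are made up for illustration; the real detector may compare the measurements quite differently.

```python
from statistics import median


def latency_gap_ms(ping_rtts_ms, ws_rtts_ms):
    """Median websocket RTT minus median ping RTT, in milliseconds."""
    return median(ws_rtts_ms) - median(ping_rtts_ms)


def looks_like_vpn(ping_rtts_ms, ws_rtts_ms, min_gap_ms=30.0):
    """Flag the visitor when the websocket path is clearly slower than the ping path.

    min_gap_ms is a hypothetical noise margin, not a value taken from the library.
    """
    return latency_gap_ms(ping_rtts_ms, ws_rtts_ms) > min_gap_ms


# Ping stops at the VPN's NAT (about 21 ms); the websocket reaches the real device (about 120 ms).
print(looks_like_vpn([20, 22, 21], [120, 118, 125]))

# Both measurements reach the same device, so the gap is only a few milliseconds.
print(looks_like_vpn([20, 22, 21], [23, 25, 24]))
```

Medians are used here only so that a single slow sample does not dominate the result; that is a design choice of this sketch.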
Of course, this does not work for VPN services like PureVPN, OVPN, and SwissVPN that provide a real public IP address to the client (so both pings measure the latency to the client), or for VPNs that have properly firewalled the external NAT IP so that it is not pingable and does not send TCP RST or ICMP Port Unreachable messages when probed. But PureVPN's IP ranges are known, so it is detected that way.
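A rough sketch of how those cases could fit together, under stated assumptions: the ranges below are RFC 5737 documentation placeholders, not real PureVPN ranges, and the `classify` function, its labels, the order of the checks, and the `min_gap_ms` margin are all hypothetical rather than taken from the library.

```python
import ipaddress

# Placeholder networks from the RFC 5737 documentation blocks, NOT real VPN ranges.
KNOWN_VPN_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def in_known_vpn_range(ip, ranges=KNOWN_VPN_RANGES):
    """Return True if the address falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


def classify(ip, ping_rtt_ms, ws_rtt_ms, min_gap_ms=30.0):
    """Combine the range lookup with the latency comparison.

    ping_rtt_ms is None when the address did not answer the probe at all.
    """
    if in_known_vpn_range(ip):
        return "vpn (known range)"
    if ping_rtt_ms is None:
        # Firewalled address: there is nothing to compare the websocket RTT against.
        return "unknown"
    if ws_rtt_ms - ping_rtt_ms > min_gap_ms:
        return "vpn (latency gap)"
    return "no vpn detected"


print(classify("203.0.113.7", 20.0, 22.0))  # public-IP VPN, caught only by the range list
print(classify("192.0.2.10", None, 40.0))   # probe unanswered, latency test not applicable
print(classify("192.0.2.10", 20.0, 120.0))  # NAT-style VPN, caught by the gap
```

The range check runs first because, for a VPN that hands out real public IPs, both latencies match and the gap test has nothing to find.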
The false positive rate is going to be insane. You mentioned roaming, but there are so many other scenarios where this could trigger - so, the user sits down at a Starbucks and suddenly can't access the client's webpage, with some very confusing error about VPNs. Guess what, they are not going to fix their network, they are going to give up on going to that website. Without a plan for how to measure and fix false positives, there's no way this would last more than a week in a real e-commerce environment, unless there's already a dramatic problem with VPNs (like at Etsy).
I don't see how Starbucks is going to trigger this. The NAT device is physically in the same building as the laptop, so the segment between the laptop and the NAT (which is what results in the difference in TCP/IP ping vs websocket ping) would be very short and undetectable.
Not every network is as simple as a router in front of a laptop. And some may treat websocket traffic differently. And do weird DNS stuff. Every time your basic assumptions are wrong, your client loses a user, and you don't even have a way to detect that.
All these signals will either be too weak and let through enough false negatives as to be essentially useless, or too strict and produce so many false positives that a significant portion of the legitimate users leave in frustration. Unless you are some oppressive regime cracking down on VPN usage, I truly don't see where this will be useful. I guess it's helpful to compile the list of modern methods for detection and fingerprinting, so VPN providers can mitigate them.