BitTorrent v2 (2020) (libtorrent.org)
305 points by sph on Jan 10, 2022 | 150 comments



I assume this is being posted because qBittorrent recently released version 4.4 which includes libtorrent 2.x and therefore support for BitTorrent v2.

Hopefully we'll start seeing hybrid torrents/magnets in trackers soon. Although these are mostly automated, so I'm not sure how fast the tools will catch up.

One thing I'd like qBittorrent and other torrent clients to do is to allow users to upgrade v1 torrents to hybrid torrents if they're fully seeded. This would allow cross-pollination for old torrents where peers are scattered between the original torrent and batch torrents, or where trackers add extra files, causing the v1 hash to change.


A bunch of open-source projects would like to use v2 torrents, but because Transmission is so popular and has zero support for v2, nobody can use them. Transmission's dev team hasn't said a word on the matter, ever, in four years:

https://github.com/transmission/transmission/issues/458

Elsewhere someone quipped:

> That transmission bug is 4 years old and they've never announced plans to prioritize it (granted the effort is surprisingly high; they've never allowed empty dict keys in bencoding). I think they either don't know or don't care that their market share makes them an adoption blocker.


In the beginning I used µTorrent, until it got fucked by whatever company bought it. It was easily the fastest and most resource-smart client I've ever used. After that I tried a bunch (Deluge, Transmission, Azureus and plenty more), but only qBittorrent seems to be a suitable replacement (and it's open source as well). Now it has BitTorrent v2 support too :) You should really give it a try.


Ironically, the company that bought µTorrent was BitTorrent, Inc[1]. They're the geniuses also responsible for forking IPFS, renaming it to "BTFS", and not acknowledging IPFS as an upstream in any way[2].

1. https://en.wikipedia.org/wiki/Rainberry,_Inc.

2. https://twitter.com/juanbenet/status/1250634833258143744


BTFS is the product of an entirely different group of people than the ones who were responsible for the uTorrent purchase. BitTorrent Inc was itself bought by cryptocurrency company TRON in 2018. Few people who worked at BT Inc prior to the acquisition are still there, and AFAIK none of them ever worked on BTFS.


Different team of people, all working in the same company, correct?


Technically yes, but as is typical with an acquisition there's residual divide between projects which originated from the BT Inc side of the company and those from the TRON side. BTFS is very much a product of the TRON side despite using the BitTorrent branding.


Wow, that's a bummer; I had no idea the people behind the hollowing out of µTorrent and filling it with shit include the actual creator of the protocol itself.


If it makes you feel any better about it, that tweet thread is from 2020, and the actual creator of the protocol itself hasn't worked there in a day to day capacity since 2017.[0]

[0] https://en.wikipedia.org/wiki/Bram_Cohen


It makes me not worry about this BTFS thing, but µTorrent going bad was around 2010.


qBT also has a nice WebUI that has feature parity with the desktop version (or at least near enough that I don't miss anything), which is very useful. Other clients (e.g. Transmission) do too, but I like qBT the most here.


I agree, I put it on an rpi and it works great from any computer/phone in the house that way :)


I just use the version of µTorrent before the company was sold.


Some private trackers require updated software in order to connect to them. And also, you're vulnerable to a bunch of security issues if you connect to the public internet with a very old µTorrent version. Do you really run a 10+ year old version of µTorrent without any drawbacks?


> Do you really run a 10+ year old version of µTorrent without any drawbacks?

Absolutely.


> I think they either don't know or don't care that their market share makes them an adoption blocker.

The truth seems to be that between 2014 and mid-2021 the "transmission dev team" was a single (probably part-time) maintainer:

https://github.com/transmission/transmission/graphs/contribu...


Wow, I didn't realize Transmission had its own implementation of the torrent protocol; I thought they used libtorrent like everyone else. According to a comment on that issue, it seems that not only does it not support v2, but it also breaks on opening v2 .torrent files. Lots of torrents are generated manually using qBittorrent (e.g. on private trackers), which means that torrents might start breaking for Transmission users soon.


Unless I'm mistaken, we'd need to settle on a "default" piece size[1] if we were to have automatic upgrading in a client. I learned recently[2] that there's no standard set by the protocol, so Client A might choose a 16KiB piece size, and Client B might take the same files and choose a 64KiB piece size, resulting in two different hashes (v1 or v2!) and thus two peer pools for the same file set.

1. https://stackoverflow.com/a/65299426/2700296

2. https://github.com/webtorrent/magnet-uri/pull/43#issuecommen...
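To make the two-swarms problem concrete, here's a minimal sketch (the helper names and dummy values are mine, not any client's real code): the v1 infohash is the SHA-1 of the bencoded info dict, and "piece length" is a key inside that dict, so the same file described with a different piece size produces a different infohash. The fixed dummy "pieces" string is for illustration only; in a real torrent it would differ too.

    import hashlib

    def bencode(obj) -> bytes:
        # Tiny bencoder covering only the types used below.
        if isinstance(obj, int):
            return b"i%de" % obj
        if isinstance(obj, bytes):
            return b"%d:%s" % (len(obj), obj)
        if isinstance(obj, dict):
            # bencoding requires dict keys in sorted order
            return b"d" + b"".join(bencode(k) + bencode(v)
                                   for k, v in sorted(obj.items())) + b"e"
        raise TypeError(type(obj))

    def v1_infohash(piece_length: int) -> str:
        info = {b"name": b"file.bin", b"length": 1 << 20,
                b"piece length": piece_length, b"pieces": b"\x00" * 20}
        return hashlib.sha1(bencode(info)).hexdigest()

    print(v1_infohash(16 * 1024))  # two different piece sizes ->
    print(v1_infohash(64 * 1024))  # two different infohashes, two swarms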


If I read the article correctly, in v2 each file contains its own root hash, which is calculated using 16 KiB blocks; the piece size chosen at torrent creation is still used for data transmission between peers and integrity checks.
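For intuition, here's a simplified sketch of such a per-file root: SHA-256 over fixed 16 KiB leaf blocks, padded with zero hashes to a power of two. (BEP 52 additionally defines per-piece layers and exact padding rules that this toy version glosses over.)

    import hashlib

    BLOCK = 16 * 1024  # 16 KiB leaf blocks, fixed by the v2 spec

    def merkle_root(data: bytes) -> bytes:
        leaves = [hashlib.sha256(data[i:i + BLOCK]).digest()
                  for i in range(0, len(data), BLOCK)] or [bytes(32)]
        while len(leaves) & (len(leaves) - 1):  # pad to a power of two
            leaves.append(bytes(32))            # zero hash for padding leaves
        while len(leaves) > 1:                  # combine pairwise up the tree
            leaves = [hashlib.sha256(leaves[i] + leaves[i + 1]).digest()
                      for i in range(0, len(leaves), 2)]
        return leaves[0]

    print(merkle_root(b"x" * 100_000).hex())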


I'm not sure if the piece size continues to have bearing when the hashes are all in a Merkle tree over uniform 16 KiB blocks.

"Enforcing these encodings also help make it more likely that two people creating a torrent from the same files, end up with the same info-hash and the same torrent."

In the discussion of backwards compatibility, it appears that v1 collections of identical content encoded with different piece sizes will all become the same v2 collection when upgraded.


If you have a v1 torrent file, you can't download from an upgraded client.


The release of qBittorrent 4.4 has had an impressive impact. BitTorrent went from an average of something like 10 or so new v2/hybrid torrents a day to 250 or so [1].

Granted, that's something like 0.4% of the new torrents made every day, but still, it's not a rounding error anymore.

[1] https://imgur.com/a/mPcEaT5


Which tracker is this data from?


It's from a DHT/BitTorrent crawler I've been running for ages that goes around downloading all the torrent metadata it can find.


Do you make visualizations of the traffic?


Some different views. For instance, to give some sense of scale for the 250 or so v2 torrents seen every day, here's a view of the total number of torrents it discovers each day [1]. (Or, more properly, the number of torrents for which it gathered complete metadata and validated it against the infohash each day.)

One of these days I should find somewhere to put all these bits of metadata I've been downloading.

[1] https://imgur.com/a/8pWhjxD


Consider creating a collection at the Internet Archive for these data sets.


Sure:

https://archive.org/details/torrent_metadata_archive_sample

I'll start uploading monthly archives.


I genuinely appreciate that you took the time to do this. Thank you.


For anyone looking to download the data: Click on my name in archive.org to see the monthly archives. I'm slowly working my way through the months, though there may be breaks in the work as other work takes priority.

The first few months will be the biggest; then it should settle down to something like 15-30 GB a month.


Neat! I don't have it handy, but years ago I scraped the state's traffic page and then made time lapses from it. One of the road segments had a bad sensor that would toggle into a stuck state.

Keep it up!


Cool. Did you write it yourself or are you using an open source project of some kind? I'd be interested in running one of these myself.


I stumbled across a DHT crawler and started running it. Over time I added random things to it, like IPv6 and the ability to download metadata. I was mostly interested in using it to learn how DHT worked, both in theory and in practice, and for that, it did its job.

The page for the crawler I started with is long gone, but this appears to be a reasonable copy on github:

https://github.com/zlzlovezl/simDHT-1

I'm sure you can find other copies, with features added. Looking around, it seems like quite a few people have taken this and made it their own over time.

It's far from perfect in this state, but a fun toy to play with. I will say, from a point of painful experience: The way this crawler works tends to make some of the various people that track such things think you're sharing pretty much all the things. If that's an issue where you live, don't run it from your home connection.


I'd like qBittorrent to work, personally. It worked for one torrent, and now it won't even contact trackers; it just stalls. I installed the Deluge beta and it works perfectly.


BitTorrent slaps so hard as a protocol, and DHTs are absolutely brilliant. With all of the recent hype about decentralization with the rise of crypto, I'm surprised BitTorrent hasn't seen much hype or revitalization as well.


I'm not surprised. BitTorrent just works, and doesn't provide any opportunity to separate people from their money.


It kind of does if you choose to seed stuff that is ripe for DMCA takedowns/copyright infringement, right?

That's not the protocol's fault, but it's pretty much one of the biggest reasons most common people use the protocol: to pirate.

Get caught pirating? Probably get some fines, right?


DMCA and most copyright holders would only hold power over ~4.25% of the world's population, as the rest of us live outside the US and are not impacted by US laws. Some countries like to pretend they actually handle DMCAs as well, but if you push back they won't even take it to court.


> Some countries like to pretend they actually handle DMCAs as well, but if you push back they won't even take it to court.

Sheesh, please do more thorough research. It's technically not the DMCA (because it's not called the DMCA), but Germany and Japan do have laws analogous to the DMCA, and I'm pretty sure that further digging would reveal more countries that have an analogous process written into law.


The DMCA was introduced in order to ratify the WIPO Copyright Treaty[1]. Every signatory of the WIPO Copyright Treaty has similar laws on the books.

Though really you could argue that the Berne Convention already covers the most obvious copyright violations you get from torrents, which means that effectively every country on Earth has laws against bare-faced copyright infringement. The WIPO Copyright Treaty is most notable for Article 11, which effectively prohibits the circumvention of technical measures (DRM).

It may be practically difficult to litigate someone in another country, but it has happened in the past[2].

[1]: https://en.wikipedia.org/wiki/WIPO_Copyright_Treaty

[2]: https://en.wikipedia.org/wiki/The_Pirate_Bay_trial


There is a whole scene of people spending 50-100/m on just this. There is a lot of money to be made if you live in the "right" countries.


Now that I think about it, BitTorrent is the best example of how well decentralisation can work, and basically the only reason I believe that a decentralised web is possible.


Unfortunately BitTorrent is still heavily centralized when it comes to discoverability. Without a website hosting .torrent files or magnet links, it's very limited.


These links are easy to mirror; they probably fit into a reasonably large SQLite file. Since the Web still allows decentralization, that doesn't seem to be a problem. For example, The Pirate Bay is blocked in my country, yet I can still consult it on dozens of mirrors.


Not at all; a DHT crawler can gather torrent links over time in a fully decentralized way.


Without curation it's not worth much.


I'm surprised nobody has built a torrent catalog on BitTorrent itself. I think that with the DHT mutable item, one should be able to publish a torrent containing a list of torrents with comments and update the list easily.


It's not so simple. The juicy stuff is on private trackers and you can get into one only when invited. So you have a catalog of crap and not much more.



The creator of BitTorrent, Bram Cohen, is now full time focused on crypto: https://www.chia.net/

It's a novel layer 1 crypto with some interesting tech behind it. A lot of inspiration from BitTorrent's decentralized nodes went into it as well.


To be fair, IPFS is a frequent player in the current hype, and is built on DHTs and obviously inspired by BitTorrent.

I think a major factor there is that IPFS supports pluggable transports, so it can easily be used in both backend and browser environments directly, while WebTorrent (powered by WebRTC) can't directly communicate with most traditional BitTorrent clients (which use TCP/UDP directly) without going through a bridge node.


don't many traditional clients support webtorrent as well? https://feross.org/libtorrent-webtorrent/


This is part of why I'm a bit cynical about "web 3" in general. Bittorrent is terrific and yet the real money is streaming video/music platforms.


Bittorrent is basically revenue-hostile, so I don't expect it to get any more popular among content providers.


Because there's no built in coins, NFTs, or tokens that investors can pump and dump. In other words, it's an actually useful tool.


Web3 or whatever should've been based around the concepts present in BitTorrent, not crypto.


Wouldn't it be something if we moved away from HTTP to a torrent web, at least for the substantial content of a website, with HTTP relegated to fetching an HTML landing page with text and magnet links.


So far as I understand the protocol, this is the principle behind ZeroNet (https://zeronet.io/)


libtorrent was the first project I used GitHub Sponsors on. Arvid Norberg is an absolute machine.


The project is still in pre-alpha; it even stopped paying rewards to its token stakers/seeders due to some technical issues during the implementation. Perhaps it was rushed out too fast just to catch the hype.


I'm wondering if they could avoid yet another breaking protocol change, should SHA256 prove to be insecure at some point in the future, by making use of Multiformats [0]. At least IPFS went this way (Multiformats grew out of IPFS development).

[0] https://multiformats.io/


Generally speaking this is called “cryptographic agility” and the security community has decided that it’s an antipattern. The alternative is to have a specified suite of security mechanisms pinned to each version. You can still have backwards compatibility, but it helps avoid downgrade attacks.


Multiformats is not really cryptographic agility, it's just adding a small tag that indicates what type of data something is (what kind of encoding, what kind of hash, what kind of binary payload, etc). The benefit is that the tags are standardised rather than being tied to a protocol version or some other out-of-band information. Whether or not you start making negotiation decisions based on that header is what determines whether you have cryptographic agility.

I'm also not sure I'd say the security community as a whole feels that cryptographic agility is an anti-pattern, there are plenty of vocal voices on all sides of that argument. I personally agree that you gain very little from cryptographic agility but have to pay a fair amount, but I don't think it's as settled as you make it out to be.


> I'm also not sure I'd say the security community as a whole feels that cryptographic agility is an anti-pattern, there are plenty of vocal voices on all sides of that argument. I personally agree that you gain very little from cryptographic agility but have to pay a fair amount, but I don't think it's as settled as you make it out to be.

I totally agree with this assessment, and rereading my comment, I say that as though it’s a lot more settled than it is.

I think it’s probably harmless to tag those sorts of bits of data, but also JWT has shown us that having that metadata in band does lead to people using it for those kinds of decisions whether they’re supposed to or not.


You're proposing to add extra complexity for some hypothetical scenario that is unlikely to happen. That's a bad tradeoff.



The Wikipedia link is very reassuring regarding the security of SHA256. All attacks are very far away from going anywhere close to attacking full SHA256 and no major progress has been made lately.

The password hashing remark is correct, but irrelevant. SHA256 is not a good password hash, because it's not made to be a password hash. Don't use it for passwords.

The weaknesses that broke MD5/SHA1 were known since 1994. No similar weakness is known for SHA256.


I’m not a cryptographer. Why isn’t SHA256 good as a password hash? Can you explain? And what’s a better alternative, and why?


Because SHA256 is fast to compute. This means that an attacker can quickly try many passwords, either from a dictionary of common passwords or simply brute forcing a short password.

Also, if you use SHA256 bare without salt you are vulnerable to precomputed dictionaries. There is a fascinating way to make these dictionaries shorter called Rainbow Tables.

Better key derivation functions take a lot of time and memory, to make it costly for the attacker to take a lot of guesses at the password, and they have built-in salt. Examples would be scrypt or Argon2.
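A minimal sketch of that idea using scrypt from Python's standard library (available when Python is built against a recent OpenSSL; the cost parameters below are illustrative, not a tuning recommendation):

    import hashlib, hmac, os

    def hash_password(password: str, salt: bytes = None):
        salt = salt or os.urandom(16)  # unique salt defeats rainbow tables
        # n/r/p are the cost factors that make each guess expensive
        digest = hashlib.scrypt(password.encode(), salt=salt,
                                n=2**14, r=8, p=1)
        return salt, digest

    def verify(password: str, salt: bytes, digest: bytes) -> bool:
        # constant-time comparison avoids timing leaks
        return hmac.compare_digest(hash_password(password, salt)[1], digest)

    salt, digest = hash_password("hunter2")
    print(verify("hunter2", salt, digest), verify("wrong", salt, digest))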


> Why isn’t SHA256 good as a password hash?

Because it wasn't designed for it. For password hashing you want a hash that has a salt (so that the same password on two accounts doesn't produce the same hash in the database) and is as slow as possible (while still fast enough to validate logins) to increase brute-force time.

Historically Bcrypt was a good option, but I think Argon2[0] is the current best option.

[0] https://en.wikipedia.org/wiki/Argon2


I thought adding a salt to a password before hashing it was standard practice, regardless of the hashing function. At least that's how UNIX derivatives have done it for the past 25+ years.

Point taken about hash calculation speed, though.


To add to what others have already said:

A password hash is a function with 3 inputs: the password, the difficulty factor, and a customization structure (most often just a salt, sometimes more). A cryptographic hash is a function with 1 input: the message.

Password hashing functions have variable performance in time and usually memory and cache use, controlled by the difficulty factor. Cryptographic hashes have fixed performance.

Password hashing functions take a salt and possibly other customization data (a secret "pepper", a fixed domain-separation string if it's also a key derivation function, etc).

It's possible to build password hashes from cryptographic hashes. One must be very careful about encoding the multiple inputs into a single "message" for the cryptographic hash, to avoid canonicalization attacks.

Argon2id is a good password hashing function.


If you want to be reassured, then sure. My take from reading that is that in 2016 there were practical collision attacks demonstrated, which would be amplified with specialized hardware (compare to bitcoin mining).

Btw cheers! I recall you from correspondence years ago regarding fuzz testing linux packages.


> My take from reading that is in 2016 there was practical collision attacks demonstrated

No such thing happened.

You're likely referring to this paper quoted in Wikipedia: https://eprint.iacr.org/2016/374.pdf

This is 1. not about SHA256, but about the truncated SHA2 algorithms and, more importantly, 2. an attack on reduced-round versions of these algorithms. That's a common thing to do in cryptanalysis. You're basically saying "we have no way of attacking the full algorithm, so let's build a version which is much less secure and try to attack that". That's a valid thing to do for research purposes, but it needs to be interpreted correctly. "I can break this massively weaker version of the algorithm" is very different from "I have shown a practical attack on the real algorithm".


Absolutely, you are correct: the paper does not describe SHA256 as broken, but it is one step of many on the path to SHA256 becoming broken in the future, which is in reply to your earlier comment:

> You're proposing to add extra complexity for some hypothetical scenario that is unlikely to happen


> but it is one step of many on the path for SHA256 to become broken in the future

I disagree. It's evidence that the given path is not fruitful for breaking SHA256 in the future. When numerous researchers worldwide have attacked a function, and only broken X out of N rounds, for X << N, and future research hasn't been able to improve X for years, that's pretty good evidence that the technique used isn't going to continue to apply for more rounds. The existence of a multi-billion dollar bug bounty for breaking SHA256 (Bitcoin) that's gone unclaimed for years is further evidence that it's quite strong.


I do not believe that the practical demonstration of a sha1 collision by google contributed at all to any potential future break of sha256.


Unlikely? All methods are broken eventually when compute becomes faster.


This is not true. Modern cryptography is not breakable via brute force any more, nor will it ever be. You can prove that the amount of energy required assuming a thermodynamically ideal computer is more than will ever be available in our galaxy, with classical computing.

Breaks of symmetric crypto or hashes that are practical require an actual cryptographic weakness, whose existence is not a given.

Then there's quantum computing, but that also doesn't break symmetric crypto outright, it just makes it weaker. I'm not sure if anyone has run the math, but we'd probably still be talking galaxy sized quantum computers to get anywhere.


>Modern cryptography is not breakable via brute force any more

Specifically, modern symmetric crypto (and hashing which is a related but different thing) is fine. It's public-key crypto that has always been the worry thanks to Shor's Algorithm, and all modern widely used crypto there is indeed vulnerable if an actual scalable general quantum computer could be constructed. There are a variety of post-quantum cryptographic algorithms under development including some ones that would be ugly slow but likely effective bandaids if required, but that's still a not totally unreasonable concern for certain threat profiles. But yes GP was totally wrong about "all methods are broken when compute becomes faster".

>I'm not sure if anyone has run the math

"The math" here is just Grover's Algorithm, that lowers the cost of generic brute forcing to O(sqrt(n)). So a 128-bit symmetric key could be cracked on average like 2^64 or a 256-bit key like 2^128. Obviously this is utterly trivially countered by doubling the exponent. Even 2^128 is still ridiculous, and going to a 512-bit key brings us right back to 2^256 which is impossible. 128-bit keys should indeed be phased out entirely (and that seems to be well on the way anyway), and perhaps 256-bit too at some point (even if it's only the very paranoid or very long termers, it's also not big stretch on modern hardware at all, so eh), but fundamentally yes symmetric crypto is fine.


I know about Grover's Algorithm, but the reason I didn't want to make any hard claims is that with hashes you also have to worry about collision attacks. Those give you an O(sqrt(n)) time advantage themselves in classical computing, at the expense of space for storing existing hashes (birthday attack), and you can play with the balance as a space-time tradeoff. I can't claim to have any idea how those interact with quantum crypto, plus once you introduce storage at a large scale I imagine you start running into speed of light and communication energy cost issues.


So one thing I'm thinking could work is a sort of probabilistic birthday attack. You could do a binary search, where in every step you narrow down on the bucket with more data.

So let's say a year had 256 days and you want to find colliding birthdays. First you would check how many birthdays fall in 0-127 vs 128-255. This takes just two counters, very little storage. Then let's say the former has more: you'd try 0-63 vs 64-127. Etc.

But yeah, this will of course miss collisions. But I think if you have enough children (where enough is still pretty close to sqrt(n)) it should have a good chance of finding a collision.

And subdividing into many buckets at a time will miss fewer collisions, but require more storage; a tradeoff can be used.


This is why they create laws to require you to unlock your phone when crossing borders.

I have been constantly surprised that there isn't a phone/company that specifically markets a "buy this temp phone to cross a border" product.

--

Also, WRT Quantum computing, what are your thoughts on the talk by D-Wave CEO where he said "we can reach into different dimensions" -- and soon after, D-Wave went super silent?

What is the state of quantum currently - did they discover something super secret?


They? Who, where? Which law?


That is not true. There is nothing provable about hash functions. The best that you can say is that there is no known algorithm that can calculate a pre-image for any given hash function in less than some superlinear time. You certainly cannot prove that no such algorithm exists. If you can prove it, please publish a paper; you will revolutionize mathematics.


> The best that you can say is that there is no known algorithm that can calculate a pre-image for any given hash function in less than some super linear time.

If your algorithm is not linear in the size of the key/hash space, it's not a brute force algorithm. It's a cryptographic break.

There is no proof that such algorithm does not exist, nor is there any proof that it does. Therefore you can't claim it definitely does. That's my point.


AFAIK, this is more of a belief than a fact. I would be very curious if you could show me a rigorous proof that e.g. SHA256 requires on the order of 2^256 operations to find a preimage of a random 256-bit string.


If you have something more clever than 2^256 operations, then that's not brute force.


Do you have a cite for this for SHA256? It surprises me a bit as our galaxy is big and we are far from physically ideal computers.


I did the math myself at one point, but I lost the link. However, here's a reference for the same kind of thing:

https://pthree.org/2016/06/19/the-physics-of-brute-force/

The total energy output of a supernova is enough to count up to 2^220, making a lot of generous (invalid) assumptions. You'd need 2^36 supernovas to just count up to 2^256, never mind actually running SHA256. At minimum.

There are circa 2^38 stars in the Milky Way, and they're not going to all go supernova. Add in the actual cost of compute and it just isn't happening, not in this galaxy.

For 128-bit crypto we can look at a more earthly calculation. Using the same math as that article, but at room temperature, and taking the total solar irradiance on the Earth as an energy source, it would take this long to count up to 128 bits (calculated using Google):

((2^128) × (1.38064852 × 10^-16 erg/K) × (298 K)) / ((1361 W/m^2) × (π × (radius of Earth)^2)) ≈ 8.05 seconds

Definitely more on the plausible side, but we're already moving away from 128-bit crypto and there's a staggering number of generous assumptions being made here; we aren't going to be getting thermodynamically ideal computers using a significant fraction of the total solar irradiance on the Earth any time soon.
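That back-of-envelope arithmetic is easy to reproduce (a quick sketch; note the formula above charges kT per count, while Landauer's bound is strictly kT·ln 2 per bit erased):

    import math

    k_B = 1.380649e-23   # Boltzmann constant, J/K
    T   = 298            # room temperature, K
    S   = 1361           # solar constant, W/m^2
    R   = 6.371e6        # Earth's radius, m

    energy = (2**128) * k_B * T      # joules just to count to 2^128
    power  = S * math.pi * R**2      # sunlight intercepted by Earth, W
    print(energy / power)                # ~8 seconds, as above
    print(energy * math.log(2) / power)  # ~5.6 s with the ln 2 factor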

Just to give you an idea of how far away we are from that, looking at actual SHA256 calculations:

https://www.iea.org/data-and-statistics/charts/efficiency-of...

22222 MH/J is the highest, or 4.5 × 10^-12 joules per hash. That's a factor of 2^30 worse, so if we used the total solar irradiance of the Earth to power Bitcoin miner style ASICs, it would take about 272 years to go through 128 bits' worth of brute forcing something of similar complexity to SHA-256.

The factor is more like 2^36 relative to the first "temperature of outer space" calculation. That happens to be about the number of galaxies in the observable universe, so using current technology, it would take somewhere on the order of all the stars in the observable universe going supernova to power through a single SHA-256 brute force.


A supernova may be an inefficient source of energy, it's probably better to take that material and make dwarf stars from it.


Thanks, that's interesting, I didn't know this!


Computers can only become faster within the limits of what humans are able to build and what physics allows.

You're not gonna break SHA256 with faster computers.


I swear there was a pertinent article link that goes through how it's physically impossible to construct a computing device fast enough to break SHA-256, even at the most fundamental scale.


All current methods. Isn't SHA256 an order of magnitude more complex than SHA1? It's not like cryptographers can't learn from their mistakes.


The magnet URL format is at least using multihash:

> Like the urn:btih: prefix for v1 SHA-1 info-hashes, there's a new prefix, urn:btmh:, for full v2 SHA-256 info hashes. For example, a magnet link thus looks like this:

> magnet:?xt=urn:btmh:<tagged-info-hash>&dn=<name>&tr=<tracker-url>

> The info-hash with the btmh prefix is the v2 info-hash in multi-hash format encoded in hexadecimal.
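A small sketch of what that tagging amounts to: in multihash, sha2-256 is function code 0x12 followed by the length byte 0x20, so the tagged info-hash is just "1220" plus the hex digest (the hashed bytes and tracker URL below are dummy values, not a real torrent):

    import hashlib
    from urllib.parse import quote

    info_hash = hashlib.sha256(b"stand-in for a bencoded info dict").hexdigest()
    tagged = "1220" + info_hash  # multihash: <0x12 sha2-256><0x20 len><digest>
    magnet = (f"magnet:?xt=urn:btmh:{tagged}"
              f"&dn={quote('example name')}"
              f"&tr={quote('http://tracker.example/announce')}")
    print(magnet)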


SHA1 collision was never much of a reason to change it in the first place; there is no practical attack that arises from it.


1) Take a popular game torrent
2) Add crypto mining code
3) Pad it so the SHA1 stays the same
4) Seed it; the magnet links will pick up your version
5) Profit!


That's a preimage attack. For example, the entire bitcoin network today needs something on the order of a trillion years for a preimage attack on MD5, multiply by another trillion for a SHA1 preimage attack.


Between 1 and 2, remove the crypto mining code that came with it.


Except that this is not how collision attacks work.


>if SHA256 proves to be insecure

Slight pedantry here, but "when", not "if".


There's no indication whatsoever of this happening.

In the time since SHA1 came under attack, a lot more research has happened on hash function security. If this has shown anything, it's that SHA256 is more secure than people used to think.

People seem to have this widely believed idea that "all crypto will be broken eventually". But in pretty much all cases you can trace back that broken crypto was known to be weak for a long time. SHA256 is not known to be weak (with the slight caveat that it doesn't protect against length extension attacks, but this is a known property that matters only in very rare circumstances).


That's not a given. As far as I know, there is no proof of the security or lack thereof of this kind of function.

Increases in computing power alone can't render SHA256 insecure; that can be proven by thermodynamic/size-of-the-universe arguments. You need an actual cryptographic weakness.


> Increases in computing power alone can't render SHA256 insecure; that can be proven by thermodynamic/size-of-the-universe arguments.

Fascinating. Could you please recommend a source where a curious layman could learn more about this?


I posted a link and some further analysis in this comment: https://news.ycombinator.com/item?id=29883965


https://news.ycombinator.com/item?id=24401999 - 2020, 1227 points, 562 comments


It would help if HN would automatically link to previous discussions (next to the title?) if there are any... It seems wasteful having people do it manually


That’s what the “past” button does.


Why are there always people linking to previous discussions then? Including dang himself...


They were doing it for you because you didn't know about that button. But now that you do, they will stop doing it.


Wouldn't it be easier to create an automated comment for everyone else?


Because the past button doesn't actually do that. lol

Edit: Oops. I didn't see there are two "past" buttons. I guess one of them does do that. TIL.


Well, it's certainly handy when using a mobile client that doesn't have a "Past" button, like Materialistic :)


It would need to change color when there are past discussions with more than a couple comments, or something.

The "if there are any" part is more important than the actual links.


I do love that it immediately goes into yet another crypto-bashing comment chain while not realising that a crypto company owns BitTorrent:

https://torrentfreak.com/bittorrent-inc-confirms-acquisition...

and that after the creator of bittorrent left he created his own cryptocurrency:

https://en.m.wikipedia.org/wiki/Chia_(cryptocurrency)


Reading through the wikipedia page for Chia

> The Chia Network was founded in 2017 by American computer programmer Bram Cohen, the author of the BitTorrent protocol.[4] In China stockpiling ahead of the May 2021 launch led to shortages and an increase in the price of hard disk drives (HDD) and solid-state drives (SSD).[5] Shortages were also reported in Vietnam.[6] Phison, a Taiwanese electronics manufacturing company, estimated that SSD prices will rise by 10% in 2021 as a result of Chia.[7] Hard drive manufacturer Seagate said in May 2021 that the company was experiencing strong orders and that staff were working to "adjust to market demand".[6] In May 2021 Gene Hoffman, the president of Chia Network, admitted that "we’ve kind of destroyed the short-term supply chain" for hard disks.[8] Concerns have also been raised about the mining process also potentially being harmful to drives' lifetime.[9]

Thankfully the coin seems to have collapsed completely since May 2021 (https://www.coinbase.com/price/chia-network)

But you can see why people are skeptical


> Thankfully the coin seems to have collapsed completely since May 2021

That doesn't appear to be the case:

https://www.coingecko.com/en/coins/chia

Currently rank #257 with a market cap of $247 million.

The whole altcoin market has been in limbo/declining since May last year.

Last update was also yesterday: https://www.chia.net/2022/01/10/understanding-the-changes-in...


And it wasn't completely original, as Burst (now Signum) was a cryptocurrency using storage space for mining back in 2014.


That only says something about the company and its creator instead.


Does libtorrent have anything to do with Rainberry, f.k.a. BitTorrent Inc.? They seem to be the ones working on the protocol nowadays.


That “crypto bashing comment chain” is just as relevant today as it was when it was posted 16 months ago.

People are still struggling to articulate a legitimate use case for cryptocurrencies.

I made some money mining BTC way back in the day with a GPU. I bought a new computer for $1,900 last year and much more than paid for it mining ETH. I’ve spent years now listening to all the claims from BTC, BCH, BSV and a dozen more coins and web3 proponents. But I’ll be damned if I can articulate a legitimate use for it that is better than what we have today or that doesn’t fail to take human nature into consideration.


> But I’ll be damned if I can articulate a legitimate use for it

What's wrong with DeFi?


The wikipedia page doesn't say - do clients generally support this protocol now? https://en.wikipedia.org/wiki/Comparison_of_BitTorrent_clien...


The first bullet on qBittorrent's changelog for v4.4.0 says they do. So I'm sure it's rolling out quickly to others.

v4.4.0 changelog: FEATURE: Support for v2 torrents along with libtorrent 2.0.x support (glassez, Chocobo1) - https://www.qbittorrent.org/news.php


BitTorrent v2 has been officially supported in PicoTorrent since the same date as the featured blog post from libtorrent :)


I just love PicoTorrent! It's exactly what I want from a Torrent client. Thanks!


BiglyBT was the first Torrent client with v2 support

[1] https://torrentfreak.com/biglybt-is-the-first-torrent-client...

[2] https://www.biglybt.com/


Not really true: v0.20 of PicoTorrent [1] was released with BitTorrent v2 support on the same day as libtorrent released support for v2 torrents. I think this predates BiglyBT 2.5 with v2 support [2] by about two weeks.

[1] https://github.com/picotorrent/picotorrent/releases/tag/v0.2...

[2] https://biglybt.tumblr.com/post/629947579144290304/2500-rele...


The per file hash to assist in deduplication is pretty neat. Wonder if it could be taken a step further by chunking files into blocks based on a rolling hash, so that it can dedup portions of a file as well (a la lbfs https://pdos.csail.mit.edu/archive/lbfs/). Or does anyone know if it already does that?


It is highly unlikely that the duplicate portions of the file will have an offset that's a multiple of 2^14 (the 16 KiB block size), which would be required for chunks to have matching hashes. On the client side you could theoretically run LBFS over your files, but on the swarm side this isn't going to happen.


> It is highly unlikely that the duplicate portions of the file will have an offset that's a multiple of 2^14 (the 16 KiB block size), which would be required for chunks to have matching hashes.

That's exactly what chunking based on a rolling hash solves. You set the average size of chunks and the content controls the exact boundaries.


Right, exactly. Chunk boundaries are not determined by fixed-size chunks, but rather declared when the rolling hash matches some prefix, which means chunk sizes will vary, but by controlling the prefix length you can set the average chunk size. Besides the LBFS paper, another nice writeup here: https://moinakg.wordpress.com/2013/06/22/high-performance-co...
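A toy sketch of that idea, using a polynomial rolling hash over a sliding window (real systems use Rabin fingerprints or buzhash plus minimum/maximum chunk limits; every constant below is arbitrary):

    import os

    WINDOW, MASK_BITS = 48, 13        # ~2^13 = 8 KiB average chunk size
    BASE, MOD = 257, (1 << 61) - 1
    MASK = (1 << MASK_BITS) - 1
    POP = pow(BASE, WINDOW - 1, MOD)  # coefficient of the outgoing byte

    def chunks(data: bytes):
        h, start = 0, 0
        for i, b in enumerate(data):
            if i >= WINDOW:                   # slide the window forward
                h = (h - data[i - WINDOW] * POP) % MOD
            h = (h * BASE + b) % MOD
            if i + 1 >= WINDOW and (h & MASK) == 0:  # content-defined boundary
                yield data[start:i + 1]
                start = i + 1
        if start < len(data):
            yield data[start:]

    sizes = [len(c) for c in chunks(os.urandom(1_000_000))]
    print(len(sizes), sum(sizes) // len(sizes))  # average near 8 KiB

Because boundaries depend only on the bytes in the window, inserting data near the start of a file shifts every byte offset but leaves most chunk boundaries (and hence most chunk hashes) intact.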


Rabin fingerprinting can do this I believe.

"the idea is to select blocks not based on a specific offset but rather by some property of the block contents"

https://en.wikipedia.org/wiki/Rabin_fingerprint#Applications


I've seen a similar idea for BitTorrent but I don't think it was implemented


From a technical perspective, I love BitTorrent. I remember reading about it years ago and being amazed at the bandwidth efficiency and robustness it had. It still had some centralization (ie trackers) but they themselves could be duplicated (ie you could set up a tracker for an existing torrent).

The part that surprises me from reading this is the issues raised from moving from SHA1 to SHA256, specifically because that's 20 vs 32 bytes respectively. This has created compatibility issues.

The same thing happened with Git and it's a nightmare.

I honestly don't understand how this happens. We've gone through this many times over many years. Replacing your hash function should be built in from day one; for any project to come along in the 21st century and not do this is really gross negligence.

I really wonder how this happens. Like is it just hubris about how this hash function will be different? Or are engineers just so in love with the optimizations they get to make by assuming, say, 20 byte hashes? I really wish I knew.

Anyway, I'm a little sad BitTorrent only really found traction for pirating media. Whether you approve of that or not, it's still an astounding technical achievement.


If you design a protocol to somehow have "upgradable" hashes, clients still need to be patched to support them, and old clients still will not work. It's the same mistake TLS and IPSEC made: supposedly extensible, but not really, because to actually get new features, clients all need to be upgraded, and old incompatible clients still exist and will never work with whatever new features you want to employ.

You aren't really gaining anything, and you might as well just have new versions of the protocol, and save all the pointless ceremony, complexity and security holes.

There is no way (beyond maybe embedding some kind of executable code in the protocol itself, which seems like an insane idea) to have support from "day one" for algorithms and functionality that don't even exist yet.


BitTorrent isn't particularly efficient; a well-designed CDN beats it every time, and is pretty cheap these days. (Generally, you get random peers, not peers that are particularly close to you. You could try to fix that, but you'd quickly end up with second-order problems around e.g. which peers have which chunks.) Which is precisely why you're not seeing much adoption outside the warez scene—they're the only ones for which that kind of decentralization is more important than reliable, fast downloads.


I think a lot of the answer is that hash functions last a while. SHA1 lasted about 30 years, and SHA2 seems like it will last at least as long. Furthermore, even if SHA2 becomes broken, it's hard to imagine needing to move to more than 32 bytes.


The choice to use SHA256 is surprising to me given that there are a number of other hash functions such as the BLAKE family that are explicitly designed to have arbitrary output sizes.


This feature of the (newer?) members of the BLAKE family would not help at all in this case.


I totally agree with this one, it just gained traction because of the pirating scene.


Anyone know if there's a performance increase in this spec? Couldn't find any reference to speed in the article.


Can someone explain how the Merkle-tree concept is useful? I didn't quite catch that part. On the one hand it says:

> all you need is the root hash of the tree.

but on the other hand:

> the .torrent file must still contain these piece hashes

So what are we saving here? If a piece hash doesn't match the downloaded data, we re-download the piece.


The .torrent file is separate to the torrent metadata which you need to fetch for magnet links. This is mentioned a bit later in that section:

> The .torrent file size is not smaller for a v2 torrent, since it still contains the piece hashes, but the info-dictionary is, which is the part needed for magnet links to start downloading.

Also it seems that it'll be easier to detect peers which are sending bad data and correct it. I guess this is because previously you could fetch a piece from more than one peer, but because the hashing was done at the piece level you didn't know which part of the piece was wrong (meaning you'd have to redownload the whole thing). But now you can -- with a bit of tracking -- tell which block of a piece was sent from which peer and then redownload it from another (likely blocking the original peer), because you know which block hash failed to validate. (EDIT: This is correct -- [1] explains the issue and how torrent clients have worked around this problem in v1 torrents.)

Merkle trees have some other benefits (you can cache parts of the tree in such a way that repeatedly checksumming data where lots of leaves are unchanged is cheaper, and you can efficiently prove that a given leaf was part of the hash), but I don't know if that's going to be useful for BitTorrent.

[1]: https://blog.libtorrent.org/2011/11/smart-ban/
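To make the block-level attribution concrete, here's a sketch of verifying one block against a file's Merkle root with a path of sibling hashes (a hypothetical helper, simplified relative to what a real client does); a mismatch pins the bad data to one specific block, and hence to the peer that sent it:

    import hashlib

    def verify_block(block: bytes, index: int, siblings, root: bytes) -> bool:
        h = hashlib.sha256(block).digest()
        for sib in siblings:                   # climb from leaf to root
            pair = sib + h if index & 1 else h + sib
            h = hashlib.sha256(pair).digest()
            index >>= 1
        return h == root

    # Two-leaf demo tree: root = H(H(left) + H(right))
    left, right = b"A" * 16384, b"B" * 16384
    hl = hashlib.sha256(left).digest()
    hr = hashlib.sha256(right).digest()
    root = hashlib.sha256(hl + hr).digest()
    print(verify_block(left, 0, [hr], root))         # True
    print(verify_block(b"tampered", 0, [hr], root))  # False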


The main use of a Merkle tree that I know of is the filesystem "scrub" operation in ZFS and BtrFS.

https://en.wikipedia.org/wiki/Merkle_tree


Is there a good book that explains modern bittorrent?


> BitTorrent v2 not only uses a hash tree, but it forms a hash tree for every file in the torrent.

Could this potentially make cracking down on illegal content easier?


Mobile readable version: https://outline.com/ZAAtSk


The next question is

"How long will it take copyright lawyers find out about v2-only torrents?"



