Hacker News — pornel's comments

There's address sanitizer, and languages with garbage collectors and runtime bounds checks. There are WASM VMs, and even RLBox that translates WASM back to C that checks its own pointers at run time.

The difficulty is shifting most of these checks to compile time. Proving things at compile time is the holy grail, because instead of paying run-time cost only to make the program crash sooner, you can catch the violations before they even make it into the program, not pay any run-time cost, and provably not have such crashes either.

But that needs reliable static analysis, and C++ doesn't have enough guarantees and information in its type system to make that possible with a high degree of accuracy in non-trivial cases. This is not a matter of writing a smarter tool.

Statically tracking how pointers are used quickly becomes infeasible: every if/else doubles the state space, loops can mix the state in ways that make symbolic reasoning provably impossible (undecidability), pointer aliasing creates lots of nearly-useless edge cases, and everything done through "escaping" pointers adds the state of the whole program to every individual state analysed, quickly reaching the limits of what can be proven.

For example, if use of a pointer depends on obj->isEnabled, you now have to trace back all paths that lead to getting this obj instance, plus all the code paths that could modify the flag, and cross-reference them to know whether this particular obj could have the flag set at this point in time... which can be infeasible. Everything ends up depending on everything, and if you give up and mark something as "unknown", it spreads like NaN, making the rest of the analysis unknown too, and you can't prove the safety of anything complex enough to need such a proof.

Rust and Circle/Safe C++ solve this problem by banning all cases that are hard for static analysis (no temporary pointers in globals, no mutable aliasing, no pointer arithmetic without checkable length, strict immutability, and single ownership and lifetime of memory is baked into the static type system, rather than a dynamic property that needs to be inferred through analysis of the program's behavior). This isn't some magic that can be sprinkled onto a codebase. The limitations are significant, and require particular architectures and coding patterns that are compatible with them. Nobody wants to rewrite all the existing C++ code, and that applies to not wanting to rewrite for Profiles too. I don't see how C++ can have that cake and eat it too.
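A tiny illustration of my own (not from the comment) of why these restrictions help: with an exclusive `&mut` borrow, the compiler knows no other code path can flip a flag between the check and the use, so the obj->isEnabled problem from above becomes purely local — no whole-program tracing needed.

```rust
// Hypothetical types for illustration. Because `conn` is an exclusive
// (&mut) borrow, no alias anywhere in the program can invalidate the
// `enabled` check before the push below; the analysis stays local.
struct Conn {
    enabled: bool,
    buf: Vec<u8>,
}

fn push_if_enabled(conn: &mut Conn, byte: u8) -> bool {
    if conn.enabled {
        conn.buf.push(byte); // no alias can flip `enabled` in between
        true
    } else {
        false
    }
}

fn main() {
    let mut c = Conn { enabled: true, buf: Vec::new() };
    assert!(push_if_enabled(&mut c, 42));
    c.enabled = false;
    assert!(!push_if_enabled(&mut c, 43));
    assert_eq!(c.buf, vec![42]);
}
```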


TIL, the HTTP RFC explicitly allows range end to exceed the length of the content:

https://www.rfc-editor.org/rfc/rfc9110#name-byte-ranges
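A minimal sketch of that RFC 9110 rule (function name and shape are my own): a last-byte-pos past the end of the representation is clamped to the last byte, not rejected; only a first-byte-pos beyond the end makes the range unsatisfiable.

```rust
// Sketch of RFC 9110 byte-range semantics: clamp an over-long range end,
// return None when the range is unsatisfiable.
fn clamp_range(first: u64, last: u64, len: u64) -> Option<(u64, u64)> {
    if len == 0 || first >= len || first > last {
        return None; // unsatisfiable: nothing to serve
    }
    Some((first, last.min(len - 1))) // end clamped to the last byte
}

fn main() {
    // "bytes=500-9999" against a 1000-byte body is valid: served as 500-999.
    assert_eq!(clamp_range(500, 9999, 1000), Some((500, 999)));
    assert_eq!(clamp_range(1000, 2000, 1000), None);
}
```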


Is there a mainstream language where this still holds true?

From what I've seen most languages don't want to have a Turing complete type system, but end up with one anyway. It doesn't take much, so it's easy to end up with it accidentally and/or by adding conveniences that don't seem programmable, e.g. associated types and type equality.


Pretty sure the C type system is not Turing complete, but that doesn't necessarily make it superior.


> exceedingly rare

To have a mere one in a billion chance of getting a SHA-256 collision, you'd need to spend 160 million times more energy than the total annual energy production on our planet (and that's assuming our best bitcoin mining efficiency, actual file hashing needs way more energy).

The probability of a collision is so astronomically small that if your computer ever observed a SHA-256 collision, it would almost certainly be due to a CPU or RAM failure (bit flips are within the range of probabilities that actually happen).
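For the curious, the back-of-the-envelope math behind claims like this is the birthday bound: for a hash space of 2^bits, a collision probability p needs roughly n ≈ √(2 · 2^bits · p) hashes. A sketch (my own, using the standard approximation):

```rust
// Birthday-bound approximation: number of hashes needed for a collision
// probability p over a space of 2^bits is n ≈ sqrt(2 * 2^bits * p).
fn hashes_for_collision_probability(bits: i32, p: f64) -> f64 {
    (2.0 * p).sqrt() * 2f64.powi(bits / 2)
}

fn main() {
    let n = hashes_for_collision_probability(256, 1e-9);
    // ~1.5e34 hashes for a one-in-a-billion chance of a SHA-256 collision.
    assert!(n > 1e34 && n < 2e34);
}
```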


You know, I've been hearing people warn of handling potential collisions for years and knew the odds were negligible, but never really delved into it in any practical sense.

Context is everything.


Reading just the first byte is probably wasting a read of the whole block.

Hashing the whole file after that is wasteful. You need to read (and hash) only as much as needed to demonstrate uniqueness of the file in the set.

The tree concept can be extended to every byte in the file:

https://github.com/kornelski/dupe-krill?tab=readme-ov-file#n...
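A sketch of the underlying idea (my own illustration, not dupe-krill's actual code): compare streams chunk by chunk and stop at the first difference, so you only read as many bytes as are needed to tell two files apart. It assumes both readers return equally sized chunks, which holds for in-memory cursors; real file code would refill short reads.

```rust
use std::io::Read;

// Compare two streams chunk by chunk, stopping at the first divergence.
// Returns (equal, bytes_read_from_a).
fn streams_equal<A: Read, B: Read>(mut a: A, mut b: B) -> std::io::Result<(bool, usize)> {
    let (mut ba, mut bb) = ([0u8; 4096], [0u8; 4096]);
    let mut read = 0;
    loop {
        let na = a.read(&mut ba)?;
        let nb = b.read(&mut bb)?;
        read += na;
        if na != nb || ba[..na] != bb[..nb] {
            return Ok((false, read)); // diverged; skip the rest of both files
        }
        if na == 0 {
            return Ok((true, read)); // both ended; identical
        }
    }
}

fn main() {
    use std::io::Cursor;
    let (eq, _) = streams_equal(Cursor::new(b"hello world"), Cursor::new(b"hello mars!")).unwrap();
    assert!(!eq);
    let (eq, n) = streams_equal(Cursor::new(b"same"), Cursor::new(b"same")).unwrap();
    assert!(eq);
    assert_eq!(n, 4);
}
```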


Yeah, there is definitely some merit to more efficient hashing. Trees with a lot of duplicates require a lot of hashing, but hashing the entire file would be required regardless of whether partial hashes are done or not.

I have one data set where `dedup` was 40% faster than `dupe-krill` and another where `dupe-krill` was 45% faster than `dedup`.

`dupe-krill` uses blake3, which, last I checked, was not hardware accelerated on M series processors. What's interesting is that because of hardware acceleration, `dedup` is mostly CPU-idle, waiting on the hash calculation, while `dupe-krill` is maxing out 3 cores.

Thanks for the link!


Giving good feedback about Rust<>C bindings requires knowing Rust well. It needs deep technical understanding of Rust's safety requirements, as well as a sense of Rust's idioms and design patterns.

C maintainers who don't care about Rust may have opinions about the Rust API, but that's not the same thing :)

There are definitely things that can be done in C to make Rust's side easier, and it'd be much easier to communicate if the C API maintainer knew Rust, but it's not necessary. Rust exists in a world of C APIs, none of which were designed for Rust.

The Rust folks can translate their requirements to C terms. The C API needs to have documented memory management and thread safety requirements, but that can be in any language.


Rust already has several server frameworks that are relatively low-level network plumbing, and leave figuring out everything else to the user. If that's what you like, you can pick and choose from all the existing tools.

Rust's ecosystem is still missing its Rails or Django.

This is an attempt to make something for those "lazy" devs who don't want to write their own cookie parsing middleware, and figure out how to get a database connection pool working with a request router.


> Rust already has several server frameworks

The incredible proliferation of Rust web frameworks should be an almost blinding beacon advertising how well-suited Rust is for web backend development.

The biggest takeaway that anyone new to Rust or new to Rust-on-backend should have: Rust absolutely rocks for backend development. It's getting a tremendous amount of attention, people are trying a lot of things, and it's crystalizing as a major backend powerhouse.

You can be just as productive in Rust as you can in Go, or frankly, Python, and the result is super typesafe, super ergonomic, and blindingly fast. Google recently published a paper that said as much.

Rust already has several Python Flask equivalents (Actix/Axum), and it's waiting on its Rails/Django framework.

For anyone scared of Rust or the borrow checker: due to the nature of HTTP services and request flow logic, you almost never bump into it when writing backend Rust. But if you ever need to write anything with multiple hand-rolled threads or worker pools, you can. Rust opens up a lot of interesting possibilities, such as rich in-memory databases. But you certainly don't have to use these powers either if you don't need them.


> For anyone scared of Rust or the borrow checker: due to the nature of HTTP services and request flow logic, you almost never bump into it when writing backend Rust.

I’d say for anyone worrying about it just use ‘clone()’ everywhere you can. If you’re coming from any interpreted language the performance and efficiency will just be so much better that it doesn’t matter.
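To illustrate the advice (a toy example of my own, not anyone's production code): cloning before a call that takes ownership side-steps the borrow checker entirely while you're learning, at the cost of a copy.

```rust
#[derive(Clone)]
struct User {
    name: String,
}

// Takes ownership of `user`. A borrow (&User) would avoid the copy,
// but cloning at the call site is the beginner-friendly escape hatch.
fn greeting(user: User) -> String {
    format!("hello, {}", user.name)
}

fn main() {
    let u = User { name: "ada".to_string() };
    let g1 = greeting(u.clone()); // clone so `u` stays usable afterwards
    let g2 = greeting(u);         // last use can give up ownership
    assert_eq!(g1, g2);
    assert_eq!(g1, "hello, ada");
}
```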


That's an excellent way to get your footing. And you can come back in a month and fix it all easily.


clone(); )?; all that stuff is just meh.


I mean, who thinks using .clone() everywhere is such a good idea?


It's a suggestion for beginners writing their first Rust program. You wouldn't do this once you feel comfortable with the language.


They might end up with a bad habit, however.


There’s https://loco.rs/ if you like that sort of Rails experience. Personally I’ve grown more fond of having little cruft in my apps, being “lazy” about what goes into the code isn’t right to me and many of these frameworks don’t really care about that. To me most of the value in these opinionated frameworks is in the scaffolding anyway, not the opinions.


It's not hard to just call C. Rust supports C ABI and there's tooling for converting between C headers and Rust interfaces.

The challenging part is making a higher-level "safe" Rust API around the C API. Safe in the sense that it fully uses Rust's type system, lifetimes, destructors, etc. to uphold the safety guarantees that Rust gives and make it hard to misuse the API.
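A sketch of that pattern. Here `widget_new`/`widget_free` are hypothetical stand-ins for a C library's functions (real bindings would be `extern "C"` declarations, e.g. generated by bindgen); the point is that ownership plus `Drop` turn the C API's "you must call free exactly once" rule into something the type system enforces.

```rust
// Stand-in for a C API, for illustration only.
mod ffi {
    use std::alloc::{alloc_zeroed, dealloc, Layout};
    pub unsafe fn widget_new(size: usize) -> *mut u8 {
        unsafe { alloc_zeroed(Layout::array::<u8>(size).unwrap()) }
    }
    pub unsafe fn widget_free(ptr: *mut u8, size: usize) {
        unsafe { dealloc(ptr, Layout::array::<u8>(size).unwrap()) }
    }
}

// The safe wrapper: single ownership means exactly one free, and
// use-after-free is a compile error rather than a runtime bug.
pub struct Widget {
    ptr: *mut u8,
    size: usize,
}

impl Widget {
    pub fn new(size: usize) -> Widget {
        Widget { ptr: unsafe { ffi::widget_new(size) }, size }
    }
    pub fn len(&self) -> usize {
        self.size
    }
}

impl Drop for Widget {
    fn drop(&mut self) {
        // Runs exactly once, automatically: no leaks, no double-free.
        unsafe { ffi::widget_free(self.ptr, self.size) }
    }
}

fn main() {
    let w = Widget::new(16);
    assert_eq!(w.len(), 16);
} // `w` dropped here; the C-side free runs exactly once
```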

But the objections about Rust in the kernel weren't really about the difficulty of writing the Rust code, but more broadly about having Rust there at all.


At this point it's too early to even worry about correctness; it doesn't work yet.

But the years of work put into the existing project to make it robust don't mean the exact same years have to be spent on the reimplementation:

- there's been work spent on discovering the right architecture and evolving the db format. A new impl can copy the end result.

- hard lessons have been learned about dealing with bad disks, filesystems, fsync, flaky locks, etc. A new impl can learn from the solutions without having to rediscover them the hard way.

- C projects spend some time on compatibility with C compilers, OSes, and tweaking build scripts, which are relatively easy in Rust.

Testing will need a clever solution. Maybe they'll buy access to the official test suite? Maybe they'll use the original SQLite to fuzz and compare results?


The Limbo team seems to be leaning heavily into deterministic simulation testing (DST) and one of the cofounders on a recent podcast was very enthusiastic about the benefits of the approach.

https://github.com/tursodatabase/limbo/tree/main/simulator

https://changelog.com/podcast/626


I wonder why timelines aren't implemented as a hybrid gather-scatter choosing strategy depending on account popularity (a combination of fan-out to followers and a lazy fetch of popular followed accounts when follower's timeline is served).

When you have a celebrity account, instead of fanning out every message to millions of followers' timelines, it would be cheaper to do nothing when the celebrity posts, and later, when serving each follower's timeline, fetch the celebrity's posts and merge them into the timeline. When millions of followers do that, it will be a cheap read-only fetch from a hot cache.
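A sketch of the hybrid serve path (names and data shapes are mine, for illustration): small accounts are fanned out at write time, big accounts are pulled in lazily at read time and merged by timestamp.

```rust
// A post is (timestamp, id) for illustration; newest posts have the
// largest timestamps.
type Post = (u64, &'static str);

// Merge the follower's precomputed (fanned-out) timeline with lazily
// fetched feeds of the celebrities they follow, newest first.
fn serve_timeline(fanned_out: &[Post], celebrity_feeds: &[&[Post]]) -> Vec<Post> {
    let mut all: Vec<Post> = fanned_out.to_vec();
    for feed in celebrity_feeds {
        all.extend_from_slice(feed); // cheap read from a hot cache
    }
    all.sort_by(|a, b| b.0.cmp(&a.0)); // newest first
    all
}

fn main() {
    let mine = [(10, "friend-a"), (5, "friend-b")];
    let celeb = [(12, "celeb-x"), (7, "celeb-y")];
    let tl = serve_timeline(&mine, &[&celeb]);
    assert_eq!(tl, vec![(12, "celeb-x"), (10, "friend-a"), (7, "celeb-y"), (5, "friend-b")]);
}
```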


This is probably what we'll end up with in the long run. Things have been fast enough without it (aside from this issue), but there's a lot of low-hanging fruit for Timelines architecture updates. We're spread pretty thin from an engineering-hours standpoint atm, so there's a lot of intense prioritization going on.


Just to be clear, you are a Bluesky engineer, right?

Off-topic: how have you been dealing with the influx of new users in the aftermath of X's political/legal problems? Did you see an increase in toxicity around the network? And how have you (Bluesky moderation) been dealing with it?


[flagged]


There's nothing wrong with being partisan if you're partisan against fascists who want to destroy democracy and the rule of law.


I understand why some people vote for some parties and why they’re “voting on inflation” or “right to abortion”, but for me, keeping checks and balances and democracy is the one value above ALL.

In the span of human history, not a lot of countries and civilizations have lasted long; most have been marked by constant instability and uncertainty about the future. We have a boring and imperfect political system created by our founding fathers, but at least it’s been stable for nearly 250 years. A lot of people have tried standing up their own political systems… most fail and everyone suffers. Even the founding fathers failed completely on their first attempt.

I know times are tough now but, in the context of history, they could be much worse, and I’d rather not lose what good we currently do have.


> we got 250 years so far without imploding

We may have arguably recovered from it, but we rather famously did not get 250 years without the union violently fragmenting. (Our best record on that is right around 160, currently.)


While it’s true we came close during the Civil War, we still decided to keep the same system of government. In the end, while the Civil War did result in some constitutional crises, the root of the problem was more that one half of the country completely disagreed with the other half… I don’t think any political system can really work with that level of division, and yet we kept the same one. Obviously the Civil War very much brought into question the matter of states’ rights but, for better or worse, the founders were a little vague on that, so we can still keep most of the same system and squabble over the details for the rest of eternity…


Trump refusing to accept the 2020 election results should've been the line for many voters, but sadly it wasn't. And the potential crimes he and some of his allies may have committed while trying to overturn it will now never be prosecuted.


2024: > More than 155 million people cast ballots in the 2024 presidential election. It's second only in U.S. history to the 2020 election. Turnout in 2024 represented 63.9% of eligible voters, the second-highest percentage in the last 100 years, according to the University of Florida Election Lab. The only year that beat it – again – was 2020 when universal mail-in voting was more widely available.

2020: > More than 158 million votes were cast in the election

So 3 million Democrats suddenly decided not to go out and vote "to save democracy" against "fascism"?


> The only year that beat it – again – was 2020 when universal mail-in voting was more widely available.

You answered your own question. Voting was made more difficult in 2024, so fewer votes were cast.


The simpler and much more likely answer, my friend, is that people didn’t vote from a combination of disillusionment, assuming Kamala would win, and likewise factors.

I saw many people close to me not bother voting because they didn’t enjoy Biden’s presidency, despite voting for him in 2020.

So, I find that FAR more likely as a reason than supposed election fraud.


I'm really confused how tech people shifted from "voting machines are inherently insecure" to simply ignoring the issue despite many political connections between Democrats and voting machine vendors. I'll stick with the results of my research into the matter. If you think you're well enough informed and that your sources actually care about the truth, let's agree to disagree.


This is one of the most investigated issues in American legal history. There was absolutely no indication of fraud. You've fallen for a conspiracy theory. It's now Pizzagate-tier.

(I still argue with Pizzagate adherents on a monthly basis. They think it's perfectly logical.)


Oh fully agreed. But there's a large contingent of folks that are well represented here who think that it's inherently more intelligent to act like/be a centrist, that "both sides have something to offer," which isn't strictly untrue, but in practice especially with American politics just results in mealy-mouthed acceptance of pretty brutal status quos.

Like even left and right in terms of the mainstream here is nonsense. We don't have a left party at all, we have a conservative party, and we have an authoritarian fascist party. As a lefty none of my values are represented at all, I just get to vote each election for the conservative party that doesn't want my friends dead.


Yup. This is a well-tread philosophical problem: the Paradox of Tolerance. Greater minds have concluded "to protect tolerance, one has to be intolerant of intolerance."

And, as always, bsky is a place of business - it is not a public venue. They can decide not to admit individuals who would threaten their business.


I have heard it much more aptly described as “enforcing the social contract”.

You agree to uphold the contract of tolerance with everyone that participates. If someone refuses to uphold the contract with others who do, then you have no obligation to uphold the contract with that individual.


Exactly. Tolerance is an opt-in protection. If you don't opt-in by exercising it yourself, you don't get the benefits.

Or, as a meme: YA_GOTTA_GIVE!.gif


I like that, it's less paradoxical, and likely easier to explain to people with less developed critical thinking skills.


Funny how you call the Trump administration fascist (theoretically it's anti-fascist, but it's still bad).

Taking from the description of the video, since this is what immediately rang a bell when you said trump === fascism:

The liberal theory of the rise of Trumpism and its supposed fascistic features is inadequate in both effectively analysing and offering solutions to the present situation. Liberals often personalise or individualise people like Donald Trump and Elon Musk, casting them as deviations, as opposed to manifestations of class society. Class analysis suggests that fascism was a unique response to growing anti-capitalist organisations, socialist and/or anarchist, gaining prominence and posing threats to the economic base. The owning class required a mass movement which enveloped otherwise disillusioned people into a political project which had the collectivist, anti-free market appeal that socialist and anarchist organisations had, but nonetheless committed to solidifying and strengthening the economic base and profit motive. In modern America, no such anti-capitalist threat exists. Neoliberalism has created significant disillusionment with mainstream social and political institutions and systems, but this disillusionment hasn’t been captured by anti-capitalist forces, but rather by the populist right. As such, the populist right doesn’t need to give up the economic game, i.e. free markets, deregulation, privatisation, austerity, etc (with the exception of tariffs), but can purely rely on minorities as scapegoats in a constructed culture war, such as immigrants, ‘wokeness’, transgender people, etc. Therefore, capital doesn’t need to be subordinated to the nation-state, like pursued by contemporary fascist governments. Rather, in this ‘inverted’ fascism, capital takes over and exploits the state in a rather oligarchic manner.

https://www.youtube.com/watch?v=pqdLwkyfLdM

This video is really great , I spent 10 minutes looking for this.

I am not a Trump supporter. The title might be a little clickbaity (basically the opposite of what it really is), but you might find it really great.

It is one of the best videos I have ever watched on politics.


I find communist analysis tiresome, especially when in this case the populist right under Trump seems to be motivated in part by anti-free market ideas. The communist kneejerk reaction to every single situation is "this can be explained by class analysis". It's them trying to shoehorn their pet theory into everything.


You're not less partisan if you prefer a slimmer range of political leanings.



That's insightful. Keep up the good work!


At some point they'll end up just doing the Bieber rack [1]. It's when a shard becomes so hot that it just has to be its own thing entirely.

[1] - https://www.themarysue.com/twitter-justin-bieber-servers/

@bluesky devs, don't feel ashamed for doing this. It's exactly how to scale these kinds of extreme cases.


I've stood up machines for this before without knowing they had a name. I worked at the mouse company, and my parking spot was two over from J. Bieber's spot.

So now we have the Slashdot effect, the HN hug of death, and... it's not Clarkson, it's the Stephen Fry effect? Maybe it can be cross-discipline: there's a term for when much of the UK turns their kettles on at the same time.

I should make a blog post to record all the ones I can remember.


TV Pickup aka the Half Time Kettle Effect.

https://en.wikipedia.org/wiki/TV_pickup


We never actually had a literal “Bieber Box”, but the joke took off.

Hot shards were definitely an issue, though.


Given that Bluesky is funded by Twitter, I'm assuming they know a lot more than us about how Twitter architects systems.


It's so crazy.

Thanks a lot for sharing this link.


This problem is discussed in the beginning of the Designing Data-Intensive Applications book. It's worth a read!


Do you know the name of the problem or strategy used for solving the problem? I'd be interested in looking it up!

I own DDIA but after a few chapters of how database work behind the scenes, I begin to fall asleep. I have trouble understanding how to apply the knowledge to my work but this seems like a useful thing with a more clear application.


Yes, we used the Yahoo! “Feeding Frenzy” paper as the basis for the design of Haplocheirus (the timeline service).


> and later when serving each follower's timeline, fetch the celebrity's posts and merge them into the timeline

I think then you still have the 'weird user who follows hundreds of thousands of people' problem, just at read time instead of write time. It's unclear that this is _better_, though, yeah, caching might help. But if you follow every celeb on Bluesky (and I guarantee you this user exists) you'd be looking at fetching and merging _thousands_ of timelines (again, I suppose you could just throw up your hands and say "not doing that", and just skip most or all of the celebs for problem users).
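For what it's worth, the read-time merge doesn't have to touch every feed to the end: a standard k-way merge with a heap can stop as soon as one page of posts is assembled. A sketch (my own, with feeds as pre-sorted newest-first timestamp lists):

```rust
use std::collections::BinaryHeap;

// K-way merge of pre-sorted (newest-first) feeds, stopping after `limit`
// posts, so a user following thousands of accounts only pays for the page
// they actually load, not for merging every feed in full.
fn merge_page(feeds: &[Vec<u64>], limit: usize) -> Vec<u64> {
    // Max-heap of (timestamp, feed index, position within feed).
    let mut heap: BinaryHeap<(u64, usize, usize)> = feeds
        .iter()
        .enumerate()
        .filter_map(|(i, f)| f.first().map(|&t| (t, i, 0)))
        .collect();
    let mut page = Vec::with_capacity(limit);
    while let Some((t, i, pos)) = heap.pop() {
        page.push(t);
        if page.len() == limit {
            break;
        }
        if let Some(&next) = feeds[i].get(pos + 1) {
            heap.push((next, i, pos + 1));
        }
    }
    page
}

fn main() {
    let feeds = vec![vec![90, 40, 10], vec![80, 70], vec![95, 5]];
    assert_eq!(merge_page(&feeds, 4), vec![95, 90, 80, 70]);
}
```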

Given the nature of the service, making read predictably cheap and writes potentially expensive (which seems to be the way they've gone) seems like a defensible practice.


> I suppose you could just throw up your hands and say "not doing that", and just skip most or all of the celebs for problem users

Random sampling? It's not as though the user needs thousands of posts returned for a single fetch. Scrolling down and seeing some stuff that's not in chronological order seems like an acceptable tradeoff.


You might mix the approaches based on some cut off point


Why do they "insert" even non-celebrity posts into each follower's timeline? That is not intuitive to me.


To serve a user timeline in single-digit milliseconds, it is not practical for a data store to load each item from a different place. Even with an index, the index itself can be contiguous on disk, but the payload is scattered all over the place if you keep it in a single large table.

Instead, you can drastically speed up performance if you are able to store data for each timeline somewhat contiguously on disk.


Think of it as pre-rendering. Between pre-rendering and JIT collecting, pre-rendering means more work, but it's async, and it means the timeline is ready whenever a user requests it, giving a fast user experience.

(Although I don't understand the "non-celebrity" part of your comment -- the timeline contains (pointers to) posts from whoever someone follows, and doesn't care who those people are.)


Perhaps I'm misunderstanding; I thought the actual content of each tweet was being duplicated to every single timeline that followed the author, which sounded extremely wasteful, especially in the case of someone who has 200 million followers.


From the linked article: "Additionally, a reference to your post is 'fanned out' to your followers so they can see it in their Timelines."

So not the content, just a sort of link to it.

