aoli-al's comments

aoli-al · 2025-06-08T19:44:01 1749411841

Yes, Fray controls all application threads, so it runs one test per JVM. But you can always use multiple JVMs to run multiple tests[1].

Fray currently does not support virtual threads. We do have an open issue tracking it, but it is low priority.

[1]: https://docs.gradle.org/current/userguide/java_testing.html#...

aoli-al · 2025-06-08T10:42:24 1749379344

The "randomness" comes from Kotlin coroutines and user-space scheduling. For example, Kotlin runs multiple user-space threads on the same physical thread, but Fray only reschedules physical threads. So when the application under test uses coroutines or virtual threads, Fray cannot generate certain thread interleavings. It also cannot replay deterministically, because thread execution is no longer controlled by Fray.
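
As a rough analogy in plain Java (an illustration only, not Kotlin coroutines and not Fray's API), the sketch below multiplexes several logical tasks onto one platform thread. A tool that only reschedules platform threads sees a single thread here and has no handle on the order in which the tasks run. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration: many logical tasks, one platform thread. */
final class SharedCarrierSketch {

    /** Runs the given number of tasks on one thread and returns the distinct thread names they saw. */
    static Set<String> carrierNames(int tasks) {
        ExecutorService carrier = Executors.newSingleThreadExecutor();
        try {
            // Each task reports the platform thread it is running on.
            Callable<String> whoAmI = () -> Thread.currentThread().getName();
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                results.add(carrier.submit(whoAmI));
            }
            Set<String> names = new HashSet<>();
            for (Future<String> result : results) {
                names.add(result.get());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            carrier.shutdown();
        }
    }

    public static void main(String[] args) {
        // All three tasks share one platform thread, so one name is printed.
        System.out.println(carrierNames(3));
    }
}
```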

In our paper, we found that Fray suffers from false negatives because of this missing feature. Lincheck supports Kotlin coroutines so it finds one more bug than Fray in LC-Bench.

We didn't make any claims about false positives in Lincheck.

delusional · 2025-06-08T15:55:23 1749398123

> We didn't make any claims about false positives in Lincheck.

To be clear, I made that claim :) I agree that the paper makes no such claim.

aoli-al · 2025-06-07T19:56:41 1749326201

Thanks for pointing it out! Just did a quick fix using Claude :)

malcolmgreaves · 2025-06-07T20:21:28 1749327688

On mobile (Safari), the lines in the code blocks have different font sizes. They also have different fonts. Some are like 3-4x the size of other lines. No idea what could be going wrong, but it does unfortunately make the code blocks difficult to follow along.

aoli-al · 2025-06-07T20:58:57 1749329937

should be fixed as well :)

NooneAtAll3 · 2025-06-07T21:31:08 1749331868

any chance you can make light/dark mode switch a UI button?

masklinn · 2025-06-08T05:36:45 1749361005

On desktop I’d suggest installing an extension that adds a toggle (they exist for Firefox and Chrome at least): adding a toggle manually is a bit of a chore, especially if the CSS system you use does not build that in.

aoli-al · 2025-05-30T15:05:14 1748617514

https://github.com/cmu-pasta/fray

Fray is a concurrency testing framework for Java. It also does deterministic simulation.

aoli-al · 2025-03-06T19:06:39 1741287999

https://github.com/cmu-pasta/fray

Fray is a controlled concurrency testing tool for the JVM that supports record and replay. It could be a perfect backend for codetracer. (I'm the author of Fray)

aoli-al · 2025-03-03T20:43:09 1741034589

I'm the author of the post.

I'm not sure Chrome's current caching behavior is helpful because the second response does not indicate which part of the data is returned. So, the application has no choice but to discard the data.

But thank you for your comments. They helped me crystallize why I think this is a bug.

mananaysiempre · 2025-03-03T21:00:40 1741035640

Yeah, if there's no way to tell from the request which range has actually been returned that seems like a deal-breaker. The spec’s allowance for a partial response is explicitly motivated by the response being self-describing, and if after Chrome’s creative reinterpretation it is not, then it’s not clear what the client could even do.
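
For reference, a single-part 206 response describes itself through its `Content-Range` header (for example `bytes 0-499/1234`). A minimal sketch of how a client could recover the returned range from that header follows; the class and method names are hypothetical, and it deliberately ignores the unsatisfied-range form `bytes */1234`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads the byte range a partial response claims to carry. */
final class ContentRangeSketch {

    /** Inclusive byte range plus the total length (-1 when the server sent "*"). */
    static final class ByteRange {
        final long first;
        final long last;
        final long total;

        ByteRange(long first, long last, long total) {
            this.first = first;
            this.last = last;
            this.total = total;
        }
    }

    // Digit runs are capped at 18 characters so Long.parseLong cannot overflow.
    private static final Pattern BYTES =
            Pattern.compile("bytes (\\d{1,18})-(\\d{1,18})/(\\d{1,18}|\\*)");

    /** Returns the described range, or empty if the header is missing or malformed. */
    static Optional<ByteRange> parse(String header) {
        if (header == null) {
            return Optional.empty();
        }
        Matcher m = BYTES.matcher(header.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        long first = Long.parseLong(m.group(1));
        long last = Long.parseLong(m.group(2));
        long total = m.group(3).equals("*") ? -1 : Long.parseLong(m.group(3));
        if (last < first) {
            return Optional.empty();
        }
        return Optional.of(new ByteRange(first, last, total));
    }

    public static void main(String[] args) {
        System.out.println(parse("bytes 0-499/1234").isPresent());
        System.out.println(parse(null).isPresent());
    }
}
```

If `parse` returns empty, the client cannot tell which bytes it received, which is the situation described above.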

ajross · 2025-03-03T21:45:37 1741038337

There's no clear way to define "correct" in this case regardless. The whole premise behind a range request is that the data is immutable (because otherwise it wouldn't make sense to be able to fetch it piecewise), and it's mutating here by disappearing! What are you supposed to do, really? The answer is always going to be app-dependent; the browser can't get it right because the server is being obtuse and confusing.

When we handle this in the hardware world it's via algorithms that know about the mutability of the cached data and operate on top of primitives like "flush" and "invalidate" that can restore the inconsistent memory system to a known state. HTTP didn't spec that stuff, but the closest analog is "fetch it again", which is exactly what the suggested workaround is in the bug.

aoli-al · 2025-02-22T22:29:56 1740263396

Using Fray does not require knowledge about "deterministic testing" or "controlled concurrency." This is one of its goals: developers write normal concurrency tests, and Fray controls the execution behind the scenes.

In fact, when we evaluated Fray, we collected all existing concurrency tests from Lucene, Kafka, and Guava; simply running them under different thread interleavings already revealed many bugs. [1]

[1]: https://github.com/cmu-pasta/fray/blob/main/docs/bugs.md
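
To make "normal concurrency test" concrete, here is a hypothetical example in plain Java. It uses no Fray API; the `Account` class and the overdraw bug are invented for illustration. The check-then-act in `withdraw` is free of data races, yet two threads can both pass the check before either subtracts, which only shows up under a specific interleaving.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical ordinary concurrency test; nothing here is Fray-specific. */
final class WithdrawRaceSketch {

    static final class Account {
        private final AtomicInteger balance = new AtomicInteger(100);

        // Check-then-act: each call is atomic, but the pair is not.
        void withdraw(int amount) {
            if (balance.get() >= amount) {
                balance.addAndGet(-amount);
            }
        }

        int balance() {
            return balance.get();
        }
    }

    /** Runs two concurrent withdrawals of 100 and returns the final balance. */
    static int runOnce() {
        Account account = new Account();
        Thread t1 = new Thread(() -> account.withdraw(100));
        Thread t2 = new Thread(() -> account.withdraw(100));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return account.balance();
    }

    public static void main(String[] args) {
        int balance = runOnce();
        // The intended invariant is balance >= 0. It is violated only when
        // both threads pass the check before either one subtracts.
        System.out.println("final balance = " + balance
                + (balance < 0 ? " (overdrawn)" : " (ok)"));
    }
}
```

Under the OS scheduler the overdrawn outcome is rare, so a test asserting `balance >= 0` almost always passes. That is the kind of test a controlled scheduler can run under many different interleavings.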

vlovich123 · 2025-02-22T22:56:54 1740265014

Writing good “normal” concurrency tests is hard is what I’m saying. I get that it slots in well with existing tests that are already written.

aoli-al · 2025-02-22T22:25:58 1740263158

Fray does not know whether a program is free of data races. If a program does contain data races, Fray can still find bugs, but its soundness guarantee no longer holds, so it may miss bugs caused by those data races.

aoli-al · 2025-02-22T17:21:43 1740244903

I’m the author of Fray, a concurrency testing framework for the JVM, and I’m excited to finally share what I’ve been building over the past few years!

Fray[1] is a concurrency testing tool for Java that can help you find and debug tricky race conditions that manifest as assertion violations, run-time exceptions, or deadlocks. I’d love to hear your thoughts—feel free to ask me anything! And if you’re curious, give Fray a try.

[1]: https://github.com/cmu-pasta/fray

_benedict · 2025-02-22T18:12:13 1740247933

We have something very similar[1] we use in the Apache Cassandra project to test complex cluster behaviours.

We appear to use exactly the same basic technique, using byte weaving to intercept concurrency primitives such as synchronized, LockSupport etc to pause the system thread and run them on some schedule.

We only currently run (deterministic) probabilistic traces though, we can’t search the interleaving space. But the traces for a whole cluster are extremely complex and probably unsearchable.
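
The technique described above can be sketched in a few lines of plain Java. This is an illustration under simplifying assumptions, not the Cassandra simulator's or Fray's implementation: the interception point is an explicit semaphore wait instead of rewritten bytecode, and all names are hypothetical. A seeded `Random` picks which paused thread runs next, so the same seed reproduces the same trace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Semaphore;

/** Hypothetical toy scheduler: one real thread runs at a time, chosen by a seeded RNG. */
final class SeededSchedulerSketch {

    /** Runs the workers step by step and returns the order in which steps executed. */
    static List<String> run(long seed, int workers, int stepsPerWorker) {
        if (workers < 1 || stepsPerWorker < 1) {
            throw new IllegalArgumentException("need at least one worker and one step");
        }
        Random random = new Random(seed);
        List<String> trace = new ArrayList<>();      // touched only by the running thread
        Semaphore[] go = new Semaphore[workers];     // per-worker gate opened by the scheduler
        Semaphore yielded = new Semaphore(0);        // worker signals that its step is done
        boolean[] done = new boolean[workers];
        Thread[] threads = new Thread[workers];

        for (int i = 0; i < workers; i++) {
            final int id = i;
            go[id] = new Semaphore(0);
            threads[id] = new Thread(() -> {
                for (int step = 0; step < stepsPerWorker; step++) {
                    go[id].acquireUninterruptibly();   // stand-in for an intercepted primitive
                    trace.add("T" + id + ":" + step);
                    if (step == stepsPerWorker - 1) {
                        done[id] = true;
                    }
                    yielded.release();                 // hand control back to the scheduler
                }
            });
            threads[id].start();
        }

        List<Integer> runnable = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            runnable.add(i);
        }
        while (!runnable.isEmpty()) {
            int pick = random.nextInt(runnable.size());
            int id = runnable.get(pick);
            go[id].release();                          // let exactly one worker take a step
            yielded.acquireUninterruptibly();          // wait until that step has finished
            if (done[id]) {
                runnable.remove(pick);                 // remove by index
            }
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run(42L, 3, 2));
    }
}
```

The semaphore hand-offs also provide the memory visibility the shared `trace` and `done` need, since a release happens-before the matching acquire.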

I have been meaning to publish it for broader consumption for years now, but there’s always something more important to do. It’s great to see some dedicated efforts in this space.

[1] https://github.com/apache/cassandra/tree/trunk/test/simulato...

aoli-al · 2025-02-22T20:05:45 1740254745

This looks super cool!

It seems that all controlled threads are wrapped with `InterceptibleThread` in the Cassandra simulator. Does this work for thread pools (e.g., ForkJoinPool) as well? We had a hard time intercepting thread objects because they are also used by the language runtime (e.g., GC threads), and we don’t want to interfere with those. Additionally, modifying application code just to track thread creation isn’t ideal. To work around this, we came up with a combination of JVMTI and a Java agent, where JVMTI monitors thread creation and termination.

As for searching schedules, yes, it is hard to search all possible schedules. However, it turns out that many search algorithms, such as probabilistic concurrency testing[1] or partial order sampling[2], are still better than a random walk. So it is worth giving them a try.

[1] https://www.microsoft.com/en-us/research/wp-content/uploads/...

[2] https://www.cs.columbia.edu/~junfeng/papers/pos-cav18.pdf
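
A simplified sketch of the priority-based scheduling idea in probabilistic concurrency testing[1]: threads get distinct random priorities, the highest-priority runnable thread always runs, and `depth - 1` randomly chosen steps lower the priority of whichever thread executes them. The model below is a toy built on assumptions: threads are just step counts that never block, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical toy model of PCT-style priority scheduling. */
final class PctSketch {

    /** Returns the order of thread ids chosen for each step of the run. */
    static List<Integer> schedule(int[] stepsPerThread, int depth, long seed) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth must be at least 1");
        }
        Random random = new Random(seed);
        int n = stepsPerThread.length;
        int totalSteps = 0;
        for (int s : stepsPerThread) {
            totalSteps += s;
        }
        List<Integer> order = new ArrayList<>();
        if (totalSteps == 0) {
            return order;
        }

        // Initial priorities depth, depth+1, ..., depth+n-1 in random order.
        List<Integer> initial = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            initial.add(depth + i);
        }
        Collections.shuffle(initial, random);
        int[] priority = new int[n];
        for (int i = 0; i < n; i++) {
            priority[i] = initial.get(i);
        }

        // depth-1 change points, each a step number in [1, totalSteps].
        int[] changePoint = new int[depth - 1];
        for (int i = 0; i < changePoint.length; i++) {
            changePoint[i] = 1 + random.nextInt(totalSteps);
        }

        int[] remaining = stepsPerThread.clone();
        for (int step = 1; step <= totalSteps; step++) {
            // Pick the runnable thread with the highest priority.
            int next = -1;
            for (int t = 0; t < n; t++) {
                if (remaining[t] > 0 && (next < 0 || priority[t] > priority[next])) {
                    next = t;
                }
            }
            order.add(next);
            remaining[next]--;
            // At a change point, drop the running thread below all initial priorities.
            for (int i = 0; i < changePoint.length; i++) {
                if (changePoint[i] == step) {
                    priority[next] = i + 1;
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(schedule(new int[] {3, 2, 4}, 2, 7L));
    }
}
```

With `depth = 1` there are no change points, so each thread simply runs to completion in priority order. Larger depths add a few targeted preemptions instead of a coin flip at every step, which is how this differs from a random walk.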

_benedict · 2025-02-22T20:48:26 1740257306

We do currently require all threads to be created by one of our own factories, but that's primarily because this grew out of a non-byte weaving approach (where we explicitly replaced our concurrency primitives). Looking at the class now, all of its state could easily be stashed in either global or ThreadLocal variables, so I don't see anything that would stop us working with FJP etc.

> Additionally, modifying application code just to track thread creation isn’t ideal.

This would certainly be necessary, but don't you anyway need to rewrite the application to trap synchronised, volatile, atomic accesses etc? It doesn't seem all that different to rewrite calls to Thread::start. The issue of JVM threads is perhaps a little trickier, but I am not averse to some ugly integrations. Just take a look at how we make RNGs deterministic

> So it is worth giving them a try.

Thanks for the tips! I am not sure when I will have time to apply these techniques to our simulator, but they are no doubt valuable for the protocol simulations I am relying on today, so maybe I will have a justification to explore them sometime soon.

Really cool work too. I hope it manages to make its way into more hands, so that this technique can be used more widely.

vlovich123 · 2025-02-22T22:00:52 1740261652

How does this compare with a generic tool like Antithesis? I recognize closed source money vs open source free but from a feature perspective would Antithesis be more effective at finding the issues since it’s not limited to stuff happening in the JVM / can test concurrency of more complicated network topologies between components?

aoli-al · 2025-02-22T22:36:05 1740263765

AFAIK, Antithesis uses a hypervisor to achieve deterministic execution. This can be less effective because the hypervisor has no knowledge of language semantics and faces a larger search space. You may check Figures 5 and 6 in our technical report[1], where we compare Fray against RR, a record-and-replay tool that can also be used for concurrency testing at the OS level[2].

[1]: https://arxiv.org/pdf/2501.12618

[2]: https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mo...

vlovich123 · 2025-02-23T17:29:44 1740331784

Antithesis is supposed to be quite a bit faster than rr chaos precisely because it’s a hypervisor vs rr which is trying to intercept syscalls which is notoriously slow, so comparing against rr per second feels like a bad proxy.

Unlike rr chaos, which I believe uses a random search without any knowledge of past runs, Antithesis is supposed to do a more targeted search through the orderings with an understanding of history between runs, so rr is similarly a bad proxy for the executions needed per bug.

I’m also not sure I see how language semantics can be exploited when you’re interleaving based on different thread orderings. If I understand it correctly, Fray is also slightly more limited than something like Antithesis which can also test I/O failures and different I/O orderings in a distributed setting as well.