I use signalfd when I can, and this argument doesn't make much sense to me. The main goal of most signals you'd use it with -- like SIGCHLD/SIGPIPE -- is to wake you up and tell you that something happened, not to give you all the information. If you get a SIGCHLD, you can call wait() to get information. If you get SIGPIPE you can poll your file descriptors (though you might be doing that anyway so the signal isn't very useful). For the purpose of waking you up, coalescing isn't a problem -- only missed notifications are a problem, and signalfd handles those correctly.

Of course, for "real" signals that you must handle synchronously, like SIGSEGV, signalfd is less useful and makes less sense (and arguably something closer to Windows's SEH might make more sense). It's an odd historical artifact of Unix that SIGCHLD and SIGSEGV use the same mechanism.


The point of the article is that signalfd could and probably should be more useful than it is.

Even if you don't worry about coalescing, it would be a lot more useful if you didn't have to separately set the signal mask, and reset it for child processes.

> If you get a SIGCHLD, you can call wait() to get information

But you don't know how many signals actually happened, so what you actually need to do is call waitpid in non-blocking mode in a loop until you don't get an actual pid back. Or poll pidfd file descriptors.
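A sketch of that loop in Haskell, via the unix package's getAnyProcessStatus (a waitpid wrapper; passing False for the first argument requests WNOHANG). The pid printing is just a placeholder:

  import Control.Exception (IOException, catch)
  import System.Posix.Process (getAnyProcessStatus)

  -- Reap every child that exited since the last (possibly coalesced)
  -- SIGCHLD: keep calling non-blocking waitpid until it has nothing
  -- more to report.
  reapChildren :: IO ()
  reapChildren = loop `catch` noChildren
    where
      loop = do
        st <- getAnyProcessStatus False False  -- WNOHANG, no WUNTRACED
        case st of
          Just (pid, status) -> do
            putStrLn ("reaped " ++ show pid ++ ": " ++ show status)
            loop
          Nothing -> return ()  -- children remain, but none have exited

      -- waitpid raises ECHILD once there are no children left at all
      noChildren :: IOException -> IO ()
      noChildren _ = return ()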


The perspective I have in the post is that Multi-Paxos and Raft and so on are still doing pretty much exactly consensus for each log entry, they're just sharing lock IDs/ballots/terms across multiple log entries, and having the leader reuse the first phase -- acquiring the lock, reading the existing state -- across multiple instances of the second phase -- writing with the lock.

I think this is compatible with what you're saying, but maybe makes them seem more similar than treating them as two different problems.


Hmm, do you mean N+K+1 (to have enough points for both the data and parity shards)? Why isn't N+K sufficient, fitting a polynomial to N points and emitting K more?


Using the notation from the article, N+K is sufficient for RS(N,K). One point of confusion is that different authors use different notation; some use RS(num data shards, num parity shards), some use RS(total num shards, num data shards), and some use RS(total num shards, num parity shards). Per the article, I'll use RS(num data shards, num parity shards).

As for where the +1 comes from, the clue is in the "noting that you shouldn't use the value 0 in the encoding matrix" remark. The TLDR is that the +1 isn't required, and arises from an (incorrect) attempt to fix an incorrect construction. The non-TLDR is rather long.

First, we need to move from polynomials (per the article) to matrices (per the quoted remark). For this, let F denote the polynomial from the article; then it so happens that F(k+1) can be expressed as a linear combination of F(1), F(2), ..., F(k). Similarly, F(k+2) can be expressed as a linear combination of F(1), F(2), ..., F(k). This continues to be true up to F(k+t) (and beyond). These various linear combinations can be written as a k-by-t matrix, which is what the quoted remark means by "encoding matrix".

Second, once thinking with matrices rather than with polynomials, people want to construct the encoding matrix directly, rather than deriving it from a polynomial. In this direct construction, the requirement is that every square submatrix (of any size) of the k-by-t matrix is invertible. Accordingly, no element of the k-by-t matrix can be zero, as the 1-by-1 submatrix containing just that zero element isn't invertible.

Third, one common (but incorrect) direct construction is to create a k-by-t Vandermonde matrix. Such a matrix is usually constructed from some number of distinct elements, but if zero is used as such an element, then the resultant matrix will contain zeroes, which is problematic. Excluding zero causes the Vandermonde construction to _sometimes_ work, but it still doesn't _always_ work.

Per https://www.corsix.org/content/reed-solomon-for-software-rai..., there's a slightly different Vandermonde construction that _does_ always work, and also a Cauchy construction that always works, both of which work for zero. Both of these correct constructions have a strong parallel to the polynomial construction: they involve choosing N+K distinct field elements, which is akin to choosing the N+K distinct x co-ordinates for the polynomial.
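To make the correct constructions concrete, here's a small Haskell sketch of the Cauchy variant over the prime field GF(257) (software RAID uses GF(2^8), but the construction is the same; the names and the field choice are mine, not corsix.org's):

  -- Arithmetic in GF(257); inverses via Fermat's little theorem.
  p :: Int
  p = 257

  powMod :: Int -> Int -> Int
  powMod _ 0 = 1
  powMod b e
    | even e    = let h = powMod b (e `div` 2) in h * h `mod` p
    | otherwise = b * powMod b (e - 1) `mod` p

  inv :: Int -> Int
  inv a = powMod (a `mod` p) (p - 2)

  -- Cauchy matrix: entry (i,j) = 1/(x_i - y_j). Given N+K distinct
  -- field elements split into the x's and y's, every square submatrix
  -- is invertible -- and zero is a perfectly good element here. This
  -- mirrors choosing N+K distinct x co-ordinates for the polynomial.
  cauchy :: [Int] -> [Int] -> [[Int]]
  cauchy xs ys = [ [ inv ((x - y) `mod` p) | y <- ys ] | x <- xs ]

  -- Parity shards: multiply the encoding matrix by the data shards.
  encode :: [[Int]] -> [Int] -> [Int]
  encode m dat = [ sum (zipWith (*) row dat) `mod` p | row <- m ]

For example, encode (cauchy [4,5,6] [0,1,2,3]) applied to 4 data shards yields 3 parity shards, using the 7 distinct elements 0..6 -- including 0, which the incorrect construction had to outlaw.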


I would suggest against relying too closely on this article in its current state. The "precedence climbing" code is needlessly complicated -- it has a nested loop which doesn't do anything, only one loop is necessary -- and it doesn't really explain what's going on (which is a relatively simple idea), just writes some pseudocode and mechanically evaluates it.
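For contrast, here's roughly what single-loop precedence climbing looks like in Haskell (a sketch of the general technique, not the article's pseudocode; single-digit numbers and four left-associative operators only):

  data Expr = Num Int | BinOp Char Expr Expr deriving Show

  prec :: Char -> Int
  prec c = if c `elem` "+-" then 1 else if c `elem` "*/" then 2 else 0

  -- Parse an expression whose operators all bind at least as tightly
  -- as minPrec. The single loop is go: parse an atom, then keep
  -- absorbing operators at or above minPrec.
  parseExpr :: Int -> String -> (Expr, String)
  parseExpr minPrec ts0 = go (parseAtom ts0)
    where
      go (lhs, op:ts)
        | prec op >= max 1 minPrec =
            -- left-associative, so the right operand climbs to prec+1
            let (rhs, rest) = parseExpr (prec op + 1) ts
            in go (BinOp op lhs rhs, rest)
      go (lhs, ts) = (lhs, ts)

  parseAtom :: String -> (Expr, String)
  parseAtom (c:ts) | c >= '0' && c <= '9' = (Num (fromEnum c - fromEnum '0'), ts)
  parseAtom ts = error ("expected a digit at " ++ show ts)

  -- parseExpr 1 "1+2*3"
  --   ==> (BinOp '+' (Num 1) (BinOp '*' (Num 2) (Num 3)), "")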


Laziness isn't the essential thing here -- the article's construction specifically doesn't rely on laziness, and would work in a strict language (almost verbatim -- you might need to write "lambda i: f(i)" instead of "f" in a few places).

I'd say that the thing that makes the article's version work is that you can pass a closure to the predicate, which is a channel for communicating extra information that makes it not quite a black box. In an imperative language, you could imagine passing in a function that mutates some state to get information out (or throws an exception, the way you set it up). The article shows that even that isn't necessary: You can pass in a sequence (represented as a function from index to value) that closes over the predicate itself, and recursively uses it to build a bit sequence that satisfies it, if one exists.

By the way, there's a clearer (I think) version of the core idea here that uses conatural numbers (the naturals + infinity) that people don't usually present, but might be worth writing up.


In the article `p` really does take a list not a function! Haskell lists may be "backed by" functions but these functions must be pure - no back-channels. I'm not sure if this fact illuminates or obscures the point.

Anyways you can't pass a closure to the predicate, and the Haskell version has no "back channels" other than termination. The idea is to arrange the recursive search so that the predicate (not the search) demands the bits. This is what terminates the recursion, and it's hard to arrange without laziness (except by introducing more functions as you say).


No, the article's p takes a function:

  > type Cantor = Natural -> Bit
  > (#) :: Bit -> Cantor -> Cantor
  > x # a = \i -> if i == 0 then x else a(i-1)

(You could represent bit streams with something like data Stream = Cons Bit Stream in Haskell, of course, but that's not what the article does. The code in the article works in e.g. Python almost as-is.)

And you can certainly pass closures to the predicate. Not closing over a mutable variable, of course -- I just mean that the bit stream function itself has to have access to p, which is the trick that makes this possible. If you couldn't do that -- if you could only query it with bit sequences that were defined independently of p -- this wouldn't be possible.


Coming back to this very late!

> No, the article's p takes a function

Yeah it does, but it's just misdirection.

A lot of this theory is gamified "you pick YOUR function, THEN I'll pick MINE..." and the order is very important. Here representing sequences as functions only obscures. The important idea is that the sequence is lazy, BUT predetermined. It can't know what the predicate will be. That's where the black magic comes in.

> You could represent bit streams with something like data Stream = Cons Bit Stream in Haskell

I wish it did, it would be much clearer. The article makes that point itself:

"For the type of infinite sequences, we could use the built-in type of lazy lists..."

which is the point you made (which the article dismisses as insufficiently mathematical).

> If you couldn't do that -- if you could only query it with bit sequences that were defined independently of p -- this wouldn't be possible.

No, there's the rub! The bit sequence and `p` are independent. You choose your bit sequence, I choose `p`, and somehow execution finishes. That's exactly what makes this "seemingly impossible."

There is a back-door dependence where `p` forces a lazy sequence to materialize bits, but nothing more. The sequence cannot introspect `p`, and vice-versa.

I think this is my third time encountering this post and it took me a few tries to get it!


I didn't see this reply until today either.

On laziness, I only mean that the language feature isn't important. You can certainly represent infinite data using laziness (and you must represent it as some form of codata, such as a function), but it's not really the crucial thing here.

Here's a simpler version of seemingly impossible programs, which is better for explaining what I mean: Instead of bit streams, use conatural numbers, i.e. the natural numbers + infinity. You can define these in various ways (e.g. monotonic bit streams), but we'll use a regular Haskell type:

  > data Conat = Z | S Conat
Ignoring bottoms (which we won't permit), this has all the natural numbers as inhabitants, but it also has infinity:

  > inf :: Conat
  > inf = S inf
Note that, if I give you a conatural, all you can do with it is peel off successors; you can't distinguish infinity from "a really big number", extensionally.

A predicate on conaturals is a total function that takes a conatural and returns a boolean. As in the bit stream case, totality is a very strong requirement. For example, "is n even" isn't a valid predicate, because the way to check for evenness is to peel off pairs of successors until you reach either 0 or 1, but that doesn't halt on infinity. In particular, to be total, every predicate must give up eventually: After seeing k successors (for some unknown k), it has to give up and return true or false. If there was no such k, the predicate wouldn't terminate on infinity.
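For example (my example, not the post's), here's a predicate that's total because it only ever inspects a bounded prefix:

  > -- Total: looks at most three constructors deep, so it halts
  > -- even on inf. Here k = 3.
  > atLeastThree :: Conat -> Bool
  > atLeastThree (S (S (S _))) = True
  > atLeastThree _             = False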

Now you can ask: Given a predicate, how can I decide whether there's any input that it holds for? And this is "seemingly impossible": You can test the predicate on infinity, and, say, on every finite number up to some N, but that doesn't give you any guarantee about the numbers you haven't tested. If all you can do is query p with any finite set of Conats -- even interactively, deciding what to ask it based on its previous responses! -- you don't get enough information to decide if the predicate is satisfiable.

The trick is to construct a special conatural which is defined in terms of p itself. If you think of this more imperatively, you can imagine a program that queries the predicate about increasingly larger natural numbers -- 0, 1, 2, 3, ... -- and prints out an "S" each time the predicate returns false, and a "Z" when the predicate returns true. If the predicate is false for all inputs, it prints out "S"s forever -- but that's still a valid (productive) conatural number.

This conatural is -- by construction -- the smallest input that satisfies p, if there is one, and infinity if there isn't. Now you can apply p to this number, and, if there's any input that satisfies p, p will return true.

The point I was making above is that it's crucial for this conatural to be able to call p. It doesn't introspect p, in the sense of being able to access its source code, but it's not independent either. And, although laziness expresses this elegantly, I wouldn't really say it's the core of the trick here. You could express the same thing using Turing machines that write out "S"s to their output tape, for example (as long as you don't require them to terminate overall, only to be productive). You do need to guarantee that the "conatural" Turing machine that you give to p has access to p (either by embedding its source code or by being able to call it as an oracle).

Here's the implementation in Haskell (which is simpler than the bit stream case given in the article, and doesn't need mutual recursion):

  > -- find p returns a conatural that satisfies p, if one exists.
  > -- (If one doesn't exist, it returns infinity.)
  > find :: (Conat -> Bool) -> Conat
  > find p = if p Z then Z else S (find (p . S))

  > -- Does any conatural satisfy the predicate?
  > exists :: (Conat -> Bool) -> Bool
  > exists p = p (find p)
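For example, with the atLeastThree predicate from earlier (my worked example):

  > -- find atLeastThree    = S (S (S Z))  -- i.e. 3, the least witness
  > -- exists atLeastThree  = True
  > -- exists (\_ -> False) = False        -- find returns inf; p never looks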


Is there any similar tool for string diagrams? They can be very expressive on paper, but it would be much more convenient to manipulate them on a computer.


The last part is backwards -- when F is a functor, the only law we need to check is the identity law, but checking the composition law isn't enough. For example,

  fmap _ _ = []
satisfies fmap f . fmap g = fmap (f . g), but not fmap id = id.
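Spelled out, with a name attached:

  bad :: (a -> b) -> [a] -> [b]
  bad _ _ = []

  -- composition holds: bad f (bad g xs) = [] = bad (f . g) xs
  -- identity fails:    bad id [1,2,3]   = [] /= id [1,2,3]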


Whoops, that's correct.


On the "Random Thought": The GIF doesn't need to be animated. A GIF is made of multiple frames and each frame contains multiple image blocks. Each image block has its own coördinates and color table. You should be to use multiple image blocks in one frame (GCE block) to get one frame with 24-bit color.

For historical reasons most browsers don't respect a frame delay of 0 (computers used to be slow enough that even a 0 delay was good enough for animation; as they got faster browsers added extra delays to make old animations work correctly), but I think some renderers will correctly treat it as one frame (at least http://slbkbs.org/jsgif/ does! It has other problems with tricky GIFs, though -- the disposal method code is broken, for one).

Of course, you should just use PNG.


There's an example of a true-colour GIF on this page:

http://phil.ipal.org/tc.html


Thanks for pointing that example out! I should have searched when I wrote about it, but it just tangentially dawned on me that true-colour GIFs were possible as I was writing the article.


If I drag that true color GIF in Safari, it doesn't do the drag with transparency like the other two. Weird.


I always wondered if there's an efficient GIF compression trick for Cinemagraphs. Looks like the GCE is promising.


It's possible to make animated GIFs that use more than 256 colors per frame (although they'll probably be bigger than a format designed for that sort of thing).


Technically, I believe, you're still limited to 255 colors per frame; you can just put two frames with two different sets of 255 colors on top of each other with 0 delay. :)


Not quite: Each image descriptor has its own Local Color Table, but the delays aren't set by the image descriptor, they're set by the Graphics Control Extension block, and you can have multiple image descriptors per GCE block. So the GIF can look something like: HDR(GCT) GCE(delay10) IMG(LCT) IMG(LCT) IMG(LCT) GCE(delay10) IMG(LCT) IMG(LCT) IMG(LCT) ..., with several image descriptors per frame, each with its own palette.

However, due to historical reasons (GIFs optimized for old browsers on slow computers), modern browsers sometimes insert a small delay between image descriptors even if they're in the same frame. I generally consider this a bug, but there are many broken GIFs out there...

