There was already such a system, with more concrete requirements. It's called the EB-5 visa, and it offers a path to a green card. What does this new method bring to the table?
Processes can die independently, so if a process dies while modifying a shared-memory data structure under a lock, the resulting state can be difficult to manage. Postgres, which uses shared-memory data structures, sometimes has to kill all of its backend processes because it cannot fully recover from such a state.
In contrast, no one thinks about what happens if a thread dies independently because the failure mode is joint.
> In contrast, no one thinks about what happens if a thread dies independently because the failure mode is joint.
In Rust, if a thread holding a mutex dies, the mutex becomes poisoned, and trying to acquire it returns an error that has to be handled. As a consequence, every Rust developer who touches a mutex has to think about that failure mode, even if in 95% of cases the best answer is "let's exit when that happens".
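A minimal sketch of that failure mode using only the standard library: a spawned thread panics while holding the lock, and the next `lock()` call returns `Err(PoisonError)` instead of a guard. The function name `poison_demo` is illustrative, not from any real codebase.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns a thread that panics while holding the lock, then reports whether the
/// mutex is poisoned and what value it holds. (Hypothetical demo function.)
fn poison_demo() -> (bool, i32) {
    let shared = Arc::new(Mutex::new(0));

    let worker = Arc::clone(&shared);
    let joined = thread::spawn(move || {
        let mut guard = worker.lock().unwrap();
        *guard += 1;
        if *guard == 1 {
            // Unwinding drops `guard`, and its Drop marks the mutex as poisoned.
            panic!("worker died while holding the lock");
        }
    })
    .join();
    assert!(joined.is_err(), "the worker thread should have panicked");

    // Every later lock() now returns Err(PoisonError) instead of a guard.
    let poisoned = shared.lock().is_err();

    // The "95% case" would be `shared.lock().expect("mutex poisoned")`, turning
    // the poisoning into a panic here. This branch instead accepts the possibly
    // half-updated data and keeps going.
    let value = match shared.lock() {
        Ok(guard) => *guard,
        Err(poisoned_err) => *poisoned_err.into_inner(),
    };
    (poisoned, value)
}

fn main() {
    let (poisoned, value) = poison_demo();
    // Prints "poisoned = true, value = 1" (after the worker's panic message on stderr).
    println!("poisoned = {poisoned}, value = {value}");
}
```

`PoisonError::into_inner` is the escape hatch for the remaining 5%: it hands back the guard so you can inspect or repair the data yourself.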
The operating system tends to treat your whole process as one unit and shut down everything or nothing. But a thread can still crash on its own due to an unhandled OOM, assertion failures, or any number of other issues.
> But a thread can still crash on its own due to an unhandled OOM, assertion failures, or any number of other issues
That's not really true on POSIX. Unless you're doing nutty things with clone(), or you actually have explicit code that calls pthread_exit() or gettid()/pthread_kill(), the whole process is always going to die at the same time.
POSIX signal dispositions are process-wide; the only way, e.g., SIGSEGV kills a single thread is if you write an explicit handler that does that by hand. Unhandled exceptions usually raise SIGABRT, which works the same way.
** Just to expand a bit: there is a subtlety in that, while dispositions are process-wide, one individual thread does indeed take the signal. If the signal is handled, only that thread sees -EINTR from a blocking syscall; but if the signal is not handled, the default disposition affects all threads in the process simultaneously no matter which thread is actually signalled.
You can sort of get that behavior on Linux by calling clone() with CLONE_VM set but CLONE_THREAD and CLONE_SIGHAND cleared, which creates otherwise distinct processes that share an address space.
You can do all sorts of weird things, like create threads that don't share file descriptors, or threads that chdir() independently... except that CLONE_THREAD without CLONE_SIGHAND, and CLONE_SIGHAND without CLONE_VM, are disallowed (clone() fails with EINVAL).
I think this is conflating two different things. A Rust Mutex gets poisoned if the thread holding it panics, but that's not the same thing as evaporating into thin air. Destructors run while a panic unwinds (indeed, this is how the Mutex poisons itself), and you usually have the option of catching panics if you want. In the panic=abort configuration, where you can't catch a panic, it takes down the whole process rather than just one thread, which is another way of making the same point: you can't usually kill a thread independently of the whole process it's in, because lots of things (like locks) assume you'll never do that.
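To illustrate the point about destructors, here is a small sketch, assuming the default panic=unwind build, in which a guard's `Drop` runs while a panic unwinds and `std::panic::catch_unwind` stops the unwinding. `Cleanup` and `run_and_catch` are hypothetical names; the guard stands in for something like a `MutexGuard`. Under panic=abort the process would terminate at the `panic!` instead.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical guard standing in for something like a MutexGuard:
/// its destructor records that cleanup happened.
struct Cleanup<'a>(&'a AtomicBool);

impl Drop for Cleanup<'_> {
    fn drop(&mut self) {
        // Runs on normal scope exit and also while a panic unwinds through this frame.
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Runs some work inside catch_unwind and returns (did_panic, cleanup_ran).
/// Assumes panic=unwind; with panic=abort the process dies at the panic! instead.
fn run_and_catch(fail: bool) -> (bool, bool) {
    let cleaned_up = AtomicBool::new(false);
    let outcome = panic::catch_unwind(|| {
        let _guard = Cleanup(&cleaned_up);
        if fail {
            panic!("simulated failure");
        }
    });
    (outcome.is_err(), cleaned_up.load(Ordering::SeqCst))
}

fn main() {
    // The panic is caught here instead of killing the thread or the process.
    println!("{:?}", run_and_catch(true)); // (true, true)
    println!("{:?}", run_and_catch(false)); // (false, true)
}
```

In both cases the destructor runs, which is exactly the hook `MutexGuard` uses to poison its mutex.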
This is a solvable problem, though; the literature is overflowing with lock-free implementations of common data structures. The real question is how much performance you have to sacrifice for the guarantee...
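As a minimal illustration of the lock-free idea, here is a compare-and-swap retry loop on a counter. The shared value only changes at a successful `compare_exchange_weak`, so there is no lock to poison and no half-finished update left behind by a thread or process that dies mid-operation. `cas_add` and `parallel_count` are hypothetical names, and a plain counter would normally just use `fetch_add`; realistic lock-free stacks and queues also need a memory-reclamation scheme (hazard pointers, epochs), which adds considerable complexity, and CAS loops pay for contention with retries.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Adds `delta` with a CAS retry loop and returns the new value (hypothetical helper).
/// The counter changes only at the single successful compare_exchange_weak, so a
/// thread that stops anywhere in this loop leaves it either untouched or fully updated.
fn cas_add(counter: &AtomicU64, delta: u64) -> u64 {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        let next = current.wrapping_add(delta);
        match counter.compare_exchange_weak(current, next, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(_) => return next,
            // Another thread won the race (or a spurious failure occurred); retry
            // starting from the value we actually observed.
            Err(observed) => current = observed,
        }
    }
}

/// Hammers one counter from several threads; no increment should be lost.
fn parallel_count(threads: u64, per_thread: u64) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    cas_add(&c, 1);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::SeqCst)
}

fn main() {
    // 4 threads x 10_000 increments, no lock involved.
    println!("{}", parallel_count(4, 10_000)); // 40000
}
```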
Aren't the alternatives you mentioned, Iceberg and DuckDB, both storage solutions, while Spark is a way to express distributed compute? I'm a bit out of touch with this space; is there a newer way to express distributed compute?
DuckDB is primarily a query engine. It does have a storage format, but one of its strengths is querying data where it already resides (e.g., a Parquet file sitting in S3).
There are some examples[0] of enabling DuckDB to manage distributed workloads, but these are pretty experimental.
I think what many people are finding out is they don’t really need distributed processing. DuckDB on a single node can get you really far, and it’s much simpler.
DuckDB is not only a storage solution. It can directly query a variety of file formats at rest, without having to re-store anything. That's one of its selling points: you can query across archival/log data stored in S3 (or wherever) without needing to "ingest" anything or double-pay to duplicate the data you've already stored.
I’m just getting into DuckDB lately and finding this feature so exciting. It’s a totally new paradigm. Such a great tool for scientists, and probably many other people. I wish I had taken it seriously sooner.
Flink is designed streaming-first, while Spark is built batch-first, and you're likely best off choosing accordingly, though any streaming application probably needs batch processing to some degree. It comes down to latency vs. throughput.
GIN does not support range searches (needed for <, <=, >, >=), prefix or wildcard matching, etc. It also doesn't support index-only scans, last I checked. You cannot efficiently ORDER BY a nested value through a GIN index.
If you relax your constraint to "retain logs for the past N days", you can accumulate the logs from T=0 to T=(today - N) into tables and still benefit from having snapshots from that cutoff onwards.
Resale values are lower in the US mostly because they factor in the $7,500 federal tax credit and state tax credits; there is plenty of demand for used Teslas, for example.
It's similar in other countries, though sometimes not as direct.
Various regulations set targets, which give manufacturers incentives to hit sales numbers. This leads to discounts or great lease deals just before certain deadlines if the targets aren't being met at standard prices.
Intel also had a later chance when Apple tried to get off Qualcomm's percent-per-handset licensing model. This was long after the original iPhone. Apple also got sued for allegedly sharing proprietary Qualcomm trade secrets with Intel. And Intel still couldn’t pull it off despite all these tailwinds.