
Any recommendations from HN for a write-once (literally once), data storage format that's suitable for network storage?

The SQLite docs recommend against using it on network storage, though from what I can gather, it's less of an issue if you're truly only doing reads (meaning I could create it locally and then copy it to network storage). Apache Parquet seems promising, and it seems to support indexing now, which is an important requirement.
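For reference, the kind of Parquet usage I have in mind would look roughly like this (a sketch using pyarrow; column names, paths, and row-group size are just placeholders):

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Build locally, sorted by the lookup key so the row-group min/max stats are tight.
    table = pa.table({"id": [3, 1, 2], "payload": ["c", "a", "b"]}).sort_by("id")
    pq.write_table(table, "data.parquet", row_group_size=100_000)

    # ...copy data.parquet to the network share out of band...

    # Readers only touch the row groups whose statistics can match the filter.
    hits = pq.read_table("/mnt/share/data.parquet", filters=[("id", "=", 2)])
    print(hits.to_pydict())

Sorting on the lookup key is what makes the row-group statistics (the "indexing") actually useful for skipping data.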


SQLite works fine over read-only NFS, in my experience. Just make sure you only work on an immutable copy, and restart your application if the copy ever changes. If your application is short-lived and can only ever see an immutable copy at that path, it's a great solution.
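If the file really is write-once, you can also pass SQLite's immutable URI flag so it skips locking altogether. A minimal sketch in Python (path and table name are placeholders):

    import sqlite3

    # mode=ro + immutable=1 tells SQLite the file will never change,
    # so it skips locking and change detection entirely.
    conn = sqlite3.connect("file:/mnt/share/data.db?mode=ro&immutable=1", uri=True)
    print(conn.execute("SELECT count(*) FROM my_table").fetchone())
    conn.close()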



Multiple writers on network storage is the issue. Reading should be totally fine.


SQLite does work on NFS even in read-write scenarios. I discovered this by accident, but my statement still holds. WAL mode is explicitly not supported over network filesystems, but I guess you wouldn't expect it to be :)


My experience has been the opposite: lots of DB lock and corruption issues. The FAQ doesn't call out WAL specifically; it just says don't do it at all: https://www.sqlite.org/faq.html#q5


I've had multiple flaky issues with SQLite (e.g. non-HA Grafana) on Azure Files using NFS v4.1, leading to locked DBs. Perhaps some implementations work, but I'm not gonna rely on it or advise others to do so.


Yeah, trying to write from several hosts will certainly fail if you don't have advisory locks working, which is not a given, so you're right, of course.


These were single containers, never mind multiple hosts.


Parquet files are what I use.


SQLite over NFS works if you have one writer and many readers.


Cheaper / smaller? I would say not likely. There is already an enormous amount of market pressure to make SRAM and DRAM smaller.

Device physics-wise, you could probably make SRAM faster by dropping the transistor threshold voltage. It would also make it harder / slower to write. The bigger downside is that it would have higher leakage power, but if it's a small portion of all the SRAM, it might be worth the tradeoff.

For DRAM, there isn't as much "device" involved because the storage element isn't transistor-based. You could probably make some design tradeoff in the sense amplifier to reduce read times by trading off write times, but I doubt it would make a significant change.


But much of the latency in cache is getting the signal to and from the cell, not the actual store threshold. And I can't see much difference in that unless you can actually eliminate gates (and so make it smaller, making it physically closer on average).


Or use shellcheck: https://www.shellcheck.net/


Tl;dr: Use both, because they aren't mutually exclusive.

Shellcheck isn't a complete solution, and running in -e mode (set -e) is essential for smaller bash scripts. Shellcheck even knows whether a script is in -e mode or not.


The Thunderbird Pro Add-on Repo [1] doesn't really make it clear: if I want to self-host Appointment and Send, do I need to build the add-on myself and change the endpoints, or is there some kind of config?

1. https://github.com/thunderbird/tbpro-add-on


That's me. But now everything is done automagically by nzbget and I use nanazip on my Windows desktop.


After some research, it seems much easier to just back up the Proxmox config (and VM disk images, if they're needed) than to define or deploy Proxmox VMs with OpenTofu or Ansible.

https://pve.proxmox.com/wiki/Proxmox_Cluster_File_System_(pm...
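A rough sketch of what that config backup could look like, assuming a plain tarball of the pmxcfs mount is enough (destination path is a placeholder; VM disks would still go through vzdump or similar):

    import tarfile
    import time

    # /etc/pve is the pmxcfs FUSE mount holding the cluster and VM configs.
    dest = "/mnt/backup/pve-config-%s.tar.gz" % time.strftime("%Y%m%d")
    with tarfile.open(dest, "w:gz") as tar:
        tar.add("/etc/pve", arcname="pve")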


> According to Cadence’s admissions and court documents, employees of Cadence China did not disclose to and/or concealed from other Cadence personnel, including Cadence’s export compliance personnel, that exports to CSCC were in fact intended for delivery to NUDT and/or the PRC military. For example, in May 2015, a few months after NUDT was added to the Entity List, Cadence’s then-head of sales in China emailed colleagues, cautioning them to refer to their customer as CSCC in English and NUDT only in Chinese characters, writing that “the subject [was] too sensitive.”

Interesting. Sounds like Cadence China employees went rogue. Nonetheless, Cadence USA is on the hook.


Cadence China, a wholly controlled subsidiary of Cadence Design Systems, went rogue, and Cadence Design Systems is on the hook.


> EDA tools constantly need to "phone back home" to load updates and validate licenses

This isn't true in my experience. Cadence, Synopsys, and Siemens tools all use local license files or license servers (mainly FlexLM). Updates are just downloaded from their website.
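For what it's worth, pointing a tool at a license server is usually just an environment variable (or a small client-side license file), e.g. something like this, where the hostname and port are made up:

    LM_LICENSE_FILE=5280@flexlm.internal.example

The vendor-specific variants (CDS_LIC_FILE, SNPSLMD_LICENSE_FILE, etc.) work the same way, and none of it requires reaching the vendor's network.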


What do they need so much capital for?


My guess is scaling up their ability to manufacture hardware.


I think the downvoting on you is a little harsh. TFA does allude to it, but doesn't explicitly answer your question. I presume the implicit answer is here:

> With growing customer enthusiasm, we were increasingly getting questions about what it would look like to buy a large number of Oxide racks. Could we manufacture them? Could we support them? Could we make them easy to operate together?

i.e. they need the capital to be able to satisfy large orders on sane timeframes, which is very expensive when you're a hardware business.


Thanks. It was a genuine question but I guess I can see how it might be taken otherwise.


They are a hardware company. Hardware costs a lot of money to innovate and build on.


The thing that always gets me about backup consistency is that it's nearly impossible to ensure that application data is in a consistent state without bringing everything down. You can create a disk snapshot, but there's no guarantee that some service isn't mid-write or mid-procedure at the point of the snapshot, so if you were to restore from that snapshot you could encounter some kind of corruption.

Database dumps help with this, to a large extent, especially if the application itself is making the dumps at an appropriate time. But often you have to make the dump outside the application, meaning you could hit it in the middle of a sequence of queries.

Curious if anyone has useful tips for dealing with this.


I think, generally speaking, databases are resilient to this, so taking a snapshot of the disk at any point is sufficient as a backup. The only danger is if you're using some sort of on-controller disk cache with no battery backup: then you're basically lying to the database about what has been flushed, and there can be inconsistencies on "power failure" (i.e. a live snapshot).

But for the most part, and especially in the cloud, this shouldn't be an issue.


Beware that although databases are resilient to snapshotting, they're not resilient to inconsistent snapshots. All files have to be snapshotted at the exact same moment, which means either a filesystem-level or disk-level snapshot, or SIGSTOP-ing all database processes before doing your recursive copy or rsync.

Some databases can stop writing and hold all changes in memory (or append only to the WAL, which is recursive-copy-safe) while you tell them you're doing a backup.
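As a sketch of that freeze-copy-thaw idea (just an illustration, not anyone's actual tooling; PIDs and paths are made up, and a filesystem or LVM snapshot is usually nicer):

    import os
    import signal
    import subprocess

    db_pids = [1234, 1235]                      # all database server processes
    data_dir = "/var/lib/postgresql/data/"
    dest = "backup-host:/backups/pgdata/"

    for pid in db_pids:
        os.kill(pid, signal.SIGSTOP)            # freeze writes
    try:
        subprocess.run(["rsync", "-a", "--delete", data_dir, dest], check=True)
    finally:
        for pid in db_pids:
            os.kill(pid, signal.SIGCONT)        # thaw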


It's not clear whether there are other places, outside your database, where application state is stored that you'd need to capture. Do you mean things like caches? (I'd hope not.)

pg_dump / mysqldump both solve the problem of snapshotting your live database safely, but can introduce some bloat / overhead you may have to deal with somehow. All pretty well documented and understood though.

For larger PostgreSQL databases I've sometimes adopted the other common pattern of a read-only replica dedicated to backups: you pause replication, run the dump against that backup instance (where you're less concerned about how long it takes, and what cruft it leaves behind that'll need subsequent vacuuming), and then bring replication back.
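A sketch of that pause-dump-resume flow, assuming psycopg2 and a streaming replica (hostnames, credentials, and paths are placeholders):

    import subprocess
    import psycopg2

    replica = psycopg2.connect("host=replica.internal dbname=app user=backup")
    replica.autocommit = True
    cur = replica.cursor()

    cur.execute("SELECT pg_wal_replay_pause()")       # stop applying WAL on the replica
    try:
        subprocess.run(
            ["pg_dump", "-h", "replica.internal", "-U", "backup",
             "-Fc", "-f", "/backups/app.dump", "app"],
            check=True,
        )
    finally:
        cur.execute("SELECT pg_wal_replay_resume()")  # resume replication
    replica.close()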

