Hacker News
Garage: An open-source distributed object storage service (deuxfleurs.fr)
346 points by alex_hirner on Dec 4, 2022 | 75 comments


Just finished installing it on my OpenIndiana NAS to replace Minio.

Biggest difference so far is that Minio is just files on disk, Garage chunks all files and has a metadata db.

Minio's listing operations were horribly slow; I still have to see whether Garage resolves that.


> Biggest difference so far is that Minio is just files on disk

Minio _was_ just files on disk. They haven't supported that mode since 2022-10-29 (see the big yellow warning box at [1]).

[1] https://min.io/docs/minio/linux/operations/install-deploy-ma...


Yeah, I have actually frozen Minio at this older version in my stack, as "just files on disk" was the primary feature that drew me to it. I don't want my data locked into some custom format. I'd be willing to bet that ZFS will still be supported in 20 years, but I would not make the same bet about Minio.

For the same reason, it looks like Garage is not an option for my use case.


I use this "just files on disk" feature too, I also use ZFS.

I have a bunch of crawlers uploading data to AWS S3, but S3 is too expensive, I replaced S3 with MinIO.

MinIO stores plain files on disk, which makes it a lot easier to access my data. I can read files without calling MinIO APIs, and it's super fast.

By the way, which old version are you using? I'm using RELEASE.2022-04-26T01-20-24Z


Well... right now I am using minio version RELEASE.2021-08-31T05-46-54Z. However I really ought to upgrade that to the last supported version.


Ah interesting. I found it appealing to always have a way to get at the data natively as a worst case for restores. The whole use case is for Vertical Backup (from the maker of Duplicacy) to back up VMs.


> Biggest difference so far is that Minio is just files on disk, Garage chunks all files and has a metadata db.

I'd kind of expect most blob storage solutions to use abstractions other than just the file system, or at least consider doing so.

I recently built a system to handle millions of documents as a proof of concept, and when I was testing it with 10 million files, the server ran out of inodes before I moved the blobs over to some attached storage that had XFS: https://blog.kronis.dev/tutorials/3-4-pidgeot-a-system-for-m...

With abstracted storage (say, files bundled up into X-MB containers, or chunked into such when too large, with something else keeping track of what is where), that wouldn't be such an issue, though you might end up with other issues along the way.
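The "containers plus an index of what is where" idea above can be sketched in a few lines. This is a toy in-memory model, not how Garage or MinIO actually lay out data; the names are all hypothetical:

```python
import io

class BlobContainer:
    """Toy model of 'many blobs packed into one container file':
    one inode instead of N, plus an index of what is where."""
    def __init__(self):
        self.buf = io.BytesIO()   # stands in for one big container file
        self.index = {}           # key -> (offset, length)

    def put(self, key, data):
        offset = self.buf.seek(0, io.SEEK_END)  # append at the end
        self.buf.write(data)
        self.index[key] = (offset, len(data))

    def get(self, key):
        offset, length = self.index[key]
        self.buf.seek(offset)
        return self.buf.read(length)

store = BlobContainer()
store.put("a.txt", b"hello")
store.put("b.txt", b"world")
print(store.get("b.txt"))  # → b'world'
```

The "other issues" show up immediately: deletes leave holes that need compaction, and the index itself now needs durability.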

It's curious that we don't advocate for storing blobs in relational databases anymore, even though I can also understand the reasoning (or at least why having a separate DB for your system data and your blob data would be a good idea, for backups/test data/deciding where to host what and so on).


> I'd kind of expect most blob storage solutions to use abstractions other than just the file system, or at least consider doing so.

Honestly, I'd expect the exact opposite. Filesystems are really good at storing files. Why not leverage all that work?

> I recently built a system to handle millions of documents as a proof of concept and when I was testing it with 10 million files, the server ran out of inodes, before I went over to storing the blobs in some attached storage that had XFS

That's a misconfiguration issue though, not a reason to not store blobs as files on disk. Ext4 can handle 2^32 files. ZFS can handle 2^128(?).
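For reference, inode exhaustion is visible with `df -i`, and on ext4 the inode count is fixed at mkfs time. The `-i` ratio below is illustrative, and the device path is a placeholder:

```shell
# Show inode usage per filesystem. IUse% at 100% means "out of inodes"
# even when `df -h` still shows free space.
df -i /

# On ext4 the inode count is fixed when the filesystem is created; for
# many small files you can allocate one inode per 4 KiB of capacity.
# The device path is a placeholder -- don't run this on a disk you care
# about:
#   mkfs.ext4 -i 4096 /dev/sdX
```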

> With abstracted storage (say, files bunches up into X MB large containers or chunked into such when too large, with something else to keep track of what is where) that wouldn't be such an issue, though you might end up with other issues along the way.

A few issues that come to mind for me:

* This requires tuning to actually reduce the number of inodes used for certain datasets. E.g., if I'm storing large media files, that chunking would _increase_ the number of files on disk, not reduce it. At which point, if inode limits are the issue, we're just making it worse.

* It adds additional complexity. Now you need to account for these chunks, and, if you care about the data, check it periodically.

* You need specific tooling to work with it. Files on a filesystem are.. files on a filesystem. Easy to backup, easy to view. Arbitrary chunking and such requires tooling to perform operations on it. Tooling that may break, or have the wrong versions, or.. etc.

> It's curious that we don't advocate for storing blobs in relational databases anymore, even though I can also understand the reasoning

In my experience, the popular RDBMS out there just aren't good at it. With the way locking semantics and their transaction queueing works, storing and retrieving lots of blobs just isn't performant. You can get away with it for a long time though, and it can be pretty nice when you can.


> Filesystems are really good at storing files. Why not leverage all that work?

As an asterisk, the S3 API is key-value pairs, not files; that distinction comes up a lot when interacting with Amazon S3, and I would expect the same with an S3 API clone. For example, ListObjects[1] has a "delimiter" that (AFAIK) defaults to "/", making it appear to be a filesystem, but "." or "!" would be a perfectly fine delimiter and thus would have no obvious filesystem mapping.

1: https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObje...


Why is it useful?


That's a complicated question, but it allows highlighting what I was bringing up: the Key can contain any Unicode character[1], so while it has become conventional to use "/", imagine if you wanted to store the output of exploded jar files in S3, but be able to "list the directory" of a jar's contents: `PutObject("/some-path/my.jar!/META-INF/MANIFEST.MF", "Manifest-Version: 1.0")`

Now you can `ListObjects(Prefix="/some-path/my.jar", Delimiter="!")` to get the "interior files" back.

I'm sure there are others; that's just one I could think of off the top of my head. Mapping a URL and its interior resources would be another: `PutObject("https://example.com\t/script[1]", "console.log('hello, world')")`

Further fun fact that even I didn't know until searching for other examples: "delimiter" is a string and thus can be `Delimiter=unknown` or such: https://github.com/aws/aws-sdk-go/issues/2130

1: see the ListObject page under "encoding-type"
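The Prefix/Delimiter grouping described above can be modeled in a few lines. This is a toy reimplementation of the documented semantics, not a real S3 client:

```python
def list_objects(keys, prefix="", delimiter=""):
    """Toy model of the ListObjects grouping rule: keys are flat
    strings, and 'directories' fall out of Prefix + Delimiter."""
    contents, common = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # everything up to the first delimiter rolls up into one
            # CommonPrefix entry
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return contents, sorted(common)

keys = [
    "/some-path/my.jar!/META-INF/MANIFEST.MF",
    "/some-path/my.jar!/com/Example.class",
    "/some-path/other.txt",
]
# The whole jar collapses into a single "common prefix":
print(list_objects(keys, prefix="/some-path/", delimiter="!"))
# → (['/some-path/other.txt'], ['/some-path/my.jar!'])
```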


Minio supports virtual ZIP directories for such use cases. In your example, as long as this was enabled and your jar file was properly detected, you could submit a GET for "/some-path/my.jar/META-INF/MANIFEST.MF" and get the contents of that file just fine.


One will observe I said list, not get, although in this case it's likely a non-issue because Minio supports the S3 API https://github.com/minio/minio-go/blob/v7.0.45/api-list.go#L... and thus should support the 2nd example I provided, too


Imagine you have a billion files in a “directory”. Being able to find files that start with “xyz” in constant time is a very, very useful property.
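One way a store gets that property is by keeping keys in a sorted index, so a prefix listing is two binary searches bounding a range rather than a scan of a billion entries. A minimal sketch of the idea (illustrative only; real stores use B-trees or LSM trees, and the cost is O(log n) plus the results rather than strictly constant):

```python
import bisect

def keys_with_prefix(sorted_keys, prefix):
    # Two binary searches bound the matching range, so lookup cost
    # doesn't depend on how many non-matching keys surround it.
    lo = bisect.bisect_left(sorted_keys, prefix)
    # Smallest string greater than every key starting with `prefix`
    # (assumes keys don't contain U+FFFF -- fine for a sketch).
    hi = bisect.bisect_right(sorted_keys, prefix + "\uffff")
    return sorted_keys[lo:hi]

keys = sorted(["abc", "xya", "xyz1", "xyz2", "xz"])
print(keys_with_prefix(keys, "xyz"))  # → ['xyz1', 'xyz2']
```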


> Honestly, I'd expect the exact opposite. Filesystems are really good at storing files. Why not leverage all that work?

File systems are optimized for a hierarchical organization on a single machine. However, this kind of organization inhibits storing data in a distributed system because of the links between entries. S3 and similar object stores are a distributed, flat “filesystem”. There’s no relationship between files and there’s no grouping (aside from a virtual one you can simulate but doesn’t really exist). That’s why S3 doesn’t suffer weird directory traversal attacks that bring your file system to a crawl because such potentially expensive operations don’t exist.


> Honestly, I'd expect the exact opposite. Filesystems are really good at storing files. Why not leverage all that work?

There are lots of different file systems out there, and you won't always get a say in what your cloud vendor has on offer. However, if you can launch a container on the system that layers an abstraction on top of the file system, takes its best parts, and makes up for any shortcomings it might have in a mostly standardized way, then you can benefit from it.

That's not always the right way to go about things: it seems to work nicely for relational databases and how they store data, whereas in regards to storing larger bits of binary data, there are advantages and shortcomings to either approach. At the end of the day, it's probably about tradeoffs and what workload you're working with, what you want to achieve and so on.

> That's a misconfiguration issue though, not a reason to not store blobs as files on disk. Ext4 can handle 2^32 files. ZFS can handle 2^128(?).

Modern file systems are pretty good and can support lots of files, but getting a VPS from provider X doesn't mean that they will. Or maybe you have to use a system that your clients/employer gave you - a system that, with such an abstraction, would be capable of doing what you want, but currently doesn't. I agree that it's a misconfiguration in a sense, but not always one that you can rectify yourself.

> * This requires tuning to actually reduce the number of inodes of used for certain datasets. E.g., if I'm storing large media files, that chunking would _increase_ the number of files on disk, not reduce it. At which point, if inode limits are the issue, we're just making it worse.

This is an excellent point, thank you for making it! However, it's not necessarily a dealbreaker: on one hand, you can probably gauge what sorts of data you're working with (e.g. PDF files that are around 100 KB in size, or video files that are around 1 GB each) and tune accordingly, or perhaps let such a system rebalance data into chunks dynamically, as needed.

> * It adds additional complexity. Now you need to account for these chunks, and, if you care about the data, check it periodically.

As long as things keep working, many people won't care (which is not actually the best stance to take, of course) - how many care about what happens inside of their database when they do SQL queries against it, or what happens under the hood of their compatible S3 store of choice? I'll say that I personally like keeping things as simple as possible in most cases, however the popularity of something like Kubernetes shows that it's not always what we go for as an industry.

I could say the same about using PostgreSQL for certain workloads, for which SQLite might also be sufficient, or opting for a huge enterprise framework for a boring CRUD when something that has a codebase one tenth the size would suffice. But hey, as long as people don't constantly get burned by these choices and can solve the problems they need to, to make more money, good for them. Sometimes an abstraction or a piece of functionality that's provided reasonably outweighs the drawbacks and thus makes it a viable choice.

> * You need specific tooling to work with it. Files on a filesystem are.. files on a filesystem. Easy to backup, easy to view. Arbitrary chunking and such requires tooling to perform operations on it. Tooling that may break, or have the wrong versions, or.. etc.

This is actually the only point where I'll disagree.

You're always one directory traversal attack against your system away from having a really bad time. That's not to say that it will always happen (or that accessing unintended data cannot happen on other storage solutions, e.g. even the adjacent example of relational databases will make anyone recall SQL injection, or S3 will have stories of insecure buckets with data leaking confidential information), but being told that you can just use the file system will have many people using files as an abstraction in the programming language of their choice, without always considering the risks of sub-optimal engineering, like directory traversal attacks or file permissions.

Contrast this to a scenario where you're given a (presumably) black box that exposes an API to you - what's inside the box is code written by other people who are more clever than you (the "you" in this example being an average engineer), and it nicely handles many of the concerns you might not have even thought of.

And if there are ever serious issues or good reasons for peeling back that complexity, look up the source code of that black box on GitHub and start diving in. Of course, in the case of MinIO and many other storage solutions, that's already what you get, and it's good enough.

That's actually why I or others might use something S3 compatible, or something that gives you signed URLs for downloading files - so you don't have to think about or mess up how the signing works. That's also why I and many others would be okay with having a system that eases the implications of needing to think about file systems, by at least partially abstracting them away.

Edit: removed unnecessary snarky bits about the admittedly leaky abstractions you often get.

Honestly, that's why I like databases letting you pick whatever storage engines are suitable for your workloads, similarly to how object storage solutions might approach the issue - just give the user the freedom to choose how they want to store their blobs at the lower level, giving sane defaults otherwise. Those defaults might as well be just files on a filesystem. In regards to object storage, that's before we get into thinking about file names (especially across different OSes), potential conflicts and file versioning, as well as maximum file size supported by any number of file systems that you might need to support.


To put it pretty bluntly: you were off the rails at "getting a VPS from provider X doesn't mean that they will". You're talking in terms of not having a custom kernel, and that's just the wrong layer of abstraction if we're talking about "cloud"; this whole discussion is really about VM and colo levels of abstraction anyway ("Cloud" advice would be "Just use your vendor's S3 blobstore").

Base Ubuntu has xfs support. If your VPS provider won't run plain old Ubuntu with some cloudinit stuff, get a new VPS provider.


> It's curious that we don't advocate for storing blobs in relational databases anymore

That's exactly what I did recently at a new job: migrated blobs from the DB to S3. It significantly reduced load on the servers (and will reduce it more; right now the implementation is primitive - just proxying S3 - and using URLs will allow other services to deal with S3 directly). It solved a backup nightmare (those people couldn't do backups because their server ran out of space every month). I'll admit the backup issue is more like admin incompetence, but I work with what I get. Having the database shrink from 200GB to 80MB now allows backing it up and restoring it in seconds rather than hours.

I didn't find any issues with the S3 approach. Even transactions are solved, at the cost of a tiny possibility of leaving junk in S3, which is a non-issue: just upload all data to S3 before the commit and delete it if the commit fails (and if the commit fails and the delete fails, so be it).
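That upload-before-commit ordering can be sketched like this. `FakeStore` and `save_document` are stand-ins, not a real SDK; the point is the ordering, under which the worst failure mode is an orphaned blob, never a DB row pointing at a missing blob:

```python
class FakeStore(dict):
    """Stand-in for an S3 client -- not a real SDK."""
    def put(self, key, data): self[key] = data
    def delete(self, key): self.pop(key, None)

def save_document(db_rows, storage, key, blob, commit_ok=True):
    storage.put(key, blob)        # 1. upload the blob first
    try:
        if not commit_ok:
            raise RuntimeError("commit failed")
        db_rows.append(key)       # 2. "commit" the DB reference
    except Exception:
        storage.delete(key)       # 3. best-effort cleanup; if this
        raise                     #    also fails, the junk is harmless

rows, store = [], FakeStore()
save_document(rows, store, "doc/1", b"pdf bytes")
print(rows, "doc/1" in store)  # → ['doc/1'] True
```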


I thought the "stat" command of Minio was supposed to resolve the "listing is horribly slow" issue?


Maybe, but that didn't help third party tools that I could see.


A very good alternative is SeaweedFS (https://github.com/chrislusf/seaweedfs/), based on the Facebook Haystack paper (efficient small files) and more.


Did you pull that URL out of some blog post or something? The accurate URL is https://github.com/seaweedfs/seaweedfs


He (the developer) had the repo on his own profile, then moved it to the seaweedfs organization. My browser auto-completed the old URL.


It looks similar to Minio, which is also an AGPL single binary that implements the S3 API. However, Minio is written in Go and Garage in Rust. I'd love to see a detailed comparison.


Garage design goals and non-goals: https://garagehq.deuxfleurs.fr/documentation/design/goals/

Seaweed design goals / features: https://github.com/seaweedfs/seaweedfs

MinIO: https://min.io/docs/minio/linux/operations/concepts.html

Note that Garage's list of non-goals includes features that are priorities for Seaweed or MinIO, for example erasure coding.


Does it have API parity with S3? On top of normal get/put, S3 offers retention policies and the ability to apply, say, a given header to a file or a whole bucket.


Reading that only makes Minio and Garage seem more similar. They are both single-binary, clustered, S3-compatible file storage solutions. They are both meant for self-hosting on consumer-grade hard drives, provide redundancy to deal with drive or node failures, and don't aim to implement the entire AWS S3 API surface.


We're running Ceph with Rook, plus Minio, in Kubernetes. I'm evaluating Garage as an alternative for some purposes, such as serving static files for Python or JavaScript application instances, to avoid block volumes in deployments. Garage has a simpler high-availability distribution story than Minio's erasure-coding striping, but it works well for applications that don't require sub-second writes. It can do high availability with only three nodes versus four for Minio, which suits smaller setups. It can also serve static websites straight from a bucket, which works well for sites made with static site generators.

Ceph has an S3 API gateway, but it's a more comprehensive solution, better suited to larger setups than Minio or Garage.


It seems that Garage is designed to be fast when inter-node latency is high. A geodistributed "cluster," for example, would profit from using Garage instead of Minio.


Do we know the largest data sizes that these object stores have handled? People seem to have been moving away from HDFS, yet companies could host exabytes of data on HDFS and serve TBs of scans per second with a single team of fewer than 15 people. I was wondering how production-ready the other OSS alternatives are at that scale.


I remember working at a company that got started before cloud took off. They used mogile fs which I recently found at https://mogilefs.github.io/mogilefs-docs/HighLevelOverview.h... but I never hear about anyone else using it. It wasn't as stable as S3 but it was okay, I guess. Does anyone else here remember that distributed open source file system?


Yeah, I used it a long time ago, along with Gearman and Memcached.

Brad Fitzpatrick was (still is?) living in the future.


I used to work for an Alexa top 2k VOD website that used MogileFS, I bet they still use it.


Really clean docs page, nice design!


So is this supposed to be a simpler, less "omg you need a full time staff of 10 to manage" version of Ceph?

https://docs.ceph.com/en/quincy/


Doubtful, as Garage's features are a small subset of Ceph's. Ceph supports block storage, object storage (via RadosGW), and file storage (via cephfs). Ceph is highly sophisticated and very complex. It's anything but simple.


Related:

Garage, our self-hosted distributed object storage solution - https://news.ycombinator.com/item?id=30256753 - Feb 2022 (130 comments)


Is there any OSS implementation that is not AGPL?



I don’t think development is very active at all now that Basho is dead, but Riak CS is Apache licensed.

https://github.com/basho/riak_cs


Ceph Object Gateway is LGPLv2.1


Why would that matter? Apps using its api don't need to be AGPL


First of all, it isn't clear this is the case.

But this entire comment thread on the AGPL misses the mark. It doesn't matter that the AGPL hasn't been tested in court, what fine-grained distinctions you apply to the license, or what the AGPL intends. No company in their right mind would risk using software licensed under the AGPL, because the result of being wrong would be catastrophic. The legal advice to be skeptical of the AGPL is absolutely right. There is no conceivable reason to ever use AGPL software when you could simply license it under a commercial license or use a non-AGPL alternative.

Generally when someone licenses something under the AGPL they totally understand this and that is their intention.


> The legal advice to be skeptical of the AGPL is absolutely right.

The legal risk involved by using AGPL software for a company is exactly zero.

AGPL is an open source license which, by the very definition of open source, means that you can freely use the software. Full stop.

The only arguable risk is when modifying the software and on top of that using it in conjunction with other in-house software. But if you are ready to use a proprietary license, you already refrained from modifying the software.

So just use it and end of story. AGPL is a perfectly fine, open source license.


Or, you could use AGPL software and license it under AGPL as well. Considering most money today is made in the hosting and servicing and not selling license, I don't see why you would bother caring about old fashioned ways.


You're right, Google should drop their policy and open source the whole thing, since it's their data and GCP services that are the crown jewels, right? https://opensource.google/documentation/reference/using/agpl...


They are talking about integrating it with the rest of their apps in a monorepo, though, not running a completely separate service that you only talk to via an API.

It's kinda easy to miss that detail when you focus on bitching about license instead of the point.


You're right, Google is a sane and healthy model of business to follow and surveillance capitalism, based explicitly on capturing data on users and selling it to third-parties for ads is something that benefits society as a whole. There are absolutely no models other than the one Google is running.


We would like to self host and our legal team forbids us from using AGPL stuff.


Sounds like a problem created entirely by your legal team, not software or licenses.


Working with legal is advisable instead of ignoring it.


The solution to "colleague got it wrong" is not "silently ignore them" and I never suggested that. In fact my suggestion is precisely that you "work with them" and have them review this specific case.

"Don't use AGPL" is a good baseline rule if you don't have a legal team but does not apply in this case as I'm sure they'd advise if they reviewed it.


You should consider a replacement of your legal team.


This doesn’t seem accurate. Isn’t AGPL viral across RPC boundaries requiring open sourcing not just the service but all supporting code for that service?


No it's not. From a practical standpoint, I'm not even sure how that could work. You would have to require all browsers to be open source AGPL in order to load a web page served by it. By way of analogy it seems the equivalent of requiring the mouse and keyboard firmware to be licensed the same as the operating system.

A real life example is Instructure, which makes Canvas (which is agpl) but has other proprietary services that interact heavily with it. It's never been a problem

1: https://github.com/instructure/canvas-lms


> require all browsers to be open source AGPL in order to load a web page served by it

Don't be silly: a web server is not distributing a web browser, and thus when you visit news.ycombinator.com, they don't have influence over whether you do that via netcat, curl, or Awesome AGPL Browser 1.0

If, however, they used https://git.deuxfleurs.fr/Deuxfleurs/tricot to serve the http request, then AIUI the AGPL entitles you, as a "13. Remote Network Interaction; Use with the GNU General Public License. (https://opensource.org/licenses/AGPL-3.0)", to ask for the source code of tricot and potentially any systems that it subsequently interacts with

I'm certain I'm going to regret posting this, given how hot-button the AGPL is in every one of these threads


Instructure doesn't need to comply with AGPL obligations because it owns the product. It isn't licensing it to itself under the AGPL.


As far as I know, this isn't true. Some AGPL users claim that it is a requirement of the AGPL, but I am also unaware of any litigation that substantiates that reading of the license.

Can you cite cases that have established this as fact?


I would suspect no one wants to invest the legal team or time to be the "trailblazer" court case to find out whether your theory or the common interpretation is correct. The "just ban AGPL" stance is by far the safer route since it's not like there are no sane replacements for AGPL stuff

IANAL, and thus far my life is worse for any interaction with the legal system


It totally is safer, and I tend to agree with it in a business context. But the license itself doesn't seem to indicate that any kind of "RPC" is thus encumbered.

When MongoDB was under the AGPL, your application wasn't under the AGPL if you connected to it and asked it to do queries.


For some places, the in-house legal department makes it very difficult to use anything AGPL even if you assure them you're only using the api.



Are you sure this offers the same S3-compatible API? It sure does look like it rolled its own API[1], which I guess is fine so long as you're entirely in the Triton ecosystem, but it makes reusing existing software harder than necessary without that compatibility layer. And that's not even getting into this absolute mess: https://github.com/TritonDataCenter/manta#repositories - it reminds me of the "Microservices" video come to life.

1: https://github.com/TritonDataCenter/manta/blob/master/docs/u...


Kind of off topic, but I'm wondering if it's possible to get comparable pricing when self-hosting a solution similar to this in the cloud vs, say, AWS S3.


Yes, especially vs S3, which is way overpriced. I have already set up a multi-region Minio cluster at a cost of $0.0039/GB.

The only "cloud" solution that could get closer to that is Storj. [0]

[0]: https://www.storj.io/


Isn't storj using a cryptocurrency?


You pay for the service in cash if you want. The token is to pay out the nodes storing it.


Thanks! Where are you hosting the machines? Are they on a vps somewhere, or do you own the hardware?


Hetzner dedicated servers in their different data centers.


Comparison to Minio?


For this use-case, I like JuiceFS better.

* https://juicefs.com/en/

* https://github.com/juicedata/juicefs

I am not affiliated with them, just a regular user.


JuiceFS is not distributed storage; you didn't read the post at all. It's just a filesystem layer on top of distributed storage (S3).


JuiceFS looks like something that could be used with Garage.


How does this compare to Wasabi for hot backups?


Wasabi is a service; this is a filesystem.



