
It sucks that S3 somehow became the de facto object storage interface; the API is terrible IMO. Too many headers, too many unknowns with support. WebDAV isn't any better, but I feel like we missed an opportunity here for a standardized interface.




?

It's like GET <namespace>/object, PUT <namespace>/object. To me it's the most obvious mapping of HTTP to immutable object key-value storage you could imagine.
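
To make that concrete, here's a minimal sketch of that data-plane mapping using nothing but Go's net/http, assuming auth is already handled (a presigned URL, or a local S3-compatible server with anonymous access); the endpoint, bucket and key names are made up:

    package main

    import (
        "bytes"
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        // Hypothetical endpoint: <namespace>/object. Auth (SigV4 or a
        // presigned URL) is assumed to be handled elsewhere.
        url := "http://localhost:9000/my-bucket/hello.txt"

        // PUT <namespace>/object
        req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader([]byte("hello")))
        if err != nil {
            panic(err)
        }
        putResp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        putResp.Body.Close()

        // GET <namespace>/object
        resp, err := http.Get(url)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        body, _ := io.ReadAll(resp.Body)
        fmt.Println(string(body))
    }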

It is bad that the control plane responses can be malformed XML (e.g. keys are not escaped correctly if you put XML control characters in object paths), but that can be forgiven as an oversight.

It's not perfect, but I don't think it's a strange API at all.


That may be what S3 is like, but what the S3 API is is this: https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/s3

My browser prints that out to 413 pages with a naive print preview. You can squeeze it to 350 pretty reasonably with a bit of scaling before the type starts getting awfully small on the page.

Yes, there's a simple API with simple capabilities struggling to get out there, but pointing that out is merely the first step on the thousand-mile journey of determining what, exactly, that is. "Everybody uses 10% of Microsoft Word, the problem is, they all use a different 10%", basically. If you sat down with even 5 relevant stakeholders and tried to define that "simple API" you'd be shocked what you discover and how badly Hyrum's Law will bite you even at that scale.


> That may be what S3 is like, but what the S3 API is is this: https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/s3

> My browser prints that out to 413 pages with a naive print preview. You can squeeze it to 350 pretty reasonably with a bit of scaling before the type starts getting awfully small on the page.

idk why you link to Go SDK docs when you can link to the actual API reference documentation: https://docs.aws.amazon.com/AmazonS3/latest/API/API_Operatio... and its PDF version: https://docs.aws.amazon.com/pdfs/AmazonS3/latest/API/s3-api.... (just 3874 pages)


It's better to link to a leading S3-compatible provider's API docs page. You get a better measure of the essential complexity:

https://developers.cloudflare.com/r2/api/s3/api/

It's not that much; most of the weirder S3 APIs are optional, orthogonal APIs, which is good design.


Because it had the best "on one HTML page" representation I found in the couple of languages I looked at.

That page crashes Safari for me on iOS.

It gets complex with ACLs for permissions, lifecycle controls, header controls, and a bunch of other features that are needed at S3's scale but not at a smaller provider's scale.

And many S3-compatible alternatives (probably most, except the big ones like Ceph) don't implement all of the features.

For example, for lifecycles Backblaze has a completely different JSON syntax.


Last I checked the user guide to the API was 3500 pages.

3500 pages to describe upload and download, basically. That is pretty strange in my book.


Even download and upload get tricky if you consider stuff like serving buckets as static sites, or stuff like signed upload URLs.

Now with the trivial part off the table, let's consider storage classes, security and ACLs, lifecycle management, events, etc.
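
On the signed-upload-URL point above: a rough sketch of generating one with the v2 Go SDK, assuming default credentials; the bucket and key are placeholders:

    package main

    import (
        "context"
        "fmt"
        "time"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/s3"
    )

    func main() {
        ctx := context.Background()
        cfg, err := config.LoadDefaultConfig(ctx)
        if err != nil {
            panic(err)
        }

        presigner := s3.NewPresignClient(s3.NewFromConfig(cfg))
        // Anyone holding the returned URL can PUT this object until it expires.
        req, err := presigner.PresignPutObject(ctx, &s3.PutObjectInput{
            Bucket: aws.String("example-bucket"),       // placeholder
            Key:    aws.String("uploads/report.pdf"),   // placeholder
        }, s3.WithPresignExpires(15*time.Minute))
        if err != nil {
            panic(err)
        }
        fmt.Println(req.Method, req.URL)
    }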


Everything uses poorly documented, sometimes inconsistent HTTP headers that read like afterthoughts/tech debt. An implementation of the S3 "standard" has to have Amazon branding all over it (x-amz), which is gross.

I suspect they learned a lot over the years and the API shows the scars. In their defense, they did go first.

I mean… it’s straight up an Amazon product, not like it’s an IETF standard or something.

!!!

I’ve seen a lot of bad takes and this is one of them.

Listing keys is weird (is it V1 or V2?).

The authentication relies on an obtuse and idiosyncratic signature algorithm.

And S3 in practice responds with malformed XML, as you point out.

Protocol-wise, I have trouble liking it over WebDAV. And that's depressing.
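
For reference, the signature algorithm mentioned above is SigV4. A sketch of just the signing-key derivation, per AWS's public docs (the canonical-request and string-to-sign construction, which is where most of the pain lives, is omitted):

    package main

    import (
        "crypto/hmac"
        "crypto/sha256"
        "encoding/hex"
        "fmt"
    )

    func hmacSHA256(key, data []byte) []byte {
        h := hmac.New(sha256.New, key)
        h.Write(data)
        return h.Sum(nil)
    }

    // signingKey derives the per-day, per-region, per-service key used by SigV4.
    func signingKey(secret, date, region, service string) []byte {
        kDate := hmacSHA256([]byte("AWS4"+secret), []byte(date)) // date as YYYYMMDD
        kRegion := hmacSHA256(kDate, []byte(region))
        kService := hmacSHA256(kRegion, []byte(service))
        return hmacSHA256(kService, []byte("aws4_request"))
    }

    func main() {
        // Placeholder secret and a made-up string-to-sign.
        key := signingKey("wJalr...EXAMPLEKEY", "20240101", "us-east-1", "s3")
        // The final signature is HMAC-SHA256(signingKey, stringToSign), hex-encoded.
        sig := hex.EncodeToString(hmacSHA256(key, []byte("...string to sign...")))
        fmt.Println(sig)
    }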


HTTP isn't really a great backplane for object storage.

I thought the OpenStack Swift API was pretty clean, but I'm biased.

To be fair, we still have an opportunity to create a standardized interface for object storage. Funnily enough, when Microsoft made their own they did not go for S3-compatible APIs, but Microsoft usually builds APIs their customers can use.

It was better. When it first came out, it was a pretty simple API, at least simpler than alternatives (IIRC, I could just be thinking with nostalgia).

I think it's only gotten as complicated as it has because new features were added organically over time. I'm sure there are good use cases for everything, but it does raise the question: is a better API possible for object storage? What's the minimal API required? GET/POST/DELETE?
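
A strawman of that minimal API as a Go interface; the names and shapes here are invented for illustration, not taken from any existing spec:

    // Package objectstore is a toy sketch of a "minimal" object store API.
    package objectstore

    import (
        "context"
        "io"
    )

    type ObjectInfo struct {
        Key  string
        Size int64
        ETag string
    }

    type Store interface {
        Get(ctx context.Context, key string) (io.ReadCloser, ObjectInfo, error)
        Put(ctx context.Context, key string, body io.Reader, size int64) (ObjectInfo, error)
        Delete(ctx context.Context, key string) error
        // List is where "minimal" starts to break down: pagination,
        // prefixes and delimiters creep in almost immediately.
        List(ctx context.Context, prefix, startAfter string, limit int) ([]ObjectInfo, error)
    }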


I suspect there is no decent "minimal" API. Once you get to tens of millions of objects in a given prefix, you need server side filtering logic. And to make it worse, you need multiple ways to do that.

For example, did you know that date filtering in S3 is based on string prefix matching against an ISO 8601/RFC 3339-style datetime representation? Want all objects created between 2024-01-01 and 2024-06-30? You'll need to construct six YYYY-MM prefixes (one per month) and add them as filter array elements.
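
A sketch of what that looks like with the v2 Go SDK, assuming keys that embed the date (e.g. logs/2024-01-15/req.json); with ListObjectsV2 specifically you issue one paginated request per monthly prefix, since it only accepts a single Prefix at a time. Bucket and key layout are placeholders:

    package main

    import (
        "context"
        "fmt"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/s3"
    )

    func main() {
        ctx := context.Background()
        cfg, err := config.LoadDefaultConfig(ctx)
        if err != nil {
            panic(err)
        }
        client := s3.NewFromConfig(cfg)

        // Six YYYY-MM prefixes covering 2024-01-01 .. 2024-06-30.
        for m := 1; m <= 6; m++ {
            prefix := fmt.Sprintf("logs/2024-%02d", m)
            p := s3.NewListObjectsV2Paginator(client, &s3.ListObjectsV2Input{
                Bucket: aws.String("example-bucket"), // placeholder
                Prefix: aws.String(prefix),
            })
            for p.HasMorePages() {
                page, err := p.NextPage(ctx)
                if err != nil {
                    panic(err)
                }
                for _, obj := range page.Contents {
                    fmt.Println(*obj.Key)
                }
            }
        }
    }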

As a result, the service abbreviation is also incorrect these days. Originally the first S stood for "Simple". With all the additions they've had to bolt on, S2 would be a far more appropriate name.


Like everything, it started off simple, but with every feature added over 19 years, Simple Storage it is not.

S3 has 3 independent permissions mechanisms.


S3 isn't JSON

it's storing a [utf8-string => bytes] mapping with some very minimal metadata. But the value can be whatever you want: JSON, CBOR, XML, actual document formats, etc.

And its default encoding for listing, management operations and the like is XML...
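
To illustrate the key => bytes plus minimal metadata model with the v2 Go SDK (bucket, key and metadata values are placeholders):

    package main

    import (
        "context"
        "strings"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/s3"
    )

    func main() {
        ctx := context.Background()
        cfg, err := config.LoadDefaultConfig(ctx)
        if err != nil {
            panic(err)
        }
        client := s3.NewFromConfig(cfg)

        // The value is just bytes; S3 doesn't care whether it's JSON, CBOR,
        // XML or anything else. User metadata travels as x-amz-meta-* headers.
        _, err = client.PutObject(ctx, &s3.PutObjectInput{
            Bucket:      aws.String("example-bucket"),        // placeholder
            Key:         aws.String("docs/invoice-42.json"),  // placeholder
            Body:        strings.NewReader(`{"total": 12.5}`),
            ContentType: aws.String("application/json"),
            Metadata:    map[string]string{"origin": "billing-service"},
        })
        if err != nil {
            panic(err)
        }
    }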

> but I feel like we missed an opportunity here for a standardized interface.

except S3 _is_ the de facto standard interface which most object storage systems speak

but I agree it's kinda a pain

and it's commonly implemented only partially (both feature-wise and partially wrong). E.g. S3 stores utf8 strings, not utf8 file paths (like e.g. minio does); getting that wrong seems fine but can lead to a lot of problems (not just being incompatible with some applications but also having unexpected perf characteristics for others), making it only partially S3 compatible. Similarly, some implementations missing features like bulk delete or support for `If-Match`/`If-None-Match` headers can also make them S3 incompatible for some use cases.
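
As a concrete example of the conditional-request support mentioned above, a sketch of a conditional GET with the v2 Go SDK; the ETag is a placeholder, and how a "not modified" result surfaces can differ between implementations:

    package main

    import (
        "context"
        "fmt"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/s3"
    )

    func main() {
        ctx := context.Background()
        cfg, err := config.LoadDefaultConfig(ctx)
        if err != nil {
            panic(err)
        }
        client := s3.NewFromConfig(cfg)

        out, err := client.GetObject(ctx, &s3.GetObjectInput{
            Bucket:      aws.String("example-bucket"),       // placeholder
            Key:         aws.String("docs/invoice-42.json"), // placeholder
            IfNoneMatch: aws.String(`"0123456789abcdef0123456789abcdef"`), // cached ETag
        })
        if err != nil {
            // Against AWS S3, an unchanged object surfaces as an error
            // (HTTP 304) rather than a response body.
            fmt.Println("not modified (or a real error):", err)
            return
        }
        defer out.Body.Close()
        fmt.Println("object changed, new ETag:", aws.ToString(out.ETag))
    }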

So yeah, a new external standard that makes it clear what you should expect to be supported in order to be standard-compatible would be nice.



