It sucks that S3 somehow became the defacto object storage interface, the API is terrible IMO. Too many headers, too many unknowns with support. WebDAV isn't any better, but I feel like we missed an opportunity here for a standardized interface.
Its like GET <namespace>/object, PUT <namespace>/object. To me its the most obvious mapping of HTTP to immutable object key value storage you could imagine.
It is bad that the control plane responses can be malformed XML (e.g keys are not escaped right if you put XML control characters in object paths) but that can be forgiven as an oversight.
Its not perfect but I don't think its a strange API at all.
My browser prints that out to 413 pages with a naive print preview. You can squeeze it to 350 pretty reasonably with a bit of scaling before it starts getting to awfully small type on the page.
Yes, there's a simple API with simple capabilities struggling to get out there, but pointing that out is merely the first step on the thousand-mile journey of determining what, exactly, that is. "Everybody uses 10% of Microsoft Word, the problem is, they all use a different 10%", basically. If you sat down with even 5 relevant stakeholders and tried to define that "simple API" you'd be shocked what you discover and how badly Hyrum's Law will bite you even at that scale.
> My browser prints that out to 413 pages with a naive print preview. You can squeeze it to 350 pretty reasonably with a bit of scaling before it starts getting to awfully small type on the page.
It gets complex with ACLs for permissions, lifecycle controls, header controls and a bunch of other features that are needed on S3 scale but not at smaller provider scale.
And many S3-compatible alternatives (probably most but the big ones like Ceph) don't implement all of the features.
For example for lifecycles backblaze have completely different JSON syntax
Everything uses poorly documented, sometimes inconsistent HTTP headers that read like afterthoughts/tech debt. An S3 standard implementation has to have amazon branding all over it (x-amz) which is gross.
To be fair. We still have an opportunity to create a standardized interface for object storage. Funnily enough when Microsoft made their own they did not go for S3 compatible APIs, but Microsoft usually builds APIs their customers can use.
It was better. When it first came out, it was a pretty simple API, at least simpler than alternatives (IIRC, I could just be thinking with nostalgia).
I think it's only gotten as complicated as it has as new features have been organically added. I'm sure there are good use cases for everything, but it does beg the question -- is a better API possible for object storage? What's the minimal API required? GET/POST/DELETE?
I suspect there is no decent "minimal" API. Once you get to tens of millions of objects in a given prefix, you need server side filtering logic. And to make it worse, you need multiple ways to do that.
For example, did you know that date filtering in S3 is based on string prefix matching against an ISO8601/RFC3339 style string representation? Want all objects created between 2024-01-01 and 2024-06-30? You'll need to construct six YYYY-MM prefixes (one per month) for datetime and add them as filter array elements.
As a result the service abbreviation is also incorrect these days. Originally the first S stood for "Simple". With all the additions they've had to bolt on, S2 would be far more appropriate a name.
it's storing a [utf8-string => bytes] mapping with some very minimal metadata. But that can be whatever you want. JSON, CBOR, XML, actual document formats etc.
And it's default encoding for listing, management operations and similar is XML....
> but I feel like we missed an opportunity here for a standardized interface.
except S3 _is_ the de-facto standard interface which most object storage system speaks
but I agree it's kinda a pain
and commonly done partial (both feature wise and partial wrong). E.g. S3 store utf8 strings, not utf8 file paths (like e.g. minio does), that being wrong seems fine but can lead to a lot of problems (not just being incompatible for some applications but also having unexpected perf. characteristics for others) making it only partial S3 compatible. Similar some implementations random features like bulk delete or support `If-Match`/`If-Non-Match` headers can also make them S3 incompatible for some use cases.
So yeah, a new external standard which makes it clear what you should expect to be supported to be standard compatible would be nice.