It's because none of those companies is considered "Information Technology" according to the official GICS criteria [0] used to classify companies in the index. For instance, Meta and Google are in the "Communications Services" sector; Amazon is in "Consumer Discretionary." There are 69 total companies in the S&P 500 in the "Information Technology" GICS sector [1], and all are excluded from SPXT.
Not really in a similar vein, because there's actually a good reason for this to be very close to an integer whereas there is no such reason for e^pi - pi.
Assuming those 20PB are hot/warm storage, S3 costs roughly $0.015/GB/month (50:50 average of S3 standard/infrequent access). That comes out to roughly $3.6M/year, before taking into account egress/retrieval costs. Does it really cost that much to maintain your own 20PB storage cluster?
If those 20PB are deep archive, the S3 Glacier bill comes out to around $235k/year, which also seems ludicrous: it does not cost six figures a year to maintain your own tape archive. That's the equivalent of a full-time sysadmin (~$150k/year) plus $100k in hardware amortization/overhead.
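The arithmetic above is easy to sanity-check. A minimal sketch (the per-GB rates are the assumptions quoted above, not authoritative AWS pricing, which varies by region and tier):

```python
# Back-of-envelope check of the figures above. Rates are assumptions from
# the comment (blended S3 standard/IA, and Glacier Deep Archive), not
# official pricing. Uses decimal GB, as cloud billing does.
GB_PER_PB = 1_000_000

def annual_cost(petabytes, usd_per_gb_month):
    """Yearly storage bill for a flat per-GB-month rate."""
    return petabytes * GB_PER_PB * usd_per_gb_month * 12

hot_warm = annual_cost(20, 0.015)        # ~ $3.6M/year
deep_archive = annual_cost(20, 0.00099)  # ~ $237.6k/year

print(f"hot/warm:     ${hot_warm:,.0f}/yr")
print(f"deep archive: ${deep_archive:,.0f}/yr")
```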
The real advantage of S3 here is flexibility and ease-of-use. It's trivial to migrate objects between storage classes, and trivial to get efficient access to any S3 object anywhere in the world. Avoiding the headache of rolling this functionality yourself could well be worth $3.6M/year, but if this flexibility is not necessary, I doubt S3 is cheaper in any sense of the word.
Like most of AWS, it depends on whether you need what it provides. A 20PB tape system will have an initial cost in the low to mid 6 figures for the hardware and initial set of tapes. Do the copies need to be replicated geographically? What about completely offline copies? Reminds me of conversations with archivists where there's preservation and then there's real preservation.
How the heck does anyone have that much data? I once built myself a compressed plaintext library from one of those data-hoarder sources that had almost every fiction book in existence, and that was like 4TB compressed (but would've been much less if I bothered hunting for duplicates and dropped non-English).
I suspect the only way you could have 20PB is if you have metrics you don't aggregate or keep ancient logs (why do you need to know your auth service had a transient timeout a year ago?)
Lots of things can get to that much data, especially in aggregate. Off the top of my head: video/image hosting, scientific applications (genomics, high energy physics, the latter of which can generate PBs of data in a single experiment), finance (granular historic market/order data), etc.
In addition to what others have mentioned, before the "AI bubble", there was a "data science bubble" where every little signal about your users/everything had to be saved so that it could be analyzed later.
> Does it really cost that much to maintain your own 20PB storage cluster?
If you think S3 = storage cluster, then the answer is no.
If you think of S3 as what it actually is: scalable, high throughput, low latency, reliable, durable, low operational overhead, high uptime, encrypted, distributed, replicated storage with multiple tier1 uplinks to the internet, then the answer is yes.
>scalable, high throughput, low latency, reliable, durable, low operational overhead, high uptime, encrypted, distributed, replicated storage with multiple tier1 uplinks to the internet
If you need to tick all of those boxes for every single byte of 20PB worth of data, you are working on something very cool and unique. That's awesome.
That said, most entities who have 20PB of data only need to tick a couple of those boxes, usually encryption/reliability. Most of their 20PB will get accessed at most once a year, from a predictable location (i.e. on-prem), with a good portion never accessed at all. Or if it is regularly accessed (with concomitant low latency/high throughput requirements), it almost certainly doesn't need to be globally distributed with tier1 access. For these entities, a storage cluster and/or tape system is good enough. The problem is that they naïvely default to using S3, mistakenly thinking it will be cheaper than what they could build themselves for the capabilities they actually need.
Very cool. tl;dw: an inverted triple pendulum has 2^3 = 8 equilibria, since each arm of the pendulum can be either up or down (naturally, all but one of the equilibria are unstable), and this controller is able to make all 8*7 = 56 transitions between them.
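The counting above is just ordered pairs of binary configurations, which you can enumerate directly (a toy sketch, purely to check the numbers):

```python
# Each of the 3 links is either up or down: 2**3 = 8 equilibria.
# An ordered transition between any two distinct equilibria: 8 * 7 = 56.
from itertools import permutations, product

equilibria = list(product(["up", "down"], repeat=3))
transitions = list(permutations(equilibria, 2))

print(len(equilibria), len(transitions))  # 8 56
```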
Control theory is one of those things that shouldn't possibly work, yet here we are.
>thousands of middle-class "bullshit jobs" are disappearing, but rather than being replaced by a wave of productive jobs [...] we're just seeing unemployment, underemployment.
Jobs are neither fungible nor mutually exclusive; there is no reason to assume that someone working in a bullshit job would thrive in a non-bullshit job that contributes to society in more productive ways, nor does the existence of bullshit jobs prevent people from working non-bullshit jobs. I hate to say it, but perhaps many people are employed in bullshit jobs because they are not capable of anything more challenging.
"Bullshit job" has a specific meaning that's less about being in a pointless field-of-work (like adtech or many parts of fintech) and more about occupying a pointless role, regardless of the field. David Graeber (the originator of the term) gave the following examples [0]:
— Flunkies, who serve to make their superiors feel important, e.g., receptionists, administrative assistants, door attendants, store greeters
— Goons, who act to harm or deceive others on behalf of their employer, or to prevent other goons from doing so, e.g., lobbyists, corporate lawyers, telemarketers, public relations specialists
— Duct tapers, who temporarily fix problems that could be fixed permanently, e.g., programmers repairing shoddy code, airline desk staff who calm passengers with lost luggage
— Box tickers, who create the appearance that something useful is being done when it is not, e.g., survey administrators, in-house magazine journalists, corporate compliance officers, academic administration
— Taskmasters, who create extra work for those who do not need it, e.g., middle management, leadership professionals
My point stands. It's an incentive game. People work in BS fields because they pay more. People work BS jobs because, again, they pay well. There is no incentive to work somewhere else.
We live in a very complex system, beyond any one person's comprehension. Some people think devolved decision-making, allocating resources to things like better advertising, is the most efficient way of allocating resources. The invisible hand. How much is bullshit and how much is just beyond your awareness? If you were king and allocated all the work yourself, would it be better? For whom? I'm doubtful about bullshit jobs.
Twitter is a concrete demonstration of this. There were so many prognostications [0] that Twitter would imminently implode after downsizing from ~8k to ~1.5k employees following Musk's takeover, and when these claims never came to pass, it was a wake-up call to the rest of the industry [1].
Pretending the current iteration of Twitter is anything remotely comparable to what existed before is pretty ridiculous. Other than Grok, which is by far the worst of all the flavors of models out there (and, technically, made by one of Musk's other companies), there haven't been any new features in years; even the terrible UI/UX has barely changed at all. The particular "slant" the site takes, plus the swarms of boosted bots, rendered it practically unusable for me in a very short period of time. I honestly don't understand people who still use it or what they could possibly get out of it. If there were any honest reporting on DAU/MAU, I'd bet a large part of my paycheck it's way down from pre-Musk levels.
Those are due to deliberate policy changes from Musk to boost engagement of his right-wing sycophants, not due to any technical failings. From a strictly technological point-of-view, Twitter works just as well as it did pre-takeover, and certainly did not catastrophically collapse as many predicted.
I would categorize what happened to the site, its being rendered unusable for anyone even halfway serious, as catastrophic. But perhaps my bar is a little higher for the "smartest man in the world" than "I can still get a 200 response from the site" (which, in terms of outages, is also down).
I agree that the site is barely usable, but that's entirely due to a shift in Twitter's userbase caused by top-down policy changes (e.g. boosting right-wing spam), not any engineering shortcomings.
If Musk had never purchased Twitter and Jack Dorsey performed the same reduction in engineering staff, I doubt the site would be materially different from how it was pre-Musk.
That's because software is immortal. It will continue to run even if you do nothing. What happens, though, is that stuff around it moves.
Of course twitter still works. Even with 0 engineers, it would still work. That's never been the goal of a software company. I can compile Mario 64 right here, right now, decades later. Should Nintendo just go home? Call it quits? Of course not.
It’s rhetoric like this that has created the market we have today.
The perceived success is not the same as actual success. Remember it is a private company and you don’t actually have any idea how bad the balance sheets were after the layoffs. Before the financial engineering that Musk did by using his other companies to invest in Twitter to preserve its valuation, the company was down almost 80%. [1] If public companies go down that route, they’ll very quickly find out what the actual impact of that model is.
Twitter's failures are solely due to Musk's changes in corporate governance (e.g. boosting fringe right-wing content causing its existing userbase and advertisers to flee the platform), not due to any engineering problems caused by reducing headcount. Strictly from an engineering standpoint, Twitter works just as well as it did before Musk took it over.
As I wrote in another post, if Musk had never purchased Twitter and Jack Dorsey performed the same reduction in engineering staff, I doubt the site would be materially different from how it was pre-Musk.
> Twitter works just as well as it did before Musk took over
Just because it works on your phone doesn’t mean there are no engineering problems behind the scene. You’re just not aware of the problems that exist because it’s a private company and you’re not privy to the information.
> Twitter works just as well as it did before Musk took it over.
Not true. The main reason I stopped clicking Twitter links in the first place was the abysmal chance of the tweet loading and not just displaying a generic "Failed to load. Try again?" after the takeover. I mean it occasionally happened before as well, but it became the default behavior.
It lasted long enough that by the time (over a year) they'd finally fixed it, the platform had deteriorated to a right-wing cesspool anyway.
There are going to be very few [*] repeated strings in this 100M line file, since each >seq.X will be unique and there are roughly a trillion random 4-letter (ACGT) strings of length 20. So this is really assessing the performance of how well a hashtable can deal with reallocating after being overloaded.
I did not have enough RAM to run a 100M line benchmark, but the following simple `awk` command performed ~15x faster on a 10M line benchmark (using the same hyperfine setup) versus the naïve `sort | uniq -c`, which isn't bad for something that comes standard with every *nix system.
awk '{ x[$0]++ } END { for(y in x) { print y, x[y] }}' <file> | sort -k2,2nr
The awk script is probably still the fastest way to do this, and it's faster if you use gawk or something similar rather than the default awk. Most people also don't need ordering, so you can get away with just the awk part and skip the sort.
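For comparison, the same count-and-sort idea in Python with `collections.Counter` (a sketch, not benchmarked against the awk versions; the inline sample lines are placeholders):

```python
# Equivalent in spirit to awk '{ x[$0]++ } END { ... }' | sort -k2,2nr:
# count occurrences of each line, then emit them most-frequent first.
from collections import Counter

def count_lines(lines):
    """Count duplicate lines, stripping trailing newlines."""
    return Counter(line.rstrip("\n") for line in lines)

# Demo on placeholder data; for a real file, pass an open file handle,
# e.g. count_lines(open("reads.txt")).
sample = ["ACGT\n", "TTTT\n", "ACGT\n"]
for seq, n in count_lines(sample).most_common():
    print(seq, n)  # ACGT 2, then TTTT 1
```

`Counter.most_common()` handles the descending sort that the trailing `sort -k2,2nr` does in the shell pipeline.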
Using "literally" figuratively or, more precisely, as a hyperbolic intensifier [0], is a tradition employed by notable English writers who lived and died long before you were born.
It’s a bitter monkey's-paw irony: when you ask FOSS advocates how developers would be paid in a fully FOSS world where piracy cannot exist because all software is free, the answer is often “service contracts.”
The monkey's paw curls. Now we live in a world where software is nothing but service contracts and more closed than ever.
It's the Westphalian system, which includes not only (Protestant) capitalism, but also scientific positivism, liberal humanism and everything else. Which we now call (post-/meta-)modern.
There's nothing we can do about all that, and for practical reasons we just accept the world as is and tend to forget/ignore the reasons it is so. But for retaining cognitive sovereignty, I think it's good to remember that.
[0] https://www.proshares.com/our-etfs/strategic/spxt (S&P minus tech stocks)
[1] https://www.defianceetfs.com/xmag/ (S&P minus "Magnificent 7")