So this is just showing a bit of your ignorance of stats.
The general notion of compound risk is not specific to MSE loss. You can formulate it for any loss function, including L1 loss which you seem to prefer.
Stein's paradox and the James-Stein estimator are just a special case, for normal random variables and MSE loss, of the more general theory of compound estimation, which tries to find an estimator that can leverage all the data to reduce overall error.
This idea, compound estimation via James-Stein, is by now somewhat dated. It was followed by empirical Bayes estimation and, eventually, once we had the compute for it, modern Bayesian hierarchical modelling.
One thing you can recover from EB is the James-Stein estimator, as a special case; in fact, you can design much better families of estimators that are optimal with respect to Bayes risk in compound estimation settings.
This is broadly useful in pretty much any situation where you have a large-scale experiment in which many small samples are drawn and similar statistics are computed in parallel, or when the data has a natural hierarchical structure. For example, biostatistics, but also various internet data applications.
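To make the compound-risk point concrete, here's a minimal simulation sketch (all the numbers are arbitrary choices for illustration): many small experiments, each estimated by its own sample mean, versus the classic James-Stein shrinkage applied to all of them at once. The total error of the shrunk estimates is typically lower.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 100        # number of parallel "small experiments"
n = 5          # observations per experiment
sigma = 1.0    # known noise standard deviation

# True means, one per experiment (arbitrary choice for illustration).
theta = rng.normal(loc=0.0, scale=0.5, size=p)

# Each experiment's MLE is its own sample mean, with variance sigma^2 / n.
x = theta + rng.normal(scale=sigma / np.sqrt(n), size=p)
var = sigma**2 / n

# Classic James-Stein shrinkage toward zero:
#   theta_JS = (1 - (p - 2) * var / ||x||^2) * x
theta_js = (1.0 - (p - 2) * var / np.sum(x**2)) * x

print("total MSE, per-experiment MLE:", np.mean((x - theta) ** 2))
print("total MSE, James-Stein:       ", np.mean((theta_js - theta) ** 2))
```

An empirical Bayes version would estimate the shrinkage target and amount from the data itself (e.g. shrink toward the grand mean), which is exactly the "better families of estimators" point above.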
So yeah, I suggest being a bit more open to ideas you don't know anything about. @zeroonetwothree is not agreeing with you here; they're pointing out that you cooked up an irrelevant "example" and then claimed the technique doesn't make sense there. Of course it doesn't, but that's not because the idea of JS isn't broadly useful.
----
Another thing: the JS estimator can be viewed as an example of improving the overall bias-variance tradeoff through regularization, although the connection to regularization as most people in ML use the term is maybe less obvious. If you think regularization isn't broadly applicable and very important... I've got some news for you.
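To make the regularization analogy concrete, here's a small sketch, assuming a noisy regression problem with many features relative to the sample size (the sizes and the alpha value are arbitrary): ridge regression shrinks coefficients toward zero, trading a bit of bias for lower variance, in the same spirit as JS shrinkage.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Noisy problem where unregularized least squares tends to overfit.
X, y = make_regression(n_samples=60, n_features=40, noise=20.0, random_state=0)

for name, model in [("OLS", LinearRegression()), ("ridge", Ridge(alpha=10.0))]:
    # sklearn reports negative MSE; flip the sign for readability.
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"{name:5s} cross-validated MSE: {mse:.1f}")
# The regularized (shrunk) model typically shows lower out-of-sample error here.
```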
To me, Polars feels like almost exactly how I would want to redesign the Pandas interfaces for small-to-medium-sized data processing, given my previous experience with Pandas and PySpark. Throw out all the custom multi-index nonsense, throw out numpy and handle types properly, memory-map Arrow, focus on a method-chaining interface, do standard stuff like groupby and window functions in the standard way, and implement all the query optimizations under the hood that we know make things way faster.
To be fair, Polars has the benefit of hindsight, designing its interfaces and syntax from scratch. The poor choices in Pandas were made long ago, and its adoption and evolution into the most popular dataframe library for Python feels like it was mostly about timing the market rather than having the best software product.
1. Load and process/aggregate in Polars to get the smaller dataset that goes into your plot.
2. df.to_pandas()
3. Apply your favourite vis library that works with Pandas.
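Something like this minimal sketch, say for average sales per region (the file name, column names, and the plotting call are made up for illustration; I'm using Pandas' built-in matplotlib plotting as the "favourite vis library"):

```python
import polars as pl

# 1. Load and aggregate in Polars; only the small aggregated result matters.
agg = (
    pl.scan_csv("sales.csv")   # lazy scan, so the query optimizer can kick in
    .group_by("region")
    .agg(pl.col("sales").mean().alias("avg_sales"))
    .sort("region")
    .collect()
)

# 2. Convert the small result to Pandas.
agg_pd = agg.to_pandas()

# 3. Hand it to any Pandas-friendly plotting tool (matplotlib here).
ax = agg_pd.plot.bar(x="region", y="avg_sales")
ax.figure.savefig("avg_sales_by_region.png")
```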
There's no use case I can think of where building a data-viz interface more specific to Polars than this would be beneficial or necessary.
> There is Parquet. It is very efficient with its columnar storage and compression. But it is binary, so can't be viewed or edited with standard tools, which is a pain.
The ability to communicate math this way is honestly rare. It comes from a combination of deep understanding, long experience in communicating math, and a certain level of "culturing" that is specific to the academic experience.
Among the best, Feynman was singular in his ability to communicate math and physics.
In other words, don't be so hard on the teachers who were disappointing in comparison to the stellar examples you see from top mathematical communicators. What you're reading is quite rare and, while education quality could certainly improve, it's not fair to expect this of a 5th-grade teacher who covers five topics in a day. Even for the best, developing this type of material takes time and thought that a school teacher probably does not have.
Disagree. The purposes of a textbook and a lecture are very different. A good textbook can be a helpful resource for teaching and lecturing, but it is not sufficient to guarantee high-quality math education. Conversely, a good educator who deeply understands the material can deliver fantastic education without a good textbook. Claiming that profs writing bad textbooks is the cause of poor-quality in-class math instruction is absurd.
> Claiming that profs writing bad textbooks is the cause of poor-quality in-class math instruction is absurd.
It certainly doesn't help. The tendency to pile more and more into standards, and then to have haphazard treatment in the textbooks, with problems that don't make sense... isn't great.
Stick a new teacher in the classroom, and they're going to run their book's recommended pacing and content. And even a veteran is probably going to lean on the book a lot in a pinch.
And your course needs to fit together with two other teachers', who are all too likely to be running the absurd pacing and content in the courses before and after yours. The rushed pace leaves no choice but to devote a huge fraction of the time to procedural knowledge.
The net result doesn't serve anyone: the top students are left unchallenged and without the context and enrichment that could let them really grow. The bottom students are in painful struggle. And the middle are perpetually slightly confused, learning specific tools that they'll immediately forget when the unit completes.
Yeah, the only common theme I see in causal inference research is that every method and analysis eventually succumbs to a more thorough analysis that uncovers serious issues in the assumptions.
Take, for instance, the running example of Catholic schooling's effect on test scores used by the book Counterfactuals and Causal Inference. Subsequent chapters re-treat this example with increasingly sophisticated techniques and more complex assumptions about causal mechanisms, and each time they uncover a flaw in the analysis done with the techniques from previous chapters.
My lesson from this: causal inference outcomes are very dependent on assumptions and methodologies, of which the options are many. This is a great setting for publishing new research, but it's the opposite of what you want in an industry setting, where the bias is (and should be) towards methods that are relatively quick to test, validate, and put in production.
I see researchers at large tech companies pushing for causal methodologies, but I'm not convinced they're doing anything particularly useful, since I have yet to see convincing validation on production data showing their methods are better than simpler alternatives, which tend to be more robust.
> My lesson from this: causal inference outcomes are very dependent on assumptions and methodologies, of which the options are many.
This seems like a natural feature of any sensitive method, not sure why this is something to complain about. If you want your model to always give the answer you expected you don't actually have to bother collecting data in the first place, just write the analysis the way pundits do.
Because with real-world data, like production data in tech, there are so many factors to account for. Brittle methods are more susceptible to unexpected changes in the data, or to unexpected ways in which complex assumptions about the data fail.
In my experience, propensity scores + IPW really don't get you far in practice. Propensity scoring models rarely balance all the covariates well (more often, one or two are marginally better and some may be worse than before). On top of that, IPW either assumes you don't have any cases of extreme imbalance, or, if you do, you end up trimming weights to avoid adding additional variance, and in some cases you add it even with trimmed weights.
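For reference, this is roughly what I mean by IPW with trimmed weights, as a bare-bones sketch (column names are hypothetical, and the propensity model is just a plain sklearn logistic regression):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def ipw_ate(df: pd.DataFrame, treatment: str, outcome: str,
            covariates: list[str], trim_quantile: float = 0.99) -> float:
    """Inverse-probability-weighted ATE estimate with weight trimming."""
    t = df[treatment].to_numpy()
    y = df[outcome].to_numpy()

    # Propensity model: P(treated | covariates).
    ps = (LogisticRegression(max_iter=1000)
          .fit(df[covariates], t)
          .predict_proba(df[covariates])[:, 1])

    # Inverse probability weights; extreme propensities blow these up.
    w = np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))

    # Trim (clip) the largest weights to limit variance, at the cost of some bias.
    w = np.minimum(w, np.quantile(w, trim_quantile))

    # Self-normalized (Hajek-style) weighted means for each arm.
    treated = np.average(y[t == 1], weights=w[t == 1])
    control = np.average(y[t == 0], weights=w[t == 0])
    return treated - control
```

Whether the covariates actually end up balanced after weighting is exactly the part this sketch doesn't check, and that's the failure mode I keep running into.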
It may be ideal to have all these items checked off. I think a productive way to look at this is "how many of these items does the roadmap check"?
If the answer is "very few", then that might be an early warning sign that you're working for a product or org that is headed for serious problems. As a technical IC who can't hope to solve those large-scale management problems, this can be a useful red flag. Don't wait for shit to hit the fan to leave.
I have seen a large scale failure of this kind and the items listed here line up very well with some of the root causes I observed.
- Is the roadmap flexible or iterative? The roadmap was hard, aggressive business targets.
- Are the roadmap initiatives scoped and prioritized based on evidence? The roadmap initiatives were derived by working backwards from business targets and then evidence was found after the fact.
- Does the roadmap identify major dependencies or risks? Many risks were identified much later because technical teams had no input into the initial planning.
- Does the roadmap feel aggressive but achievable? Aggressive but not physically possible.
- Does the roadmap take on appropriate risk? No, there were multiple possible independent points of failure.
Anyways, if you ever see this kind of product culture I suggest running for the hills unless you like having your time wasted. And if you are a technical manager I hope you push back like hell when presented with this situation.