
izyda · 2025-06-03T00:10:37 1748909437

New Hedge fund | Sr/Staff Frontend Engineer | On-site (NYC)

We are a new discretionary, tech industry focused hedge fund, founded by a portfolio manager with 15+ years of experience at a Tiger Cub. I lead the data/technology team here and previously spent ~6 years in a similar role at a well known Tiger Cub.

We are hiring for a frontend senior/staff engineer for our internal research management web application.

Our tech stack:
* Frontend: Typescript, Vue3
* Backend: Cloudflare, Supabase/Postgresql, Typescript

Obviously, we are not the only ones trying to figure out how to use AI in investing but in a world where there are lots of people trying to build picks & shovels, this is an opportunity to be one of the goldminers at a greenfield, new firm. This role makes sense for people who are interested in finance/investing (although you do not necessarily have to have experience in the industry).

We're full-time in office in NYC, so that's a requirement for FTE. We are potentially open to exceptional contractors located remotely.

We work very hard, hours are long, and people are very dedicated to the mission. It is not for everyone, but for those who are interested in competing in financial markets, love a startup atmosphere without near-term existential risk, and want to work with super smart colleagues (both technical and investors!), this could be a great fit.

Target salary 180-250K, depending on experience + Bonus/Carry (hedge fund equity equivalent)

Contact: [email protected] , I'm Alex.

work_hard93 · 2025-06-06T04:29:29 1749184169

Interested and emailed

izyda · 2025-05-18T05:00:00 1747544400

I work on alternative data in the hedge fund industry. We're not quants -- we don't try to predict the stock market... instead, we try to forecast how individual companies are performing using aggregated clickstream, point-of-sale, and payments data. It's a data cleaning, timeseries, and modeling problem with a lot of domain knowledge necessary.

LLMs can be helpful (e.g., entity resolution for data cleaning), but the core models you have to use to actually make the predictions look a lot more like "old" tabular data approaches.

itsmekali321 · 2025-05-18T18:28:45 1747592925

If I may ask, who are your clients? The investors or the companies themselves?

This seems like a fun (I mean enjoyable) domain.

Also, again, if I may ask, what is your field of study? Is it related to finance or statistics?

izyda · 2025-05-05T16:47:50 1746463670

New Hedge fund | Sr Frontend/Fullstack Tech Lead, AI Engineer | On-site (NYC)

We are a new discretionary, tech industry focused hedge fund, founded by a portfolio manager with 15+ years of experience at a Tiger Cub. I lead the data/technology team here and previously spent ~6 years in a similar role at a well known Tiger Cub.

We are hiring for a frontend (or fullstack) tech lead for our internal research management web application.

We also are hiring talented engineers interested in applying AI to the investment research process, focused on

Our tech stack:
* Frontend: Typescript, Vue3
* Backend: Cloudflare, Supabase/Postgresql, Typescript
* Data Stack: S3/R2, Snowflake, Iceberg, Dagster, dbt

Obviously, we are not the only ones trying to figure out how to use AI in investing but in a world where there are lots of people trying to build picks & shovels, this is an opportunity to be one of the goldminers at a greenfield, new firm. This role makes sense for people who are interested in finance/investing (although you do not necessarily have to have experience in the industry). You can read a little bit about the type of data work we do here: https://magis.substack.com/p/how-to-do-alt-data-research

Target salary 180-250K, depending on experience + Bonus/Carry (hedge fund equity equivalent)

Contact: [email protected] , I'm Alex.

izyda · 2024-06-12T01:09:03 1718154543

Any interest in getting this dataset on Snowflake too?

(I run a Snowflake-funded data provider on the Snowflake Marketplace)

dang · 2024-06-12T01:14:25 1718154865

The BigQuery dataset is entirely 3rd party, based on the public HN data (presumably via the HN Firebase API - https://github.com/HackerNews/API), so anyone is free to do this.

wietsevenema · 2024-06-12T18:17:28 1718216248

Yes, that’s how it works. Same data as in the API.

Raed667 · 2024-06-12T13:24:34 1718198674

A bit off topic: is there a way to display an HN page using the Firebase API/SDK without doing N+1 requests?

(sorry for noob question)

dang · 2024-06-12T17:49:59 1718214599

No, unfortunately. We're going to make a new API that will be much easier, but there's at least one other major project that needs to get done first.
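For readers unfamiliar with the N+1 pattern being discussed, here is a minimal sketch of it against the public HN Firebase API. The `topstories` and `item/<id>` endpoints come from the API repository linked above; the function names, the `fetch` parameter, and the thread pool size are illustrative assumptions, not part of any official SDK.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

BASE = "https://hacker-news.firebaseio.com/v0"


def fetch_json(path):
    """Fetch one JSON document from the public HN Firebase API."""
    with urlopen(f"{BASE}/{path}.json", timeout=10) as resp:
        return json.load(resp)


def front_page(limit=30, fetch=fetch_json, workers=10):
    """Return the first `limit` top stories as item dicts.

    Costs 1 request for the ID list plus `limit` requests for the items.
    The thread pool only overlaps those requests; it does not remove them.
    `fetch` is injectable (hypothetical design choice) so it can be stubbed.
    """
    ids = fetch("topstories")[:limit]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so items stay in ranking order.
        return list(pool.map(lambda item_id: fetch(f"item/{item_id}"), ids))
```

Calling `front_page(limit=30)` therefore issues 31 requests. Running them concurrently hides some of the latency, but the request count stays at N+1, which matches the answer above.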

Raed667 · 2024-06-12T19:22:08 1718220128

Thanks for the response!

izyda · on May 25, 2024

The two other main competitors, aimed at startups (and really venture investors) are Pitchbook and CB Insights.

There are several equivalent products like CapitalIQ (owned by S&P Global), Preqin, and offerings from Factset/Refinitiv that are aimed more at private equity investors (later stage) but also include some startup data.

Finally, there are specialized startup data providers like Harmonic.ai (in depth scraping of stealth startups), G2 (Yelp for enterprise software), or Clay.run (innovative UI) that all specialize in something specific but are not at the scale of the above.

How do they get the data?

The first place is SEC Form D Filings. These are required in the US after private funding rounds (lots of caveats, ifs, buts, etc., but let's keep it simple). This data alone can give you a decent database to start with. After that, it is web-scraping news articles, news wires, LinkedIn, etc. For very specialized areas (e.g., Dev Tools), specialized data sources (say Github Archives) might be useful.

Most importantly, many of these providers aim for give-to-get dynamics. Once they become popular enough, startups will actually seek out having a profile (create data) or fix incorrect data (contribute). This is a great dynamic, of course, because it essentially creates proprietary but free data collection.

Websites like TheOrg.com have done a nice job with org charts -- they take a guess at who you report to... and a lot of employees, annoyed at being "layered", will freely fix the data. If you get enough volume, you create a give-to-get flywheel.

I agree with you that what is valuable here is the proprietary data. But, behind that, is the _process_ for creating the proprietary data. You could get very good at web-scraping, parsing esoteric government filings, etc. And, maybe that space can get disrupted by someone better (say with LLMs). But ultimately, if you can get users to contribute data -- that's the "promised land" in DaaS.

I also think UI/interface is not value-less. Companies like Clay.run have done a great job making proprietary data accessible to more users. There is value there -- but the data owner collects a (fair) toll on that.

robinyapockets · on May 28, 2024

Thanks for this overview!!!

SEC filings + crowd sourcing content seems the way to go. Plus, who wouldn't want to celebrate their latest funding round :)

Curious, how much would you pay for a service where you get the same data as Crunchbase, but with a delightful UI, focused on Pre Seed to A, in a vertical like "Dev Tools"?

izyda · on May 29, 2024

I think there is a subset of VCs that would pay for this. Unfortunately, that very particular subset of VCs has the smallest budget to pay for things based on their fixed fees/fund sizes.

Firms like CapitalIQ or Pitchbook have their largest contracts with giant asset managers for whom a 6- or 7-figure deal would be a very small percentage of AUM (and thereby a small percentage of management fees).

For angels/seed stage VCs, you are likely looking at "pro-sumer" like prices. So, something like 100-1000/month at most.

izyda · on April 27, 2024

There is a new approach of using static site generators to make BI pages feel instantly responsive.

- Evidence.dev (https://evidence.dev/)
- Observable Framework (https://observablehq.com/framework/)

I have found that both the speed and the frontend control either of these tools gives you are pretty good (with Evidence looking better out-of-the-box, just in my personal opinion).

The main problem I always had with the embedded versions of Tableau, Looker, etc. is that they felt super canned (it was obvious it was a poorly/"lightly" white labeled solution -- when you see an embedded Tableau dashboard, you _know_ it is Tableau, etc) and they were slow.

More on this here: https://magis.substack.com/p/an-observation-on-dashboard-spe...

PS -- I would add that the "headless" version of the above tools that I have seen is https://cube.dev/

rogansage · on April 29, 2024

What do you think of a solution where it's headless (i.e., bring your own charts / components) but with a no-code builder that gives you the benefits of an off-the-shelf tool to manage and update easily? (That's what we're building at embeddable.com)

izyda · on April 29, 2024

It sounds like an interesting product but not a fit for us:

- We actually don't want no-code. We want configuration based (or something else that can fit into version control).
- We want very nice, very customizable charts, but don't want to bring our own (our team is Python/SQL and data based; no one writes Javascript)

izyda · on March 24, 2024

I loved this game and then this style of game (all the Tycoons, SimCity, AoE, etc.).

I did not know it, but it was my first experience with economics. I mistakenly thought that this is what policy planners _actually_ did in the real world. Imagine the disappointment when I found out that was not true.

Nonetheless, it ultimately inspired me to go into data science in market/competitive intelligence. First at hedge funds and now at my own startup.

I have never been able to shake the notion that building a real-time view of the real economy is the most interesting thing to work on.

izyda · on March 22, 2024

I do not have a horse in the race, but it is interesting to see open source comparisons to traditional timeseries strategies: https://github.com/Nixtla/nixtla/tree/main/experiments/amazo...

In general, the M-Competitions (https://forecasters.org/resources/time-series-data/), the Olympics of timeseries forecasting, have proven frustrating for ML methods... linear models do shockingly well, and the ML models that have won generally seem to be variants of older tree-based methods (e.g., LightGBM is a favorite).

Will be interesting to see whether the Transformer architecture ends up making real progress here.

wenc · on March 22, 2024

They are comparing a non-ensembled transformer model with an ensemble of simple linear models. It's not surprising that an ensemble of linear time series models will do well, since ensembles optimize for the bias-variance trade-off.

Transformer/ML models by themselves have a tendency to overfit past patterns. They pick up more signal in the patterns, but they also pick up spurious patterns. They're low bias but high variance.

It would be more interesting to compare an ensemble of transformer models with an ensemble of linear models to see which is more accurate.

(that said, it's pretty impressive that an ensemble of simple linear models can beat a large scale transformer model -- this tells me the domain being forecast has a high degree of variance, which transformer models by themselves don't do well on.)
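A small simulation can illustrate the variance-reduction half of this argument. It is a toy sketch under stated assumptions, not a reproduction of the linked experiment: each "model" is just the true value plus independent Gaussian noise, and the truth, noise level, and ensemble size are made-up numbers.

```python
import random
import statistics


def noisy_forecast(rng, truth=100.0, noise_sd=10.0):
    """One unbiased but noisy model: the truth plus independent Gaussian error."""
    return rng.gauss(truth, noise_sd)


def ensemble_forecast(rng, members=25):
    """Average of `members` independent noisy forecasts."""
    return statistics.fmean(noisy_forecast(rng) for _ in range(members))


def compare(trials=2000, members=25, seed=0):
    """Return (variance of a single model, variance of the ensemble) over many trials."""
    rng = random.Random(seed)
    single = [noisy_forecast(rng) for _ in range(trials)]
    ensemble = [ensemble_forecast(rng, members) for _ in range(trials)]
    return statistics.pvariance(single), statistics.pvariance(ensemble)


single_var, ensemble_var = compare()
print(f"single model variance: {single_var:.1f}")
print(f"ensemble variance:     {ensemble_var:.1f}")
```

With independent errors, the variance of the average shrinks roughly in proportion to the number of members. Correlated members would shrink it less, which is why independence between ensemble members matters.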

gradascent · on March 22, 2024

fyi I think you have bias and variance the wrong way around. Over-fitting indicates high variance

wenc · on March 22, 2024

Thank you for catching that. Corrected.

hackerlight · on March 22, 2024

> ensemble of transformer models

Isn't that just dropout?

mikkom · on March 22, 2024

No. Why do you think so?

hackerlight · on March 23, 2024

Geoffrey Hinton describes dropout that way. It's like you're training different nets each time dropout changes.

wenc · on March 23, 2024

Dropout is different from ensembles. It is a regularization method.

It might look like an ensemble because you’re selecting different subsets but ensembles combine different independent models rather than just subset models.

wenc · on March 23, 2024

That said, random forests are an internal ensemble, so I guess that could work.

In my mind an ensemble is like a committee. For it to be effective, each member should be independent (able to pick up different signals) and have a greater than random chance of being correct.

hackerlight · on March 24, 2024

I am aware it is not literally an ensemble model, but Geoffrey Hinton says it achieves the same thing conceptually and practically.
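The "dropout behaves like averaging many thinned networks" view in this sub-thread can be checked exactly in the simplest possible case. The sketch below assumes a single linear unit and a keep probability of 0.5; the function names and numbers are hypothetical. For deep nonlinear networks the equivalence is only an approximation, and the thinned models share weights, so they are not independent in the committee sense described above.

```python
from itertools import product


def thinned_output(weights, inputs, mask):
    """Output of a linear unit with some inputs dropped (mask entries are 0 or 1)."""
    return sum(m * w * x for m, w, x in zip(mask, weights, inputs))


def average_over_all_masks(weights, inputs):
    """Average the unit's output over every possible dropout mask (keep prob 0.5)."""
    masks = list(product((0, 1), repeat=len(weights)))
    return sum(thinned_output(weights, inputs, m) for m in masks) / len(masks)


def weight_scaled_output(weights, inputs, keep_prob=0.5):
    """Single pass with all inputs kept and weights scaled by the keep probability."""
    return keep_prob * sum(w * x for w, x in zip(weights, inputs))


weights = [0.5, -1.0, 2.0]
inputs = [1.0, 3.0, 0.25]
print(average_over_all_masks(weights, inputs))
print(weight_scaled_output(weights, inputs))
```

Both calls give the same value because each input is kept in exactly half of the masks, so the average over all thinned units equals one pass with halved weights.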

one_buggy_boi · on March 22, 2024

Are these models high risk because of their lack of interpretability? Specialized models like temporal fusion transformers attempt to solve this, but in practice I'm seeing folks torn apart when defending transformers against model risk committees within organizations that are mature enough to have them.

tomrod · on March 22, 2024

Interpretability is just one pillar to satisfy in AI governance. You have to build submodels to assist with interpreting black box main prediction models.

rdedev · on March 22, 2024

Is there a way to directly train transformer models to output embeddings that could help tree-based models downstream? For tabular data, tree-based models seem to be the best, but I feel like foundational models could help them in some way.

izyda · on March 7, 2024

It has been very impressive to see what LLMs can do for transforming data into useful structured datasets in Snowflake.

Snowflake Cortex has been critical in this process for open-source and Mistral models. We have also found using GPT4 and Claude3 within Snowflake to be meaningful. Setting up connections between Snowflake and these LLM providers is not too complex but is annoying, so we've done it once so you don't have to.

izyda · on Feb 22, 2024

The lack of revocability, marginal temporal value, and downstream governance I think makes the prospect of more such data deals happening slim -- or at least, slim without regret.

I wrote an essay on this here: https://magis.substack.com/p/llm-data-sales-a-market-for-lem...