
In what kinds of workloads or usage patterns do you see the biggest performance gains vs traditional FaaS + storage stacks?


In a nutshell, data and AI workloads require fast rebuilding and vertical scaling:

1) you should not need to redeploy a Lambda just because you're now processing January and February instead of only January. In the same vein, you should not need to redeploy one if you upgrade from pandas to Polars: rebuilding functions is 15x faster than Lambda and 7x faster than Snowpark (-> https://arxiv.org/pdf/2410.17465)

2) the only way (even in popular orchestrators like Airflow, not just FaaS) to pass data around in DAGs is through object storage, which is slow and costly: we use Arrow as the intermediate data format, in memory and over the wire, with a bunch of optimizations around caching and zero-copy sharing to make the development loop extra fast and the compute usage efficient! (A minimal sketch of the zero-copy idea is below.)
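
Not Bauplan's actual internals, but to make the zero-copy point concrete, here is a minimal pyarrow sketch of two DAG steps sharing an Arrow table through a memory-mapped IPC file instead of round-tripping through object storage (the file name and data are made up):

    import pyarrow as pa
    import pyarrow.ipc as ipc

    # Step 1 of the DAG produces an Arrow table and writes it as an IPC file.
    table = pa.table({"month": ["2025-01", "2025-02"], "rides": [120, 95]})
    with pa.OSFile("step1_output.arrow", "wb") as sink:
        with ipc.new_file(sink, table.schema) as writer:
            writer.write_table(table)

    # Step 2 memory-maps that file: the Arrow buffers are backed by the OS
    # page cache, so the table is read without copying the data into Python.
    with pa.memory_map("step1_output.arrow", "r") as source:
        shared = ipc.open_file(source).read_all()

    print(shared.num_rows)  # 2, with no Parquet round-trip through S3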

Our current customers run near real-time analytics pipelines (Kafka -> S3 / Iceberg -> Bauplan run -> Bauplan query), DS / AI workloads, and WAP (write-audit-publish) patterns for data ingestion.
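
For readers unfamiliar with WAP, here is a toy, filesystem-only sketch of the pattern (not how Bauplan implements it, just the shape of the idea): land the new batch in a staging location, audit it, and only publish it if the check passes.

    import os
    import shutil
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Toy write-audit-publish: paths, data, and the audit rule are illustrative.
    os.makedirs("stage", exist_ok=True)
    os.makedirs("prod", exist_ok=True)
    staging, published = "stage/batch.parquet", "prod/batch.parquet"

    # Write: the new batch lands in staging only.
    batch = pa.table({"id": [1, 2, 3], "amount": [10.0, 12.5, None]})
    pq.write_table(batch, staging)

    # Audit: reject the batch if any amount is null.
    if pq.read_table(staging).column("amount").null_count == 0:
        # Publish: promote the audited batch so downstream consumers see it.
        shutil.move(staging, published)
    else:
        print("audit failed, batch stays unpublished")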


What does the tech stack look like?


Is the course focused on LLMs used to generate text, or does it also cover other kinds of testing, like search, images, etc.?


Cool! What's next on the roadmap?


The main thing we need to add is metadata filtering, as that's required for a lot of use cases. We're also thinking about adding hybrid search support and multi-factor ranking.
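
Not from their codebase, but to make "hybrid search" concrete: a toy sketch of blending a dense (vector) score with a lexical (BM25-style) score, with made-up documents, scores, and weight.

    # Toy hybrid ranking: combine a dense (vector) score with a lexical score.
    def hybrid_score(dense, lexical, alpha=0.6):
        # alpha=1.0 -> pure vector search, alpha=0.0 -> pure keyword search
        return alpha * dense + (1 - alpha) * lexical

    candidates = {
        "doc_1": {"dense": 0.82, "lexical": 0.40},
        "doc_2": {"dense": 0.65, "lexical": 0.90},
    }
    ranked = sorted(
        candidates.items(),
        key=lambda kv: hybrid_score(kv[1]["dense"], kv[1]["lexical"]),
        reverse=True,
    )
    print(ranked)  # doc_2 ranks first once lexical relevance is weighted in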


Congrats on the launch! I'm an inveterate search nerd, so I can't help but ask: how did you implement search?


What are you using for the search tech?


Cassandra and Elasticsearch


That statement is both kind of true and, well, revisionist. Originally there was a strong focus on logics, clean and comprehensive modeling of the world through large, complicated ontologies, the adoption of super impractical representation languages, etc. It wasn't until rebellious sub-communities went rogue and pushed for pragmatic simplifications that any of it got widespread impact at all. So here's to the crazy ones, I guess.


One thing that works well for me is going working-to-working. Get a simple, de-scoped, incomplete, probably crappy version done end-to-end. Now it's not about finishing, it's about improving. And if it was worth building in the first place, it will beg for improvement. And then it's easier to just keep turning the crank, working-to-working.


In The Pragmatic Programmer, this is called the "tracer bullet" approach. It's basically another name for end-to-end, but with the emphasis that by completing a slice end-to-end you get feedback much faster (like the feedback tracer bullets give you when shooting at a target).


So... MVP?


Hmm, how about... End-to-end: "can we really implement it?". Tracer-bullet: "can we adjust our aim?". MVP: "can we make a user happy?". Though there's more to each.

So then market-fit exploration benefits from tracer-bullet, but you might do iterative reimplementation instead. And it might be overkill for a one-shot market test. MVP is largely orthogonal to end-to-end - "Submit button emails the founder, who does the thing by hand over breakfast" is a fine MVP but isn't very end-to-end. And a non-MVP, non-product end-to-end can be tracer-bullet or not: a soundly architected, low-debt end-to-end, yes; a hackathon-grade, high-debt one (where adjusting means rewriting) or a throwaway end-to-end exploratory spike, no.


Hey @afandian, can I interview you about the tech stack you all used to build search at Crossref.org?


Always happy to share more details about how we build open scholarly infrastructure! Can do here, or reply with contact details.


Are you planning to implement search and filter tags?

