Write down the lessons learned. The mesmerizing content is the content you share, whether or not there is a return.
You can either buy someone's attention or you can earn it. If you earn it, it's vastly deeper. You put good stuff out to the cosmos; usually, it will return good stuff. The more you share, the more you get.
Sharing what you learn is a source of energy, not hard work you want to avoid.
As an old saying goes, if you want to learn something, you should try teaching it.
Great insights, articsputnik. My journey with Vim began in a similar way. Initially, I was overwhelmed by the "Vim way" of doing things, but over time, the rationale behind its design choices began to make sense.
The idea that you're editing text more than you're writing it is profound. And as you rightly pointed out, the Vim language makes so much sense once you grasp its fundamental grammar. It's not just about memorizing commands; it's about understanding the logic and structure behind them and then combining them in intuitive ways.
Regarding Neovim, I've been hearing a lot about it, especially in the context of Lua configurations and extensions. Do you think it's worth the switch for someone who's deeply invested in Vim? Or should they only consider it if they're looking to write plugins or heavily customize their setup?
Lastly, I agree wholeheartedly with your advice on when not to learn Vim. It's not for everyone, and there's no shame in that. Some people are perfectly content with other editors, and that's okay. But for those who are willing to climb that steep learning curve, the rewards are immense.
The goal of the open data stack is that companies can reuse existing, battle-tested solutions and build on top of them instead of reinventing the wheel by re-implementing key components of the Data Engineering Lifecycle for every part of their stack.
In the past, without these tools available, the story usually went something like this (a sketch of such a hand-rolled script follows the list):
- Extracting: “Write some script to extract data from X.”
- Visualizing: “Let’s buy an all-in-one BI tool.”
- Scheduling: "Now we need a daily cron."
- Monitoring: "Why didn't we know the script broke?"
- Configuration: "We need to reuse this code but slightly differently."
- Incremental Sync: "We only need the new data."
- Schema Change: "Now we have to rewrite this."
- Adding new sources: "OK, new script..."
- Testing + Auth + Pagination: "Why didn't we know the script broke?"
- Scaling: "How do we scale up and down this workload?"
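To make those pain points concrete, here is a minimal, hedged sketch of the kind of hand-rolled extraction script they tend to pile up on; the endpoint, field names, and file paths are invented for the illustration. An open data stack connector would take care of pagination, retries, incremental state, and schema handling instead.

```python
import json
import time
from pathlib import Path

import requests

API_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint
STATE_FILE = Path("last_sync.json")            # poor man's incremental state


def extract() -> None:
    # Incremental sync: remember how far we got last time, or start from scratch.
    since = json.loads(STATE_FILE.read_text())["since"] if STATE_FILE.exists() else "1970-01-01"
    rows, page = [], 1
    while True:
        # Pagination, auth, and retries are all on us to get right.
        resp = requests.get(API_URL, params={"updated_since": since, "page": page}, timeout=30)
        if resp.status_code == 429:  # rate limited: naive backoff
            time.sleep(5)
            continue
        resp.raise_for_status()
        batch = resp.json()["data"]  # breaks silently if the schema changes
        if not batch:
            break
        rows.extend(batch)
        page += 1
    if rows:
        Path("orders.jsonl").write_text("\n".join(json.dumps(r) for r in rows))
        STATE_FILE.write_text(json.dumps({"since": max(r["updated_at"] for r in rows)}))


if __name__ == "__main__":
    extract()  # scheduling: "now we need a daily cron" -- plus monitoring and alerting
```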
> The column-oriented storage format used by data warehouses allows them to efficiently leverage modern SIMD computer architectures for columnar-vectorized processing.
I find it interesting how vectorized processing engines such as DuckDB and Databricks' Photon Engine try to combine the strengths of row- and column-oriented processing.
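As a small illustration of what columnar, vectorized execution buys you (my own hedged sketch, not from the article): DuckDB reads only the referenced columns of a Parquet file and aggregates them in vectorized batches, which is exactly where SIMD kicks in. The file and column names are made up.

```python
import duckdb

con = duckdb.connect()  # in-memory database

# Only the customer_id and amount columns are read from the columnar file,
# and the aggregation runs over vectorized batches of those columns.
totals = con.sql(
    """
    SELECT customer_id, SUM(amount) AS total
    FROM 'orders.parquet'
    GROUP BY customer_id
    ORDER BY total DESC
    """
).fetchall()
print(totals[:5])
```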
The author of the article underestimates the capabilities of modern OLAP DBMSs. The old beliefs are becoming less and less relevant.
> When to use a data warehouse
> Data warehouses are good for OLAP (online analytical processing) workloads such as the following:
> A small number of users, each of which may execute heavy analytics workloads
Not necessarily. ClickHouse is being used for user-facing analytics where every query can take on the order of 10 ms while supporting many concurrent queries.
> Downtime is permitted – generally not used as the one-and-only operational system
Not necessarily. ClickHouse is being used in HA setups replicated across multiple regions where the service has to survive the outage of a whole region.
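To make the concurrency point tangible, here is a small, hedged sketch (not from the thread) that fires many short, user-scoped analytical queries at ClickHouse in parallel from Python using the clickhouse-connect driver; the host, table, and column names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

import clickhouse_connect


def top_events(user_id: int) -> list:
    # One client per worker thread; a single client shouldn't be shared
    # across concurrent queries.
    client = clickhouse_connect.get_client(host="localhost")
    result = client.query(
        "SELECT event_type, count() AS cnt "
        "FROM events WHERE user_id = {uid:UInt64} "
        "GROUP BY event_type ORDER BY cnt DESC LIMIT 5",
        parameters={"uid": user_id},
    )
    return result.result_rows


# Many small queries in flight at once -- the user-facing analytics pattern,
# where each query is expected to come back in milliseconds.
with ThreadPoolExecutor(max_workers=32) as pool:
    all_rows = list(pool.map(top_events, range(100)))
```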
- Language: SQL and Python Continue to Dominate, With Rust Looking Promising
- Abstraction: Custom ETL Is Replaced by Data Connectors
- Latency: Better Tools Available In The Streaming Domain
- Architecture: New Data Architectures Like Lakehouse and Semantic Layer Are on the Rise
- Trust: Data Quality and Observability Become Essential
- Usability: The New Focus Is on Data Products and Data Contracts
- Openness: When It Comes to Data Tools, Open Source Is the Answer
- Standards: Database Open Standards Simplified by DuckDB
- Roles: Data Practitioners Expand Their Knowledge into Data Engineering
- Collaboration: DataOps Reduces Data Silos and Improves Teamwork
- Adoption: Data Engineering Is Key Regardless of Industry or Business Size
- Foundations: The Data Engineering Lifecycle Is Evergreen
I agree that for new content, spending the initial time handwriting makes sense before adding it to your second brain. For me, writing is therapy!
I usually write as much as possible on my laptop because I can type almost as fast as I think, whereas my handwriting can't keep up with my thinking speed, so I don't slow down and forget ideas or thoughts. On top of that, I can reformat, rearrange, add, and delete, which helps my thinking process in a way that wouldn't happen in my head alone. The advantage of pen and paper is that writing by hand uses different muscles and brain activity, which helps me think differently. I usually reach for them when I need to outline a blog post, when I'm stuck or distracted, or when I go out into nature and only bring my physical journal.
Also, when I write journal entries or other ideas within my second brain, I can start connecting them, improving my thoughts over time and refining and rereading them easily. On paper, notebooks get lost over time, and finding the right thing when you need it is very hard.
I love the article, and it gave me the essential pointers to start with. I loved it so much that it inspired me to write a piece focusing on Python vs. Rust in data engineering. I'll leave the article here in case it's of interest: https://airbyte.com/blog/rust-for-data-engineering
A new way of building a glossary. It's a second brain for data. Instead of having single terms on the top level, you go into the glossary and explore related ideas/definitions.
I created a small guide covering the most important Data Lake and Lakehouse topics, with the following chapters:
- What is a Data Lake & Why do you need one?
- Differences between Lakehouse & Warehouse
- Components of a Data Lake
  - Storage Layer (AWS S3, Azure Blob Storage, Google Cloud Storage)
  - File Format (Apache Parquet, Avro, ORC)
  - Table Format (Delta Lake, Apache Hudi, and Iceberg)
- Trends in the Market
- How to turn it into a Lakehouse
I hope some of you find it interesting. Curious to hear your thoughts and opinions: what's your Data Lake Table Format of choice, and why?
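To make the storage layer / file format / table format split concrete, here is a minimal, hedged sketch (my own, not from the guide) that writes a small dataset as a Delta Lake table with the `deltalake` (delta-rs) Python package; the paths and column names are made up.

```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake

df = pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.99, 24.50, 3.20]})

# Storage layer: a local path here, but an s3://, abfss://, or gs:// URI works
# the same way. The file format underneath is Parquet; the table format (Delta)
# adds a transaction log on top for ACID writes and time travel.
write_deltalake("./lakehouse/orders", df, mode="overwrite")

# Reads go through the Delta log rather than by listing Parquet files.
print(DeltaTable("./lakehouse/orders").to_pandas())
```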