Write down the lessons learned. The mesmerizing content is the content you share, whether or not there is a return.
You can either buy someone's attention or you can earn it. If you earn it, it's vastly deeper. You put good stuff out to the cosmos; usually, it will return good stuff. The more you share, the more you get.
Sharing what you learn is a source of energy, not hard work you want to avoid.
As an old saying goes, if you want to learn something, you should try teaching it.
Great insights, articsputnik. My journey with Vim began in a similar way. Initially, I was overwhelmed by the "Vim way" of doing things, but over time, the rationale behind its design choices began to make sense.
The idea that you're editing text more than you're writing it is profound. And as you rightly pointed out, the Vim language makes so much sense once you grasp its fundamental grammar. It's not just about memorizing commands; it's about understanding the logic and structure behind them and then combining them in intuitive ways.
Regarding Neovim, I've been hearing a lot about it, especially in the context of Lua configurations and extensions. Do you think it's worth the switch for someone who's deeply invested in Vim? Or should they only consider it if they're looking to write plugins or heavily customize their setup?
Lastly, I agree wholeheartedly with your advice on when not to learn Vim. It's not for everyone, and there's no shame in that. Some people are perfectly content with other editors, and that's okay. But for those who are willing to climb that steep learning curve, the rewards are immense.
The goal of the open data stack is that companies can reuse existing, battle-tested solutions and build on top of them instead of reinventing the wheel by re-implementing key components of the Data Engineering Lifecycle for every part of their stack.
In the past, without these tools available, the story usually went something like this (a sketch of such a hand-rolled script follows the list):
- Extracting: “Write some script to extract data from X.”
- Visualizing: “Let’s buy an all-in-one BI tool.”
- Scheduling: "Now we need a daily cron."
- Monitoring: "Why didn't we know the script broke?"
- Configuration: "We need to reuse this code but slightly differently."
- Incremental Sync: "We only need the new data."
- Schema Change: "Now we have to rewrite this."
- Adding new sources: "OK, new script..."
- Testing + Auth + Pagination: "Why didn't we know the script broke?"
- Scaling: "How do we scale up and down this workload?"
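To make those pain points concrete, here is a minimal, hedged sketch of the kind of hand-rolled extraction script they tend to pile up on; the endpoint, field names, and file paths are invented for the illustration. An open data stack connector would take care of pagination, retries, incremental state, and schema handling instead.

```python
import json
import time
from pathlib import Path

import requests

API_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint
STATE_FILE = Path("last_sync.json")            # poor man's incremental state


def extract() -> None:
    # Incremental sync: remember how far we got last time, or start from scratch.
    since = json.loads(STATE_FILE.read_text())["since"] if STATE_FILE.exists() else "1970-01-01"
    rows, page = [], 1
    while True:
        # Pagination, auth, and retries are all on us to get right.
        resp = requests.get(API_URL, params={"updated_since": since, "page": page}, timeout=30)
        if resp.status_code == 429:  # rate limited: naive backoff
            time.sleep(5)
            continue
        resp.raise_for_status()
        batch = resp.json()["data"]  # breaks silently if the schema changes
        if not batch:
            break
        rows.extend(batch)
        page += 1
    if rows:
        Path("orders.jsonl").write_text("\n".join(json.dumps(r) for r in rows))
        STATE_FILE.write_text(json.dumps({"since": max(r["updated_at"] for r in rows)}))


if __name__ == "__main__":
    extract()  # scheduling: "now we need a daily cron" -- plus monitoring and alerting
```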
> The column-oriented storage format used by data warehouses allows them to efficiently leverage modern SIMD computer architectures for columnar-vectorized processing.
I find it interesting how vectorized processing engines such as DuckDB and Databricks' Photon Engine try to combine the strengths of row- and column-oriented processing.
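As a small illustration of what columnar, vectorized execution buys you (my own hedged sketch, not from the article): DuckDB reads only the referenced columns of a Parquet file and aggregates them in vectorized batches, which is exactly where SIMD kicks in. The file and column names are made up.

```python
import duckdb

con = duckdb.connect()  # in-memory database

# Only the customer_id and amount columns are read from the columnar file,
# and the aggregation runs over vectorized batches of those columns.
totals = con.sql(
    """
    SELECT customer_id, SUM(amount) AS total
    FROM 'orders.parquet'
    GROUP BY customer_id
    ORDER BY total DESC
    """
).fetchall()
print(totals[:5])
```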
The author of the article underestimates the capabilities of modern OLAP DBMSs. The old beliefs are becoming less and less relevant.
> When to use a data warehouse
> Data warehouses are good for OLAP (online analytical processing) workloads such as the following:
> A small number of users, each of which may execute heavy analytics workloads
Not necessarily. ClickHouse is being used for user-facing analytics where every query can take on the order of 10 ms while supporting many concurrent queries.
> Downtime is permitted – generally not used as the one-and-only operational system
Not necessarily. ClickHouse is being used in HA setups replicated across multiple regions where the service has to survive the outage of a whole region.
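To make the concurrency point tangible, here is a small, hedged sketch (not from the thread) that fires many short, user-scoped analytical queries at ClickHouse in parallel from Python using the clickhouse-connect driver; the host, table, and column names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

import clickhouse_connect


def top_events(user_id: int) -> list:
    # One client per worker thread; a single client shouldn't be shared
    # across concurrent queries.
    client = clickhouse_connect.get_client(host="localhost")
    result = client.query(
        "SELECT event_type, count() AS cnt "
        "FROM events WHERE user_id = {uid:UInt64} "
        "GROUP BY event_type ORDER BY cnt DESC LIMIT 5",
        parameters={"uid": user_id},
    )
    return result.result_rows


# Many small queries in flight at once -- the user-facing analytics pattern,
# where each query is expected to come back in milliseconds.
with ThreadPoolExecutor(max_workers=32) as pool:
    all_rows = list(pool.map(top_events, range(100)))
```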
- Language: SQL and Python Continue to Dominate, With Rust Looking Promising
- Abstraction: Custom ETL Is Replaced by Data Connectors
- Latency: Better Tools Available In The Streaming Domain
- Architecture: New Data Architectures Like Lakehouse and Semantic Layer Are on the Rise
- Trust: Data Quality and Observability Become Essential
- Usability: The New Focus Is on Data Products and Data Contracts
- Openness: When It Comes to Data Tools, Open Source Is the Answer
- Standards: Database Open Standards Simplified by DuckDB
- Roles: Data Practitioners Expand Their Knowledge into Data Engineering
- Collaboration: DataOps Reduces Data Silos and Improves Teamwork
- Adoption: Data Engineering Is Key Regardless of Industry or Business Size
- Foundations: The Data Engineering Lifecycle Is Evergreen
I agree that for new content, spending the initial time handwriting makes sense before adding it to your second brain. For me, writing is therapy!
I usually write as much as possible on my laptop because I can type almost as fast as I think, whereas my handwriting can't keep up with my thinking speed, so I don't slow down and forget ideas or thoughts. On top of that, I can reformat, rearrange, add, and delete, which helps my thinking process in a way that wouldn't happen in my head alone. The advantage of pen and paper is that writing by hand uses different muscles and brain activity, which helps me think differently. I usually reach for them when I need to outline a blog post, when I'm stuck or distracted, or when I go out into nature and only bring my physical journal.
Also, when I write journal entries or other ideas within my second brain, I can start connecting them, improving my thoughts over time and refining and rereading them easily. On paper, notebooks get lost over time, and finding the right thing when you need it is very hard.
I love the article, and it gave me the essential pointers to start with. I loved it so much that it inspired me to write a piece focusing on Python vs. Rust in data engineering. I'll leave the article here in case it's of interest: https://airbyte.com/blog/rust-for-data-engineering
A new way of building a glossary. It's a second brain for data. Instead of having single terms on the top level, you go into the glossary and explore related ideas/definitions.
I created a small guide covering the most important Data Lake and Lakehouse topics, with the following chapters:
- What is a Data Lake & Why do you need one?
- Differences between Lakehouse & Warehouse
- Components of a Data Lake
  - Storage Layer (AWS S3, Azure Blob Storage, Google Cloud Storage)
  - File Format (Apache Parquet, Avro, ORC)
  - Table Format (Delta Lake, Apache Hudi, and Iceberg)
- Trends in the Market
- How to turn it into a Lakehouse
I hope some of you find it interesting. Curious to hear your thoughts and opinions: what's your Data Lake Table Format of choice, and why?
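To make the storage layer / file format / table format split concrete, here is a minimal, hedged sketch (my own, not from the guide) that writes a small dataset as a Delta Lake table with the `deltalake` (delta-rs) Python package; the paths and column names are made up.

```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake

df = pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.99, 24.50, 3.20]})

# Storage layer: a local path here, but an s3://, abfss://, or gs:// URI works
# the same way. The file format underneath is Parquet; the table format (Delta)
# adds a transaction log on top for ACID writes and time travel.
write_deltalake("./lakehouse/orders", df, mode="overwrite")

# Reads go through the Delta log rather than by listing Parquet files.
print(DeltaTable("./lakehouse/orders").to_pandas())
```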