Hacker News | mike_heffner's comments

there went 10 mins of debugging why kafka wouldn't start


there went 6 hours of debugging why my applications were getting out-of-memory errors when trying to connect to my Kafka running inside Docker. they were getting an actual answer back from Antigravity, but because of the first 4 bytes of the message, the clients thought a huge message was coming and tried to allocate a byte array that size. such an unthoughtful decision to squat on one of the most commonly used developer ports when building an IDE for DEVELOPERS.
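For context on why the clients blew up: Kafka's wire protocol frames every response with a 4-byte big-endian INT32 length prefix, and clients allocate a buffer of that size before reading the body. A minimal sketch of what goes wrong when a non-Kafka service answers on port 9092 (the byte values below are illustrative, not an actual Antigravity response):

```python
import struct

def parse_frame_length(first_bytes: bytes) -> int:
    """Interpret the first 4 bytes of a reply as Kafka's INT32 size prefix."""
    (length,) = struct.unpack(">i", first_bytes[:4])
    return length

# If something else answers on the broker port, its reply gets misread as
# a frame length. An HTTP-style response starting with "HTTP" decodes to:
size = parse_frame_length(b"HTTP/1.1 200 OK")
print(size)  # 1213486160, i.e. the client tries to allocate ~1.2 GB
```

This is also why Kafka clients log errors about messages exceeding the maximum allowed size when pointed at the wrong service.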


Hi all — one of the authors of Rotel here. Thanks for the kind words, Bilal and Michael.

We're excited to test our ClickHouse integration with ClickStack, as we believe OTel and ClickHouse make for a powerful observability stack. Our open-source Rust OpenTelemetry collector is designed for high-performance, resource-constrained environments. We'd love for you to check it out!


wow didn't know about rotel ... looks very interesting indeed. Especially those python bindings ... Bookmarked!


Thanks for sharing this — it’s a really promising direction. The advantages of Arrow for OTLP, especially when used end-to-end, are compelling given the protocol overhead of OTLP.

We’ve been thinking along similar lines with the use of Rust, particularly for OpenTelemetry collection in environments where high performance and low resource overhead are critical, such as edge and serverless. With that in mind, we’ve open-sourced a lightweight OpenTelemetry collector written in Rust to address these use cases. We’ve also developed a native Lambda extension around it, and have seen encouraging interest from folks aiming to improve cold start times.

The project is still fairly early, but we’re optimistic that Rust can open up new opportunities for efficient observability pipelines. Vendors like Datadog are also moving in this direction with their Lambda extension and appear to be adopting Rust more broadly for data-plane components.

If this resonates, feel free to take a look here: https://github.com/streamfold/rotel. We’d love to hear your thoughts on how this could be useful.


Just in time for Advent of Code.


Unfortunately I don't think the compilers will add it in time.


A campaign you only launch late on a Friday


Very cool, a similar site I use is: https://onetsp.com.


SolarWinds Cloud | Sr Data Engineer | SF / US-REMOTE | Full-time | https://solarwinds.jobs/jobs/?q=cloud

We're looking for a full-time software engineer to take a key role in building the large-scale distributed systems that power SolarWinds Cloud products: Papertrail (Real Time Logging), AppOptics (Server, Infrastructure, Application Performance Monitoring and Distributed Tracing), Pingdom (DEM), and Loggly (Structured Log Analysis).

We’re a small team, so everyone has the opportunity to have a big impact. We’ve built our platform out largely on Java 8 Dropwizard services, a handful of Golang services, and some C++ where performance is critical. We leverage Kafka as our main service bus, Cassandra for long-term storage, our in-house stream processing framework for online analytics, and ClickHouse for large-scale log storage, and we rely on ZooKeeper as a core part of intra/inter-service coordination. Our data pipeline pushes millions of messages a second and 50TB of logs per day.

All team members, whether in one of our offices or remote, commit code to GitHub, communicate over Slack and Hangouts, push code to production via our ChatOps bot, and run all production applications on AWS. We also use an array of best-of-breed SaaS applications to get code to production quickly and reliably. We are a team committed to a healthy work/life balance.

At SolarWinds Cloud you get all the benefits of a small startup with the backing of a big company, so there is no worry about the next round of funding. SolarWinds offers competitive bonus and 401(k) matching programs that create an attractive total compensation package.

Learn more at: https://solarwinds.jobs/jobs/?q=cloud or contact me directly at mike-at-solarwinds.cloud (no recruiters).


SolarWinds Cloud | Sr Data Engineer | SF / US-REMOTE | Full-time | https://solarwinds.jobs/jobs/?q=cloud

We're looking for a full-time software engineer to take a key role in building the large-scale distributed systems that power SolarWinds Cloud products: Papertrail (Real Time Logging), AppOptics (Server, Infrastructure, Application Performance Monitoring and Distributed Tracing), Pingdom (DEM), and Loggly (Structured Log Analysis).

We’re a small team, so everyone has the opportunity to have a big impact. We’ve built our platform out largely on Java 8 Dropwizard services, a handful of Golang services, and some C++ where performance is critical. We leverage Kafka as our main service bus, Cassandra for long-term storage, our in-house stream processing framework for online analytics, and ClickHouse for large-scale log storage, and we rely on ZooKeeper as a core part of intra/inter-service coordination. Our data pipeline pushes millions of messages a second and tens of terabytes of logs per day.

All team members, whether in San Francisco, in one of our other offices, or remote, commit code to GitHub, communicate over Slack and Hangouts, push code to production via our ChatOps bot, and run all production applications on AWS. We also use an array of best-of-breed SaaS applications to get code to production quickly and reliably. We are a team committed to a healthy work/life balance.

At SolarWinds Cloud you get all the benefits of a small startup with the backing of a big company, so there is no worry about the next round of funding. SolarWinds offers competitive bonus and 401(k) matching programs that create an attractive total compensation package.

This is an example of some of the technology we build and work with on a regular basis: http://www.heavybit.com/library/blog/streamlining-distribute....

Learn more at: https://solarwinds.jobs/jobs/?q=cloud or contact me directly at mike-at-solarwinds.cloud (no recruiters).


Would love to know if anyone else has data on:

* Impact on M5/C5 instances over similar time period, any difference with the Nitro hypervisor?

* Were Dedicated instances (https://aws.amazon.com/ec2/purchasing-options/dedicated-inst...) patched as well?

* Other examples of software that adapted batching performance automatically with increase in call latency.
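On that last point, the usual pattern is to grow the batch size when measured per-call overhead rises, amortizing a new fixed cost (such as KPTI's added syscall latency) over more items. A hypothetical sketch of the feedback loop, not any specific system's implementation:

```python
class AdaptiveBatcher:
    """Grow batches when per-item latency exceeds a target, shrink otherwise."""

    def __init__(self, target_per_item_us: float = 5.0,
                 min_batch: int = 1, max_batch: int = 1024):
        self.target = target_per_item_us
        self.min_batch = min_batch
        self.max_batch = max_batch
        self.batch_size = min_batch

    def record(self, batch_len: int, elapsed_us: float) -> None:
        # If each item cost more than the target, the fixed per-call
        # overhead dominates: double the batch to spread it thinner.
        per_item = elapsed_us / batch_len
        if per_item > self.target:
            self.batch_size = min(self.batch_size * 2, self.max_batch)
        else:
            self.batch_size = max(self.batch_size // 2, self.min_batch)

b = AdaptiveBatcher(target_per_item_us=5.0)
b.record(batch_len=1, elapsed_us=50.0)   # overhead dominates -> batch grows to 2
b.record(batch_len=2, elapsed_us=4.0)    # cheap again -> batch shrinks to 1
```

Kafka producers behave somewhat like this implicitly: with `linger.ms` set, slower broker round-trips mean more records accumulate per in-flight request, so batching deepens as latency increases.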


Not able to answer your questions, but a comment on the article -

>During this same time period, we saw additional CPU increases on our PV instances that had been previously upgraded. This seems to imply some level of HVM patching was occurring on these PV instances around the same time that all pure-HVM instances were patched

This is likely due to Vixen: https://lists.xenproject.org/archives/html/xen-devel/2018-01...

>.... Instead of trying to make a KPTI-like approach work for Xen PV, it seems reasonable to run a copy of Xen within an HVM (or PVH) domU ....

>.... all PV instances in EC2 are using this ....

So the initial bump after the reboot would have been the Vixen shim hypervisor, which mitigates Meltdown for PV guests. The secondary bump, and the bump the native HVM instances saw, would have been the Spectre-related mitigations.

Based on https://aws.amazon.com/security/security-bulletins/AWS-2018-..., my guess is Intel microcode updates.


We had a lot of m5 and c5 servers randomly die. It was as if someone was running Netflix's Chaos Monkey in our VPCs...


Likewise. Can you reach out to me privately? I'd love to have independent corroboration.


Could you send a list of instance IDs and timeframes where you saw this?


Librato/Papertrail/TraceView | Sr Data Engineer | SF / US-REMOTE | Full-time | https://www.librato.com/jobs

We're looking for a full-time software engineer to take a key role in building the large-scale distributed systems that power SolarWinds Cloud products: Papertrail (hosted logs), Librato (time-series metrics), and TraceView (APM and distributed tracing).

We’re a small team, so everyone has the opportunity to have a big impact. We’ve built our platform out largely on Java 8 Dropwizard services, a handful of Golang services, and some C++ where performance is critical. We leverage Kafka as our main service bus, Cassandra for long-term storage, our in-house stream processing framework for online analytics, and we rely on ZooKeeper as a core part of intra/inter-service coordination. Our data pipeline pushes millions of messages a second and tens of terabytes of logs per day.

All team members, whether local in San Francisco or remote, commit code to GitHub, communicate over Slack and Hangouts, push code to production via our ChatOps bot, and run all production applications on AWS. We also use an array of best-of-breed SaaS applications to get code to production quickly and reliably. We are a team committed to a healthy work/life balance.

Papertrail/Librato/TraceView are wholly owned by SolarWinds Inc., so you get the benefits of a small startup with the backing of a big company, and there is no worry about the next round of funding. SolarWinds offers competitive bonus and 401(k) matching programs that create an attractive total compensation package.

This is an example of some of the technology we build and work with on a regular basis: http://www.heavybit.com/library/blog/streamlining-distribute....

Learn more at: https://www.librato.com/jobs or contact me directly at [email protected] (no recruiters).


Thanks for tagging `US-REMOTE`!

