The following comes from the complete opposite side of computing, microcontrollers. I've been working on an embedded system where the heap is about 256 KiB and the biggest stack has 4 KiB. I do write idiomatic modern C++ for the most part (even if I hate C++ metaprogramming with a passion), but not all tricks are suitable in all situations:
- CTRE is fine as long as you don't overflow the stack. I once tried to validate a string for an HTTP proxy configuration with an exhaustive regex; CTRE tried to allocate 5 KiB of stack 40 call frames in and crashed the embedded system with a stack overflow. I had to remove port validation from the regex (matching a number between 1 and 65535 was a bridge too far) and check that part by hand instead. I've also had to dumb down other CTRE regexes in my code for similar reasons.
- Several constraints and design decisions led me to mostly ditch JSON internally and write my own BSON library. Instead of the traditional dynamically-allocated tree of nodes, it works directly in-place, so I can give it a std::vector with a chunk of reserved memory upfront and not worry about memory allocation or fragmentation later on. One major benefit is that, since there are no string escape sequences, I can directly return a std::string_view for string values. There are downsides to this approach, mostly revolving around modifications: one needs to be very careful not to invalidate iterators (which are raw pointers into the underlying buffer) while modifying, and adding/removing entries towards the beginning of a large document is expensive due to the memmove().
- I ditched newlib for picolibc and exterminated anything that pulled in the C/C++ standard library locale code (that was over 130 kilobytes of Flash altogether IIRC), which includes among other things C++ streams (they are bad for plenty of other reasons too, but mine was program size).
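As a rough illustration of the in-place approach described above (a toy sketch, not the actual library): walk a BSON buffer and hand back a std::string_view pointing straight into it, with no tree of nodes and no copies. `find_string` is a hypothetical helper that only handles string and int32 elements, and it assumes a little-endian host, as BSON itself is little-endian.

```cpp
#include <cstdint>
#include <cstring>
#include <optional>
#include <string_view>
#include <vector>

// Read a little-endian int32 from the buffer (BSON is little-endian).
static std::int32_t read_i32(const char* p) {
    std::int32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Look up a top-level string element by key and return a view straight
// into the buffer -- no copy, no unescaping, precisely because BSON
// stores strings verbatim with an explicit length prefix.
std::optional<std::string_view> find_string(const std::vector<char>& doc,
                                            std::string_view key) {
    const char* p = doc.data();
    const char* end = p + read_i32(p);   // first i32 is the total document size
    p += 4;
    while (p < end - 1) {                // last byte is the 0x00 terminator
        std::uint8_t type = static_cast<std::uint8_t>(*p++);
        std::string_view name(p);        // element name is a NUL-terminated C string
        p += name.size() + 1;
        switch (type) {
        case 0x02: {                     // string: i32 length (incl. NUL) + bytes + NUL
            std::int32_t len = read_i32(p);
            std::string_view value(p + 4, static_cast<std::size_t>(len - 1));
            if (name == key) return value;
            p += 4 + len;
            break;
        }
        case 0x10:                       // int32: skip 4 value bytes
            p += 4;
            break;
        default:
            return std::nullopt;         // other types not handled in this sketch
        }
    }
    return std::nullopt;
}
```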
You seem to have mostly aimed for throughput and raw performance in your benchmarks, which is fine for a generic desktop or server-class system with an MMU and plenty of resources. I just want to point out that other environments have different constraints that mandate different kinds of optimizations: memory usage (heap/stack/program size), dynamic memory fragmentation, real-time/jitter...
I’ve done C++ on a Cortex-M0+ with 8KB of flash. Code size is a big issue. You have to disable a bunch of stuff (no exceptions, nothing that does dynamic allocation) but you can still use classes, virtual methods, templates, constexpr, etc. These are all things that are a pain to do in C and usually require a bunch of gross macros.
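As a small illustration of the constexpr point: a lookup table the compiler computes entirely at build time and places in read-only memory (flash on most MCUs), which C typically handles with an external code generator or a wall of macros. The CRC-8 polynomial 0x07 is just an example choice, not anything from the original post.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Build a CRC-8 lookup table at compile time.
constexpr std::array<std::uint8_t, 256> make_crc8_table() {
    std::array<std::uint8_t, 256> table{};
    for (int i = 0; i < 256; ++i) {
        std::uint8_t crc = static_cast<std::uint8_t>(i);
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 0x80) ? static_cast<std::uint8_t>((crc << 1) ^ 0x07)
                               : static_cast<std::uint8_t>(crc << 1);
        table[i] = crc;
    }
    return table;
}

// Lives in .rodata; computed entirely by the compiler, zero runtime cost.
constexpr auto kCrc8Table = make_crc8_table();

std::uint8_t crc8(const std::uint8_t* data, std::size_t len) {
    std::uint8_t crc = 0;
    while (len--) crc = kCrc8Table[crc ^ *data++];
    return crc;
}
```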
As a former C++ programmer now writing C, I think this is only true for templates, and if you limit yourself somewhat even that is fine. For constexpr it depends on what you use it for: if it's something expensive to compute, I would just run a program at build time (caching the output) and include the result. That seems preferable to me anyhow. The same goes for tests.
Yeah, embedded C++ is a wildly different experience from vanilla. I've worked in large embedded C++ codebases where we couldn't use the STL and had to use homegrown containers for everything.
I wonder how Rust is stacking up (no pun intended) in the embedded game these days.
Very true! I'd also go for similar optimizations when processing texts or sparse linear algebra on Nvidia and AMD GPUs. You only have ~50 KB of constant memory, ~50 MB of shared memory, and ~50 GB of global memory. It is BIG compared to microcontrollers but very little compared to the scope of problems often solved on GPUs. So many optimizations revolve around compressed representations and coalesced memory accesses.
I am still looking for a short example of such CUDA kernels, and I would love to see more embedded examples if you have thoughts ;)
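Not a CUDA kernel, but here's a short CPU-side sketch of the kind of compressed representation such kernels consume: Compressed Sparse Row storage plus the per-row SpMV loop that a GPU would hand to each thread (the names `Csr`, `from_dense`, and `spmv` are illustrative, and the coalescing itself only happens once rows are mapped to threads on the device).

```cpp
#include <cstddef>
#include <vector>

// Compressed Sparse Row: store only the nonzeros plus two index arrays.
struct Csr {
    std::vector<std::size_t> row_ptr; // row i occupies [row_ptr[i], row_ptr[i+1])
    std::vector<std::size_t> cols;    // column index of each nonzero
    std::vector<double> vals;         // nonzero values
};

Csr from_dense(const std::vector<std::vector<double>>& m) {
    Csr a;
    a.row_ptr.push_back(0);
    for (const auto& row : m) {
        for (std::size_t j = 0; j < row.size(); ++j)
            if (row[j] != 0.0) { a.cols.push_back(j); a.vals.push_back(row[j]); }
        a.row_ptr.push_back(a.vals.size());
    }
    return a;
}

// y = A * x, one row at a time. On a GPU this inner loop is the per-thread
// body; laying vals/cols out contiguously is what enables coalesced reads.
std::vector<double> spmv(const Csr& a, const std::vector<double>& x) {
    std::vector<double> y(a.row_ptr.size() - 1, 0.0);
    for (std::size_t i = 0; i + 1 < a.row_ptr.size(); ++i)
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k)
            y[i] += a.vals[k] * x[a.cols[k]];
    return y;
}
```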
I haven't had to reach for them so far, either professionally or personally, but custom memory allocators (slab allocation, bump allocators...) and allocation strategies are something I've been meaning to look into. Too bad that the one game I've done reverse-engineering on used dynamic memory allocation for just about everything, with an allocator based on a singly-linked list of used/free chunks that wouldn't look out of place in the 1980s.
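For reference, a bump allocator is only a few lines. This toy sketch (names made up) hands out memory by advancing a cursor over a fixed buffer; individual frees are impossible, you reset the whole arena at once, which is exactly why there is no fragmentation.

```cpp
#include <cstddef>

// Minimal bump (arena) allocator: allocation is a round-up and an add.
class BumpArena {
public:
    BumpArena(std::byte* buf, std::size_t size) : base_(buf), size_(size) {}

    // `align` must be a power of two; the base buffer itself must be
    // suitably aligned by the caller.
    void* allocate(std::size_t n, std::size_t align = alignof(std::max_align_t)) {
        std::size_t p = (offset_ + align - 1) & ~(align - 1); // round up
        if (p + n > size_) return nullptr;                    // arena exhausted
        offset_ = p + n;
        return base_ + p;
    }

    void reset() { offset_ = 0; }  // "free" everything at once

private:
    std::byte* base_;
    std::size_t size_;
    std::size_t offset_ = 0;
};
```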
I'm aware that the C++ standard library has polymorphic allocators alongside a couple of memory resource implementations. I've also heard that the dynamic dispatch in polymorphic allocators can carry an optimization or speed penalty compared to a statically dispatched allocator or the default std::allocator that uses operator new(), but I have no concrete data to judge either way.
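For what it's worth, the standard pieces look like this: a std::pmr::monotonic_buffer_resource backed by a fixed buffer, with every container allocation dispatching virtually through memory_resource::do_allocate() — that virtual call is the dispatch cost in question. The helper function is just for illustration.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Sum 0..n-1 using a vector whose storage comes from a fixed stack buffer.
// monotonic_buffer_resource never frees individual blocks (deallocate is a
// no-op); using null_memory_resource() as upstream makes exhaustion throw
// instead of silently falling back to operator new().
int sum_with_arena(int n) {
    std::array<std::byte, 1024> buf;
    std::pmr::monotonic_buffer_resource arena(
        buf.data(), buf.size(), std::pmr::null_memory_resource());

    std::pmr::vector<int> v(&arena);  // every allocation virtual-dispatches
                                      // through memory_resource::do_allocate()
    for (int i = 0; i < n; ++i) v.push_back(i);

    int total = 0;
    for (int x : v) total += x;
    return total;
}
```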
> CTRE is fine as long as you don't overflow the stack
Which is to say CTRE is mostly not fine if you use it on user-provided strings, regardless of target environment. It's heavily recursion-based, never spills to the heap, and has no safeguards on memory use or recursion depth.
Once I gave up fully validating the port number with the regex, it no longer blew up the stack:
^http:\/\/([a-z0-9.-]+)\/?:([1-9][0-9]{0,4})$
I'll admit I haven't done a thorough job of auditing the stack usage afterwards, but not all regexes look like Perl codegolf. For simple, straightforward patterns I don't see any problems using CTRE, but I'd be interested to see some proof to the contrary if you have some.
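For completeness, the by-hand range check that replaced the regex part can be done allocation-free with std::from_chars. The capture group above only guarantees 1 to 5 digits not starting with 0, so the 1..65535 check happens here; `valid_port` is a made-up helper name.

```cpp
#include <charconv>
#include <cstdint>
#include <string_view>

// Validate a digit run as a TCP port. std::from_chars allocates nothing
// and cannot throw, which makes it embedded-friendly.
bool valid_port(std::string_view digits, std::uint16_t& port_out) {
    unsigned value = 0;
    auto [ptr, ec] =
        std::from_chars(digits.data(), digits.data() + digits.size(), value);
    if (ec != std::errc{} || ptr != digits.data() + digits.size())
        return false;                      // non-digits or overflow
    if (value < 1 || value > 65535)
        return false;                      // out of port range
    port_out = static_cast<std::uint16_t>(value);
    return true;
}
```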
Not sure I'd reach for C++ or regexes in such a constrained micro environment. Anything where you don't directly understand the precise memory use is probably out.
The NumWorks N0100 graphical calculator had 1 MiB of Flash and 256 KiB of RAM. It packed seven mathematical apps (calculation, grapher, equations, statistics, regression, sequences, distributions) with a decently powerful maths engine/equation typesetter written in C++ and a MicroPython shell. They paid a fair amount of attention to detail in order to fit all of that in (not least skipping the STL entirely), but C++ wielded correctly for embedded is no more of a memory hog than C.
Our target has ~1.5 MiB of Flash for program code and 512 KiB of RAM. We're using half of the former and maybe a third of the latter, yet the team barely paid any attention to program size or memory consumption. One day the project lead became slightly concerned about that, and by the end of the day I had shaved 20% off both Flash and RAM usage just by going for the lowest-hanging fruit.
I find it a bit amusing to call a 250 MHz STM32H5 MCU a constrained micro environment; if anything, it's a bit overkill for what we need.
> I find it a bit amusing to call a 250 MHz STM32H5 MCU a constrained micro environment; if anything, it's a bit overkill for what we need.
I took an "embedded" systems class in college 15+ years ago that targeted a 32-bit ARM with megabytes of RAM, so using these KiB-of-RAM micros in 2025 definitely feels like a constrained environment to me. The platforms I work on with C++ professionally have, ya know, hundreds of gigabytes of RAM (and our application gets ~100% of it).