
Be veeeery careful. The STM32H QSPI peripheral is FULL OF very nasty bugs, especially the second version (supports writes) that you find in STM32H7B0 chips. You are currently avoiding them by having QSPI mapped as device memory, but the minute you attempt to use it with cache or run code from it, or (god help you) put your stack, heap, and/or vector table on a QSPI device, you are in for a world of poorly-debuggable 1-in-1,000,000 failures. STM knows but refuses to acknowledge it publicly, even though they privately admit some other customers have "hit similar issues". Issues I've found, demonstrated to them, and written reliable reproductions of:

* non-4-byte-sized writes are randomly lost (about 1 in 1,000,000 writes) if QSPI is writeable and not cached

* non-4-byte-sized writes are randomly rounded up in size to 2 or 4 bytes with garbage, overwriting nearby data (about 1 in 1,000,000 writes) if QSPI is writeable and cached

* when PC, SP, and VTOR all point to QSPI memory, any interrupt has about a 1/million chance of reading garbage instead of the proper vector from the vector table if it interrupts a LDM/STM instruction targeting the QSPI memory and it is cached and misses the cache

Some of these have workarounds that I found (contact me). I am refusing to disclose them to STM until they acknowledge the bugs publicly.

I recommend NOT using STM32H7 chips in any product where you want QSPI memory to work properly.


I love how varied the responses are. I’ll play…

Walk every aisle of your local Best Buy or MicroCenter. Do a mental calculation of how this hardware product would be 10x better with (present day's) latest software trend.

E.g. can I make this blender 10x better by bolting it to an LLM? Can I make this coffee machine 10x better by adding on a coffee + distilled water + third-wave salt subscription? Can I make this network router 10x better by shipping it with an invisible-to-normies Tailscale + Speedify + 5G SIM card setup for ultra-low-latency, highly reliable Zoom calls from the middle of nowhere using TCP Fast Open and MPTCP?

Why 10x? Because if it’s not a 10x better experience the switching costs (aka “activation energy”) will be too high to gain market traction before you run out of funding and die.

Stack rank your items by categories such as "will this increase usage from once every 9 months to 9 times a day?" (i.e. smoke detector to CO2 environmental sensor), "does this have a viral k-factor that the original didn't?" (access control for package delivery in door locks), "does this unlock a recurring revenue stream that previously wasn't there?" (smart decoding of video feed events). Etc.

Narrow your list down from 220 to 5. Build some prototypes. Test your market amongst fellow founders. Find the one niche vertical that has surprising utility and becomes a runaway success that exceeds your ability to fabricate in your living room using Arduinos and off-the-shelf parts, until you've exhausted Best Buy, Amazon, SparkFun, and Adafruit's supply of said item and you need to make your own version.

Get partnered up with a second-rate design firm in one of the flyover states that has weird connections for manufacturing in this space, via a random guy you met at a JS conference three years earlier who cold-calls you. Launch a successful crowdfunding campaign. Raise funding. Fly to China to meet some contract manufacturers.

Pick the wrong contract manufacturer based on your own inexperience and your design firm's prior relationship. Have things go terribly wrong because someone used regular ABS where glass-reinforced ABS was called for, because injection-mold tooling is expensive and a lot of tooling adjustments boil down to "guess and check".

Move to China. But like not the cool part like Shenzhen or Shanghai - like a semi-obscure part - like Yiwu. Have way too many hot pots over Mao Tai trying to find a T2 vendor that knows how to injection mold helical gears. Learn some gutter slang from an obscure dialect of Chinese like Gan. Fire half a dozen T2’s. Hire five more. Fire three more. Refine it down to one final T2 that’s giving you problems. Eg try to find a paint vendor that can match the white on your injection molded part to the white on your metal part.

Nope out of the first paint vendor who can actually do this because of sketchy working conditions, like a half-naked infant running around the factory floor wearing 开裆裤 (split-crotch pants).

Almost rage quit.

Question why you're pirating grad-school textbooks on materials science from Turkish warez sites instead of having taken that acqui-hire offer from Airbnb back when they were like 12 people. Make a futile effort to hand out hearing and eye protection to your T2s because you begin to question how everything is done in this country.

Nearly burn out again.

Begin to see the light. Parts fit. Production is scaling. Find a critical flaw. Fail.

Then do it all again until you succeed or run out of money.


I'll throw out a VC's perspective on liquidation prefs:

1) I think 1x is very fair and meant to protect investors from bad company behavior. If you didn't have a 1x preference, this would be an easy way for an unscrupulous founder to cash out: raise $X for 20% of the company, no liquidation preference. The next day, sell the company and its assets ($X in cash) for, say, 0.9X. If there's no liquidation preference, the VC gets back 0.18X and the founder gets 0.72X, even though all the founder did was sell the VC's own cash at a discount the day after getting it.

2) >1x liquidation preferences are sometimes the founder's fault and sometimes the VC's fault. Sometimes it's an investor exploiting a position of leverage just to be more extractive. That sucks. But other times it's a founder intentionally exchanging worse terms for a higher/vanity valuation.

For example, let's say a founder raised a round at $500m, then the company didn't do as well as hoped, and now realistically the company is worth $250m. The founder wants to raise more to try to regain momentum.

A VC comes and says "ok, company is worth $250m, how about I put in $50m at a $250m valuation?"

Founder says "you know, I really don't want a down round. I think it would hurt morale, upset previous investors, be bad press, etc. What would it take for you to invest at a $500m+ valuation like last time?"

VC thinks and says "ok, how about $500m valuation, 3x liquidation preference?"

The founder can now pick between a $250m valuation with a 1x pref, or a $500m valuation with a 3x pref. Many will pick #1, but many others will pick #2.

It's a rational VC offer -- if the company is worth $250m but wants to raise at $500m, then a liquidation preference can bridge that gap. The solution is kind of elegant, IMHO. But it can also lead to situations like the one described in the article above where a company has a good exit that gets swallowed up by the liquidation preference.
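
To make the arithmetic concrete, here's a toy payout model (a single non-participating preferred investor and made-up numbers, nothing from a real cap table) comparing the two term sheets at a few exit values:

    #include <algorithm>
    #include <cstdio>

    // Toy model: one non-participating preferred investor, everyone else common.
    // Numbers are illustrative; real cap tables are far messier.
    struct Round {
        const char *name;
        double invested;   // $m the VC put in
        double postMoney;  // $m post-money valuation
        double prefMult;   // liquidation preference multiple
    };

    static void payout(const Round &r, double exitValue) {
        double ownership  = r.invested / r.postMoney;
        double preference = r.prefMult * r.invested;
        // Non-participating preferred: the VC takes the better of its preference
        // or converting to common, but never more than the whole exit.
        double vc = std::min(exitValue, std::max(preference, ownership * exitValue));
        std::printf("%-22s exit=$%4.0fm  VC=$%5.1fm  everyone else=$%5.1fm\n",
                    r.name, exitValue, vc, exitValue - vc);
    }

    int main() {
        const Round a{"$250m post, 1x pref", 50, 250, 1};
        const Round b{"$500m post, 3x pref", 50, 500, 3};
        const double exits[] = {100, 200, 400, 800};
        for (double e : exits) { payout(a, e); payout(b, e); }
    }

At a $200m exit, for example, the 3x preference takes the first $150m off the top, which is exactly the kind of good-exit-swallowed-by-the-preference outcome described in the article.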

3) generally both sides have good lawyers (esp. at later stages of funding), so the liquidation preference decision is likely made knowingly.

Related to #3, if you're fundraising, please work with a good lawyer. There are a few firms that handle most tech startup financings, and they will have a much better understanding of terms and term benchmarks than everyone else. Gunderson, Goodwin, Cooley, Wilson Sonsini, and Latham & Watkins are the firms I tend to see over and over.


MLK Jr. had a lot to say in disagreement with you.

https://letterfromjail.com/

> over the past few years I have been gravely disappointed with the white moderate. I have almost reached the regrettable conclusion that the Negro’s great stumbling block in his stride toward freedom is not the White Citizen’s Counciler or the Ku Klux Klanner, but the white moderate, who is more devoted to “order” than to justice; who prefers a negative peace which is the absence of tension to a positive peace which is the presence of justice; who constantly says: “I agree with you in the goal you seek, but I cannot agree with your methods of direct action”;

Just because missiles do not fly does not mean there is a positive peace.


It was interesting talking to my father, a former Christian minister, about AI. ChatGPT interactions had instilled some misconceptions and it was difficult to convince him that its responses were just cleverly weighted randomness. It produced compelling theological debate. I told him not to trust any chatbot unless it could cite verifiable sources, and when prompted for sources, ChatGPT could only fabricate them. Trust eroded.

As consolation I set up a vector index of The Works of Josephus (his interest at the time) and a StableBeluga chatbot. It answered questions fairly well, but most importantly it supplied the references that were used as context. In the end there was still just too much cultural and historical context missing for it to be a useful alternative to scholarly analysis.


I assume that you’ve read what Daniel Ellsberg had to say about the effect of having access to such information?

<https://www.motherjones.com/kevin-drum/2010/02/daniel-ellsbe...>


Sorry about this. I work at Stripe — could you email me at edwin@stripe.com and we can dig into this further?

My codebase is significantly larger than yours (mine's a mix of mostly C++ & some C) — perhaps 10–12 million lines. Clean builds are ~10m; clean-with-ccache builds are ~2m; incremental builds are milliseconds.

I know this probably won't help with your current project, but you should think of your compiler as an exotic virtual machine: your code is the input program, and the output executable is its output. Just like with a "real" CPU, there are ways to write a program that are fast, and ways to write a program that are slow.

To continue the analogy: if you have to sort a list, use `qsort()`, not bubble sort.

So, for C/++ we can order the "cost" of various language features, from most-expensive-to-least-expensive:

    1. Deeply nested header-only (templated/inline) "libraries";
    2. Function overloading (especially with templates);
    3. Classes;
    4. Functions & type definitions; and,
    5. Macros & data.

That means, if you were to look at my code-base, you'd see lots and lots of "table driven" code, where I've encoded huge swathes of business logic as structured arrays of integers, and even more as macros-that-make-such-tables. This code compiles at ~100 kloc/s.
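
As a toy illustration of that table-driven style (the rule set below is invented, not from the parent's codebase):

    #include <cstdio>

    // Toy illustration of "business logic as data": one small loop walks a
    // plain array of integers instead of a pile of templates. Tables like
    // this compile essentially instantly.
    struct ShippingRule { int maxWeightKg; int zone; int costCents; };

    static const ShippingRule kRules[] = {
        { 1, 1,  499 },
        { 1, 2,  899 },
        { 5, 1,  999 },
        { 5, 2, 1499 },
    };

    static int shippingCost(int weightKg, int zone) {
        for (const ShippingRule &r : kRules)
            if (weightKg <= r.maxWeightKg && zone == r.zone) return r.costCents;
        return -1;  // no rule matched
    }

    int main() { std::printf("%d\n", shippingCost(3, 2)); }  // prints 1499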

We don't use function-overloading: one place where we removed it reduced compile times from 70 hours to 20 seconds. Function-overloading requires the compiler to walk a list of functions, perform ADL, and then decide which is best. Functions that are "just C-like" require a hash-lookup. The difference is about a factor of 10,000 in speed. You can do "pretend" function-overloading by using a template + a switch statement, and letting template instantiation sort things out for you.

The last thing is we pretty much never allow "project" header files to include each other. More importantly, templated types must be instantiated once, in one C++ file, and then `extern`ed everywhere else. This is all the benefit of a template (write once, reuse), with none of the holy-crap-we're-parsing-this-again issues.
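
A minimal sketch of the instantiate-once-then-extern pattern (the class and its names are invented for illustration):

    // fixed_vec.h -- every TU can include this freely; the template body is
    // visible, but the "extern template" below stops each TU from implicitly
    // instantiating (and re-parsing/re-codegen'ing) the common instantiations.
    #include <cstddef>

    template <typename T, std::size_t N>
    class FixedVec {
    public:
        void push_back(const T &v) { data_[size_++] = v; }
        std::size_t size() const { return size_; }
    private:
        T data_[N]{};
        std::size_t size_ = 0;
    };

    extern template class FixedVec<int, 64>;   // "the instantiation lives elsewhere"

    // fixed_vec.cpp -- the one and only explicit instantiation:
    template class FixedVec<int, 64>;

    int main() {
        FixedVec<int, 64> v;   // uses the single shared instantiation
        v.push_back(42);
        return v.size() == 1 ? 0 : 1;
    }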


Preface: It's difficult to write about stuff like this and I know reading it back to myself after will leave me wanting to redraft it 5 times over but it's 1am already. But, if this post is wrong/incorrect or is missing other factors that are too important to leave out then please let me know and I'll amend or even delete it. My focus is on the technical side of things. So anyway:

> why isn't it the case for other image formats which encode color in RGB?

The term "RGB" doesn't mean anything w.r.t. gamut, gamma, dynamic-range, nor how necessarily non-linear transformations between color-spaces are performed. But all that's irrelevant: JPEG'S colour support is actually quite fine (...if you don't mind chroma subsampling), whereas the problem here relates to how JPEG decides what part of the image are important for high-quality preservation (more bits, more quality) than other areas (less bits, less quality).

Obligatory ADHD relevant-yet-irrelevant diversion: this excellent (fun and interactive!) article about how JPEG works: https://parametric.press/issue-01/unraveling-the-jpeg/

(Caution: incoming egregious oversimplification of JPEG):

Now, just imagine a RAW/DNG photo of the sky with some well-defined cumulus clouds in it, and another RAW/DNG photo of equal pixel dimensions and bit-depth (so if they were *.bmp files they'd have identical size), except the second photo shows only a grey overcast sky (i.e. the sky is just one giant miserable old grey duvet that needs washing). Now, the _information-theoretic_ content of the cumulus-clouds photo is higher than the overcast-clouds photo: the clearly-defined edges of the cumulus clouds are considered "high-frequency" data while the low-contrast bulbous gradients that make up the overcast photo are considered "low-frequency" data, and less overall information is required to reconstruct or represent that overcast sky compared to the cumulus sky.

This relates to JPEG because JPEG isn't simply a single "JPEG algorithm"; it's actually a sequence (or pipeline) of completely different algorithms that each operate on the previous algorithm's output. The specific step that matters here is the part of JPEG that was designed with the knowledge that human visual perception doesn't need much high-frequency data in a scene where the subject is largely low-frequency data (i.e. we don't need a JPEG to preserve fine details in overcast clouds to correctly interpret a photo of overcast clouds, whereas contrariwise we do need JPEG to preserve the fine details that make up the high-contrast edges between cumulus clouds and the blue sky behind them, otherwise we might think it's some other cloud formation with less well-defined edges: Altostratus? Cirrostratus? Cirrus? I'm not a meteorologist - just someone on the spectrum abusing analogies).

So my point so far is that JPEG looks for and preserves the details in the areas of an image where the high-contrast data is, while disregarding high-frequency data in an overall low-frequency scene where it thinks those details don't matter to us humans with our weird-to-a-computer visual perception system. I imagine by now I've made JPEG sound like a very effective image compression algorithm (well, it is...) - so where's the problem?
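
To make that concrete, here's a toy sketch of the relevant step: a naive 8x8 DCT followed by quantization with a made-up table that gets coarser at higher frequencies (deliberately not the real JPEG tables or code path). A hard-edged block survives with far more non-zero coefficients than a smooth, low-contrast one:

    #include <cmath>
    #include <cstdio>

    // Toy version of the relevant JPEG step: a naive 8x8 DCT followed by
    // quantization with a made-up table that gets coarser at higher
    // frequencies. Not real JPEG tables or code, just the shape of the idea.
    const int N = 8;

    static void dct8x8(const double in[N][N], double out[N][N]) {
        const double pi = 3.14159265358979323846;
        for (int u = 0; u < N; ++u)
            for (int v = 0; v < N; ++v) {
                double cu = (u == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
                double cv = (v == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
                double s = 0;
                for (int x = 0; x < N; ++x)
                    for (int y = 0; y < N; ++y)
                        s += in[x][y] * std::cos((2 * x + 1) * u * pi / 16.0)
                                      * std::cos((2 * y + 1) * v * pi / 16.0);
                out[u][v] = 0.25 * cu * cv * s;
            }
    }

    static int keptCoefficients(const double block[N][N]) {
        double c[N][N];
        dct8x8(block, c);
        int kept = 0;
        for (int u = 0; u < N; ++u)
            for (int v = 0; v < N; ++v) {
                double q = 4.0 + 6.0 * (u + v);           // fake quantization table
                if (std::lround(c[u][v] / q) != 0) ++kept;
            }
        return kept;
    }

    int main() {
        double edge[N][N], overcast[N][N];
        for (int x = 0; x < N; ++x)
            for (int y = 0; y < N; ++y) {
                edge[x][y]     = (y < 4) ? 40.0 : 220.0;  // cumulus-style hard edge
                overcast[x][y] = 120.0 + 2.0 * y;         // soft low-contrast ramp
            }
        std::printf("hard-edge block keeps %d coefficients after quantization,\n"
                    "soft-gradient block keeps %d\n",
                    keptCoefficients(edge), keptCoefficients(overcast));
    }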

...well, consider that when a person has a low level of melanin in their skin they will feature areas of high contrast on their faces: their skin's canvas is light - while facial features like folds, creases, crow's feet, varicose veins, spots, etc., are like the hard edges of those cumulus clouds from earlier: the original (pre-compression) image data will feature relatively high contrast where those features lie, and JPEG will try to preserve the details of that contrast by using more bits. So that's neat: JPEG tries to ensure that the parts of us that are detailed remain detailed after compression.

...but what if a person's face is naturally less... contrast-y? That's the problem: when a person simply has darker skin, there will be less contrast between the skin and the features of their face - as caught by a camera. So if a simplistic JPEG compressor is run against a United Colors of Benetton coffee-table book, or stills taken from the end of a certain Michael Jackson music video that I'm particularly fond of[4], then we will, as you can imagine, unfortunately see that the JPEG files of people with darker skin feature less definition and detail in their faces compared to the photos of people with lighter skin, simply due to how areas of higher and lower contrast are _mechanically_ processed by JPEG.

...which means that sloppily applying JPEG[1] to photos of everyone might work great for people who have faces with naturally high-contrast/high-frequency information on them - but not so great for everyone else - and I think we can agree that we can improve on that.

-----------

Confounding things, the distribution of melanin in the JPEG development working group was not representative of the general population in the US at the time[5], let alone the rest of the world, which in practice meant that less thought (if any) was given to JPEG's stock handling of people with lower overall skin-feature contrast (i.e. black people) - though no malice or ill-will or anything of the sort is required: like all of us (I assume...) they probably thought "these photographs of me (or people who generally look like me) processed by JPEG are good enough, it's been a long week, let's ship it and go for a pint" - without realising or appreciating that it wasn't "good enough" for large chunks of the population. At least, I hope that's the explanation... (Hanlon's Razor and all).

The situation can be compared to the... uh... colourful stories about the history of Kodak's colour film, its shortcomings, and the social consequences thereof[2]. Or, for something more recent: the misbehaving face-detecting webcams of the last decade that led HP to make a certain public statement[3].

-----------

[1] I say "sloppily" because lots of tooling for JPEG, especially at the pro level (think: Photoshop), gives you a lot of control over how much information loss is allowed for each macroblock. E.g. Photoshop's Save-for-Web dialog (the old, good one, not the new dumbed-down one) lets you use its own paintbrush tool to mask areas that should lose fewer bits (i.e. keep higher quality) even if the JPEG compressor doesn't think there's much value in that part of the image. MSPaint, on the other hand, does not.

[2] https://www.nytimes.com/2019/04/25/lens/sarah-lewis-racial-b...

[3] https://www.reuters.com/article/urnidgns852573c4006938800025...

[4] https://www.youtube.com/watch?v=F2AitTPI5U0

[5] I would like to mention that there was at least something for gender representation: the co-inventor of JPEG was a woman (no, I'm not referring to Lena.jpg) - Joan L. Mitchell, who sadly passed away relatively young a few years ago: https://en.wikipedia.org/wiki/Joan_L._Mitchell


There's not much to say aside from "the physical shape of the thing matters", and that people are discovering new shapes all the time.

I don't know if there's an easy introduction. But lemme try to make one really quick anyway.

* Transistors themselves are just hunks of metal and silicon (n-type and p-type) arranged on a wafer in a particular way. They are physical objects, and you will do well to remember that. New "shapes" are invented all the time: FinFET, GAA (Gate All Around), and Nanosheets (brought up in this article) are "just" different shapes, with GAA having the best attributes so far.

* The primary difficulty isn't really about "coming up" with a better shape. Everyone knew GAA would have the best attributes compared to the others. The question is: how do you __MANUFACTURE__ the darn thing? These things are nanometers in size; it's not very easy to make these shapes when your shapes are so incredibly tiny.

* Transistors are analog devices, not binary/digital ones. You need an arrangement of transistors to do something: diode logic, diode-resistor logic, transistor-transistor logic, nMOS, and other arrangements have existed in the past. But... CMOS is the big winner from the 1970s onwards. As such, understanding how transistors are arranged to make your AND/OR/NAND/flip-flops is kinda important. That being said, I'll skip over the details, aside from saying "CMOS" is the status quo, and has the following characteristics.

* In CMOS-arrangements... when the transistor is "on", you want less resistance. A transistor that offers 0.1 ohms of resistance will be better than one that offers 0.5 ohms of resistance (within the realm of CMOS)

* When the transistor is "off", you want less leakage. The switch will always "leak" some electrons down the wrong path; it's the nature of a physical object. A transistor that leaks 1 femtoamp is better than a transistor that leaks 5 femtoamps. (Temperature dependent: the hotter a transistor is, the more it leaks.) Again, CMOS-specific.

* Transistors take a certain amount of time to switch from 0 to 1, largely based off of the gate-capacitance. The lower the capacitance, the faster you can turn the switch on (or off). Being able to go from 0V to 1V in 0.1 nanosecond with 1 femtoamp of electricity... is better than doing the same in 0.2 nanoseconds.

* Note: there is a CMOS specific tradeoff mentioned here. Maybe you can keep the same clockrate (5GHz / 0.2 nanoseconds) but use 1/2 the power (0.5 femtoamps instead of 1 femtoamp). In practice, this relationship is complicated as it varies with voltage, but there's usually a region where 2x the voltage leads to 2x the current and 1/2 the delay (aka 2x the clock rate for 4x power consumption). Or... 1/2 the voltage is 1/2 the current and 2x the delay (aka: 1/2 speed for 1/4th power).

* For CMOS, less capacitance to charge on each switch means faster switching (meaning more GHz) and lower power usage (meaning more power efficiency), and less leakage means less power wasted while idle. Getting there requires the "gate" of the transistor to grip the channel more tightly, which you do by wrapping the gate around more and more of the channel's surface. Surface area and capacitive coupling go hand in hand (that relationship applies to your hands, feet, desk, etc. It's how your phone knows where your finger is: by the amount of surface area your finger presents to the screen. As that surface area changes, the coupling changes and the phone tracks your finger as it moves). A first-order sketch of the speed/power arithmetic is at the end of this comment.

* The same surface-area logic applies to nano-scale objects like transistors. When you stand the channel up as a "fin" (aka FinFET), the gate drapes over three of its sides, giving it far more contact area with the channel than a planar (flat) transistor, so it switches harder and leaks less. FinFET became standard like 5 to 10 years ago. To go even further you need an "even better" shape (where "better" means the gate wraps even more of the channel). This is called "GAA", gate-all-around, where the gate surrounds the channel entirely (physically above, below, and left and right).

* Photolithography + magic is how these things are physically constructed. So-called "planar" transistors made sense and are relatively simple: you basically shove a bunch of chemicals onto the silicon, and then "reverse take a picture" of it (taking your film, shining light through the film, and then shoving that light through a lens to shrink it down. Like photographs in the 1980s, but backwards). Because "planar" transistors are all flat, it was obvious how to make them.

* But how do you _MAKE_ a GAA? Well, no one will tell us. They just show us the pictures of them successfully doing it. The secret sauce is in their magic processes that deposits the bits of metal / silicon / etc. etc. in the correct spots. Photolithography is an innately 2D process: built up layer-by-layer by successive chemicals + light emitted from a film-like substance. They had to make this shape from a bunch of 2D steps (maybe 120+ such steps) played out over the course of 2 or 3 months.

--------

So TL;DR: the name of the game is:

1. Think of a shape that gives the gate more surface area around the channel (better electrostatic control).

2. Figure out a way to take ~120+ steps of the photolithography process to actually _make_ that shape in practice. And remember: you're mass-producing 10 billion of these per chip, so you want to make sure whatever process you use is 99.99999% reliable. A single mistake will cause the chip to be worthless.

That's it. Really. All the "better" shapes have more surface area. The "older" shapes were easier to figure out on #2, while the "future" shapes look really hard for #2, but are obviously better from a surface area perspective.
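
For the curious, here is the textbook first-order arithmetic behind the speed/power points above, as a tiny sketch (all numbers are made up; none of this is from the parent comment):

    #include <cstdio>

    // Textbook first-order CMOS approximations, nothing more:
    //   dynamic power      P ~ C * V^2 * f
    //   time to swing a node by V with drive current I:  t ~ C * V / I
    // All numbers below are illustrative; the point is only how C, V and f trade off.
    int main() {
        const double C = 1e-15;  // switched capacitance, farads (1 fF)
        const double V = 1.0;    // supply voltage, volts
        const double f = 5e9;    // switching frequency, hertz
        const double I = 50e-6;  // drive current, amps

        std::printf("baseline:       P=%.2e W   t=%.2e s\n",
                    C * V * V * f, C * V / I);
        std::printf("half the C:     P=%.2e W   t=%.2e s\n",
                    (C / 2) * V * V * f, (C / 2) * V / I);
        std::printf("half V, half f: P=%.2e W   (1/8 of baseline)\n",
                    C * (V / 2) * (V / 2) * (f / 2));
    }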


Um. Now I feel like I'm 106 instead of "just" 53.

OK, so, basically all modern mass-market OSes of any significance derive in some way from 2 historical minicomputer families... from the same company.

Minicomputers are what came after mainframes, before microcomputers. A microcomputer is a computer whose processor is a microchip: a single integrated circuit containing the whole processor. Before the first one was invented in 1971 (the Intel 4004), processors were made from discrete logic: lots of little silicon chips.

The main distinguishing feature of minicomputers from micros is that the early micros were single-user: one computer, one terminal, one user. No multitasking or anything.

Minicomputers appeared in the 1960s and peaked in the 1970s, and cost just tens to hundreds of thousands of dollars, while mainframes cost millions and were usually leased. So minicomputers could be afforded by a company department, not an entire corporation... meaning that they were shared, by dozens of people. So, unlike the early micros, minis had multiuser support, multitasking, basic security and so on.

The most significant minicomputer vendor was a company called DEC: Digital Equipment Corporation. DEC made multiple incompatible lines of minis, many called PDP-something -- some with 12-bit logic, some with 16-bit, 18-bit, or 36-bit logic.

One of its early big hits was the 12-bit PDP-8. It ran multiple incompatible OSes, but one was called OS/8. This OS is long gone but it was the origin of a command-line interface with commands such as DIR, TYPE, DEL, REN and so on. It also had a filesystem with 6-letter names (all in caps) with a semi-standardised 3-letter extension, such as README.TXT.

This OS and its shell later inspired Digital Research's CP/M OS, the first industry-standard OS for 8-bit micros. CP/M was going to be the OS for the IBM PC but IBM got a cheaper deal from Microsoft for what was essentially a clean-room re-implementation of CP/M, called MS-DOS.

So DEC's PDP-8 and OS-8 directly inspired the entire PC-compatible industry, the whole x86 computer industry.

Another DEC mini was the 18-bit PDP-7. Like almost all DEC minis, this too ran multiple OSes, both from DEC and others.

A 3rd-party OS hacked together as a skunkworks project on a disused spare PDP-7 at AT&T's research labs was UNIX.

More or less at the same time as the computer industry gradually standardised on the 8-bit byte, DEC also made 16-bit and 32-bit machines.

Among the 16-bit machines, the most commercially successful was the PDP-11. This is the machine that UNIX's creators first ported it to, and in the process, they rewrote it in a new language called C.

The PDP-11 was a huge success so DEC was under commercial pressure to make an improved successor model. It did this by extending the 16-bit PDP-11 instruction set to 32 bits. For this machine, the engineer behind the most successful PDP-11 OS, called RSX-11, led a small team that developed a new, pre-emptive multitasking, multiuser OS with virtual memory, called VMS.

VMS is still around: it was ported to DEC's Alpha, one of the first 64-bit RISC chips, and later to the Intel Itanium. Now it has been spun out from HP and is being ported to x86-64.

But the VMS project leader, Dave Cutler, and his team, were headhunted from DEC by Microsoft.

At this time, IBM and Microsoft had very acrimoniously fallen out over the failed OS/2 project. IBM kept the 32-bit x86 version of OS/2 for the 386, which it completed and sold as OS/2 2 (and later 2.1, 3, 4 and 4.5; it is still on sale today under the name Blue Lion from Arca Noae).

At Microsoft, Cutler and his team got given the very incomplete OS/2 version 3, a planned CPU-independent portable version. Cutler _et al_ finished this, porting it to the new Intel RISC chip, the i860. This was codenamed the "N-Ten". The resultant OS was initially called OS/2 NT, later renamed – due to the success of Windows 3 – as Windows NT. Its design owes as much to DEC VMS as it does to OS/2.

Today, Windows NT is the basis of Windows 10 and 11.

So the PDP-7, PDP-8 and PDP-11 directly influenced the development of CP/M, MS-DOS, OS/2, & Windows 1 through to Windows ME.

A different line of PDPs directly led to UNIX and C.

Meanwhile, the PDP-11's 32-bit successor directly influenced the design of Windows NT.

When micros grew up and got to be 32-bit computers themselves, and vendors needed multitasking OSes with multiuser security, they turned back to 1970s mini OSes.

This project is a FOSS re-implementation of the VAX CPU on an FPGA. It is at least the 3rd such project but the earlier ones were not FOSS and have been lost.


Sounds like https://www.delimiter.com/slot-hosting/ which I’ve used in the past. Same cost per drive, but 4x the bandwidth included.

The Windows 10 login screen definitely uses the UWP XAML framework. And no, I'm not revealing any inside knowledge; I knew this fact before I joined Microsoft (because I used to develop a third-party screen reader for Windows).

I don't think ACID terminology is vague at all, and it sounds like you're trying to fit a square peg into a round hole here, terminology-wise.

Atomicity (A) means that changes must be committed or not committed, "all or nothing". If you commit the change set (X, Y) then upon successful commit, both X and Y must be present; if either X or Y are missing, it's not atomic. Conversely, if the commit fails, no changes may have been made.

Consistency (C) means that data is always valid, according to whatever rules are imposed by the data model. For example, classical RDBMSes enforce referential integrity (aka foreign keys), "not null" constraints, unique primary keys, etc. Consistency is the guarantee that every update conforms to these rules; a transaction cannot be committed if it doesn't. Consistency has nothing to do with conflict resolution (although in a concurrency environment, you do need both).

Isolation (I) means that one transaction must create the illusion that it is isolated from all other transactions, as though all transactions were applied serially. Any concurrent commits during the transaction must not be visible to it. Most databases implement a less strict level of isolation by default that is often called "read committed"; the transaction can see any changes from parallel transactions that are committed during the transaction (which means that a query may return different results if run multiple times), but it will not see uncommitted changes from other transactions. Many databases do implement the "serializable" isolation level, and will fail if you try to execute two conflicting transactions at the same time.

Durability (D) means that transactions must remain permanently stored after they are committed. This is pretty much the vaguest rule, since there are too many variables in real life: it doesn't say anything about redo/undo logs, RAID caches, etc.

It should be added that ACID makes the most sense in situations where you combine multiple updates in a single transaction. ACID is of course useful for single-key, or single-object, updates, but it really comes into play when you have longer-running aggregate updates that need to perform both reads and writes across a bunch of different sets of data.
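
As a concrete illustration of the "all or nothing" part (SQLite is used here purely as a convenient embedded example; nothing in the comment above is specific to it):

    // build: c++ acid_demo.cpp -lsqlite3
    #include <cstdio>
    #include <sqlite3.h>

    // Atomicity sketch: wrap two dependent writes in one transaction so a
    // failure part-way through leaves the database unchanged.
    static bool run(sqlite3 *db, const char *sql) {
        char *err = nullptr;
        if (sqlite3_exec(db, sql, nullptr, nullptr, &err) != SQLITE_OK) {
            std::fprintf(stderr, "error: %s\n", err ? err : "unknown");
            sqlite3_free(err);
            return false;
        }
        return true;
    }

    int main() {
        sqlite3 *db = nullptr;
        if (sqlite3_open(":memory:", &db) != SQLITE_OK) return 1;
        run(db, "CREATE TABLE accounts(id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)");
        run(db, "INSERT INTO accounts VALUES (1, 100), (2, 0)");

        // Transfer 40 from account 1 to account 2 as a single atomic change set.
        bool ok = run(db, "BEGIN")
               && run(db, "UPDATE accounts SET balance = balance - 40 WHERE id = 1")
               && run(db, "UPDATE accounts SET balance = balance + 40 WHERE id = 2");
        run(db, ok ? "COMMIT" : "ROLLBACK");  // either both updates persist or neither does

        sqlite3_close(db);
        return ok ? 0 : 1;
    }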


motherboard features such as x16 or 2x8 are achieved with "pcie mux" chips. these are devices which select which of N pairs of differential wires is attached to the input/output differential pair. search for "pcie mux" will find many, such as [0]. if you look at the diagram you'll see that it connects wire pair A+/A- to either B+/B- or C+/C- based on the value of the SEL line.

these are generally basic passive devices operating at the analog signal level, with no higher-layer activity required. however, some exist which operate as "retimers", which do participate in the lowest layer of the PCIe electrical protocols (generally to extend reach). these are unlikely to be used for a typical x16 <-> 2x8 sort of motherboard feature though.

the example i picked here is 4 lanes, and you would need 4 such chips to do a x16 <-> 2x8. (spoiler: you mux lanes 8-15 from slot X to lanes 0-7 of slot Y, and there are both TX and RX pairs which need muxing.)

there do exist devices called "pcie switches" which operate at all layers of the pcie protocols, and allow for all sorts of sharing of the point-to-point links. examples at microsemi [1] ... for example a 48 lane switch could be used to connect two 16 lane GPUs to a 16 lane slot. this would allow either of the GPUs to burst to the full 16 lanes, or on average if both GPUs are communicating with the host then they would see 8 lanes of bandwidth. there's a picture of such a dual GPU card in this article [2], you can see the PCIe switch ASIC centered in between the two GPUs, above and to the right of the edge connector.

[0] http://www.ti.com/product/HD3SS3412

[1] https://www.microsemi.com/product-directory/ics/3724-pcie-sw...

[2] https://graphicscardhub.com/dual-gpu-graphics-cards/


Simply, what makes it special is I had a hand in it. Every time you fly on one, you're betting your life on my parts. I know how it works, and I know why it's safe.

I worked specifically on the stabilizer trim system (like the one in the news on the 737MAX), and did some work on the elevator system. In particular on the latter, I did many calculations to prove it would not flutter (dynamic instability).

My very first assignment was to size the stabilizer trim jackscrew. I panicked and told my lead I had no idea how to do that. He laughed and said of course you do, it's a simple column buckling problem. Which of course it was, and I sized it.

A couple years later, and the first jackscrew gearbox assembly came off the line, and was doomed to be subjected to the ultimate load test. Any buckling, cracks, deformation, etc., would be a failure. The test guys told me they were gonna bust my jackscrew.

They hooked it up to this big ugly iron I-beam with a hydraulic ram to compress my green BMS10-11 painted gearbox and shiny chrome steel jackscrew (made by Saginaw Gear, who made the best kick-ass forgings).

They started cranking up the pressure, while I stood around anxiously watching it. Slowly, the I-beam bent into a nice curve. We didn't have to make any changes to any of the stab trim system due to test failures. Whaddya know, the math works! So I am not a bit afraid to fly on a 757. The seating can be cramped, but that's the way it goes these days.

As far as I know, there have been no in-service failures of that system. The 757 itself has a fantastic safety record, which I am proud to have contributed to. I flew to the last D conference on a 757 operated by Icelandair, yay!

At the time I bought a bunch of Boeing stock as a result of my confidence in Boeing, and it has paid off handsomely.


To be clear, the profiling of Powerpoint in the middle of the presentation was a stunt, planned in advance. I just didn't tell anybody I was going to do that.

At that point I was an ex-Microsoft person giving a talk at a Microsoft conference using Microsoft tools to profile Microsoft's presentation software. It may have been a cheeky thing to do, but it was so much fun.

I'm not aware of any publicly available video but I did a writeup of the issue: https://randomascii.wordpress.com/2011/08/29/powerpoint-poor...


I created something similar years ago and it's open source:

https://stream.ht/

Try it out, you can pipe your terminal:

  exec > >(nc stream.ht 1337) 2>&1

Or just pipe a file:

  tail -F file.log | nc stream.ht 1337

Pipe htop (delay is required to see share url)

  (sleep 5; htop) | nc stream.ht 1337

Doesn't require installing anything! It uses netcat, which is already present on most *nix systems, but you can use a plain TCP socket connection to pipe data too:

  exec 3<>/dev/tcp/stream.ht/1337 && head -1 <&3 && exec &> >(tee >(cat >&3))

Yep, the "hack" (X86EmulatorPkg) allows running the card in the UEFI out of the box. AMD also provides native aarch64 builds of their UEFI GOP driver though. And none of this is necessary for running the GPU in the OS – amdgpu POSTs the GPU just fine.

> memory-cacheability-attributes issues

Recently I've added aarch64 support to FreeBSD's port of the DRM/KMS drivers :) Took a couple hours to realize that our implementations of Linux's mapping functions used normal uncacheable memory instead of device memory – fixing that stopped the hangs on driver load and allowed everything to work.

Then there was some corruption on the screen – our drm-kms is from Linux 5.0 for now, and I've had to cherry-pick a fix that only landed in 5.1 I think: https://patchwork.kernel.org/patch/10778815/


I've written documents for Jeff, and IMO, the six-page narrative memo is a key part of Amazon's success. It's so easy to fool both yourself and your audience with an oral presentation or powerpoint slides. With narrative text that has to stand on its own, there is no place for poor reasoning to hide. Amazon's leadership makes better decisions than their competitors in part because they are routinely supplied with better arguments than their competitors.

"Writing is nature's way of letting you know how sloppy your thinking is." -Dick Guindon, via Leslie Lamport


That's not a particularly good article with regards to high performance techniques.

You wouldn't be using compression or encryption for a file that you wanted to be able to submit asynchronous file I/O writes to in a highly concurrent network server. Those have to be synchronous operations. You'd do everything you can to use TransmitFile() on the hot path.

If you need to sequentially write data, wanted to employ encryption or compression, and reduce the likelihood of your hot-path code blocking, you'd memory map file-sector-aligned chunks at a time, typically in a windowed fashion, such that when you consume the next one you submit threadpool work to prepare the one after that (which would extend the file if necessary, create the file mapping, map it as a view, and then do an interlocked push to the lookaside list that the hot-path thread will use).

I use that technique, and also submit prefaults in a separate threadpool for the page ahead of the next page as I consume records I'm writing to. Before you can write to a page, it needs to be faulted in, and that's a synchronous operation, so you'd architect it to happen ahead of time, before you need it, such that your hot-path code doesn't get blocked when it writes to said page.

That works incredibly well, especially when you combine it with transparent NTFS compression, because the file system driver and the memory manager are just so well integrated.

If you wanted to do scatter/gather random I/O asynchronously, you'd pre-size the file ahead of time, then simply dispatch asynchronous writes for everything, possibly leveraging SetFileIoOverlappedRange such that the kernel locks all the necessary sections into memory ahead of time.

And finally, what's great about I/O completion ports in general is they are self-aware of their concurrency. The rule is always "never block". But sometimes, blocking is inevitable. Windows can detect when a thread that was servicing an I/O completion port has blocked and will automatically mark another thread as runnable so the overall concurrency of the server isn't impacted (or rather, other network clients aren't impacted by a thread's temporary blocking). The only service that's affected is to the client that triggered whatever blocking I/O call there was -- it would be indistinguishable (from a latency perspective) to other clients, because they're happily being picked up by the remaining threads in the thread pool.

I describe that in detail here: https://speakerdeck.com/trent/pyparallel-how-we-removed-the-...

> > Be careful when coding for asynchronous I/O because the system reserves the right to make an operation synchronous if it needs to. Therefore, it is best if you write the program to correctly handle an I/O operation that may be completed either synchronously or asynchronously.

That's not the best wording they've used given the article is also talking about blocking. If you've followed my guidelines above, a synchronous return is actually advantageous for file I/O because it means your request was served directly from the cache, and no overlapped I/O operation had to be posted.

And you know all of the operations that will block (and they all make sense when you understand what the kernel is doing behind the scenes), so you just don't do them on the hot path. It's pretty straightforward.
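
For the curious, here's a bare-bones sketch of the overlapped-write-plus-completion-port pattern described above, including the synchronous-completion case (file name and sizes are illustrative; a real server keeps a pool of threads parked on the port):

    #include <windows.h>
    #include <cstdio>

    // Bare-bones overlapped file write against an I/O completion port.
    // Error handling trimmed to the essentials.
    int main() {
        HANDLE file = CreateFileW(L"demo.bin", GENERIC_WRITE, 0, nullptr,
                                  CREATE_ALWAYS, FILE_FLAG_OVERLAPPED, nullptr);
        if (file == INVALID_HANDLE_VALUE) return 1;

        // Create a completion port and associate the file handle with it.
        HANDLE port = CreateIoCompletionPort(file, nullptr, /*CompletionKey=*/1, 0);
        if (!port) return 1;

        static char buffer[4096] = {'x'};
        OVERLAPPED ov = {};   // Offset/OffsetHigh = 0: write at the start of the file
        BOOL ok = WriteFile(file, buffer, sizeof buffer, nullptr, &ov);
        if (!ok && GetLastError() != ERROR_IO_PENDING) return 1;
        // ok == TRUE means the write completed synchronously (e.g. straight into
        // the cache); a completion packet is still queued to the port by default.

        DWORD bytes = 0;
        ULONG_PTR key = 0;
        OVERLAPPED *done = nullptr;
        if (GetQueuedCompletionStatus(port, &bytes, &key, &done, INFINITE)) {
            std::printf("completed %lu bytes (key %llu)\n",
                        (unsigned long)bytes, (unsigned long long)key);
        }

        CloseHandle(port);
        CloseHandle(file);
    }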


  In an effort to get people to look 
  into each other’s eyes more, 
  and also to appease the mutes, 
  the government has decided 
  to allot each person exactly one hundred   
  and sixty-seven words, per day. 
  
  When the phone rings, I put it to my ear   
  without saying hello. In the restaurant   
  I point at chicken noodle soup. 
  I am adjusting well to the new way. 

  Late at night, I call my long distance lover,   
  proudly say I only used fifty-nine today.   
  I saved the rest for you. 

  When she doesn’t respond, 
  I know she’s used up all her words,   
  so I slowly whisper I love you 
  thirty-two and a third times. 
  After that, we just sit on the line   
  and listen to each other breathe.
  
  The Quiet World by Jeffrey McDaniel

You might want to just learn the physics of these things.

When you have any solid object, it's got a crystalline lattice structure at the molecular level. Do you remember valence-shell electrons from chemistry? They're the outermost shell. They are only tenuously connected to their atom and so it's easy to push them around, but the atom wants to be electrically neutral (#electrons = #protons).

In a conductor, let's say copper, you can shove on one end of the crystal with an electric field and induce a movement of those valence-shell electrons that propagates your shove across the lattice. You could measure how much energy each electron gets from its shove; we'll call this voltage (energy per unit charge). The rate of charge moving through a particular cross-section is called current. In a perfect conductor, the voltage of the wave doesn't drop off as it moves along the conductor. We can think of wires as being pretty good conductors, and in diagrams they exist as platonic perfect conductors. Also note that the amount of current into one end of the wire has to equal the current out, or you would be losing or gaining a net electric charge.

So now you know the rules for a straight line in a diagram. The rest is learning how electric fields behave in other kinds of bulk materials. In a resistor, the voltage change across the entire resistor is linearly proportional to the flow of current through the resistor.

In a capacitor, we place two wide metal plates across from each other, separated by a small distance. This doesn't form a complete connection - charges can't cross - but initially current flows in to charge the plates, then dies off as the voltage difference between the two opposing plates equalizes with the voltage applied to the capacitor. So these devices hold a charged electric field, and they allow high-frequency changes in voltage to pass through, while low frequencies are blocked once the plates saturate with charge.

Inductors are similar to capacitors, but the energy is stored in a magnetic field. If you wind some wire into a coil, the magnetic field from the coil acts as a sort of flywheel for the current. These devices allow low frequencies to pass through.
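
If you want numbers to hang that high-pass/low-pass intuition on: the usual small-signal impedance magnitudes are 1/(2*pi*f*C) for a capacitor and 2*pi*f*L for an inductor. A tiny sketch with made-up component values:

    #include <cmath>
    #include <cstdio>

    // Small-signal impedance magnitudes (made-up component values):
    //   capacitor: |Zc| = 1 / (2*pi*f*C)   -- huge at low f, tiny at high f
    //   inductor:  |Zl| = 2*pi*f*L         -- tiny at low f, huge at high f
    // which is the "caps pass high frequencies, inductors pass low ones" rule.
    int main() {
        const double pi = 3.14159265358979323846;
        const double C = 1e-6;   // 1 uF capacitor
        const double L = 1e-3;   // 1 mH inductor
        const double freqs[] = {50.0, 1e3, 1e6};   // hertz
        for (double f : freqs)
            std::printf("f=%9.0f Hz   |Zc|=%10.2f ohm   |Zl|=%10.2f ohm\n",
                        f, 1.0 / (2 * pi * f * C), 2 * pi * f * L);
    }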

Diodes are made out of crystalline solids called semiconductors, where a pure ingot of silicon (a poor conductor on its own) is doped with an element like boron or arsenic. Depending on whether the dopant has fewer or more valence electrons than silicon, we get either a p-type or an n-type material. If we put a p-type next to an n-type, there's a thin band near the junction where the opposite types "cancel out" and no charge can pass. But if we apply a small voltage across the ends of the diode, that cancelled-out region shrinks thinner and thinner until eventually charge does pass through. So these are sort of like a one-way valve for circuits, and you need about 0.7V in the forward direction to get them going.


Looks like I already do pretty much all those things in twoskip:

https://blog.fastmail.com/2016/12/03/cyrus-databases-twoskip...

(except for the level choice. Which is a bit meh to me, because a write will probably trigger 3 fsyncs, so the cost of level choice is very low)

That said, my next DB won't have any skiplists in it I don't think, or at most an in-memory one, but not on disk.

