Hacker News | ack_complete's comments

Perforce does not lock files on checkout unless you have the file specifically configured to enforce exclusive locking in the file's metadata or depot typemap.

Not to dismiss the thought that's been put into this, but the aspect ratio of the image (display aspect ratio) is almost always the wrong way to approach this. Retro computers typically have a limited number of dot clocks and video timings compared to the frame sizes that they can support. The Atari 8-bit, for example, supports only a single pair of horizontal and vertical refresh timings, but is very flexible regarding how wide and tall the display is. If you try to estimate aspect ratios based on the image area, you'll get lots of variance because many games deviate from the 320x192 area that the operating system uses.

On top of that, it's a mistake to assume that the original images had squares. With a small number of pixels per tile and both an aesthetic and efficiency requirement for the tiles to be uniform, it's highly likely that what look like squares actually weren't -- they could only get so close with the available resolution. I see a lot of people try to guess aspect ratios based on what they think the art was intended to look like, and on multiple occasions have had to boot games on original hardware and contemporary displays to prove that, no, what looked like a circle was actually slightly elliptical.

The correct way to approach this is by starting from the pixel dot clock and video timings to determine pixel aspect ratio and work backward to display aspect ratio. This also reveals another typical sign that aspect ratios are being determined wrong -- when significantly different pixel aspect ratios are determined for multiple systems that all supported NTSC artifacting colors, like the Apple II and the Atari. Supporting NTSC artifacting means NTSC-compatible timings and a dot clock that is an integer multiple of the color subcarrier, which means similar pixel aspect ratios.


DAR is perfectly fine to use, as long as the entire image (including any borders) is included in the calculations. Thus, all images from the same system should end up with the same scaling factor.


You can, but it's not a useful basis for comparison. The full pixel display area including borders for an Atari 8-bit system is 352x240. The 22:15 ratio that comes out of this is not generally useful, because most displays do not show this full area, nor can it be compared to broadcast specifications to determine how it will be nominally displayed. It certainly is not comparable to the 4:3 ratio that is frequently used to try to fit retro system displays.

The pixel aspect ratio is not affected by how large the active display region is. Displays can't even detect the border if the border is at blanking level black as older systems tend to do. It's determined by the horizontal/vertical timings and the pixel clock. Those can be compared to the specifications for NTSC/PAL square pixels to calculate the resulting display size and aspect ratio on a standard-tuned display for a given image pixel size.


Right. You're not going to use V-size and H-size to remove the borders because that screws with literally every other use of the TV (other computers, TV shows, etc.).

About the only way to properly calibrate what the borders "should" be is to calibrate the TV to what would be a reasonable approximation for SD TV signals, and POSSIBLY make small adjustments after that point if the computer looks wrong.

Even then, every TV is going to be somewhat different and so there's a huge amount of variance on how it's going to look in the end. Same applies for computer monitors back then, though calibration of an RGB monitor is going to be even harder than composite since you can't easily run a VCR over it to try to get SD TV calibration.


Windows allows loading process-private registry hives without elevation using the RegLoadAppKey() function. This is used by Visual Studio.
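A minimal Windows-only sketch of how that looks; "settings.dat" is a hypothetical hive file created earlier with RegSaveKey():

```c
#include <windows.h>
#include <stdio.h>

int main(void) {
    HKEY hive;
    /* Loads the hive privately for this process; no elevation needed. */
    LSTATUS rc = RegLoadAppKeyW(L"settings.dat", &hive,
                                KEY_READ | KEY_WRITE, 0, 0);
    if (rc != ERROR_SUCCESS) {
        fprintf(stderr, "RegLoadAppKey failed: %ld\n", (long)rc);
        return 1;
    }
    /* hive now behaves like any other HKEY, but it is not linked under
       HKEY_LOCAL_MACHINE or HKEY_USERS, and other processes can't see it. */
    RegCloseKey(hive);
    return 0;
}
```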

https://visualstudioextensions.vlasovstudio.com/2017/06/29/c...


Yeah, several paragraphs down TFA mentions that unprivileged (and documented) hive loading was introduced in Vista, which checks out as far as my knowledge of Windows goes :)


There's a subtlety -- word adds are only 8 cycles when adding to an address register. They're 4 cycles to a data register. This is because the 68000 always does address computations in 32-bit, and 16-bit operands are sign extended to 32-bit when adding to an address register. A word add to a data register, on the other hand, only produces a 16-bit result. This is reflected by the canonical instruction being ADDA.W instead of ADD.W for address register destinations.


Direct3D doesn't, but the kernel can eat exceptions if 32-bit code triggers an exception from a user mode callback on a 64-bit system. Rendering code is vulnerable to this when triggered from a WM_PAINT message. The call SetProcessUserModeExceptionPolicy() is needed to override the default behavior:

https://code.google.com/archive/p/crashrpt/issues/104

It was introduced in a Windows 7 update and documented only in a knowledge base article rather than the regular Win32 docs; that article has since been removed, so information on it is harder to find these days.
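Since the API never made it into the SDK headers, the usual pattern is to resolve it dynamically; a sketch (Windows-only, flag value from the old KB article):

```c
#include <windows.h>

#ifndef PROCESS_CALLBACK_FILTER_ENABLED
#define PROCESS_CALLBACK_FILTER_ENABLED 0x1
#endif

typedef BOOL (WINAPI *SetPolicyFn)(DWORD dwFlags);
typedef BOOL (WINAPI *GetPolicyFn)(LPDWORD lpFlags);

/* Stop the kernel from silently swallowing exceptions raised in
   user-mode callbacks (e.g. inside WM_PAINT) in 32-bit processes
   on 64-bit Windows. */
void DisableUserModeCallbackFilter(void) {
    HMODULE k32 = GetModuleHandleW(L"kernel32.dll");
    GetPolicyFn getPolicy = (GetPolicyFn)
        GetProcAddress(k32, "GetProcessUserModeExceptionPolicy");
    SetPolicyFn setPolicy = (SetPolicyFn)
        GetProcAddress(k32, "SetProcessUserModeExceptionPolicy");
    DWORD flags;
    if (getPolicy && setPolicy && getPolicy(&flags))
        setPolicy(flags & ~PROCESS_CALLBACK_FILTER_ENABLED);
}
```

The GetProcAddress dance also makes the call a no-op on older Windows versions where the export doesn't exist.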


The Registry keys for WER, even the per-application ones, are all under HKEY_LOCAL_MACHINE. They cannot be set without elevation. WER is also useless if you want to capture contextual in-process data about the crash.

This problem is so rampant that even Office hooks SetUnhandledExceptionFilter.
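The hook itself is a one-liner; the value is in what you capture inside the filter. A sketch (Windows-only), with the actual reporting call left hypothetical:

```c
#include <windows.h>

static LONG WINAPI OnUnhandledException(EXCEPTION_POINTERS *info) {
    /* Record in-process context here: application state, a minidump,
       log buffers, etc. WriteCrashReport() is a hypothetical hook. */
    /* WriteCrashReport(info); */
    return EXCEPTION_CONTINUE_SEARCH;  /* then let WER proceed */
}

void InstallCrashHandler(void) {
    SetUnhandledExceptionFilter(OnUnhandledException);
}
```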


Requiring elevation wasn't uncommon for games or most applications back then. Installing to local app data is somewhat new, though platforms like Steam smartly modified the NTFS permissions on their own app dir so that binary deployment specifically wouldn't need elevation; installing other components like the C runtime during game install would still require it.

Office is a poor example of 'what to do'. The title bar is a hack, and it only supports ~214-character path lengths even though the Win32 API limit has been lifted to 32k.


Sure, if an application uses an elevated installer -- but as you note, not all do. It does look like WER may support options being set in HKCU (per-user) as well as HKLM (machine-wide), which would be a way of handling local installs.

I wouldn't characterize Steam's world-writable folder strategy as smart, compared to a more secure model using an elevated downloader and installer.

I fail to see what Office's title bar rendering has to do with its exception handling strategy. As for path length handling, Office also hosts a large in-process plugin ecosystem, so it has to be conservative with such application-level policy changes.


I'm surprised there's no mention of store forwarding in that discussion. The -O3 codegen is bonkers, but the -O2 output is reasonable. In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure that would negate the benefit of merging the loads. In a non-inlined, non-PGO scenario the compiler doesn't have enough information to tell whether the optimization is suitable.


> In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure

It actually depends on the uarch; Apple silicon doesn't seem to have this restriction: https://news.ycombinator.com/item?id=43888005

> In a non-inlined, non-PGO scenario the compiler doesn't have enough information to tell whether the optimization is suitable.

I guess you're talking about stores and load across function boundaries?

Trivia: X86 LLVM has a whole pass just to prevent this partial-store-to-load issue on Intel CPUs: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Targ...


> In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure that would negate the benefit of merging the loads

Would that failure be significantly worse than separate loading?

Just negating the optimization wouldn't be much reason against doing it. A single load is simpler and in the general case faster.


Usually, yeah, it's noticeably worse than using individual loads and stores as it adds around a dozen cycles of latency. This is usually enough for the load to light up hot in a sampling profile. It's possible for that extra latency to be hidden, but then in that case the extra loads/stores wouldn't be an issue either.


That Unity itself or Unity-based games use LGPL components doesn't matter. What matters here is what is allowed on the Unity Asset Store. There is nothing requiring Unity to allow everything on their Asset Store that could be linked into a Unity game, and apparently at the time, the Provider agreement simply said: you can't sell LGPL assets on the store.

It isn't surprising or unreasonable that the Store might have additional requirements, and there are plenty of reasons to do so. One is Unity limiting their risks as a distributor of third-party content. Another is that the Unity Asset Store does not require assets sold to be used with Unity, and for some assets it can be allowed depending on the specific asset's license:

https://support.unity.com/hc/en-us/articles/34387186019988-C...

On the other hand, not enforcing the LGPL rule evenly against other assets currently being distributed with LGPL components is more problematic.


I once encoded an entire TV OP into a multi-megabyte animated cursor (.ani) file.

Surprisingly, Windows 95 didn't die trying to load it, but quite a lot of operations in the system took noticeably longer than they normally did.


AVX-512 has a lot of instructions that just extend vectorization to 512-bit and make it nicer with features like masking. Thus, a very valid use of it is just to double vectorization width.

But it also has a bunch of specialized instructions that can boost performance beyond just the 2x width. One of them is VPCOMPRESSB, which accelerates compact encoding of sparse data. Another is GF2P8AFFINEQB, which is targeted at specific encryption algorithms but can also be abused for general bit shuffling. Algorithms like computing a histogram can benefit significantly, but it requires reshaping the algorithm around very particular and peculiar intermediate data layouts that are beyond the transformations a compiler can do. This doesn't literally require assembly language, though, it can often be done with intrinsics.

