> The specific method that's causing me trouble right at the moment is "computeCommandEncoder"

Yeah, it looks like there's no way to avoid autorelease here.

> It looks like objc_retainAutoreleasedReturnValue might be exactly what I'm looking for, even if it isn't 100% guaranteed; if I'm understanding, it would be safe, and wouldn't actually leak as long as you had an autorelease pool somewhere in the chain.

Indeed it would be safe and wouldn't leak, but the optimization is very much not guaranteed. It's based on the autorelease implementation manually reading the instruction at its return address to see if the caller is about to call objc_retainAutoreleasedReturnValue. See the description here:

https://github.com/apple-opensource/objc4/blob/a367941bce42b...

In fact – I did not know this before just now – on every arch other than x86-64 it requires a magic assembly sequence to be placed between the call to an autoreleasing method and the call to objc_retainAutoreleasedReturnValue.

It looks like swiftc implements this by just emitting LLVM inline asm blocks:

    %6 = call %1* bitcast (void ()* @objc_msgSend to %1* (i8*, i8*, %0*)*)(i8* %5, i8* %3, %0* %4) #4
    call void asm sideeffect "mov\09fp, fp\09\09// marker for objc_retainAutoreleaseReturnValue", ""()
    %7 = bitcast %1* %6 to i8*
    %8 = call i8* @llvm.objc.retainAutoreleasedReturnValue(i8* %7)

This optimistically assumes that LLVM won't emit any instructions between the call instruction and the magic asm, which is not guaranteed, especially if compiler optimizations are off. But if it does emit extra instructions, then you just don't get the autorelease optimization: the object is added to the autorelease pool, and objc_retainAutoreleasedReturnValue simply calls objc_retain.

(…Though, on second look, it seems that swiftc and clang sometimes use a different, more robust approach to emitting the same magic instruction… but only sometimes.)

Regardless, enough stars have to align for the optimization to work that you shouldn't rely on it to avoid a (temporary) memory leak; you should only treat it as an optional micro-optimization.

That said, the C++ bindings could have implemented the same scheme using inline assembly. And so could the Rust crate (edit: well, I guess inline asm is not stable in Rust yet). It's not like the magic instructions are ABI unstable or anything, given that clang and swiftc happily stick them in when compiling any old Objective-C or Swift code. But I'm guessing the authors of the C++ bindings either didn't want to bother with inline assembly, or considered it an unnecessary micro-optimization. Or perhaps didn't even know about it. /shrug/
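
For what it's worth, here is a toy model of the two paths described above, in case it helps to see why both end up with the same reference count. It is only a sketch: the real runtime inspects machine code at the return address, whereas here the "marker" is a plain boolean, and every name (Obj, Runtime, call, the handoff flag) is made up for illustration rather than taken from objc4.

```python
# Toy model only. Nothing here is the real objc4 API; the "marker" check is
# replaced by a boolean the simulated caller passes in.

class Obj:
    def __init__(self):
        self.refcount = 1  # the +1 held by the callee that created the object


class Runtime:
    def __init__(self):
        self.pool = []        # stand-in for the autorelease pool
        self.handoff = False  # hypothetical "skip the retain" flag

    def autorelease_return_value(self, obj, caller_has_marker):
        """Callee side: skip the pool if the caller looks cooperative."""
        if caller_has_marker:
            self.handoff = True    # fast path: object never enters the pool
        else:
            self.pool.append(obj)  # slow path: ordinary autorelease
        return obj

    def retain_autoreleased_return_value(self, obj):
        """Caller side: take ownership of the returned object."""
        if self.handoff:
            self.handoff = False   # fast path: the callee's +1 becomes ours
        else:
            obj.refcount += 1      # slow path: plain retain
        return obj

    def drain_pool(self):
        """Release everything the pool is holding."""
        for obj in self.pool:
            obj.refcount -= 1
        self.pool.clear()


def call(runtime, caller_has_marker):
    """Simulate one call; return (objects left in pool, final refcount)."""
    obj = runtime.autorelease_return_value(Obj(), caller_has_marker)
    obj = runtime.retain_autoreleased_return_value(obj)
    pooled = len(runtime.pool)
    runtime.drain_pool()
    return pooled, obj.refcount
```

With the marker "found", nothing touches the pool; without it, the object sits in the pool until the drain. Either way the caller ends up holding exactly one reference, which is why the slow path is safe but only leak-free once some enclosing pool actually drains.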