There is no way to write inline SASS (the actual machine assembly) in CUDA code. You can inline PTX, but PTX is a high-level, portable intermediate representation that still has to be compiled down to SASS.
PTX is sometimes referred to as assembly, and it is an ISA (a virtual one), much lower level than C++. When people talk about writing inline assembly for CUDA, they mean PTX, and the CUDA C++ compiler’s inline assembly “asm” statement expects PTX in device code. For the most part, PTX gives you much more control over the SASS that ends up being generated than C/C++ does.
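As a rough illustration of what inlining PTX looks like, here is a minimal sketch that wraps a single `add.s32` instruction in an `asm` statement. The names `add_ptx`, `add_kernel`, and `run_add_example` are made up for this example and do not come from any library. Note that the PTX is still passed through the assembler, which does its own scheduling and register allocation, so the resulting SASS is not guaranteed to match the PTX one-to-one.

```cuda
#include <cuda_runtime.h>

// Hypothetical device helper: 32-bit signed add written as inline PTX.
// "=r" binds a 32-bit register as output, "r" binds 32-bit register inputs.
__device__ int add_ptx(int a, int b) {
    int result;
    asm("add.s32 %0, %1, %2;" : "=r"(result) : "r"(a), "r"(b));
    return result;
}

// Single-thread kernel that stores the PTX-computed sum.
__global__ void add_kernel(int a, int b, int *out) {
    *out = add_ptx(a, b);
}

// Hypothetical host wrapper: launches the kernel with one thread and
// copies the sum back. Returns false if any CUDA call reports an error.
bool run_add_example(int a, int b, int *result) {
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        return false;
    }
    add_kernel<<<1, 1>>>(a, b, d_out);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err =
        cudaMemcpy(result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

Calling `run_add_example(40, 2, &sum)` from host code should leave 42 in `sum` on a machine with a working GPU. To see what the compiler actually did with the inline PTX, you can build with `nvcc` and then dump the machine code with `cuobjdump -sass` on the resulting binary.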