One possibility, a little backwards maybe, is to produce a discrete SDF from e.g. a mesh by sampling it into an octree. The cache basically becomes the SDF itself. This would let rendering be done via the SDF, while other logic could still use the mesh (or another spatial data structure).
Or could the engine treat animated objects as traditional meshed objects (for both rendering and interactions)? The author says all physics is done with meshes, so such objects could still interact with the game world fairly easily. I imagine this would be limited to characters and such. I think they would look terrible with interpolation on a fixed grid anyway, since a rotation would shift the geometry slightly relative to the grid, making these objects appear "blurry" in motion.
Sampling an implicit function on a grid shifts you to the world of voxel processing, which has its own strengths and weaknesses. Further processing is lossy (like with raster image processing), storage requirements go up, recovering sharp edges is harder...
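To make the sharp-edge point concrete, here is a minimal 2D sketch. It is not taken from the engine being discussed; `box_sdf`, `bilinear`, the grid spacing and the grid offset are all made up for illustration. It samples an exact box SDF at grid nodes and compares the bilinearly interpolated value with the exact one, once on a flat face and once at a corner that falls in the middle of a cell.

```python
import math

def box_sdf(x, y, half=0.5):
    """Exact signed distance to an axis-aligned square with half-extent `half`."""
    qx, qy = abs(x) - half, abs(y) - half
    outside = math.hypot(max(qx, 0.0), max(qy, 0.0))
    inside = min(max(qx, qy), 0.0)
    return outside + inside

def bilinear(sdf, x, y, h, origin):
    """Interpolate `sdf` from the four grid nodes surrounding (x, y).

    Grid nodes sit at origin + i*h along each axis (hypothetical layout).
    """
    i = math.floor((x - origin) / h)
    j = math.floor((y - origin) / h)
    x0, y0 = origin + i * h, origin + j * h
    tx, ty = (x - x0) / h, (y - y0) / h
    v00, v10 = sdf(x0, y0), sdf(x0 + h, y0)
    v01, v11 = sdf(x0, y0 + h), sdf(x0 + h, y0 + h)
    return (v00 * (1 - tx) * (1 - ty) + v10 * tx * (1 - ty)
            + v01 * (1 - tx) * ty + v11 * tx * ty)

h, origin = 0.25, 0.125  # nodes at ..., 0.375, 0.625, ... so the corner is mid-cell

face_exact = box_sdf(0.5, 0.0)
face_grid = bilinear(box_sdf, 0.5, 0.0, h, origin)
corner_exact = box_sdf(0.5, 0.5)
corner_grid = bilinear(box_sdf, 0.5, 0.5, h, origin)

print("face:  ", face_exact, face_grid)
print("corner:", corner_exact, corner_grid)
```

With this grid alignment the flat face is reproduced, but the interpolated field is clearly positive at the true corner, so the zero level set of the gridded version cuts the corner off. Once only the samples are kept, that information is gone.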
But isn't this what the author is doing already? That's what I got from the video. The SDF is sampled on a sparse grid (only cells that cross the zero level set), and values are then obtained by interpolating on the grid rather than by full re-evaluation.
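A rough sketch of that idea as I understand it, not the author's actual implementation: the names `build_sparse_grid` and `sample`, the dict-of-cells layout, and the sphere test shape are all my own assumptions. Only cells whose corner samples change sign are stored, and lookups are trilinear interpolations of the stored corner values.

```python
import math
from itertools import product

CORNERS = tuple(product((0, 1), repeat=3))  # fixed corner order shared by both functions

def sphere_sdf(x, y, z, r=1.0):
    """Exact signed distance to a sphere of radius r at the origin."""
    return math.sqrt(x * x + y * y + z * z) - r

def build_sparse_grid(sdf, lo, hi, n):
    """Sample `sdf` on an n^3 grid over [lo, hi]^3.

    Keeps only cells whose corner values straddle zero, i.e. cells
    that the surface passes through. Returns (cells, cell_size).
    """
    h = (hi - lo) / n
    cells = {}
    for i, j, k in product(range(n), repeat=3):
        corners = tuple(
            sdf(lo + (i + di) * h, lo + (j + dj) * h, lo + (k + dk) * h)
            for di, dj, dk in CORNERS
        )
        if min(corners) <= 0.0 <= max(corners):
            cells[(i, j, k)] = corners
    return cells, h

def sample(cells, h, lo, x, y, z):
    """Trilinearly interpolate the cached SDF at (x, y, z).

    Returns None when the point lies in a cell that was not stored.
    """
    fx, fy, fz = (x - lo) / h, (y - lo) / h, (z - lo) / h
    i, j, k = math.floor(fx), math.floor(fy), math.floor(fz)
    corners = cells.get((i, j, k))
    if corners is None:
        return None
    tx, ty, tz = fx - i, fy - j, fz - k
    value = 0.0
    for (di, dj, dk), c in zip(CORNERS, corners):
        w = ((tx if di else 1 - tx) * (ty if dj else 1 - ty)
             * (tz if dk else 1 - tz))
        value += w * c
    return value

cells, h = build_sparse_grid(sphere_sdf, -1.5, 1.5, 24)
print(len(cells), "of", 24 ** 3, "cells stored")
print(sample(cells, h, -1.5, 0.98, 0.05, 0.02), sphere_sdf(0.98, 0.05, 0.02))
```

A real renderer would need some fallback for the `None` case (for example a coarser level or a conservative step size), which this sketch leaves out.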