Technically there are two clients: the camera and whatever device is used to access the feed.
I can absolutely imagine an architecture where video can be streamed in an encrypted manner, or stored in encrypted time-stamped blobs, allowing the server to provide rough searching, and then the client can perform fine-grained scanning.
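To make the "rough search on the server, fine-grained scan on the client" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `BlobServer`, `keystream_xor`, and the text "frames" are made up, and the cipher is a toy SHA-256 keystream standing in for a real authenticated cipher such as AES-GCM. It is not secure and is not how any actual product works.

```python
import hashlib
import os
from bisect import bisect_left, bisect_right


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). Illustration only, NOT secure.

    XOR is its own inverse, so the same call encrypts and decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class BlobServer:
    """Hypothetical server: it sees timestamps and ciphertext, never the key."""

    def __init__(self):
        self._timestamps = []  # kept sorted so range queries are cheap
        self._blobs = []       # (nonce, ciphertext), parallel to _timestamps

    def put(self, ts: int, nonce: bytes, ciphertext: bytes) -> None:
        i = bisect_right(self._timestamps, ts)
        self._timestamps.insert(i, ts)
        self._blobs.insert(i, (nonce, ciphertext))

    def search(self, start: int, end: int):
        """Rough search: every blob whose timestamp falls in [start, end]."""
        lo = bisect_left(self._timestamps, start)
        hi = bisect_right(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._blobs[lo:hi]))


# Camera side: encrypt each chunk before upload. The key never leaves the clients.
key = os.urandom(32)
server = BlobServer()
for ts, frame in [(100, b"empty crib"), (160, b"baby asleep"), (220, b"baby crying")]:
    nonce = os.urandom(16)
    server.put(ts, nonce, keystream_xor(key, nonce, frame))

# Viewer side: ask the server for a time window, then decrypt and scan locally.
candidates = server.search(150, 250)
hits = [ts for ts, (nonce, ct) in candidates
        if b"crying" in keystream_xor(key, nonce, ct)]
print(hits)
```

The server can narrow things down by time only; anything that depends on the content has to run on a device that holds the key.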
This obviously doesn't enable any kind of processing of the video data on the server side, and doing it on the receiving client would require the feed to be active. This means that any kind of processing would almost necessarily have to happen on the sending device, which would probably increase the power and compute requirements by a lot.
Yeah, the entire point of this seems to be "we'll watch your baby monitor and provide alerts if something happens". That requires either processing on a server (as they do), processing on the uploading client (the camera), or having a receiving client which is constantly receiving that data and analyzing it to provide alerts.
The third option is unreliable because if that "client" (a desktop app, a phone app, etc.) dies, then the process stops working completely. The second option is commercially impractical because if you increase the cost of the camera then most users will buy the other camera, since everyone is financially constrained these days.
That basically just leaves the first option as the only practical one at an appealing price point.
No, this doesn't get at the point of end-to-end encryption. Better to look at it in terms of the parties involved -- E2EE implies that there are two or more parties, and that only some of those parties should have unencrypted access.
In the case in point, the parent (camera owner) is one party and Nanit is another party. (Prior to the work in the linked post, AWS S3 was another party.) The goal of E2EE is to deny plaintext access to some of these parties. So, in an E2EE deployment, Nanit (and AWS) would not have unencrypted access to the video content, even though they're storing it.
As chrismorgan pointed out, if Nanit did not have access to the unencrypted data, they could not do server-side video processing.
(Also, FWIW, there are multiple clients in this scenario -- the parents' phones are clients, and need unencrypted access to the video stream.)
(As an aside, where I used to work, we did some cool stuff with granting conditional access to certain server-side subsystems, so that the general data flow was all end-to-end encrypted, but customers could allow certain of our processes to be "ends" and have key access. This was really elegant; customers could dial in the level of server-side access that we had, and could see via the key authorization metadata which services had that access.)
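A rough sketch of what "letting a server-side process be an end" could look like, assuming an envelope-style design: one data key encrypts the stream, and the customer stores a wrapped copy of that key for each authorized end. This is a guess at the general shape, not the actual system described above. `KeyGrants`, `toy_wrap`, `unwrap`, and the end names are all hypothetical, and the XOR "wrap" is a toy stand-in for a real key-wrapping scheme.

```python
import hashlib
import os


def toy_wrap(end_key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy key wrap: XOR with SHA-256(end_key + nonce). NOT secure.

    Handles up to 32 bytes and is its own inverse (wrap == unwrap).
    """
    pad = hashlib.sha256(end_key + nonce).digest()
    return bytes(a ^ b for a, b in zip(data, pad))


class KeyGrants:
    """Customer-controlled record of which "ends" may recover the data key."""

    def __init__(self, data_key: bytes):
        self._data_key = data_key
        # end name -> (nonce, wrapped data key). The names are plain metadata,
        # so the customer can see at a glance which services hold access.
        self.grants = {}

    def grant(self, end_name: str, end_key: bytes) -> None:
        nonce = os.urandom(16)
        self.grants[end_name] = (nonce, toy_wrap(end_key, nonce, self._data_key))

    def revoke(self, end_name: str) -> None:
        # Dropping the grant blocks future unwraps. An end that already
        # unwrapped the key would still know it, so a real design would
        # presumably rotate the data key as well.
        self.grants.pop(end_name, None)

    def authorized_ends(self):
        return sorted(self.grants)


def unwrap(grants: dict, end_name: str, end_key: bytes):
    """What an end does: look up its own grant and unwrap the data key."""
    entry = grants.get(end_name)
    if entry is None:
        return None
    nonce, wrapped = entry
    return toy_wrap(end_key, nonce, wrapped)


data_key = os.urandom(32)
phone_key, motion_key = os.urandom(32), os.urandom(32)

customer = KeyGrants(data_key)
customer.grant("parent-phone", phone_key)
customer.grant("motion-alerts", motion_key)  # opt in to one server-side service
motion_had_access = unwrap(customer.grants, "motion-alerts", motion_key) == data_key

customer.revoke("motion-alerts")             # dial the server-side access back
print(customer.authorized_ends())
```

The grants table doubles as the audit trail: the set of names in it is exactly the set of parties that can get at plaintext.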
> The video is privately analyzed by your home hub using on-device intelligence to determine if people, pets, or cars are present.
You can use a cloud provider's infrastructure without giving it access to your material. My devices generate the content, my devices do the processing and analysis, I consume the content. The cloud just coordinates the data in flight, and stores it at rest, all encrypted. It's possible, but most companies don't bother because they would have to put in the effort and their "payoff" is that they can't monetize your data anymore.
In the case of this product, there is only one client (and a server).
E2EE then boils down to having the traffic encrypted, like you have with an HTTPS website.