One thing I'm hoping for is an increase in speed. Right now, the agent is slow for complex tasks, so we're still in an era where it might be better to codify popular tasks (e.g. sending a WhatsApp message) instead of handling them with browser automation. Have y'all looked into Groq / Cerebras?
One option would be to define custom actions for major apps like WhatsApp, which would act almost like an API to the service. I think interleaving LLM calls with automation scripts would work well here:
Agent call 1:
Send WhatsApp message (to=Magnus, text=hi)
Inside, you open WhatsApp and search for Magnus (without LLM)
Agent call 2:
Select contact from all possible Magnus contacts
Script 3: Type the message and click send
So in total, 2 LLM calls. With Gemini, you could already achieve this in 10-15 seconds.
https://x.com/caydengineer/status/1889835639316807980
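Here's a rough Python sketch of that flow, just to make the idea concrete. Everything in it is hypothetical: `send_whatsapp_message`, `search_contacts`, `type_and_send`, and the `fake_llm` stub are made-up names for illustration, not an existing API, and the scripted steps only log what a real automation script would do. The point is that the LLM is called exactly twice and the rest is plain code.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for any LLM client: takes a prompt, returns text.
LLM = Callable[[str], str]


@dataclass
class Trace:
    """Records how many LLM calls were made and which scripted steps ran."""
    llm_calls: int = 0
    steps: List[str] = field(default_factory=list)


def call_llm(llm: LLM, prompt: str, trace: Trace) -> str:
    trace.llm_calls += 1
    return llm(prompt)


def search_contacts(name: str, address_book: List[str], trace: Trace) -> List[str]:
    # Scripted step (no LLM): a real version would open WhatsApp and use its search box.
    trace.steps.append(f"script: open WhatsApp, search {name!r}")
    return [c for c in address_book if name.lower() in c.lower()]


def type_and_send(contact: str, text: str, trace: Trace) -> None:
    # Scripted step (no LLM): a real version would type the message and click send.
    trace.steps.append(f"script: type {text!r} to {contact!r}, click send")


def send_whatsapp_message(to: str, text: str, address_book: List[str],
                          llm: LLM, trace: Trace) -> str:
    """Custom action: scripted search, one LLM call to disambiguate, scripted send."""
    candidates = search_contacts(to, address_book, trace)
    if not candidates:
        raise LookupError(f"no contact matching {to!r}")
    # Agent call 2: pick one contact from all possible matches.
    choice = call_llm(llm, f"Pick one contact for {to!r}: {candidates}", trace).strip()
    if choice not in candidates:
        raise ValueError(f"LLM picked unknown contact {choice!r}")
    type_and_send(choice, text, trace)
    return choice


def run_task(task: str, address_book: List[str], llm: LLM, trace: Trace) -> str:
    # Agent call 1: the LLM maps the task onto the custom action's arguments.
    reply = call_llm(llm, f"Task: {task}\nReply as 'to|text'", trace)
    to, text = (part.strip() for part in reply.split("|", 1))
    return send_whatsapp_message(to, text, address_book, llm, trace)


def fake_llm(prompt: str) -> str:
    # Canned answers so the sketch runs without a real model.
    return "Magnus|hi" if prompt.startswith("Task:") else "Magnus (work)"


if __name__ == "__main__":
    trace = Trace()
    book = ["Magnus (work)", "Magnus (family)", "Anna"]
    chosen = run_task("Send hi to Magnus on WhatsApp", book, fake_llm, trace)
    print(chosen, trace.llm_calls, trace.steps)
```

Swapping `fake_llm` for a real model client would keep the call count at two, so total latency would be roughly two model round trips plus however long the scripted steps take.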