
sagarkava · 2025-09-02T18:56:43 1756839403

Hi HN — I’m launching an open-source WhatsApp AI Voice Agent for phone calls.

Tech stack: It runs on VideoSDK for the SIP gateway, bridging WebRTC ↔ SIP under the hood. For the AI side you can plug in whatever stack you prefer (LLM + STT + TTS). The repo includes example configs.
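To illustrate what "plug in whatever stack you prefer" could look like, here is a minimal sketch of a swappable STT, LLM, and TTS chain. The `STT`, `LLM`, `TTS`, and `VoicePipeline` names and the stand-in backends are hypothetical and chosen for this example; they are not the repo's or VideoSDK's actual API.

```python
from dataclasses import dataclass
from typing import Protocol


# Hypothetical interfaces: any backend with these methods can be plugged in.
class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoicePipeline:
    stt: STT
    llm: LLM
    tts: TTS

    def handle_turn(self, caller_audio: bytes) -> bytes:
        """Run one caller turn: speech -> text -> answer -> speech."""
        text = self.stt.transcribe(caller_audio)
        answer = self.llm.reply(text)
        return self.tts.synthesize(answer)


# Stand-in backends so the sketch runs without any vendor SDK.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UppercaseLLM:
    def reply(self, text: str) -> str:
        return text.upper()


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = VoicePipeline(stt=EchoSTT(), llm=UppercaseLLM(), tts=BytesTTS())
print(pipeline.handle_turn(b"hello"))
```

Swapping a provider then means passing a different object to `VoicePipeline`; the call-handling code does not change.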

Why open-source? Most WhatsApp/voice AI projects out there are closed or tied to a single vendor. I wanted something people can actually hack on, fork, and extend — whether that’s experimenting with different voices, building domain-specific agents, or integrating with CRMs.

Performance: End-to-end round-trip latency is ~400–600ms in typical setups. With faster STT/TTS backends there’s headroom to improve this.
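As a rough way to reason about that number, the round trip can be treated as a sum of per-stage delays. The stage names and millisecond values below are made up for illustration, not measurements from this project; they are picked only so the total lands inside the 400–600ms range quoted above.

```python
# Hypothetical per-stage latencies in milliseconds (illustrative, not measured).
STAGE_MS = {
    "network_and_sip": 80,
    "stt_final_transcript": 150,
    "llm_first_token": 180,
    "tts_first_audio": 90,
}


def round_trip_ms(stages: dict) -> int:
    """Total latency if the stages run strictly one after another."""
    return sum(stages.values())


def with_overrides(stages: dict, **overrides: int) -> dict:
    """Return a copy of the budget with some stages replaced."""
    return {**stages, **overrides}


baseline = round_trip_ms(STAGE_MS)
faster = round_trip_ms(
    with_overrides(STAGE_MS, stt_final_transcript=90, tts_first_audio=50)
)
print(baseline, faster)  # 500 400
```

In this toy budget, faster STT and TTS backends cut 100ms, which is the kind of headroom described above. Streaming stages that overlap would make a plain sum pessimistic.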

I’d love feedback on use cases you’d actually want to build with this: customer support lines, personal AI assistants, language tutors, appointment scheduling, etc. Curious what directions the HN crowd would push this in.

GitHub Repo: https://github.com/videosdk-community/videosdk-whatsapp-ai-c...

Video demo: https://youtu.be/KWfCWE8S_4U?si=yb5WWr4J4n2dgBm8

I’d love feedback: what use cases would you build with this? Customer support, personal AI assistants, language tutors… or something else?

sagarkava · 2025-08-27T15:16:32 1756307792

Hi HN, I'm excited to share our new open-source project: an AI voice agent specifically designed for call centers. This project aims to streamline customer interactions and reduce the workload on human agents by automating initial call handling.

Imagine using it to manage customer inquiries, handle reservations, or conduct surveys without human intervention. It's a game-changer for businesses looking to improve efficiency.

Key features include:

- Real-time, low-latency voice conversation.

- A cascading pipeline using Deepgram for STT, OpenAI (GPT-4o) for LLM, and ElevenLabs for TTS (customizable).

- Advanced turn detection and voice activity detection (VAD) for smooth, natural conversations.

- Fully open-source and easily customizable.

- Support for Agent2Agent and MCP protocols.
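For readers new to VAD and turn detection, here is a deliberately simple sketch: an energy threshold marks frames as speech, and a run of quiet frames after speech marks the end of the caller's turn. This is a textbook baseline, not the detector this project uses; the function names, threshold, and frame counts are assumptions for the example.

```python
import math
from typing import Iterable, Optional, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def end_of_turn_index(
    frames: Iterable[Sequence[int]],
    threshold: float = 500.0,   # assumed energy cutoff for "speech"
    silence_frames: int = 3,    # assumed quiet frames needed to end a turn
) -> Optional[int]:
    """Return the index of the frame where the turn ends, or None."""
    heard_speech = False
    quiet_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            quiet_run = 0          # speech resumed, reset the silence counter
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= silence_frames:
                return i
    return None


loud = [1000] * 160
quiet = [10] * 160
print(end_of_turn_index([quiet, loud, loud, quiet, quiet, quiet, quiet]))  # 5
```

A fixed threshold like this breaks down with background noise, which is one reason production systems tend to use model-based detectors.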

Check out the repo: AI Voice Agent for Call Center https://github.com/videosdk-community/ai-voice-agent-for-cal...

Main framework: VideoSDK Agents https://github.com/videosdk-live/agents

What use-cases do you envision for this AI voice agent?

sagarkava · 2025-08-04T09:34:50 1754300090

Thanks for the mention. Curious—what challenges are you finding with Pipecat that you're hoping something else (like https://github.com/videosdk-live/agents) might fix?

Always looking to improve based on real gaps devs are facing.

sagarkava · 2025-08-04T09:33:06 1754299986

That is one of our goals. If a solution is under 500 lines of Python, you should not need to pay $499 per month for it. We want to lower the barrier for developers and businesses to build their own voice agents.

GitHub: https://github.com/videosdk-live/agents

https://docs.videosdk.live/ai_agents/voice-agent-quick-start

https://docs.videosdk.live/ai_agents/sip

sagarkava · 2025-08-04T09:29:38 1754299778

The same technology can also enable businesses that never had live phone support to offer it affordably. The goal is augmentation and access, not mass replacement.

sagarkava · 2025-08-04T09:28:25 1754299705

That risk is real. That’s why we made this open source: so smaller businesses can build responsible systems with their own logic, prompts, and escalation paths.

GitHub: https://github.com/videosdk-live/agents

sagarkava · 2025-08-04T09:28:02 1754299682

That’s valid. But many people, including elderly users, prefer voice interfaces. Our system can serve those customers without requiring a smartphone or web access.

GitHub: https://github.com/videosdk-live/agents

sagarkava · 2025-08-04T09:27:27 1754299647

Only if misused. Our system supports human fallback, logging, and prompt tools to prevent poor user experiences. The key is thoughtful automation.

GitHub: https://github.com/videosdk-live/agents

Docs on HITL: https://docs.videosdk.live/ai_agents/human-in-the-loop

sagarkava · 2025-08-04T09:27:08 1754299628

Fair point. But when implemented properly, these agents can reliably handle narrow, production-grade tasks like appointment reminders or smart call routing.

sagarkava · 2025-08-04T09:25:36 1754299536

We experienced similar challenges. That’s why we made audio handling, turn detection, and LLM retries modular. You can swap models or providers as needed.

GitHub: https://github.com/videosdk-live/agents

Blog: https://www.videosdk.live/blog/ai-telephony-agent-inbound-ou...
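To make the idea of modular LLM retries with swappable providers concrete, here is a minimal sketch. `call_with_fallback` and the two toy providers are hypothetical names for this example, not functions from the agents framework; a provider is assumed to be any callable that takes a prompt and returns text.

```python
import time
from typing import Callable, Optional, Sequence

Provider = Callable[[str], str]


def call_with_fallback(
    providers: Sequence[Provider],
    prompt: str,
    retries_per_provider: int = 2,
    backoff_s: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each provider in order, retrying with exponential backoff."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(retries_per_provider):
            try:
                return provider(prompt)
            except Exception as exc:  # broad on purpose: this is a sketch
                last_error = exc
                sleep(backoff_s * (2 ** attempt))
    raise RuntimeError("all providers failed") from last_error


# Toy providers standing in for real LLM clients.
def flaky_llm(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def backup_llm(prompt: str) -> str:
    return "ok: " + prompt


print(call_with_fallback([flaky_llm, backup_llm], "hi", backoff_s=0.0))
```

On a live call the retry budget has to stay small, since every retry adds to the silence the caller hears.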