sagarkava's comments

Hi HN — I’m launching an open-source WhatsApp AI Voice Agent for phone calls.

Tech stack: It runs on VideoSDK for the SIP gateway, bridging WebRTC ↔ SIP under the hood. For the AI side you can plug in whatever stack you prefer (LLM + STT + TTS). The repo includes example configs.

Why open-source? Most WhatsApp/voice AI projects out there are closed or tied to a single vendor. I wanted something people can actually hack on, fork, and extend — whether that’s experimenting with different voices, building domain-specific agents, or integrating with CRMs.

Performance: End-to-end round-trip latency is roughly 400–600 ms in typical setups. Faster STT/TTS backends should bring this down further.

I’d love feedback on use cases you’d actually want to build with this: customer support lines, personal AI assistants, language tutors, appointment scheduling, etc. Curious what directions the HN crowd would push this in.

GitHub Repo: https://github.com/videosdk-community/videosdk-whatsapp-ai-c...

Video demo: https://youtu.be/KWfCWE8S_4U?si=yb5WWr4J4n2dgBm8



Hi HN, I'm excited to share our new open-source project: an AI voice agent specifically designed for call centers. This project aims to streamline customer interactions and reduce the workload on human agents by automating initial call handling.

Imagine using it to manage customer inquiries, handle reservations, or conduct surveys without human intervention. It's a game-changer for businesses looking to improve efficiency.

Key features include:

- Real-time, low-latency voice conversation.

- A cascading pipeline using Deepgram for STT, OpenAI (GPT-4o) for LLM, and ElevenLabs for TTS (customizable).

- Advanced turn detection and voice activity detection (VAD) for smooth, natural conversations.

- Fully open-source and easily customizable.

- Support for Agent2Agent and MCP protocols.
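To make the "cascading pipeline" item concrete, here is a minimal sketch of the idea: each stage is a swappable callable, and one turn flows STT → LLM → TTS. This is not the VideoSDK Agents API. `CascadingPipeline`, `fake_stt`, `fake_llm` and `fake_tts` are hypothetical names invented for illustration, and real backends (Deepgram, GPT-4o, ElevenLabs) would replace the stand-ins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadingPipeline:
    """Hypothetical container for three swappable stages (not a real SDK class)."""
    stt: Callable[[bytes], str]   # speech-to-text: audio in, transcript out
    llm: Callable[[str], str]     # language model: transcript in, reply text out
    tts: Callable[[str], bytes]   # text-to-speech: reply text in, audio out

    def respond(self, audio_in: bytes) -> bytes:
        """Run one conversational turn through the three stages in order."""
        transcript = self.stt(audio_in)
        reply = self.llm(transcript)
        return self.tts(reply)


# Stand-in stages so the sketch runs without any network calls.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadingPipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
```

Swapping a provider then means passing a different callable for that stage. The rest of the pipeline does not change.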

Check out the repo: AI Voice Agent for Call Center https://github.com/videosdk-community/ai-voice-agent-for-cal...

Main framework: VideoSDK Agents https://github.com/videosdk-live/agents

What use-cases do you envision for this AI voice agent?


Thanks for the mention. Curious—what challenges are you finding with Pipecat that you're hoping something else (like https://github.com/videosdk-live/agents) might fix?

Always looking to improve based on real gaps devs are facing.


That is one of our goals. If a solution is under 500 lines of Python, you should not need to pay $499 per month for it. We want to lower the barrier for developers and businesses to build their own voice agents.

GitHub: https://github.com/videosdk-live/agents

https://docs.videosdk.live/ai_agents/voice-agent-quick-start

https://docs.videosdk.live/ai_agents/sip


The same technology can also enable businesses that never had live phone support to offer it affordably. The goal is augmentation and access, not mass replacement.


That risk is real. That’s why we made this open-source: so smaller businesses can build responsible systems with their own logic, prompts, and escalation paths.

GitHub: https://github.com/videosdk-live/agents


That’s valid. But many people, including elderly users, prefer voice interfaces. Our system can serve those customers without requiring a smartphone or web access.

GitHub: https://github.com/videosdk-live/agents


Only if misused. Our system supports human fallback, logging, and prompt tools to prevent poor user experiences. The key is thoughtful automation.

GitHub: https://github.com/videosdk-live/agents

Docs on HITL: https://docs.videosdk.live/ai_agents/human-in-the-loop
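As a rough illustration of what a human-fallback rule could look like, here is a small sketch. It is an assumption about one reasonable policy, not how the linked framework implements HITL. `CallState`, `should_escalate`, the phrase list and the threshold are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical policy knobs, chosen only for illustration.
ESCALATION_PHRASES = ("human", "agent", "representative")
MAX_FAILED_TURNS = 2


@dataclass
class CallState:
    failed_turns: int = 0                          # consecutive turns the bot did not understand
    log: list[str] = field(default_factory=list)   # transcript log for later review


def should_escalate(state: CallState, transcript: str, understood: bool) -> bool:
    """Return True when the call should be handed to a person."""
    state.log.append(transcript)
    # An explicit request for a person always wins.
    if any(phrase in transcript.lower() for phrase in ESCALATION_PHRASES):
        return True
    # Otherwise count consecutive misunderstandings and reset on success.
    if understood:
        state.failed_turns = 0
    else:
        state.failed_turns += 1
    return state.failed_turns >= MAX_FAILED_TURNS
```

The caller would check this after every turn and transfer the call when it returns True.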


Fair point. But when implemented properly, these agents can reliably handle narrow, production-grade tasks like appointment reminders or smart call routing.


We experienced similar challenges. That’s why we made audio handling, turn detection, and LLM retries modular. You can swap models or providers as needed.

GitHub: https://github.com/videosdk-live/agents

Blog: https://www.videosdk.live/blog/ai-telephony-agent-inbound-ou...
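For a sense of what modular retries with swappable providers can mean in practice, here is a minimal sketch. It is a generic pattern, not code from the repo. `call_with_retries`, `flaky` and `backup` are hypothetical names, and a real version would likely add backoff and narrower exception handling.

```python
from typing import Callable, Optional, Sequence


def call_with_retries(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    attempts_per_provider: int = 2,
) -> str:
    """Try each provider in order, retrying each a few times before moving on."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for _ in range(attempts_per_provider):
            try:
                return provider(prompt)
            except Exception as exc:  # broad on purpose for the sketch
                last_error = exc
    raise RuntimeError("all providers failed") from last_error


# Stand-in providers: one that always times out, one that works.
def flaky(prompt: str) -> str:
    raise TimeoutError("simulated timeout")


def backup(prompt: str) -> str:
    return "ok: " + prompt
```

Calling `call_with_retries([flaky, backup], "hi")` falls through to the second provider after the first one exhausts its attempts.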

