Stream’s Post

This is why we build. Seeing developers go from stuck → shipping again is everything. Huge shoutout to the Cognivise journey here, especially pushing the boundaries of real-time AI with vision + speech. Keep building. 🚀

Update on Cognivise: Building the English AI Coach

A short time ago, I was dealing with personal issues and depression that completely blocked me from writing code. I had stopped developing entirely. But thankfully, being introduced to the Stream Vision AI SDK (https://lnkd.in/gwdp7yjz) by Kunal Kushwaha during the WeMakeDevs hackathon reignited my spark. I am finally building again.

My ultimate goal remains the same: to create the most accurate real-time cognitive monitoring system for students and learners. Today, I'm excited to share my latest major addition to Cognivise.

🗣 New Feature: The English AI Coach

I wanted to see if I could build a conversational AI that doesn't just listen to your pronunciation, but literally watches your face as you speak. Here is what the English Coach is doing under the hood:

🔹 Dual-AI Architecture: Combining local browser processing (MediaPipe FaceMesh) for 30fps facial tracking, synced with a cloud LLM (Gemini/Groq) for speech parsing and reasoning.
🔹 Canvas Layer Multiplexing: Built 4 distinct visual feedback modes (Composite, Landmarks-only, Full HUD Analysis, Raw Frame) by manipulating HTML5 canvas layers behind the live WebRTC stream.
🔹 Real-time Synchronicity: Worked around strict browser autoplay policies and synchronous call-stack limits so the AI's text-to-speech (TTS) responds naturally to the user's analyzed audio.

(Rough code sketches of the canvas layering, the frame pipeline, and the TTS workaround are at the end of this post.)

___________

🛠 What Was Actually Hard This Time

Getting the Vision SDK to track facial movement accurately is incredible. But translating that raw physical movement, every micro-expression and blink, into an accurate emotional state is intensely difficult. Right now, the system captures the physical layout well, but the semantic "emotion tracking" is not yet where I want it to be.

Sending frame-by-frame snapshots over WebSockets for analysis, without colliding with the synchronous audio pipelines, required deep debugging of React state refs and canvas requestAnimationFrame loops. Testing this from a remote location with an unstable internet connection makes debugging real-time video + WebSocket systems extremely painful. I am learning exactly why tracking every single frame matters, and how quickly it scales in complexity.

___________

⚠️ Honest Status & What's Next

Cognivise is not perfect yet. Capturing true "emotion" is a work in progress. But getting out of my slump was step one. I'm heading back to my hometown in Delhi soon, and I now have 3 brand-new AI project ideas I plan to start sprinting on this month.

___________

Massive thank you to Kunal Kushwaha and the WeMakeDevs community for creating environments that push developers back into motion. The journey continues!

🔗 GitHub: https://lnkd.in/gZvJnCfa

#AI #VisionAgents #VisionPossible #MachineLearning #WebRTC #FastAPI #ReactJS #BuildInPublic #OpenSource #algsoch #MentalHealthInTech
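___________

For the curious, here are the promised sketches. They are illustrative TypeScript, not the actual Cognivise source; beyond the four mode names from the post, every identifier is my own placeholder.

Sketch 1, the canvas layer multiplexing: one render function switches between the four feedback modes by deciding what gets painted onto the feedback canvas each frame.

```typescript
// Illustrative sketch only. "ViewMode", "Landmark", and "drawFrame" are
// placeholder names, not Cognivise's actual code.

type ViewMode = "composite" | "landmarks" | "hud" | "raw";

interface Landmark { x: number; y: number } // normalized [0..1] coordinates

function drawFrame(
  video: HTMLVideoElement,   // the live WebRTC stream
  canvas: HTMLCanvasElement, // feedback layer composited with the video
  landmarks: Landmark[],     // latest MediaPipe FaceMesh output
  mode: ViewMode,
): void {
  const ctx = canvas.getContext("2d")!;
  ctx.clearRect(0, 0, canvas.width, canvas.height);

  // Every mode except "landmarks" starts from the current video frame.
  if (mode !== "landmarks") {
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  }
  if (mode === "raw") return; // raw frame only, no overlays

  // "composite", "landmarks", and "hud" all draw the face mesh points.
  ctx.fillStyle = "#00ff88";
  for (const p of landmarks) {
    ctx.fillRect(p.x * canvas.width, p.y * canvas.height, 2, 2);
  }

  // "hud" adds the analysis readout on top of the composite view.
  if (mode === "hud") {
    ctx.fillStyle = "#ffffff";
    ctx.font = "14px monospace";
    ctx.fillText(`landmarks: ${landmarks.length}`, 10, 20);
  }
}
```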
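Sketch 2, the requestAnimationFrame + WebSocket frame pipeline: refs (not state) hold the timing so the loop never triggers React re-renders, and snapshots are throttled and back-pressure-checked so a slow connection cannot stall the loop. The hook name, endpoint URL, and interval are assumptions.

```typescript
// Hypothetical sketch of the rAF + WebSocket pattern described above.
// The hook name, endpoint URL, and throttle interval are all assumptions.
import { useEffect, useRef, type RefObject } from "react";

function useFrameSnapshots(videoRef: RefObject<HTMLVideoElement | null>) {
  const lastSentRef = useRef(0); // a ref, not state: no re-render per frame

  useEffect(() => {
    const ws = new WebSocket("wss://example.com/analyze"); // placeholder
    const scratch = document.createElement("canvas"); // offscreen buffer
    let rafId = 0;

    const tick = (now: number) => {
      const video = videoRef.current;
      // Throttle to ~4 snapshots/sec, and skip the frame entirely if the
      // socket still has queued bytes, so a slow connection never backs
      // up the animation loop (or the audio pipeline running beside it).
      if (
        video &&
        ws.readyState === WebSocket.OPEN &&
        ws.bufferedAmount === 0 &&
        now - lastSentRef.current > 250
      ) {
        scratch.width = video.videoWidth;
        scratch.height = video.videoHeight;
        scratch.getContext("2d")!.drawImage(video, 0, 0);
        scratch.toBlob((blob) => { if (blob) ws.send(blob); }, "image/jpeg", 0.6);
        lastSentRef.current = now;
      }
      rafId = requestAnimationFrame(tick);
    };
    rafId = requestAnimationFrame(tick);

    return () => { // cleanup on unmount
      cancelAnimationFrame(rafId);
      ws.close();
    };
  }, [videoRef]);
}
```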
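Sketch 3, the autoplay workaround: one common pattern is to "prime" the Web Speech API inside a real user gesture, so that later, asynchronous AI replies can speak without being blocked. Cognivise may solve this differently.

```typescript
// Hypothetical sketch, not Cognivise's code. Priming speech synthesis
// inside a user gesture is a common autoplay-policy workaround.

function speak(text: string): void {
  window.speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}

// "start-session" is a placeholder element id.
document.getElementById("start-session")?.addEventListener("click", () => {
  speak(" "); // near-silent utterance during the gesture unlocks audio
  // ...start the WebRTC session here. Later speak() calls made from
  // asynchronous AI callbacks are now far less likely to be blocked.
});
```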
