Thursday
Room 3
11:30 - 12:30
(UTC+01)
Talk (60 min)
How a Tiny Head Detector Powers Intelligent Video Conferencing
In a world obsessed with massive LLMs and billion-parameter models, sometimes the most impactful AI solution is surprisingly small.
I want to tell you a story. This story, like many others before it, started with a frustrated engineer who concluded that the software he relied on to do his work was not good enough. Surely he could do better? Thus, a little object detection model was born. Now with a whole team behind it, this little model has become the backbone of an entire intelligent video conferencing platform.
In video conferencing, you might think you need complex AI for speaker tracking, automatic framing, people counting, and camera control. But it turns out there's one piece of information that matters above all else: knowing exactly where the people are and which direction they're facing. Everything else builds from there.
This talk explores the story of our real-time head detection model, the trade-offs you face when running on the edge in compute- and time-constrained environments, and how a tiny, specialized model can be good enough to serve as the backbone for a whole suite of intelligent features.