How a Tiny Head Detector Powers Intelligent Video Conferencing

In a world obsessed with massive LLMs and billion-parameter models, sometimes the most impactful AI solution is surprisingly small.

I want to tell you a story. This story, like many others before it, started with a frustrated engineer who concluded that the software he relied on to do his work was not good enough. Surely he could do better? Thus, a little object detection model was born. Now with a whole team behind it, this little model has become the backbone of an entire intelligent video conferencing platform.

In video conferencing, you might think you need complex AI for speaker tracking, automatic framing, people counting, and camera control. But it turns out there's one piece of information that matters above all else: knowing exactly where the people are and which direction they're facing. Everything else builds from there.

This talk tells the story of our real-time head detection model. It covers the considerations you have to make when running on edge devices in compute- and time-constrained environments, and shows how a tiny, specialized model can be good enough to serve as the backbone for a whole suite of intelligent features.

Sindri Ingolfsson

Sindri is a Computer Vision Engineer at Cisco Norway, where he has spent the past 5 years working across various teams. His most recent focus has been on Camera Intelligence for meeting rooms and real-time computer vision, where he also leads an internal ML paper reading group. He holds a BSc in Computer Science and Discrete Mathematics from Reykjavik University and an MSc in Computer Science from the University of Oxford.
Outside of work he enjoys rock climbing, camping, and exploring the outdoors with his wife and two small kids.

NDC { AI }