Audio Prototype

Challenge

The main idea is that participants can sit anywhere in front of the video camera. This significantly improves the usability of the whole solution, because there is no need to build complex setups of multiple cameras and additional equipment to cover all the participants.

In addition to the 360-degree camera itself, our customer needed a component that detects the position of the current speaker in real time and transmits this data to the remote side where the video is played back. This lets the participants on the remote side see the person who is currently speaking.

Detector operation scheme

Solution

To implement the speaker detection component, we decided to pair the 360-degree video camera with a special microphone that records spatial audio in Ambisonics A-format.
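For readers unfamiliar with A-format: it is the raw output of the four capsules of a tetrahedral microphone. A common first processing step, which this case study does not confirm was used, is to combine the capsules into first-order B-format (W, X, Y, Z). The sketch below assumes the usual capsule layout (front-left-up, front-right-down, back-left-down, back-right-up) and a 0.5 scale factor. The function name is hypothetical, and real converters also apply capsule equalization filters that are omitted here.

```python
def a_to_b(flu, frd, bld, bru):
    """Convert one A-format sample (four capsule values) to first-order B-format.

    Assumed capsule order: front-left-up, front-right-down,
    back-left-down, back-right-up. The 0.5 gain is a convention, not a standard.
    """
    w = 0.5 * (flu + frd + bld + bru)  # omnidirectional pressure
    x = 0.5 * (flu + frd - bld - bru)  # front minus back
    y = 0.5 * (flu - frd + bld - bru)  # left minus right
    z = 0.5 * (flu - frd - bld + bru)  # up minus down
    return w, x, y, z
```

With this layout, a signal that reaches only the two front capsules produces a positive X component and zero Y and Z, which is what makes direction estimation possible.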

Then we needed to research and build an algorithm for processing Ambisonics A-format audio that detects the loudest audio source and calculates the direction vector to it. For that, we used FFT, convolution, AGC (automatic gain control), and HRTF (head-related transfer function) processing, as well as a number of processing algorithms from the OpenCV library. The main idea of this approach was to build a sound field map in polar coordinates.
Next, digital image processing algorithms (threshold, erode, dilate, contour detection) are applied to this map to find and calculate the coordinates (the direction vector) of the loudest sound source.
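The map-then-morphology idea can be illustrated with a much simplified sketch. It is not the project's implementation: it works on a one-dimensional azimuth map instead of a full polar image, uses plain Python in place of OpenCV, and assumes horizontal B-format frames `(w, x, y)` normalized so that a source at azimuth θ with signal s yields `(s, s·cos θ, s·sin θ)`. All function names and the `bins` and `ratio` parameters are hypothetical.

```python
import math

def energy_map(frames, bins=72):
    """Energy of a virtual cardioid steered to each azimuth bin."""
    energies = []
    for k in range(bins):
        a = 2 * math.pi * k / bins
        ca, sa = math.cos(a), math.sin(a)
        energies.append(sum((0.5 * (w + x * ca + y * sa)) ** 2 for w, x, y in frames))
    return energies

def threshold(energies, ratio=0.8):
    """Binary mask of bins whose energy is at least ratio * peak."""
    peak = max(energies)
    return [peak > 0 and e >= ratio * peak for e in energies]

def erode(mask):
    """Circular 3-wide erosion: a bin survives only if both neighbours are set."""
    n = len(mask)
    return [mask[i - 1] and mask[i] and mask[(i + 1) % n] for i in range(n)]

def dilate(mask):
    """Circular 3-wide dilation: a bin is set if it or a neighbour is set."""
    n = len(mask)
    return [mask[i - 1] or mask[i] or mask[(i + 1) % n] for i in range(n)]

def runs(mask):
    """Connected runs of set bins on a circle (1-D stand-in for contours)."""
    n = len(mask)
    if all(mask):
        return [list(range(n))]
    # Start right after a cleared bin so the wrap-around never splits a run;
    # the scan then ends on that cleared bin, which closes any open run.
    start = next(i for i in range(n) if not mask[i]) + 1
    out, current = [], []
    for j in range(n):
        i = (start + j) % n
        if mask[i]:
            current.append(i)
        elif current:
            out.append(current)
            current = []
    return out

def loudest_azimuth(frames, bins=72, ratio=0.8):
    """Azimuth in degrees of the strongest region, or None for silence."""
    energies = energy_map(frames, bins)
    regions = runs(dilate(erode(threshold(energies, ratio))))
    if not regions:
        return None
    best = max(regions, key=lambda r: sum(energies[i] for i in r))
    # Energy-weighted circular mean of the winning region.
    sx = sum(energies[i] * math.cos(2 * math.pi * i / bins) for i in best)
    sy = sum(energies[i] * math.sin(2 * math.pi * i / bins) for i in best)
    return math.degrees(math.atan2(sy, sx)) % 360
```

Eroding before dilating removes isolated single-bin spikes while restoring the extent of wider regions, so brief noise is less likely to be mistaken for a speaker. In a two-dimensional map, the same role would be played by OpenCV's erode, dilate, and contour functions.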

Key features

The app allows the user to:

analyze Ambisonics A-format audio from various sources (sound card, audio file, HLS stream)

embed metadata containing direction data into an H.264 video stream using Wowza WMS

detect and calculate the direction vector in real time

generate various debugging information, and visualize the 360-degree map of sound field levels
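As an illustration of the direction metadata mentioned above, here is one possible payload format. The case study does not describe the actual format or how Wowza WMS injects it, so the JSON layout, the field names `ts` and `dir`, and both function names are assumptions made for this sketch.

```python
import json
import math

def direction_payload(azimuth_deg, elevation_deg, timestamp_ms):
    """Serialize a direction as a unit vector plus a timestamp (hypothetical format)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    vector = [
        math.cos(el) * math.cos(az),  # x: front
        math.cos(el) * math.sin(az),  # y: left
        math.sin(el),                 # z: up
    ]
    message = {"ts": timestamp_ms, "dir": [round(v, 6) for v in vector]}
    return json.dumps(message, separators=(",", ":"))

def parse_direction(payload):
    """Recover (azimuth_deg, elevation_deg, timestamp_ms) on the playback side."""
    message = json.loads(payload)
    x, y, z = message["dir"]
    azimuth = math.degrees(math.atan2(y, x)) % 360
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return azimuth, elevation, message["ts"]
```

A compact text payload like this is easy to inspect while debugging. A binary encoding would be smaller if the metadata has to be sent with every video frame.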

Technology used

Digital Signal Processing
PortAudio
ZeroMQ
FFmpeg
OpenCV
Wowza WMS
