To implement the component for detecting the speaker, together with a 360-degree video camera, we decided to use a special microphone that allows the recording of spatial audio in Ambisonics A-format.
Then we needed to investigate and build a software solution (algorithm) for processing Ambisonics A-audio, which allows us to detect and calculate the direction vector to the loudest audio source. For that, we used the algorithms FFT, Convolution, AGC, HRTF, as well as a number of algorithms for signal processing of the OpenCV library. The main idea of this approach was to build a sound field map in polar coordinates.
Further, using algorithms for digital image processing (Threshold, Erode, Dilate, contour detection), the coordinates (the direction vector) of the loudest sound source are analyzed and calculated.
The app allows the user to: