The main idea is that participants can be placed anywhere in front of the video camera. This significantly improves the usability of the whole solution, because there is no need to build complex setups with multiple cameras and additional equipment to capture every participant.
In addition to the 360-degree camera itself, our customer needed a component that detects the speaker's position in real time and transmits it to the remote side, where the video is played back. This lets the participants on the other side see whoever is currently speaking.
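On the receiving side, the transmitted position is what allows the viewer to show the active speaker. The sketch below is one hypothetical way to use it, assuming the position arrives as an azimuth in degrees and the video is an equirectangular 360-degree panorama whose column 0 corresponds to azimuth 0. The function `crop_speaker_view` and its `fov_deg` parameter are illustrative and not part of the original system. The function cuts a horizontal window around the speaker and wraps across the 0/360-degree seam.

```python
import numpy as np


def crop_speaker_view(frame, azimuth_deg, fov_deg=90.0):
    """Cut a horizontal window of fov_deg degrees centered on azimuth_deg.

    frame: equirectangular 360-degree image, shape (height, width) or
    (height, width, channels). Column 0 is assumed to be azimuth 0 and columns
    are assumed to increase with azimuth. The window wraps across the seam.
    """
    width = frame.shape[1]
    center = (azimuth_deg % 360.0) / 360.0 * width
    half = fov_deg / 360.0 * width / 2.0
    cols = np.arange(int(round(center - half)), int(round(center + half)))
    # mode="wrap" maps column indices outside [0, width) back onto the panorama.
    return np.take(frame, cols, axis=1, mode="wrap")
```

Whether panorama columns increase with azimuth clockwise or counterclockwise depends on the camera, so a real integration would need to align the audio and video conventions.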
To implement the speaker-detection component, we decided to pair the 360-degree video camera with a special microphone that records spatial audio in Ambisonics A-format.
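A common first step with an A-format recording is converting the four capsule signals to first-order B-format: one omnidirectional component w and three figure-of-eight components x, y, z. The following is a minimal sketch that assumes a tetrahedral microphone with capsules ordered FLU, FRD, BLD, BRU (front-left-up, front-right-down, back-left-down, back-right-up) and ideal cardioid capsules. It uses only the plain sum/difference matrix and omits the frequency-dependent correction filters that production converters apply. The names `a_to_b_format` and `CAPSULE_DIRS` are hypothetical, and the source does not say whether this conversion was part of the original pipeline.

```python
import numpy as np

# Assumed capsule orientations of a tetrahedral (A-format) microphone as unit
# vectors, in the order FLU, FRD, BLD, BRU.
CAPSULE_DIRS = np.array([
    [1, 1, 1],
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
], dtype=float) / np.sqrt(3.0)


def a_to_b_format(flu, frd, bld, bru):
    """Convert four A-format capsule signals to first-order B-format (w, x, y, z).

    Plain matrix conversion without capsule-spacing correction filters.
    Scaled so that, for ideal cardioid capsules, a plane wave of amplitude s
    from unit direction u yields w = s and (x, y, z) = s * u.
    """
    flu, frd, bld, bru = (np.asarray(c, dtype=float) for c in (flu, frd, bld, bru))
    w = (flu + frd + bld + bru) / 2.0
    scale = np.sqrt(3.0) / 2.0
    x = scale * (flu + frd - bld - bru)  # front minus back
    y = scale * (flu - frd + bld - bru)  # left minus right
    z = scale * (flu - frd - bld + bru)  # up minus down
    return w, x, y, z
```

With this scaling, the direction of a single plane wave is encoded directly in the ratios of x, y, z to w, which is what the later direction-finding steps rely on.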
Next, we needed to research and build a software solution (an algorithm) for processing Ambisonics A-format audio that detects the loudest sound source and calculates the direction vector to it. For this, we used the fast Fourier transform (FFT), convolution, automatic gain control (AGC), and head-related transfer functions (HRTF), as well as a number of signal-processing routines from the OpenCV library. The core idea of this approach was to build a map of the sound field in polar coordinates.
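The sketch below shows one way to build such a polar sound field map. It assumes B-format input scaled so that a plane wave of amplitude s from unit direction u gives w = s and (x, y, z) = s * u. For every (elevation, azimuth) cell it steers a virtual cardioid microphone and records that microphone's mean energy. This is a simplified broadband stand-in: it does not reproduce the FFT, convolution, AGC, or HRTF stages mentioned above, and `sound_field_map` and `peak_direction` are hypothetical names.

```python
import numpy as np


def sound_field_map(w, x, y, z, n_azimuth=360, n_elevation=91):
    """Mean energy of a steered virtual cardioid for every (elevation, azimuth) cell.

    Rows span elevation -90..90 degrees; columns span azimuth 0..360 degrees
    (endpoint excluded), measured counterclockwise from the +x axis.
    """
    sig = np.stack([np.asarray(c, dtype=float) for c in (w, x, y, z)])  # (4, n)
    cov = sig @ sig.T / sig.shape[1]  # mean of outer products, shape (4, 4)
    az = np.deg2rad(np.linspace(0.0, 360.0, n_azimuth, endpoint=False))
    el = np.deg2rad(np.linspace(-90.0, 90.0, n_elevation))
    el_grid, az_grid = np.meshgrid(el, az, indexing="ij")
    # Steering vector [1, lx, ly, lz] per cell, where l is the unit look direction.
    steer = np.stack([
        np.ones_like(el_grid),
        np.cos(el_grid) * np.cos(az_grid),
        np.cos(el_grid) * np.sin(az_grid),
        np.sin(el_grid),
    ], axis=-1)
    # The virtual cardioid output is 0.5 * steer . sig(t), so its mean energy
    # is 0.25 * steer^T cov steer, evaluated here for all cells at once.
    return 0.25 * np.einsum("...i,ij,...j->...", steer, cov, steer)


def peak_direction(energy):
    """Return (azimuth_deg, elevation_deg) of the single strongest map cell."""
    n_el, n_az = energy.shape
    el_idx, az_idx = np.unravel_index(np.argmax(energy), energy.shape)
    return az_idx * 360.0 / n_az, -90.0 + el_idx * 180.0 / (n_el - 1)
```

Working from the 4x4 covariance matrix avoids computing a separate beam signal for every cell. `peak_direction` only picks the brightest cell; the image-processing stage described in the next paragraph is a more robust way to read the source position from the map.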
Then, digital image-processing algorithms (thresholding, erosion, dilation, and contour detection) are applied to this map to find the loudest sound source and calculate its coordinates, that is, the direction vector.
The app allows the user to:
Digital Signal Processing