15 Sep 2025

Perspectives for speech transcription and analysis

We considered buying 10 cheap lavalier microphones to improve speaker separation. There are two options:

- live acquisition on a computer, which requires an audio interface with 10 inputs;
- wireless lavaliers with internal memory storage, whose files are transferred to a computer at the end of the recording session.

The second option seems much easier to deploy. The only problem is that the number of wireless lavaliers a single device can operate remotely is limited (for example, a DJI Pocket 3 camera handles at most 2 microphones). To centralize monitoring, you would therefore have to buy bundles that come with their dedicated receivers, such as the DJI Mic 2 pack or the Neewer CM28 pack.
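With one lavalier per speaker, each file already corresponds to one speaker, but every microphone also picks up neighbouring voices at a lower level (crosstalk). The sketch below shows one simple heuristic for cleaning this up. It assumes the files have already been synchronised (for example with a clap at the start of the session), share the same sample rate, and have been decoded to lists of float samples. Each channel is cut into short frames, and each frame is attributed to the channel with the highest RMS level, provided that level exceeds a silence threshold. The function names, frame length and threshold are illustrative assumptions, not part of any recorder's software.

```python
import math


def frame_rms(samples, frame_len):
    """RMS level of each consecutive, non-overlapping frame (a trailing partial frame is dropped)."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return levels


def loudest_channel_labels(channels, frame_len, threshold):
    """Label each frame with the loudest channel's name, or None if every channel is below threshold.

    channels: dict mapping a speaker name to that speaker's lavalier samples
    (assumed synchronised and recorded at the same sample rate).
    """
    levels = {name: frame_rms(samples, frame_len) for name, samples in channels.items()}
    n_frames = min(len(lv) for lv in levels.values())
    labels = []
    for i in range(n_frames):
        # Pick the channel with the highest level for this frame.
        name, level = max(((n, lv[i]) for n, lv in levels.items()), key=lambda item: item[1])
        labels.append(name if level >= threshold else None)
    return labels


def labels_to_segments(labels, frame_len, sample_rate):
    """Merge runs of identical labels into (start_s, end_s, speaker) segments, skipping silence."""
    runs = []  # each run: [first_frame, end_frame_exclusive, speaker]
    for i, label in enumerate(labels):
        if label is None:
            continue
        if runs and runs[-1][2] == label and runs[-1][1] == i:
            runs[-1][1] = i + 1  # extend the current run by one frame
        else:
            runs.append([i, i + 1, label])
    frame_dur = frame_len / sample_rate
    return [(first * frame_dur, end * frame_dur, speaker) for first, end, speaker in runs]


if __name__ == "__main__":
    # Toy signals: Alice talks first, Bob second; each mic also catches the other faintly.
    alice = [0.5, -0.5] * 4 + [0.02, -0.02] * 4 + [0.0] * 4
    bob = [0.05, -0.05] * 4 + [0.4, -0.4] * 4 + [0.0] * 4
    labels = loudest_channel_labels({"alice": alice, "bob": bob}, frame_len=4, threshold=0.1)
    print(labels_to_segments(labels, frame_len=4, sample_rate=4))
    # → [(0.0, 2.0, 'alice'), (2.0, 4.0, 'bob')]
```

With real recordings, a frame of 20 to 30 ms would be more typical than the toy values used here. Note that this heuristic gives each frame a single label, so overlapping speech is attributed only to the loudest speaker.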

A third option is automatic speaker recognition and separation from a single multi-speaker audio file, known as speaker “diarization” (the best-known automatic transcription software, Dragon, does not implement diarization).

Top Free and Commercial Speaker Diarization APIs and SDKs - https://picovoice.ai/blog/top-speaker-diarization-apis-and-sdks/
Commercial tools
Open-source tools:
- GitHub list: https://github.com/wq2012/awesome-diarization
- https://github.com/pyannote/pyannote-audio
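Diarization tools typically return a list of speech turns, each with a start time, an end time and an anonymous speaker label. pyannote.audio, for instance, can export its results in the RTTM text format. In this format, each turn is a `SPEAKER` line whose 4th and 5th fields are the onset and duration in seconds and whose 8th field is the speaker label. The sketch below uses only the standard library; its function names, file id and labels are hypothetical. It writes and reads such lines and sums each speaker's speaking time, which is a first step towards analysing who speaks how much.

```python
from collections import defaultdict


def to_rttm(segments, file_id):
    """Format (start_s, end_s, speaker) turns as RTTM SPEAKER lines on channel 1."""
    lines = []
    for start, end, speaker in sorted(segments):
        lines.append(
            f"SPEAKER {file_id} 1 {start:.3f} {end - start:.3f} <NA> <NA> {speaker} <NA> <NA>"
        )
    return "\n".join(lines)


def parse_rttm(text):
    """Read SPEAKER lines back into (start_s, end_s, speaker) tuples."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank lines and other RTTM record types
        onset, duration, speaker = float(fields[3]), float(fields[4]), fields[7]
        segments.append((onset, onset + duration, speaker))
    return segments


def speaking_time(segments):
    """Total speaking time in seconds per speaker label."""
    totals = defaultdict(float)
    for start, end, speaker in segments:
        totals[speaker] += end - start
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical diarization output for a recording called "meeting".
    turns = [(0.0, 2.5, "SPEAKER_00"), (2.5, 4.0, "SPEAKER_01"), (4.0, 5.0, "SPEAKER_00")]
    rttm = to_rttm(turns, "meeting")
    print(rttm.splitlines()[0])  # SPEAKER meeting 1 0.000 2.500 <NA> <NA> SPEAKER_00 <NA> <NA>
    print(speaking_time(parse_rttm(rttm)))  # {'SPEAKER_00': 3.5, 'SPEAKER_01': 1.5}
```

Times are written with millisecond precision, so a round trip through `to_rttm` and `parse_rttm` drops anything finer than that.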

Other resources for automatic transcription (single speaker):

Other resources for manual transcription and annotation:

https://dictoapp.github.io/dicto/
https://github.com/zenml-io/awesome-open-data-annotation
https://www.annotationstudio.org/
https://en.wikipedia.org/wiki/ELAN_software
https://chronoviz.com/
https://www.advene.org/
https://www.iri.centrepompidou.fr/outils/lignes-de-temps/
https://www.iri.centrepompidou.fr/outils/metadata-player-2/
https://shs.cairn.info/article/DN_133_0153 - L’annotation discursive et sémantique pour la pratique de « débats 2.0 » (Discursive and semantic annotation for the practice of “debates 2.0”)