Perspectives for speech transcription and analysis
We considered buying 10 inexpensive lavalier microphones to improve speaker separation. There are two options:
- live acquisition on a computer, which requires an audio interface with 10 inputs
- wireless lavaliers with internal storage, so that the files can be transferred to a computer at the end of the recording session
The second option seems much easier to deploy. The only problem is that the number of wireless lavaliers that can be operated remotely from one device is limited (for example, a DJI Pocket 3 camera can handle 2 microphones at most), so you will have to buy bundles that include the dedicated interfaces, such as the DJI Mic 2 pack or the Neewer CM28 pack, in order to centralize the monitoring.
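With one lavalier per speaker, each recording is dominated by its wearer, so a rough "who spoke when" timeline can be derived by comparing signal levels across tracks. The sketch below is only an illustration under stated assumptions: the tracks are already time-aligned and share a sample rate, samples are plain floats, and the names `frame_rms`, `active_speaker_per_frame`, the frame length and the threshold are all hypothetical choices, not part of any tool mentioned here.

```python
import math

def frame_rms(samples, frame_len):
    """Return the RMS level of each consecutive, non-overlapping frame."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels

def active_speaker_per_frame(tracks, frame_len, threshold):
    """For each frame, name the loudest track, or None if all are below threshold.

    `tracks` maps a speaker name to a list of samples. All tracks are assumed
    to be time-aligned and recorded at the same sample rate.
    """
    levels = {name: frame_rms(samples, frame_len) for name, samples in tracks.items()}
    n_frames = min(len(v) for v in levels.values())
    timeline = []
    for i in range(n_frames):
        # Pick the track with the highest level in this frame.
        name, level = max(((n, v[i]) for n, v in levels.items()), key=lambda p: p[1])
        timeline.append(name if level >= threshold else None)
    return timeline

# Toy data (hypothetical values): 2 speakers, 3 frames of 4 samples each.
tracks = {
    "alice": [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0, 0.01, 0.0, -0.01, 0.0],
    "bob": [0.05, -0.05, 0.05, -0.05, 0.4, -0.4, 0.4, -0.4, 0.0, 0.0, 0.0, 0.0],
}
timeline = active_speaker_per_frame(tracks, frame_len=4, threshold=0.1)
print(timeline)
```

In practice, recordings made on separate devices would first need to be synchronized (for example with a clap at the start of the session), and the threshold would have to be tuned to the room and the microphones.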
A third option is automatic speaker recognition and separation from a single multi-speaker audio file, which is called speaker “diarization” (the best-known automatic transcription software, Dragon, does not implement diarization).
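To make the idea concrete: a diarization tool typically outputs a list of speaker turns (start time, end time, anonymous speaker label), which then has to be combined with a time-stamped transcript. The sketch below shows that merging step only, as an illustration; the data, the `SPEAKER_00`-style labels and the function names `label_words` and `group_by_speaker` are hypothetical and not taken from any specific tool.

```python
def label_words(words, turns):
    """Attach a speaker label to each timed word.

    `words` is a list of (start, end, text) tuples and `turns` a list of
    (start, end, speaker) tuples, all times in seconds. Each word gets the
    speaker of the turn it overlaps most, or "unknown" if it overlaps none.
    """
    labeled = []
    for w_start, w_end, text in words:
        best_speaker, best_overlap = "unknown", 0.0
        for t_start, t_end, speaker in turns:
            overlap = min(w_end, t_end) - max(w_start, t_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))
    return labeled

def group_by_speaker(labeled):
    """Merge consecutive words from the same speaker into one transcript line."""
    lines = []
    for speaker, text in labeled:
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(text)
        else:
            lines.append((speaker, [text]))
    return [f"{speaker}: {' '.join(tokens)}" for speaker, tokens in lines]

# Hypothetical diarization output and time-stamped transcript.
turns = [(0.0, 2.0, "SPEAKER_00"), (2.0, 4.0, "SPEAKER_01")]
words = [(0.2, 0.6, "hello"), (0.7, 1.1, "everyone"),
         (2.1, 2.5, "hi"), (2.6, 3.0, "there")]

transcript = group_by_speaker(label_words(words, turns))
for line in transcript:
    print(line)
```

Diarization labels are anonymous, so mapping `SPEAKER_00` to an actual participant remains a manual step, or requires a separate speaker identification stage.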
- Top Free and Commercial Speaker Diarization APIs and SDKs - https://picovoice.ai/blog/top-speaker-diarization-apis-and-sdks/
- Commercial tools
- Open source tools:
Other resources for automatic transcription (single speaker):
- https://github.com/bugbakery/audapolis/
- https://github.com/coqui-ai/TTS
- https://github.com/ideasman42/nerd-dictation
- https://alphacephei.com/vosk/integrations
Other resources for manual transcription and annotation:
- https://dictoapp.github.io/dicto/
- https://github.com/zenml-io/awesome-open-data-annotation
- https://www.annotationstudio.org/
- https://en.wikipedia.org/wiki/ELAN_software
- https://chronoviz.com/
- https://www.advene.org/
- https://www.iri.centrepompidou.fr/outils/lignes-de-temps/
- https://www.iri.centrepompidou.fr/outils/metadata-player-2/
- https://shs.cairn.info/article/DN_133_0153 - L’annotation discursive et sémantique pour la pratique de « débats 2.0 » (in French: discursive and semantic annotation for the practice of "debates 2.0")