End-to-end speaker segmentation for overlap-aware resegmentation

Hervé Bredin; Antoine Laurent

Conference paper, Year: 2021


Hervé Bredin

Role: Author
PersonId: 15856
IdHAL: hbredin
ORCID: 0000-0002-3739-925X
IdRef: 121165779

Équipe Structuration, Analyse et MOdélisation de documents Vidéo et Audio

Antoine Laurent

Role: Author
PersonId: 13586
IdHAL: antoine-laurent
ORCID: 0000-0002-2653-1008
IdRef: 147099072

Laboratoire d'Informatique de l'Université du Mans

Abstract

Speaker segmentation consists of partitioning a conversation between one or more speakers into speaker turns. The task is usually addressed as the late combination of three sub-tasks (voice activity detection, speaker change detection, and overlapped speech detection); we propose instead to train an end-to-end segmentation model that performs it directly. Inspired by the original end-to-end neural speaker diarization approach (EEND), we model the task as a multi-label classification problem using permutation-invariant training. The main difference is that our model operates on short audio chunks (5 seconds) but at a much higher temporal resolution (every 16 ms). Experiments on multiple speaker diarization datasets show that our model performs well on both voice activity detection and overlapped speech detection. The model can also be used as a post-processing step to detect and correctly assign overlapped speech regions. The relative diarization error rate improvement over the best considered baseline (VBx) reaches 17% on AMI, 13% on DIHARD 3, and 13% on VoxConverse.
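
To make the training objective mentioned in the abstract more concrete, the following is a minimal sketch of a permutation-invariant multi-label loss. It is not the authors' implementation: the function names, the choice of binary cross-entropy as the per-frame loss, and the toy values are assumptions made for illustration only. The frame count is a rough figure derived from the 5-second chunk and 16 ms step, ignoring any boundary effects of a real model.

```python
import itertools
import math


def frames_per_chunk(chunk_seconds=5.0, frame_step_seconds=0.016):
    """Approximate number of output frames per chunk (hypothetical helper)."""
    return int(chunk_seconds / frame_step_seconds)


def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean binary cross-entropy over all (frame, speaker) cells."""
    total, count = 0.0, 0
    for target_row, prob_row in zip(targets, probs):
        for t, p in zip(target_row, prob_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def permutation_invariant_loss(targets, probs):
    """Return (loss, permutation) with the lowest loss over speaker orderings.

    targets and probs are lists of frames; each frame holds one value per
    speaker. Several speakers may be active in the same frame (overlap),
    which is what makes the problem multi-label.
    """
    num_speakers = len(targets[0])
    best_loss, best_perm = float("inf"), None
    for perm in itertools.permutations(range(num_speakers)):
        # Reorder the reference speakers, then score against the predictions.
        permuted = [[row[i] for i in perm] for row in targets]
        loss = binary_cross_entropy(permuted, probs)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example (assumed values): two speakers, four frames, third frame overlapped.
# The predictions list the speakers in the opposite order to the reference.
targets = [[1, 0], [1, 0], [1, 1], [0, 1]]
probs = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.9], [0.9, 0.1]]
loss, perm = permutation_invariant_loss(targets, probs)
```

In this toy case the swapped ordering gives the lowest loss, so the predictions are not penalized for labeling the speakers in a different order than the reference. Enumerating all permutations is only practical for a small number of speakers per chunk, which is the situation short chunks tend to produce.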

Keywords

speaker diarization, speaker segmentation, voice activity detection, overlapped speech detection, resegmentation

Domains

Artificial intelligence [cs.AI], Computation and language [cs.CL], Neural networks [cs.NE]

Main file

2104.04045.pdf (438.9 KB)

Origin: Files produced by the author(s)


https://univ-lemans.hal.science/hal-03257524

Submitted on: Friday, June 11, 2021, 08:48:06

Last modified on: Tuesday, January 16, 2024, 16:27:01

Long-term archiving on: Sunday, September 12, 2021, 18:19:52

Dates and versions

hal-03257524, version 1 (June 11, 2021)

Identifiers

HAL Id: hal-03257524, version 1
arXiv: 2104.04045

Cite

Hervé Bredin, Antoine Laurent. End-to-end speaker segmentation for overlap-aware resegmentation. Interspeech 2021, Aug 2021, Brno, Czech Republic. ⟨hal-03257524⟩


Collections

UNIV-TLSE2 CNRS UNIV-LEMANS UT1-CAPITOLE GENCI LIUM LIUM-LST IRIT IRIT-SAMOVA ANR IRIT-SI IRIT-CNRS TOULOUSE-INP UNIV-UT3 UT3-TOULOUSEINP

234 views

184 downloads
