Fast threshold optimization for multi-label audio tagging using Surrogate gradient learning

Thomas Pellegrini; Timothée Masquelier

doi:10.1109/ICASSP39728.2021.9414091

Conference paper, Year: 2021

Thomas Pellegrini

Role: Author
PersonId : 741962
IdHAL : thomas-pellegrini
ORCID : 0000-0001-8984-1399
IdRef : 127577955

Équipe Structuration, Analyse et MOdélisation de documents Vidéo et Audio

Timothée Masquelier

Role: Author
PersonId : 742027
IdHAL : timothee-masquelier
ORCID : 0000-0001-8629-9506
IdRef : 129018015

Centre de recherche cerveau et cognition

Abstract

Multi-label audio tagging consists of assigning sets of tags to audio recordings. At inference time, thresholds are applied to the confidence scores output by a probabilistic classifier in order to decide which classes are detected as active. In this work, we assume that a trained classifier is available, and we seek to automatically optimize the decision thresholds according to a performance metric of interest, in our case the F-measure (micro-F1). We propose a new method, called SGL-Thresh for Surrogate Gradient Learning of Thresholds, that makes use of gradient descent. Since F1 is not differentiable, we propose to approximate the gradients of the thresholding operation with the gradients of a sigmoid function. We report experiments on three datasets, using state-of-the-art pre-trained deep neural networks. In all cases, SGL-Thresh outperformed three other approaches: a default threshold value (defThresh), a heuristic search algorithm, and a method estimating F1 gradients numerically. It reached 54.9% F1 on AudioSet eval, compared to 50.7% with defThresh. SGL-Thresh is very fast and scalable to a large number of tags. To facilitate reproducibility, data and source code in PyTorch are available online: https://github.com/topel/SGL-Thresh
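
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' code (their PyTorch implementation is at the URL above); it uses only the Python standard library and makes several assumptions that the abstract does not state: the loss is taken to be 1 - micro-F1 computed on the hard 0/1 predictions, the surrogate slope, learning rate and step count are arbitrary, the function names are hypothetical, and the best thresholds seen so far are kept. The forward pass uses the hard step function; only the derivative of that step with respect to each threshold is replaced by the derivative of a sigmoid.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def counts(scores, labels, thresholds):
    # Hard thresholding: a class is active when its score exceeds its threshold.
    tp = n_pred = n_true = 0
    for row_s, row_y in zip(scores, labels):
        for s, y, t in zip(row_s, row_y, thresholds):
            pred = 1 if s > t else 0
            tp += pred * y
            n_pred += pred
            n_true += y
    return tp, n_pred, n_true


def micro_f1(scores, labels, thresholds):
    # micro-F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (#predicted + #true)
    tp, n_pred, n_true = counts(scores, labels, thresholds)
    denom = n_pred + n_true
    return 2.0 * tp / denom if denom else 0.0


def optimize_thresholds(scores, labels, init=0.5, slope=10.0, lr=0.1, n_steps=50):
    # Assumed hyperparameters (slope, lr, n_steps); not taken from the paper.
    thresholds = [init] * len(scores[0])
    best_t, best_f1 = list(thresholds), micro_f1(scores, labels, thresholds)
    for _ in range(n_steps):
        tp, n_pred, n_true = counts(scores, labels, thresholds)
        denom = n_pred + n_true
        if denom == 0:
            break
        grad = [0.0] * len(thresholds)
        for row_s, row_y in zip(scores, labels):
            for j, (s, y) in enumerate(zip(row_s, row_y)):
                # Exact derivative of F1 with respect to one prediction.
                df1_dpred = (2.0 * y * denom - 2.0 * tp) / denom ** 2
                # Surrogate: derivative of sigmoid(slope * (s - t)) w.r.t. t
                # stands in for the derivative of the hard step.
                sg = sigmoid(slope * (s - thresholds[j]))
                dpred_dt = -slope * sg * (1.0 - sg)
                # Gradient of the loss (1 - F1) for threshold j.
                grad[j] += -df1_dpred * dpred_dt
        thresholds = [t - lr * g for t, g in zip(thresholds, grad)]
        f1 = micro_f1(scores, labels, thresholds)
        if f1 > best_f1:
            best_t, best_f1 = list(thresholds), f1
    return best_t, best_f1


# Toy data (made up): 5 clips, 2 tags. Tag 0 is systematically under-scored.
scores = [[0.30, 0.80], [0.35, 0.90], [0.40, 0.20], [0.05, 0.30], [0.10, 0.10]]
labels = [[1, 1], [1, 1], [1, 0], [0, 0], [0, 0]]

f1_default = micro_f1(scores, labels, [0.5, 0.5])
best_thresholds, best_f1 = optimize_thresholds(scores, labels)
print(f1_default, best_f1, best_thresholds)
```

On this toy data a default threshold of 0.5 misses every positive of the first tag, so the optimization has to lower that tag's threshold to recover them. Because each tag has its own threshold and the gradient for all tags is computed in one pass over the scores, the cost of a step grows linearly with the number of tags.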

Keywords

Audio tagging; Surrogate Gradient Learning; Automatic threshold optimization

Domains

Artificial Intelligence [cs.AI]; Sound [cs.SD]

Main file

ICASSP_2021_seuils.pdf (117.51 KB)

main_icassp2021.pdf (80.61 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-03153644

Submitted on: Friday, February 26, 2021, 15:38:35

Last modified on: Monday, November 20, 2023, 11:44:22

Long-term archiving on: Thursday, May 27, 2021, 18:49:51

Dates and versions

hal-03153644 , version 1 (26-02-2021)

Identifiers

HAL Id : hal-03153644 , version 1
ARXIV : 2103.00833
DOI : 10.1109/ICASSP39728.2021.9414091

Cite

Thomas Pellegrini, Timothée Masquelier. Fast threshold optimization for multi-label audio tagging using Surrogate gradient learning. IEEE International Conference on Acoustics, Speech and Signal Processing, Jun 2021, Toronto, Canada. pp. 651-655, ⟨10.1109/ICASSP39728.2021.9414091⟩. ⟨hal-03153644⟩

Collections

INSERM UNIV-TLSE2 CNRS CERCO SMS UT1-CAPITOLE IRIT IRIT-SAMOVA ANR ANITI IRIT-SI TOULOUSE-INP UNIV-UT3 UT3-TOULOUSEINP

187 views

126 downloads
