An Inertial Newton Algorithm for Deep Learning

Camille Castera; Jérôme Bolte; Cédric Févotte; Edouard Pauwels

Preprint, Working Paper. Year: 2020


Camille Castera

Role: Author
PersonId: 175473
IdHAL: camille-castera
ORCID: 0000-0002-7384-6387

Université de Toulouse

Signal et Communications

Jérôme Bolte

Role: Author
PersonId: 995617

Université Toulouse Capitole

Cédric Févotte

Role: Author
PersonId: 184864
IdHAL: cedric-fevotte
ORCID: 0000-0003-3801-5534
IdRef: 083298460

Centre National de la Recherche Scientifique

Signal et Communications

Edouard Pauwels

Role: Author
PersonId: 12830
IdHAL: edouard-pauwels
ORCID: 0000-0002-8180-075X

Université Toulouse III - Paul Sabatier

Abstract

We introduce a new second-order inertial optimization method for machine learning called INDIAN. It exploits the geometry of the loss function while requiring only stochastic approximations of the function values and the generalized gradients. This makes INDIAN fully implementable and suited to large-scale optimization problems such as the training of deep neural networks. The algorithm combines gradient-descent and Newton-like behaviors with inertia. We prove the convergence of INDIAN for most deep learning problems. To do so, we provide a framework, based on tame optimization, that is well suited to analyzing deep learning loss functions and in which we study the continuous dynamical system together with its discrete stochastic approximations. We prove sublinear convergence for the continuous-time differential inclusion that underlies our algorithm. We also show how standard mini-batch optimization methods applied to nonsmooth nonconvex problems can yield a certain type of spurious stationary points never discussed before. We address this issue by providing a theoretical framework around the new idea of $D$-criticality; we then give a simple asymptotic analysis of INDIAN. Our algorithm allows for an aggressive learning rate of $o(1/\log k)$. From an empirical viewpoint, we show that INDIAN returns results competitive with the state of the art (stochastic gradient descent, ADAGRAD, ADAM) on popular deep learning benchmark problems.
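The abstract does not spell out the update rule, so the following is only a minimal sketch of how gradient, Newton-like, and inertial behaviors could be combined using first-order information alone. It assumes a two-variable scheme (`theta`, `psi`) obtained by an explicit Euler discretization of a first-order reformulation of the dynamics $\ddot\theta + \alpha\dot\theta + \beta\nabla^2 J(\theta)\dot\theta + \nabla J(\theta) = 0$. The function name, the parameters `alpha` and `beta`, the default values, the initialization of `psi`, and the constant step size are all assumptions made for illustration, and a deterministic gradient stands in for the stochastic mini-batch approximations mentioned above.

```python
from typing import Callable, List, Sequence


def inertial_newton_sketch(
    grad: Callable[[Sequence[float]], Sequence[float]],
    theta0: Sequence[float],
    alpha: float = 0.5,   # assumed viscous damping coefficient
    beta: float = 0.1,    # assumed Newton-like (Hessian-driven) damping coefficient
    step_size: Callable[[int], float] = lambda k: 0.1,  # assumed schedule
    n_steps: int = 500,
) -> List[float]:
    """Hypothetical two-variable inertial scheme using only gradients."""
    theta = list(theta0)
    # Assumed initialization: with this choice the shared drift term is zero
    # at k = 0, so the first update is a plain gradient step of size gamma * beta.
    psi = [(1.0 - alpha * beta) * t for t in theta]
    for k in range(n_steps):
        gamma = step_size(k)
        g = grad(theta)
        # Drift term shared by both updates; no Hessian is ever formed.
        drift = [(1.0 / beta - alpha) * t - p / beta for t, p in zip(theta, psi)]
        theta = [t + gamma * (d - beta * gi) for t, d, gi in zip(theta, drift, g)]
        psi = [p + gamma * d for p, d in zip(psi, drift)]
    return theta


def quadratic_loss(theta: Sequence[float]) -> float:
    """Toy objective J(theta) = 0.5 * ||theta||^2, used only for the demo."""
    return 0.5 * sum(t * t for t in theta)


def quadratic_grad(theta: Sequence[float]) -> List[float]:
    """Gradient of the toy objective."""
    return list(theta)


theta_start = [3.0, -2.0]
theta_end = inertial_newton_sketch(quadratic_grad, theta_start)
print(quadratic_loss(theta_start), quadratic_loss(theta_end))
```

On this toy quadratic the iterates spiral toward the minimizer at the origin. A decreasing schedule can be passed through `step_size`, for example `lambda k: 0.1 / (k + 1) ** 0.5`, at the cost of slower progress on this example.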

Keywords

Nonconvex optimization; Deep learning; Deep learning algorithms; Stochastic optimization; Second-order methods; Dynamical systems

Domains

Machine Learning [cs.LG]; Optimization and Control [math.OC]; Machine Learning [stat.ML]

Main file

arxiv.pdf (1.38 MB)

Origin: Files produced by the author(s)

Contributor: Camille Castera

https://hal.science/hal-02140748

Submitted on: Monday, October 12, 2020, 18:49:58

Last modified on: Monday, November 20, 2023, 11:44:21

Dates and versions

hal-02140748, version 1 (2019-05-27)

hal-02140748, version 2 (2019-06-06)

hal-02140748, version 3 (2019-12-12)

hal-02140748, version 4 (2020-10-12)

hal-02140748, version 5 (2021-07-02)

hal-02140748, version 6 (2021-08-20)

Identifiers

HAL Id: hal-02140748, version 4
arXiv: 1905.12278

Cite

Camille Castera, Jérôme Bolte, Cédric Févotte, Edouard Pauwels. An Inertial Newton Algorithm for Deep Learning. 2020. ⟨hal-02140748v4⟩


Collections

SMS

706 views

346 downloads
