Optimization of visual SLAM by semantic analysis of the environment

(Optimisation du SLAM visuel par analyse sémantique de l'environnement réel)

Gonzalez, Mathieu - (2022-11-30) / Universite de Rennes 1

Access the document:

https://ged.univ-rennes1.fr/nuxeo/site/esupversion...

Language: English

Thesis advisor(s): Marchand, Éric; Royan, Jérôme

Discipline: Signal, image, vision

Laboratory: IRISA

Doctoral school: MATHSTIC

Classification: Computer science, Engineering sciences

Keywords: SLAM, semantics, localization, mapping, tracking, computer vision, robotics, augmented reality


Summary (translated from French): The goal of SLAM (Simultaneous Localization And Mapping) is to estimate the trajectory of a moving camera while mapping the surrounding space. Classical algorithms generally build a purely geometric and homogeneous map, so there is a semantic gap between the SLAM system's internal representation and the real world in which the system operates. Our goal in this manuscript is to build a SLAM system that can exploit semantic information to push back the limits of SLAM. To this end, we propose a lightweight neural network to estimate the pose of objects in the scene. Objects can serve as high-level landmarks for a SLAM system, improving the camera pose and adding information to the map. We then propose a SLAM system able to create groups of 3D points corresponding to generic objects in the scene. Using prior knowledge about the class of the objects, we can estimate their geometry to improve the map and the camera pose. Finally, we propose a SLAM system able to estimate the camera pose in dynamic scenes while estimating the trajectory of all the objects in the scene. A prior on the objects lets us constrain their motion so that it is consistent with the structure of the world. We also propose to improve object tracking by injecting LiDAR data into our SLAM system.

Abstract: The goal of SLAM (Simultaneous Localization and Mapping) is to estimate the trajectory of a moving camera while building a map of its environment. Classical algorithms usually build a purely geometric and homogeneous map, so there is a semantic gap between the internal representation and the real world in which the system operates. Our goal in this manuscript is to build a SLAM system that can harness semantic information to push forward the limits of SLAM. To this end, we first propose a lightweight neural network to estimate the pose of objects in the scene. Objects can serve as high-level landmarks for a SLAM system, improving the camera pose and adding information to the map. This network, however, has to be trained for specific objects. We then propose a SLAM system that can create clusters of 3D points corresponding to generic objects in the scene. With some a priori knowledge about object classes, we can estimate their geometry in real time to improve both the map and the camera pose estimate. Finally, we propose a new SLAM system able to robustly estimate the camera pose in dynamic scenes and to estimate the trajectories of all moving objects in the scene. A priori knowledge allows us to constrain the motion of objects so that it is plausible with respect to the structure of the world. We also propose to improve object tracking by injecting LiDAR data into our SLAM system.