Simultaneous Localisation and Mapping (SLAM) is an increasingly important topic (especially in the creation process of 3D configurators) within the computer vision community and is of particular interest to the augmented and virtual reality industry. Since a large number of SLAM systems from science and industry are available, it is worth exploring what SLAM is. This article gives a brief introduction to what SLAM is and what it is used for in computer vision research and development, in particular augmented reality.
What is SLAM?
SLAM is not a particular algorithm or piece of software; it refers to the problem of simultaneously localizing a sensor, i.e. estimating its position and orientation relative to its environment, while mapping the structure of that environment. So when we talk about a "SLAM system", we mean "a set of algorithms that solves the problem of simultaneous localization and mapping".
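To make the joint nature of the problem concrete, here is a minimal, purely illustrative sketch in Python (all names here are hypothetical, not any particular SLAM system's API): the state couples a sensor pose with a growing map of landmarks, and each step both refines the pose against re-observed landmarks and extends the map with new ones.

```python
from dataclasses import dataclass, field

@dataclass
class SlamState:
    # Sensor pose: x, y and heading, all initially unknown (zeroed).
    pose: tuple = (0.0, 0.0, 0.0)
    # The map: landmark id -> estimated (x, y) position.
    landmarks: dict = field(default_factory=dict)

def slam_step(state, observations):
    """One iteration: observations map landmark ids to offsets (dx, dy)
    measured relative to the current sensor pose."""
    x, y, heading = state.pose
    for lid, (dx, dy) in observations.items():
        if lid in state.landmarks:
            # Localization: a real system would refine the pose here
            # using the discrepancy against the stored landmark (stubbed).
            pass
        else:
            # Mapping: anchor a newly seen landmark in world coordinates
            # by composing the measurement with the current pose estimate.
            state.landmarks[lid] = (x + dx, y + dy)
    return state
```

The point of the sketch is only the chicken-and-egg structure: localization needs a map, mapping needs a pose, and SLAM must bootstrap both from nothing.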
SLAM is not necessarily just a computer vision problem and does not need to contain visual information. In fact, much of the early research was associated with ground-based robots equipped with laser scanners. For this article, however, I will focus mainly on visual SLAM – where the primary mode of perception is via a camera – as it is of major interest in the context of augmented reality, but many of the topics discussed can be applied more generally.
The requirement to recover both the camera's pose and the map when neither is known beforehand distinguishes the SLAM problem from other tasks. For example, marker-based tracking is not SLAM, because the marker image is known in advance. Even 3D reconstruction with a fixed camera rig is not SLAM, because while the map is being recovered, the positions of the cameras are already known. The challenge in SLAM is to recover both the camera pose and the map structure when initially neither is known.
An important difference between SLAM and other seemingly similar methods of recovering pose and structure is the requirement to work in real time. This is a somewhat fuzzy concept, but in general it means that the processing of each incoming camera image must be completed by the time the next one arrives, so that the camera pose is available immediately rather than as the result of post-processing. This distinguishes SLAM from techniques such as structure from motion, where an unordered set of images is processed offline to recover the 3D structure of an environment in a potentially time-consuming process. That can produce impressive results, but the crucial difference is that you don't know where the camera is while shooting.
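The real-time constraint can be stated as a simple per-frame time budget. The following sketch (with a hypothetical `track_fn`; no real SLAM library is assumed) processes frames sequentially and reports whether every frame was handled within the inter-frame interval, e.g. roughly 33 ms for a 30 fps camera:

```python
import time

FRAME_BUDGET_S = 1.0 / 30.0  # assumed 30 fps camera

def process_frames(frames, track_fn):
    """Run the tracker on each frame; a pose must be ready before the
    next frame arrives, so we check each frame against the budget."""
    poses, on_time = [], True
    for frame in frames:
        start = time.perf_counter()
        poses.append(track_fn(frame))  # pose available immediately
        if time.perf_counter() - start > FRAME_BUDGET_S:
            on_time = False  # fell behind real time on this frame
    return poses, on_time
```

An offline structure-from-motion pipeline has no such budget: it may spend minutes or hours on the whole image set, and poses only exist after the fact.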
A brief history of SLAM
Research into the SLAM problem began within the robotics community, usually with wheeled robots traversing a flat ground plane. Typically, this was achieved by combining sensor readings with information about the control input and the robot's measured state. This may seem far removed from tracking a handheld camera moving freely in space, but it embodies many of SLAM's core problems, such as creating a consistent and accurate map and making optimal use of multiple unreliable sources of information.
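The idea of weighting unreliable sources by their uncertainty is captured by filtering. As one illustration (not the exact filter any particular early system used), here is a one-dimensional Kalman-style predict/update sketch that fuses a control input with a noisy sensor reading:

```python
def predict(x, p, u, q):
    """Motion step: apply control input u; uncertainty p grows by
    the motion noise q, since odometry is imperfect."""
    return x + u, p + q

def update(x, p, z, r):
    """Measurement step: blend the prediction with sensor reading z,
    whose noise variance is r."""
    k = p / (p + r)                      # gain: how much to trust z
    return x + k * (z - x), (1 - k) * p  # uncertainty shrinks

x, p = 0.0, 1.0                       # initial position estimate
x, p = predict(x, p, u=1.0, q=0.5)    # robot commanded to move 1 unit
x, p = update(x, p, z=1.2, r=0.5)     # range sensor reports 1.2
```

Extended to a joint state holding the robot pose and every landmark position, this predict/update cycle is essentially the classical EKF-SLAM formulation.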
More recently, visual sensors have become an important focus of SLAM research, not least because an image is such a rich source of information about the structure of the environment. Much of the research on visual SLAM uses stereo cameras or cameras combined with other sensors, but since the early 2000s a number of studies have shown that SLAM can work successfully with a single camera (known as monocular visual SLAM). One example is the groundbreaking work of Andrew Davison at Oxford University.
This was crucial in making SLAM a much more useful technology, as devices equipped with a single camera, such as webcams and mobile phones, are far more common and accessible than specialized measuring devices. Recent work has shown how monocular visual SLAM can be used to build large-scale maps, how maps can be automatically augmented with meaningful 3D structure, and how extremely detailed shapes can be recovered in real time. SLAM remains an active field of computer vision research, and new and improved techniques are constantly emerging.