Virtual Reality (VR), Augmented Reality (AR) and Mixed Reality (MR) are terms often used interchangeably when talking about 3D configurators, but they refer to distinct technologies, and the differences matter for understanding them.

Introduction to Mixed Reality

In this article we will cover the following topics:

  • Differences between VR, AR and MR
  • Displays and some of the trackers used
  • Processing techniques and representation
  • Microsoft HoloLens
  • Development of Apps for Windows Mixed Reality

Virtual Reality.

VR aims to make you feel completely immersed in another world, hiding everything else. It is a fully virtual environment in which nothing of reality remains visible: the user is placed in a world entirely different from their actual location. This world is either computer-generated or recorded, and it completely blocks out the user's real surroundings.

VR technologies typically use compact, opaque head-mounted displays.

Augmented Reality.

AR is any computer technology that superimposes data on your current view while you continue to see the world around you. Besides visual data, audio can be augmented as well. A GPS app that gives you directions as you walk could technically be considered an AR app. Augmented Reality is the integration of digital, computer-generated information with the real environment in real time. In contrast to Virtual Reality, which creates a completely artificial environment, Augmented Reality uses the existing real environment and superimposes new information on it.

Augmented Reality headsets superimpose data, 3D objects and videos onto your otherwise natural view. The superimposed objects do not have to be emulations of real things: floating text in your view, for example, is not an emulation of a real object but simply useful information.

Mixed Reality.

Mixed Reality extends the real world with virtual objects that behave as if they were really placed in the real environment. The virtual objects anchor their positions to their real counterparts to create a seamless view (e.g. placing a virtual cat or ball on a real table, where it stays while we walk around the table and look at it from different angles). Mixed Reality covers everything that lies between completely real and completely virtual projections.

Mixed Reality works towards a seamless integration of augmented reality with the user's perception of the real world. This means linking “real” entities with “virtual” entities.

One of the first references to Mixed Reality appears in Paul Milgram and Fumio Kishino's 1994 paper “A Taxonomy of Mixed Reality Visual Displays”. There they define the “Virtuality Continuum”, also known as the Reality-Virtuality (RV) Continuum.

[Figure: the Reality-Virtuality Continuum from Milgram and Kishino's mixed reality taxonomy]

Real Environment describes views or environments that contain only real things, with no computer-generated elements. This includes what is observed through a conventional video feed of a real scene, as well as direct viewing of the same scene, with or without a simple glass in between.

Virtual Environment describes views or environments that contain only virtual, computer-generated objects. The entire environment is created virtually and contains no real object, whether viewed directly or through a camera. A computer-graphics simulation of an airplane is an example of a virtual environment.

Mixed Reality is defined as an environment in which real-world and virtual objects are presented together in a single display, i.e. anywhere between the two extrema of the virtuality continuum.

Augmented Reality is where virtual objects are brought into the real world, like a heads-up display (HUD) on an aircraft windshield.

Augmented Virtuality refers to environments in which a virtual world contains certain elements of the real world, such as an image of your own hand inside the virtual environment, or the ability to see other people in the room from within it.

Classification based on display environments for MR.

  1. Monitor-based (non-immersive) video displays, such as computer monitors, on which computer-generated graphics or images are digitally superimposed.
  2. Video displays using immersive Head Mounted Displays (HMDs) where, as with monitors, the computer-generated images are electronically or digitally superimposed.
  3. Optical see-through HMDs: partially immersive displays in which the computer-generated scenes or graphics are optically superimposed on the directly viewed real scene.
  4. Video see-through HMDs: the real, immediate outside world is not viewed directly but shown on the display, captured by cameras. The displayed video must match the real outside world orthoscopically, creating a video view equivalent to an optically real one.
  5. Fully graphical display environments to which video reality is added. These displays are closer to Augmented Virtuality than to Augmented Reality and can be completely, partially or otherwise immersive.
  6. Fully graphical, partially immersive environments in which real physical objects interact with the computer-generated virtual environment. An example is a fully virtual space where the user's physical hand can be used to grab a virtual doorknob.

Displays that are used for Mixed Reality.

Monitors: Conventional computer monitors or other display devices such as televisions, etc.

Handheld devices: These include mobile phones and tablets with a camera. The real scene is captured by the camera and the virtual scene is dynamically added.

Head-Mounted Displays: Display devices mounted on the user's head, with the display floating in front of the eyes. These devices use sensors that track the head with six degrees of freedom, allowing the system to align the virtual information with the physical world and with the user's head movements. VR headsets and Microsoft HoloLens are examples.

Glasses: These devices are worn like normal glasses. They may contain cameras that capture the real-world view and redisplay an augmented version of it through the eyepieces; in other devices, the AR imagery is projected onto or reflected off the lens surfaces. Google Glass and similar devices are examples of such displays.

Head-Up Displays: The display sits at eye level, so the user does not need to shift their gaze to take in the information. One example is information projected onto the windscreens of cars and aircraft.

Contact Lenses: An emerging display type; prototypes are expected to include ICs, LEDs and antennas for wireless communication.

Virtual Retina Display: With this technology, a display is scanned directly onto the retina of the human eye.

Spatial Augmented Reality: SAR uses multiple projectors to display virtual or graphical information on real objects. With SAR, the display is not tied to a single user. It can therefore be scaled to groups of users, enabling collocated collaboration between users.

Trackers used to define the Mixed Reality scene.

Digital cameras and other optical sensors: Used to capture video and other optical information from the real scene.

Accelerometers: Used to measure the acceleration of the device, from which its movement and speed relative to other objects in the scene can be derived.

GPS: Used to identify the geoposition of the device, which in turn can be used to provide location-specific inputs to the scene.

Gyroscopes: Used to measure and maintain the orientation and angular velocity of the device.

Solid State Compasses: Used to calculate the direction of the device.

RFID or Radio Frequency Identification: Works by attaching tags to objects in the scene that can be read by the device via radio. RFIDs can be read even when the tag is out of sight and several meters away.

Other wireless sensors are also used to track devices in real/virtual space.
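To make the sensor list concrete, here is a minimal sketch (purely illustrative, not taken from any headset SDK) of a complementary filter, a common way to fuse a drifting gyroscope rate with a noisy but drift-free accelerometer angle into one stable orientation estimate:

```python
# Minimal complementary filter: fuse gyroscope and accelerometer
# readings into a single tilt (pitch) estimate.  Real MR headsets use
# far more sophisticated sensor fusion; this only shows the idea.
import math

def complementary_filter(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """Blend the integrated gyro rate (smooth, but drifts) with the
    accelerometer angle (noisy, but drift-free)."""
    return alpha * (angle + gyro_rate * dt) + (1 - alpha) * accel_angle

def accel_pitch(ax, ay, az):
    """Pitch angle (radians) implied by the measured gravity vector."""
    return math.atan2(ax, math.sqrt(ay * ay + az * az))
```

The weighting `alpha` controls how much the estimate trusts the gyroscope over the accelerometer on each update.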

Techniques for processing scenes and representations.

Image Registration.

Registration is the process of aligning two images so that corresponding pixels map to the same points in the scene. The two images may come from different sensors at the same time (stereo imaging) or from the same sensor at different times (remote sensing). In an augmented reality platform, the two images can be the real and the virtual view. Registration is necessary so that the images can be combined and information extracted from them.

Image registration draws on methods from Computer Vision, the field concerned with how computers achieve a high-level understanding of digital information such as images or videos. Computer Vision methods related to video tracking are particularly useful in mixed reality environments.
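As a toy illustration of registration, the following sketch (the function and test setup are my own, not from any AR framework) estimates the translation between two images via phase correlation, a classic FFT-based registration technique:

```python
import numpy as np

def register_translation(ref, shifted):
    """Estimate the integer (row, col) displacement of `shifted`
    relative to `ref` via phase correlation.  A minimal sketch of
    FFT-based registration; real AR pipelines must also handle
    rotation, scale and sub-pixel shifts."""
    # Cross-power spectrum of the two images, keeping only the phase.
    cross = np.fft.fft2(shifted) * np.conj(np.fft.fft2(ref))
    cross /= np.abs(cross) + 1e-12
    # The peak of the inverse transform marks the displacement.
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Shifts past the half-way point wrap around to negative values.
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))
```

For example, an image rolled by 5 rows and -3 columns is recovered as the shift (5, -3).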

Video tracking is the process of localizing moving objects captured by a camera over time. The goal is to associate target objects across consecutive video frames. Algorithms for this analyze sequential frames, model the motion of the target objects and output their positions. The two main components of a visual tracking system are target representation and localization (identifying the object and determining its position) and filtering and data association (incorporating prior scene knowledge, handling the object's dynamics and evaluating different hypotheses).

Common algorithms for the representation and localization of targets are:

  1. Kernel-based tracking – a localization technique based on iteratively maximizing a similarity measure (a real-valued function that quantifies the similarity between two objects). This method is also called Mean Shift tracking.
  2. Contour tracking – these methods evolve an initial contour from its position in the previous frame to its new position in the current frame, deriving the object's boundary, e.g. the Condensation algorithm.
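The kernel-based (Mean Shift) idea from point 1 can be sketched as follows. This is a deliberately simplified, hypothetical version that climbs to the mode of a precomputed weight image (in a real tracker the weights would come from, e.g., colour-histogram similarity to the target):

```python
import numpy as np

def mean_shift_track(weights, start, win=8, iters=20):
    """Locate the mode of a weight image by repeatedly moving a window
    to the centroid of the weights inside it -- the core iteration of
    kernel-based (Mean Shift) tracking."""
    r, c = start
    for _ in range(iters):
        # Clip the search window to the image bounds.
        r0, r1 = max(r - win, 0), min(r + win + 1, weights.shape[0])
        c0, c1 = max(c - win, 0), min(c + win + 1, weights.shape[1])
        patch = weights[r0:r1, c0:c1]
        total = patch.sum()
        if total == 0:
            break
        # Move the window centre to the weighted centroid.
        rows, cols = np.mgrid[r0:r1, c0:c1]
        nr = int(round((rows * patch).sum() / total))
        nc = int(round((cols * patch).sum() / total))
        if (nr, nc) == (r, c):   # converged
            break
        r, c = nr, nc
    return r, c
```

Starting anywhere near a blob of high weights, the window iteratively shifts onto the blob's centre.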

Augmented Reality Mark-up Language (ARML).

ARML is a data standard used to describe and define augmented reality (AR) scenes and their interactions with the real world. It is developed within the Open Geospatial Consortium (OGC) by a dedicated ARML 2.0 Standards Working Group. ARML contains an XML grammar for describing the position and appearance of virtual entities in the AR scene, as well as ECMAScript bindings that provide dynamic access to the properties of virtual elements and to event handling.

Because ARML is based on a generic object model, it enables serialization in multiple languages. ARML now defines XML and JSON serialization for the ECMAScript bindings. The ARML object model consists of the following concepts:

Features: This describes the physical object to be added to the real scene. The object is described by a set of metadata like ID, Name, Description, etc. A feature has one or more anchors.

Visual Assets: This describes the appearance of the virtual objects in the Augmented Scene. Visual assets that can be described include plain text, images, HTML content, and 3D models. They can be oriented and scaled.

Anchor: An anchor describes the position of the physical object in the real world. The four Anchor types are Geometries, Trackables, RelativeTo, and ScreenAnchor.
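To give a feel for the XML grammar, here is a simplified ARML-style fragment showing a Feature with a Geometry anchor and a text visual asset. The element names loosely follow the ARML 2.0 specification, but the fragment is illustrative only and not guaranteed to be schema-valid:

```xml
<arml xmlns="http://www.opengis.net/arml/2.0"
      xmlns:gml="http://www.opengis.net/gml/3.2">
  <ARElements>
    <!-- Feature: the physical object being augmented -->
    <Feature id="parkedCar">
      <name>My car</name>
      <anchors>
        <!-- Geometry anchor: pins the Feature to a geo-position -->
        <Geometry>
          <gml:Point gml:id="carPosition">
            <gml:pos>47.48 13.14</gml:pos>
          </gml:Point>
          <assets>
            <!-- Visual asset rendered at the anchor -->
            <Text>
              <src>You parked here</src>
            </Text>
          </assets>
        </Geometry>
      </anchors>
    </Feature>
  </ARElements>
</arml>
```

Consult the OGC ARML 2.0 specification for the exact schema and the full set of asset and anchor elements.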

Microsoft HoloLens.

HoloLens is essentially a holographic computer built into a headset, with which you can see, hear and interact with holograms in an environment such as a living room or office. Unlike other mixed reality devices such as Google Glass, which are peripherals or accessories that must be connected wirelessly to another processing device, HoloLens is a self-contained Windows 10 PC. It combines high-definition see-through lenses and spatial sound to create an immersive, interactive holographic experience.

Input for HoloLens can be received via gaze, gestures, voice, gamepads and motion controllers. The device also provides perceptual and spatial capabilities such as world coordinates, spatial sound and spatial mapping. Mixed reality devices, including HoloLens, additionally use the input methods already available in Windows, including mouse, keyboard, gamepads and more. On HoloLens, hardware accessories connect to the device via Bluetooth.

Develop for Mixed Reality.

Windows 10 is built from the ground up to be compatible with Mixed Reality devices. Apps developed for Windows 10 are therefore compatible with multiple devices, including HoloLens and other immersive headsets. Which environment we use to develop for MR devices depends on the type of app we want to create.

For 2D applications, we can use any tool to develop universal Windows applications suitable for all Windows environments (Windows Phone, PC, Tablets, etc.). These apps are experienced as 2D projections and can work across multiple device types.

But immersive and holographic applications need tools that take advantage of the Windows Mixed Reality APIs. Such applications are typically built with a 3D engine such as Unity together with Visual Studio; if you want to develop your own engine, you can use DirectX and other Windows APIs directly.

Universal Windows platform applications exported from Unity run on any Windows 10 device. But for HoloLens, we should take advantage of features that are only available on HoloLens. To achieve this, we need to set TargetDeviceFamily to “Windows.Holographic” in the Package.appxmanifest file in Visual Studio. The resulting solution can be executed on the HoloLens emulator.
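The manifest change might look like the following fragment (the version numbers here are illustrative; check the current documentation for the values appropriate to your SDK):

```xml
<Dependencies>
  <!-- Restrict the app to HoloLens-class devices -->
  <TargetDeviceFamily Name="Windows.Holographic"
                      MinVersion="10.0.10240.0"
                      MaxVersionTested="10.0.10586.0" />
</Dependencies>
```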

Gaze is how focus is placed on holograms. It is the center of the user's field of view when looking through the HoloLens and acts essentially as a mouse pointer; the cursor can be customized for your app. HoloLens uses the position and orientation of the user's head, not their eyes, to determine the gaze vector. Once an object is targeted with gaze, gestures can be used for the actual interaction. The most common gesture is the “tap”, which works like a left click; “tap and hold” can be used to move objects in 3D space. Events can also be triggered with custom voice commands. RayCast, GestureRecognizer and KeywordRecognizer are some of the objects, and OnSelect, OnPhraseRecognized, OnCollisionEnter, OnCollisionStay and OnCollisionExit some of the event handlers, that can be used in development environments to capture these interactions.
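The gaze mechanism boils down to casting a ray from the head pose and testing it against scene objects. A small self-contained sketch (spheres stand in for holograms; this is illustrative geometry, not the actual HoloLens API) shows the idea:

```python
import math

def gaze_target(head_pos, gaze_dir, objects):
    """Return the name of the nearest object hit by the gaze ray, or
    None.  `objects` is a list of (name, center, radius) spheres
    standing in for holograms."""
    def sub(a, b): return tuple(x - y for x, y in zip(a, b))
    def dot(a, b): return sum(x * y for x, y in zip(a, b))
    n = math.sqrt(dot(gaze_dir, gaze_dir))
    d = tuple(x / n for x in gaze_dir)      # normalised gaze vector
    best = None
    for name, center, radius in objects:
        oc = sub(center, head_pos)
        t = dot(oc, d)                      # distance along the ray
        if t < 0:
            continue                        # object is behind the user
        # Squared distance from the sphere centre to the ray.
        miss2 = dot(oc, oc) - t * t
        if miss2 <= radius * radius and (best is None or t < best[0]):
            best = (t, name)
    return best[1] if best else None
```

A real engine performs the same test against mesh colliders and then dispatches the gesture events to whichever object the ray hits first.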

Why discuss this now?

It is impossible to do the topic justice in a short blog post, but the idea is to open the door to the possibilities and development areas of the Mixed Reality environment and to start talking about them. Mixed Reality has immense scope, including but not limited to areas such as literature, archaeology, fine arts, commerce, architecture, education, medicine, industrial design, flight training, the military, emergency management, video games and more.

For software developers and other stakeholders who build traditional software for desktop, mobile and enterprise environments, creating apps and experiences for HoloLens and similar devices is a step into the unknown. Thinking about experiences in 3D space will challenge and change our traditional understanding of software and application development. The entire development landscape is changing rapidly.

The future of digital realities such as augmented and mixed reality is bright, and adopting the platform early will make it possible to take full advantage of it.

Thank you very much for your visit.