Categories: Augmented Reality

What you should know about ARCore and ARKit.

A few months ago, Google announced ARCore (well suited to viewing 3D configurators in AR), its direct competitor to Apple's ARKit. Obviously this is good news for the augmented reality (AR) industry, but what does it really mean for developers and consumers?

Isn't ARCore just Tango-lite?

A developer I spoke to joked, “I just looked at the ARCore SDK, and it's the Tango SDK renamed, with the depth camera code commented out and a compiler flag changed.” I suspect it's a bit more than that, but not much more. For example, the new web browsers that support ARCore are fantastic for developers, but they are separate from the core SDK. In my last ARKit post, I wondered why Google hadn't released a version of Tango VIO 12 months ago, since so many of the pieces could have been carried over more or less wholesale.

This is great news, because it means ARCore is very mature, well-tested software, and there is a rich roadmap of features built for Tango that don't all depend on 3D depth data.

Name aside: add the depth-camera hardware to a mobile device running ARCore, and you have a Tango device. Google now has a much easier path to wide adoption of the SDK, by being able to offer it on flagship smartphones. Nobody would give up a great Android phone for a worse one with AR; now consumers buy the smartphone they would have bought anyway, and ARCore comes with it for free.

I've always found it interesting that Tango was long described as “a mobile device that knows where it is”. I've never met anybody who was impressed by that. Tango came across more as an indoor-mapping companion to Google Maps, with AR as something of an afterthought. With the new name, it's AR first.

But what about all the calibration you talked about?

To make ARKit rock solid, Apple implemented three types of hardware and software calibration: geometric (easy) and photometric (hard) calibration of the camera, and IMU error removal (crazy hard). Clock synchronization across the sensors also matters a great deal.
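To get a feel for why the IMU part is the “crazy hard” one, consider what dead reckoning does with even a tiny leftover accelerometer bias: the bias gets integrated twice, so position error grows with the square of time. A minimal illustrative sketch, in plain Kotlin with made-up numbers (nothing here is from either SDK):

```kotlin
// Illustrative only: how a small accelerometer bias blows up under
// double integration (dead reckoning). The numbers are invented.
fun driftAfter(seconds: Double, biasMs2: Double): Double {
    // position error ≈ 0.5 * bias * t^2 for a constant bias
    return 0.5 * biasMs2 * seconds * seconds
}

fun main() {
    val bias = 0.01 // m/s^2, a plausible residual bias after calibration
    for (t in listOf(1.0, 5.0, 10.0, 30.0)) {
        println("after ${t}s: drift ≈ ${"%.2f".format(driftAfter(t, bias))} m")
    }
}
```

With a residual bias of 0.01 m/s², drift is about half a centimeter after one second but around 4.5 m after 30 seconds. The camera side of VIO constantly corrects this error away, but the cleaner the IMU, the less correcting it has to do, which is exactly why better calibration buys a longer time before pose errors become noticeable.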

Calibration is not a binary “yes, it's calibrated” or “no, it isn't” situation. It's statistical, and there are many incremental steps toward making the system stable enough for an application. The more “calibrated” the system is, the longer it takes for errors in the pose calculations to become noticeable. Here's what I think Google did:

First, they're being very careful about which devices they support: for now, the Samsung Galaxy S8 and the Google Pixel. On both platforms, Google engineers had already worked on sensor calibration for inside-out tracking for Daydream VR. Google recently had engineers working with Samsung in South Korea to calibrate and tune the sensors in their upcoming devices to fully support Daydream, so it's not unthinkable that some of that work carried over to the S8. So we have two devices whose cameras and IMUs are already reasonably well calibrated and clock-synchronized (for Daydream).

Google has spent a lot of time this year merging the Tango and Daydream SDKs. By the end of August 2018, much of the low-level work was done, meaning the Tango/ARCore VIO system could already take advantage of the Daydream sensor integration.

Finally, the real benefits of calibration only show up at the outer limits of system performance. Both ARKit and ARCore can track quite well over many meters before the user notices any drift. I haven't seen head-to-head tests over long times and distances, but it doesn't really matter: developers are still mostly placing AR content right in front of the user, and users rarely realize they could roam freely over quite large distances. For the way AR applications are actually used today, differences in calibration are virtually impossible to detect. By the time developers push past the limits of the SDKs, Google is betting there will be a new generation of devices on the market with sensor calibration integrated much more tightly at the factory.

For example, we spoke with one of the largest IMU OEMs this week, and they said their mobile IMUs are factory-calibrated at a single operating temperature to reduce costs. That means the IMU hardware is tuned to produce the least error at that one temperature. As you keep using the phone, it heats up, the IMU behaves slightly differently from how it was calibrated, and errors creep in. This is fine for most IMU applications, but for VIO, once the device has warmed up, the IMU measurements used for dead reckoning become unreliable and the tracking is distorted. This OEM could easily start calibrating for multiple temperature ranges if asked, which would be one less source of error that Google's ARCore VIO code has to compensate for, device type by device type. Apple can push through challenges like this much faster, while Android has to wait for the changes to filter through an ecosystem.
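To make that concrete, here is a hypothetical sketch of what multi-temperature calibration buys: instead of one bias value measured at a single factory temperature, the device stores a small table and interpolates. All names and numbers below are invented for illustration:

```kotlin
// Hypothetical sketch: correcting accelerometer bias by interpolating
// between calibration points taken at several temperatures, instead of
// using one bias measured at a single factory temperature.
class ImuBiasModel(private val calPoints: List<Pair<Double, Double>>) {
    // calPoints: (temperature °C, measured bias m/s^2), sorted by temperature
    fun biasAt(tempC: Double): Double {
        if (tempC <= calPoints.first().first) return calPoints.first().second
        if (tempC >= calPoints.last().first) return calPoints.last().second
        val lower = calPoints.last { it.first <= tempC }
        val upper = calPoints.first { it.first >= tempC }
        if (upper == lower) return upper.second
        val t = (tempC - lower.first) / (upper.first - lower.first)
        return lower.second + t * (upper.second - lower.second)
    }
}

fun main() {
    // Single-point factory calibration: bias known only at 25 °C.
    val singlePoint = ImuBiasModel(listOf(25.0 to 0.010))
    // Multi-point calibration: bias measured at several temperatures.
    val multiPoint = ImuBiasModel(listOf(15.0 to 0.008, 25.0 to 0.010, 45.0 to 0.019))
    // After 20 minutes of use the phone runs hot:
    println("bias applied at 45 °C, single-point: ${singlePoint.biasAt(45.0)}")
    println("bias applied at 45 °C, multi-point:  ${multiPoint.biasAt(45.0)}")
}
```

With single-point calibration, the 25 °C bias is still applied when the phone is running at 45 °C, so roughly 0.009 m/s² of bias goes uncorrected into the dead-reckoning integration; the multi-point model tracks the true bias much more closely.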

Lighting.

Both ARKit and ARCore provide a simple estimate of the light in the scene. It's a single estimate for the whole scene, regardless of whether the real world is lit by ambient light or by sharp spotlights. ARKit returns an intensity and a color temperature to the developer, while ARCore provides either a single pixel-intensity value or a shader. Both approaches seem to give similar results in the early demos. Subjectively, Google's demos look a little better to me, but that may be because Tango developers have had more time to work on them. Google has also shown what's coming next: the ability to dynamically adapt virtual shadows and reflections to moving real-world lights. That will give an enormous boost to presence, the unconscious belief that the content is “really there”.
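On the ARCore side, reading that estimate is essentially a one-liner per frame. A minimal sketch against ARCore's Java/Kotlin API (session setup and the render loop are omitted, and exact API details have shifted between ARCore versions, so treat this as an approximation; ARKit's counterpart is `ARFrame.lightEstimate` with `ambientIntensity` and `ambientColorTemperature`):

```kotlin
import com.google.ar.core.Frame
import com.google.ar.core.LightEstimate
import com.google.ar.core.Session

// Sketch: per-frame ambient light estimate in ARCore (1.x Java/Kotlin API).
fun onDrawFrame(session: Session) {
    val frame: Frame = session.update()
    val estimate: LightEstimate = frame.lightEstimate
    if (estimate.state == LightEstimate.State.VALID) {
        // A single scalar, roughly in [0, 1]; multiply it into your
        // shader's ambient term so virtual content dims with the room.
        val intensity: Float = estimate.pixelIntensity
        // e.g. GLES20.glUniform1f(ambientHandle, intensity)
    }
}
```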

Mapping.

Mapping is an area where ARCore today has a clear advantage over ARKit. Mapping is the “M” in SLAM. It refers to an in-memory data structure holding a lot of information about the 3D real-world scene, which the tracker uses to localize itself. Localizing just means working out where I am on the map. If I blindfolded you and dropped you in the middle of a new city with a paper map, the process you'd go through of looking around, then looking at the map, then looking around again until you figure out where you are: that's localization. At its simplest, a SLAM map is a graph of 3D points representing a sparse point cloud, where each point corresponds to the coordinates of an optical feature in the scene. The points usually carry extra metadata too, such as how “reliable” each one is, measured by the number of recent frames in which it was detected at the same coordinates. Some maps also contain “keyframes”, single frames of video stored in the map every few seconds to help the tracker align the world with the map. Other maps use a dense point cloud, which is more reliable but needs more GPU and memory. Both ARCore and ARKit currently use sparse maps.
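Neither SDK exposes its internal map, but the description above translates naturally into a data structure. Here is a purely illustrative sketch; every name in it is invented, and a real SLAM map is far more elaborate:

```kotlin
// Illustrative sketch of what a sparse SLAM map might hold internally.
// Neither ARKit nor ARCore exposes this; all names here are invented.

data class MapPoint(
    val x: Float, val y: Float, val z: Float, // 3D position of an optical feature
    val descriptor: ByteArray,                // appearance signature used for matching
    var observationCount: Int = 1,            // recent frames that saw it -> "reliability"
    var lastSeenFrame: Long = 0               // used by map management to discard stale points
)

data class Keyframe(
    val frameIndex: Long,
    val cameraPose: FloatArray,  // 4x4 pose matrix, row-major
    val image: ByteArray         // stored video frame used to re-align tracker and map
)

class SparseSlamMap {
    val points = mutableListOf<MapPoint>()
    val keyframes = mutableListOf<Keyframe>()

    // Localization = finding the current camera pose within this map.
    fun reliablePoints(minObservations: Int) =
        points.filter { it.observationCount >= minObservations }
}
```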

It works like this: when you launch an ARCore/ARKit app, the tracker checks whether a map has been preloaded and is ready to use. There isn't one, so the tracker initializes a new map by doing a stereo calculation, as I described in my last post. We now have a nice little 3D map of what's in the camera's field of view. As you start moving and new parts of the background scene come into view, more 3D points are added and the map gets bigger and bigger. This was never a problem in the past, because trackers were so bad they would drift away before the map became too big to manage. That's no longer the case, and map management is where much of the interesting work in SLAM now happens. ARKit uses a “sliding window” for its map, which simply means it only keeps a variable amount of the recent past in the map and throws everything older away. The assumption is that you'll never need to relocalize against a scene from a while back. ARCore manages a larger map, which should make the system more reliable.
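The sliding-window policy is easy to picture in code. Continuing the invented `SparseSlamMap` sketch from above (again, not the real implementation of either SDK):

```kotlin
// Sketch of a "sliding window" map policy: keep only the recent past,
// discard everything older. Continues the invented SparseSlamMap above.
fun SparseSlamMap.pruneSlidingWindow(currentFrame: Long, windowFrames: Long) {
    points.retainAll { currentFrame - it.lastSeenFrame <= windowFrames }
    keyframes.retainAll { currentFrame - it.frameIndex <= windowFrames }
    // A larger-map system, as described for ARCore, would keep old regions
    // instead, trading memory for more reliable relocalization.
}
```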

So the point is that with ARCore, even if you lose tracking, the system recovers better and you're less likely to be affected.

Both ARCore and ARKit use a cool concept called anchors to make the map feel like it covers a larger physical area than it does. I first saw this concept on HoloLens, which as usual is a year or more ahead. Normally the system manages the map completely invisibly to the user and the app developer. Anchors let the developer tell the system: “remember this part of the map here, don't throw it away.” An anchor covers roughly a 1 m × 1 m physical area. The developer typically drops an anchor when content is placed at a physical location. Without an anchor, if the user then walks away, the part of the map around the spot where the content is supposed to live may be thrown away, and the content is lost. With anchors, the content always stays where it should be, and the worst UX effect is a possible small jump in the content when the system relocalizes and corrects the accumulated drift.
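In the shipping SDKs, anchors are a very small API surface. A minimal sketch of placing content with an anchor via ARCore's 1.x API, hit-testing a tap against detected geometry (ARKit's `ARAnchor` plus `session.add(anchor:)` is the analogous path):

```kotlin
import com.google.ar.core.Anchor
import com.google.ar.core.Frame
import com.google.ar.core.TrackingState

// Sketch: pin content to a physical spot so the map around it is kept.
// ARCore 1.x API; session setup and the render loop are omitted.
fun onTap(frame: Frame, tapX: Float, tapY: Float): Anchor? {
    // Hit-test the tap against geometry ARCore has detected.
    val hit = frame.hitTest(tapX, tapY).firstOrNull() ?: return null
    // Creating the anchor tells the system: keep this part of the map.
    return hit.createAnchor()
}

fun drawContent(anchor: Anchor) {
    if (anchor.trackingState == TrackingState.TRACKING) {
        val pose = anchor.pose // may "jump" slightly after a relocalization
        // ... render your content at `pose` ...
    }
}
```

The small jump mentioned above shows up as `anchor.pose` changing slightly after a relocalization, which is why content should be drawn relative to its anchor rather than a pose captured once.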

The purpose of the map is to help the tracker in two ways. The first: as I move my phone back and forth, the map gets built from the initial movement, and on the way back the features recognized in real time can be compared against the features stored in the map. This makes tracking more stable, because only the most reliable features from the current and previous views of the scene go into the pose calculation.
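In invented-code terms, continuing the `SparseSlamMap` sketch, that filtering step might look something like this; `solvePose` stands in for the real VIO optimizer, which is vastly more involved:

```kotlin
// Sketch: using the map to stabilize the pose calculation. Invented names.
fun trackFrame(map: SparseSlamMap, detected: List<MapPoint>): FloatArray {
    // Keep only features that are also in the map and have been seen
    // consistently -> the "most reliable" set mentioned above.
    val reliable = map.reliablePoints(minObservations = 5)
    val matches = detected.filter { d ->
        reliable.any { m -> m.descriptor.contentEquals(d.descriptor) }
    }
    return solvePose(matches) // e.g. PnP plus IMU fusion in a real system
}

fun solvePose(matches: List<MapPoint>): FloatArray = FloatArray(16) // placeholder
```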

The second way the map helps is in relocalizing tracking. At some point you'll cover the camera, drop your phone, or move too fast, or something random will happen, and the next time the camera sees the scene, it won't match what the last update of the map says it should see. The system has been blindfolded and dropped in a new location. That's the definition of “I've lost tracking”, the phrase pioneering AR developers have uttered about a thousand times a day in recent years. At this point the system can do one of two things (sketched in code after the list):

  • Just reset all coordinate systems and start over. That's what a pure odometry system does. What you experience is all your content jumping to a new position and staying there. It doesn't make for a good user experience.
  • Or the system can take the set of 3D features it currently sees and search the entire map for a match, then adopt that as the corrected virtual position, so you can keep using the application as if nothing had happened. There are two problems here: 1) the bigger the map, the longer the search takes, and the more likely the user is to move again in the meantime, which means the search has to start over… and 2) the phone's current position never exactly matches any position the phone has occupied in the past, which makes the map search harder still and adds computation and time to relocalization. So even with mapping, if you move too far off the map, you're out of luck and the system has to reset and start over.
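Continuing the invented sketch from the mapping section, those two recovery options look roughly like this (a real relocalizer matches feature descriptors against keyframes with geometric verification; this only shows the control flow):

```kotlin
// Sketch of the two recovery options above. Invented names throughout.
sealed class Recovery {
    object ResetAndRestart : Recovery()                 // pure odometry: content jumps
    data class Relocalized(val pose: FloatArray) : Recovery()
}

fun recover(map: SparseSlamMap, currentFeatures: List<MapPoint>): Recovery {
    // Option 2: search the whole map for a place matching what the camera
    // sees now. The bigger the map, the longer this takes, and the current
    // viewpoint never exactly matches a past one.
    val candidate = searchMapForMatch(map, currentFeatures)
    return if (candidate != null) Recovery.Relocalized(candidate)
    // Option 1: no match found (too far off the map) -> reset everything.
    else Recovery.ResetAndRestart
}

fun searchMapForMatch(map: SparseSlamMap, features: List<MapPoint>): FloatArray? {
    // Placeholder: real systems do descriptor matching plus geometric
    // verification against keyframes here.
    return null
}
```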

Note that when I refer to a large map for mobile AR, I mean a map covering the physical area of a very large room or a very small apartment. It also means that for outdoor AR, we need to think about mapping in a completely new way.

Robust relocalization against a map is a very hard problem, and IMO nobody has yet solved it to a consumer-UX level. Anyone claiming to offer multiplayer or persistent AR content will find their UX severely limited by the ability of a second phone to relocalize, from a cold start, into a map either created by Player 1 or downloaded from the cloud. You'll find that Player 2 has to stand pretty close to Player 1 and hold their phone in roughly the same way. That's a PITA for users: they just want to sit on the couch opposite you, turn on their phone, and immediately see what you see, or stand somewhere within a few meters of a previous position and see the “persistent” AR content.

Note that there are app-specific workarounds for multiplayer you can try, such as using a marker or hard-coding a relative starting position for P2. Technically they can work, but you still have to explain to the user what to do.

Especially in China, the largest market for OEMs, GMS (Google Mobile Services) meets great resistance and is not welcome. So if you're looking for an AR software solution that works globally (like every developer and AR toolmaker), should you bet on ARCore now? ARCore makes sense if you're an Android fan and already have a Samsung Galaxy S8 or Pixel; if you're more of an iPhone person, it won't be worth switching.

Developers should focus on learning to build AR apps right now, because that remains the big challenge. Learning how to build on ARKit or ARCore will take far less effort than learning what to build. Remember that ARKit and ARCore are version-1.0 SDKs. They are really basic (VIO, plane detection, basic light estimation) and will become much more extensive over the next few years (3D scene understanding, occlusion, multiplayer, etc.), so developers and consumers alike have a lot to learn in these areas. For now, concentrate on learning the parts that are genuinely hard, and stick with known quantities for the underlying technology (Android, iOS, Xcode, etc.). Of course, you'll still have to make a platform decision for launch based on the relevant factors: market reach, AR feature support, monetization, and so on.

Is ARCore better than ARKit?

As technical solutions, the two are very close to each other, both near the limits of what today's hardware allows, and users can hardly tell the difference in the experiences you can build today. ARKit has some technical advantages in hardware/software integration and more reliable tracking; ARCore has some advantages in mapping and more reliable recovery. Both sets of advantages are mostly noticeable only to computer-vision engineers who know exactly what to look for.

Apple has a clear go-to-market advantage: a large base of devices that immediately upgrade to the latest iOS, which is where ARKit ships. Apple users also tend to spend more money, so AR apps should monetize better on ARKit in the medium term. Android's advantage has always been scale, but it will take at least 12 months before the Android ecosystem has all the pieces in place and the deals done for ARCore to be supported by the hardware in most new devices.

ARCore has a nice advantage in the Tango R&D pipeline: features that can be adopted immediately, many of which have already been through at least some user and market testing. It will be fun to watch how quickly these systems evolve over the next 12 to 24 months once the basics are in place.

The technology is finally (just) good enough to build applications for the masses. If you're an AR developer and don't yet have a senior product/interaction designer on your team, you should seriously consider hiring one.

Whether to prefer ARKit or ARCore comes down largely to the developer's own preferences and goals. Both systems have their strengths and weaknesses, but the important thing is that both can deliver a good enough consumer experience for developers to explore with great freedom.

Thank you for reading.

3DMaster