Linking Physical and Virtual Worlds with Visual Markers and Handheld Devices

Michael Rohs
PhD Thesis No. 16193
ETH Zurich, Zurich, Switzerland
August 2005

Abstract

Linking the physical and the virtual world is a major research theme of ubiquitous and pervasive computing. This dissertation describes concepts and techniques for linking information and services to physical objects, as well as for interacting with this information using mobile devices and embodied user interfaces. Such interfaces use gestures performed with the device body as a means of input.

In the recent past, there have been considerable research efforts in linking computation to physical objects. However, these projects were mainly concerned with the physical linking technology per se or with the infrastructure required for identifier resolution. Other work on manipulative and embodied user interfaces focused on improving interaction with the handheld device itself, but did not integrate physical objects from the user's environment. In our work, we combine physical linking and embodied interaction and allow the interaction semantics to be a function of both the object and the gestural sequence.

The proposed approach uses camera phones and similar devices as mobile sensors for two-dimensional visual markers. We not only retrieve the value that is encoded in the marker, but also detect the spatial orientation of the device relative to the marker in real time. We use the detected orientation for embodied interaction with the device and augment the live camera image with graphical overlays that are aligned according to the orientation. By providing a video see-through augmented reality view of the background, the handheld device embodies a "symbolic magnifying glass." This allows for fine-grained interaction and enhances the currently limited input capabilities of mobile devices. We call this approach marker-based interaction. It turns camera phones and similar devices into versatile interfaces to, and mediators for, real-world objects.

In this thesis, we present a system for recognizing two-dimensional visual markers. The markers we developed are called visual codes. The recognition system provides a number of parameters for determining the spatial orientation of the device relative to the marker, such as the target point in code coordinates, rotation, tilting, distance, and movement of the device relative to the background. It is specifically designed for the requirements of mobile phones with limited computing capabilities and low-resolution cameras. Moreover, the system provides the basis for augmenting objects in the live camera image with precisely aligned graphical overlays.

Based on this foundation, we have developed several mechanisms and concepts for marker-based interaction, namely: (1) a framework of physical interaction primitives, (2) marker-based interface elements, called visual code widgets, (3) interaction techniques for large-scale displays, and (4) handheld augmented reality applications.

Our conceptual framework of physical interaction primitives enables the use of camera-equipped mobile devices as embodied user interfaces, in which users can specify input through physical manipulations and orientation changes of the device. The framework defines a set of fundamental physical gestures that form a basic vocabulary for describing interaction with mobile devices capable of reading visual codes. These interaction primitives can be combined to create more complex and expressive interactions.
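To make this idea concrete, the following minimal sketch shows one way such primitives and their combinations could be represented in code. It is an illustration only: the primitive names, the GestureStep and InteractionDispatcher classes, and the flushing rule are assumptions made for this sketch, not the actual gesture vocabulary or implementation defined in the dissertation.

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative vocabulary of physical gestures performed with the device
    // relative to a visual code; not the exact primitive set of the thesis.
    enum Primitive { POINT, ROTATE_LEFT, ROTATE_RIGHT, TILT, MOVE_CLOSER, MOVE_AWAY }

    // One recognized step: a primitive performed while a particular code is targeted.
    final class GestureStep {
        final long codeValue;      // value encoded in the targeted visual code
        final Primitive primitive; // physical manipulation detected via the camera

        GestureStep(long codeValue, Primitive primitive) {
            this.codeValue = codeValue;
            this.primitive = primitive;
        }
    }

    // The application decides what a gesture sequence means for a given object,
    // i.e., the interaction semantics are a function of code value and sequence.
    interface GestureHandler {
        void onGesture(long codeValue, List<Primitive> sequence);
    }

    // Combines primitives into sequences and reports them per targeted object.
    final class InteractionDispatcher {
        private final GestureHandler handler;
        private final List<Primitive> current = new ArrayList<>();
        private long currentCode = -1; // -1: no code targeted yet (assumes non-negative code values)

        InteractionDispatcher(GestureHandler handler) {
            this.handler = handler;
        }

        // Feed recognized steps; targeting a different code closes the running sequence.
        void feed(GestureStep step) {
            if (currentCode != -1 && step.codeValue != currentCode) {
                flush();
            }
            currentCode = step.codeValue;
            current.add(step.primitive);
        }

        // Report and reset the running sequence, e.g., when the code leaves the camera view.
        void flush() {
            if (currentCode != -1 && !current.isEmpty()) {
                handler.onGesture(currentCode, new ArrayList<>(current));
            }
            current.clear();
            currentCode = -1;
        }
    }

A handler might, for instance, interpret pointing at a poster's code followed by a rotation as scrolling through the options attached to that poster, while the same sequence performed on a product package could trigger an entirely different action.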
The interaction primitives and their combinations have been evaluated in a usability study. In comparison to interaction primitives, visual code widgets operate at a higher level of abstraction. Visual code widgets are printable elements of physical user interfaces, comparable to the interactive elements of conventional graphical user interfaces. Each widget type addresses a particular input problem and encapsulates a specific behavior and functionality. Visual code widgets thus define building blocks for applications that incorporate mobile devices as well as resources in the user's environment, such as paper documents, posters, and public electronic displays.

For large-scale displays, we have developed two interaction techniques that rely on visual movement detection and visual code recognition, respectively. The first enables relative positioning of a cursor and is suited for the direct manipulation of objects that are visible on the screen. The second allows for absolute positioning on the screen and can be used to select displayed objects. Both techniques have been evaluated in a qualitative usability study and are especially useful for displays that are not accessible for direct touch-based interaction, such as displays in public spaces.

The concepts and techniques developed in the course of this dissertation have been investigated in various application areas. Examples that are detailed in the dissertation are: entry points into a smart campus environment, augmented board games, an interactive photo wall, a collaborative game for large-scale displays, digital annotations of physical objects, and smart product packaging.
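The two positioning techniques for large-scale displays can be illustrated with a small sketch. It assumes, for simplicity, that the recognition system reports frame-to-frame camera movement (used by the movement-based technique) and the target point in the coordinate system of a code that spans the display (used by the code-based technique); the class, method, and parameter names are assumptions for illustration, not the implementation evaluated in the thesis.

    // Minimal sketch of relative vs. absolute cursor positioning on a large display.
    final class DisplayCursor {
        private double x, y;              // cursor position in screen pixels
        private final int width, height;  // size of the large display in pixels

        DisplayCursor(int width, int height) {
            this.width = width;
            this.height = height;
            this.x = width / 2.0;  // start at the center of the display
            this.y = height / 2.0;
        }

        // Relative positioning: accumulate detected camera movement, scaled by a gain factor.
        void moveRelative(double dx, double dy, double gain) {
            x = clamp(x + gain * dx, 0, width - 1);
            y = clamp(y + gain * dy, 0, height - 1);
        }

        // Absolute positioning: map the target point, given in code coordinates
        // (0..codeSpanX, 0..codeSpanY across the display), directly to screen pixels.
        void moveAbsolute(double targetX, double targetY, double codeSpanX, double codeSpanY) {
            x = clamp(targetX / codeSpanX * width, 0, width - 1);
            y = clamp(targetY / codeSpanY * height, 0, height - 1);
        }

        private static double clamp(double v, double lo, double hi) {
            return Math.max(lo, Math.min(hi, v));
        }

        double getX() { return x; }
        double getY() { return y; }
    }

In this simplified picture, the movement-based mode supports dragging objects that are already under the cursor, whereas the code-based mode jumps directly to the pointed-at location and is therefore suited for selecting displayed objects, matching the respective roles of the two techniques described above.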