A fundamental paradigm shift is currently taking place in the field of computing: due to the miniaturization of computing devices and the proliferation of embedded systems, tiny, networked computers can now be easily integrated into everyday objects, turning them into smart things. In the resulting Internet of Things, physical items are no longer disconnected from the virtual world but become accessible through computers and other networked devices, and can even make use of protocols that are widely deployed in the World Wide Web, in a paradigm that we call the Web of Things. Eventually, smart things will be able to communicate, analyze, decide, and act, and thereby provide invisible background assistance that should make life more enjoyable, entertaining, and also safer. However, in an environment that is populated by hundreds of Web-enabled smart things, it will become increasingly difficult for humans to find, select, and control the devices that are relevant to their current needs.

The objective of this thesis is to investigate how human users could be enabled to conveniently interact with individual smart objects in their surroundings, and how they could interconnect devices and configure the resulting physical mashups to perform higher-level tasks on their behalf. To achieve basic interoperability between devices, we rely on the World Wide Web with its proven protocols and architectural patterns, which emphasize scalability, generic interfaces, and loose coupling between components.

As a first step to facilitate the interaction with smart things on top of the basic Web principles, we propose the embedding of metadata for automatically generating user interfaces for smart devices. Because this metadata describes the high-level interaction semantics of smart devices rather than purely interface-specific information, our approach enables not only the generation of more intuitive graphical widgets but also the mapping of interactive components to gesture-based, speech-based, and physical interfaces. Providing an interaction mechanism for a smart object is thus reduced to embedding simple interaction information into the representation of the smart thing. Before users can interact with a smart device, however, they must first select it. To permit users to choose which of the many smart objects in their surroundings should be involved in an interaction, we propose the use of optical image recognition. Together, the visual selection of smart things and automatically generated user interfaces enable end users to conveniently interact with individual services in their surroundings that are embodied in specific physical objects.

To complement the direct interaction with smart devices, the second part of this thesis focuses on more complex use cases where multiple smart objects must collaborate to achieve the user's goal. Such situations arise, for instance, in home or office automation scenarios, or in smart factories, where machines or assembly lines could adjust to better support the operator. To put users more in control of entire environments of smart devices, we present a system that records interactions between smart things and with remote services and displays this data to users in real time, using an augmented reality overlay on the camera feed of handheld or wearable devices such as smartphones and smartglasses.
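To make the idea of embedded interaction metadata more concrete, the following minimal sketch in Python (using the third-party requests library) shows how a client might fetch the Web representation of a hypothetical smart lamp and map an interface-agnostic "level" descriptor to a graphical slider. The device URL, the JSON layout, and the field names are illustrative assumptions, not the concrete format developed in this thesis.

```python
# Minimal sketch (illustrative, not the thesis implementation): fetch a smart
# thing's Web representation and map its embedded, interface-agnostic
# interaction metadata to a concrete widget.
import requests

THING_URL = "http://lamp.example.local/properties/brightness"  # hypothetical device

representation = requests.get(THING_URL, headers={"Accept": "application/json"}).json()

# Hypothetical embedded interaction descriptor, e.g.:
# {"value": 40, "interaction": {"type": "level", "min": 0, "max": 100, "label": "Brightness"}}
meta = representation.get("interaction", {})

# Map the high-level interaction semantics ("a level between min and max") to a
# rendering-specific widget; another renderer could map the same descriptor to
# a speech prompt or a rotation gesture instead of a slider.
if meta.get("type") == "level":
    print(f"Render slider '{meta['label']}' [{meta['min']}..{meta['max']}], "
          f"current value {representation['value']}")

def set_level(new_value: int) -> None:
    """Write the user's input back to the thing via the same generic Web interface."""
    requests.put(THING_URL, json={"value": new_value})
```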
Next, we propose a management infrastructure for smart things that makes the services they offer discoverable and composable, and fully integrates them with more traditional Web-based information providers. This system enables humans to find and use data and functionality provided by physical devices; it also allows machines to support users in finding services within densely populated smart environments, and even to discover and use required services themselves on the user's behalf. The basis for these applications is a generic mechanism that allows smart devices to provide semantic descriptions of the services they offer. Specifically, our infrastructure supports the embedding of functional semantic metadata into smart things that describes which functionality a concrete object provides and how to invoke it. Based on this metadata, a semantic reasoning component can determine which composite tasks can be achieved by a user's smart environment and can provide instructions on how to reach concrete goals, thus enabling end users to configure entire smart environments.

As a concrete use case, we present a platform that applies our proposed interaction modes with smart things to automobiles: a mobile application recognizes cars, downloads information about them from a back-end server, and displays this information, together with interaction capabilities for the car and its services, on the user's interface device. The back-end server furthermore exposes functional metadata about the capabilities of individual cars to make their services automatically usable within physical mashups. Finally, it records client interactions to enable car owners to monitor in real time who accesses which kinds of data and services on their vehicles.

The overarching objective of this thesis is to show how current technologies could support the interaction of end users with Web-enabled smart devices. To achieve this, we make use of a number of technologies from different areas of computer science: A management infrastructure makes smart things discoverable for human users and machines and builds upon current research in the distributed systems domain. State-of-the-art computer vision technologies allow users to select devices in their environment using handheld or wearable computers such as smartphones or smartglasses. Novel methods from the field of human-computer interaction enable the embedding of metadata that allows for automatically generating user interfaces. Finally, semantic technologies enable flexible compositions of smart things that collaborate to achieve the user's goal.
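In the same spirit as the sketch above, the short example below illustrates how a machine client might use the functional semantic metadata described earlier on the user's behalf: it fetches a hypothetical functional description from a Web-enabled car and invokes an advertised capability that matches a requested goal. The URLs, the description format, and the capability vocabulary are again illustrative assumptions, not the actual infrastructure presented in this thesis.

```python
# Minimal sketch (illustrative assumptions only): discover which capabilities a
# thing advertises in its functional description and invoke one on behalf of
# the user.
import requests

THING_URL = "http://car.example.local"  # hypothetical Web-enabled car

# Hypothetical functional description, e.g.:
# {"capabilities": [{"provides": "ex:UnlockDoors", "href": "/actions/unlock", "method": "POST"}]}
description = requests.get(f"{THING_URL}/description",
                           headers={"Accept": "application/json"}).json()

def invoke(goal: str) -> bool:
    """Invoke the first advertised capability that matches the requested goal."""
    for capability in description.get("capabilities", []):
        if capability.get("provides") == goal:
            requests.request(capability.get("method", "POST"), THING_URL + capability["href"])
            return True
    # A semantic reasoner could instead compose several things to reach the goal.
    return False

invoke("ex:UnlockDoors")
```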