Distributed stream processing in a global sensor grid for scientific simulations

Benzing, Andreas

Please use this identifier to cite or link to this resource: http://dx.doi.org/10.18419/opus-3603

Author(s):	Benzing, Andreas
Title:	Distributed stream processing in a global sensor grid for scientific simulations
Other titles:	Verteilte Datenstromverarbeitung in einem Global Sensor Grid für wissenschaftliche Simulationen
Issue date:	2015
Document type:	Dissertation
URI:	http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-103578 http://elib.uni-stuttgart.de/handle/11682/3620 http://dx.doi.org/10.18419/opus-3603
Abstract:	With the large number of sensors now deployed around the globe, an enormous number of measurements has become available for integration into applications. Scientific simulations of environmental phenomena in particular can benefit greatly from detailed information about the physical world. The problem in integrating sensor data into simulations is automating both the monitoring of geographical regions for interesting data and the provision of continuous data streams from the identified regions. Current simulation setups use hard-coded information about sensors, or even manual data transfer using external memory, to bring data from sensors to simulations. This approach is very robust, but adding new sensors to a simulation requires manually setting up the sensor interaction and changing the source code of the simulation, which incurs extremely high cost. Manual transmission allows an operator to drop obvious outliers but rules out real-time operation because of the long delay between measurement and simulation. For more generic applications that operate on sensor data, these problems have been partially solved by approaches that decouple the sensing from the application and thereby allow the sensing process to be automated. However, these solutions focus on small-scale wireless sensor networks rather than the global scale, and therefore optimize for the lifetime of these networks instead of providing high-resolution data streams. Providing sensor data for scientific simulations requires two tasks: i) continuous monitoring of sensors to trigger simulations and ii) high-resolution measurement streams of the simulated area during the simulation. Since a simulation is not aware of the deployed sensors, the sensing interface must work without an explicit specification of individual sensors. Instead, the interface must work only on the geographical region, the sensor type, and the resolution used by the simulation.
The challenges in these tasks are to efficiently identify relevant sensors among the large number of sources around the globe, to detect when the current measurements are relevant, and to scale data stream distribution to a potentially large number of simulations. Furthermore, the process must adapt to complex network structures and dynamic network conditions as found in the Internet. The Global Sensor Grid (GSG) presented in this thesis attempts to close this gap by addressing three core problems. First, a distributed aggregation scheme has been developed that allows geographic areas to be monitored for sensor data of interest. Reusing partial aggregates ensures highly efficient operation and relieves the sensor sources of individually providing numerous clients with measurements. Second, data streams are distributed at different resolutions by a network of brokers that preprocess raw measurements to provide the requested data. The load of high-resolution streams is thereby spread across all brokers in the GSG to achieve scalability. Third, network usage is actively minimized by adapting to the structure of the underlying network. This optimization reduces redundant data transfers on physical links and allows the data streams to be modified dynamically in reaction to changing load situations.
Abstract (translated from German):	With the large number of sensors distributed around the globe, an enormous quantity of measurements has become available for integration into applications. Scientific simulations of environmental phenomena in particular can benefit from accurate information about the physical world. When integrating sensor data into simulations, the problem lies in automatically monitoring geographical regions for relevant information and subsequently providing continuous data streams.
Currently, sensors are usually connected to simulations through static configurations or even manually. While this approach is very robust, changing the sensors or the simulations incurs high cost. The same holds for data processing, where robust manual control is weighed against the cost of the processing time. For smaller wireless sensor networks, approaches have therefore already been proposed that automate the measurement process in a separate system. However, these approaches target the lifetime of these networks rather than the high-resolution data streams that simulations require. Delivering sensor data for scientific simulations requires solving two tasks: i) sensors must be monitored continuously so that simulations can be started when interesting observations occur, and ii) high-resolution data streams of the simulated area must be provided. Since a simulation is generally not informed about the available sensors, the sensors must be addressed through suitable query interfaces. The challenge is to identify and monitor the relevant sensors efficiently and to deliver data streams scalably to a large number of simulations. The system must also be able to react dynamically to conditions in the Internet. The Global Sensor Grid (GSG) presented in this thesis addresses three core problems: First, a distributed aggregation scheme was developed to enable efficient monitoring of geographical regions. Second, a network of brokers enables the scalable distribution of data streams at different resolutions. Third, network load is actively minimized by adapting the system to the underlying network.
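The abstract does not describe the data structures behind the aggregation scheme, so the following is only a minimal sketch of the general idea of reusing partial aggregates, not the method used in the thesis. It assumes a single in-memory store, a fixed grid of 1x1 degree cells, and a count/sum/max aggregate; the names `Partial` and `RegionMonitor` are hypothetical. Each measurement is folded into its cell's partial aggregate once, and region queries are answered by merging cell partials instead of contacting sensors again.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Cell = Tuple[int, int]  # (floor(lat), floor(lon)); 1x1 degree cells are an assumption


@dataclass(frozen=True)
class Partial:
    """Mergeable partial aggregate holding enough state for count, mean, and max."""
    count: int
    total: float
    maximum: float

    def merge(self, other: "Partial") -> "Partial":
        # Merging is associative and commutative, so partials can be combined in any order.
        return Partial(self.count + other.count,
                       self.total + other.total,
                       max(self.maximum, other.maximum))

    @property
    def mean(self) -> float:
        return self.total / self.count


class RegionMonitor:
    """Hypothetical store that keeps one partial aggregate per grid cell."""

    def __init__(self) -> None:
        self.partials: Dict[Cell, Partial] = {}

    def ingest(self, lat: float, lon: float, value: float) -> None:
        """Fold a single measurement into the partial aggregate of its cell."""
        cell = (math.floor(lat), math.floor(lon))
        reading = Partial(1, value, value)
        previous = self.partials.get(cell)
        self.partials[cell] = reading if previous is None else previous.merge(reading)

    def query(self, lat_min: int, lat_max: int,
              lon_min: int, lon_max: int) -> Optional[Partial]:
        """Aggregate over the cells in [lat_min, lat_max) x [lon_min, lon_max).

        Only stored cell partials are merged; no individual reading is revisited.
        Returns None if the region contains no measurements.
        """
        result: Optional[Partial] = None
        for lat in range(lat_min, lat_max):
            for lon in range(lon_min, lon_max):
                part = self.partials.get((lat, lon))
                if part is not None:
                    result = part if result is None else result.merge(part)
        return result


monitor = RegionMonitor()
for lat, lon, value in [(48.7, 9.1, 21.0), (48.2, 9.9, 23.0), (49.5, 9.3, 30.0)]:
    monitor.ingest(lat, lon, value)

small = monitor.query(48, 49, 9, 10)   # covers cell (48, 9) only
large = monitor.query(48, 50, 9, 10)   # reuses the (48, 9) partial and adds (49, 9)
```

Both queries read the same stored partial for cell (48, 9), which illustrates how overlapping regions of interest can share work. A distributed version would have to place these partials on different nodes and exchange them over the network, which this sketch leaves out.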
Appears in collections:	05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Files in this resource:

File	Description	Size	Format
diss_benzinas.pdf		3.61 MB	Adobe PDF

All resources in this repository are protected by copyright.

Universität Stuttgart

OPUS - Online Publikationen der Universität Stuttgart