Please use this identifier to cite or link to this item: http://dx.doi.org/10.18419/opus-9005
|Title:||Data provisioning in simulation workflows|
|Abstract:||Computer-based simulations are becoming increasingly important, e.g., to imitate real-world experiments such as crash tests, which would otherwise be too expensive or not feasible at all. Simulation workflows may be used to control the interaction with the simulation tools that perform the necessary numerical calculations. The input data needed by these tools often come from diverse data sources that manage their data in a variety of proprietary formats. Hence, simulation workflows additionally have to carry out many complex data provisioning tasks. These tasks filter and transform heterogeneous input data so that the underlying simulation tools can properly ingest them. Furthermore, some simulations use different tools that need to exchange data with each other. Here, even more complex data transformations are needed to cope with differences in the data formats and data granularity expected by the tools involved. Nowadays, scientists conducting simulations typically have to design their simulation workflows on their own. They therefore have to implement many low-level data transformations that realize the data provisioning for, and the data exchange between, simulation tools. In doing so, they waste time on workflow design, which keeps them from concentrating on their core issue, i.e., the simulation itself. This thesis introduces several novel concepts and methods that significantly simplify the design of complex data provisioning in simulation workflows. Firstly, it addresses the issue that most existing workflow systems offer multiple and diverse data provisioning techniques, so scientists are frequently overwhelmed when selecting the techniques that are appropriate for their workflows. This thesis discusses how to master the multiplicity and diversity of available techniques by classifying them systematically. 
The resulting classes of techniques are then compared with each other with respect to relevant functional and non-functional requirements for data provisioning in simulation workflows. The major outcome of this classification and comparison is a set of guidelines that assist scientists in choosing proper data provisioning techniques. Another problem with existing workflow systems is that they often do not support all kinds of data resources or data management operations required by concrete computer-based simulations. This thesis therefore proposes extensions of conventional workflow languages that offer a generic solution to data provisioning in arbitrary simulation workflows. These extensions allow for specifying any data management operation that can be described in the query or command languages of the data resources involved, e.g., arbitrary SQL statements or shell commands. However, the proposed language extensions still leave scientists with the burden of specifying many complex data management operations in low-level query and command languages. Hence, this thesis introduces a novel pattern-based approach that further enhances the abstraction support for simulation workflow design. Instead of specifying many workflow tasks, scientists only need to select a small number of abstract patterns to describe the high-level simulation process they have in mind. Furthermore, scientists are familiar with the parameters to be specified for the patterns, because these parameters correspond to terms or concepts related to their domain-specific simulation methodology. A rule-based transformation approach then offers a flexible means of mapping high-level patterns onto executable simulation workflows. Another major contribution is a pattern hierarchy that arranges different kinds of patterns according to clearly distinguished abstraction levels. 
This facilitates a holistic separation of concerns and provides a systematic framework for incorporating different kinds of people and their various skills into workflow design, e.g., not only scientists, but also data engineers. Altogether, the pattern-based approach masters the data complexity associated with simulation workflows, which allows scientists to concentrate again on their core issue, namely the simulation itself. The last contribution is a complementary optimization method that increases the performance of local data processing in simulation workflows. This method introduces various techniques that intelligently partition relevant local data processing tasks between the components of a workflow system: each such task is assigned either to the workflow execution engine or to a tightly integrated local database system. Experiments revealed that, even for a moderate data size of about 0.5 MB, this method can reduce workflow duration by nearly a factor of 9.|
|Appears in Collections:||05 Fakultät Informatik, Elektrotechnik und Informationstechnik|
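The optimization method summarized in the abstract assigns each local data processing task either to the workflow execution engine or to a tightly integrated local database system. A minimal sketch of that idea follows, assuming Python with the standard-library `sqlite3` module as the local database. The task (filter readings above a threshold and average them), the sample data, the function names, and the row-count threshold `ROW_LIMIT_FOR_ENGINE` are all hypothetical illustrations, not the partitioning techniques or cost model of the thesis.

```python
import sqlite3
from statistics import mean
from typing import List, Optional, Tuple

# Hypothetical local data: (node_id, temperature) pairs produced by one
# simulation tool and consumed by the next one in the workflow.
Row = Tuple[int, float]

# Hypothetical partitioning threshold (an assumption for illustration only):
# inputs up to this many rows are processed in the engine, larger ones in the DB.
ROW_LIMIT_FOR_ENGINE = 1_000


def avg_above_in_engine(rows: List[Row], threshold: float) -> Optional[float]:
    """Run the filter-and-average task inside the workflow engine (plain Python)."""
    selected = [temp for _, temp in rows if temp > threshold]
    return mean(selected) if selected else None


def avg_above_in_local_db(rows: List[Row], threshold: float) -> Optional[float]:
    """Push the same task down to an embedded SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (node_id INTEGER, temp REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
        # AVG over an empty selection yields NULL, i.e. None in Python.
        (avg,) = conn.execute(
            "SELECT AVG(temp) FROM readings WHERE temp > ?", (threshold,)
        ).fetchone()
        return avg
    finally:
        conn.close()


def run_task(rows: List[Row], threshold: float) -> Tuple[str, Optional[float]]:
    """Assign the task to the engine or the local DB and report where it ran."""
    if len(rows) <= ROW_LIMIT_FOR_ENGINE:
        return "engine", avg_above_in_engine(rows, threshold)
    return "local_db", avg_above_in_local_db(rows, threshold)


if __name__ == "__main__":
    small = [(1, 20.0), (2, 35.5), (3, 41.0), (4, 18.25)]
    print(run_task(small, 30.0))  # ('engine', 38.25)
    large = [(i, float(i)) for i in range(2000)]
    print(run_task(large, 1990.0))  # ('local_db', 1995.0)
```

Both placements compute the same result, so the partitioning decision only affects performance. The single row-count rule here is a placeholder. A realistic decision would weigh factors such as data size, operation type, and the cost of moving data into the database.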
Files in This Item:
|Dissertation_Peter_Reimann_DataProvisioningSimWorkflows.pdf||4.88 MB||Adobe PDF|
Items in OPUS are protected by copyright, with all rights reserved, unless otherwise indicated.