Institute of Architecture of Application Systems
University of Stuttgart
Universitätsstraße 38
D–70569 Stuttgart

Masterarbeit

Multi-Deployment-Technology Instance Model Retrieval and Instance Management

Alexandros Fouskas

Course of Study: Informatik
Examiner: Prof. Dr. Dr. h.c. Frank Leymann
Supervisor: Lukas Harzenetter, M.Sc.
Commenced: May 10, 2021
Completed: November 10, 2021

Abstract

Many enterprise applications are built up from multiple components. Deployment and management of these applications are complex and error-prone, especially if performed manually. Thus, automation is a key factor, especially with the advent of cloud computing. To cope with this, a variety of deployment technologies has been introduced in recent years. These technologies automate the deployment and management of applications and have been widely adopted in industry and research. Many organizations even use multiple deployment technologies in parallel. However, the management capabilities provided by these technologies are often limited. Thus, complex management operations, e. g., backups of all components, must still be executed manually. Moreover, deployment technologies may interfere with management operations, so the deployment technologies must be considered when executing the operations. This becomes even harder if different deployment technologies are used to deploy different parts of the application that should be managed. Thus, this work extends the existing management workflow generation approach to support applications that have been deployed by multiple deployment technologies. To achieve this, this work connects to the APIs of the deployment technologies in order to retrieve instance information. The retrieved information is used to derive an instance model that represents the current state of the application. The instance model is enriched with management functionality and is used to generate management workflows that can be executed on demand.

Kurzfassung

Viele Enterprise-Anwendungen bestehen aus einer Vielzahl von einzelnen Komponenten. Sowohl Deployment als auch Management dieser Anwendungen sind komplexe und fehleranfällige Aufgaben, besonders wenn diese manuell ausgeführt werden. Daher stellt Automatisierung einen Schlüsselfaktor dar, besonders mit dem Einzug von Cloud Computing. Um dem zu begegnen, wurde in den letzten Jahren eine Vielzahl sogenannter Deployment Technologien eingeführt. Diese Technologien automatisieren Deployment und Management von Anwendungen und haben sowohl in Industrie als auch in der Forschung weite Verbreitung gefunden. Viele Organisationen setzen sogar mehrere verschiedene Deployment Technologien gleichzeitig ein. Die Managementfunktionalitäten dieser Deployment Technologien sind jedoch meist begrenzt. Als Folge daraus müssen komplexe Managementoperationen, zum Beispiel Backups aller Komponenten, weiterhin manuell ausgeführt werden. Zudem können Deployment Technologien auch die Ausführung von Managementoperationen behindern. Daher müssen die Deployment Technologien beim Erstellen der Managementoperationen berücksichtigt werden. Durch den parallelen Einsatz mehrerer Technologien wird dies noch erschwert, da unterschiedliche Komponenten einer Anwendung von unterschiedlichen Technologien verwaltet werden. Aus diesem Grund erweitert diese Arbeit den bestehenden Ansatz der Management-Workflow-Generierung, damit dieser auch Anwendungen unterstützt, welche durch mehrere Deployment Technologien bereitgestellt werden.
Um dies zu erreichen, werden Instanzinformationen von den Schnittstellen der Deployment Technologien bezogen und benutzt, um ein Instanzmodell zu erzeugen. Dieses Instanzmodell spiegelt den aktuellen Zustand der Anwendung wider. Darüber hinaus wird das Instanzmodell mit zusätzlichen Managementoperationen angereichert. Das so angereicherte Instanzmodell wird benutzt, um automatisiert Management-Workflows zu generieren, welche bei Bedarf ausgeführt werden können.

Contents

1 Introduction
2 Foundations
  2.1 Automated Application Deployment
  2.2 Technology and Orchestration Specification for Cloud Applications (TOSCA)
  2.3 OpenTOSCA
  2.4 Instance Model
3 Related Work
4 Concept
  4.1 Overview
  4.2 Discovering involved Deployment Technologies
  4.3 Instance Model Retrieval
  4.4 Instance Model Completion
  4.5 Feature Enrichment and Workflow Generation
5 Implementation
6 Validation
  6.1 Case Study
  6.2 Discussion
7 Summary and Future Work
Bibliography

List of Figures

2.1 Example of an imperative deployment model
2.2 An example topology template
2.3 Puppet workflow
2.4 An example type hierarchy
2.5 Structure of a CSAR
2.6 Overview of the OpenTOSCA ecosystem
2.7 Example instance model
4.1 Overview of the process for instance model retrieval and management workflow generation
4.2 Running example
4.3 Mapping of Kubernetes entities to TOSCA entities
4.4 Mapping of Puppet entities to TOSCA entities
4.5 Mapping of AWS CloudFormation entities to TOSCA entities
4.6 Mapping of Terraform entities to TOSCA entities
4.7 Example for representing deployment technologies as dedicated node templates
4.8 Example for representing deployment technologies as deployment technology descriptors
4.9 Example for merging service templates that shows different options for the merge result
4.10 Instance model retrieved by the Instance Model Retriever for the running example application
4.11 An example refinement performed by the Instance Model Completer
4.12 Example feature management representation for backing up a MySQL Database
6.1 Structure of the example application Sock Shop
6.2 Validation scenario
6.3 Resulting instance model after executing the Instance Model Retriever
6.4 Resulting instance model after executing the Instance Model Completer
6.5 Resulting instance model after executing the Instance Model Enricher

List of Listings

2.1 Example definition of a Kubernetes Pod
2.2 Example Puppet manifest to copy a file and execute it
2.3 Example template to create an EC2 Instance with AWS CloudFormation
2.4 Excerpt of a Terraform file to create an AWS EC2 instance
5.1 Excerpt of an example configuration file for the Instance Model Retriever
5.2 Lifecycle interface for deployment technology specific retrieval plugins
5.3 Abstract description of the implementation of the Instance Model Completer
6.1 Configuration file for the Instance Model Retriever in the case study

Acronyms

API Application Programming Interface
AWS Amazon Web Services
BPEL Business Process Execution Language
BPMN Business Process Model and Notation
CLI Command Line Interface
CSAR Cloud Service Archive
DBMS Database Management System
DrACO Discovering Available Cloud Offerings
DSL Domain-Specific Language
EAM Enterprise Architecture Management
EC2 Elastic Compute Cloud
EDMM Essential Deployment Meta Model
EDMMi Essential Deployment Meta Model instance
EFS Elastic File System
ETG Enterprise Topology Graph
HTTP Hypertext Transfer Protocol
IA Implementation Artifact
IaaS Infrastructure as a Service
IP Internet Protocol
JAR Java Archive
JSON JavaScript Object Notation
JVM Java Virtual Machine
MARIO Managing Applications Running In Opportunistic Fog
MFEW Management Feature Enrichment and Workflow Generation
OASIS Organization for the Advancement of Structured Information Standards
OS operating system
PaaS Platform as a Service
REST Representational State Transfer
SaaS Software as a Service
SSH Secure Shell
TCP Transmission Control Protocol
TOSCA Technology and Orchestration Specification for Cloud Applications
UDP User Datagram Protocol
UI User Interface
VM Virtual Machine
YAML YAML Ain't Markup Language

1 Introduction

Many modern applications are built up from several components and often use complex middleware [Dea07]. The deployment and configuration of such applications are inherently complex tasks that require deep technical knowledge [BKH05]. Thus, the manual execution of these tasks is not only time-consuming, but also error-prone [Opp03]. To tackle this issue, Oppenheimer [Opp03] proposes to automate the deployment and management processes. With the advent of cloud computing, IT resources can be provisioned by external providers in an on-demand self-service model and are paid per use [LF09]. Thus, automating the deployment process can also save costs [LF09]. As a response to this, many so-called deployment technologies have been developed that automate the deployment process.
Examples are AWS CloudFormation [Ama21], Puppet [Pup21a], Chef [Pro21], or Terraform [Has21b]. These technologies differ in the modeling languages they use and the use cases they fit best. As an example, Terraform focuses on provisioning infrastructure in cloud environments, e. g., Virtual Machines (VMs), using its own configuration language. Yet, in order to deploy software components onto the provisioned infrastructure, Terraform proposes to use other tools, e. g., Puppet [Has21a]. However, most deployment technologies take a model of the application, the deployment model, and provide mechanisms to put the application into production in the target environment [BBF+18]. In most cases, a declarative deployment model is used [WBF+19]. A declarative model describes the desired application state, i. e., the components and their relations. A simple example consists of two components, a web application and a database, and one relation: the web application connects to the database. The deployment technology transforms this model into concrete execution steps for deployment and executes them. In contrast, imperative deployment models define concrete workflows that contain the ordered steps necessary to deploy the application.

In addition to deployment automation, most deployment technologies provide simple management functionalities, like scaling single components or observing component health [HBB+21]. However, modern enterprise applications require more complex management operations, e. g., (cross-)component backups or the installation of security patches. Such operations are not supported by most modern deployment technologies [HBB+21]. As a consequence, these operations must again be performed manually, which makes them cumbersome and error-prone [Opp03]. As a solution, Harzenetter et al. [HBL+19] introduced an approach to automate the execution of management operations based on the declarative deployment model of an application. They propose to enrich the components in an existing deployment model with predefined, reusable management features. For example, a database component may be enriched with a backup management feature that defines how the backup operation can be executed. The enriched deployment models are used to generate management workflows for each defined management feature, which can be executed on demand. The approach allows for extensive automation but requires an existing deployment model as input. As a consequence, it does not support the management of already running applications or applications for which no single deployment model exists.

Moreover, deployment technologies may interfere with the execution of management workflows, as the deployment technologies monitor the component state for deviations from the desired state and possibly revert applied changes [HBB+21]. For example, consider the installation of security updates for the operating system (OS) of a VM. The deployment technology used to create the VM might detect a deviation of the VM from its desired state. Thus, it could stop the VM and restart it without the applied security updates. Another example is component backups. For a successful backup, it might be necessary to stop or pause specific components for the duration of the backup operation. For example, a database may need to stop accepting new connections to allow a clean snapshot for backup purposes. However, a deployment technology may interpret this behavior as a failure and restart the database service, thus interrupting the backup operation.
To solve this, Harzenetter et al. [HBB+21] proposed the retrieval of instance models for running applications. Instance models are declarative models that represent the current state of the application. The approach retrieves information about the running application using the Application Programming Interface (API) of a single deployment technology. This instance model can then again be enriched with management features by adapting the previous approach of Harzenetter et al. [HBL+19]. In addition, the instance model contains information about the used deployment technology. This allows the management workflow generation to consider the deployment technology in order to prevent interference during workflow execution. However, the approach only considers a single deployment technology.

Complex applications might use not only a single but multiple deployment technologies to deploy different parts of the application. As an example, the VMs for an application might be deployed using Terraform, while the software components are deployed onto these machines using Puppet. The usage of different deployment technologies often implies that there is no single deployment model that describes the complete application. As a consequence, the previously presented approaches cannot be used to automate the management of these applications. Thus, this work introduces an approach to automatically generate executable management workflows for running applications that have been deployed using multiple deployment technologies.

With the usage of multiple deployment technologies, several additional issues arise for the management of running applications. First, all deployment technologies used for the deployment of an application must be identified. This can be tedious, as the information may have to be retrieved across development team boundaries. Second, instance information must be retrieved from all involved deployment technologies and must be mapped to a normalized instance model. This requires deep technical knowledge in order to connect to the deployment technologies, retrieve the provided instance information, and map the deployment-technology-specific information to a normalized instance model. Moreover, retrieving instance information from multiple deployment technologies leads to multiple instance models, i. e., one for each deployment technology. To provide a single holistic instance model for the complete application, these instance models must be merged. This process is hard, since the retrieved instance models may overlap in some parts, e. g., specifying the same components. In addition, the instance model must contain information about the involved deployment technologies and must specify which deployment technology manages which component. Furthermore, the usage of multiple deployment technologies impacts the process of management feature enrichment as well as management workflow generation. Both processes must be adapted to consider the deployment technologies represented in the instance model.

To tackle the aforementioned issues, this work retrieves instance models for running applications that are deployed by multiple deployment technologies. Moreover, it describes how the retrieved instance model can be used to generate management workflows that target the different deployment technologies in use. The approach extends the existing work of Harzenetter et al.
[HBB+21] by describing how application components that are managed by different deployment technologies can be modeled in an instance model based on the Technology and Orchestration Specification for Cloud Applications (TOSCA) [OAS13a], a standard for modeling the provisioning and management of cloud applications. In addition, this work describes how the APIs of multiple deployment technologies can be queried in order to retrieve such an instance model. As the APIs of deployment technologies often provide only limited instance information [HBB+21], this work also describes how the retrieved instance models can be refined with additional information. The completed instance model is enriched with management operations based on the work of Harzenetter et al. [HBL+19], and the management workflow generation is extended to work with multiple deployment technologies. To validate the proposed approach, a prototypical implementation based on the OpenTOSCA ecosystem [BEK+16] is also part of this work.

The remainder of this work is structured as follows: Chapter 2 describes and explains the fundamental concepts of cloud application deployment, instance models, and deployment technologies. Chapter 3 lists related work and highlights the differences to the concepts and approaches presented in this work. The main approach proposed by this work is described in Chapter 4. Chapter 5 describes the details of the prototypical implementation, and Chapter 6 presents a case study that applies the prototypical implementation to an example application. Chapter 7 summarizes all presented aspects and describes remaining issues for future work.

2 Foundations

This chapter explains the fundamentals necessary for understanding the remainder of this work. First, the concepts of a deployment model and deployment technologies are presented. This includes information on the concrete deployment technologies that are later used to validate the approach. In addition, the TOSCA [OAS13a] standard is presented, including a description of the OpenTOSCA [BEK+16] ecosystem, an open source implementation to model and manage TOSCA deployment models. Lastly, the concept of an instance model is described.

2.1 Automated Application Deployment

Modern enterprise applications are often complex distributed systems which consist of many components and make use of complex middleware [Dea07]. As a consequence, the complexity of deploying and managing these applications has increased as well [BBKL14b]. Oppenheimer [Opp03] investigated the causes of failures of such complex systems and came to the following conclusions: (i) Operating such complex applications manually is cumbersome, and operator errors are a primary cause of failure and unavailability. (ii) Tools that visualize the components and their relationships may prevent such errors, as visualizations ease the understanding of complex applications. (iii) Automated processes also reduce errors, as automation prevents misconfigurations or issuing wrong commands. Further, automation can not only reduce errors but also costs, by speeding up the execution of recurring management operations. This is especially important with the advent of cloud computing, where computing resources can be acquired on demand with a pay-per-use model [LF09]. Highly automated deployment and management allows for fast creation and release of these cloud resources, minimizing their cost. Thus, providers of cloud resources, e. g., Amazon Web Services (AWS), provide APIs that can be integrated into automated deployment and management processes.
However, manually defining dedicated deployment processes for every application suffers from the same issues as performing the deployment manually. As a result, many deployment technologies emerged that assist in automating the deployment process for complex applications. Most of these deployment technologies require a deployment model, which has to be provided by the software engineers, and use that model to automatically deploy the application into the target environment.

2.1.1 Deployment Model

A deployment model of an application contains all information necessary for deploying said application. There exists a plethora of different approaches on how this information can be modeled. However, there are two main categories: imperative deployment models and declarative deployment models [EBF+17; WBF+19]. An imperative deployment model explicitly defines the complete workflow to deploy the application, with all necessary steps in the correct order. Each step of the workflow describes a specific operation, e. g., a call to an API or the execution of a shell command.

Figure 2.1: Example of an imperative deployment model

In Figure 2.1, an example of an imperative deployment model is shown. The depicted workflow first creates a VM on the AWS cloud. Once the VM is running, the workflow installs a Java Virtual Machine (JVM) and the MySQL Database Management System (DBMS) [Ora21] on the VM. Following the installation, the workflow copies a Java Archive (JAR) file to the VM and executes it. Moreover, the workflow creates a new MySQL database in the MySQL DBMS. The workflow specifies that the setup of the Java application and the MySQL database can run in parallel. However, the setup of the Java application and the database must be completed successfully before the last step, which connects the Java application to the database, can be executed.

In contrast to an imperative deployment model, a declarative deployment model does not define the necessary deployment steps, but describes the components and their relations that make up an application. For example, a component can be a virtual machine, a database, or a web server. To model the dependencies between the different components, the declarative deployment model can include different types of relations: a service connects to a database, a web server is hosted on a virtual machine, or a service depends on another service. To specify additional information, components and relations can have properties defined, e. g., the port on which a web server should listen for incoming requests or the protocol which is used for a connects to relationship.

Figure 2.2: An example topology template

An example declarative deployment model is shown in Figure 2.2. It specifies a VM that runs Ubuntu [Can21] as its OS, which is indicated by the type Ubuntu VM. The VM is deployed onto the AWS cloud, as indicated by the HostedOn relation between the VM and the Public Cloud component. Moreover, the VM is specified to have 8 GB of main memory, as indicated by the ram property. In addition, the VM hosts a Tomcat application server [The21a] that is listening on port 80 and in turn hosts a Java application. The Java application accesses a database, which is indicated by the ConnectsTo relation between the Webshop and the Database component. The database is installed on the same VM, as indicated by the respective HostedOn relations.
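Written out, such a declarative model is essentially a list of typed components, their properties, and the relations between them. The following is a schematic, technology-agnostic YAML sketch of the model from Figure 2.2; the notation is purely illustrative and does not correspond to the syntax of any particular deployment technology.

# Schematic declarative model of the web shop from Figure 2.2 (illustrative notation)
components:
  Webshop:
    type: Java App
    properties: { baseUrl: /shop }
    relations:
      - hostedOn: AppServer
      - connectsTo: Database
  AppServer:
    type: Tomcat 9
    properties: { port: 80 }
    relations:
      - hostedOn: VM
  Database:
    type: MySQL DB
    properties: { schema: shop }
    relations:
      - hostedOn: DBMS
  DBMS:
    type: MySQL DBMS
    properties: { port: 3306 }
    relations:
      - hostedOn: VM
  VM:
    type: Ubuntu VM
    properties: { ram: 8 GB }
    relations:
      - hostedOn: PublicCloud
  PublicCloud:
    type: AWS
    properties: { region: us-east-1 }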
Most deployment technologies use a declarative deployment model, as it is deemed superior to imperative models [HAW11; WBF+19]. While imperative models offer more control over the deployment process, declarative models are more intuitive. In declarative models, the operator simply defines the desired state without considering the current state, which simplifies the task, reducing errors and cost. Thus, this work only focuses on declarative modeling.

2.1.2 Deployment Technologies

In industry and research, a plethora of deployment technologies has been developed. Most of them are based on the same principles but differ in focus, as identified by Wurster et al. [WBF+19]. They conducted a survey and identified three categories of deployment technologies: (i) provider-specific deployment technologies, (ii) platform-specific deployment technologies, and (iii) general-purpose deployment technologies. Provider-specific technologies are often developed by cloud providers and are mostly part of their Software as a Service (SaaS) portfolio. These technologies can be used to provision the various resources the provider offers, e. g., to create VMs or allocate data storage. However, they are not able to provision resources at other cloud providers. An example for such a technology is AWS CloudFormation [Ama21], which supports only AWS resources. Platform-specific technologies, on the other hand, focus on specific technologies or platforms. While they support deployment to different cloud providers, they may limit the types of artifacts that can be deployed. Kubernetes [Clo21] is an example for such a technology, as it only provides the orchestration of containers. General-purpose technologies do not have any of the limitations described above. They support all cloud resources as well as any cloud provider and are thus very flexible. However, these technologies may still have a specific focus. For example, Terraform and Puppet are both general-purpose deployment technologies. Still, Terraform specializes in orchestrating infrastructure resources, e. g., VMs or data storage. Configuring the provisioned resources, e. g., installing software on a VM, is technically possible but conceptually out of scope. Puppet, on the other hand, focuses on managing existing infrastructure, for example, by providing the possibility to edit configuration files or install software on an already running VM. These differences in functionality and scope can be a reason to use multiple deployment technologies side-by-side for a single application. For example, the infrastructure, e. g., VMs, is provisioned by Terraform, while Puppet is used to manage the software running on the infrastructure. In addition, different teams may prefer different deployment technologies. For example, two teams are responsible for different components of an application.
While one team prefers to use AWS CloudFormation for provisioning VMs, the other team uses Terraform for the same task. To cover all of the categories presented by Wurster et al. [WBF+19], this work investigates the following deployment technologies: (i) AWS CloudFormation as a provider-specific technology, (ii) Kubernetes as a platform-specific technology, and (iii) Puppet and (iv) Terraform as general-purpose technologies. Puppet and Terraform are both selected as they have a different focus and complement each other, as described above. In the following, these deployment technologies are briefly explained, including their offered functionality, their focus, and a brief description of how they operate.

Kubernetes

Kubernetes [Clo21] is a platform-specific, declarative deployment technology. It aims to orchestrate and manage containerized applications over a cluster of computing nodes, including automated scaling and load balancing, automated rollouts and rollbacks, self-healing, as well as secret and configuration management [The21b]. Kubernetes employs a manager-worker architecture [The21d]. Multiple computing nodes, i. e., physical servers or VMs, may be joined to form a Kubernetes cluster. Every node in a cluster must run the kubelet service. The kubelet service connects the node to the cluster and exchanges information with the control plane, which acts as the manager. The control plane may be spanned over multiple nodes to ensure fault tolerance and availability. To deploy an application with Kubernetes, its services must be provided as containers. Containers can be seen as lightweight virtual machines that provide their own operating system but are executed as isolated processes on the host. To run containers, every computing node in the Kubernetes cluster must execute a container runtime, e. g., Docker [Doc21].

To describe the desired state of an application, Kubernetes uses YAML-formatted text files. These files are used to specify Kubernetes objects that should be deployed. The smallest deployable unit in Kubernetes is a pod, which defines a list of containers that should be run together.

Listing 2.1 Example definition of a Kubernetes Pod

kind: Pod
spec:
  containers:
  - name: Tomcat
    image: Tomcat:9.0.45
    ports:
    - containerPort: 80

Listing 2.1 shows an excerpt of a pod definition as an example. The property kind specifies which type of Kubernetes object should be created, in this case a pod. The spec property contains the specification, i. e., all properties, of the pod. Here, a single container running a Tomcat application server is specified to be deployed inside the pod. A Deployment is a Kubernetes object that allows scaling one or multiple pods. A deployment definition references pod definitions and defines additional properties. For example, the deployment defines the replication strategy for pods. The replication strategy determines if and how a pod should be replicated to handle increased workload.
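To illustrate, a Deployment that maintains three replicas of the Tomcat pod from Listing 2.1 could be defined roughly as follows; the object name, labels, and replica count are illustrative assumptions and not part of the running example.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomcat-deployment   # illustrative name
spec:
  replicas: 3               # desired number of pod replicas
  selector:
    matchLabels:
      app: tomcat
  template:                 # pod template, analogous to Listing 2.1
    metadata:
      labels:
        app: tomcat
    spec:
      containers:
      - name: tomcat
        image: tomcat:9.0.45
        ports:
        - containerPort: 80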
This API can also be used to retrieve information about currently deployed Kubernetes objects, like pods or whole deployments. Puppet Puppet [Pup21a] is an agent-based, general-purpose, declarative configuration management tool. Its focus is to provide automated configuration management of running infrastructure, e. g., installing software on a server or manipulate configuration files [Pup21b]. Puppet uses a primary-secondary architecture, where every server that should be managed by Puppet must have the Puppet agent installed on it and the agent must be connected to the primary server. The primary server is responsible for managing the configuration of the agents it controls. Puppet uses a declarative approach for the deployment model, which can be defined in manifest files. These manifests are text files that contain declarations of required resources using a custom Domain-Specific Language (DSL) [Pup21c]. Each declared resource, has a type, a name and several properties. For example, the manifest shown in Listing 2.2 specifies two resources. First, a file to be copied from the primary server to the node with the logical name “agent01”. Here, the resource is of type file and has the name “app-file”. Moreover, the source property specifies that the file contents can be found on the primary server, while the path property specifies where the file should be copied to on the agent node. The second resource of type exec specifies that the previously copied file should be executed. The file can only be executed, if the copy operation was successful, thus, the exec resource specifies 21 2 Foundations Puppet DB Master Agent Facts Catalog Report Figure 2.3: Puppet workflow, based on [Pup21b] Listing 2.3 Example template to create an EC2 Instance with AWS CloudFormation Resources: AppInstance: Type: 'AWS::EC2::Instance' Properties: ImageId: ami-071a13877ce8467d4 InstanceType: t2.micro this relationship in the required property. There exists a plethora of other resources and users are also able to define their own resource types. A manifest can apply to only one node, like in the example, or to multiple nodes. Additionally, multiple manifests may target the same node. The process of deploying changes using Puppet is depicted in Figure 2.3. First, whenever an agent connects to the primary server, it sends facts about its current state to the primary server. The facts are any information that might be relevant for the primary server, like the hostname or the list of installed packages. The primary server uses these facts together with all applicable manifests, to generate the catalog which expresses the desired state of the agent node. The catalog is retrieved by the agent, that computes and applies all necessary changes on the managed node. After applying the catalog, the agent sends a report to the primary server. The report states whether each specified resource was successfully created or if an error has been encountered. The primary server stores the facts, catalogs and reports for each managed node in a database, the Puppet DB. The information stored in the database can be queried over an API. Amazon Web Services (AWS) CloudFormation AWS CloudFormation [Ama21] is a provider-specific, declarative deployment technology. It only supports provisioning of AWS cloud resources, e. g., Elastic Compute Cloud 2 (EC2) instances or an Elastic File System (EFS). To model the desired state, AWS CloudFormation uses a declarative model, called template, that can be defined using either YAML or JSON. 
Amazon Web Services (AWS) CloudFormation

AWS CloudFormation [Ama21] is a provider-specific, declarative deployment technology. It only supports the provisioning of AWS cloud resources, e. g., Elastic Compute Cloud (EC2) instances or an Elastic File System (EFS). To model the desired state, AWS CloudFormation uses a declarative model, called template, that can be defined using either YAML or JSON. Similar to Puppet, the model specifies the resources which shall be provisioned. Every resource has a name, a type, and a set of properties describing the desired state for each resource. An example template, creating an EC2 instance, is shown in Listing 2.3.

Listing 2.3 Example template to create an EC2 Instance with AWS CloudFormation

Resources:
  AppInstance:
    Type: 'AWS::EC2::Instance'
    Properties:
      ImageId: ami-071a13877ce8467d4
      InstanceType: t2.micro

The list of Resources contains a single entry with the name AppInstance. The type AWS::EC2::Instance indicates that AWS CloudFormation should start a VM on EC2. The properties define more details for the VM. The ImageId defines the image containing the OS that should be used, and InstanceType defines the computational resources, i. e., processor and memory, that should be allocated for the VM. To ease the template creation, AWS CloudFormation offers a visual designer. AWS CloudFormation is provided as SaaS by AWS. Thus, a template must be uploaded either via the User Interface (UI) of the AWS Console, the exposed Representational State Transfer (REST) API, or the AWS Command Line Interface (CLI). To instantiate an uploaded template, AWS CloudFormation creates a stack that bundles all provisioned resources. Thus, a single template may be used to create multiple stacks. For every stack, AWS CloudFormation analyzes the respective template and derives the necessary operations, e. g., creating or modifying a resource. As it is operated by AWS and only supports AWS resources, AWS CloudFormation can utilize internal APIs to execute these operations.

Terraform

Terraform [Has21b] is a general-purpose, declarative deployment technology. Its focus is to provide the automated provisioning of infrastructure resources, e. g., networking or computing resources [Has21b]. However, the configuration management of the provisioned resources, for example, the installation of software on a VM, is out of scope for Terraform and should be handled by other tools, e. g., Puppet. In comparison to the previously presented technologies, Terraform does not require a running service. It provides a CLI that is used to analyze a declarative model and to generate a provisioning plan. To support resources of different providers, the Terraform CLI uses a plugin system. Each plugin specifies how resources can be defined in the Terraform model and provides implementations to provision the resources at the respective provider. Similar to Puppet, Terraform defines its own DSL to specify resources that should be provisioned. Listing 2.4 shows an example Terraform file that can be used to provision an AWS EC2 instance.

Listing 2.4 Excerpt of a Terraform file to create an AWS EC2 instance

provider "aws" {
  region = "eu-central-1"
}

resource "aws_instance" "ec2-instance" {
  ami           = "ami-071a13877ce8467d4"
  instance_type = "t2.micro"
}

The file specifies to use the provider "aws", which causes Terraform to load the AWS plugin and its respective resource types. The provider configuration also specifies that all AWS resources should be provisioned in the "eu-central-1" region. The actual EC2 instance is defined as a resource of type "aws_instance" with the name "ec2-instance". The resource block also defines the properties of the instance that should be provisioned. The ami property defines the image that should be used, while the instance_type property defines the computational resources, i. e., processor and main memory, that should be allocated for the instance. Using the Terraform CLI, the Terraform file can be applied, which leads to all defined resources being created. After each run of Terraform, the current state is stored in a state file. The state file can be used to compare the desired state with the current state and to make only the necessary changes. The state file can also be parsed by third parties to retrieve instance information about the resources that were deployed using Terraform.
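For illustration, an abbreviated and simplified excerpt of such a state file for the instance from Listing 2.4 might look roughly as follows; the exact structure depends on the Terraform version, and the id and public_ip values shown here are invented placeholders.

{
  "version": 4,
  "resources": [
    {
      "mode": "managed",
      "type": "aws_instance",
      "name": "ec2-instance",
      "instances": [
        {
          "attributes": {
            "ami": "ami-071a13877ce8467d4",
            "instance_type": "t2.micro",
            "id": "i-0123456789abcdef0",
            "public_ip": "3.120.0.10"
          }
        }
      ]
    }
  ]
}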
2.2 Technology and Orchestration Specification for Cloud Applications (TOSCA)

TOSCA is a standard by OASIS to describe the deployment and management of cloud applications in a portable and interoperable way [OAS13a]. The standard aims to provide a model of the application that contains information on (i) how the application should be deployed and (ii) how specific management operations can be executed. The central construct of the standard is the service template, which contains all necessary information for a single application. The service template has two main parts: a topology template and plans. In addition, a service template can have any number of tags, i. e., key-value pairs that provide an additional description of the template.

The topology template is a declarative deployment model, as described in Section 2.1.1, and defines the structure of the application in the form of a directed and weighted graph. The graph consists of node templates as its nodes and relationship templates as its edges. A node template represents a component of the application, e. g., a VM, the operating system, or a database. A relationship template represents a connection between two application components, e. g., the connection between a service and a database. To specify the semantics, every node template and relationship template has a reusable type, i. e., a node type or a relationship type, assigned to it. A node type defines the properties and management operations of an application component. An example for such a node type is a MySQL DBMS type. It may specify properties, like the port on which the server should listen or the password for the root user. Each node template that has the MySQL DBMS node type assigned can provide values for the properties specified by the node type. Moreover, the MySQL DBMS node type might define a management operation test that can be executed to check the availability of the DBMS. Relationship types define the semantics of a relationship between two components and can define properties for the relationship. There are three normative relationship types defined for TOSCA [OAS13b]: HostedOn, ConnectsTo, and DependsOn. A DependsOn relationship defines that the source component is dependent on the target component. This means that the source component cannot run if the target component is unavailable. A HostedOn relationship defines that the source component is installed on the target component and runs in the context of the hosting component. An example is a Tomcat application server that must be installed on some sort of computing component, e. g., a VM. A ConnectsTo relationship defines that the source component in some way connects to the target component. An example for this is a service which connects to a database.

Figure 2.2 shows an example topology template of a simple web shop application. Every node template has a name and a type, e. g., the node template that represents the business logic component of the web shop is called "Webshop" and is of type Java App, as it is a Java application. The Java application should be HostedOn an application server of type Tomcat 9, which is defined to listen on port 80.
The Tomcat server in turn should be HostedOn a VM of type Ubuntu VM, as it uses Ubuntu as its operating system. The VM is defined to run in the region us-east-1 of the AWS public cloud; thus, its node template is connected with a HostedOn relationship to the public cloud node template of type AWS. To store and retrieve data, the "Webshop" component ConnectsTo a database of type MySQL DB, which has the schema name shop. This database should be HostedOn a MySQL DBMS, listening on port 3306. The database server should again be hosted on the same VM as the Tomcat server. In addition, the Ubuntu VM type defines a management interface with the management operation test. This operation can be called to check the availability of the VM.

To execute these management operations, the service template includes plans. Plans are process models which describe a workflow that contains all the steps necessary to execute the management operations in a service template. A plan can be parameterized and take input values from the properties specified in the topology template. Moreover, a plan can execute any number of management operations at the same time. For example, if multiple node templates in a service template specify a restart operation, there might be a single plan that restarts all components at the same time. To ensure interoperability and portability, TOSCA plans should be specified in standardized process languages, e. g., the Business Process Model and Notation (BPMN) [Obj10] or the Business Process Execution Language (BPEL) [OAS07]. However, plans may include the execution of arbitrary code supplied as implementation artifacts, e. g., a shell script.

As mentioned above, node types are reusable, i. e., multiple node templates may have the same type assigned. The same is true for relationship types and relationship templates. To improve reusability, node types and relationship types can be stored in a dedicated repository and be used in multiple service templates. For example, an organization might develop multiple Java applications. Thus, the type Java App will be used in almost every service template. Storing the types in an organization-wide repository prevents repeated tasks and supports knowledge transfer across the organization. Further, the TOSCA standard employs inheritance for types. A node type or relationship type inherits all elements of a super type, e. g., properties or interfaces, by defining a DerivedFrom clause. This is especially useful for semantically similar types that share many properties.

Figure 2.4: An example type hierarchy

An example type hierarchy for VMs is depicted in Figure 2.4. The hierarchy considers the types Ubuntu VM and Windows VM. Both types have several common properties, e. g., their Internet Protocol (IP) address or the count of processors. However, the Ubuntu VM specifies the additional sshKey property, while the Windows VM specifies the rdpKey property. The values of the two additional properties are required to connect to the respective VM. Moreover, the backup management operation must be executed in different ways for both node types. Thus, both types inherit from a common super type VM that defines the common properties but define their own management operations, backupUbuntu and backupWindows, respectively.
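Sketched in a schematic YAML notation (not normative TOSCA syntax; the property data types are assumptions), this hierarchy could be captured as follows.

node_types:
  VM:                          # common super type
    properties:
      ip: string               # data types are assumptions
      cpuCount: integer
  UbuntuVM:
    derived_from: VM
    properties:
      sshKey: string
    interfaces:
      Management:
        operations: [ backupUbuntu ]
  WindowsVM:
    derived_from: VM
    properties:
      rdpKey: string
    interfaces:
      Management:
        operations: [ backupWindows ]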
To actually deploy an application modeled with TOSCA, the service templates are packed into a Cloud Service Archive (CSAR). The CSAR contains one or more service templates and all necessary information to deploy them, as depicted in Figure 2.5.

Figure 2.5: Structure of a CSAR

Beside the topology template and the plans, this includes the definitions of all node types and relationship types used in the topology template. Further, all Implementation Artifacts (IAs) are part of the CSAR, so that the specified plans can be executed correctly. To deploy the application, the TOSCA runtime may process the CSAR either in an imperative or in a declarative way. For imperative processing, the CSAR must contain explicitly specified plans to provision the application. If they are not provided, the imperative processing fails. In case of declarative processing, however, the TOSCA runtime tries to derive the necessary steps for deployment from the contained topology template, analogously to the behavior defined in Section 2.1.2.

2.3 OpenTOSCA

OpenTOSCA [BEK+16] is an ecosystem providing the possibility to model, deploy, manage, and instantiate cloud applications defined using the TOSCA standard as described in Section 2.2. It consists of three main components, as depicted in Figure 2.6: (i) Winery [KBBL10], (ii) the OpenTOSCA Container [BBH+10], and (iii) the Vinothek [BBKL14c].

Figure 2.6: Overview of the OpenTOSCA ecosystem. Based on [BEK+16; Mat20]

Winery is a modeling tool that helps to create TOSCA models by providing a comprehensive UI. In addition, Winery incorporates a type repository that allows defining and reusing node types and relationship types in multiple service templates. The service templates created in Winery can be exported in the CSAR format, either using the UI or the Winery API. The exported CSAR files can be loaded into the OpenTOSCA Container, which is a fully TOSCA-compatible runtime. The OpenTOSCA Container is responsible for provisioning the modeled application. To achieve this, it incorporates a plan generator, a plan engine, and an IA engine. The plan generator analyzes a topology template and builds provisioning, termination, and management plans for the specified application. The plan engine takes care of executing all plans contained in the CSAR or generated by the plan generator. This includes the provisioning plans, when instantiating the application, as well as any on-demand management plans that are requested by the user. The IA engine is used to run the implementation artifacts that are part of any plans. The Vinothek is a self-service portal that lists all service templates, i. e., applications, that are available in an OpenTOSCA Container. In this portal, the user can see all applications that have been installed into the OpenTOSCA Container. Moreover, the user can execute the provisioning plan to start an application. For already instantiated and running applications, the user can check the status and request the execution of available management operations, which triggers the execution of the corresponding plan by the plan engine inside the OpenTOSCA Container.

The typical workflow to deploy an application in the OpenTOSCA ecosystem is as follows.
First, a service template is created using the modeling tools in Winery. Second, the created model is exported as a CSAR and installed into the OpenTOSCA Container. Third, the user instantiates the created application in the Vinothek, triggering the provisioning of the deployment by the OpenTOSCA Container. Fourth, while the application runs, the user monitors and manages it, using the management operations provided in the Vinothek.

As the OpenTOSCA ecosystem can be used for every step of application deployment, from modeling to management, much research has been done to enhance the ecosystem with useful functionality [BBKL13b; HBB+21; WBK+20]. Thus, the prototype to retrieve instance models and to generate management workflows is implemented as part of the OpenTOSCA ecosystem.

2.4 Instance Model

Similar to a declarative deployment model, an instance model also describes the state of an application. However, instead of showing the desired state, it represents the current state of all components and their relations. The instance model can be used for documentation purposes (for an example see [BBKL13a]) or for the management of the modeled application (for an example see [BBKL14a]). For example, the information contained in the instance model can be used to check the availability of the running components.

Many parts of the instance model of an application are similar to the deployment model of the same application. For example, a complete instance model will describe the same components and the same relations as the deployment model. Moreover, many properties of the components and relations will be identical. An example is the port property, describing the port a web server listens on. The deployment model specifies the port, so that the deployment technology that deploys the application can configure the web server appropriately. The instance model also contains the port property, as its value is necessary to issue requests to the web server. However, instance models may also differ from the deployment model. For example, the IP address of a VM is often known only after its deployment. Nonetheless, the IP address is vital information for any management operation that needs to connect to the VM. In addition, instance models may not be complete. Depending on the method that is used to create the instance model, a different level of detail of instance information is available. For example, an error in the manual creation of an instance model can lead to missing information. Automatically generated instance models can also miss information, for example, if the retrieving tool cannot connect to some components.

As discussed by Binz et al. [BBKL13a], the TOSCA standard can be used to specify instance models. Thus, this work uses TOSCA instance models.

Figure 2.7: Example instance model

Figure 2.7 shows an example instance model for the application from Figure 2.2. The instance model contains the same components as the deployment model. However, the Ubuntu VM component specifies the additional property ip. Moreover, the schema property of the MySQL DB component is missing, as is the ConnectsTo relation between the Java App and the MySQL DB.
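Rendered in the same schematic notation as the sketch in Section 2.1.1, the differing parts of this instance model could be written down as follows; the notation is purely illustrative and not normative TOSCA syntax.

# Illustrative excerpt of the instance model from Figure 2.7
components:
  VM:
    type: Ubuntu VM
    properties:
      ram: 8 GB
      ip: 1.2.3.4          # only known after deployment
    relations:
      - hostedOn: PublicCloud
  Database:
    type: MySQL DB         # schema property could not be retrieved
    relations:
      - hostedOn: DBMS
  # remaining components as in Figure 2.2; the ConnectsTo relation
  # between Webshop and Database is missing as well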
3 Related Work

The retrieval of information about running applications has been the topic of much research. Especially in the field of Enterprise Architecture Management (EAM), many approaches have been proposed. For example, Farwick et al. [FAB+11] and Holm et al. [HBLE14] presented approaches for the automated generation of enterprise architecture documentation. Farwick et al. [FAB+11] aimed to improve the maintenance of existing enterprise architecture models. To achieve this, they propose an automated process that repeatedly queries defined data sources to adjust the existing model. These sources may be databases, APIs, or even human input. From the received data, the maintenance process extracts the running application components, their relevant attributes, as well as the relationships between them. Holm et al. [HBLE14] rely on network scanners to retrieve information about running applications inside the enterprise network. These scanners analyze the Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) traffic of the target network and extract information. Such information can be the hosts transmitting over the network, their OSs, and installed services. Using source and target addresses, the relationships between services can also be identified. From this information, Holm et al. [HBLE14] generate an ArchiMate [The19] model that can be refined manually. The approaches of Farwick et al. [FAB+11] and Holm et al. [HBLE14] both aim at providing documentation of running applications as part of the EAM process, while this work focuses on providing automated management operations for single applications. Moreover, none of the approaches explicitly considers deployment technologies.

Further, Machiraju et al. [MDW+00] describe a generic approach for application discovery. Their "generic auto-discovery engine" uses application template models to search for instances of the defined application. The template describes what to discover and how to discover it. The engine then searches for possible instances of applications that fit the template. However, the template model requires the discoverable components to be defined in advance and thus limits what application components can be found. This does not fit the approach in this work, as this work tries to discover arbitrary application components.

Brogi et al. [BCS17] introduced Discovering Available Cloud Offerings (DrACO), a tool to discover Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) offerings and model these as TOSCA node types. The tool retrieves all necessary information and adds a node type to the TOSCA repository. This node type can later be used in TOSCA deployment models to deploy application components onto said IaaS or PaaS offerings. This approach only allows for the discovery of single node types for already known services. However, this work aims at providing a complete instance model for running applications that are composed of different components.

Binz et al. [BBKL13a] investigated the automated generation of Enterprise Topology Graphs (ETGs). An ETG is a model of all components that are running on the IT infrastructure of an enterprise and the relations between them. The presented approach starts from a manually provided entry point of the ETG. A plugin-based crawler framework extends the initial ETG in multiple iterations by performing arbitrary operations against the discovered software and hardware components.
The framework consists of dedicated plugins for specific component types, which may discover dependent components or refine the information on already discovered components. Although the concept of an ETG seems similar to an instance model as described in this work, it has a larger scope. The ETG aims to represent the entire landscape of enterprise IT, while the instance model is specific to an application. In later work, Binz et al. [BBKL14a] utilized the ETG to migrate applications to and between cloud environments. They proposed an approach to extract a sub-graph of the ETG representing a single application and to map its components to TOSCA types. The resulting deployment model is then modified to target a specific cloud environment, e. g., AWS. This approach defines an automated process to retrieve TOSCA models of running applications. However, it focuses on migrating an application between two environments, while this work aims at providing management operations for the application without changing its environment.

Brogi et al. [BFGL20] also proposed Managing Applications Running In Opportunistic Fog (MARIO), an automated approach for managing distributed applications. The approach uses declarative policies, which are defined manually for specific applications. MARIO monitors the distributed environment the application is running in and selects the best computing node for each service, according to the specified policies. As the environment may change, MARIO is able to select a new computing node and to move the service to said node. Similar to Binz et al. [BBKL14a], Brogi et al. [BFGL20] focus on moving parts of the application, while this work intends to enhance running components without moving them. Moreover, their approach is specific to fog environments.

Harzenetter et al. [HBL+19] introduced the Management Feature Enrichment and Workflow Generation (MFEW) method. This method enriches existing TOSCA deployment models with management features and creates executable workflows in an automated process. This is achieved by matching the node types of the modeled application with Feature Node Types. For example, a Feature Node Type for a MySQL database may specify a backup operation. If the deployment model contains a node with a matching MySQL node type, the management feature can be applied to the model by replacing the original node type. The replacement node type is automatically generated by merging the original node type and the Feature Node Type. As this process can be executed iteratively, multiple management features can be applied to the same node in the deployment model. The resulting deployment model can then be used to automatically derive executable workflows in the form of TOSCA management plans. The drawback of this approach is that it requires an existing deployment model as input. However, the approach can be modified to operate on instance models and thus can be reused for this work.

As described in Section 2.1.2, most deployment technologies define their own declarative modeling language. However, they all share some basic concepts, e. g., components and relations. Wurster et al. [WBB+20] introduced the Essential Deployment Meta Model (EDMM) modeling and transformation system. It allows creating deployment models in a deployment-technology-agnostic metamodel, the EDMM, and offers functionality to transform these models to deployment-technology-specific models. This prevents vendor lock-in effects for applications modeled with EDMM.
Based on that, Mathony [Mat20] proposed the Essential Deployment Meta Model instance (EDMMi), a technology-agnostic metamodel to define instance models. In EDMMi, a deployment instance describes the current state of an application and consists of component instances and relation instances. All component instances and relation instances have a component type or a relation type, respectively. Moreover, every instance can have multiple instance properties. In addition, Mathony [Mat20] provided the Instance Model Retrieval Framework to automatically retrieve EDMMi models from the APIs of different deployment technologies and showed that EDMMi models can be trivially mapped to TOSCA instance models: a deployment instance becomes a service template, while component instances and relation instances are mapped to node templates and relationship templates, respectively. Also, the component types and relation types can be mapped to their respective TOSCA types. As a consequence, this work skips the creation of EDMMi models and uses the mapping information provided by the Instance Model Retrieval Framework to directly derive TOSCA instance models from the information provided by the deployment technologies. Mathony [Mat20] also uses the TOSCA instance models to automatically generate management workflows, but their approach is limited to a single deployment technology.
To overcome the drawback of their previous approach, Harzenetter et al. [HBB+21] provided a method to retrieve instance models for running applications. Similar to Mathony [Mat20], they utilize the APIs of the deployment technology used to deploy the target application to gather instance information. The Instance Information Retriever uses specialized plugins for every supported deployment technology that collect instance information from the APIs of the deployment technologies. In contrast to Mathony [Mat20], Harzenetter et al. [HBB+21] directly map the retrieved instance information to a TOSCA instance model. In addition, the retrieved instance model contains information about the deployment technology itself. This way, the management workflow generated in a later step can access this information in order to prevent interference during workflow execution. Different deployment technologies, however, provide data of different granularity and expressiveness over their APIs. Thus, Harzenetter et al. [HBB+21] included an Instance Model Completer that retrieves additional information about the discovered components. Hereby, the completer consists of several plugins that may perform arbitrary operations, e. g., sending Hypertext Transfer Protocol (HTTP) requests or issuing shell commands over Secure Shell (SSH) connections. The resulting instance model can be enriched with management features using the adapted approach from Harzenetter et al. [HBL+19]. This approach constitutes the basis of this work; however, it only supports instance model retrieval from a single deployment technology, similar to Mathony [Mat20].
4 Concept
This chapter presents the approach this work introduces. The goal is to retrieve a single instance model for an application that has been deployed using multiple deployment technologies. The instance model is based on the TOSCA standard and is used to automatically generate management workflows that perform management operations on the discovered instances.
The remainder of this chapter first provides an overview of the complete process from instance model retrieval to management workflow execution in Section 4.1. Following, Section 4.2 describes how the involved deployment technologies are discovered. Section 4.3 explains how instance information can be retrieved from the APIs of multiple deployment technologies, while Section 4.4 describes the approach to complete the instance model with information that is not available from the given deployment technologies. Lastly, Section 4.5 describes how the retrieved instance model can be enriched with management features and how the enriched model can be used to generate management workflows.
Figure 4.1: Overview of the proposed process: Instance Model Retriever (1), Instance Model Completer (2), Instance Model Enricher (3), Management Workflow Generator (4), and Workflow Engine (5)
4.1 Overview
This work proposes a process for the retrieval of an instance model and the generation of management workflows, depicted in Figure 4.1. The process is made up of five components: (i) the Instance Model Retriever, (ii) the Instance Model Completer, (iii) the Instance Model Enricher, (iv) the Management Workflow Generator, and (v) the Workflow Engine. The order in which the components are invoked is indicated by the numbers on each component.
The first component, the Instance Model Retriever, is a plugin-based component that retrieves instance information from the deployment technologies and derives a TOSCA instance model. In order to map the retrieved instance information to TOSCA types, the Instance Model Retriever needs access to the TOSCA node type repository. For every supported deployment technology, the Instance Model Retriever has a dedicated plugin, as depicted by the deployment technology icons in the figure. Each of these plugins is responsible for establishing a connection to the deployment technology and for mapping the provided instance information to the derived TOSCA instance model. A detailed description of the Instance Model Retriever is provided in Section 4.3.
The retrieved instance model is handed over to the Instance Model Completer. The Instance Model Completer is also a plugin-based component that tries to refine the instance model with additional information that cannot be queried from the deployment technologies. For this, the Instance Model Completer uses the information provided in the retrieved instance model as a starting point to gather additional information. For example, the Instance Model Completer may connect to the API of an already discovered component and retrieve additional information about the component. In order to refine the node types of the TOSCA model, the Instance Model Completer also needs access to the TOSCA node type repository. More details on the Instance Model Completer are provided in Section 4.4.
The completed instance model is handed over to the Instance Model Enricher, which is responsible for enriching the contained components with management features. To achieve this, the Instance Model Enricher uses the concept of management feature node types. Each management feature node type extends an existing TOSCA node type and defines a single management feature for it. The original node type in the instance model can be replaced with a generated node type that represents a combination of the original node type and one or more management feature node types. The process of management feature enrichment is explained in more detail in Section 4.5.
The enriched instance model is given to the Management Workflow Generator, which is responsible for generating executable management workflows. The Management Workflow Generator generates a single plan for each management feature contained in the instance model. For example, if two components in the instance model are enriched with the test management feature, a single workflow will be generated which invokes the test operation on both components subsequently. As the invocation of management features may require inputs, e. g., network addresses or credentials, the Management Workflow Generator also extracts the necessary information from the instance model and provides the invocation with concrete values. The generated management workflows are deployed onto the Workflow Engine and can be executed on demand. More details on workflow generation are provided in Section 4.5.
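The following minimal Java sketch illustrates how these five components could be wired together in the described order; all type and method names (e. g., ToscaInstanceModel, ManagementWorkflowPipeline) are assumptions made for illustration only and do not reflect the actual implementation presented in Chapter 5.

import java.util.List;
import java.util.Map;

// Placeholder data types; in the actual approach the instance model is a TOSCA service template.
record DeploymentTechnologyInstance(String id, String technology, Map<String, String> properties) {}
record ToscaInstanceModel() {}
record ManagementWorkflow(String name) {}

interface InstanceModelRetriever { ToscaInstanceModel retrieve(List<DeploymentTechnologyInstance> technologies); } // (1)
interface InstanceModelCompleter { ToscaInstanceModel complete(ToscaInstanceModel model); }                        // (2)
interface InstanceModelEnricher { ToscaInstanceModel enrich(ToscaInstanceModel model); }                           // (3)
interface ManagementWorkflowGenerator { List<ManagementWorkflow> generate(ToscaInstanceModel model); }             // (4)
interface WorkflowEngine { void deploy(List<ManagementWorkflow> workflows); }                                      // (5)

final class ManagementWorkflowPipeline {
    private final InstanceModelRetriever retriever;
    private final InstanceModelCompleter completer;
    private final InstanceModelEnricher enricher;
    private final ManagementWorkflowGenerator generator;
    private final WorkflowEngine engine;

    ManagementWorkflowPipeline(InstanceModelRetriever retriever, InstanceModelCompleter completer,
                               InstanceModelEnricher enricher, ManagementWorkflowGenerator generator,
                               WorkflowEngine engine) {
        this.retriever = retriever;
        this.completer = completer;
        this.enricher = enricher;
        this.generator = generator;
        this.engine = engine;
    }

    void run(List<DeploymentTechnologyInstance> technologies) {
        ToscaInstanceModel retrieved = retriever.retrieve(technologies);    // retrieve from the deployment technology APIs
        ToscaInstanceModel completed = completer.complete(retrieved);       // refine with information the APIs do not offer
        ToscaInstanceModel enriched = enricher.enrich(completed);           // attach management feature node types
        List<ManagementWorkflow> workflows = generator.generate(enriched);  // one executable plan per management feature
        engine.deploy(workflows);                                           // deploy for on-demand execution
    }
}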
4.1.1 Running Example
For a better understanding, the application depicted in the TOSCA topology in Figure 4.2 will be used as a running example throughout this chapter. It consists of a Java web application that is hosted on a Tomcat application server. The Java web application connects to a MySQL database which is hosted on a MySQL DBMS. The Tomcat application server and the MySQL DBMS are hosted on the same VM, which is running Ubuntu as its OS. As indicated by the icon in the upper left corner of each node template, the VM is deployed by AWS CloudFormation, while all other components are deployed using Puppet.
Figure 4.2: Running example
4.2 Discovering involved Deployment Technologies
This work focuses on retrieving instance information from the APIs of deployment technologies without any prior knowledge about the target application. Thus, the first step is to determine the concrete deployment technologies that are involved in deploying the target application. For the running example, the usage of AWS CloudFormation and the Puppet primary node must be discovered.
One possible approach is to automatically discover all deployment technologies that are used inside an organization. This could be achieved by utilizing network scanners, as described by Holm et al. [HBLE14], or by extracting the information from existing enterprise architecture documentation, like ETGs [BBKL13a]. However, this approach is insufficient, as network scanners are not able to discover deployment technologies that are running outside the enterprise network, e. g., AWS CloudFormation, or that do not require a continuously running service, e. g., Terraform. Moreover, most deployment technologies require additional information or some sort of credentials to access their APIs. For example, the AWS cloud is split into independent regions; for accessing the AWS CloudFormation API, the correct region must be known. Credentials are even impossible to discover automatically, as the nature of credentials is to be kept secret. Examples for such credentials are private keys, e. g., for Puppet or Kubernetes, or access tokens, e. g., for AWS CloudFormation.
In addition, discovering all deployment technologies used by an organization is only a first step. As the organization will, most likely, operate several independent applications, only those deployment technologies must be selected that are actually involved in deploying the target application. For example, consider an organization that deploys the running example application. Beside AWS CloudFormation and Puppet, the organization might also use Terraform to provision VMs. When retrieving instance information for the running application, it is impossible to automatically determine whether Terraform or AWS CloudFormation is used for deploying the VM in the running example application without prior knowledge about the application.
As a consequence, this work proposes to manually define the list of involved deployment technologies as a starting point for discovering instance information. Each entry in the list specifies a unique identifier for the entry and a list of properties necessary for connecting to the API, e. g., credentials. Each entry must have a unique identifier, as the list might contain multiple entries with the same deployment technology. This is the case if an organization runs multiple instances of a deployment technology, e. g., two independent Puppet primary nodes.
The manual configuration is reasonable, despite the drawbacks of manual processes described by Oppenheimer [Opp03]. First, the effort of compiling this list should be manageable, since the list of involved deployment technologies will be significantly shorter than the list of application components. While an enterprise application may consist of many components, there are typically only a couple of deployment technologies involved. Moreover, the list will be stable over the lifecycle of an application, compared to the components the application consists of.
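A possible shape for such a list is sketched below in JSON for the running example; the concrete format as well as the property names (e. g., region, stackName, primaryIp, environment) are illustrative assumptions that depend on the respective retrieval plugin, and credential values are elided.

[
  {
    "id": "cloudformation-eu-central-1",
    "technology": "AWS CloudFormation",
    "properties": {
      "region": "eu-central-1",
      "stackName": "webshop",
      "accessKeyId": "...",
      "secretAccessKey": "..."
    }
  },
  {
    "id": "puppet-primary-1",
    "technology": "Puppet",
    "properties": {
      "primaryIp": "192.168.2.1",
      "environment": "production",
      "privateKey": "..."
    }
  }
]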
4.3 Instance Model Retrieval
Given the list of deployment technology instances as specified in Section 4.2, the Instance Model Retriever retrieves a TOSCA instance model. The Instance Model Retriever reuses concepts from the Instance Information Retriever and the Instance Model Normalizer of Harzenetter et al. [HBB+21], but combines them into a single component. For every supported deployment technology, the Instance Model Retriever has a dedicated plugin, which is responsible for connecting to the API and for retrieving a TOSCA instance model. For example, the Puppet plugin is responsible for connecting to a Puppet primary server and for retrieving information about the running resources managed by that primary server. As this work retrieves instance models from multiple deployment technology instances, one run of the Instance Model Retriever includes multiple plugin runs. In every plugin run, the plugin connects to the respective deployment technology instance and retrieves instance information. The retrieved information is mapped to a TOSCA instance model. As this mapping is specific to each deployment technology, Section 4.3.1 describes the mapping in detail for every deployment technology investigated by this work.
In addition, the information about the involved deployment technologies must be added to the instance model, to allow the generation of management workflows that need to connect to the deployment technologies. Section 4.3.2 describes how this can be achieved in TOSCA models. Joining the instance models produced by each plugin run into a single instance model is a complex task and is detailed in Section 4.3.3.
4.3.1 Mapping from Deployment Technologies to TOSCA Instance Models
Each plugin of the Instance Model Retriever is responsible for mapping the information provided by the deployment technology to the respective entity in a TOSCA instance model. This section provides conceptual mapping information for all deployment technologies mentioned in Section 2.1.2. Technical details on the respective plugin implementations are presented in Chapter 5.
As described in Section 4.2, a single instance of a deployment technology may be used to deploy multiple applications. Thus, it is necessary to filter the instance information retrieved from the deployment technology to only include information on the components of the target application. For example, AWS CloudFormation can be used to deploy VMs for other applications in addition to the application from the running example. Thus, when querying the AWS CloudFormation API, it must be ensured that only the information about the Ubuntu VM of the example application is added to the instance model. There is no standardized way of logically separating components of different applications in deployment models. However, most deployment technologies either provide technical concepts to achieve this separation or suggest workflows to achieve the separation on the process level. Thus, this section also describes for each deployment technology how the relevant components can be isolated.
Mapping Kubernetes to TOSCA
Kubernetes is used to orchestrate the deployment of containerized applications in a cluster of computing nodes. The cluster is managed by the control plane, which provides an API to interact with the cluster. The control plane can be used to start containers, to stop containers, or to alter the configuration of running containers. To model application deployments, Kubernetes defines Kubernetes objects of different types, e. g., containers, pods, or deployments. The control plane provides instance information about all Kubernetes objects currently deployed in the cluster. To logically group Kubernetes objects into different environments, Kubernetes provides the concept of namespaces. Namespaces provide (i) a scope for object names, (ii) the definition of access rights for users, and (iii) the ability to limit resource consumption. Thus, they provide an enclosed environment that can be used to encapsulate different applications. Consequently, this work considers all objects inside one namespace as possible components of the target application. The name of the namespace that should be investigated has to be specified additionally when defining the Kubernetes deployment instance as of Section 4.2.
Figure 4.3: Mapping of Kubernetes entities to TOSCA entities
Figure 4.3 shows how the different Kubernetes objects inside a namespace are mapped to TOSCA entities. The names of the Kubernetes objects are in black, while the mapped TOSCA entities are displayed in gray. As the namespace defines the scope of the instance retrieval, it is mapped to the service template. Each namespace may have several deployments, which in turn may have several pods. However, strictly speaking, these objects do not represent running components. Thus, they are not directly mapped to any TOSCA entity. Each pod may consist of multiple containers, and each container actually is running on some node in the cluster. Thus, each container is mapped to a node template. Deployments, pods, and containers can all have properties that might be relevant for the instance model. For example, the identifier of the container as well as the identifier of the pod are required to access the container from a management process. Thus, these properties are mapped to properties of the node template for each container. Any combination of these properties may be used to determine the TOSCA node type for a container. Most likely, the image property will define the node type; however, this is not guaranteed. Thus, no concrete Kubernetes object can be directly mapped to the node type. This work uses a default fallback type DockerContainer if no other node type could be found. Discovering horizontal relationships between two containers, e. g., a web application connecting to a database, is impossible, since no Kubernetes object allows defining such relationships.
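The following Java fragment sketches this mapping logic for containers that have already been retrieved from the control plane; the data types and the node type lookup are illustrative assumptions, while the fallback to the DockerContainer type follows the rule described above.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Simplified view of a container as retrieved from the control plane (illustrative).
record RetrievedContainer(String containerId, String podId, String image) {}
record NodeTemplate(String id, String nodeType, Map<String, String> properties) {}

// Lookup of a TOSCA node type in the node type repository, e.g., by container image (illustrative).
interface NodeTypeRepository {
    Optional<String> findTypeForImage(String image);
}

class KubernetesMapper {
    private static final String FALLBACK_TYPE = "DockerContainer";

    List<NodeTemplate> mapContainers(List<RetrievedContainer> containers, NodeTypeRepository repository) {
        List<NodeTemplate> nodeTemplates = new ArrayList<>();
        for (RetrievedContainer container : containers) {
            // The image is the most likely indicator of the node type; otherwise fall back to DockerContainer.
            String nodeType = repository.findTypeForImage(container.image()).orElse(FALLBACK_TYPE);
            // Container and pod identifiers are kept as properties so that later management
            // workflows can address the container inside the cluster.
            Map<String, String> properties = new HashMap<>();
            properties.put("containerId", container.containerId());
            properties.put("podId", container.podId());
            properties.put("image", container.image());
            nodeTemplates.add(new NodeTemplate(container.containerId(), nodeType, properties));
        }
        return nodeTemplates;
    }
}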
Mapping Puppet to TOSCA
The Puppet primary server manages a set of agent nodes. Each agent supplies the primary server with facts. In combination with the user-supplied configuration, the primary server derives a catalog for each agent that specifies the desired state of the agent. When an agent applies a new version of the catalog, it sends a report to the primary server indicating the new state of the agent. The primary server stores all this information – e. g., agents, reports, catalogs – in the PuppetDB, which provides an API to retrieve this information. For logically separating manifests and resource definitions, Puppet offers the concept of environments. The separation between Puppet environments is not as strict as the separation between Kubernetes namespaces, e. g., environments may share configuration and an agent may be referenced in multiple environments. However, an environment allows separating multiple applications that are managed by a single Puppet primary server. Thus, the name of the environment that should be targeted must be specified as a property in the list of deployment technologies as of Section 4.2.
Figure 4.4: Mapping of Puppet entities to TOSCA entities
Figure 4.4 depicts how Puppet entities are mapped to TOSCA entities. Again, Puppet-specific terms are displayed in black, while the respective TOSCA terms are displayed in gray. As the environment defines the scope of instance retrieval, it is mapped to a service template. The manifests defined inside an environment may reference multiple agents, each of which provides facts about itself. As every agent runs on a computing node to which application resources are deployed, it is mapped to a node template, which may be populated with properties that are retrieved from the facts of the agent, e. g., its IP address. Its corresponding node type must be derived from the supplied facts, or the normative Compute type can be used as a fallback. For example, an agent running on a VM may supply a fact that the operating system of the node is Ubuntu. Thus, an appropriate node type would be Ubuntu VM. The reports for every agent contain information about which resources were configured on the node by Puppet. Resources may be mapped to node templates. However, not all resources should actually be contained in the instance model. For example, a resource of type file might just alter a configuration file and thus should not be mapped to a node template. The node type that is assigned to a node template is defined by the type of the corresponding resource. For example, a resource of type package with the name mysql defines an installed MySQL database server and should be mapped to a node template with the node type MySQL DBMS. Horizontal relations between node templates cannot be discovered, since the reports do not contain information about the dependents of a resource.
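A similar sketch shows how reported Puppet resources could be filtered and mapped to node templates; the resource representation, the skip list, and the type lookup are illustrative assumptions, while the concrete examples (skipping file resources, mapping the mysql package to a MySQL DBMS node type) follow the rules described above.

import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Simplified view of a resource taken from an agent report (illustrative).
record PuppetResource(String type, String title, Map<String, String> parameters) {}
record NodeTemplate(String id, String nodeType, Map<String, String> properties) {}

class PuppetResourceMapper {
    // Resource types that only describe configuration details and therefore
    // do not represent application components of their own (illustrative selection).
    private static final Set<String> IGNORED_TYPES = Set.of("file", "exec", "cron");

    // Minimal lookup from package names to TOSCA node types (illustrative).
    private static final Map<String, String> PACKAGE_NODE_TYPES = Map.of(
            "mysql", "MySQL-DBMS",
            "tomcat9", "Tomcat9");

    Optional<NodeTemplate> mapResource(PuppetResource resource) {
        if (IGNORED_TYPES.contains(resource.type())) {
            return Optional.empty(); // e.g., a file resource merely alters a configuration file
        }
        if ("package".equals(resource.type())) {
            String nodeType = PACKAGE_NODE_TYPES.get(resource.title());
            if (nodeType != null) {
                return Optional.of(new NodeTemplate(resource.title(), nodeType, resource.parameters()));
            }
        }
        // Anything else is skipped here; a real plugin would consult the node type repository instead.
        return Optional.empty();
    }
}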
Mapping AWS CloudFormation to TOSCA
AWS CloudFormation is a SaaS offering and can be used to provision arbitrary resources in the AWS cloud. The resources that shall be provisioned are defined in a template that is uploaded to the API of AWS CloudFormation. Whenever a template is deployed by AWS CloudFormation, a stack is created that contains all provisioned resources. AWS CloudFormation provides an API which can be queried to retrieve information about deployed stacks. By creating a stack for each deployed template, AWS CloudFormation provides separation between applications by design. Thus, the name of the stack that should be used for instance retrieval must be specified as a property in the list of deployment technologies as of Section 4.2.
Figure 4.5: Mapping of AWS CloudFormation entities to TOSCA entities
Figure 4.5 shows how AWS CloudFormation entities are mapped to TOSCA entities. As before, AWS CloudFormation specific entities are in black, while the corresponding TOSCA entities are in gray. As a stack defines the scope of instance model retrieval, it is mapped to a service template. A stack contains multiple stack resources, each of which has a resource type and several properties. Every stack resource may be mapped to a node template, depending on its resource type. For example, a resource of type EC2::Instance might be mapped to the TOSCA normative Compute type, while a resource of type EC2::VPC merely defines the virtual network the VM is placed in and does not need to be mapped to a node template. The properties of a stack resource can be mapped to properties of the node template, e. g., the IP address property of an EC2::Instance can be added as a property to the corresponding node template. In addition, stack resources may define dependencies between each other, which may be mapped to relationship template instances of the TOSCA normative type DependsOn.
Mapping Terraform to TOSCA
Terraform does not use an always-on service and thus provides no API to retrieve information from. However, the state files written by Terraform contain all information about provisioned resources and may be read and parsed for the purpose of instance model retrieval. Terraform does not provide any technical concept for separating components of different applications. However, it is encouraged to use separate Terraform workspaces for every application. As Terraform creates a single state file per workspace, parsing this file provides information on all resources that were provisioned for the target application, whilst resources of other applications should not be visible. Thus, the Terraform state file that should be parsed must be provided as a property in the list of deployment technologies as of Section 4.2.
Figure 4.6: Mapping of Terraform entities to TOSCA entities
Figure 4.6 shows how Terraform entities are mapped to TOSCA entities. As before, Terraform-specific entities are in black, while the corresponding TOSCA entities are in gray. As the state file defines the scope of instance model discovery, it is mapped to a service template. A state file contains a list of provisioned resources, each of which may be mapped to a node template, depending on its resource type. The same example as for AWS CloudFormation applies: an EC2 instance should be mapped to a node template, while its network interface does not represent a dedicated application component. Every resource may have several properties, which can be mapped to properties of the corresponding node template. Similar to AWS CloudFormation, Terraform resources may define dependencies amongst each other. These dependencies can be mapped to relationship templates of the TOSCA normative type DependsOn.
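The following Java sketch, using the Jackson library, indicates how such a state file could be parsed into node templates and DependsOn relationships; the assumed layout (a top-level resources array whose instances carry attributes and dependencies) is a simplification and may differ between Terraform state format versions, and a real plugin would additionally filter resources by type as described above.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

record NodeTemplate(String id, String resourceType, Map<String, String> properties) {}
record RelationshipTemplate(String sourceId, String targetId, String type) {}

class TerraformStateParser {

    void parse(File stateFile, List<NodeTemplate> nodes, List<RelationshipTemplate> relations) throws IOException {
        JsonNode state = new ObjectMapper().readTree(stateFile);
        for (JsonNode resource : state.path("resources")) {
            // A real plugin would filter by resource type here, e.g., keep EC2 instances
            // but skip resources that merely describe network details.
            String id = resource.path("type").asText() + "." + resource.path("name").asText();
            for (JsonNode instance : resource.path("instances")) {
                // Attributes become template properties, e.g., the public IP of an EC2 instance.
                Map<String, String> properties = new LinkedHashMap<>();
                instance.path("attributes").fields()
                        .forEachRemaining(entry -> properties.put(entry.getKey(), entry.getValue().asText()));
                nodes.add(new NodeTemplate(id, resource.path("type").asText(), properties));
                // Recorded dependencies are mapped to relationships of the normative DependsOn type.
                for (JsonNode dependency : instance.path("dependencies")) {
                    relations.add(new RelationshipTemplate(id, dependency.asText(), "DependsOn"));
                }
            }
        }
    }
}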
4.3.2 Representing Deployment Technology Information in the Instance Model
In addition to information about running component instances, the instance model must also contain information about the deployment technology used to deploy each component. This information can be used in later stages. For example, management workflows might require access to the API of a deployment technology. This section describes how this information can be included in a TOSCA instance model.
The goal is to create a TOSCA instance model that holds all instance information about the components of an application and, in addition, provides all necessary information about the deployment technologies used to deploy it. The first step is to define what information is necessary. First, the instance model should name the involved deployment technologies; in the running example these are AWS CloudFormation and Puppet. In addition, the instance model should provide information about which deployment technology was used to deploy a specific component. More importantly, it must provide information about which deployment technology still manages a specific component. For example, the Ubuntu VM runs the Puppet agent and thus is detected by Puppet. However, Puppet does not manage the VM, as it is controlled by AWS CloudFormation. Finally, generated management workflows might need to connect to the API of a deployment technology. Thus, the instance model must contain the information necessary for a successful connection.
There are several possibilities to represent the described information in an instance model, four of which are discussed in more detail:
1. Using nested service templates.
2. Adding information about deployment technologies to every node template.
3. Adding deployment technologies as additional node templates and introducing a Manages relationship type.
4. Adding information about the involved deployment technologies to the service template itself, e. g., in its tags.
The first option is based on the concept of nested service templates. The idea is to create a separate service template for each deployment technology, as described in Section 4.3.1. Each of these service templates would contain the information about the single deployment technology inside tags, as proposed by Harzenetter et al. [HBB+21]. For example, the Puppet retrieval plugin creates a service template and populates it with node templates for all discovered resources. It then adds a tag with the key SourceTechnology and the value Puppet to indicate that the components described in the service template were deployed using Puppet. Another tag with the key PuppetPrimaryIP then contains the IP address of the Puppet primary server. Later in the process, this information could be extracted to connect to the Puppet primary server under the specified IP address. The separate service templates can be merged by creating a wrapping root service template. The root service template references the deployment-technology-specific service templates as part of its topology template. The option of nested service templates provides the highest degree of isolation between the deployment technologies and the respective plugin executions. Considering the running example, the Puppet plugin and the AWS CloudFormation plugin could be executed completely independently of each other and create their isolated service templates. Both service templates could easily be merged by simply wrapping them in the root service template. However, this approach has several drawbacks. First, the root template does not convey any instance information. It is a technical necessity to hold a list of other service templates and does not add any meaning to the instance model. Moreover, separate service templates do not provide the possibility to add connections between their respective node templates. For example, it is not possible to indicate that the components managed by Puppet are HostedOn the VM deployed by AWS CloudFormation.
The second option utilizes the properties of the node templates. Every node template could specify inside its properties which deployment technology manages the node template and the information necessary to connect to its API. This is similar to the tag approach of the option with nested service templates. Instead of specifying the information once in the tags of the service template, the same information is replicated to the properties of each node template. The node templates inside this service template can be managed by different deployment technologies, and management workflows may extract the necessary information from the properties of the node template. However, these properties “pollute” the instance model, as they technically do not represent instance information about the represented application component itself. This issue gets even worse considering that the information is replicated over each node template, which clutters the instance model with redundant information.
The third option models each deployment technology as an additional node template. This requires a corresponding node type, e. g., a Puppet node type. The properties of this node template can hold all information about the deployment technology in a central place. To indicate which deployment technology manages an application component, the Manages relationship type is used. For each component that is managed by a deployment technology, a relationship template of type Manages is added. The source of this relationship template is the node template of the deployment technology, and the managed node template is the target.
Figure 4.7: Example for representing deployment technologies as dedicated node templates
Figure 4.7 shows an example service template for the running example application. For the sake of brevity, only the VM and the Tomcat application server are depicted. The service template contains dedicated node templates for each deployment technology used, i. e., AWS CloudFormation and Puppet. These node templates provide additional information about the deployment technology instance in their properties. For example, the IP address of the Puppet primary server is specified only once as a property of the Puppet node template. If a workflow targeting the VM needs access to the AWS CloudFormation API, it can search for incoming relationship templates of type Manages and backtrack them to the node template that represents AWS CloudFormation. It can then read the necessary properties and connect to the API. While this option avoids cluttering the model with redundant information, it still “pollutes” the instance model with information that is not part of the actual application. Although the deployment technologies are important components for deploying the application, they are not actually application components. Thus, they should not be depicted as such in the instance model.
The last option combines the usage of a single service template with the encoding of information in its tags. Figure 4.8 depicts an example service template for the running example application. Again, only the VM and the Tomcat application server are depicted for the sake of brevity. The instance components are modeled in the topology template without any information about the involved deployment technologies. Thus, the topology template is a pure representation of the instance information of the target application. The necessary information about the involved deployment technologies is encoded in the tags of the service template. Arbitrary methods may be used to encode information in one or multiple tags. However, this work uses a JSON structure with all necessary information and serializes it into a single tag with the key deploymentTechnologies. The JSON structure is essentially a list of deployment technology descriptors. Each descriptor has the mandatory property sourceTechnology, specifying the type of deployment technology, e. g., Puppet. The second mandatory property is managedNodeIds, which is a list of node template ids. Each application component that is represented by a node template in that list is assumed to be managed by the deployment technology represented by the descriptor.
Figure 4.8: Example for representing deployment technologies as deployment technology descriptors
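To make the structure concrete, the value of the deploymentTechnologies tag for the running example could look as follows; the two mandatory properties are as described above, while all further property names are illustrative assumptions.

[
  {
    "sourceTechnology": "Puppet",
    "managedNodeIds": ["Webshop", "App Server", "DB Server", "Database"],
    "primaryIp": "192.168.2.1"
  },
  {
    "sourceTechnology": "CloudFormation",
    "managedNodeIds": ["VM"],
    "region": "eu-central-1"
  }
]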
Moreover, the descriptor may contain arbitrary additional properties, e. g., the IP address of the Puppet primary server. A workflow that needs information about the deployment technology for a specific node template must decode the descriptors from the tags and iterate over all node id lists to find the id of the target node template. If the id is found in a descriptor, the workflow can extract the required information from the remaining properties of the descriptor. The decoding of deployment technology information increases the complexity of management workflows and imposes a performance penalty. Nonetheless, this option provides many benefits. For one, there is no redundant information, as all information regarding the deployment technologies is stored in a central location. Moreover, the JSON structure allows the usage of arbitrary property values instead of simple string-based key-value pairs. At last, the topology template itself is kept clear of “polluting” information that does not describe the application itself. Thus, this work uses the last option.
4.3.3 Merging Instance Information from multiple Deployment Technologies
The previous sections described how to discover the involved deployment technologies, how to map deployment-technology-specific information to a TOSCA model, and how to include information about the deployment technologies in the model. This section explains how the instance information that is retrieved from multiple deployment technologies can be merged into a single TOSCA instance model.
Figure 4.9: Example for merging service templates that shows different options for the merge result
The Instance Model Retriever has a dedicated plugin for each supported deployment technology. Taking the list of deployment technologies as of Section 4.2, one possibility is to start an independent plugin run for each entry of the list. This results in independent service templates for each plugin, all of which need to be merged into a single service template. However, the merging of these service templates is complex and cannot be performed in a deployment-technology-agnostic way.
Again, consider the running example application. Executing the AWS CloudFormation plugin and the Puppet plugin independently yields the two service templates depicted at the top of Figure 4.9. At the bottom, the figure shows two possibilities for merging the two templates. Merging the lists of deployment technology descriptors is a simple union, thus only the topology templates are depicted. The Puppet plugin discovers all components managed by the primary server, as indicated by the icon in the top left corner of each node template. In addition, the Puppet plugin discovers the VM, as the agent is installed on this machine. However, the VM is not marked as managed by Puppet, since Puppet has not been used to deploy it.
The AWS CloudFormation plugin discovers only the VM, since it is the only resource deployed by AWS CloudFormation. Both technologies provide information about the VM, e. g., its IP address. However, the discovered IP addresses might differ. For example, EC2 instances have two IP addresses: a public address and a private address. While the private address is registered with the OS of the VM, the public address is unknown to the OS. Thus, the Puppet plugin can retrieve the private address, while the public address can only be retrieved by AWS CloudFormation. The simplest merging approach is to create the union of the sets of node templates and relationship templates. The resulting merged service template for this option is shown as Option 1 at the bottom left side of the figure. Yet, this leads to a wrong instance model with two VMs, while the application consists of only one. Consequently, the merge operation must detect that the two VM node templates actually represent the same application component, as depicted in Option 2. Merging the two node templates requires merging their properties, but since both specify the property ip with different values, they cannot be trivially merged. As the ip property is defined to contain the IP address on which the VM can be reached, the public IP address is the best option to choose. However, only the AWS CloudFormation plugin can determine which of the addresses actually is the public one, since it is the only one that knows