Institute of Parallel and Distributed Systems
University of Stuttgart
Universitätsstraße 38
D–70569 Stuttgart

Masterarbeit

Enabling multi-tenant scalable IoT platforms

Muhammad Ismaiel

Course of Study: Computer Science
Examiner: Prof. Dr.-Ing. habil. Bernhard Mitschang
Supervisor: Dr. rer. nat. Pascal Hirmer
Commenced: Sep 20, 2019
Completed: Mar 20, 2020

Acknowledgment

First of all, I would like to thank my thesis advisor Dr. rer. nat. Pascal Hirmer of the IPVS at the University of Stuttgart for his helpful remarks and comments during this master's thesis. His supervision benefited me throughout this thesis' research and writing.

I would like to express my thorough gratitude to my father, my mother, my brother, and my sister for providing me with continuous help and support throughout my years of education. Without them, this achievement would not have been possible.

I extend my heartfelt gratitude to Waseem, Sohaib, Moiz, and Awais. I greatly appreciate their encouragement, inspiration, and affection. Finally, I would like to thank my wife for her love and unwavering support. Thank you very much for being my best friend.

Abstract

Internet of Things (IoT) platforms have multi-layer architectures that facilitate the provisioning, automation, and monitoring of connected devices. These platforms simplify development because they solve many of the problems and complexities of building an IoT application. IoT platforms are typically central components with many heterogeneous users, which makes them essential for IoT scenarios; they must not become a single point of failure. Furthermore, since many users access these platforms, scalability and multi-tenancy are crucial. Scalability keeps user applications stable and makes them easier to extend, while multi-tenancy allows multiple tenants to access user applications at the same time. In this thesis, we examine every layer of an IoT platform's software stack and explain different methods to make each layer scalable. Enabling multi-tenancy at the application layer and attaining scalability of the message brokers, the databases, the middleware as a whole, the network layer, and the device layer are the primary tasks for enabling multi-tenant scalable IoT platforms. To address these tasks, we review the research work already done in the area of scalability and multi-tenancy, discuss multiple solutions for each task, and propose the most suitable solution for each task at the corresponding layer of an IoT platform.

Contents

1 Introduction
2 Related Work
3 Enabling multi-tenant scalable IoT platforms
  3.1 Application Layer
    3.1.1 Multi-tenancy
    3.1.2 Multiple Instances Multi-tenancy
      3.1.2.1 Infrastructure Level Multi-tenancy
      3.1.2.2 Middleware Level Multi-tenancy
    3.1.3 Native Multi-tenancy
      3.1.3.1 Application Level Multi-tenancy
    3.1.4 Application Layer Conclusion
  3.2 Middleware Layer
    3.2.1 Data Replication and Scalability
    3.2.2 Scalability of Message Brokers
    3.2.3 Scalability of the Middleware as a whole
    3.2.4 Middleware Layer Conclusion
  3.3 Network Layer
    3.3.1 Scalability of Network Layer
  3.4 Device Layer
    3.4.1 Scalability of Device Layer
4 Implementation
  4.1 Scalability of the Middleware as a whole
  4.2 Scalability of the Message Brokers
5 Summary and Future Work
  5.1 Summary
  5.2 Future Work
Bibliography

List of Figures

2.1 Architecture of the auto-scaling approach in the Cloud [CMKS09]
2.2 Multi-tenancy architectural strategies [WLJ14]
2.3 Scalable message broker architecture [JPV+17]
3.1 MBP Layers
3.2 Multi-tenancy Vs Single-tenancy [KCD+17]
3.3 Multiple Instances and Native Multi-tenancy [GSH+07]
3.4 Data Storage Strategies [KCD+17]
3.5 Separate Databases [CCW06]
3.6 Shared Database, Separate Schema [KCD+17]
3.7 Shared Schema [CCW06]
3.8 Hybrid Model [ABG+18]
3.9 Telegraf Replication
3.10 Telegraf Replication with Kafka
3.11 Messaging Communication [Mag15]
3.12 Load Balancing Message Queuing Telemetry Transport (MQTT) messages based on ClientID [JPV+17]
3.13 MQTT Cluster [JPV+17]
3.14 Kafka Architecture [KNR+11]
3.15 The agent-environment interaction in reinforcement learning [LML12]
3.16 A simple queuing model with one server and multiple servers [LML14]
3.17 IoT Platform Architecture [GBF+18]
4.1 Scalability of Middleware
4.2 MQTT Clustering Connectivity

List of Tables

3.1 Functionalities of MBP layers
3.2 Requirements of MBP layers
3.3 Pros and Cons of Infrastructure Level Multi-tenancy
3.4 Pros and Cons of Middleware Level Multi-tenancy
3.5 Data Isolation Approaches Comparison
3.6 Summary of Static Threshold-Based Rules [LML12]
3.7 Summary of Reinforcement Learning [LML12]
3.8 Summary of Queuing Theory [LML12]
3.9 Summary of Control Theory [LML14]
3.10 Summary of Time Series Analysis [LML14]

List of Listings

3.1 Exemplary Telegraf Config File [Pat]
3.2 Sample producer code [KNR+11]
3.3 Sample consumer code [KNR+11]
4.1 Exemplary Prometheus Targets File
4.2 Exemplary Nginx Configuration File

List of Algorithms

3.1 Threshold Based Auto-Scaling Algorithm For Web Applications
3.2 Threshold Based Auto-Scaling Algorithm For Distributed Computing Tasks
3.3 Reinforcement Learning Algorithm (Q-Learning)

List of Abbreviations

COAP Constrained Application Protocol
DBMS Database Management System
IoT Internet of Things
MBP Multi-purpose Binding and Provisioning Platform
MQTT Message Queuing Telemetry Transport
OPC-UA Open Platform Communications Unified Architecture
OS Operating System
RDBMS Relational Database Management System
RL Reinforcement Learning
RP Retention Policy
SaaS Software as a Service
TSD Timeseries Data
TSDB Timeseries Database
VM Virtual Machine

1 Introduction

In the IoT, interconnected devices equipped with sensors and actuators communicate with each other through standardized network protocols to reach common goals [GBF+18]. Famous examples of IoT environments are Smart Homes [RMD+06], Smart Cities [SKP+11], or Smart Factories [LCW08]. Most IoT environments have a central component called an IoT platform, which is responsible for the management of the IoT devices, as well as their equipped sensors and actuators. These IoT platforms usually support different standards, such as MQTT, Constrained Application Protocol (COAP), or Open Platform Communications Unified Architecture (OPC-UA), for data exchange and configuration.

IoT platforms are typically central components with many heterogeneous users, i.e., the IoT devices, IoT applications, and human users that configure devices or view the data through dashboards. This makes these platforms essential for IoT scenarios; they must not become a single point of failure. Furthermore, since many users access these platforms, scalability and multi-tenancy are very important.

Multi-tenancy means using shared services and resources for multiple clients. These resources can include hardware resources (such as CPU and storage) and networking [KCD+17]. In a multi-tenant environment, each tenant's data is isolated and invisible to other tenants. Data isolation is key to achieving multi-tenancy at the application level. There are different data isolation strategies [ABG+18], but we suggest a shared database with a shared schema for the platform because the data coming from IoT devices is more or less similar.

Scalability is the ability of software to adjust itself dynamically according to the workload. Vertical and horizontal scaling are the two generic ways of achieving scalability [LML12]. In vertical scaling, the hardware resources of a machine are upgraded or downgraded according to need, while in horizontal scaling, machines are added or removed. There are several auto-scaling strategies that can be used to decide when to scale out and when to scale in. These strategies can be used for either reactive scaling or proactive scaling [LML14]. Auto-scaling actions are based on performance indicators.
CPU and memory utilization are the major performance factors, along with currently active sessions and response time. For the platform Multi-purpose Binding and Provisioning Platform (MBP) [HBS+16] [HWBM16a], we implement a monitoring system using a static threshold scaling strategy and use CPU utilization, memory utilization, and active sessions as scaling indicators for the scalability of the middleware [MSA14].

Timeseries Data (TSD) is data gathered over time, consisting of a series of observations (e.g., sensor data, data from smart grids). Timeseries Databases (TSDBs) are used to store and handle TSD [BKF17]. In terms of consistency, accessibility, schema-less layout, and immutability, Druid and InfluxDB provide mostly comparable characteristics. InfluxDB, however, supports more programming language interfaces than Druid. InfluxDB also has a Time Structured Merge Tree engine that ingests and compresses data at higher rates [RPHB18]. InfluxData offers a high-availability open-source replication solution, while many users use sharding in the open-source version as a workaround for the missing clustering feature.

Messaging is a way of exchanging messages between multiple endpoints; it is used for decoupling software components. In messaging middleware terms, the message sender is called the producer and the message recipient the consumer. A message broker sends messages from one application to another using point-to-point or publish/subscribe messaging models [Mag15]. The publish/subscribe model is best suited for a message broker because it provides a one-to-many relationship between publishers and subscribers. In the scope of this thesis, we implement clustering of message brokers for the management of a massive amount of IoT devices [JPV+17].

A gateway is used at the network layer, which provides an interface to connect the devices to other systems. It also provides a mechanism for translating between diverse protocols, payload formats, and communication technologies [GBF+18]. The scalability of this gateway improves communication latency between devices and middleware. The network may become the bottleneck because the network bandwidth constrains the data flow: if an IoT platform delivers or receives a higher volume of data than the existing network capacity allows, a network bottleneck will occur. The device is a hardware component that has its own storage and processor. Achieving gateway scalability also satisfies our goal for the scalability of devices.

This thesis is organized as follows. In Chapter 2, we address related work on the scalability of IoT platforms and its importance for this thesis. The main goals of this thesis are discussed and compared in Chapter 3; the best solution for each task at the corresponding layer of an IoT platform is also defined in that chapter. The implementation part of this thesis is discussed in Chapter 4, while we present our summary and future directions in Chapter 5.

2 Related Work

In this chapter, we address work related to the scalability of IoT platforms, e.g., regarding middleware components, message brokers, gateways, and databases. We also review methods for the replication of databases and the multi-tenancy functionality of applications.

Scalability describes how a system copes with an increasing workload by provisioning resources. Lorido-Botrán et al. [LML12] explain two approaches for achieving scalability, which are 1. vertical scaling and 2.
horizontal scaling. Vertical scaling is the process of adding more computing power (CPU, memory, disk) to the hosting machine, while horizontal scaling is achieved by introducing additional hosting machines (e.g., Virtual Machines (VMs)).

Qu et al. [QCB18] introduce a study of auto-scaling approaches for web applications. In their work, the actions of auto-scaling frameworks are based on high-level or low-level performance factors. CPU and memory utilization are low-level performance indicators monitored at the server layer, while high-level performance factors, for example, response time and active sessions, are monitored at the application layer.

Auto-scaling techniques presented by Lorido-Botrán et al. [LML12] derive scaling decisions based on algorithmic analysis of current workload parameters. For cloud-based applications, auto-scaling procedures are characterized into five classes: static threshold-based rules, queuing theory, time series analysis, control theory, and reinforcement learning are the primary techniques that have been proposed. A survey of these auto-scaling strategies has been introduced by Lorido-Botrán et al. [LML14].

Chieu et al. [CMKS09] present an architecture to scale web applications dynamically in a virtualized cloud computing environment. Their architecture includes several VMs, an Apache HTTP load balancer, and monitoring and provisioning sub-systems, as shown in Figure 2.1. This work considers application-specific scaling indicators, such as the number of current users and the number of currently active sessions for each instance of an application. The work uses a monitoring agent running in each application instance, which forwards the scaling indicators to the dynamic scaling algorithm. This algorithm arbitrates between the monitoring and provisioning sub-systems. Depending on the statistics and the moving average of the scaling indicators, the algorithm triggers a provisioning or de-provisioning event in the provisioning sub-system. However, this work does not consider the CPU, memory, and disk usage of the machine hosting the VMs.

Figure 2.1: Architecture of the auto-scaling approach in the Cloud [CMKS09]

Hung et al. [HHL12] propose a similar kind of cloud computing auto-scaling architecture. Their work contains a cluster monitoring system, a front-end load balancer, and an auto-provisioning system. The purpose of the front-end load balancer is to send and balance incoming requests to different application instances running in a virtual cluster. The cluster monitoring system monitors the resource utilization of each host machine in a cluster. The auto-provisioning system is responsible for the provisioning of VMs, depending on resource usage or currently active sessions. The authors also present an algorithm for auto-scaling, which is utilized by the auto-provisioning system.

Murthy et al. [MSA14] describe an architecture for threshold-based auto-scaling of virtual machines. The architecture includes a configuration file, monitoring and decision-maker components, several virtual machines, and a VM configuration component. The configuration file contains predefined static upper and lower threshold values. The monitoring component monitors the resource utilization of the virtual machines. The decision-maker receives data about resource utilization as input from the monitoring component and evaluates the performance metrics against the threshold values given in the configuration properties file. The VM configuration component performs provisioning or de-provisioning of VMs according to the input from the decision-maker.
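The decision logic of such static threshold-based approaches fits in a few lines. The following is a minimal, self-contained Python sketch in the spirit of the architecture of Murthy et al. [MSA14]; the threshold values, the VM class, and the provisioner are hypothetical stand-ins for illustration, not the authors' actual implementation.

    # Sketch of a threshold-based auto-scaling loop (hypothetical
    # thresholds and stub components, not the implementation of [MSA14]).
    import random
    import time

    UPPER_THRESHOLD = 80.0   # scale out above 80% average CPU utilization
    LOWER_THRESHOLD = 30.0   # scale in below 30% average CPU utilization

    class VM:
        """Stand-in for a monitored virtual machine."""
        def cpu_percent(self):
            return random.uniform(0, 100)  # dummy metric for the sketch

    class Provisioner:
        """Stand-in for the VM configuration component."""
        def __init__(self, vms):
            self.vms = vms
        def provision(self):
            self.vms.append(VM())
            print("scale out -> %d VMs" % len(self.vms))
        def deprovision(self):
            if len(self.vms) > 1:          # always keep at least one VM
                self.vms.pop()
                print("scale in  -> %d VMs" % len(self.vms))

    def decide(vms):
        """Decision-maker: compare the monitored metric to static thresholds."""
        avg = sum(vm.cpu_percent() for vm in vms) / len(vms)
        if avg > UPPER_THRESHOLD:
            return "out"
        if avg < LOWER_THRESHOLD:
            return "in"
        return "hold"

    pool = Provisioner([VM()])
    for _ in range(5):                     # a few monitoring iterations
        action = decide(pool.vms)
        if action == "out":
            pool.provision()
        elif action == "in":
            pool.deprovision()
        time.sleep(1)                      # evaluation interval

The same loop structure applies regardless of whether the monitored metric is CPU utilization, memory usage, or the number of active sessions; only the metric source and the threshold values change.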
In a multi-tenant architecture, a single instance of the software is deployed on a server running on the service provider's infrastructure and provides access to multiple tenants at the same time, as described by Karataş et al. [KCD+17]. Abdul et al. [ABG+18] describe several multi-tenant data layer models, such as dedicated, isolated, shared, and hybrid. Multi-tenancy can be achieved by sharing the infrastructure, the middleware, or the application among tenants. Figure 2.2 visualizes the multi-tenancy strategies described by Walraven et al. [WLJ14]. Application-level multi-tenancy with a shared-everything architecture can lead to cost reduction.

Chong et al. [CCW06] have outlined several viable database patterns, primarily for Microsoft SQL Server, which support application-level multi-tenancy. For a multi-tenant situation, they present three major strategies: separate databases, shared databases with separate schemas, and shared databases with shared schemas. Bezemer et al. [BZ10] define an architectural strategy that seeks to separate multi-tenant configuration from internal execution; the authors implement a three-tiered architecture (authentication, setup, database). Likewise, Bezemer et al. [BZP+10] record experiences from the transition of single-tenant industrial software systems to multi-tenant software. The authors also describe a reengineering pattern to assure multi-tenancy in software systems. This pattern requires three additional components: a multi-tenant database, tenant-specific authentication, and configuration. However, the configuration only serves the look-and-feel purposes of applications and workflows.

There are typically two styles of multi-tenancy: multiple instances and native multi-tenancy. The multiple instances style supports every tenant with its dedicated application over shared hardware, Operating Systems (OSs), or a middleware server in a hosting environment. The latter can assist all tenants through a single shared application instance over numerous hosting resources, as described by Guo et al. [GSH+07]. The authors also describe a mechanism for the development and management of multi-tenant applications. They believe that the primary challenge of multi-tenancy is tenant isolation; thus, their framework mainly includes tenant isolation components, such as data, performance, and security isolation.

TSD is data gathered over time, consisting of a series of observations (e.g., sensor data, data from smart grids). TSDBs are used to store and handle TSD, as mentioned by Bader et al. [BKF17]. The authors perform a comprehensive survey and comparison that identifies 83 TSDBs. They compare 12 popular open-source TSDBs using 27 criteria. The open-source TSDBs are divided into three groups: TSDBs built on an existing Database Management System (DBMS), TSDBs independent of any DBMS, and Relational Database Management Systems (RDBMS). They infer that no single TSDB suits every use case but highlight InfluxDB [Inf], MonetDB [Mona], and Druid [Dru] as the most feature-rich. InfluxDB, Druid, and SiriDB [Sir] are investigated by Ravichandran et al.
[RPHB18] as TSDB candidates for their research. SiriDB and Druid were excluded from the quantitative comparison among these three databases; SiriDB, for instance, is a new database that lacks some characteristics, such as support for programming languages (e.g., C, C++, and Java), community support, and documentation. In terms of consistency, accessibility, schema-less layout, and immutability, Druid and InfluxDB provide mostly comparable characteristics. InfluxDB, however, supports more programming language interfaces than Druid. InfluxDB also has a Time Structured Merge Tree engine that ingests and compresses data at higher rates.

Figure 2.2: Multi-tenancy architectural strategies [WLJ14]

In the context of IoT, Arnst et al. [AHPW19] conduct a comparative survey on the reading and writing performance of three common DBMS. The authors chose MongoDB [Monb], InfluxDB, and MariaDB [Mar] for their study. They rate InfluxDB higher than the others and endorse it as a suitable DBMS for handling highly recurrent IoT sensor data. Sanaboyina [San16] records OpenTSDB [Ope] and InfluxDB performance assessments based on data center energy usage. The study hypothesizes that a database application's energy consumption is directly proportional to its use of resources (RAM, disk, and CPU). It observes that InfluxDB is somewhat more energy efficient than OpenTSDB in both single-node and distributed database environments.

Mangoni [Mag15] provides an overview of messaging concepts, communication protocols, functionalities, and advanced distributed communication systems. The author compares and analyzes their weaknesses and strengths. This research includes the most famous messaging technologies: ActiveMQ [Apa], RabbitMQ [Rab], Apache Kafka [Kaf], and ZeroMQ [Zer]. Mangoni rates Apache Kafka as the best technology for data movement among processing systems. John et al. [JL17] and Dobbelaere et al. [DE17] also compare two leading messaging technologies, Kafka and RabbitMQ, in their respective studies. They explore the variety of their characteristics and architectures along with their efficiency under diverse testing workloads. Both studies conclude by ranking Kafka ahead of RabbitMQ in terms of throughput and rating RabbitMQ better when it comes to latency. Dobbelaere et al. outline that if the configuration consists of a single node, a single partition, a single channel, and no replication, then RabbitMQ also performs better than Kafka in terms of throughput.

Apache Kafka is the most advanced message broker. It has been created to address the need to transfer enormous amounts of data from producers to numerous consumers in an efficient and scalable manner. Kreps et al. [KNR+11] perform an experimental study which discloses that Kafka can publish twice as many and consume four times as many messages per second as RabbitMQ and ActiveMQ.

For the management of a massive amount of IoT devices using message brokers, Jutadhamakorn et al. [JPV+17] implement a scalable and low-cost clustering architecture.
Figure 2.3 visualizes this architecture, which is quite similar to the approach discussed by Sen et al. [SB18] for the scalability of message brokers. The architecture includes an MQTT broker cluster at the backend and a load balancer in front of the brokers. The load balancer is used for the distribution of messages within the cluster. Docker Swarm [Doc] is used to build the cluster, with Mosquitto [Lig+17] running on each broker in the cluster.

Figure 2.3: Scalable message broker architecture [JPV+17]

3 Enabling multi-tenant scalable IoT platforms

In this chapter, we discuss the tasks necessary to enable scalability in each architectural layer of the MBP [HWBM16b]. By doing so, we present different possible solutions for each task. Furthermore, we suggest the solution that is most suitable for the MBP. Figure 3.1 exhibits the different layers of the MBP.

Figure 3.1: MBP Layers

The application layer represents software that seeks sensor data and uses actuators for supervising physical actions. It interacts with the middleware [HWS+16] layer to attain insight into and/or exploit the real world. The middleware layer acts as an integral component that is responsible for receiving and processing data from the devices. It also serves the data to the connected applications. A gateway is used at the network layer, which provides an interface to connect the devices to other systems. It also provides a mechanism for translating between diverse protocols, payload formats, and communication technologies. The device is a hardware component that has its own storage and processor. It forms a connection with the middleware using driver software running on it. Devices are the physical environment's point of entry into the digital world. The sensors and actuators are the hardware components connected to devices. A sensor detects physical environment data, such as humidity, sound, or light, while the actuator manipulates the physical environment [GBF+18]. Table 3.1 illustrates the functionalities of each individual layer of the MBP.

Table 3.1: Functionalities of MBP layers
  1 Application: Software (Smart Grid App, Intelligent Transport Systems App, etc.)
  2 Middleware: Message Broker, Data Storage
  3 Network: Gateway / Communication
  4 Device: Hardware Component (connected to sensors, actuators, and middleware)

The major goal of this thesis is to create a generic concept to enable the scalability and multi-tenancy of an IoT platform. Each layer of the platform should be scalable in such a way that a single point of failure is avoided. Enabling applications to be multi-tenant is our goal for the application layer. Multi-tenancy enables an application to serve multiple users at the same time [BZP+10]. At the middleware layer, the scalability of the message broker and database should be achieved. Data replication and scalability of the middleware as a whole are also significant tasks for this layer. Moreover, the scalability of gateways is the main task for the network layer; this improves communication latency between devices and middleware. Regarding the device layer, we aim at supporting unlimited devices, which depends on gateway scalability. Achieving gateway scalability also satisfies our goal for the device layer. Table 3.2 lists the significant requirements which we need to fulfill for each layer. In the following, we look into all layers individually and discuss each requirement in detail.
We examine and compare different ways of fulfilling each goal of every layer. Furthermore, we propose which approaches are most suitable for the MBP platform.

Table 3.2: Requirements of MBP layers
  1 Application: R1.1 Multi-tenancy (enable the application to serve multiple tenants at the same time)
  2 Middleware: R2.1 Scalability of Message Broker; R2.2 Scalability of Database and Replication; R2.3 Scalability of Middleware as a whole
  3 Network: R3.1 Scalability of Gateway to avoid bottleneck issues
  4 Device: R4.1 Support of unlimited Devices

3.1 Application Layer

The application layer uses the middleware to manipulate the real world (cf. Figure 3.1) by demanding data from the sensors and by using actuators to control physical actions. For example, a software system that controls a building's temperature represents a middleware-connected application [GBF+18]. Our main goal at the application layer is to attain multi-tenancy-enabled software. In Section 3.1.1, we discuss different generic multi-tenancy architectures and choose the best-suited one for the MBP.

3.1.1 Multi-tenancy

Multi-tenancy is one of cloud computing's key features. It is essential to know the complete variety of tenancy alternatives for bringing cloud functionality into any application or platform. Choosing a tenancy architecture for the application or platform depends on the enterprise's needs. Put simply, multi-tenancy is a software architecture paradigm in which various clients concurrently use a single application. However, when an application has a single client, it is common for companies to use a single-tenant architecture [ABG+18]. Figure 3.2 visualizes the distinction between multi-tenant and single-tenant architectures. The multi-tenant architecture allows various clients to access the same application server and database at the same time, while single-tenant applications are structured to provide each client with a dedicated application server and database.

Figure 3.2: Multi-tenancy Vs Single-tenancy [KCD+17]

In a multi-tenant environment, a tenant comprises a group of customers of an organization. A tenant uses a set of Software as a Service (SaaS) functionalities to attain organizational objectives. In a cloud paradigm, each tenant is unaware of other tenants. The benefits of using a multi-tenant architecture are 1. simplicity of adding a new tenant, 2. more convenient maintenance of the same application, 3. maximized use of resources, and 4. handling of multiple tenants at the same time. The disadvantages of this architecture may include those related to 1. confidentiality of the data and 2. security of the data. However, one can create and define a multi-tenancy model at the data layer to address these issues. Dedicated Schema, Shared Schema, Isolated Schema, and Partially Isolated are the types of models which can be used at the data layer.

Multiple instances and native multi-tenancy are two architectural strategies to attain multi-tenancy [GSH+07]. Multi-tenancy can be realized at the application level, middleware level, and infrastructure level of the software stack, as listed below:

1. Multiple Instances Multi-tenancy
   • Infrastructure Level Multi-tenancy
   • Middleware Level Multi-tenancy
2. Native Multi-tenancy
   • Application Level Multi-tenancy
Middleware and infrastructure level multi-tenancy provide each tenant with its separate application instance; this is known as multiple instances multi-tenancy. Application-level multi-tenancy supports all tenants through a single shared application instance over a hosting infrastructure, called native multi-tenancy. The multiple instances technique is adopted to support dozens of tenants, while native multi-tenancy is used to support a significantly larger number of tenants, usually hundreds or even thousands. It is important to note that as the level of scalability rises, the amount of isolation among tenants declines, because more, or even all, levels of the software stack are shared among tenants. In such cases, tenant isolation is performed at the data level.

Figure 3.3: Multiple Instances and Native Multi-tenancy [GSH+07]

Recently, the software industry has progressed from a rigorous hardware split between tenants (shared-nothing architecture) to a growing degree of sharing between tenants (shared-everything architecture). The business requirements are key to determining which multi-tenancy model to use, taking each tenant's performance and efficiency into account. For instance, a big corporation may willingly pay a surcharge for dedicated instances to avoid potential resource-sharing hazards, while most small and medium-sized businesses would prefer services of reasonable quality at reduced expense. In the following, we briefly look into each of the multi-tenancy approaches mentioned above and decide which approach is suitable for the MBP according to its business needs. Section 3.1.2 explains the multiple instances multi-tenancy patterns, while Section 3.1.3 discusses the native multi-tenancy pattern in detail.

3.1.2 Multiple Instances Multi-tenancy

Multiple instances multi-tenancy supports each tenant with its dedicated application instance over shared hardware and/or a middleware server in a hosting environment. In this multi-tenancy architecture, some or no layers of the software stack are shared among tenants. Multiple instances multi-tenancy can be achieved by sharing infrastructure (virtualization) or middleware components (application server and/or database server). The multiple instances architecture utilizes the ability of virtualization: it hosts the application code on different instances to meet increases in resource demand. It is useful for multi-tenant applications when the number of tenants remains relatively small. Section 3.1.2.1 and Section 3.1.2.2 briefly describe the infrastructure and middleware level multi-tenancy patterns, respectively.
Table 3.3: Pros and Cons of Infrastructure Level Multi-tenancy
  Pros: higher flexibility; higher tenant isolation; secure; strong data isolation; easy to extend/restore a tenant database
  Cons: higher operational cost; higher maintenance cost; software licensing expense per tenant; possible for a limited number of clients only

3.1.2.1 Infrastructure Level Multi-tenancy

The infrastructure level multi-tenancy [WLJ14] technique is used to maximize scalability and resource usage by sharing infrastructure. Infrastructure includes physical/virtual hardware (e.g., servers, VMware). It provides a dedicated VM to each tenant, and all requests of a tenant go to its specific VM, as shown in Figure 3.3. In this architecture, the tenant applications and databases are isolated at the instance level. Each VM entirely hosts a different tenant application, and no layer of the software stack is shared among tenants except the hardware. This approach is a coarse-grained way of isolating tenants. It can be used when each client is viewed as an entity without any business logic linked to it, or when data privacy and regulations are of the greatest interest. Table 3.3 shows the advantages and disadvantages of this architecture.

The advantages of infrastructure level multi-tenancy are that 1. it gives the highest amount of tenant isolation compared to other multi-tenancy patterns, 2. it is very secure because nothing is shared among tenants except hardware, 3. it strongly isolates tenants' data, and 4. it can easily be used to extend or restore individual tenant database instances without interrupting other tenants. The disadvantages of this pattern are that 1. hosting distinct applications and middleware for each tenant is very expensive, 2. it is virtually impossible to establish for most SaaS providers due to the number of clients they intend to serve, and 3. additional costs, such as software licensing and development expenses per tenant, will be significantly higher due to secure data isolation and customization.

3.1.2.2 Middleware Level Multi-tenancy

The middleware lies between the OS and the application in a distributed computing system. It is software that offers services other than those offered by the OS to software applications; application and database servers are examples of such middleware software. For larger tenant sets, the VM-per-tenant strategy does not scale. The application provider further optimizes it by sharing the application server, and possibly also the database layers, of the software stack [WLJ14]. However, the application is still deployed on top of the application server in separate application spaces or application domains to realize tenant separation, as demonstrated in Figure 3.3. This multi-tenancy pattern has several advantages over the infrastructure level multi-tenancy architecture, as shown in Table 3.4.

Table 3.4: Pros and Cons of Middleware Level Multi-tenancy
  Pros: moderate operational cost; moderate maintenance cost; high data isolation if the database is not shared; support for a maximum number of clients; improved scalability
  Cons: low flexibility; low tenant isolation; low data isolation if the database is shared; separate application for each tenant; less secure

The advantages of the middleware level multi-tenancy pattern are 1. lower operational and maintenance costs than infrastructure level multi-tenancy, 2.
it can support the maximum number of tenants, 3. scalability can be achieved by scaling the middleware, and 4. strong data isolation if the database is not shared among tenants. The disadvantages of this pattern are that 1. it gives a lower amount of tenant isolation and flexibility compared to infrastructure level multi-tenancy, 2. separate application instances must be hosted for each tenant, 3. it is less secure compared to infrastructure level multi-tenancy because everything is shared among tenants except the application, and 4. data isolation is low in case of a database shared among tenants.

3.1.3 Native Multi-tenancy

Native multi-tenancy supports all tenants with a single shared application instance over various hosting resources [GSH+07]. It is used to support more tenants than the multiple instances multi-tenancy patterns discussed in Section 3.1.2. With this architecture, isolation among tenants decreases, but scalability increases. Application-level multi-tenancy is the only way of fulfilling native multi-tenancy, as depicted in Figure 3.3. Section 3.1.3.1 explains the application-level tenancy architecture in detail.

3.1.3.1 Application Level Multi-tenancy

In application-level multi-tenancy, a single application serves the needs of multiple tenants at the same time. The application code is shared among tenants, and a pool of deployed application instances can additionally be shared between tenants to further enhance scalability. Segregation of tenants is handled exclusively in the application code instead of through virtualization or separate application instances for each tenant. Different tenant groups with multiple end-users access the same running instance of the application but potentially get a completely different experience, because tenant isolation is done in the application code. Each tenant gets completely different data that is isolated by the platform or application code underneath it. This multi-tenancy pattern is a very data-centric model: data for all tenants is stored in a common database but served to the tenants very differently. There are two vital tasks (data isolation and tenant-specific authentication) which we need to accomplish to enable multi-tenancy at the application level. These are discussed in detail in the following.

Figure 3.4: Data Storage Strategies [KCD+17]

1. Data isolation: There is a high requirement for data isolation in a multi-tenant application. Since all tenants use the same instance of the database, it is necessary to ensure that tenants can only access their own data. A separate database, a shared database, and a hybrid model are the data storage patterns to realize multi-tenancy at the data layer [ABG+18]. The data storage strategies are visualized in Figure 3.4.

1. Separate Database: This is a dedicated model in which each tenant has its own physically separated database while sharing the application with other tenants. Each database instance entirely hosts the data of a specific tenant, and nothing at the data level is shared among tenants. Storing tenants' data in separate databases is the simplest approach to data isolation. It is easy to modify the data model of the application to satisfy the individual needs of tenants, and it is simple to restore a tenant's data from backups if a failure occurs. However, this approach tends to result in higher costs for equipment maintenance and tenant data backup. The price of hardware is also higher than in alternative approaches, because a database server can only support a limited number of tenants due to the limited number of databases that the server can support. Segregating tenants' data into separate databases is thus an expensive approach because of its high maintenance requirements and hardware costs. It is relevant for tenants that are happy to pay a hefty amount for the benefit of customizability and added security. Tenants in fields such as banking and health records management often have rigorous requirements for data isolation and may not even consider an application that does not store their data in separate database instances. This approach is shown in Figure 3.5, and a small routing sketch follows below.

Figure 3.5: Separate Databases [CCW06]
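To make the separate database pattern concrete, the following minimal sketch shows how an application might route each tenant to its own database instance. The tenant names, database files, and helper function are hypothetical and serve only to illustrate the pattern.

    # Sketch of tenant-to-database routing for the separate database pattern
    # (hypothetical tenant names and database files, not part of the MBP).
    import sqlite3

    # One physically separate database per tenant.
    TENANT_DATABASES = {
        "tenant_a": "tenant_a.db",
        "tenant_b": "tenant_b.db",
    }

    def connection_for(tenant_id):
        """Resolve a tenant to its dedicated database instance."""
        try:
            return sqlite3.connect(TENANT_DATABASES[tenant_id])
        except KeyError:
            raise PermissionError("unknown tenant: " + tenant_id)

    # Every query a tenant issues runs only against its own database,
    # so nothing at the data level is shared between tenants.
    with connection_for("tenant_a") as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS readings (ts TEXT, value REAL)")
        conn.execute("INSERT INTO readings VALUES ('2020-01-01T00:00:00', 21.5)")

Backing up or restoring a single tenant here simply means copying or replacing that tenant's database file, which is exactly the restore advantage described above.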
2. Shared Database: In this strategy, all tenants use a common database, but it is further divided into two models, Separate Schema and Shared Schema, which we discuss in detail in the following.

• Separate Schema: In this approach, the data of multiple tenants is kept in the same database, with each tenant having its own schema. A schema is a group of tables created specifically for each tenant. Each tenant schema is separated from the other tenant schemas, as shown in Figure 3.6. Data is moderately isolated at the schema level rather than at the instance level. For each new tenant, a set of tables is generated, organized into a schema, and granted the corresponding authorization. The application can then customize and design tables using the schema name and table name.

Figure 3.6: Shared Database, Separate Schema [KCD+17]

This approach is quite easy to implement, and tenants can extend the data model as conveniently as with the separate database approach. It is very convenient when the schemas of the tenants differ and the schema design frequently needs to be modified. It lessens the costs of the separate database approach and avoids the security drawback of the shared schema model. It is also relevant for applications that use a comparatively small number of database tables (100 tables per tenant or less). It can typically accommodate more tenants per server than the separate database strategy. The application cost is low as long as tenants agree with their data being stored in a shared database instance with other tenants. The disadvantages are that a large number of tables must be maintained and that it is hard to restore tenants' data in the event of a failure: if each tenant had its own database, restoring a single tenant's data would simply mean recovering that database from the most up-to-date backup, whereas restoring the whole shared database would overwrite the data of every tenant in it.

• Shared Schema: In this approach, both the database and the schema are shared for hosting every tenant's data. This model yields the most productive use of hardware with the lowest maintenance and usage costs. It is useful when the tenants are willing to share the application components, specifically the functionality at the data layer. Data isolation is achieved at the row level within tables, as shown in Figure 3.7. A group of tables is designed to hold all the data of all participating tenants. A security mechanism of the database must also be designed to prevent tenants from accessing other tenants' data. Tenant identifiers are used to persist and access the tenant-related information within such tables, assigning each row to a specific tenant; a minimal sketch of this idea follows below.

Figure 3.7: Shared Schema [CCW06]
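The row-level isolation just described boils down to a tenant identifier column that every query must filter on. The following is a minimal, self-contained sketch of this idea using SQLite; the table layout and tenant names are hypothetical.

    # Sketch of row-level tenant isolation in a shared database, shared schema
    # (hypothetical table and tenant names, not part of the MBP).
    import sqlite3

    conn = sqlite3.connect(":memory:")
    # One shared table for all tenants; tenant_id assigns each row to a tenant.
    conn.execute("""CREATE TABLE sensor_data (
                        tenant_id TEXT NOT NULL,
                        sensor    TEXT NOT NULL,
                        ts        TEXT NOT NULL,
                        value     REAL)""")
    rows = [("tenant_a", "temp-1", "2020-01-01T00:00:00", 21.5),
            ("tenant_b", "temp-9", "2020-01-01T00:00:00", 19.0)]
    conn.executemany("INSERT INTO sensor_data VALUES (?, ?, ?, ?)", rows)

    def query_for_tenant(tenant_id, sensor):
        """Every data access is filtered by the tenant identifier, so a
        tenant can never see rows that belong to another tenant."""
        return conn.execute(
            "SELECT ts, value FROM sensor_data "
            "WHERE tenant_id = ? AND sensor = ?",
            (tenant_id, sensor)).fetchall()

    print(query_for_tenant("tenant_a", "temp-1"))  # only tenant_a's rows

The crucial design rule is that the application code, not the tenant, supplies the tenant identifier, so the filter cannot be bypassed by a crafted query.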
The shared schema approach is usually applied in applications where the tenants' data is closely related. Because of the shared schema, data extensibility is a major challenge compared to the separate schema pattern. This approach uses multi-tenant metadata and indexes to provide data extensibility, which involves a complicated schema and the introduction of dummy columns that break the normalization rules of relational databases. Nevertheless, in terms of hardware, software licenses, maintenance, and backup, it requires the lowest costs, because one database instance serves multiple tenants. This model is ideal when the system is designed to support a significant number of tenants who are willing to give up strong isolation for a cheaper cost of use. In case of a failure, restoring tenants' data is done by recovering data from up-to-date backups, as in the separate schema approach.

3. Hybrid Model: This model is a blend of the separate schema and shared schema approaches. The hybrid model shares application/platform segments (e.g., user interface, database) among tenants that have identical features and functionality, and it isolates the sections with specific or unrelated features. Common data, such as tenants' identification information, is stored in a single table or shared schema. Tenant-specific data is segregated at the schema level by defining a separate schema for every tenant, as illustrated in Figure 3.8.

Figure 3.8: Hybrid Model [ABG+18]

2. Tenant-specific authentication: In a native multi-tenant system, all tenants share the same application instances, infrastructure, and database instances. Therefore, isolation should be carefully considered in the architectural design. To be able to offer the environment flexibility and to ensure tenants can only access their own data, tenants must be authenticated. Each tenant is assigned a unique tenant id and a sub-tree in the organization directory, isolated by tenant id, during on-boarding. The tenant administrator can easily map user accounts of its private domain into the organization sub-tree. To authenticate a request, the standard middleware security mechanism (e.g., the Java Authentication & Authorization Service) can be used. According to the information stored in the corresponding organization sub-tree, a credential is generated. Since the credential contains the tenant data, a user of one tenant can be restricted from using this credential to access the private domain of other tenants without authentication.
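As a rough illustration of this idea, the sketch below derives a signed credential that embeds the tenant id and rejects any access outside that tenant's domain. It is a simplified stand-in for a middleware security mechanism such as the Java Authentication & Authorization Service; the secret and user names are hypothetical.

    # Simplified sketch of tenant-specific authentication: the credential
    # carries the tenant id, so cross-tenant access can be rejected.
    # (Hypothetical secret; a real system would use a middleware
    # mechanism such as JAAS instead of this hand-rolled check.)
    import hashlib
    import hmac

    SECRET = b"server-side-secret"  # known only to the platform

    def issue_credential(user, tenant_id):
        payload = "%s:%s" % (user, tenant_id)
        sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
        return "%s:%s" % (payload, sig)

    def authorize(credential, requested_tenant):
        user, tenant_id, sig = credential.rsplit(":", 2)
        expected = hmac.new(SECRET, ("%s:%s" % (user, tenant_id)).encode(),
                            hashlib.sha256).hexdigest()
        # A valid signature AND a matching tenant id are both required.
        return hmac.compare_digest(sig, expected) and tenant_id == requested_tenant

    cred = issue_credential("alice", "tenant_a")
    print(authorize(cred, "tenant_a"))  # True: access to the own domain
    print(authorize(cred, "tenant_b"))  # False: another tenant's domain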
3.1.4 Application Layer Conclusion

Multiple instances multi-tenancy and native multi-tenancy are the two main architectural strategies for achieving multi-tenancy; we discussed both in Section 3.1.1. Multiple instances multi-tenancy supports each tenant with its dedicated application instance over shared hardware and/or a middleware server in a hosting environment. It can be applied at the infrastructure level or the middleware level of the software stack. The infrastructure level multi-tenancy technique maximizes scalability and resource usage by sharing infrastructure, while in the middleware level multi-tenancy strategy the application server, and possibly also the database layers of the software stack, are shared. Native multi-tenancy is attained at the application level, where a single shared application instance over various hosting resources supports all tenants. Data isolation and authentication are the two significant tasks that need to be performed to attain application-level multi-tenancy. A brief comparison of the different data isolation approaches is shown in Table 3.5.

Table 3.5: Data Isolation Approaches Comparison
  Separate Database: data isolation at the database instance level; extensibility via custom columns; high cost
  Separate Schema: data isolation at the schema level; extensibility via custom columns; moderate cost
  Shared Schema: data isolation at the row level; extensibility via pre-allocated columns; low cost
  Hybrid Model: data isolation at the row and schema level; extensibility via custom and pre-allocated columns; moderate cost

For the MBP, native multi-tenancy (cf. Section 3.1.3) is a suitable approach for enabling multi-tenancy. It is also known as application-level multi-tenancy, in which a single instance of the application serves all of the tenants. It is a shared-everything architecture, which means every component of the software stack is shared among tenants. Tenant authentication is done in the software code, and data isolation is achieved at the data layer. Regarding tenant data isolation, a shared schema approach with a shared database is convenient for the MBP because the data is more or less similar. The shared schema approach is also less costly than the alternative options. To accommodate a vast number of tenants, we use a scalability mechanism for the middleware layer, which we discuss in detail in Section 3.2.

3.2 Middleware Layer

The middleware lies between the OS and the application in a distributed computing system. It is software that offers services other than those offered by the OS to software applications. Application servers, database servers, and message brokers are components of the middleware. IoT middleware is software that acts as an intermediary between IoT elements, enabling interaction among elements that could otherwise not communicate. Middleware often integrates complicated, already existing programs that were not originally designed to be connected. The nature of the IoT makes it possible to connect anything and exchange data over a network; the middleware is the part of the architecture that offers the communication layer. In this section, we discuss the scalability and replication of databases, the scalability of message brokers, and different possible ways to scale the middleware as a whole. We also discuss multiple auto-scaling techniques that can be used to decide when to scale in or scale out.

3.2.1 Data Replication and Scalability

TSD is data gathered over time, consisting of a series of observations (e.g., sensor data, data from smart grids). TSDBs are used to store and handle TSD. Bader et al. [BKF17] perform a comprehensive survey and comparison that identifies 83 TSDBs. They compare 12 popular open-source TSDBs using 27 criteria. The open-source TSDBs are divided into three groups: TSDBs built on an existing DBMS, TSDBs independent of any DBMS, and RDBMS. They infer that no single TSDB suits every use case but highlight InfluxDB [Inf], MonetDB [Mona], and Druid [Dru] as the most feature-rich. In terms of consistency, accessibility, and immutability, Druid and InfluxDB provide mostly comparable characteristics.
InfluxDB, however, supports more programming language interfaces than Druid. InfluxDB also has a Time Structured Merge Tree engine that ingests and compresses data at higher rates.

In the context of IoT, Arnst et al. [AHPW19] conduct a comparative survey on the reading and writing performance of three common DBMS. The authors chose MongoDB [Monb], InfluxDB, and MariaDB [Mar] for their study. MongoDB is a document-oriented database that allows flexible schemas; data is stored internally in binary JSON documents, which in turn are organized in collections. InfluxDB, as a time-series database, has a stricter schema architecture. Each data series is composed of points, and each point includes a timestamp, the measurement name, optional tags, and one or more key-value fields. The timestamp has nanosecond precision and is indexed. The measurement name should describe the stored data, and the optional tags are indexed and used to group data. InfluxQL, a SQL-like query language, is used to retrieve data. The authors rate InfluxDB higher than the others and endorse it as a suitable DBMS for handling highly recurrent IoT sensor data. Sanaboyina [San16] records OpenTSDB [Ope] and InfluxDB performance assessments based on data center energy usage. The study hypothesizes that a database application's energy consumption is directly proportional to its use of resources (RAM, disk, and CPU). It observes that InfluxDB is somewhat more energy efficient than OpenTSDB in both single-node and distributed database environments.

The CAP theorem implies that a distributed data system cannot guarantee all three of the following characteristics at the same time:

• Consistency
• Availability
• Partition tolerance

Consistency ensures that everybody reads the latest version of the data from the database whenever an update activity is complete. A system where users cannot perceive new data immediately is not strongly consistent and is referred to as eventually consistent. Availability is usually accomplished by running the database as a node cluster, replicating or partitioning data across several nodes, so that if one node fails, the other nodes can still operate and the system can continue serving requests. Partition tolerance means that the system continues to operate even though part of it is unavailable (e.g., due to maintenance or lost network connectivity). This can be achieved by redirecting writes and reads to the nodes that are still available. For a single-node system, this property is meaningless; it only applies to clusters.

For databases used for banking or accounting data, consistency is essential. However, some systems may favor partition tolerance and availability over consistency. For example, social networks, wikis, forums, and other large-scale websites with high traffic and low latency requirements emphasize availability and partition tolerance. It is difficult to achieve the ACID properties for these systems, and therefore the BASE approach is more likely to be implemented, that is, 1. Basic Availability, 2. Soft-state, and 3. Eventual consistency. Basic availability means that, in terms of the CAP theorem, the system guarantees availability. Soft-state implies that the state of the system can change over time, even while there is no input. Eventual consistency means that the system will become consistent over time while no further input is received.
A system using the BASE method does not need to be available and consistent at all times but is more tolerant of faults. Distributed database architectures are usually defined as follows [SWED16]:

• Single Server: The simplest option, without any distribution, is represented by a single-server architecture. In this architecture, a single node manages all requests for reading and writing.

• Master-Slave: In a master-slave distribution, the data is spread across many nodes. One node serves as the master node that executes all write requests and synchronizes the slave nodes, which perform read requests only. The master-slave distribution can be applied to scale databases with read-heavy workloads. This architecture allows the replication of data from the master database server to one or more slave database servers: the master reports changes, which then propagate to the slaves, and each slave issues a message stating that it has successfully received the update, allowing future updates to be sent.

• Multi-Master: The multi-master or peer-to-peer distribution is not based on different types of nodes, i.e., all nodes are similar. Replication and sharding of the data are implemented in a multi-master distribution to spread write and read requests across all cluster nodes. Unlike master-slave, it is a decentralized architecture.

InfluxDB is an open-source time-series database developed by InfluxData, with optional closed-source components. It is written in the Go programming language and is designed for handling time-series data. The open-source version, namely the TICK stack, provides a time-series database platform with a variety of services, including the InfluxDB core, and can run on a single node in the cloud or on-premises. The closed-source versions, such as InfluxEnterprise and InfluxCloud, offer additional features, such as scalability, high availability, and backup and restore, and run on-site or in the cloud. Kapacitor is used to stream data in real time, and Chronograf is used to monitor data; the InfluxData ecosystem includes both Kapacitor and Chronograf. InfluxDB is typically chosen by clients for IoT monitoring, DevOps monitoring (application monitoring, infrastructure monitoring, cloud monitoring), and real-time analytics.

Figure 3.9: Telegraf Replication

When InfluxData revealed that the open-source version would not include clustering, it faced a global uproar. Nonetheless, InfluxData offers a high-availability open-source replication solution, while many users use sharding in the open-source version as a workaround for the missing clustering feature. This can, for instance, make InfluxDB look unattractive for low-budget start-ups.

Sharding is the horizontal partitioning of data in a database; every partition is referred to as a shard. InfluxDB stores data in shard groups, which are organized by retention policy and store data with timestamps inside a specific time interval. A retention policy specifies how long InfluxDB keeps the data and how many copies are stored in the cluster. The length of the shard group's time interval depends on the Retention Policy (RP) duration: the standard shard group intervals are 1 hour for an RP shorter than 2 days, 1 day for an RP between 2 days and 6 months, and 7 days for an RP longer than 6 months. The shard group duration is essential for efficient drop operations, because data is dropped per shard, not per data point. For example, if an RP has a duration of 10 hours, splitting the data into 5-hour intervals does not make sense. Conversely, short shard group durations can hurt compression and speed for a large RP.
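As an illustration, retention policies and their shard group durations can be set via InfluxQL. The following is a small sketch using the influxdb Python client; the database name, policy name, and durations are hypothetical values chosen to match the standard intervals above.

    # Sketch: creating a retention policy with an explicit shard group
    # duration via InfluxQL (hypothetical database and policy names;
    # requires the influxdb Python package).
    from influxdb import InfluxDBClient

    client = InfluxDBClient(host="localhost", port=8086, database="iot_data")
    client.create_database("iot_data")

    # Keep sensor data for 2 days; for an RP below 2 days InfluxDB would
    # default to 1-hour shard groups, here the shard duration is explicit.
    client.query(
        'CREATE RETENTION POLICY "two_days" ON "iot_data" '
        'DURATION 2d REPLICATION 1 SHARD DURATION 1h DEFAULT'
    )

    # Write one point; it lands in the shard group covering its timestamp.
    client.write_points([{
        "measurement": "temperature",
        "tags": {"sensor": "temp-1"},
        "fields": {"value": 21.5},
    }])

When the two-day retention period of a point expires, InfluxDB drops the whole 1-hour shard group containing it, which is why the shard duration should be chosen in proportion to the RP duration.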
The shard group duration is essential for efficient drop operations, because data is dropped per shard, not per data point. For example, if an RP has a duration of 10 hours, splitting the data into 5-hour shard groups does not make sense. Conversely, short shard group durations for a large RP can hurt compression and speed. There are usually two patterns for replicating data into a multi-datacenter InfluxDB setup. The first is to transfer the data to the second data center cluster while it is being ingested into InfluxDB. The second approach is to replicate the data on the backend from one cluster to the other.
Listing 3.1 Exemplary Telegraf Config File [Pat]
##Cluster1
[[outputs.influxdb]]
  urls = ["URL of the cluster load balancer"]
  database = "Name of the DB you want to write to"
  retention_policy = "myRetentionPolicy"
  write_consistency = "any"
  timeout = "5s"
  content_encoding = "gzip"

##Cluster2
[[outputs.influxdb]]
  urls = ["URL of the cluster load balancer"]
  database = "Name of the DB you want to write to"
  retention_policy = "myRetentionPolicy"
  write_consistency = "any"
  timeout = "5s"
  content_encoding = "gzip"
Ingest replication is a method that copies data to both InfluxDB clusters at the moment of ingestion. It is perhaps the easiest replication setup and the most beneficial for all new data coming into the clusters from external sources. Most ingestion patterns are based on Telegraf in some way: Telegraf can be used as a data aggregator or agent, or to assist in setting up ingestion pipelines. Replication on ingestion can be done with Telegraf alone or with Telegraf in combination with Kafka. The first pattern is to use Telegraf alone to replicate all data to both clusters on ingestion, as shown in Figure 3.9. To set this up, the URLs of both clusters are defined in the Telegraf config file. The configuration depicted in Listing 3.1 states that data is to be written to two different InfluxDB clusters. Note that the URL specified in the config file should be either the URL of a load balancer in front of the cluster or a list of the URLs of each data node in the cluster. This is the simplest method to set up. In the case of a network partition between Telegraf and the clusters, Telegraf will retry the failed writes. It also buffers pending writes in memory; the size of this buffer can be customized in Telegraf. Telegraf replication with Kafka, shown in Figure 3.10, places Kafka in front of the Telegraf instances. Kafka offers a durable write queue for all data as it reaches the cluster and thereby also brings more versatility to what can be done with the incoming data. For example, all data may additionally be sent to long-term storage or to other analytical platforms for further analysis.
Figure 3.10: Telegraf Replication with Kafka
3.2.2 Scalability of Message Brokers
Messaging is a loosely coupled interaction style that greatly reduces the dependencies between producers and consumers. Eliminating these dependencies makes the architecture as a whole more versatile and tolerant of change, but it comes with increased complexity.
Therefore, over the years, dedicated messaging middleware has been developed to provide communication features without exposing the inner complexities. A messaging system, as depicted in Figure 3.11, serves as a layer of indirection between communicating entities.
Figure 3.11: Messaging Communication [Mag15]
Commonly referred to as a message broker, it transfers data from one application to the other as messages, so that producers and consumers can concentrate on what to communicate instead of how to communicate. Message brokers are the most popular implementation of messaging systems. A message broker is an intermediate standalone entity offering messaging features through custom or standard protocols. There are many message brokers with various features, protocols, implementation languages, and platform support. The many suggested approaches can usually be split into broker-based solutions (such as Kafka [Kaf], ActiveMQ [Apa], and RabbitMQ [Rab]) and broker-less solutions (such as the Java Messaging Service or ZeroMQ [Zer]). Broker-less systems primarily provide an API that abstracts away the network communication details, while broker-based systems use a central system as a communication channel between peers. We analyze and discuss only broker-based solutions. A flexible, low-cost message broker is needed to manage the huge number of messages exchanged in real time between massive numbers of IoT devices. MQTT is a lightweight topic-based publish/subscribe messaging protocol designed for machine-to-machine communication. It is standardized by the Organization for the Advancement of Structured Information Standards [MQT]. The benefit of the publish/subscribe design is that the publisher and the subscriber do not need to know each other. The broker acts as a facilitator between publishers and subscribers for the exchange of messages. A connection is formed by the client sending a connect request message to the broker; the broker acknowledges the client's request by responding with a connection-acknowledged message. Once the connection to the broker is established, the client can produce or consume messages. A published message must contain a topic name, a string that the broker uses to route the message to the topic's subscribers. A client subscribes by sending a subscribe packet containing the desired topic to the broker. There are three general strategies to enhance the scalability of a message broker: 1. improving a single machine's performance, 2. clustering of message brokers, or 3. Apache Kafka. Upgrading the resources of a single machine is not the right way to achieve scalability. We discuss the clustering of brokers and Apache Kafka in detail below. Clustering of Message Brokers: Figure 2.3 illustrates the system overview, consisting of the message broker cluster and the MQTT load balancer. It comprises the load balancer at the frontend and the MQTT broker cluster at the backend. To perform the MQTT load balancing, NGINX [Ngi] runs as a frontend server. This is the core part of the client-broker architecture: it is the interface interacting with the MQTT clients. The messages are then forwarded to a broker node in the MQTT cluster backend.
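The client side of such a setup can be sketched with the Eclipse paho-mqtt Python package, together with a small function that mimics the ClientID-based hashing performed by the load balancer (described below); the load balancer address, ClientID, topic, and choice of hash function are illustrative assumptions.

import hashlib
import paho.mqtt.client as mqtt

# Minimal sketch, assuming the paho-mqtt package; all names are
# illustrative placeholders.
def broker_for(client_id: str, num_brokers: int) -> int:
    # Mimic the load balancer: hash the ClientID from the CONNECT
    # packet and map it to one of the backend broker nodes.
    digest = hashlib.md5(client_id.encode()).hexdigest()
    return int(digest, 16) % num_brokers

def on_message(client, userdata, msg):
    print(msg.topic, msg.payload.decode())

client = mqtt.Client(client_id="sensor-42")   # ClientID used for hashing
client.on_message = on_message
client.connect("mqtt-lb.example.org", 1883)   # single entry point (NGINX)
client.subscribe("sensors/temperature")
client.publish("sensors/temperature", "21.3")
client.loop_forever()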
For the MQTT cluster at the backend, Docker Swarm [Doc] can be used to form a cluster of Raspberry Pi nodes, with Mosquitto running on each node in container form. Figure 3.13 illustrates the clustering of message brokers. The load balancer is used in combination with the MQTT broker cluster to provide a single entry point for all MQTT clients. The message producer sends messages to a broker, and by subscribing to the broker, the subscriber receives the messages. With a large number of devices and messages, more brokers are required to manage the brokers' message congestion and network traffic. A load balancer is therefore required to spread the incoming network traffic across several brokers. It also offers high availability and reliability by sending messages only to brokers that are online. When communicating with such a system, an MQTT client sends an MQTT connect packet to establish a connection with the broker. This packet contains the MQTT client's header and its ClientID as an identifier. The ClientID can thus be used for hashing to perform the load balancing, as shown in Figure 3.12.
Figure 3.12: Load Balancing MQTT messages based on ClientID [JPV+17]
Through this process, different ClientIDs are hashed and thereby bound to different brokers for messaging. If one of the brokers is shut off, the load balancer automatically connects the affected clients to another broker, which keeps the broker cluster running and available. The load balancer also ensures the consistency of the connection, as the connect packet from a client must be forwarded to a message broker first. If no acknowledgment message is received after sending the connect packet, the load balancer knows that the broker is unavailable and forwards the MQTT request to another available broker.
Figure 3.13: MQTT Cluster [JPV+17]
Docker Swarm, which offers a framework for cluster orchestration and management, can be used for the backend message broker clustering. Docker Swarm is easy to install and configure: a new node is added to the cluster simply by using a token received from the manager node, which makes the cluster scalable. For the implementation of the MQTT cluster, Jutadhamakorn et al. [JPV+17] first install Docker on each node; the cluster is then initialized by running the docker swarm init command, and the nodes join it one by one. Next, they launch the service of an MQTT broker, such as Mosquitto, an open-source message broker. The MQTT broker runs under Docker and shares the cluster's internal storage, so a broker can join and leave the cluster easily. A new node is connected to the Docker Swarm cluster by means of the token from the manager node: the docker swarm join-token manager command is run to obtain the token, which is then copied to the desired node. The new node then joins the cluster. Apache Kafka: Apache Kafka is a distributed streaming platform built on broker-based message-oriented middleware. The goal of Kafka is to provide a medium for high-performance processing of incoming data sequences, and its major design objectives are scalability, reliable data transfer, fault tolerance, and fast processing.
From a general point of view, a distributed system based on Kafka consists of three main components: 1. message producer, 2. message consumer, and 3. Kafka messaging broker. The message producer generates data streams that one or more consumers read and/or process. The Kafka messaging broker is primarily responsible for storing all messages until the consumers receive them. A Kafka message consists of a timestamp, a key/value pair for communication encoding, and an address given by the topic the message belongs to. A topic essentially names a group of messages of a certain kind. A producer publishes a message of that kind using the topic name; consumers, in turn, subscribe to such messages using the topic name as well. There may be multiple consumers per topic. The messaging protocol used between Kafka clients and brokers is considerably more efficient than that of RabbitMQ and ActiveMQ, which use AMQP. This becomes particularly apparent in performance analyses that measure latency and CPU workload when processing bundles of messages. Figure 3.14 demonstrates Kafka's overall structure.
Figure 3.14: Kafka Architecture [KNR+11]
Because Kafka is distributed in nature, there are usually multiple brokers in a Kafka cluster. To manage load, a topic is split into several partitions, and each broker stores one or more of those partitions. Kafka uses these so-called partitions to provide high performance while messages are published or consumed. Within a single topic, partitions can be interpreted as message sub-queues in which the published messages are processed in a fixed first-in-first-out order. Usually, each partition is managed by a Kafka cluster instance running on a dedicated host. The write and read performance of a topic is thus substantially increased by allowing multiple consumers and/or producers to write and read in parallel. Each published message is assigned to exactly one partition, either explicitly specified by the message producer or chosen by Kafka in a round-robin manner. Essentially, a consumer polls a topic for message consumption and retrieves messages from any of the partitions. Consumers can also be grouped into consumer groups to provide a means of load balancing and parallel processing on the recipient side, leading to a direct mapping between the consumers in a group and certain partitions. Every consumer in a group gets messages from a specific partition. Parallel reads and writes reduce the increased latency created by general message brokering. Concerning fault tolerance in the Kafka communication system, each partition may have replicas on other Kafka instances, which provide a means of redundancy. One of the replicas is called the master; messages are first stored there and then distributed to the other instances.
Listing 3.2 Sample producer code [KNR+11]
producer = new Producer();
message = new Message(msg.getBytes());
set = new MessageSet(message);
producer.send(topic1, set);
Listing 3.3 Sample consumer code [KNR+11]
streams[] = Consumer.createMessageStreams(topic1, 1)
for (message : streams[0]) {
    bytes = message.payload();
    // do something with the bytes
}
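As a present-day counterpart to the simplified samples in Listings 3.2 and 3.3, the following sketch uses the kafka-python client and shows a keyed send (which pins messages with the same key to the same partition) and a consumer group; the broker address, topic, and group id are illustrative assumptions, not part of the cited setup.

from kafka import KafkaConsumer, KafkaProducer

# Minimal sketch, assuming the kafka-python package and a broker at
# localhost:9092; topic and group id are illustrative placeholders.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
# Messages with the same key land in the same partition, preserving
# their order; without a key, partitions are chosen round-robin.
producer.send("topic1", key=b"sensor-42", value=b"21.3")
producer.flush()

# Consumers sharing a group_id split the topic's partitions among
# themselves, providing load balancing on the recipient side.
consumer = KafkaConsumer(
    "topic1",
    bootstrap_servers="localhost:9092",
    group_id="analytics",
    auto_offset_reset="earliest",  # start from the oldest retained offset
)
for message in consumer:
    print(message.partition, message.offset, message.value)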
Messages can only be pulled by subscribed consumers after they have been persisted to disk at each replica. This message persistence also provides fault tolerance in the event of a Kafka instance failure. Kafka offers secure transport layer connections between broker systems and producers/consumers. It also provides access control lists defining the read/write access rights a consumer or producer has for each topic. Listing 3.2 and Listing 3.3 illustrate the message producer and the message consumer code, respectively. In Kafka, the information about how much each consumer has consumed is not maintained by the broker but by the consumer itself. Such a model reduces the broker's complexity and overhead. However, it makes it difficult to delete a message, because the broker does not know whether all subscribers have processed it. Kafka solves this problem with a straightforward time-based service-level agreement for the retention period: if a message has been retained in the broker for longer than a certain timespan, commonly 7 days, it is deleted automatically. In practice, this solution works well. Most consumers, including offline ones, end up consuming messages either hourly, weekly, or in real time. The fact that Kafka's performance does not degrade with larger data sizes makes this long persistence possible. A consumer may purposely return to an older offset and re-consume data. This violates the standard contract of a queue, but it turns out to be an important feature for many consumers. Kafka ensures that messages from a single partition are delivered to a consumer in order. However, there is no guarantee on the ordering of messages coming from different partitions. Kafka is an optimized data flow solution, which is often used as a pipe feeding various processing systems, such as Hadoop and Storm.
3.2.3 Scalability of the Middleware as a whole
The scalability of the middleware as a whole involves mapping performance demands onto the underlying resources on offer. Adjusting resources to an application's on-demand requirements, which is called scaling, can be quite challenging. Under-provisioning resources will eventually hurt performance, while over-provisioning can lead to idle instances and thus to needless costs. The first consideration is to plan for the average load capacity or the peak load capacity and to set up a fixed infrastructure according to the planned capacity. Planning for the average load incurs less cost, but performance becomes a concern when load spikes occur. Poor performance will discourage customers and affect revenue. On the other hand, if capacity is provisioned for the maximum possible workload, most resources will stay idle almost all the time. Thus, a more comprehensive resource allocation strategy that dynamically scales resources with demand seems appropriate. Such strategies are called auto-scaling strategies. Well-known cloud service providers, such as Amazon EC2, generally offer rule-based auto-scaling. Typically, this straightforward approach includes two rules to assess when the system should be scaled up or scaled down. The user must define a condition for each rule based on a target variable, e.g., HTTP requests > 80. When the condition is satisfied, a predetermined scaling-up or scaling-down action is triggered, e.g., instantiating a new VM.
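As an illustration, the following sketch implements such a two-rule reactive policy around fixed thresholds; the metric source, thresholds, step size, pool bounds, and cool-down period are illustrative assumptions rather than any provider's actual mechanism.

import time

# Minimal sketch of a reactive, two-rule auto-scaler; all values are
# illustrative placeholders.
UP_THR, DOWN_THR = 80, 20   # e.g., HTTP requests per instance
STEP = 1                    # instances added/removed per action
N_MIN, N_MAX = 1, 10        # bounds on the pool size

def current_load() -> float:
    # Stub: replace with a real monitoring query (e.g., request rate).
    return 50.0

def autoscale(n_instances: int) -> int:
    load = current_load()
    if load > UP_THR and n_instances < N_MAX:
        n_instances += STEP   # scale up, e.g., instantiate a new VM
    elif load < DOWN_THR and n_instances > N_MIN:
        n_instances -= STEP   # scale down, e.g., release an idle VM
    return n_instances

n = N_MIN
while True:
    n = autoscale(n)
    time.sleep(60)            # quiet period between scaling decisions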
The rule-based approach can be characterized as reactive, meaning that it responds to the current load of the system but does not predict future load. Predictive or proactive auto-scaling strategies, on the other hand, attempt to predict the system's future needs and thus acquire or release resources ahead of time, so that the system is prepared in advance to handle the load when required. Auto-scaling techniques are grouped into the following categories [LML12]:
• Static, threshold-based policies
• Reinforcement learning
• Queuing theory
• Control theory
• Time-series analysis
It is complicated to work out a proper categorization of auto-scaling strategies. Auto-scaling techniques can be split into two main classes: reactive and proactive [LML12]. Threshold-based strategies fall into the reactive class, while time-series analysis is a purely proactive approach. In comparison, queuing theory, reinforcement learning, and control theory can be used with both reactive and proactive approaches. Reactive strategies, as the name suggests, comprise the methods that respond to the state of the current system and/or application. Decisions are made based on the most recent values acquired from a set of monitored variables. Among the categories listed above, threshold-based rules follow a purely reactive method. The lack of foresight of a reactive approach has a strong impact on auto-scaling performance. With the massive traffic bursts likely to result from special deals or business campaigns, the system might not be able to scale proportionally. In addition, instantiating a new VM may take up to 15 minutes, so the effect of a scaling-up operation may come too late. Hence, proactive or predictive resource scaling is needed to cope with varying patterns of resource usage and to be able to scale ahead of time continuously. Among the auto-scaling categories mentioned above, time-series analysis covers a wide range of techniques that adopt a proactive approach. We now discuss each of the auto-scaling techniques and its limitations when applied to the scaling task. Static, Threshold-Based Policies: Application requirements can change over time, and users may also host various applications (which require different resources) on the VM. A fixed VM capacity can in these cases result in wasted resources or degraded application performance. This issue can be resolved by dynamically scaling the VM according to the requirements of the hosted application. In threshold-based auto-scaling, the resource usage of the VM is monitored. In the case of vertical scaling, if the clearly defined threshold values are surpassed, the computing resources of the VM are dynamically increased or decreased as needed without shutting down the VMs, thereby reducing the waste of resources. If horizontal scaling is followed, a new VM is added instead of upgrading resources. Threshold-based policies or rules are very common and popular among cloud service providers like Amazon EC2. The simplicity and straightforward nature of these policies make them quite desirable to users. However, setting the thresholds is a per-application task and requires a thorough understanding of workload patterns. The number of VMs serving the target application depends on a set of rules. Usually, there are two sets of rules: one to scale up and one to scale down.
These rules may use one or more performance indicators, such as CPU load, request rate, or average response time. Each rule has several user-defined variables: a lower threshold, an upper threshold, and two time values that specify how long the condition must be fulfilled to cause a scaling action. The upper and lower threshold variables must be defined for each performance metric. The scaling behavior differs depending on the kind of scaling: for horizontal scaling, the user specifies a fixed number of VMs to be allocated or de-allocated, while for vertical scaling, the amount of resources to be added (RAM, CPU, etc.) needs to be defined by the user. The resulting rules have a structure similar to the following [LML12]:

if x > upThr for vUp seconds then n = n + s and do nothing for inUp seconds (3.1)

if x < downThr for vDown seconds then n = n − s and do nothing for inDown seconds (3.2)

Threshold-based rules are illustrated by the following example: "introduce 2 more virtual machine instances if the average CPU load is above 70% for more than 5 minutes". For horizontal scaling, the user can specify the maximum nMax and the minimum nMin number of VMs, both to control the total cost and to ensure a minimum level of availability. At each iteration, if the performance metric x exceeds the parameter upThr for vUp seconds, s VMs are provisioned and added to the application server pool. Then, for inUp seconds, the auto-scaling system refrains from performing any other scaling action. If the performance metric x falls below downThr for a user-defined time of vDown seconds, s VMs are removed and their resources released. Again, the system pauses itself for inDown seconds. Threshold-based rules can easily manage the number of resources allocated to an application deployed on a cloud platform and carry out automatic scaling actions to adapt the resources to the input demand. Setting up the rules, however, requires additional effort from the users, who have to choose the appropriate performance parameter, or a sensible combination of factors, as well as set multiple parameters. Among these variables, the upper and lower thresholds of the performance metric (e.g., 30% and 70% of the CPU load) are the key to the proper functioning of the rules. Rule conditions are typically based on one or at most two performance indicators; the most common have always been the VM's average CPU load, the input request rate, and the response time. In particular, Dutreilh et al. [BGS+09] point out that thresholds ought to be carefully tuned to avoid oscillations of the system in terms of the number of VMs or the total number of allocated CPUs. It is, therefore, appropriate to define a cool-down or quiet period, i.e., a time in which no scaling decisions are made, to avoid this problem. Most researchers and cloud service providers use only two threshold values per performance metric, but Hasan et al. [HMC+12] consider four threshold values for each performance metric together with two durations: ThrbU, slightly below the upper threshold; ThrU, the upper threshold; ThroL, slightly above the lower threshold; and ThrL, the lower threshold; the two durations are applied to check the persistence of the performance metric. Algorithm 3.1 depicts the auto-scaling algorithm for web service applications.
First, the algorithm determines the VMs whose bandwidth utilization and number of currently active sessions are below or above the defined thresholds. If all VMs have bandwidth utilization and active sessions above the defined upper bound, a new VM is provisioned, started, and connected to the front-end load balancer. If there are VMs whose active sessions and network bandwidth are below the specified lower bound and at least one VM has no web traffic or active sessions, the idle VM is removed from the front-end load balancer and immediately terminated. Physical computing resources, such as CPU and memory, are extremely important metrics for the evolution of a virtual cluster's workload in distributed computing tasks. Algorithm 3.2 shows the auto-scaling algorithm for distributed computing tasks. First, the algorithm determines the VMs whose usage of physical resources (CPU and memory) is above or below the given thresholds. If all VMs are using resources above the specified upper threshold, a new VM is created, provisioned, started, and then conducts the very same computing tasks in the virtual cluster. If some VMs' resource usage is below the specified lower threshold and at least one VM has no computing task, the inactive VM is removed from the virtual cluster.
Algorithm 3.1 Threshold-Based Auto-Scaling Algorithm for Web Applications
Input:
n: number of clusters
VC: a virtual cluster consisting of VMs that run the same computational system
VMns: number of active sessions in a virtual machine
SiMax: maximum sessions for a virtual machine of the i-th cluster
Supper_bound: session upper threshold
Slow_bound: session lower threshold
Bbelow: a set recording the virtual machines whose active sessions are below the session lower threshold
Output: front load-balancer set FLB
for i = 1 to n do
  e = 0; b = 0
  for all VM ∈ VCi do
    if VMns/SiMax ≥ Supper_bound then
      e = e + 1
    end if
    if VMns/SiMax ≤ Slow_bound then
      b = b + 1
      Record VM in Bbelow
    end if
  end for
  if (e == |VCi|) then
    Provision and start a new VM that runs the same system as VCi
    Add the new VM to FLB // front load-balancer set
  end if
  if b ≥ 2 then
    for all VM ∈ Bbelow do
      if VMns == 0 then
        Remove VM from FLB // front load-balancer set
        Destroy VM
      end if
    end for
  end if
  Empty Bbelow
end for
Algorithm 3.2 Threshold-Based Auto-Scaling Algorithm for Distributed Computing Tasks
Input:
n: number of clusters
VC: a virtual cluster consisting of VMs that run the same computational system
VMR: the resource usage of a virtual machine
Rupper_bound: the upper threshold on the use of physical resources
Rlow_bound: the lower threshold on the use of physical resources
Vbelow: a set recording the virtual machines whose resource usage is below the lower threshold
for i = 1 to n do
  e = 0; b = 0
  for all VM ∈ VCi do
    if VMR ≥ Rupper_bound then
      e = e + 1
    end if
    if VMR ≤ Rlow_bound then
      b = b + 1
      Record VM in Vbelow
    end if
  end for
  if (e == |VCi|) then
    Provision and start a new VM that runs the same computing tasks as VCi
  end if
  if b ≥ 2 then
    for all VM ∈ Vbelow do
      if VM is idle then
        Destroy VM
      end if
    end for
  end if
  Empty Vbelow
end for
Many researchers introduce and implement auto-scaling strategies that integrate rules for multiple performance metrics. Initially, Chieu et al. [CMKS09] introduced a set of adaptive rules based on the number of currently active sessions.
If the active sessions in all instances of the application exceed the specified upper threshold, a new instance is created. If there are running instances whose currently active sessions are under a given lower threshold and at least one instance has no active session, the idle instance is shut down.
Reference | Horizontal/Vertical Scaling | Metric
[DMM+10] | Horizontal scaling | Response time
[HMC+12] | Both | CPU load, response time
[HGGG12] | Both | Response time
[MBS11] | Vertical scaling | CPU, memory, storage
Table 3.6: Summary of Static Threshold-Based Rules [LML12]
Table 3.6 shows the performance metrics and scaling decisions adopted by a few authors using static threshold-based rules as their auto-scaling technique. In conclusion, rules can be used to automate the auto-scaling of a specific application efficiently and without much effort, especially in systems with predictable, frequently recurring patterns. In the case of bursty workloads, however, the user should adopt a more technically advanced auto-scaling method from the remaining categories. Reinforcement Learning: Reinforcement Learning (RL) refers to a collection of general trial-and-error strategies by which an agent can learn to make sound decisions via a series of interactions with an environment. The basic behavior is to analyze the current state of the environment, choose an appropriate action, and then receive an immediate reward [TJDB06]. It is a form of automated decision-making that can also be used for auto-scaling: it captures a target application's performance pattern and derives a management strategy without any prior information. RL is a mathematical approach for understanding and automating goal-directed learning and decision-making. It relies on learning through an agent's direct interaction with its environment. The decision-making component is known as the agent, which benefits from the trial-and-error process. Agents continually improve their decision-making capability based on the previous actions taken in each state of the environment, always attempting to maximize the returned reward. In our problem, the auto-scaler is the agent that constantly interacts with the environment of the scalable application. Depending on various performance indicators, such as the current input workload, the throughput, or other sets of variables, the auto-scaler decides whether to add or remove resources. It always attempts to minimize the response time, which serves as the reward. Figure 3.15 visualizes the interaction between the environment and its agent. Formally, at every time step t, where t ranges over a series of discrete time steps, the agent receives a certain representation of the state of the environment and chooses an action on that basis. One time step later, the agent receives a response or reward (e.g., a performance improvement) as an outcome of its action and finds itself in a new state. At each time step, the agent maintains a mapping from states to the probabilities of selecting each possible action. RL systems can also be termed memoryless, in the sense that forthcoming system states can be determined from the current state alone, irrespective of the previous history.
This is known as the Markov property, which states that the likelihood of a transition to a new system state depends only on the present state and the action of the agent; it is conditionally independent of all preceding actions and states.
Figure 3.15: The agent-environment interaction in reinforcement learning [LML12]
The Markov Decision Process offers a computational framework for modeling decision-making problems and is usually used to formulate reinforcement learning scenarios. RL's key problem is finding a policy for the agent or decision-maker that maps each state to the best action in order to maximize the future cumulative reward. In the case of horizontal scaling, we might consider a state specified as (r, v, p), where r is the number of client requests observed per time frame, v is the number of VMs assigned to the application, and p is the average response time of requests, bounded by a pMax value chosen from experimental observations. Q-learning is one of the model-free RL algorithms and can be used to produce optimal policies. The update rule for Q-learning is specified in Algorithm 3.3 and is evaluated each time a non-terminal state is reached. Approximations of Q(s, a), which indicate the benefit of taking action a while in state s, are determined after each time interval. Actions are taken based on the policy being pursued. Q-learning can usually require significant experience within a given environment to learn a good policy. To find the optimal policy, the RL algorithm estimates a utility value for each state s. The utility is the sum of the anticipated discounted rewards. Once the agent has learned all the estimated values, it will, given the current state, choose the action that leads to the next state with the highest utility value. The policy is based on a value function Q(s, a), usually called the Q-value function. Every Q(s, a) value estimates the future cumulative reward obtained by executing action a while in state s.
Algorithm 3.3 Reinforcement Learning Algorithm (Q-Learning)
Initialize Q(s, a) arbitrarily.
Observe the current state, s.
loop {infinitely}
  Choose an action, a, for state s based on one of the action selection policies, such as ε-greedy.
  Execute the action, a, and observe the reward, r, as well as the new state, s′.
  Update the Q-value for the state using the observed reward and the maximum reward possible for the next state. The resulting update formula is:
  Q(s, a) ← Q(s, a) + α[r + γ · max_a′ Q(s′, a′) − Q(s, a)]
  Set the state s to the new state s′: s ← s′.
end loop
Reference | Horizontal/Vertical Scaling | Metric
[BHD13] | Horizontal scaling | Number of user requests, number of VMs
[RBX+09] | Vertical scaling | CPU and memory usage
[DKM+11] | Horizontal scaling | Number of user requests, VMs, and response time
[TJDB06] | Horizontal scaling | Arrival rate, response time
Table 3.7: Summary of Reinforcement Learning [LML12]
Nearly all RL algorithms depend on estimating the value function of states (or state-action pairs) from experience. There are three primary methods for RL: dynamic programming, temporal-difference learning, and Monte Carlo methods [LML12]. Dynamic programming needs to know both the transition probabilities and the rewards, whereas the other two are model-free. Temporal-difference approaches are ideal for step-by-step learning and are best suited to the auto-scaling scenario.
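To ground Algorithm 3.3, the following sketch implements a tabular Q-learning loop for a horizontal scaler over the (r, v, p) state described above; the environment stub, the action set, the state discretization, and the learning parameters are illustrative assumptions, not a method prescribed by the cited work.

import random
from collections import defaultdict

# Minimal sketch of tabular Q-learning for horizontal auto-scaling;
# the environment stub and all parameters are illustrative placeholders.
ACTIONS = (-1, 0, 1)            # remove a VM, do nothing, add a VM
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)          # Q[(state, action)] -> estimated value

def step(state, action):
    # Stub: apply the scaling action, then observe the new discretized
    # (r, v, p) state and a reward such as the negative response time.
    return state, 0.0

def choose_action(state):
    if random.random() < EPSILON:           # ε-greedy selection
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

state = (0, 1, 0)               # discretized (requests, VMs, response time)
for _ in range(10000):
    action = choose_action(state)
    next_state, reward = step(state, action)
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    # Q(s, a) <- Q(s, a) + α[r + γ · max_a' Q(s', a') - Q(s, a)]
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    state = next_state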
Q-learning is a one-step TD algorithm based on learning the Q action-value function; it approximates the optimal Q directly, regardless of the policy being pursued. It is therefore called an off-policy method: the optimal policy can be obtained at any time by taking, for each state, the action with the highest Q-function value. Table 3.7 shows the scaling type and the performance metrics used by different authors applying the RL auto-scaling technique. It is important to note the capability of RL algorithms to identify, without a priori information, the best management strategy for a target scenario. The user does not need to specify an explicit application model; instead, if the task workload or system conditions change, this is learned and adapted to automatically. RL techniques might be a promising way of solving the auto-scaling problem for a specific application, but the field is not yet advanced enough to fulfill the requirements of an actual production scenario. In this dynamic field of research, steps should be taken to ensure a proper ability to adapt to occasional bursts in the input workload and to cope with large state and action spaces [LML14].
Figure 3.16: A simple queuing model with one server and multiple servers [LML14]
Queuing Theory: Queuing theory has been widely used to model Internet applications and conventional servers in order to determine performance metrics such as the queue length or the average waiting time of requests. Queuing theory refers to the mathematical analysis of queues. The basic structure of such a model is shown in Figure 3.16. Customer requests arrive at the system at an average arrival rate λ and are enqueued until they are processed. As the figure shows, the system may contain one or more servers, each serving requests at an average service rate μ. Kendall's notation [Ken] is the generic method for describing and identifying queuing models. A queue is denoted by A/B/C/K/N/D. The meaning of each element of the notation is as follows:
• A: inter-arrival time distribution.
• B: service time distribution.
• C: number of servers.
• K: system capacity or queue length. It refers to the total number of clients in the system, including those in service; further arrivals are discarded when the system is full.
• N: calling population, i.e., the size of the population the clients come from. If the requests come from an unlimited customer population, the queuing model is open, whereas a closed model is based on a finite customer population.
• D: service discipline or priority order, i.e., the order in which the jobs in the queue are served. The most common is First-In-First-Out / First-Come-First-Served, serving the requests in the order of their arrival. Alternatives are available, including Last-In-First-Out / Last-Come-First-Served and Processor Sharing.
Reference | Horizontal/Vertical Scaling | Metric
[USC+08] | Horizontal scaling | Peak workload
[VPR07] | Horizontal scaling | Arrival rate
[ZCS07] | - | Number and type of requests
[TJDB06] | Horizontal scaling | Arrival rate, response time, number of servers
Table 3.8: Summary of Queuing Theory [LML12]
The elements K, N, and D are optional. The most common values for both the inter-arrival time distribution A and the service time distribution B are M, D, and G. M stands for Markovian and corresponds to a Poisson process defined by a parameter λ indicating the number of requests per unit of time.
The inter-arrival or service times then follow an exponential distribution. D denotes deterministic or constant times. Another widely used value is G, which corresponds to a general distribution with known parameters. A basic queuing model can be used to formulate the elastic application scenario, considering a single queue that represents the load balancer delivering the requests among n VMs, as shown in Figure 3.16. More complicated systems, such as multi-tier applications, can be described by a queuing network. For example, each tier can be modeled as a queue with one or n servers. Queuing theory is used to evaluate stationary systems characterized by constant arrival and service rates. The goal is to derive performance metrics from the queuing model and some known parameters (e.g., the arrival rate λ). Examples of such performance metrics are the average waiting time in the queue and the mean response time. For situations with changing conditions (i.e., non-constant arrival and service rates), such as our target of scalable applications, the queuing model parameters have to be recalculated and readjusted regularly. To obtain these metrics, there are two primary approaches: analytical methods and simulation. Analytical methods are available only for comparatively simple queuing models with well-defined arrival and service distributions. A typical formula in this context is Little's Law. It states that the average number of users or requests E[N] in the system equals the average customer arrival rate λ multiplied by the average time E[T] each customer spends in the system [LML14]. The formula is as follows: E[N] = λ × E[T]. A similar description for the Little's Law states th