Learning-Based Control and Localization

of Magnetic Soft Millirobots

Von dem Stuttgarter Zentrum für Simulationswissenschaften (SC

SimTech) der Universität Stuttgart zur Erlangung der Würde eines

Doktor-Ingenieurs (Dr.-Ing.) genehmigte Abhandlung

Vorgelegt von

Sinan Özgün Demir

aus Istanbul, Türkei

Hauptberichter: Prof. Dr. Metin Sitti

Mitberichter: Prof. Dr. Syn Schmitt

Mitberichter: Prof. Dr. Georg Martius

Tag der mündlichen Prüfung: 23.04.2024

Max-Planck-Institut für Intelligente Systeme

2024


Learning-Based Control and Localization

of Magnetic Soft Millirobots

A thesis accepted at the Stuttgart Center for Simulation Science in

fulfillment of the requirements for the degree

Doctor Engineer (Dr.-Ing.)

Presented by

Sinan Özgün Demir

from Istanbul, Türkiye

Main Reporter: Prof. Dr. Metin Sitti

Co-reporter: Prof. Dr. Syn Schmitt

Co-reporter: Prof. Dr. Georg Martius

Date of the oral exam: 23.04.2024

Max-Planck-Institute for Intelligent Systems

2024


Aileme


v

Abstract
Soft millirobots have promising biomedical applications due to

mechanical compliance, absorbing excess forces without additional

computational effort, and multifunctionality. Especially with wireless

multimodal locomotion capabilities, magnetic soft millirobots (i.e.,

≤1 cm) have emerged as potential minimally invasive medical robotic

platforms as they can access confined and hard-to-reach spaces in the

human body (e.g., distal vascular regions), and carry out medical

applications, such as on-demand drug delivery, sensing, and

embolization, in a target location. For such potential biomedical

applications, the adaptivity of the robot control is essential to ensure the

continuity of the operations, as task environment conditions show

dynamic variations that can alter the robot's performance. However,

fabrication-, material-, physical-interaction-dependent variations, and

complex kinematics with virtually infinite degrees of freedom arising

from the nature of their soft material structure limit the applicability of

the conventional modeling and control methods.

The main objective of this dissertation is to establish a data-efficient

adaptive multimodal locomotion framework for the targeted application

scenarios of magnetic soft millirobots. To this end, a probabilistic

learning approach leveraging Bayesian optimization (BO) and Gaussian

processes (GPs) is introduced to address the controller adaptation

challenge. First, the efficacy of the BO to fabrication variabilities is

shown on three different robots fabricated following the same steps. Next,

through augmented tests on benchmark datasets, it is shown that


vi

transferring the posterior mean learned by one robot as the prior mean to

the other robots and test cases improves the learning performance of BO

by achieving quicker gait adaptation. Afterward, the controller adaptation

method employing the proposed transfer learning approach is

demonstrated in various task spaces with varying surface adhesion,

surface roughness, and medium viscosity properties. To further improve

the adaptation performance by including multimodal locomotion, the

sim-to-real transfer learning method is developed in the third study.

In this regard, a data-driven simulation environment is designed, and its

accuracy is demonstrated by comparing the simulated results to the

physical experiments. Leveraging the simulated experience and BO based

transfer learning, it is demonstrated that sim-to-real transfer learning

provides efficient locomotion learning. Furthermore, the adequacy of the

automated locomotion adaptation through the Kullback-Leibler

divergence-based domain recognition approach is shown to changing

environmental conditions.

As the secondary objective, a new localization method using electrical

impedance tomography is introduced. The applicability of the proposed

approach is demonstrated for stationary and moving cases in

environments with and without any obstacles.

With these contributions, this thesis proposes a domain-adaptive

locomotion learning framework enabling the soft millirobot locomotion

to quickly and continuously adapt to environmental changes while

exploring actuation space.


vii

Zusammenfassung
Weiche Milliroboter haben vielversprechende biomedizinische

Anwendungen, da sie mechanisch nachgiebig sind, überschüssige Kräfte

ohne zusätzlichen Rechenaufwand absorbieren und multifunktional sind.

Insbesondere mit ihren kabellosen multimodalen Fortbewegungs-

fähigkeiten haben sich magnetische weiche Milliroboter (d. h. ≤1 cm) als

potenzielle minimalinvasive medizinische Roboterplattformen

herauskristallisiert, da sie Zugang zu engen und schwer zugänglichen

Stellen im menschlichen Körper (z. B. distale Gefäßregionen) haben und

medizinische Anwendungen wie die bedarfsgerechte Verabreichung von

Medikamenten, Sensorik und Embolisierung an einem Zielort

durchführen können. Für solche potenziellen biomedizinischen

Anwendungen ist die Anpassungsfähigkeit der Robotersteuerung von

entscheidender Bedeutung, um die Kontinuität des Betriebs zu

gewährleisten, da die Bedingungen der Aufgabenumgebung dynamische

Schwankungen aufweisen, die die Leistung des Roboters verändern

können. Fertigungs-, material- und wechselwirkungsabhängige

Variatnen sowie eine komplexe Kinematik mit praktisch unendlichen

Freiheitsgraden, die sich aus der Natur ihrer weichen Materialstruktur

ergeben, schränken jedoch die Anwendbarkeit herkömmlicher

Modellierungs- und Steuerungsmethoden ein.

Das Hauptziel dieser Dissertation ist es, ein dateneffizientes adaptives

multimodales Fortbewegungskonzept für die angestrebten Anwendungs-

szenarien magnetischer weicher Milliroboter zu entwickeln. Zu diesem

Zweck wird ein probabilistischer Lernansatz eingeführt, der die


viii

Bayes'sche Optimierung (BO) und Gauß'sche Prozesse (GPs) nutzt, um

die Herausforderung der Regleranpassung anzugehen. Zunächst wird an

drei verschiedenen Robotern, die in den gleichen Schritten hergestellt

wurden, die Wirksamkeit der BO bei Fertigungsschwankungen gezeigt.

Anschließend wird gezeigt, dass die Übertragung des von einem Roboter

gelernten posterioren Mittelwerts als priorer Mittelwert auf die anderen

Roboter und Testfälle die Lernleistung von BO verbessert, indem eine

schnellere Ganganpassung durch die Durchführung erweiterter Tests auf

Benchmark-Datensätzen erreicht wird. Anschließend wird die Methode

der Regleranpassung, die den vorgeschlagenen Transfer-Learning-

Ansatz verwendet, in verschiedenen Aufgabenräumen mit

unterschiedlichen Oberflächenhaftungen, Oberflächenrauhigkeiten und

mittleren Viskositätseigenschaften demonstriert. Um die Anpassungs-

leistung durch Einbeziehung multimodaler Fortbewegung weiter zu

verbessern, wird in der dritten Studie die Sim-to-Real-Transferlern-

methode entwickelt. In diesem Zusammenhang wird eine datengesteuerte

Simulationsumgebung entworfen, deren Genauigkeit durch den

Vergleich der simulierten Ergebnisse mit den physikalischen

Experimenten nachgewiesen wird. Durch die Nutzung der simulierten

Erfahrungen und des auf der BO basierenden Transfer-Lernens wird

gezeigt, dass das Sim-zu-Real-Transfer-Lernen ein effizientes Lernen der

Fortbewegung ermöglicht. Darüber hinaus wird die Angemessenheit der

automatischen Bewegungsanpassung durch den auf der Kullback-

Leibler-Divergenz basierenden Domänenerkennungsansatz für sich

ändernde Umweltbedingungen gezeigt.


ix

Als sekundäres Ziel wird eine neue Lokalisierungsmethode unter

Verwendung der elektrischen Impedanztomographie eingeführt. Die

Anwendbarkeit des vorgeschlagenen Ansatzes wird für stationäre und

bewegte Fälle in Umgebungen mit und ohne Hindernisse demonstriert.

Mit diesen Beiträgen schlägt diese Arbeit ein domänenadaptives

Lernsystem für die Fortbewegung vor, das die sanfte Fortbewegung von

Millirobotern in die Lage versetzt, sich schnell und kontinuierlich an

Umweltveränderungen anzupassen, während sie den Aktionsraum

erkunden.


xi

Acknowledgments
Since the beginning of my doctoral research in 2018, I have had the

chance to meet and work with great people whose presence and

collaboration have influenced my personal and professional life. I would

like to thank them for their personal, philosophical, and scientific support

in completing this dissertation.

First, I would like to thank Prof. Dr. Metin Sitti for giving me the

opportunity to work with him in the Physical Intelligence Department at

the Max Planck Institute for Intelligent Systems, Stuttgart. I thank him

for all the supervision, support, guidance, and freedom he has provided

to pursue my research and curiosity.

I would like to express my deepest gratitude to the Max Planck Society

and Republic of Türkiye Ministry of National Education for supporting

me and my research throughout my Ph.D.

I would also like to thank my co-examiners, Prof. Dr. Syn Schmitt and

Prof. Dr. Georg Martius, for their contributions to my dissertation with

their invaluable feedback.

I want to express my special thanks to Prof. Dr. Sebastian Trimpe for his

scientific support, stimulating discussions, and influence on my research.

I would like to express my gratitude to Dr. Utku Culha. His mentorship,

continuous support, and scientific and philosophical insights have been a

constant source of motivation and enlightenment throughout this journey.


xii

I also want to thank my colleagues and friends who were with me through

my Ph.D. I am especially grateful to Patricia Martinez and Janina

Schwartz for always being there and helping with professional and

personal problems. I thank the undergraduate students I worked with,

Melike Yıldırım, Ceyda Akyol, Burak Aydın, Aras Güngöre, and Mete

Erdoğan, for contributing to my research. I thank the great friends I made

during my Ph.D., Jack Saud, Alp. C. Karacakol, Cem B. Dayan, Jie Han,

Ugur Bozuyuk, M. Birgul Akolpoglu, Zhiqiang Zheng, Gaurav Gardi, Y.

Eren Suyolcu, Amir Aghakhani, Abdon Pena-Francesch, Isabella

Bianchini, Ferro F. Ceylan, 'Le Petit Coq bar' family, and many others for

all the good memories. I also thank Eylül Suadiye for his heartful

friendship, all-time support, and coffee breaks. I especially thank M. Efe

Tiryaki for his heartful friendship, all-time support, and endless questions

about life, research, and technology.

I would like to extend my grateful appreciation to my parents, Hülya and

Sadık Demir, and my brother, U. Eren Demir, for their endless love and

support.

Last but most importantly, I would like to state my heartfelt thankfulness

to my wife, Saadet F. Baltacı, for her all-time support, endless patience,

continuous effort to motivate me, and for sharing all my happiness and

grief since we met. Without her, none of these would be possible.


Science is the only true guide in life.

-Mustafa Kemal Atatürk-


xv

Table of Contents
Abstract .............................................................................................. v

Zusammenfassung ............................................................................. vii

Acknowledgments .............................................................................. xi

List of Figures ................................................................................. xvii

List of Tables ................................................................................... xix

1. Introduction ................................................................................... 1

1.1. Magnetic Soft Millirobots .................................................... 1

1.2. Thesis Objectives ................................................................ 5

1.3. Outline ................................................................................ 6

1.4. List of Publications.............................................................. 9

2. Methods ....................................................................................... 12

2.1. Magnetic Soft Millirobot Design ....................................... 12

2.2. Magnetic Actuation and Feedback System ......................... 15

2.3. Gaussian Processes ............................................................ 19

2.4. Bayesian Optimization ...................................................... 24

2.5. Magnetic Soft Millirobot Simulation.................................. 27

2.6. Robot Tracking by Electrical Impedance Tomography ....... 30

3. Conclusion and Outlook ............................................................... 33

3.1. Conclusion ........................................................................ 33


xvi

3.2. Future Work...................................................................... 35

Bibliography ..................................................................................... 37

Publication 1: Learning of Sub-optimal Gait Controllers for Magnetic

Walking Soft Millirobots............................................. 47

Publication 2: Task Space Adaptation via The Learning of Magnetic

Soft Millirobots........................................................... 81

Publication 3: Learning Soft Millirobot Multimodal Locomotion with

Sim-to-Real Transfer..................................................147

Publication 4: A Localization Method for Untethered Small-Scale

Robots Using Electrical Impedance Tomography........199


xvii

List of Figures
Figure 1 Fabrication of the magnetic soft millirobot ...................... 12

Figure 2 Locomotion modes of the magnetic soft millirobot .......... 13

Figure 3 Manually designed magnetic actuation signals ................ 14

Figure 4 Walking performance of the magnetic soft millirobot ...... 15

Figure 5 Electromagnetic coil setup .............................................. 16

Figure 6 Electromagnetic (Helmholtz) coil setup ........................... 18

Figure 7 Sample learning run using GP-BO .................................. 21

Figure 8 Effect of the hyperparameters on GP model .................... 23

Figure 9 GP model representation of a function 𝑓(𝜃) .................... 24

Figure 10 Data-driven magnetic soft millirobot simulation .............. 28

Figure 11 Simulated locomotion modes of the magnetic soft

millirobot ....................................................................... 29

Figure 12 Forward and inverse problem definitions for EIT ............ 31

Figure 13 Experimental setup used for the EIT experiments ............ 32


xix

List of Tables
Algorithm 1 Pseudo code for GP-BO learning run .............................. 26


1

1. Introduction

1.1. Magnetic Soft Millirobots
Soft robots made of highly flexible materials can withstand and undergo

both elastic and plastic deformations1. This characteristic enables them to

adapt to their surroundings, absorb the excess forces without additional

computation effort, and achieve various functions while maintaining a

reduced system complexity compared to their rigid counterparts2–4.

Therefore, they have been utilized in various fields, such as object

manipulation, wearables, and biomedical devices5–9. Meanwhile,

small-scale (i.e., ≤1 cm) soft robots with wireless locomotion capability

have become a promising approach for medical applications as they can

access previously hard-to-reach environments non-invasively10–12,

including distal vascular regions and carry out medical applications, such

as on-demand drug delivery, sensing, and embolization13–16, in a target

location.

Among the existing external actuation methods, such as heat17, light17,18,

electric19, and magnetic fields17,20, magnetic actuation stands out due to

its high precision, dexterity, speed, penetration depth, and biological

safety features21. Moreover, magnetic actuation can be used with different

imaging modalities, such as camera22, electrical impedance

tomography23, x-ray13, and ultrasound24 without disturbing their

functionality. Furthermore, researchers demonstrated that magnetic soft

millirobots can perform various locomotion modes, such as walking,

rolling, crawling, jumping, tumbling, swimming, and climbing14,15,24.

While multiple robots with different magnetic profiles can generate these


2

gaits separately14, a single robot design with a preprogrammed magnetic

profile can also achieve multimodal locomotion under different periodic

actuation signals15–17,25, which extends the adaptation capabilities and

targeted application areas of the magnetic soft millirobots.

Despite their exciting capabilities and potential medical use, soft robots

face challenges arising from the nature of their soft materials, i.e.,

fabrication-, material-, and physical-interaction-dependent variations26.

Moreover, the inherent high degrees of freedom, complex dynamics,

contact mechanics, and dynamically changing task environments restrict

the application of conventional modeling and control methods widely

used in rigid robotic systems12,27.

To address these challenges, one proposed solution involves simplifying

the governing models for soft millirobots, such as dividing the robot into

sub-parts and using constant curvature (CC) approximation with well-

established beam theories to model the kinematics and dynamics of each

sub-part28,29. However, CC approximation fails when significant loads are

applied to the system, and its accuracy highly depends on the

discretization of the robot30. Although the Cosserat rod model offers

improved accuracy, it suffers from the computationally heavy model

solutions31.

Numerical approaches, on the other hand, such as finite-element methods

(FEMs), voxel-based representations, and discrete differential

geometries, improve the computation time by modeling the continuum

robot structures using a chain of rigid elements connected with tunable

spring-damper mechanisms at the expense of nonlinear dynamics


3

precision32–34. While these models and simulation tools perform well for

large-scale soft robots35, the lack of detailed and continuous body

deformation sensing and computational complexity restrict their use in

small-scale36. Alternatively, many soft millirobot applications rely on

manual tuning to control the periodic actuation signals15,16,24,37.

Besides manual methods, data-driven machine learning methods may

also be used to control soft robots in the lack of analytical or numerical

models38. One common approach is training neural network (NN)

architecture with the data collected from physical experiments39–41.

However, since they require a large training dataset, and the soft

millirobots tend to have variations in their performance over prolonged

usage due to material degradation, their use is also limited. Conversely,

Bayesian optimization (BO) enables a data-efficient optimization, i.e.,

using a small number of physical experiments42,43. Furthermore, using

BO together with Gaussian processes (GPs) improves the data efficiency

and optimization performance by incorporating information as

probabilistic priors44, which is demonstrated on different-size robots to

optimize their locomotion performance45–48, including untethered soft

millirobots49.

Although data-efficient machine learning methods can be used for

adaptation problems, their performance decreases as the search space

enlarges since they rely on experimental data. As a solution, sim-to-real

transfer learning-based control approaches have attracted significant

attention in various complex real-world robotics problems. Control

algorithms trained in high-fidelity rigid body simulation environments


4

now could beat human pilots in drone racing50,51, plan and control

complex legged locomotion52–56, and perform complex object

manipulation tasks57–60. On soft robotic systems, Hiller et al. proposed a

FEM-based simulation environment to model complex material

structures and multi-stimulus actuation61. The same simulation

environment is further utilized to learn shape and control policy pairs in

a given environment for a pneumatically actuated soft robot62. However,

the sim-to-real transferability of multiple shapes and behaviors to a single

robot remains an unsolved problem due to the gap between simulation

and real-world performance.

As locomotion learning and adaptation to environmental changes are

important challenges for magnetic soft millirobots, the proposed

solutions' efficacy relies on the feedback systems' performance.

Therefore, various technologies well established in biomedical imaging

have been explored for tracking small-scale robots, such as

magnetic-field-based techniques63, ultrasound64, optical techniques65, and

ionizing-radiation-based methods66. Despite their successful results, they

face some limitations in terms of cost, scanning speed, penetration depth,

and adverse health effects. As an alternative to these approaches,

localization via electrical impedance variations in the environment has

been demonstrated67. Electrical impedance tomography (EIT) exploits

electrical stimulations and measurements through the surface of an object

to noninvasively image the interior of a domain by estimating the internal

distribution of the electrical properties. EIT has been used in medical

imaging for various physiological phenomena, such as breathing68,

cardiac function69, and brain activity70. While it does not cause any risk


5

to the patient, unlike the prolonged use of ionizing radiations, such as

X-rays, it can image deep tissues, contrary to optical techniques, at high

speed71 and with low cost72. Moreover, since it is decoupled from the

magnetic actuation, it can be used without damaging the utilized robots'

magnetic profile, unlike magnetic resonance imaging (MRI).

1.2. Thesis Objectives
There is a growing interest in the field of soft millirobots and their

application in biomedical targeting scenarios. However, there is a

noticeable gap in addressing the control aspect of these robots, especially

given their intended use in dynamic biomedical environments. Moreover,

soft millirobots exhibit differences from one another due to fabrication

variances and face performance lost over time due to material

degradation.

The primary objective of this thesis is to establish a data-efficient

adaptive locomotion framework for targeted application scenarios. This

framework leverages BO with GP to enhance locomotion learning in a

data-efficient way. Additionally, a new transfer learning approach is

proposed for GP-BO to further improve data efficiency and optimization

performance, demonstrating efficacy for different robot and task

environments.

To expand the applicability of the proposed adaptive locomotion

approach for more complex test scenarios and larger search spaces, a

data-driven simulation environment is designed to model the soft

millirobot's behavior, and a sim-to-real transfer learning approach is


6

employed to optimize the robot's performance in the real world. This

sim-to-real transfer learning approach also enables the discovery of new

locomotion modes.

Finally, a localization method based on electrical impedance tomography

is proposed as an alternative to common biomedical imaging techniques

to track the soft millirobot's position in closed and nontransparent

environments.

These objectives collectively contribute to developing soft millirobots

and their application in dynamic and diverse scenarios.

1.3. Outline
This thesis is divided into three chapters and four appendices. This first

chapter lays down the problem description and motivation of this thesis,

including the formulation of the research goal and objectives. The second

chapter starts with defining the magnetic soft millirobot design and

magnetic actuation and feedback systems used in this thesis. Then, it

continues by explaining the utilized methods: Gaussian processes,

Bayesian optimization, data-driven magnetic soft millirobot simulation,

and electrical impedance tomography. At last, the third chapter concludes

by summarizing the contributions and presenting a future outlook for soft

millirobots.

Each appendix presents a publication that supports the thesis objectives.

The first publication 26 showed the performance differences between

reported results in 24 and three robots fabricated following the same

procedures by running repeated experiments. Then, GP-BO was


7

demonstrated for the walking gait adaptation of different robots.

Moreover, it was shown that transferring the posterior mean learned by

one robot as the prior mean to other robots improved learning

performance by achieving quicker gait adaptation.

The second publication 22 starts by testing the efficacy of the proposed

transfer learning approach in the first publication. To this end, it

compared four possible transfer learning methods with each other and the

standard GP-BO by running augmented tests on five benchmark datasets

consisting of 3750 physical experiments. The results showed that the

proposed posterior mean transfer method outperformed the rest by

achieving quicker gait adaptation and larger stride lengths for all five test

cases. Afterward, the efficacy of the adaptive learning method was

demonstrated in a wide range of task spaces with varying (1) surface

adhesion, (2) surface roughness, and (3) the medium viscosity properties.

The third study 73 addresses the challenges of domain adaptive magnetic

soft millirobot locomotion using a sim-to-real transfer learning method.

To this end, a data-driven simulation environment was designed.

Leveraging the simulated experience and GP-BO based transfer learning,

it was demonstrated that sim-to-real transfer learning provided efficient

locomotion learning. Moreover, the adequacy of the automated

locomotion adaptation through the Kullback-Leibler divergence-based

domain recognition approach was shown in changing environmental

conditions.

The last study 23 introduces using electrical impedance tomography to

track the position of the magnetic soft millirobot. After showing that the


8

EIT and the magnetic actuation system are decoupled, the localization of

the robot was demonstrated for stationary and moving cases in

environments with and without any obstacles.


9

1.4. List of Publications
This dissertation combines four first-author peer-reviewed papers listed

below. The papers are attached to the appendix of this dissertation.

Publication 1 U. Culha*, S. O. Demir*, S. Trimpe, M. Sitti,

"Learning of Sub-optimal Gait Controllers for

Magnetic Walking Soft Millirobots", Robotics:

Science and Systems 2020,

doi: 10.15607/RSS.2020.XVI.070.

(* Co-first authors)

Publication 2 S. O. Demir*, U. Culha*, A. C. Karacakol, A. Pena-

Francesch, S. Trimpe, M. Sitti, "Task Space

Adaptations via The Learning of Gait Controllers of

Magnetic Soft Millirobots", International Journal of

Robotics Research, 2021,

doi: 10.1177/02783649211021869.

(* Co-first authors)

Publication 3 S. O. Demir, M. E. Tiryaki, A. C. Karacakol, M. Sitti,

"Learning Soft Millirobot Multimodal Locomotion

with Sim-to-Real Transfer", Advanced Science,

2024,

doi: 10.1002/advs.202308881


10

Publication 4 H. Daguerre*, S. O. Demir*, U. Culha, F. Marionnet,

M. Gauthier, M. Sitti, A. Bolopion, "A Localization

Method for Untethered Small-Scale Robots Using

Electrical Impedance Tomography", IEEE/ASME

Transactions on Mechatronics, 2022,

doi: 10.1109/TMECH.2022.3142924.

(* Co-first authors)

Other publications, which I contributed and not part of this thesis, are

listed below.

- Z. Zheng*, J. Han*, Q. Shi*, S. O. Demir, W. Jiang, M. Sitti, "Single-

step precision programming of decoupled multiresponsive soft

millirobots", Proceedings of National Academy of Sciences, 2024,

doi: 10.1073/pnas.2320386121

- C. B. Dayan, D. Son, A. Aghakhani, Y. Wu, S. O. Demir, M. Sitti,

"Machine Learning-Based Shear Optimal Adhesive Microstructures

with Experimental Validation", Small, 2023,

doi: 10.1002/smll.202304437.

- U. Bozuyuk, E. Yildiz, M. Han, S. O. Demir, M. Sitti, "Size-

Dependent Locomotion Ability of Surface Microrollers on

Physiologically Relevant Microtopographical Surfaces", Small,

2023, doi: 10.1002/smll.202303396.

- Z. Zheng, J. Han, S. O. Demir, H. Wang, W. Jiang, H. Liu, M. Sitti

"Electrodeposited Superhydrophilic-Superhydrophobic Composites

for Untethered Multi-Stimuli-Responsive Soft Millirobots",

Advanced Science, 2023, 10, doi: 10.1002/advs.202302409.


11

- Z. Zheng, H. Wang, S. O. Demir, Q. Huang, T. Fukuda, M. Sitti,

"Programmable aniso-electrodeposited modular hydrogel

microrobots", Science Advances, 2022, 8,

doi: 10.1126/sciadv.ade6135.

- M. E. Tiryaki, S. O. Demir, M. Sitti, "Deep Learning-based 3D

Magnetic Microrobot Tracking using 2D MR Images", IEEE

Robotics and Automation Letters, 2022, 7, 6982-6989,

doi: 10.1109/LRA.2022.3179509

- U. Bozuyuk, E. Suadiye, A. Aghakhani, N. O. Dogan, J. Lazovic, M.

E. Tiryaki, M. Schneider, A. C. Karacakol, S. O. Demir, G. Richter,

M. Sitti, "High-Performance Magnetic FePt (L10) Surface

Microrollers Towards Medical Imaging-Guided Endovascular

Delivery Applications", Advanced Functional Materials, 2022, 8,

doi: 10.1002/adfm.202109741.

- F. N. P. Basualdo, G. Gardi, W. Wang, S. O. Demir, A. Bolopion,

M. Gauthier, P. Lambert, M. Sitti, "Control and Transport of Passive

Particles Using Self-Organized Spinning Micro-Disks", IEEE

Robotics and Automation Letters, 2022, 7, 2156-2161,

doi: 10.1109/LRA.2022.3143306.

- G. Dogan, S. O. Demir, R. Gutzler, H. Gruhn, C. B. Dayan, U. T.

Sanli, C. Silber, U. Culha, M. Sitti, G. Schütz, C. Grevent, K.

Keskinbora, "Bayesian Machine Learning for Efficient Minimization

of Defecs in ALD Passivation Layers", ACS Applied Materials &

Interfaces, 2021, 13, 54503-54515,

doi: 10.1021/acsami.1c14586


12

2. Methods

2.1. Magnetic Soft Millirobot Design
In this study, the performance of the proposed learning approaches was

evaluated using a magnetic soft millirobot design previously introduced

by Hu et al.24. The chosen design (Figure 1) was selected for its ability to

exhibit multimodal locomotion, such as rolling, walking, and crawling

(Figure 2), by controlling the applied magnetic field acting on the robot

(Figure 3). Moreover, this design has been employed in various prior

studies, demonstrating its adaptability to diverse environmental

conditions and tasks through manual control15,16,74.

The robot was fabricated by mixing Ecoflex 00-10 (Smoot-On Inc.) and

neodymium-iron-boron (NdFeB) magnetic particles with around 5𝜇𝑚

diameter (MQP-15-7, Magnequench) with a 1:1 body mass ratio. After

curing the pre-polymer mixture on a methyl methacrylate plate, the robots

were cut using a high-resolution laser cutter (LPKF Protolaser U4) with

dimensions of length 𝐿 = 3.7𝑚𝑚, width 𝑤 = 1.5𝑚𝑚, and height ℎ =

185𝜇𝑚.To magnetize the robots, they were folded around a cylindrical

Figure 1 Fabrication of the magnetic soft millirobot. (a) The sheet-shaped soft
millirobot was cut from the cured polymer mixture and magnetized inside a
homogeneous magnetic field with a magnitude of |B|=1.8T (red arrow) after
folding around a cylindrical rod. (b) The unfolded robot maintained a periodic
magnetization profile (blue arrows) along its body. (c) Image of the fabricated
and magnetized soft millirobot.


13

Figure 2 Locomotion modes of the magnetic soft millirobot. Physical
performance of the magnetic soft millirobot during (a) walking, (b) rolling, and
(c) crawling locomotion modes generated by manually designed actuation
signals24.

rod with a circumference equal to 𝐿 and placed inside a uniform magnetic

field with a magnitude of 1.8𝑇. The magnetic field was oriented at 45°

counterclockwise from the y-axis. After separation from the rod, the

magnetic particles retained their magnetization orientation, forming a

circular profile along the robot body (Figure 1b). To actuate the robot, the

homogeneous magnetic field in the environment and the resultant

magnetic torque acting on the robot were continuously modulated (Figure

3).


14

Figure 3 Manually designed magnetic actuation signals to generate
predefined locomotion modes, (a) walking, (b) rolling, and (c) crawling24. The
top and bottom rows show the direction and magnitude of the applied magnetic
field in degrees and mT, respectively. The yellow arrows represent the magnetic
profile of the robots.

To validate the repeatability of the reported results and the accuracy of

the proposed mathematical model, three robots were fabricated and tested

using the parameter sets reported by Hu et al.24. The results presented in

Figure 4 revealed the following,

- Locomotion performance showed clear inconsistency due to the

variations in the fabrication and environmental factors, even though

the same materials, methods, controller parameters, and walking

surfaces were used.

- Using a model-based controller was not possible due to

non-monotonic behavior with increasing f, unlike the model

prediction.

These results align with the objectives of this dissertation, which aim to

develop a data-efficient controller learning system capable of robustly

handling variations arising from the material properties, fabrication


15

Figure 4 Predicted and observed walking performance of the magnetic soft
millirobot in terms of stride length S. Previously reported model and
experimental results (robotR) were compared to the three replica robots (robot1,2,3)
tested with the same controller parameters24. Each data point for robot1,2,3

represents the mean of 10 experiments and the error bars show the standard
deviation.

processes, and the task environment of small-scale, medical-oriented,

untethered soft robots.

2.2. Magnetic Actuation and Feedback System
Two distinct electromagnetic coil setups were designed for the studies

reported in this dissertation (Figure 5). The first coil setup consisted of

three orthogonal pairs of custom-made electromagnets (Figure 5a). It

could generate a 3D uniform magnetic field within a 4×4×4cm3

workspace with a maximum field strength of 15mT. The magnetic field

was indirectly controlled by regulating the voltage values through six

motor driver units (SyRen25) and an Arduino microcontroller. The


16

Arduino was responsible for processing analog readings from the current

sensors, regulating the voltage outputs of the motor drivers, and handling

all necessary communications. The robot's behavior was tracked by two

high-speed cameras (Basler aCa2040-90uc) (Figure 5b). The first camera,

running at 120 frames per second (fps), was placed orthogonally to the

robot's motion plane (yz-plane) and was used to detect and evaluate the

robot's locomotion mode. The second camera, operating at 60 fps,

provided an isometric view of the test area and measured the robot's

displacement following the perspective correction. At the end of every

experiment, the system automatically commanded the robot to return to

its starting position to minimize any human intervention and to avoid

damage to the robot while handling.

The second coil setup was built following the Helmholtz coil design to

enlarge the workspace with a uniform magnetic field to 13.1×8.5×4.5cm3

Figure 5 Electromagnetic coil setup. (a) Magnetic actuation setup with six
electromagnetic coils could generate a 3D uniform magnetic field within a
4×4×4cm3 workspace with a maximum value of 15mT. (b) Two high-speed
cameras with isometric and front views observed and evaluated the robot's motion
in real-time.


17

 (Figure 6a). Moreover, this design provided clear side, front, and top

views while allowing new extra imaging modalities, such as C-arm and

multispectral optoacoustic tomography device (MSOT 512-element

transducer, iThera Medical), for future studies. Unlike the first setup, the

magnetic field was directly controlled by modulating the currents on the

coils through six motor driver units (Maxon ESCON 70/10). To enhance

the achievable actuation frequency, an FPGA module (NI PXIe-7847R)

was used as the interface to control the motor drivers, receive current

readings, and communicate with the master PC. The robot's motion was

tracked with two high-speed cameras (Basler aCa2040-90uc) running at

120 fps (Figure 6d). The first camera with a top view measured the robot's

displacement, while the second one identified the robot's locomotion

mode. The top view of the test area, captured without obstruction or

perspective correction, expanded the range of experimental scenarios.

Moreover, to enlarge the workspace having a homogeneous 3D magnetic

field and to test the robot's performance for longer runs without reaching

the workspace's limits, we integrated a motorized linear stage with

150 mm stroke (Thorlabs LTS150C) to the y-axis of the experimental

setup. The linear stage continuously adjusted its position based on the

displacement information received from the imaging system to keep the

robot centered in the magnetic field.

The entire software architecture, such as learning algorithms, actuation

signal generation, and image processing codes, ran on a master PC.

Communication between different elements of the experimental system

(e.g., image capture and electric current control) was executed on ROS


18

architecture, which allowed the system to be scalable for further

extensions.

Moreover, the automated experimental platforms minimized the need for

human intervention during physical experiments, reducing disturbances

introduced by human interactions with the soft millirobot and the test

environment. This approach maintained the repeatability of the physical

learning experiments while avoiding human-induced alterations to the

soft millirobot's behavior.

Figure 6 Electromagnetic (Helmholtz) coil setup. (a) Helmholtz coil setup with
six electromagnetic coils could generate a 3D uniform magnetic field within a
13.1×8.5×4.5cm3 workspace with a maximum value of 12 mT. (b) Two high-
speed cameras with top and front views observed and evaluated the robot's motion
in real-time. The linear stage attached to the xy-plane enlarged the test area with
the homogeneous magnetic field on the y-axis.


19

2.3. Gaussian Processes
The performance of the magnetic soft millirobot, even when the actuation

signal is defined by a single variable (e.g. Figure 4), cannot be modeled

or predicted accurately due to its inherent complexity. Furthermore, soft

millirobots tend to exhibit variations in performance over time, primarily

due to material degradation. These factors limit the size of the physical

dataset that can be generated for analysis. To overcome the sparsity in the

dataset, to include uncertainties coming from the physical experiments,

and to make probabilistic predictions, GP was used to map the input

parameter set 𝜃 to scalar reward values 𝑅(𝜃), i.e., the stride length of the

magnetic soft millirobot in the event of locomotion optimization.

𝑅(𝜃)~𝐺𝑃 𝜇(𝜃), 𝑘(𝜃, 𝜃′) , (1)

where 𝑅(𝜃) is the reward function mapping the input parameter 𝜃 to

scalar reward values, 𝜇(𝜃) denotes the prior mean for the input parameter

𝜃, and 𝑘(𝜃, 𝜃′) is the kernel function defining the covariance between

𝑅(𝜃) and 𝑅(𝜃′) for 𝜃, 𝜃′ ∈ Θ, where Θ defines the complete search space

of input parameters. For the cases where the performance metric can only

be measured with noise, then the observed reward value 𝑅(𝜃) is defined

as follows,

𝑅(𝜃) = 𝑅(𝜃) + 𝑛, (2)

where 𝑛 stands for the zero-mean Gaussian noise with variance 𝜎𝑛
2 for

each measurement. At each iteration of the optimization run, the GP

model is updated with 𝑅(𝜃) as shown in Figure 7. Then, the expected


20

value of 𝑅(𝜃) can be predicted with variance for any given 𝜃 using the

test data 𝐷 = {𝜃𝑖, 𝑅(𝜃𝑖)}𝑖=1
𝑁 , where 𝑁 is the size of the dataset 𝐷, as

follows:

𝜇𝑝𝑜𝑠𝑡(𝜃) = 𝜇(𝜃) + 𝑘𝑇(𝜃)𝐾−1𝑦, (3)

𝜎𝑝𝑜𝑠𝑡
2 (𝜃) = 𝑘(𝜃, 𝜃) − 𝑘𝑇(𝜃)𝐾−1𝑘(𝜃), (4)

𝑅𝑝𝑜𝑠𝑡(𝜃)|𝐷~𝒩 𝜇𝑝𝑜𝑠𝑡(𝜃), 𝜎𝑝𝑜𝑠𝑡
2 (𝜃) , (5)

where 𝑘(𝜃), 𝑦 ∈ ℝ𝑁, and 𝐾 ∈ ℝ𝑁×𝑁 denote [𝑘(𝜃)]𝑖 = 𝑘(𝜃, 𝜃𝑖),

𝑦𝑖 = 𝑅(𝜃𝑖) − 𝜇(𝜃𝑖), and 𝐾𝑖,𝑗 = 𝑘 𝜃𝑖 , 𝜃𝑗 + 𝛿𝑖,𝑗𝜎𝑛
2 with Kronecker delta

𝛿𝑖,𝑗 , respectively.

Since the kernel encodes the prior assumptions on the approximated

reward function, such as the smoothness and periodicity of the signal, its

definition and hyperparameter settings directly affect the prediction

accuracy of the GP. Based on the results presented by Von Rohr et al.49,

squared exponential was selected as the kernel function, which is defined

for multi-dimensional cases as follows:

𝑘𝑆𝐸(𝜃, 𝜃′) = 𝜎𝑓
2 𝑒𝑥𝑝 − ∑ 𝜃𝑑−𝜃𝑑

′ 2

2𝑙𝑐,𝑑
2

𝑑𝑐
𝑑=1 , (6)

where 𝑙𝑐 ∈ ℝ𝑐
𝑑 is the length scale defining the rate of change in the

approximated function for each parameter space dimension and 𝜎𝑓
2 is the

signal variance describing the uncertainty in the predictions for

unobserved parameter sets75.


21

Figure 7 Sample learning run to maximize a function 𝒇(𝜽) using Bayesian
optimization (BO) with Gaussian processes (GPs). The left column shows the
posterior GP model as new data points are observed. The right column shows the
value of the acquisition function at the end of each iteration. The next parameter
set (𝜃𝑛𝑒𝑥𝑡) chosen by BO to be used in the next iteration is shown by orange mark
on the acquisition function plots.


22

As illustrated in Figure 8, hyperparameter selection has a vital role in the

prediction accuracy of GP. One crucial hyperparameter is the length

scale. While long length scales are used to model slowly-varying

functions, short length scales are used to model quickly-varying

functions, enabling the GP to capture rapid variations within the data

(Figure 8a). Another key hyperparameter is the signal variance (𝜎𝑓
2),

which influences the exploration-exploitation balance during the

optimization run. A higher value of 𝜎𝑓
2 encourages exploration by

increasing the prediction uncertainty, which can be useful for discovering

optimal solutions. Conversely, a lower value promotes exploitation

(Figure 8b). Noise in the observation (𝜎𝑛
2) is another important

hyperparameter controlling the impact of a new data point on the

posterior mean function (Figure 8c).

In addition to the kernel selection and hyperparameter definition, the prior

mean (𝜇(𝜃)) has an important effect on the performance of GP. While the

prior mean is often assumed to be constant and set to zero  (i.e., 𝜇 = 0)

in the context of maximization problems76, its selection significantly

impacts optimizing performance by controlling the exploration-

exploitation balance. To illustrate, a constant zero mean may cause the

optimizer to get stuck on local optima for a maximization problem with

large reward values by making it believe it has already found the best-

performing result. On the other hand, a constant non-zero mean may lead

the optimizer to keep exploring new regions in the search space even after

finding the global optimum by making it assume that it evaluated poor-

performing parameter sets.


23

Figure 8 Effect of the hyperparameters on the prediction accuracy of
Gaussian processes, (a) length scale (𝑙𝑐), (b) signal variance 𝜎𝑓 , and
(c) observation noise (𝜎𝑛).

Alternative to the constant prior mean approach (Figure 9a,c), the transfer

of information from previous learning runs is proposed in this thesis. In

this approach, the prior mean was set equal to the posterior mean of a

previously trained GP model (Figure 9b,d). It enabled the transfer of the

topology of the target function across different test scenarios, as long as

variations between the robots and the environments did not significantly

alter the function shape.


24

Figure 9 Gaussian process (GP) model representation of a function 𝒇(𝜽).
(a), (c) show the GP model prediction at the beginning and after completing 5
iterations of learning run found by setting the prior mean function to constant
mean. (b), (d) show the GP model prediction at the beginning and after
completing 5 iterations of learning run found by setting the prior mean function
to previously learned posterior mean defined by the training data points shown in
green markers.

2.4. Bayesian Optimization
BO was chosen as the optimization method for the studies conducted in

the scope of this thesis because it allows the maximization of a

performance function, i.e., the stride length of the magnetic soft

millirobot in the event of locomotion optimization, using a small number

of physical experiments42,43. At each iteration of the learning run, BO

selects the next parameter set (𝜃𝑛𝑒𝑥𝑡) maximizing the acquisition function

(𝛼𝑎𝑐𝑞(𝜃)) as follows:

𝜃𝑛𝑒𝑥𝑡 = argmax
𝜃∈𝛩

𝛼𝑎𝑐𝑞(𝜃), (7)


25

The acquisition function was selected as the expected improvement (EI)

due to its better performance compared to its alternatives in similar test

scenarios49. Additionally, the exploration-exploitation balance capability

made EI a particularly effective choice. EI is defined as,

𝛼𝑎𝑐𝑞(𝜃) = 𝔼 max 0, 𝑅(𝜃) − 𝑅(𝜃∗) , (8)

where 𝑅(𝜃∗) is the highest observed reward function value77. The

analytical solution for Eq. (8) is given as

𝛼𝑎𝑐𝑞(𝜃) = 𝜇(𝜃) − 𝑅(𝜃∗) − 𝜉 Φ(𝑍) + 𝜎(𝜃)𝜙(𝑍), (9)

where Φ and 𝜙 are the Gaussian cumulative density and probability

density functions, respectively78. 𝑍 is defined as

𝑍 = 𝑍(𝜃) = (𝜇(𝜃) − 𝑅(𝜃∗) − 𝜉)/𝜎(𝜃), with 𝜇(𝜃) and 𝜎(𝜃) are

calculated by Eq.s (3) and (4). The two terms in Eq. (9) represent the

exploitation and exploration weights of the BO, respectively. Their

balance is controlled by setting the hyperparameter 𝜉. As 𝜉 gets higher,

BO tends to choose the parameter set in unobserved regions of the search

space. BO focuses more on exploitation by testing parameters close to

already explored regions as 𝜉 gets lower.

A sample run of GP-BO maximizing a function 𝑓(𝜃) is given in Figure 7

to show the posterior GP models and calculated acquisition function

values at each iteration. The pseudocode showing the detailed flow of the

learning run using GP-BO is given in Algorithm 1.


26

Algorithm 1 Pseudo code for the Bayesian optimization with Gaussian
processes learning run.

Inputs: Search space, Θ
Prior mean function, 𝜇(𝜃)

Output: Best performing parameter set, 𝜃∗

𝑟𝑒𝑠𝑢𝑙𝑡𝐴𝑟𝑟𝑎𝑦 ← Initialize an empty array to store 𝑅(𝜃) and 𝜃
𝐺𝑃 ← Initialize GP with 𝜇(𝜃)
for 𝑖𝑡𝑒𝑟𝑎𝑡𝑖𝑜𝑛 < 𝑚𝑎𝑥𝐼𝑡𝑒𝑟𝑎𝑡𝑖𝑜𝑛 do

𝜇𝑝𝑜𝑠𝑡 , 𝜎𝑝𝑜𝑠𝑡 = 𝐺𝑃(𝜃)
𝛼𝑎𝑐𝑞 = 𝐸𝐼 𝜇𝑝𝑜𝑠𝑡 , σ𝑝𝑜𝑠𝑡

𝜃𝑛𝑒𝑥𝑡 = argmax
𝜃∈Θ

𝛼𝑎𝑐𝑞(𝜃)

𝑅(𝜃𝑛𝑒𝑥𝑡) ← Test 𝜃𝑛𝑒𝑥𝑡 by running experiment
𝐺𝑃 ← Update 𝐺𝑃 with 𝑅(𝜃𝑛𝑒𝑥𝑡)
𝑟𝑒𝑠𝑢𝑙𝑡𝐴𝑟𝑟𝑎𝑦 ← Add [𝜃𝑛𝑒𝑥𝑡, 𝑅] to 𝑟𝑒𝑠𝑢𝑙𝑡𝐴𝑟𝑟𝑎𝑦

end
𝜃∗ = argmax

𝜃∈Θ
(𝑟𝑒𝑠𝑢𝑙𝑡𝐴𝑟𝑟𝑎𝑦)

return 𝜃∗


27

2.5. Magnetic Soft Millirobot Simulation
Even though automated physical experiments can generate the required

data to optimize the robot's locomotion performance in different

conditions, as the search space enlarges and environmental topologies

substantially vary, running physical experiments becomes impractical
26,79. Therefore, a data-driven simulation environment was designed based

on the open-source software Voxelyze, as it could efficiently simulate

heterogeneous 3D rigid and soft bodies 80.

The dynamic behavior of the magnetic and nonmagnetic rigid and soft

materials is modeled as a mass-spring-damper system (Figure 10). Multi-

body interaction is modeled by adapting the contact mechanics of

Voxelyze, defining the interaction between the robot and the floor. To

model the magnetic actuation, first, the magnetic torque acting on each

voxel due to the external homogeneous field is calculated and then added

into dynamic equations. Magnetic torque 𝜏𝑡 acting on a voxel of the robot

at time step 𝑡 is calculated as follows.

𝜏𝑡 = 𝑚𝑡 × 𝐵𝑡 , (10)

𝑚𝑡 = 𝑀𝑟𝑑𝑣
3𝑅𝑡, (11)

where 𝐵𝑡 , 𝑀𝑟 and 𝑑𝑣 denote the homogeneous magnetic field at time 𝑡,

magnetic remanence, and voxel size, respectively. 𝑅𝑡 is the rotational

matrix defining the magnetic orientation of the voxel at time step 𝑡

(Figure 10d).


28

Figure 10 Simulation parameters of the data-driven magnetic soft millirobot
simulation. Schematic representation of the magnetic soft millirobot simulation
by mass-spring-damper model between (a) colliding voxels, (b) connected voxels,
(c) voxel and the surrounding, and (d) magnetic torque acting on a voxel due to
external magnetic field 𝐵.

For all the simulations, the voxel size (𝑑𝑣) was set to 185𝜇𝑚. Density

(𝜌), Young's modulus (𝐸), and magnetic remanence (𝑀𝑟) values for the

magnetic soft millirobot were taken from 24 as 1.86𝑔/𝑐𝑚3, 8.45𝑘𝑃𝑎, and

62𝑘𝐴/𝑚, respectively. The Poisson's ratio is assumed to be 0.49.

Simulated results for walking, rolling, and crawling locomotions of the

magnetic soft millirobot actuated by the same control signals in Figure 2

are shown in Figure 11.


29

Figure 11 Simulated locomotion modes of the magnetic soft millirobot.
Simulated performance of the magnetic soft millirobot during (a) walking,
(b) rolling, and (c) crawling locomotion modes generated by manually designed
actuation signals24.


30

2.6. Robot Tracking by Electrical Impedance

Tomography
EIT is proposed as an alternative solution to the camera-based

localization of the magnetic soft millirobot, as it can be used in

nontransparent working environments.

EIT estimates the internal structure of the measured object or the

environment by processing the emitted and measured electrical signals,

i.e., voltages or currents, applied through multiple electrodes attached to

the surface. The forward problem focuses on modeling these electrical

measurements at any position on the domain's boundary using the internal

conductivity distribution 𝜎 and the stimulating current injection. It can be

solved by numerical methods, such as FEM. Conversely, the inverse

problem aims to reconstruct the conductivity distribution via the electrical

measurements and needs to be solved to detect the robot's position

(Figure 12). However, the inverse problem is particularly complex due to

nonuniqueness and the instability of the solution81. Thus, the solver

developed by Adler et al.81 employing a spatiotemporal resolution

approach is used with the open-source software EIDORS82.

To stimulate and collect the necessary electrical measurements from the

environment, a circular test area surrounded by eight evenly distributed

electrodes was designed (Figure 13b). Although the number of electrodes

could vary, it was determined by the maximum number of analog input

channels available in the data acquisition system (PXIe-7847R, National

Instruments). The electrodes were excited and measured following a full-

scan stimulation pattern involving a single current source and a sink


31

(Figure 13e). As a result, 28 different stimulation combination was

generated in the eight-electrode EIT system, and they were given to the

inverse problem solver as input data to reconstruct the conductivity map

of the environment.

Figure 12 Forward and inverse problem definitions for EIT. An electrical
stimulation was applied through the boundary of the measured domain, and the
resulting voltages were measured. The forward problem provides the
measurements from the knowledge of the medium properties, while the inverse
problem estimates the medium properties from the measurements.


32

Figure 13 Experimental setup and the electrical stimulation pattern used for
the EIT experiments. (a) Image of the magnetic actuation system
(Figure 5a) with the test area surrounded by the eight electrodes of the EIT system
and a high-speed camera to generate the ground truth data. (b) Top-view image
and drawing of the test area showing the electrode positions and a sample robot
in the test fluid. (c) Uniform magnetization profile of the magnetic small-scale
robot along its body (blue arrows). (d) Sample movement of the magnetic small-
scale robot by applying a rotating magnetic field 𝐵 shown by red arrows.
(e) Illustration of the full-scan EIT stimulation scheme. All electrode pairs were
successively selected as driving electrodes.


33

3. Conclusion and Outlook

3.1. Conclusion
This dissertation represents a comprehensive exploration of data-efficient

adaptive locomotion for soft millirobots demonstrated on a sheet-shaped

elastomeric magnetic soft millirobot with the reported publications. The

first paper in Appendix A, "Learning of Sub-optimal Gait Controllers for

Magnetic Walking Soft Millirobots", showed the performance variations

in magnetic soft millirobots by repeated tests. It proposed using GP-BO

to adapt to the variabilities inherent in soft robots, even in the absence of

model-based control.

Next, the second paper in Appendix B, "Task Space Adaptations via the

Learning of Magnetic Soft Millirobots", demonstrated the controller

parameters learning for the walking gait of the magnetic soft millirobot

by GP-BO. To this end, a benchmark dataset consisting of 3750

experimental results was generated through exhaustive grid search, and

the effectiveness of the prior mean transfer in GP-BO was shown by

running 104 augmented tests. The results highlighted that transferring

prior mean information improved the learning performance by reducing

the required experiments to find parameter sets generating walking

locomotion. The performance of the proposed transfer learning approach

was tested further in various task spaces with changing surface adhesion,

surface roughness, and medium viscosity. Apart from the proposed

learning approach, the electromagnetic coil setup built in the scope of this

study has allowed the running of automated experiments requiring


34

minimized human intervention, making running repeated experiments

with lower performance variations possible.

The third paper in Appendix C, "Learning Soft Millirobot Multimodal

Locomotion with Sim-to-Real Transfer", introduced sim-to-real transfer

learning to improve the optimization performance in larger search spaces.

In this regard, a data-driven soft millirobot simulation was designed.

Using the previously presented transfer learning approach to bridge the

gap between simulation and real-world performance, the efficacy of a

priori knowledge generation in the simulation and performing sim-to-real

transfer learning was demonstrated in various environments and robots.

Moreover, an autonomous environment identification method was

proposed by matching experimental results to simulated data. Its

performance was demonstrated with the autonomous locomotion

adaptation of the robot in previously not encountered conditions.

The fourth paper in Appendix D, "A Localization Method for Untethered

Small-Scale Robots Using Electrical Impedance Tomography",

demonstrated an alternative solution to track the position of the magnetic

soft millirobot using EIT, which was shown to operate without interfering

with the magnetic actuation system.

To summarize, this thesis presents a holistic approach to advancing the

field of soft millirobots, encompassing adaptive control, simulation, and

alternative tracking methods. The findings and proposed methodologies

contribute to the data-efficient, adaptive, and versatile locomotion

capabilities of soft millirobots.


35

3.2. Future Work
The research presented in this dissertation provides a solid foundation for

future explorations and advancements in the field of soft millirobots.

While ongoing research into small-scale fabrication techniques may

address reproducibility concerns83–85, the inherent variability and the

performance decrease over prolonged use due to material degradations in

soft millirobots underscore the continued need for adaptive control.

The learning method and simulation presented in this thesis are

demonstrated on a specific sheet-shaped magnetic soft millirobot, but

they can be applied to different soft robot shapes and materials25 and

various tasks in more complex environments, such as climbing, path

following, and velocity control in 3D confined spaces15. The adaptability

of these approaches opens up exciting possibilities toward the real-world

applications of these robots, especially with the sim-to-real transfer

learning method. Moreover, the developed simulation environment

enables the exploration of alternative control and adaptation approaches.

Furthermore, the successful application of sim-to-real transfer learning

introduces the potential to plan robotic operations beforehand, such as in

medical procedures.

The effectiveness of sim-to-real transfer learning and adaptive control

methods based on BO has been validated in various test cases; however,

these methods may face challenges in dynamic environments with

momentary changes owing to the episodic nature of BO. One potential

solution would be using continuous control algorithms, such as deep

reinforcement learning (DRL)55,86,87. However, these algorithms,


36

especially the neural network-based ones, require a larger training

datasets. Therefore, as the next step, the simulation environment will be

updated to integrate adhesion mechanics and fluidic interactions. This

will enable modeling the robot's dynamics in scenarios involving sticky

surfaces, such as biological tissues covered by mucus, and diverse fluid

environments, such as blood. Moreover, simulation speed will be

improved to explore planning and new continuous control algorithms.

This can be achieved by limiting the update in the simulation to a local

frame where the robot is located, thereby omitting unnecessary

calculations in the rest of the simulated environment. The possibility of

replacing the simulation environment with a deep neural network model

will also be explored88.

In addition to adaptive control, the GP-BO-based learning method can be

further used to design the robots, i.e., their morphological and magnetic

properties, and to learn actuation signals for a given task without

requiring physical experiments55,87,89. Furthermore, the domain

identification methods introduced in this research can be used for

enhancing localization accuracy and mapping complex environments.

As the field of soft millirobots continues to evolve, these future research

directions will play a pivotal role in enhancing their adaptability and

expanding their applicability across various domains.


37

Bibliography
1. Majidi, C. Soft Robotics: A Perspective - Current Trends and

Prospects for the Future. Soft Robot 1, 5–11 (2014).

2. Polygerinos, P. et al. Soft Robotics: Review of Fluid-Driven
Intrinsically Soft Devices; Manufacturing, Sensing, Control, and
Applications in Human-Robot Interaction. Adv Eng Mater 19,
1700016 (2017).

3. Haddadin, S., De Luca, A. & Albu-Schäffer, A. Robot collisions:
A survey on detection, isolation, and identification. IEEE
Transactions on Robotics 33, 1292–1312 (2017).

4. Hughes, J. et al. Soft manipulators and grippers: A review.
Frontiers Robotics AI 3, 1–12 (2016).

5. Gu, G. et al. A soft neuroprosthetic hand providing simultaneous
myoelectric control and tactile feedback. Nature Biomedical
Engineering 2021 7:4 7, 589–598 (2021).

6. Yin, J. et al. Wearable Soft Technologies for Haptic Sensing and
Feedback. Adv Funct Mater 31, 2007428 (2021).

7. Li, M., Pal, A., Aghakhani, A., Pena-Francesch, A. & Sitti, M.
Soft actuators for real-world applications. Nature Reviews
Materials 2021 7:3 7, 235–249 (2021).

8. Reeder, J. T. et al. Soft, bioresorbable coolers for reversible
conduction block of peripheral nerves. Science (1979) 377, 109–
115 (2022).

9. Cianchetti, M., Laschi, C., Menciassi, A. & Dario, P. Biomedical
applications of soft robotics. Nature Reviews Materials 2018 3:6
3, 143–153 (2018).


38

10. Hines, L., Petersen, K., Lum, G. Z. & Sitti, M. Soft Actuators for
Small-Scale Robotics. Advanced Materials 29, (2017).

11. Wang, B., Kostarelos, K., Nelson, B. J. & Zhang, L. Trends in
Micro-/Nanorobotics: Materials Development, Actuation,
Localization, and System Integration for Biomedical
Applications. Advanced Materials 33, (2021).

12. Sitti, M. Miniature soft robots — road to the clinic. Nature
Reviews Materials 2018 3:6 3, 74–75 (2018).

13. Wang, T. et al. Adaptive wireless millirobotic locomotion into
distal vasculature. Nat Commun 13, 1–17 (2022).

14. Soon, R. H. et al. Pangolin-inspired untethered magnetic robot
for on-demand biomedical heating applications. Nat Commun 14,
1–15 (2023).

15. Wu, Y., Dong, X., Kim, J. K., Wang, C. & Sitti, M. Wireless soft
millirobots for climbing three-dimensional surfaces in confined
spaces. Sci Adv 8, (2022).

16. Ren, Z. et al. Soft-bodied adaptive multimodal locomotion
strategies in fluid-filled confined spaces. Sci Adv 7, (2021).

17. Zheng, Z. et al. Electrodeposited Superhydrophilic-
Superhydrophobic Composites for Untethered Multi-Stimuli-
Responsive Soft Millirobots. Advanced Science 10, (2023).

18. Pilz Da Cunha, M., Debije, M. G. & Schenning, A. P. H. J.
Bioinspired light-driven soft robots based on liquid crystal
polymers. Chem Soc Rev 49, 6568–6578 (2020).

19. Li, G. et al. Self-powered soft robot in the Mariana Trench.
Nature 591, 66–71 (2021).


39

20. Son, D., Gilbert, H. & Sitti, M. Magnetically Actuated Soft
Capsule Endoscope for Fine-Needle Biopsy. Soft Robot 7, 10–21
(2020).

21. Sitti, M. & Wiersma, D. S. Pros and Cons: Magnetic versus
Optical Microrobots. Advanced Materials 32, (2020).

22. Demir, S. O. et al. Task space adaptation via the learning of gait
controllers of magnetic soft millirobots. International Journal of
Robotics Research 40, 1331–1351 (2021).

23. Daguerre, H. et al. A Localization Method for Untethered Small-
Scale Robots Using Electrical Impedance Tomography.
IEEE/ASME Transactions on Mechatronics 27, 3506–3516
(2022).

24. Hu, W., Lum, G. Z., Mastrangeli, M. & Sitti, M. Small-scale soft-
bodied robot with multimodal locomotion. Nature 554, 81–85
(2018).

25. Zheng, Z. et al. Programmable aniso-electrodeposited modular
hydrogel microrobots. Sci Adv 8, (2022).

26. Culha, U., Demir, S. O., Trimpe, S. & Sitti, M. Learning of Sub-
optimal Gait Controllers for Magnetic Walking Soft Millirobots.
in Robotics: Science and Systems (2020).
doi:10.15607/RSS.2020.XVI.070.

27. Rus, D. & Tolley, M. T. Design, fabrication and control of soft
robots. Nature 521, 467–475 (2015).

28. Della Santina, C., Bicchi, A. & Rus, D. On an Improved State
Parametrization for Soft Robots with Piecewise Constant
Curvature and Its Use in Model Based Control. IEEE Robot
Autom Lett 5, 1001–1008 (2020).


40

29. Webster, R. J. & Jones, B. A. Design and kinematic modeling of
constant curvature continuum robots: A review. International
Journal of Robotics Research 29, 1661–1683 (2010).

30. Cianchetti, M., Renda, F., Giorelli, M., Laschi, C. & Calisti, M.
Dynamic Model of a Multibending Soft Robot Arm Driven by
Cables. IEEE Transactions on Robotics 30, 1109–1122 (2014).

31. Renda, F., Boyer, F., Dias, J. & Seneviratne, L. Discrete Cosserat
Approach for Multisection Soft Manipulator Dynamics. IEEE
Transactions on Robotics 34, 1518–1533 (2018).

32. Chenevier, J., González, D., Aguado, J. V., Chinesta, F. & Cueto,
E. Reduced-order modeling of soft robots. PLoS One 13, (2018).

33. Goury, O. & Duriez, C. Fast, Generic, and Reliable Control and
Simulation of Soft Robots Using Model Order Reduction. IEEE
Transactions on Robotics 34, 1565–1576 (2018).

34. Huang, W., Huang, X., Majidi, C. & Jawed, M. K. Dynamic
simulation of articulated soft robots. Nature Communications
2020 11:1 11, 1–9 (2020).

35. George Thuruthel, T., Ansari, Y., Falotico, E. & Laschi, C.
Control Strategies for Soft Robotic Manipulators: A Survey. Soft
Robot 5, 149–163 (2018).

36. Rich, S. I., Wood, R. J. & Majidi, C. Untethered soft robotics.
Nature Electronics 2018 1:2 1, 102–112 (2018).

37. Kim, Y., Parada, G. A., Liu, S. & Zhao, X. Ferromagnetic Soft
Continuum Robots. Sci. Robot vol. 4 https://www.science.org
(2019).

38. Chin, K., Hellebrekers, T. & Majidi, C. Machine Learning for
Soft Robotic Sensing and Control. Advanced Intelligent Systems
2, 1900171 (2020).


41

39. Hyatt, P., Wingate, D. & Killpack, M. D. Model-based control of
soft actuators using learned non-linear discrete-time models.
Frontiers Robotics AI 6, (2019).

40. George Thuruthel, T., Shih, B., Laschi, C. & Thomas Tolley, M.
Soft Robot Perception Using Embedded Soft Sensors and
Recurrent Neural Networks. Sci. Robot vol. 4
https://www.science.org (2019).

41. Bern, J. M., Schnider, Y., Banzet, P., Kumar, N. & Coros, S. Soft
Robot Control with a Learned Differentiable Model. 2020 3rd
IEEE International Conference on Soft Robotics, RoboSoft 2020
417–423 (2020) doi:10.1109/ROBOSOFT48309.2020.9116011.

42. Ghahramani, Z. Probabilistic Machine Learning and Artificial
Intelligence. Nature on vol. 27 (2015).

43. Shahriari, B., Swersky, K., Wang, Z., Adams, R. P. & De Freitas,
N. Taking the human out of the loop: A review of Bayesian
optimization. Proceedings of the IEEE vol. 104 148–175 Preprint
at https://doi.org/10.1109/JPROC.2015.2494218 (2016).

44. Rasmussen, C. Edward. & Williams, C. K. I. Gaussian Processes
for Machine Learning. (MIT Press, 2006).

45. Calandra, R., Seyfarth, A., Peters, J. & Deisenroth, M. P.
Bayesian optimization for learning gaits under uncertainty: An
experimental comparison on a dynamic bipedal walker. Ann
Math Artif Intell 76, 5–23 (2016).

46. Liao, T. et al. Data-efficient learning of morphology and
controller for a microrobot. Proc IEEE Int Conf Robot Autom
2019-May, 2488–2494 (2019).

47. Marco, A., Hennig, P., Bohg, J., Schaal, S. & Trimpe, S.
Automatic LQR tuning based on Gaussian process global


42

optimization. Proc IEEE Int Conf Robot Autom 2016-June, 270–
277 (2016).

48. Yang, B. et al. Learning flexible and reusable locomotion
primitives for a microrobot. IEEE Robot Autom Lett 3, 1904–
1911 (2018).

49. Von Rohr, A., Trimpe, S., Marco, A., Fischer, P. & Palagi, S.
Gait Learning for Soft Microrobots Controlled by Light Fields.
in 2018 IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS) 6199–6206 (2018).
doi:10.1109/IROS.2018.8594092.

50. Kaufmann, E. et al. Champion-level drone racing using deep
reinforcement learning. Nature 2023 620:7976 620, 982–987
(2023).

51. Song, Y., Romero, A., Müller, M., Koltun, V. & Scaramuzza, D.
Reaching the limit in autonomous racing: Optimal control versus
reinforcement learning. Sci Robot 8, eadg1462 (2023).

52. Bongard, J., Zykov, V. & Lipson, H. Resilient machines through
continuous self-modeling. Science (1979) 314, 1118–1121
(2006).

53. Hwangbo, J. et al. Learning agile and dynamic motor skills for
legged robots. Sci Robot 4, (2019).

54. Lee, Y. H. et al. Whole-Body Control and Angular Momentum
Regulation using Torque Sensors for Quadrupedal Robots. J
Intell Robot Syst 102, 1–15 (2021).

55. Choi, S. et al. Learning quadrupedal locomotion on deformable
terrain. Sci Robot 8, (2023).

56. Yang, Y. et al. Data Efficient Reinforcement Learning for
Legged Robots. Proc Mach Learn Res 100, 1–10 (2019).


43

57. Mahler, J. et al. Learning ambidextrous robot grasping policies.
Sci Robot 4, (2019).

58. Lázaro-Gredilla, M., Lin, D., Swaroop Guntupalli, J. & George,
D. Beyond imitation: Zero-shot task transfer on robots by
learning concepts as cognitive programs. Sci Robot 4, (2019).

59. Fazeli, N. et al. See, feel, act: Hierarchical learning for complex
manipulation skills with multisensory fusion. Sci Robot 4, (2019).

60. Ficuciello, F., Migliozzi, A., Laudante, G., Falco, P. & Siciliano,
B. Vision-based grasp learning of an anthropomorphic hand-arm
system in a synergy-based control framework. Sci Robot 4,
(2019).

61. Hiller, J. & Lipson, H. Dynamic Simulation of Soft Multimaterial
3D-Printed Objects. https://home.liebertpub.com/soro 1, 88–101
(2014).

62. Shah, D. S. et al. A soft robot that adapts to environments through
shape change. Nat Mach Intell 3, 51–59 (2020).

63. Tiryaki, M. E., Demir, S. O. & Sitti, M. Deep Learning-based 3D
Magnetic Microrobot Tracking using 2D MR Images. IEEE
Robot Autom Lett 7, 6982–6989 (2022).

64. Pane, S., Iacovacci, V., Sinibaldi, E. & Menciassi, A. Real-time
imaging and tracking of microrobots in tissues using ultrasound
phase analysis. Appl Phys Lett 118, 26 (2021).

65. Yan, X. et al. Multifunctional biohybrid magnetite microrobots
for imaging-guided therapy. Sci Robot 2, (2017).

66. Iacovacci, V. et al. High-Resolution SPECT Imaging of Stimuli-
Responsive Soft Microrobots. Small 15, (2019).

67. Reale, R., De Ninno, A., Businaro, L., Bisegna, P. & Caselli, F.
High-throughput electrical position detection of single flowing


44

particles/cells with non-spherical shape. Lab Chip 19, 1818–1827
(2019).

68. Masner, A., Blasina, F. & Simini, F. Electrical impedance
tomography for neonatal ventilation assessment: A narrative
review. in Journal of Physics: Conference Series vol. 1272
(Institute of Physics Publishing, 2019).

69. Rapin, M. et al. Wearable Sensors for Frequency-Multiplexed
EIT and Multilead ECG Data Acquisition. IEEE Trans Biomed
Eng 66, 810–820 (2019).

70. Shi, X. et al. High-precision electrical impedance tomography
data acquisition system for brain imaging. IEEE Sens J 18, 5974–
5984 (2018).

71. Darnajou, M. et al. High Speed EIT With Multifrequency
Excitation Using FPGA and Response Analysis Using FDM.
IEEE Sens J 20, 8698–8710 (2020).

72. Zhang, Y., Xiao, R. & Harrison, C. Advancing hand gesture
recognition with high resolution electrical impedance
tomography. in UIST 2016 - Proceedings of the 29th Annual
Symposium on User Interface Software and Technology 843–850
(Association for Computing Machinery, Inc, 2016).
doi:10.1145/2984511.2984574.

73. Demir, S. O., Tiryaki, M. E., Karacakol, A. C. & Sitti, M.
Learning Soft Millirobot Multimodal Locomotion with Sim‐to‐
Real Transfer. Advanced Science (2024)
doi:10.1002/advs.202308881.

74. Wang, C., Wu, Y., Dong, X., Armacki, M. & Sitti, M. In situ
sensing physiological properties of biological tissues using
wireless miniature soft robots. Sci Adv 9, (2023).


45

75. Duvenaud, D., Nickisch, H. & Rasmussen, C. E. Additive
Gaussian Processes. in Proceedings of the 24th International
Conference on Neural Information Processing Systems 226–234
(Curran Associates Inc., Granada, Spain, 2011).
doi:10.5555/2986459.2986485.

76. Chen, Z. & Wang, B. How priors of initial hyperparameters affect
Gaussian process regression models. Neurocomputing 275,
1702–1710 (2018).

77. Jones, D. R., Schonlau, M. & Welch, W. J. Efficient Global
Optimization of Expensive Black-Box Functions. Journal of
Global Optimization 13, 455–492 (1998).

78. Brochu, E., Cora, V. M. & de Freitas, N. A Tutorial on Bayesian
Optimization of Expensive Cost Functions, with Application to
Active User Modeling and Hierarchical Reinforcement Learning.
(2010).

79. Ibarz, J. et al. How to train your robot with deep reinforcement
learning: lessons we have learned. International Journal of
Robotics Research 40, 698–721 (2021).

80. Hiller, J. & Lipson, H. Dynamic Simulation of Soft Multimaterial
3D-Printed Objects. Soft Robot 1, 88–101 (2014).

81. Silvera-Tawil, D., Rye, D., Soleimani, M. & Velonaki, M.
Electrical impedance tomography for artificial sensitive robotic
skin: A review. IEEE Sens J 15, 2001–2016 (2015).

82. Adler, A. & Lionheart, W. R. B. Uses and abuses of EIDORS: an
extensible software base for EIT. Physiol Meas 27, (2006).

83. Kim, Y., Yuk, H., Zhao, R., Chester, S. A. & Zhao, X. Printing
ferromagnetic domains for untethered fast-transforming soft
materials. Nature 558, 274–279 (2018).


46

84. Xu, T., Zhang, J., Salehizadeh, M., Onaizah, O. & Diller, E.
Millimeter-scale flexible robots with programmable three-
dimensional magnetization and motions. Sci Robot 4, (2019).

85. Alapan, Y., Karacakol, A. C., Guzelhan, S. N., Isik, I. & Sitti, M.
Reprogrammable shape morphing of magnetic soft machines. Sci
Adv 6, 6414–6432 (2020).

86. Heess, N. et al. Emergence of Locomotion Behaviours in Rich
Environments. arXiv preprint arXiv:1707.02286 (2017)
doi:10.48550/arXiv.1707.02286.

87. Yao, J. et al. Adaptive Actuation of Magnetic Soft Robots Using
Deep Reinforcement Learning. Advanced Intelligent Systems 5,
(2023).

88. Tariverdi, A. et al. A Recurrent Neural-Network-Based Real-
Time Dynamic Model for Soft Continuum Manipulators. Front
Robot AI 8, 631303 (2021).

89. Schaff, C., Sedal, A. & Walter, M. R. Soft Robots Learn to Crawl:
Jointly Optimizing Design and Control with Sim-to-Real
Transfer. in Robotics: Science and Systems (2022).


47

Publication 1

Learning of Sub-optimal Gait Controllers for

Magnetic Walking Soft Millirobots


49

Abstract

Untethered small-scale soft robots have promising applications in

minimally invasive surgery, targeted drug delivery, and bioengineering

applications as they can access confined spaces in the human body.

However, due to highly nonlinear soft continuum deformation

kinematics, inherent stochastic variability during fabrication at the small

scale, and lack of accurate models, the conventional control methods

cannot be easily applied. Adaptivity of robot control is additionally

crucial for medical operations, as operation environments show large

variability, and robot materials may degrade or change over time, which

would have deteriorating effects on the robot motion and task

performance. Therefore, we propose using a probabilistic learning

approach for millimeter-scale magnetic walking soft robots using

Bayesian optimization (BO) and Gaussian processes (GPs). Our approach

provides a data-efficient learning scheme to find controller parameters

while optimizing the stride length performance of the walking soft

millirobot robot within a small number of physical experiments. We

demonstrate adaptation to fabrication variabilities in three different

robots and to walking surfaces with different roughness. We also show

an improvement in the learning performance by transferring the learning

results of one robot to the others as prior information.

I. Introduction

Soft-bodied robots are composed of functional soft materials exhibiting

shape-programmable properties that allow passive/active structural

compliance and large degrees of freedom, which are hard to achieve using


50

conventional rigid materials [16]. The research on soft robots is getting

more attention owing to easier access to novel fabrication methods and

functional materials, and potential high-impact medical and other

applications [25]. Biologically inspired soft robots can be used to study

their soft-bodied biological counterparts [17], and open new application

areas in multi-terrain locomotion [4], adaptive manipulation [15, 28], and

human-assistive wearable systems [41]. Soft robots also enable safe

human-robot physical interaction due to their high compliance and

limited output force, which normally require additional computational

effort in conventional robotic systems [12]. Small-scale (i.e., millimeter)

untethered soft robots have further potential usage in medicine owing to

their ability to access to enclosed small spaces non-invasively [26, 35]

and the embodiment of functionalized materials enabling targeted drug

delivery and bio-sensing [5].

Despite their potential, the virtual infinite degrees of freedom, the lack of

accurate models, fabrication variations, and non-linear behavior (e.g.,

hysteresis) render the application of conventional control methods

challenging for soft robots [32]. So far, constant curvature (CC) models

utilizing bending beam theories have been widely-used to approximately

represent the deformation of continuum robots [42]. Alternatively,

analytical and geometrically exact models have been suggested for

continuum robots that are represented as simplified rods [29]. Finite

element methods (FEM) provide numerical solutions to soft robot

kinematics by utilizing a chain of rigid elements connected with tunable

spring-damper mechanisms [21]. These kinematic models allow the

implementation of static and dynamic controllers for continuum robots


51

on a larger scale [9]. However, these controllers typically depend on the

continuous sensing of body deformations from embedded sensors and

computationally heavy model solutions, which are conditions that may

not be met for untethered soft robots at the small scales [30]. The dynamic

task environment, complex deformation kinematics, fabrication-

dependent performance variations, and actuation/sensing limitations have

further impacts on the soft mobile robots targeting medical applications,

which make adaptive and data-efficient control methods attractive for

these robots [36].

In the case of uncertainty and lack of a parametric model that represents

the system, data-driven control [13] and reinforcement learning [38, 19]

provide promising alternatives over model-based designs in small-scale

soft robotic systems. However, the need for data efficiency, i.e., the

ability to learn from only a few experimental trials, presents a core

challenge for such methods [6]. Conversely, Bayesian optimization (BO)

[10, 34] allows for the maximization of a performance function using a

small number of physical experiments. BO typically employs Gaussian

processes (GPs) [27] as a probabilistic model of the latent objective

function. While no explicit dynamics model is needed, GPs allow for

incorporating information as probabilistic priors, thus reducing data

requirements. There are emerging examples that demonstrate the

application of this approach to optimize the locomotion performance of

robots on different length scales [3, 44, 22]. Despite its potential to

address the control challenge for untethered soft robots, there are only a

few examples that apply this method such as in the gait exploration of a

tensegrity system [31], and the optimization of an undulating motion of a


52

microrobot [40]. So far, a data-efficient procedure that adapts the learned

controllers to different robots and environmental conditions for

untethered small-scale soft robots has not been demonstrated.

In this paper, we propose a learning procedure to find the controller

parameters of magnetically actuated, untethered, soft millirobots (see

Fig. 1) that generate optimum walking gaits within a small number of

physical experiments. We specifically focus on these types of robots due

to their biocompatible use of external magnetic actuation that supports

multi-functionality in future medical tasks [14, 23] and the high-

resolution magnetization methods that allow more complex deformation

capabilities at the small scale [24, 7]. We produce three replicas of a

previously reported soft millirobot demonstrating a hand-tuned walking

gait aiming for medical applications [14] and test the repeatability of their

results. We begin with finding the optimum walking gait controllers of

our robots using BO with GPs; initially without any prior information

about the correlation between the controller parameters and the robot

performance. Later, we explore the controller parameter space of one

robot and present a straightforward way in the context of BO to transfer

prior information from the first robot to all three robots while finding their

optimum walking gait controllers in a data-efficient fashion. We report

the optimum controllers and walking gait performances in terms of

achieved stride lengths for all three robots and compare the two learning

approaches (i.e., with and without using prior information). We also

transfer this information to adapt to the changes in the task environment

by finding the controller parameters for walking on rough surfaces.


53

Figure 1. (a) The magneto-elastomer robot is rolled around a jig and magnetized
with 𝐵 = 1.8𝑇 field (red arrow) with a 45° angle with respect to the y-axis. The
unfolded robot maintains a circular magnetization profile along its body (blue
arrows). (b) Photo of the experimental setup with 6 electromagnetic coils and a
high-speed camera. (c) Image of the fabricated and magnetized soft millirobot.
(d) Projected planar images showing the four consecutive states of the robot
walking gait: (1) relaxed, (2) front-stance, (3) double-stance, and (4) back-stance.
These images are placed with a separation on the y-axis for visual clarity, i.e., the
robot does not jump in between states during the experiments. Numbers represent
the four states.

The organization of this paper is as follows. We describe the robot design

and its walking gaits in Section II and introduce our learning approach in

Section III. Section IV presents the experiments on learning of the

optimum gait controllers without using the prior information, the

generation of the prior controller information from one robot, and the

optimization of the walking gaits by transferring the learned controllers

to three robots and different locomotion surfaces. We discuss the

experimental results in Section V and conclude our work in Section VI.


54

II. Robot Design and Gait Definition

We followed the methods and materials reported in [14] and fabricated

three magnetic soft millirobots with a 1:1 body mass ratio of Ecoflex

00-10 (Smooth-On Inc.) and neodymium-iron-boron (NdFeB) magnetic

microparticles with around 5𝜇𝑚 diameter (MQP-15-7, Magnequench).

We placed this pre-polymer mixture on a methyl methacrylate plate and

cut the robots out of the cast using a high-resolution laser cutter (LPKF

Protolaser U4) after the polymer is cured. Our robots had the final

dimensions of length 𝐿 = 3.7𝑚𝑚, width 𝑤 = 1.5𝑚𝑚, and height

ℎ = 185𝜇𝑚 as shown in Fig. 1-a. We separately folded the robots around

a circular jig with a circumference equal to L and magnetized them within

a magnetic field with a magnitude of 1.8𝑇 and orientation of

45° measured counterclockwise from the y-axis. Once the robots are

unfolded from the jig, the magnetic particles maintained their

magnetization orientation forming a circular profile along the

longitudinal axis of the robot body (Fig. 1-a). We used these robots  (i.e.,

robots 1, 2, and 3) with the same nominal material properties and

dimensions for our experiments (see Fig. 1-c for a sample robot image).

The walking gait of our robot is composed of four consecutive quasi-

static states that are inspired by the planar quadrupedal bounding [1] and

a caterpillar's inching motion [39]. These states are depicted as

(1) relaxed, (2) front-stance, (3) double-stance, and (4) back-stance as

shown in Fig. 1-d. We placed our magnetized robot along the y-axis of


55

Figure 2. (a) Walking gait control parameters during a single period of a sample
motion (a single period of 1 𝑓⁄ = 90𝑚𝑠 for 𝑓 = 11𝐻𝑧 is normalized to 0-1 on
the abscissa). The magnetic field 𝐵 is controlled on the y-z plane and shown with
its y and z components (top) whose magnitude (middle) reaches 𝐵𝑚𝑎𝑥 and
orientation (bottom) changes from 𝛼1 to 𝛼2. Dashed vertical lines represent the
(1) relaxed, (2) front-stance, (3) double-stance, and (4) back-stance states of the
walking gait. (b) The stride length 𝑆 performance of the previously reported robot
(𝑟𝑜𝑏𝑜𝑡𝑅) vs. the performance of three replica robots using the same controller
parameters (𝑟𝑜𝑏𝑜𝑡1,2,3). Each data point for each robot (𝑟𝑜𝑏𝑜𝑡1,2,3) represents the
mean of 10 experiments and the error bars show the standard deviation. The
performance of 𝑟𝑜𝑏𝑜𝑡𝑅 and the horizontal dashed line which represents the model
prediction are adapted from [14].

the magnetic coil setup consisting of three orthogonal pairs of custom-

made electromagnets (Fig. 1-b) that generated a 3-D uniform magnetic

field within a 4×4×4cm3 space. We modulated the magnetic field on the

y-z plane that coincided with the center of the test environment. We

controlled four parameters to generate the walking gait: the maximum

magnetic field magnitude (𝐵𝑚𝑎𝑥), the frequency of the actuation cycle


56

(𝑓), and two magnetic field orientation angles (𝛼1 and 𝛼2) measured

counterclockwise from the y-axis. The plots in Fig. 2-a show the change

of the control parameters during a single period of the motion for

𝐵𝑚𝑎𝑥 = 10𝑚𝑇, 𝑓 = 11𝐻𝑧, 𝛼1 = 30° and 𝛼2 = 60°, which are hand-

tuned parameters reported in [14]. At the beginning of a single gait period,

the robot started at a relaxed state for 0 ≤ 𝐵 ≤ 4𝑚𝑇. The robot tilted

forward when 𝛼 = 𝛼1 and 𝐵 increased from 4𝑚𝑇 to 𝐵𝑚𝑎𝑥 = 10𝑚𝑇.

While 𝐵 remained constant at 𝐵𝑚𝑎𝑥, the orientation of the magnetic field

changed from 𝛼1 to 𝛼2 causing the robot to initially switch to the double-

stance state and then to the back-stance state when 𝛼 = 𝛼2. Then, 𝐵

decreased while keeping the orientation of the magnetic field constant,

and the robot gradually switched back to the relaxed state. For 𝐵 < 4𝑚𝑇,

the robot assumed the relaxed state, and a single period of walking

actuation ended when 𝐵 = 0𝑚𝑇. We reset 𝐵 at the end of every gait cycle

to avoid jerky motion when 𝛼 changed from 𝛼1 to 𝛼2. In our experiments,

the relaxed state was never skipped but its duration changed according to

𝑓. The consecutive images from a single walking gait period are shown

in Fig. 1-d. We tracked the robot gait using a high-speed camera (Basler

aCa2040-90uc, 60 frames per second (fps), 1pixel∼27𝜇𝑚 resolution) that

is orthogonally placed to the axis of robot motion (Fig. 1-b). In every

experiment, we calculated the stride length (𝑆) of the robot by tracking

the average distance covered by its center of mass in 10 consecutive steps.

To test the repeatability of the previously reported results in [14], we

experimented with our fabricated robots using their suggested controller

parameter sets. Fig. 2-b shows the stride length performances of our


57

robots (robot1,2,3) and compares them with the reported robot (robotR)

performance (i.e., we calculated the stride length of robotR from the

values reported in [14]) for 𝐵𝑚𝑎𝑥 = 10𝑚𝑇, 𝛼1 = 30°, 𝛼2 = 60°, and

2 ≤ 𝑓 ≤ 20𝐻𝑧. Our preliminary results revealed that:

 In this scale, the gait performance showed clear inconsistency due to

the variability during fabrication and environmental factors even

though the same materials, methods, controller parameters, and

walking surfaces are used in the fabrication and experimentation of

the millimeter-scale soft robots.

 Unlike the model prediction, the robot performance showed non-

monotonic behavior along with increasing 𝑓, which rendered the

design of a model-based gait controller unreliable.

 In addition to the virtual infinite degrees of freedom inherited by the

soft materials, the controller parameters existed in a continuous

space, making the hand-tuning of these parameters within physical

experiments impractical.

These observations found the goals of our paper in which we address the

necessity for a data-efficient controller learning system that is robust to

the variabilities caused by the material, fabrication, and the task

environment of the miniature scale, medical-oriented, untethered soft

robots.


58

III. Learning Approach

We aim to optimize the walking gait controller parameters to maximize

the stride length 𝑆 of the robot. Therefore, we define the reward function

as

𝑆: 𝛩 → ℝ, (1)

which maps the parameter set 𝜃 = [𝐵𝑚𝑎𝑥 , 𝑓, 𝛼1, 𝛼2] to scalar reward

values (i.e., the experimental stride length performance of a robot).

According to the definition of the reward function, we formulate the

parameter learning as the (global) optimization problem

𝜃∗ = argmax
𝜃∈Θ

𝑆(𝜃), (2)

where Θ denotes the complete search space, 𝜃 is the parameter set, and

𝑆(𝜃) is the experimentally observed stride length performance of the

robot for a given 𝜃.

We define the range of the controller parameters based on the findings in

[14] and the physical limitations of our magnetic actuation setup.

Accordingly, 𝐵𝑚𝑎𝑥 is defined between 5𝑚𝑇 and 12𝑚𝑇, and the walking

frequency, 𝑓, ranges from 𝑓𝑚𝑖𝑛 = 0.5𝐻𝑧 to 𝑓𝑚𝑎𝑥 = 20𝐻𝑧. We limit 𝛼1

and 𝛼2 to [10, 50]° and [40,80]° respectively and select values that

satisfy 𝛼2 > 𝛼1 to generate the walking gait in Fig. 1-d. We use a step

size of 1𝑚𝑇 for 𝐵, 1° for each 𝛼, and a variable step size of 0.25𝐻𝑧 for

𝑓 < 2𝐻𝑧 and 2𝐻𝑧 for 𝑓 ≥ 2𝐻𝑧, which yield a total number of 203520

possible parameter sets in Θ.


59

A. Gaussian Processes (GPs)

The magnetic soft millirobots in our paper did not have an accurate

kinematic or dynamics model (see Fig. 2-b). Therefore, it is necessary to

approximate the reward function based on the data collected from

physical experiments rather than numerical analysis. However, the

physical data has inherent uncertainty due to the noise in the

measurements and the variations during the experiments. To include

these uncertainties in the model, overcome the sparsity in the data, and

make probabilistic predictions at unobserved locations, we model the

reward function 𝑆(𝜃) using GPs following the previous study in [40]:

𝑆(𝜃)~𝐺𝑃 𝜇(𝜃), 𝑘(𝜃, 𝜃′) , (3)

However, as 𝑆(𝜃) can only be measured with noise, we define �̃� as

�̃�(𝜃𝑖) = 𝑆(𝜃𝑖) + 𝑛𝑖, (4)

where 𝑛𝑖 is zero-mean Gaussian noise with variance 𝜎𝑛
2 for each

measurement 𝑖.

A GP is a non-parametric model defined by its prior mean 𝜇(𝜃) and the

covariance function 𝑐𝑜𝑣 𝑆(𝜃), 𝑆(𝜃′) = 𝑘(𝜃, 𝜃′), where 𝑘 is the kernel.

During one run of BO, the GP model is sequentially updated with �̃�(𝜃)

observed from experiments. We define one “learning run” as a run of BO

until the desired stopping criterion is matched (e.g., a fixed number of

experiments is reached).


60

From the experimental data 𝐷 = 𝜃𝑖 , �̃�(𝜃𝑖) 𝑖=1
𝑁

, the stride length of the

robot for an unobserved 𝜃 can be predicted using the posterior mean and

variance as follows.

𝜇𝑝𝑜𝑠𝑡(𝜃) = 𝜇(𝜃) + 𝑘𝑇(𝜃)𝐾−1𝑦, (5)

𝜎𝑝𝑜𝑠𝑡
2 (𝜃) = 𝑘(𝜃, 𝜃) − 𝑘𝑇(𝜃)𝐾−1𝑘(𝜃), (6)

𝑆𝑝𝑜𝑠𝑡(𝜃)|𝐷~𝑁(𝜇𝑝𝑜𝑠𝑡(𝜃), 𝜎𝑝𝑜𝑠𝑡
2 (𝜃), (7)

where 𝑘(𝜃), 𝑦 ∈ ℝ𝑁 with 𝑘(𝜃𝑖) = 𝑘(𝜃, 𝜃𝑖), 𝑦𝑖 = �̃�(𝜃𝑖) − 𝜇(𝜃𝑖), and

𝐾 ∈ ℝ𝑁×𝑁 with 𝐾𝑖,𝑗 = 𝑘 𝜃𝑖 , 𝜃𝑗 + 𝛿𝑖,𝑗𝜎𝑛
2, where 𝛿𝑖,𝑗  is the Kronecker

delta and 𝜎𝑛
2 is the noise in the collected data set.

We select the squared exponential as the kernel function in the GPs,

which is for the 1D case:

𝑘𝑆𝐸(𝜃, 𝜃′) = 𝜎𝑓
2 exp(−(𝜃 − 𝜃′)2 2𝑙𝑐

2⁄ ), (8)

where 𝑙𝑐  is the length scale that defines the rate of variation in the modeled

function for each dimension of the parameter space. Long length scales

are used to model slowly-varying functions and short length scales are

used to model quickly-varying functions. The signal variance 𝜎𝑓
2

describes the width of distribution, e.g., high 𝜎𝑓
2 means higher uncertainty

in the predictions of the unobserved 𝜃. Hyperparameters of the GPs can

be listed as the noise in the collected data 𝜎𝑛
2, length scale 𝑙𝑐  for each

dimension of the parameter space ℝ𝑑𝑐, and signal variance 𝜎𝑓
2. To

determine the value of 𝜎𝑛
2, we use the maximum variance found in the


61

experimental results shown in Fig. 2-b. We set the length scale 𝑙𝑐  to one-

fourth of the total range of each corresponding parameter. We also set the

signal variance 𝜎𝑓
2 to half of the body length of the robot so that the

highest possible reward value (i.e., 𝐿 = 3.7𝑚𝑚) remained inside the 95%

confidence interval of the prior.

B. Bayesian Optimization (BO)

We use BO to select the parameter set 𝜃𝑛𝑒𝑥𝑡 to be tested in the next step

of the learning run using the acquisition function 𝛼𝑎𝑐𝑞(𝜃).

𝜃𝑛𝑒𝑥𝑡 = argmax
𝜃∈Θ

𝛼𝑎𝑐𝑞(𝜃). (9)

In this work, we choose the expected improvement (EI) as the acquisition

function 𝛼𝑎𝑐𝑞(𝜃) due to its better performance compared to its

alternatives as demonstrated in [40]. EI seeks the parameter set for the

next step where the expected improvement in reward function is the

highest compared to the previously collected data:

𝛼𝑎𝑐𝑞(𝜃) = 𝔼 max 0, (𝑆(𝜃), 𝑆(𝜃∗) − 𝜉) , (10)

𝛼𝑎𝑐𝑞(𝜃) = (𝜇(𝜃) − 𝑆(𝜃∗) − 𝜉)Φ(𝑍) + 𝜎(𝜃)𝜙(𝑍), (11)

where 𝑆(𝜃∗) is the highest reward function value collected so far, Φ and

𝜙 are the Gaussian cumulative density and probability density functions,

respectively [2]. The term 𝑍 is described as

𝑍 = 𝑍(𝜃) = (𝜇(𝜃) − 𝑆(𝜃∗) − 𝜉) 𝜎(𝜃)⁄ , with 𝜇(𝜃) and 𝜎(𝜃) computed

from Eq. (5) and Eq. (6). In Eq. (11), the two terms define the exploitation

and the exploration weights of the BO, respectively. The balance between


62

these two terms is controlled by the hyperparameter 𝜉. As 𝜉 gets higher,

BO focuses more on exploration and seeks the next parameter set in

regions with high prediction uncertainty. Conversely, BO focuses more

on exploitation and selects the next parameter set within a close range to

already explored regions. We choose 𝜉 = 0.1 to balance the exploration

and exploitation weights.

C. Transfer of the Prior Mean

In addition to the kernel (see Section III-A), the prior mean 𝜇(𝜃) must be

chosen at the beginning of a BO run. Often, 𝜇 = 0 is the default choice

for an uninformed prior. For the millirobot learning problem herein, we

suggest and investigate the transfer of information from previous learning

runs by setting the prior mean to the posterior mean of a previously

trained GP model, such as from a different robot. In this way, we can

approximately transfer the topology of the target function between robots,

which is reasonable as long as the differences between the robots and the

environment do not significantly alter the function shape. In this work,

we adopt and compare both approaches of an uninformed prior

(𝜇(𝜃) = 0) in Section IV-A, and the transfer of the posterior mean from

robot 1's previous run to all three robots in Section IV-B.

IV. Experiments

We modulated the currents running through the electromagnetic coils and

the resulting magnetic field by controlling six motor driver units

(SyRen25) using an Arduino microcontroller running at 1.2 kHz. We

regularly calibrated the magnetic actuation matrix inside the workspace,


63

i.e., the mapping between the applied electric current and the generated

magnetic field, to maintain reliable and repeatable experiments. The

learning process ran on a PC that additionally handled image processing

and hardware communication tasks. One step of the learning run involved

five steps:

1) BO selected a new parameter set 𝜃 that maximized the acquisition

function based on the GP model,

2) The microcontroller regulated the magnetic field based on the

selected 𝜃 and initiated the physical experiment,

3) The camera recorded the robot motion and measured the average

stride length performance 𝑆,

4) The learning system updated the GP model using the newly collected

data from the experiment,

5) The robot returned to its initial position for the next step.

A. Optimization of the Walking Gait without the Prior

To test our controller learning approach without prior information, i.e.,

𝜇(𝜃) = 0, we experimented with all three robots in the same

environmental conditions and limited the number of steps for each

learning run to 20 experiments. We initialized the BO with the best

controller parameter set reported in [14]. We performed three

independent learning runs (i.e., 60 experiments in total) for every robot

with the same initial state, whose results are shown in Fig. 3. Each data

point represents the robot's stride length performance 𝑆 resulting from a

different controller parameter set chosen by the BO at a given step of a


64

learning run. As BO actively chose sample locations (e.g., to explore

unknown regions described in Section III), the variation in these data

points was the desired behavior of the explorative learning algorithm. See

Supplementary Video 1 for the gait performances of four different

controller parameter sets for robot 1.

We chose the optimum controller parameter sets (𝜃∗) from these learning

runs and repeated the walking gait five times to collect statistical

information about the stride length performances. The rows designated

with “no prior” in Table I shows the values for 𝜃∗ and the resulting 𝑆 for

each robot. It can be seen that the walking gait performances of the robots

were significantly improved compared to the robot reported in [14]. Also,

the standard deviation within these repeated experiments agreed with the

previously reported values, platform. In this optimization approach, we

achieved 86.6%, 94.7%, and 60.5% increase in 𝑆 for robots 1 to 3

respectively (i.e., compared to the 𝑆 of the previous robot shown in the

last row of Table I). The difference between the optimum controller

parameter values in Table I demonstrates the influence of the fabrication

variabilities on the robot design and performance. Separate from the

optimum stride length performance of each robot, we evaluated the

overall performance of the learning runs based on the achieved stride

length average of all of the tested controller parameters. This performance

metric 𝑃𝐿𝑅  shows the overall quality of the learning run's parameter

selection in terms of the average of all the 𝑆 and the standard deviation in

60 experiments (i.e., avg(𝑆) ± std). In these experiments, the learning run

for robot 1 yielded 𝑃𝐿𝑅1 = 1.07 ± 0.80𝑚𝑚, 𝑃𝐿𝑅2 = 1.21 ± 0.88𝑚𝑚 for


65

robot 2, and 𝑃𝐿𝑅3 = 0.75 ± 0.64𝑚𝑚 for robot 3. Even though there were

multiple individual 𝜃s within these runs (e.g., the optimums reported in

Table I) that outperformed the previous study, the large standard

deviation shows that the BO selected parameters that generated a wide

range of performances.

Figure 3. The learning of the controller parameters without utilizing the prior
information within 20 physical experiments in 3 independent learning runs
(shown as LR1-3) for (a) robot 1, (b) robot 2, and (c) robot 3.


66

B. Optimization of the Walking Gait with the Prior

1) Generation of the Prior Information

To generate useful prior information, we constructed the posterior mean

𝜇𝑝𝑜𝑠𝑡(𝜃) for robot 1 using the BO and GP described in Section III.

Initially, we adopted nine different controller parameters from [14] and

collected the stride length information from repeated experiments in our

setup. Fig. 4-a and 4-c show the two-dimensional projection of the

approximation of 𝑆 function generated by the GP model (utilizing the

same hyperparameters in Section III) based on these experiments. After

the initial approximation, we used the BO to select new parameter sets

from the unexplored parts of the 4-D search space and collected the

experimental stride length performance information. We explored 123

different parameter sets in total by selectively isolating the search space

dimensions. Initially, we fixed 𝛼1 = 30° and 𝛼2 = 60° and explored 18

different parameter values for 𝐵𝑚𝑎𝑥 and 𝑓. Then, we fixed 𝐵𝑚𝑎𝑥 = 10𝑚𝑇

and 𝑓 = 2𝐻𝑧 and explored 38 values for 𝛼1 and 𝛼2. We performed 17

additional tests for 𝛼1 and 𝛼2 for 𝐵𝑚𝑎𝑥 = 10𝑚𝑇 and 𝑓 = 8𝐻𝑧. Finally,

we explored the complete search space for four parameters with 50 more

tests. For all of these tests, we stopped the exploration when the BO

converged in the sense of repetitively selecting similar 𝜃s.