Learning-Based Control and Localization of Magnetic Soft Millirobots Von dem Stuttgarter Zentrum für Simulationswissenschaften (SC SimTech) der Universität Stuttgart zur Erlangung der Würde eines Doktor-Ingenieurs (Dr.-Ing.) genehmigte Abhandlung Vorgelegt von Sinan Özgün Demir aus Istanbul, Türkei Hauptberichter: Prof. Dr. Metin Sitti Mitberichter: Prof. Dr. Syn Schmitt Mitberichter: Prof. Dr. Georg Martius Tag der mündlichen Prüfung: 23.04.2024 Max-Planck-Institut für Intelligente Systeme 2024 Learning-Based Control and Localization of Magnetic Soft Millirobots A thesis accepted at the Stuttgart Center for Simulation Science in fulfillment of the requirements for the degree Doctor Engineer (Dr.-Ing.) Presented by Sinan Özgün Demir from Istanbul, Türkiye Main Reporter: Prof. Dr. Metin Sitti Co-reporter: Prof. Dr. Syn Schmitt Co-reporter: Prof. Dr. Georg Martius Date of the oral exam: 23.04.2024 Max-Planck-Institute for Intelligent Systems 2024 Aileme v Abstract Soft millirobots have promising biomedical applications due to mechanical compliance, absorbing excess forces without additional computational effort, and multifunctionality. Especially with wireless multimodal locomotion capabilities, magnetic soft millirobots (i.e., ≤1 cm) have emerged as potential minimally invasive medical robotic platforms as they can access confined and hard-to-reach spaces in the human body (e.g., distal vascular regions), and carry out medical applications, such as on-demand drug delivery, sensing, and embolization, in a target location. For such potential biomedical applications, the adaptivity of the robot control is essential to ensure the continuity of the operations, as task environment conditions show dynamic variations that can alter the robot's performance. However, fabrication-, material-, physical-interaction-dependent variations, and complex kinematics with virtually infinite degrees of freedom arising from the nature of their soft material structure limit the applicability of the conventional modeling and control methods. The main objective of this dissertation is to establish a data-efficient adaptive multimodal locomotion framework for the targeted application scenarios of magnetic soft millirobots. To this end, a probabilistic learning approach leveraging Bayesian optimization (BO) and Gaussian processes (GPs) is introduced to address the controller adaptation challenge. First, the efficacy of the BO to fabrication variabilities is shown on three different robots fabricated following the same steps. Next, through augmented tests on benchmark datasets, it is shown that vi transferring the posterior mean learned by one robot as the prior mean to the other robots and test cases improves the learning performance of BO by achieving quicker gait adaptation. Afterward, the controller adaptation method employing the proposed transfer learning approach is demonstrated in various task spaces with varying surface adhesion, surface roughness, and medium viscosity properties. To further improve the adaptation performance by including multimodal locomotion, the sim-to-real transfer learning method is developed in the third study. In this regard, a data-driven simulation environment is designed, and its accuracy is demonstrated by comparing the simulated results to the physical experiments. Leveraging the simulated experience and BO based transfer learning, it is demonstrated that sim-to-real transfer learning provides efficient locomotion learning. Furthermore, the adequacy of the automated locomotion adaptation through the Kullback-Leibler divergence-based domain recognition approach is shown to changing environmental conditions. As the secondary objective, a new localization method using electrical impedance tomography is introduced. The applicability of the proposed approach is demonstrated for stationary and moving cases in environments with and without any obstacles. With these contributions, this thesis proposes a domain-adaptive locomotion learning framework enabling the soft millirobot locomotion to quickly and continuously adapt to environmental changes while exploring actuation space. vii Zusammenfassung Weiche Milliroboter haben vielversprechende biomedizinische Anwendungen, da sie mechanisch nachgiebig sind, überschüssige Kräfte ohne zusätzlichen Rechenaufwand absorbieren und multifunktional sind. Insbesondere mit ihren kabellosen multimodalen Fortbewegungs- fähigkeiten haben sich magnetische weiche Milliroboter (d. h. ≤1 cm) als potenzielle minimalinvasive medizinische Roboterplattformen herauskristallisiert, da sie Zugang zu engen und schwer zugänglichen Stellen im menschlichen Körper (z. B. distale Gefäßregionen) haben und medizinische Anwendungen wie die bedarfsgerechte Verabreichung von Medikamenten, Sensorik und Embolisierung an einem Zielort durchführen können. Für solche potenziellen biomedizinischen Anwendungen ist die Anpassungsfähigkeit der Robotersteuerung von entscheidender Bedeutung, um die Kontinuität des Betriebs zu gewährleisten, da die Bedingungen der Aufgabenumgebung dynamische Schwankungen aufweisen, die die Leistung des Roboters verändern können. Fertigungs-, material- und wechselwirkungsabhängige Variatnen sowie eine komplexe Kinematik mit praktisch unendlichen Freiheitsgraden, die sich aus der Natur ihrer weichen Materialstruktur ergeben, schränken jedoch die Anwendbarkeit herkömmlicher Modellierungs- und Steuerungsmethoden ein. Das Hauptziel dieser Dissertation ist es, ein dateneffizientes adaptives multimodales Fortbewegungskonzept für die angestrebten Anwendungs- szenarien magnetischer weicher Milliroboter zu entwickeln. Zu diesem Zweck wird ein probabilistischer Lernansatz eingeführt, der die viii Bayes'sche Optimierung (BO) und Gauß'sche Prozesse (GPs) nutzt, um die Herausforderung der Regleranpassung anzugehen. Zunächst wird an drei verschiedenen Robotern, die in den gleichen Schritten hergestellt wurden, die Wirksamkeit der BO bei Fertigungsschwankungen gezeigt. Anschließend wird gezeigt, dass die Übertragung des von einem Roboter gelernten posterioren Mittelwerts als priorer Mittelwert auf die anderen Roboter und Testfälle die Lernleistung von BO verbessert, indem eine schnellere Ganganpassung durch die Durchführung erweiterter Tests auf Benchmark-Datensätzen erreicht wird. Anschließend wird die Methode der Regleranpassung, die den vorgeschlagenen Transfer-Learning- Ansatz verwendet, in verschiedenen Aufgabenräumen mit unterschiedlichen Oberflächenhaftungen, Oberflächenrauhigkeiten und mittleren Viskositätseigenschaften demonstriert. Um die Anpassungs- leistung durch Einbeziehung multimodaler Fortbewegung weiter zu verbessern, wird in der dritten Studie die Sim-to-Real-Transferlern- methode entwickelt. In diesem Zusammenhang wird eine datengesteuerte Simulationsumgebung entworfen, deren Genauigkeit durch den Vergleich der simulierten Ergebnisse mit den physikalischen Experimenten nachgewiesen wird. Durch die Nutzung der simulierten Erfahrungen und des auf der BO basierenden Transfer-Lernens wird gezeigt, dass das Sim-zu-Real-Transfer-Lernen ein effizientes Lernen der Fortbewegung ermöglicht. Darüber hinaus wird die Angemessenheit der automatischen Bewegungsanpassung durch den auf der Kullback- Leibler-Divergenz basierenden Domänenerkennungsansatz für sich ändernde Umweltbedingungen gezeigt. ix Als sekundäres Ziel wird eine neue Lokalisierungsmethode unter Verwendung der elektrischen Impedanztomographie eingeführt. Die Anwendbarkeit des vorgeschlagenen Ansatzes wird für stationäre und bewegte Fälle in Umgebungen mit und ohne Hindernisse demonstriert. Mit diesen Beiträgen schlägt diese Arbeit ein domänenadaptives Lernsystem für die Fortbewegung vor, das die sanfte Fortbewegung von Millirobotern in die Lage versetzt, sich schnell und kontinuierlich an Umweltveränderungen anzupassen, während sie den Aktionsraum erkunden. xi Acknowledgments Since the beginning of my doctoral research in 2018, I have had the chance to meet and work with great people whose presence and collaboration have influenced my personal and professional life. I would like to thank them for their personal, philosophical, and scientific support in completing this dissertation. First, I would like to thank Prof. Dr. Metin Sitti for giving me the opportunity to work with him in the Physical Intelligence Department at the Max Planck Institute for Intelligent Systems, Stuttgart. I thank him for all the supervision, support, guidance, and freedom he has provided to pursue my research and curiosity. I would like to express my deepest gratitude to the Max Planck Society and Republic of Türkiye Ministry of National Education for supporting me and my research throughout my Ph.D. I would also like to thank my co-examiners, Prof. Dr. Syn Schmitt and Prof. Dr. Georg Martius, for their contributions to my dissertation with their invaluable feedback. I want to express my special thanks to Prof. Dr. Sebastian Trimpe for his scientific support, stimulating discussions, and influence on my research. I would like to express my gratitude to Dr. Utku Culha. His mentorship, continuous support, and scientific and philosophical insights have been a constant source of motivation and enlightenment throughout this journey. xii I also want to thank my colleagues and friends who were with me through my Ph.D. I am especially grateful to Patricia Martinez and Janina Schwartz for always being there and helping with professional and personal problems. I thank the undergraduate students I worked with, Melike Yıldırım, Ceyda Akyol, Burak Aydın, Aras Güngöre, and Mete Erdoğan, for contributing to my research. I thank the great friends I made during my Ph.D., Jack Saud, Alp. C. Karacakol, Cem B. Dayan, Jie Han, Ugur Bozuyuk, M. Birgul Akolpoglu, Zhiqiang Zheng, Gaurav Gardi, Y. Eren Suyolcu, Amir Aghakhani, Abdon Pena-Francesch, Isabella Bianchini, Ferro F. Ceylan, 'Le Petit Coq bar' family, and many others for all the good memories. I also thank Eylül Suadiye for his heartful friendship, all-time support, and coffee breaks. I especially thank M. Efe Tiryaki for his heartful friendship, all-time support, and endless questions about life, research, and technology. I would like to extend my grateful appreciation to my parents, Hülya and Sadık Demir, and my brother, U. Eren Demir, for their endless love and support. Last but most importantly, I would like to state my heartfelt thankfulness to my wife, Saadet F. Baltacı, for her all-time support, endless patience, continuous effort to motivate me, and for sharing all my happiness and grief since we met. Without her, none of these would be possible. Science is the only true guide in life. -Mustafa Kemal Atatürk- xv Table of Contents Abstract .............................................................................................. v Zusammenfassung ............................................................................. vii Acknowledgments .............................................................................. xi List of Figures ................................................................................. xvii List of Tables ................................................................................... xix 1. Introduction ................................................................................... 1 1.1. Magnetic Soft Millirobots .................................................... 1 1.2. Thesis Objectives ................................................................ 5 1.3. Outline ................................................................................ 6 1.4. List of Publications.............................................................. 9 2. Methods ....................................................................................... 12 2.1. Magnetic Soft Millirobot Design ....................................... 12 2.2. Magnetic Actuation and Feedback System ......................... 15 2.3. Gaussian Processes ............................................................ 19 2.4. Bayesian Optimization ...................................................... 24 2.5. Magnetic Soft Millirobot Simulation.................................. 27 2.6. Robot Tracking by Electrical Impedance Tomography ....... 30 3. Conclusion and Outlook ............................................................... 33 3.1. Conclusion ........................................................................ 33 xvi 3.2. Future Work...................................................................... 35 Bibliography ..................................................................................... 37 Publication 1: Learning of Sub-optimal Gait Controllers for Magnetic Walking Soft Millirobots............................................. 47 Publication 2: Task Space Adaptation via The Learning of Magnetic Soft Millirobots........................................................... 81 Publication 3: Learning Soft Millirobot Multimodal Locomotion with Sim-to-Real Transfer..................................................147 Publication 4: A Localization Method for Untethered Small-Scale Robots Using Electrical Impedance Tomography........199 xvii List of Figures Figure 1 Fabrication of the magnetic soft millirobot ...................... 12 Figure 2 Locomotion modes of the magnetic soft millirobot .......... 13 Figure 3 Manually designed magnetic actuation signals ................ 14 Figure 4 Walking performance of the magnetic soft millirobot ...... 15 Figure 5 Electromagnetic coil setup .............................................. 16 Figure 6 Electromagnetic (Helmholtz) coil setup ........................... 18 Figure 7 Sample learning run using GP-BO .................................. 21 Figure 8 Effect of the hyperparameters on GP model .................... 23 Figure 9 GP model representation of a function 𝑓(𝜃) .................... 24 Figure 10 Data-driven magnetic soft millirobot simulation .............. 28 Figure 11 Simulated locomotion modes of the magnetic soft millirobot ....................................................................... 29 Figure 12 Forward and inverse problem definitions for EIT ............ 31 Figure 13 Experimental setup used for the EIT experiments ............ 32 xix List of Tables Algorithm 1 Pseudo code for GP-BO learning run .............................. 26 1 1. Introduction 1.1. Magnetic Soft Millirobots Soft robots made of highly flexible materials can withstand and undergo both elastic and plastic deformations1. This characteristic enables them to adapt to their surroundings, absorb the excess forces without additional computation effort, and achieve various functions while maintaining a reduced system complexity compared to their rigid counterparts2–4. Therefore, they have been utilized in various fields, such as object manipulation, wearables, and biomedical devices5–9. Meanwhile, small-scale (i.e., ≤1 cm) soft robots with wireless locomotion capability have become a promising approach for medical applications as they can access previously hard-to-reach environments non-invasively10–12, including distal vascular regions and carry out medical applications, such as on-demand drug delivery, sensing, and embolization13–16, in a target location. Among the existing external actuation methods, such as heat17, light17,18, electric19, and magnetic fields17,20, magnetic actuation stands out due to its high precision, dexterity, speed, penetration depth, and biological safety features21. Moreover, magnetic actuation can be used with different imaging modalities, such as camera22, electrical impedance tomography23, x-ray13, and ultrasound24 without disturbing their functionality. Furthermore, researchers demonstrated that magnetic soft millirobots can perform various locomotion modes, such as walking, rolling, crawling, jumping, tumbling, swimming, and climbing14,15,24. While multiple robots with different magnetic profiles can generate these 2 gaits separately14, a single robot design with a preprogrammed magnetic profile can also achieve multimodal locomotion under different periodic actuation signals15–17,25, which extends the adaptation capabilities and targeted application areas of the magnetic soft millirobots. Despite their exciting capabilities and potential medical use, soft robots face challenges arising from the nature of their soft materials, i.e., fabrication-, material-, and physical-interaction-dependent variations26. Moreover, the inherent high degrees of freedom, complex dynamics, contact mechanics, and dynamically changing task environments restrict the application of conventional modeling and control methods widely used in rigid robotic systems12,27. To address these challenges, one proposed solution involves simplifying the governing models for soft millirobots, such as dividing the robot into sub-parts and using constant curvature (CC) approximation with well- established beam theories to model the kinematics and dynamics of each sub-part28,29. However, CC approximation fails when significant loads are applied to the system, and its accuracy highly depends on the discretization of the robot30. Although the Cosserat rod model offers improved accuracy, it suffers from the computationally heavy model solutions31. Numerical approaches, on the other hand, such as finite-element methods (FEMs), voxel-based representations, and discrete differential geometries, improve the computation time by modeling the continuum robot structures using a chain of rigid elements connected with tunable spring-damper mechanisms at the expense of nonlinear dynamics 3 precision32–34. While these models and simulation tools perform well for large-scale soft robots35, the lack of detailed and continuous body deformation sensing and computational complexity restrict their use in small-scale36. Alternatively, many soft millirobot applications rely on manual tuning to control the periodic actuation signals15,16,24,37. Besides manual methods, data-driven machine learning methods may also be used to control soft robots in the lack of analytical or numerical models38. One common approach is training neural network (NN) architecture with the data collected from physical experiments39–41. However, since they require a large training dataset, and the soft millirobots tend to have variations in their performance over prolonged usage due to material degradation, their use is also limited. Conversely, Bayesian optimization (BO) enables a data-efficient optimization, i.e., using a small number of physical experiments42,43. Furthermore, using BO together with Gaussian processes (GPs) improves the data efficiency and optimization performance by incorporating information as probabilistic priors44, which is demonstrated on different-size robots to optimize their locomotion performance45–48, including untethered soft millirobots49. Although data-efficient machine learning methods can be used for adaptation problems, their performance decreases as the search space enlarges since they rely on experimental data. As a solution, sim-to-real transfer learning-based control approaches have attracted significant attention in various complex real-world robotics problems. Control algorithms trained in high-fidelity rigid body simulation environments 4 now could beat human pilots in drone racing50,51, plan and control complex legged locomotion52–56, and perform complex object manipulation tasks57–60. On soft robotic systems, Hiller et al. proposed a FEM-based simulation environment to model complex material structures and multi-stimulus actuation61. The same simulation environment is further utilized to learn shape and control policy pairs in a given environment for a pneumatically actuated soft robot62. However, the sim-to-real transferability of multiple shapes and behaviors to a single robot remains an unsolved problem due to the gap between simulation and real-world performance. As locomotion learning and adaptation to environmental changes are important challenges for magnetic soft millirobots, the proposed solutions' efficacy relies on the feedback systems' performance. Therefore, various technologies well established in biomedical imaging have been explored for tracking small-scale robots, such as magnetic-field-based techniques63, ultrasound64, optical techniques65, and ionizing-radiation-based methods66. Despite their successful results, they face some limitations in terms of cost, scanning speed, penetration depth, and adverse health effects. As an alternative to these approaches, localization via electrical impedance variations in the environment has been demonstrated67. Electrical impedance tomography (EIT) exploits electrical stimulations and measurements through the surface of an object to noninvasively image the interior of a domain by estimating the internal distribution of the electrical properties. EIT has been used in medical imaging for various physiological phenomena, such as breathing68, cardiac function69, and brain activity70. While it does not cause any risk 5 to the patient, unlike the prolonged use of ionizing radiations, such as X-rays, it can image deep tissues, contrary to optical techniques, at high speed71 and with low cost72. Moreover, since it is decoupled from the magnetic actuation, it can be used without damaging the utilized robots' magnetic profile, unlike magnetic resonance imaging (MRI). 1.2. Thesis Objectives There is a growing interest in the field of soft millirobots and their application in biomedical targeting scenarios. However, there is a noticeable gap in addressing the control aspect of these robots, especially given their intended use in dynamic biomedical environments. Moreover, soft millirobots exhibit differences from one another due to fabrication variances and face performance lost over time due to material degradation. The primary objective of this thesis is to establish a data-efficient adaptive locomotion framework for targeted application scenarios. This framework leverages BO with GP to enhance locomotion learning in a data-efficient way. Additionally, a new transfer learning approach is proposed for GP-BO to further improve data efficiency and optimization performance, demonstrating efficacy for different robot and task environments. To expand the applicability of the proposed adaptive locomotion approach for more complex test scenarios and larger search spaces, a data-driven simulation environment is designed to model the soft millirobot's behavior, and a sim-to-real transfer learning approach is 6 employed to optimize the robot's performance in the real world. This sim-to-real transfer learning approach also enables the discovery of new locomotion modes. Finally, a localization method based on electrical impedance tomography is proposed as an alternative to common biomedical imaging techniques to track the soft millirobot's position in closed and nontransparent environments. These objectives collectively contribute to developing soft millirobots and their application in dynamic and diverse scenarios. 1.3. Outline This thesis is divided into three chapters and four appendices. This first chapter lays down the problem description and motivation of this thesis, including the formulation of the research goal and objectives. The second chapter starts with defining the magnetic soft millirobot design and magnetic actuation and feedback systems used in this thesis. Then, it continues by explaining the utilized methods: Gaussian processes, Bayesian optimization, data-driven magnetic soft millirobot simulation, and electrical impedance tomography. At last, the third chapter concludes by summarizing the contributions and presenting a future outlook for soft millirobots. Each appendix presents a publication that supports the thesis objectives. The first publication 26 showed the performance differences between reported results in 24 and three robots fabricated following the same procedures by running repeated experiments. Then, GP-BO was 7 demonstrated for the walking gait adaptation of different robots. Moreover, it was shown that transferring the posterior mean learned by one robot as the prior mean to other robots improved learning performance by achieving quicker gait adaptation. The second publication 22 starts by testing the efficacy of the proposed transfer learning approach in the first publication. To this end, it compared four possible transfer learning methods with each other and the standard GP-BO by running augmented tests on five benchmark datasets consisting of 3750 physical experiments. The results showed that the proposed posterior mean transfer method outperformed the rest by achieving quicker gait adaptation and larger stride lengths for all five test cases. Afterward, the efficacy of the adaptive learning method was demonstrated in a wide range of task spaces with varying (1) surface adhesion, (2) surface roughness, and (3) the medium viscosity properties. The third study 73 addresses the challenges of domain adaptive magnetic soft millirobot locomotion using a sim-to-real transfer learning method. To this end, a data-driven simulation environment was designed. Leveraging the simulated experience and GP-BO based transfer learning, it was demonstrated that sim-to-real transfer learning provided efficient locomotion learning. Moreover, the adequacy of the automated locomotion adaptation through the Kullback-Leibler divergence-based domain recognition approach was shown in changing environmental conditions. The last study 23 introduces using electrical impedance tomography to track the position of the magnetic soft millirobot. After showing that the 8 EIT and the magnetic actuation system are decoupled, the localization of the robot was demonstrated for stationary and moving cases in environments with and without any obstacles. 9 1.4. List of Publications This dissertation combines four first-author peer-reviewed papers listed below. The papers are attached to the appendix of this dissertation. Publication 1 U. Culha*, S. O. Demir*, S. Trimpe, M. Sitti, "Learning of Sub-optimal Gait Controllers for Magnetic Walking Soft Millirobots", Robotics: Science and Systems 2020, doi: 10.15607/RSS.2020.XVI.070. (* Co-first authors) Publication 2 S. O. Demir*, U. Culha*, A. C. Karacakol, A. Pena- Francesch, S. Trimpe, M. Sitti, "Task Space Adaptations via The Learning of Gait Controllers of Magnetic Soft Millirobots", International Journal of Robotics Research, 2021, doi: 10.1177/02783649211021869. (* Co-first authors) Publication 3 S. O. Demir, M. E. Tiryaki, A. C. Karacakol, M. Sitti, "Learning Soft Millirobot Multimodal Locomotion with Sim-to-Real Transfer", Advanced Science, 2024, doi: 10.1002/advs.202308881 10 Publication 4 H. Daguerre*, S. O. Demir*, U. Culha, F. Marionnet, M. Gauthier, M. Sitti, A. Bolopion, "A Localization Method for Untethered Small-Scale Robots Using Electrical Impedance Tomography", IEEE/ASME Transactions on Mechatronics, 2022, doi: 10.1109/TMECH.2022.3142924. (* Co-first authors) Other publications, which I contributed and not part of this thesis, are listed below. - Z. Zheng*, J. Han*, Q. Shi*, S. O. Demir, W. Jiang, M. Sitti, "Single- step precision programming of decoupled multiresponsive soft millirobots", Proceedings of National Academy of Sciences, 2024, doi: 10.1073/pnas.2320386121 - C. B. Dayan, D. Son, A. Aghakhani, Y. Wu, S. O. Demir, M. Sitti, "Machine Learning-Based Shear Optimal Adhesive Microstructures with Experimental Validation", Small, 2023, doi: 10.1002/smll.202304437. - U. Bozuyuk, E. Yildiz, M. Han, S. O. Demir, M. Sitti, "Size- Dependent Locomotion Ability of Surface Microrollers on Physiologically Relevant Microtopographical Surfaces", Small, 2023, doi: 10.1002/smll.202303396. - Z. Zheng, J. Han, S. O. Demir, H. Wang, W. Jiang, H. Liu, M. Sitti "Electrodeposited Superhydrophilic-Superhydrophobic Composites for Untethered Multi-Stimuli-Responsive Soft Millirobots", Advanced Science, 2023, 10, doi: 10.1002/advs.202302409. 11 - Z. Zheng, H. Wang, S. O. Demir, Q. Huang, T. Fukuda, M. Sitti, "Programmable aniso-electrodeposited modular hydrogel microrobots", Science Advances, 2022, 8, doi: 10.1126/sciadv.ade6135. - M. E. Tiryaki, S. O. Demir, M. Sitti, "Deep Learning-based 3D Magnetic Microrobot Tracking using 2D MR Images", IEEE Robotics and Automation Letters, 2022, 7, 6982-6989, doi: 10.1109/LRA.2022.3179509 - U. Bozuyuk, E. Suadiye, A. Aghakhani, N. O. Dogan, J. Lazovic, M. E. Tiryaki, M. Schneider, A. C. Karacakol, S. O. Demir, G. Richter, M. Sitti, "High-Performance Magnetic FePt (L10) Surface Microrollers Towards Medical Imaging-Guided Endovascular Delivery Applications", Advanced Functional Materials, 2022, 8, doi: 10.1002/adfm.202109741. - F. N. P. Basualdo, G. Gardi, W. Wang, S. O. Demir, A. Bolopion, M. Gauthier, P. Lambert, M. Sitti, "Control and Transport of Passive Particles Using Self-Organized Spinning Micro-Disks", IEEE Robotics and Automation Letters, 2022, 7, 2156-2161, doi: 10.1109/LRA.2022.3143306. - G. Dogan, S. O. Demir, R. Gutzler, H. Gruhn, C. B. Dayan, U. T. Sanli, C. Silber, U. Culha, M. Sitti, G. Schütz, C. Grevent, K. Keskinbora, "Bayesian Machine Learning for Efficient Minimization of Defecs in ALD Passivation Layers", ACS Applied Materials & Interfaces, 2021, 13, 54503-54515, doi: 10.1021/acsami.1c14586 12 2. Methods 2.1. Magnetic Soft Millirobot Design In this study, the performance of the proposed learning approaches was evaluated using a magnetic soft millirobot design previously introduced by Hu et al.24. The chosen design (Figure 1) was selected for its ability to exhibit multimodal locomotion, such as rolling, walking, and crawling (Figure 2), by controlling the applied magnetic field acting on the robot (Figure 3). Moreover, this design has been employed in various prior studies, demonstrating its adaptability to diverse environmental conditions and tasks through manual control15,16,74. The robot was fabricated by mixing Ecoflex 00-10 (Smoot-On Inc.) and neodymium-iron-boron (NdFeB) magnetic particles with around 5𝜇𝑚 diameter (MQP-15-7, Magnequench) with a 1:1 body mass ratio. After curing the pre-polymer mixture on a methyl methacrylate plate, the robots were cut using a high-resolution laser cutter (LPKF Protolaser U4) with dimensions of length 𝐿 = 3.7𝑚𝑚, width 𝑤 = 1.5𝑚𝑚, and height ℎ = 185𝜇𝑚.To magnetize the robots, they were folded around a cylindrical Figure 1 Fabrication of the magnetic soft millirobot. (a) The sheet-shaped soft millirobot was cut from the cured polymer mixture and magnetized inside a homogeneous magnetic field with a magnitude of |B|=1.8T (red arrow) after folding around a cylindrical rod. (b) The unfolded robot maintained a periodic magnetization profile (blue arrows) along its body. (c) Image of the fabricated and magnetized soft millirobot. 13 Figure 2 Locomotion modes of the magnetic soft millirobot. Physical performance of the magnetic soft millirobot during (a) walking, (b) rolling, and (c) crawling locomotion modes generated by manually designed actuation signals24. rod with a circumference equal to 𝐿 and placed inside a uniform magnetic field with a magnitude of 1.8𝑇. The magnetic field was oriented at 45° counterclockwise from the y-axis. After separation from the rod, the magnetic particles retained their magnetization orientation, forming a circular profile along the robot body (Figure 1b). To actuate the robot, the homogeneous magnetic field in the environment and the resultant magnetic torque acting on the robot were continuously modulated (Figure 3). 14 Figure 3 Manually designed magnetic actuation signals to generate predefined locomotion modes, (a) walking, (b) rolling, and (c) crawling24. The top and bottom rows show the direction and magnitude of the applied magnetic field in degrees and mT, respectively. The yellow arrows represent the magnetic profile of the robots. To validate the repeatability of the reported results and the accuracy of the proposed mathematical model, three robots were fabricated and tested using the parameter sets reported by Hu et al.24. The results presented in Figure 4 revealed the following, - Locomotion performance showed clear inconsistency due to the variations in the fabrication and environmental factors, even though the same materials, methods, controller parameters, and walking surfaces were used. - Using a model-based controller was not possible due to non-monotonic behavior with increasing f, unlike the model prediction. These results align with the objectives of this dissertation, which aim to develop a data-efficient controller learning system capable of robustly handling variations arising from the material properties, fabrication 15 Figure 4 Predicted and observed walking performance of the magnetic soft millirobot in terms of stride length S. Previously reported model and experimental results (robotR) were compared to the three replica robots (robot1,2,3) tested with the same controller parameters24. Each data point for robot1,2,3 represents the mean of 10 experiments and the error bars show the standard deviation. processes, and the task environment of small-scale, medical-oriented, untethered soft robots. 2.2. Magnetic Actuation and Feedback System Two distinct electromagnetic coil setups were designed for the studies reported in this dissertation (Figure 5). The first coil setup consisted of three orthogonal pairs of custom-made electromagnets (Figure 5a). It could generate a 3D uniform magnetic field within a 4×4×4cm3 workspace with a maximum field strength of 15mT. The magnetic field was indirectly controlled by regulating the voltage values through six motor driver units (SyRen25) and an Arduino microcontroller. The 16 Arduino was responsible for processing analog readings from the current sensors, regulating the voltage outputs of the motor drivers, and handling all necessary communications. The robot's behavior was tracked by two high-speed cameras (Basler aCa2040-90uc) (Figure 5b). The first camera, running at 120 frames per second (fps), was placed orthogonally to the robot's motion plane (yz-plane) and was used to detect and evaluate the robot's locomotion mode. The second camera, operating at 60 fps, provided an isometric view of the test area and measured the robot's displacement following the perspective correction. At the end of every experiment, the system automatically commanded the robot to return to its starting position to minimize any human intervention and to avoid damage to the robot while handling. The second coil setup was built following the Helmholtz coil design to enlarge the workspace with a uniform magnetic field to 13.1×8.5×4.5cm3 Figure 5 Electromagnetic coil setup. (a) Magnetic actuation setup with six electromagnetic coils could generate a 3D uniform magnetic field within a 4×4×4cm3 workspace with a maximum value of 15mT. (b) Two high-speed cameras with isometric and front views observed and evaluated the robot's motion in real-time. 17 (Figure 6a). Moreover, this design provided clear side, front, and top views while allowing new extra imaging modalities, such as C-arm and multispectral optoacoustic tomography device (MSOT 512-element transducer, iThera Medical), for future studies. Unlike the first setup, the magnetic field was directly controlled by modulating the currents on the coils through six motor driver units (Maxon ESCON 70/10). To enhance the achievable actuation frequency, an FPGA module (NI PXIe-7847R) was used as the interface to control the motor drivers, receive current readings, and communicate with the master PC. The robot's motion was tracked with two high-speed cameras (Basler aCa2040-90uc) running at 120 fps (Figure 6d). The first camera with a top view measured the robot's displacement, while the second one identified the robot's locomotion mode. The top view of the test area, captured without obstruction or perspective correction, expanded the range of experimental scenarios. Moreover, to enlarge the workspace having a homogeneous 3D magnetic field and to test the robot's performance for longer runs without reaching the workspace's limits, we integrated a motorized linear stage with 150 mm stroke (Thorlabs LTS150C) to the y-axis of the experimental setup. The linear stage continuously adjusted its position based on the displacement information received from the imaging system to keep the robot centered in the magnetic field. The entire software architecture, such as learning algorithms, actuation signal generation, and image processing codes, ran on a master PC. Communication between different elements of the experimental system (e.g., image capture and electric current control) was executed on ROS 18 architecture, which allowed the system to be scalable for further extensions. Moreover, the automated experimental platforms minimized the need for human intervention during physical experiments, reducing disturbances introduced by human interactions with the soft millirobot and the test environment. This approach maintained the repeatability of the physical learning experiments while avoiding human-induced alterations to the soft millirobot's behavior. Figure 6 Electromagnetic (Helmholtz) coil setup. (a) Helmholtz coil setup with six electromagnetic coils could generate a 3D uniform magnetic field within a 13.1×8.5×4.5cm3 workspace with a maximum value of 12 mT. (b) Two high- speed cameras with top and front views observed and evaluated the robot's motion in real-time. The linear stage attached to the xy-plane enlarged the test area with the homogeneous magnetic field on the y-axis. 19 2.3. Gaussian Processes The performance of the magnetic soft millirobot, even when the actuation signal is defined by a single variable (e.g. Figure 4), cannot be modeled or predicted accurately due to its inherent complexity. Furthermore, soft millirobots tend to exhibit variations in performance over time, primarily due to material degradation. These factors limit the size of the physical dataset that can be generated for analysis. To overcome the sparsity in the dataset, to include uncertainties coming from the physical experiments, and to make probabilistic predictions, GP was used to map the input parameter set 𝜃 to scalar reward values 𝑅(𝜃), i.e., the stride length of the magnetic soft millirobot in the event of locomotion optimization. 𝑅(𝜃)~𝐺𝑃 𝜇(𝜃), 𝑘(𝜃, 𝜃′) , (1) where 𝑅(𝜃) is the reward function mapping the input parameter 𝜃 to scalar reward values, 𝜇(𝜃) denotes the prior mean for the input parameter 𝜃, and 𝑘(𝜃, 𝜃′) is the kernel function defining the covariance between 𝑅(𝜃) and 𝑅(𝜃′) for 𝜃, 𝜃′ ∈ Θ, where Θ defines the complete search space of input parameters. For the cases where the performance metric can only be measured with noise, then the observed reward value 𝑅(𝜃) is defined as follows, 𝑅(𝜃) = 𝑅(𝜃) + 𝑛, (2) where 𝑛 stands for the zero-mean Gaussian noise with variance 𝜎𝑛 2 for each measurement. At each iteration of the optimization run, the GP model is updated with 𝑅(𝜃) as shown in Figure 7. Then, the expected 20 value of 𝑅(𝜃) can be predicted with variance for any given 𝜃 using the test data 𝐷 = {𝜃𝑖, 𝑅(𝜃𝑖)}𝑖=1 𝑁 , where 𝑁 is the size of the dataset 𝐷, as follows: 𝜇𝑝𝑜𝑠𝑡(𝜃) = 𝜇(𝜃) + 𝑘𝑇(𝜃)𝐾−1𝑦, (3) 𝜎𝑝𝑜𝑠𝑡 2 (𝜃) = 𝑘(𝜃, 𝜃) − 𝑘𝑇(𝜃)𝐾−1𝑘(𝜃), (4) 𝑅𝑝𝑜𝑠𝑡(𝜃)|𝐷~𝒩 𝜇𝑝𝑜𝑠𝑡(𝜃), 𝜎𝑝𝑜𝑠𝑡 2 (𝜃) , (5) where 𝑘(𝜃), 𝑦 ∈ ℝ𝑁, and 𝐾 ∈ ℝ𝑁×𝑁 denote [𝑘(𝜃)]𝑖 = 𝑘(𝜃, 𝜃𝑖), 𝑦𝑖 = 𝑅(𝜃𝑖) − 𝜇(𝜃𝑖), and 𝐾𝑖,𝑗 = 𝑘 𝜃𝑖 , 𝜃𝑗 + 𝛿𝑖,𝑗𝜎𝑛 2 with Kronecker delta 𝛿𝑖,𝑗 , respectively. Since the kernel encodes the prior assumptions on the approximated reward function, such as the smoothness and periodicity of the signal, its definition and hyperparameter settings directly affect the prediction accuracy of the GP. Based on the results presented by Von Rohr et al.49, squared exponential was selected as the kernel function, which is defined for multi-dimensional cases as follows: 𝑘𝑆𝐸(𝜃, 𝜃′) = 𝜎𝑓 2 𝑒𝑥𝑝 − ∑ 𝜃𝑑−𝜃𝑑 ′ 2 2𝑙𝑐,𝑑 2 𝑑𝑐 𝑑=1 , (6) where 𝑙𝑐 ∈ ℝ𝑐 𝑑 is the length scale defining the rate of change in the approximated function for each parameter space dimension and 𝜎𝑓 2 is the signal variance describing the uncertainty in the predictions for unobserved parameter sets75. 21 Figure 7 Sample learning run to maximize a function 𝒇(𝜽) using Bayesian optimization (BO) with Gaussian processes (GPs). The left column shows the posterior GP model as new data points are observed. The right column shows the value of the acquisition function at the end of each iteration. The next parameter set (𝜃𝑛𝑒𝑥𝑡) chosen by BO to be used in the next iteration is shown by orange mark on the acquisition function plots. 22 As illustrated in Figure 8, hyperparameter selection has a vital role in the prediction accuracy of GP. One crucial hyperparameter is the length scale. While long length scales are used to model slowly-varying functions, short length scales are used to model quickly-varying functions, enabling the GP to capture rapid variations within the data (Figure 8a). Another key hyperparameter is the signal variance (𝜎𝑓 2), which influences the exploration-exploitation balance during the optimization run. A higher value of 𝜎𝑓 2 encourages exploration by increasing the prediction uncertainty, which can be useful for discovering optimal solutions. Conversely, a lower value promotes exploitation (Figure 8b). Noise in the observation (𝜎𝑛 2) is another important hyperparameter controlling the impact of a new data point on the posterior mean function (Figure 8c). In addition to the kernel selection and hyperparameter definition, the prior mean (𝜇(𝜃)) has an important effect on the performance of GP. While the prior mean is often assumed to be constant and set to zero (i.e., 𝜇 = 0) in the context of maximization problems76, its selection significantly impacts optimizing performance by controlling the exploration- exploitation balance. To illustrate, a constant zero mean may cause the optimizer to get stuck on local optima for a maximization problem with large reward values by making it believe it has already found the best- performing result. On the other hand, a constant non-zero mean may lead the optimizer to keep exploring new regions in the search space even after finding the global optimum by making it assume that it evaluated poor- performing parameter sets. 23 Figure 8 Effect of the hyperparameters on the prediction accuracy of Gaussian processes, (a) length scale (𝑙𝑐), (b) signal variance 𝜎𝑓 , and (c) observation noise (𝜎𝑛). Alternative to the constant prior mean approach (Figure 9a,c), the transfer of information from previous learning runs is proposed in this thesis. In this approach, the prior mean was set equal to the posterior mean of a previously trained GP model (Figure 9b,d). It enabled the transfer of the topology of the target function across different test scenarios, as long as variations between the robots and the environments did not significantly alter the function shape. 24 Figure 9 Gaussian process (GP) model representation of a function 𝒇(𝜽). (a), (c) show the GP model prediction at the beginning and after completing 5 iterations of learning run found by setting the prior mean function to constant mean. (b), (d) show the GP model prediction at the beginning and after completing 5 iterations of learning run found by setting the prior mean function to previously learned posterior mean defined by the training data points shown in green markers. 2.4. Bayesian Optimization BO was chosen as the optimization method for the studies conducted in the scope of this thesis because it allows the maximization of a performance function, i.e., the stride length of the magnetic soft millirobot in the event of locomotion optimization, using a small number of physical experiments42,43. At each iteration of the learning run, BO selects the next parameter set (𝜃𝑛𝑒𝑥𝑡) maximizing the acquisition function (𝛼𝑎𝑐𝑞(𝜃)) as follows: 𝜃𝑛𝑒𝑥𝑡 = argmax 𝜃∈𝛩 𝛼𝑎𝑐𝑞(𝜃), (7) 25 The acquisition function was selected as the expected improvement (EI) due to its better performance compared to its alternatives in similar test scenarios49. Additionally, the exploration-exploitation balance capability made EI a particularly effective choice. EI is defined as, 𝛼𝑎𝑐𝑞(𝜃) = 𝔼 max 0, 𝑅(𝜃) − 𝑅(𝜃∗) , (8) where 𝑅(𝜃∗) is the highest observed reward function value77. The analytical solution for Eq. (8) is given as 𝛼𝑎𝑐𝑞(𝜃) = 𝜇(𝜃) − 𝑅(𝜃∗) − 𝜉 Φ(𝑍) + 𝜎(𝜃)𝜙(𝑍), (9) where Φ and 𝜙 are the Gaussian cumulative density and probability density functions, respectively78. 𝑍 is defined as 𝑍 = 𝑍(𝜃) = (𝜇(𝜃) − 𝑅(𝜃∗) − 𝜉)/𝜎(𝜃), with 𝜇(𝜃) and 𝜎(𝜃) are calculated by Eq.s (3) and (4). The two terms in Eq. (9) represent the exploitation and exploration weights of the BO, respectively. Their balance is controlled by setting the hyperparameter 𝜉. As 𝜉 gets higher, BO tends to choose the parameter set in unobserved regions of the search space. BO focuses more on exploitation by testing parameters close to already explored regions as 𝜉 gets lower. A sample run of GP-BO maximizing a function 𝑓(𝜃) is given in Figure 7 to show the posterior GP models and calculated acquisition function values at each iteration. The pseudocode showing the detailed flow of the learning run using GP-BO is given in Algorithm 1. 26 Algorithm 1 Pseudo code for the Bayesian optimization with Gaussian processes learning run. Inputs: Search space, Θ Prior mean function, 𝜇(𝜃) Output: Best performing parameter set, 𝜃∗ 𝑟𝑒𝑠𝑢𝑙𝑡𝐴𝑟𝑟𝑎𝑦 ← Initialize an empty array to store 𝑅(𝜃) and 𝜃 𝐺𝑃 ← Initialize GP with 𝜇(𝜃) for 𝑖𝑡𝑒𝑟𝑎𝑡𝑖𝑜𝑛 < 𝑚𝑎𝑥𝐼𝑡𝑒𝑟𝑎𝑡𝑖𝑜𝑛 do 𝜇𝑝𝑜𝑠𝑡 , 𝜎𝑝𝑜𝑠𝑡 = 𝐺𝑃(𝜃) 𝛼𝑎𝑐𝑞 = 𝐸𝐼 𝜇𝑝𝑜𝑠𝑡 , σ𝑝𝑜𝑠𝑡 𝜃𝑛𝑒𝑥𝑡 = argmax 𝜃∈Θ 𝛼𝑎𝑐𝑞(𝜃) 𝑅(𝜃𝑛𝑒𝑥𝑡) ← Test 𝜃𝑛𝑒𝑥𝑡 by running experiment 𝐺𝑃 ← Update 𝐺𝑃 with 𝑅(𝜃𝑛𝑒𝑥𝑡) 𝑟𝑒𝑠𝑢𝑙𝑡𝐴𝑟𝑟𝑎𝑦 ← Add [𝜃𝑛𝑒𝑥𝑡, 𝑅] to 𝑟𝑒𝑠𝑢𝑙𝑡𝐴𝑟𝑟𝑎𝑦 end 𝜃∗ = argmax 𝜃∈Θ (𝑟𝑒𝑠𝑢𝑙𝑡𝐴𝑟𝑟𝑎𝑦) return 𝜃∗ 27 2.5. Magnetic Soft Millirobot Simulation Even though automated physical experiments can generate the required data to optimize the robot's locomotion performance in different conditions, as the search space enlarges and environmental topologies substantially vary, running physical experiments becomes impractical 26,79. Therefore, a data-driven simulation environment was designed based on the open-source software Voxelyze, as it could efficiently simulate heterogeneous 3D rigid and soft bodies 80. The dynamic behavior of the magnetic and nonmagnetic rigid and soft materials is modeled as a mass-spring-damper system (Figure 10). Multi- body interaction is modeled by adapting the contact mechanics of Voxelyze, defining the interaction between the robot and the floor. To model the magnetic actuation, first, the magnetic torque acting on each voxel due to the external homogeneous field is calculated and then added into dynamic equations. Magnetic torque 𝜏𝑡 acting on a voxel of the robot at time step 𝑡 is calculated as follows. 𝜏𝑡 = 𝑚𝑡 × 𝐵𝑡 , (10) 𝑚𝑡 = 𝑀𝑟𝑑𝑣 3𝑅𝑡, (11) where 𝐵𝑡 , 𝑀𝑟 and 𝑑𝑣 denote the homogeneous magnetic field at time 𝑡, magnetic remanence, and voxel size, respectively. 𝑅𝑡 is the rotational matrix defining the magnetic orientation of the voxel at time step 𝑡 (Figure 10d). 28 Figure 10 Simulation parameters of the data-driven magnetic soft millirobot simulation. Schematic representation of the magnetic soft millirobot simulation by mass-spring-damper model between (a) colliding voxels, (b) connected voxels, (c) voxel and the surrounding, and (d) magnetic torque acting on a voxel due to external magnetic field 𝐵. For all the simulations, the voxel size (𝑑𝑣) was set to 185𝜇𝑚. Density (𝜌), Young's modulus (𝐸), and magnetic remanence (𝑀𝑟) values for the magnetic soft millirobot were taken from 24 as 1.86𝑔/𝑐𝑚3, 8.45𝑘𝑃𝑎, and 62𝑘𝐴/𝑚, respectively. The Poisson's ratio is assumed to be 0.49. Simulated results for walking, rolling, and crawling locomotions of the magnetic soft millirobot actuated by the same control signals in Figure 2 are shown in Figure 11. 29 Figure 11 Simulated locomotion modes of the magnetic soft millirobot. Simulated performance of the magnetic soft millirobot during (a) walking, (b) rolling, and (c) crawling locomotion modes generated by manually designed actuation signals24. 30 2.6. Robot Tracking by Electrical Impedance Tomography EIT is proposed as an alternative solution to the camera-based localization of the magnetic soft millirobot, as it can be used in nontransparent working environments. EIT estimates the internal structure of the measured object or the environment by processing the emitted and measured electrical signals, i.e., voltages or currents, applied through multiple electrodes attached to the surface. The forward problem focuses on modeling these electrical measurements at any position on the domain's boundary using the internal conductivity distribution 𝜎 and the stimulating current injection. It can be solved by numerical methods, such as FEM. Conversely, the inverse problem aims to reconstruct the conductivity distribution via the electrical measurements and needs to be solved to detect the robot's position (Figure 12). However, the inverse problem is particularly complex due to nonuniqueness and the instability of the solution81. Thus, the solver developed by Adler et al.81 employing a spatiotemporal resolution approach is used with the open-source software EIDORS82. To stimulate and collect the necessary electrical measurements from the environment, a circular test area surrounded by eight evenly distributed electrodes was designed (Figure 13b). Although the number of electrodes could vary, it was determined by the maximum number of analog input channels available in the data acquisition system (PXIe-7847R, National Instruments). The electrodes were excited and measured following a full- scan stimulation pattern involving a single current source and a sink 31 (Figure 13e). As a result, 28 different stimulation combination was generated in the eight-electrode EIT system, and they were given to the inverse problem solver as input data to reconstruct the conductivity map of the environment. Figure 12 Forward and inverse problem definitions for EIT. An electrical stimulation was applied through the boundary of the measured domain, and the resulting voltages were measured. The forward problem provides the measurements from the knowledge of the medium properties, while the inverse problem estimates the medium properties from the measurements. 32 Figure 13 Experimental setup and the electrical stimulation pattern used for the EIT experiments. (a) Image of the magnetic actuation system (Figure 5a) with the test area surrounded by the eight electrodes of the EIT system and a high-speed camera to generate the ground truth data. (b) Top-view image and drawing of the test area showing the electrode positions and a sample robot in the test fluid. (c) Uniform magnetization profile of the magnetic small-scale robot along its body (blue arrows). (d) Sample movement of the magnetic small- scale robot by applying a rotating magnetic field 𝐵 shown by red arrows. (e) Illustration of the full-scan EIT stimulation scheme. All electrode pairs were successively selected as driving electrodes. 33 3. Conclusion and Outlook 3.1. Conclusion This dissertation represents a comprehensive exploration of data-efficient adaptive locomotion for soft millirobots demonstrated on a sheet-shaped elastomeric magnetic soft millirobot with the reported publications. The first paper in Appendix A, "Learning of Sub-optimal Gait Controllers for Magnetic Walking Soft Millirobots", showed the performance variations in magnetic soft millirobots by repeated tests. It proposed using GP-BO to adapt to the variabilities inherent in soft robots, even in the absence of model-based control. Next, the second paper in Appendix B, "Task Space Adaptations via the Learning of Magnetic Soft Millirobots", demonstrated the controller parameters learning for the walking gait of the magnetic soft millirobot by GP-BO. To this end, a benchmark dataset consisting of 3750 experimental results was generated through exhaustive grid search, and the effectiveness of the prior mean transfer in GP-BO was shown by running 104 augmented tests. The results highlighted that transferring prior mean information improved the learning performance by reducing the required experiments to find parameter sets generating walking locomotion. The performance of the proposed transfer learning approach was tested further in various task spaces with changing surface adhesion, surface roughness, and medium viscosity. Apart from the proposed learning approach, the electromagnetic coil setup built in the scope of this study has allowed the running of automated experiments requiring 34 minimized human intervention, making running repeated experiments with lower performance variations possible. The third paper in Appendix C, "Learning Soft Millirobot Multimodal Locomotion with Sim-to-Real Transfer", introduced sim-to-real transfer learning to improve the optimization performance in larger search spaces. In this regard, a data-driven soft millirobot simulation was designed. Using the previously presented transfer learning approach to bridge the gap between simulation and real-world performance, the efficacy of a priori knowledge generation in the simulation and performing sim-to-real transfer learning was demonstrated in various environments and robots. Moreover, an autonomous environment identification method was proposed by matching experimental results to simulated data. Its performance was demonstrated with the autonomous locomotion adaptation of the robot in previously not encountered conditions. The fourth paper in Appendix D, "A Localization Method for Untethered Small-Scale Robots Using Electrical Impedance Tomography", demonstrated an alternative solution to track the position of the magnetic soft millirobot using EIT, which was shown to operate without interfering with the magnetic actuation system. To summarize, this thesis presents a holistic approach to advancing the field of soft millirobots, encompassing adaptive control, simulation, and alternative tracking methods. The findings and proposed methodologies contribute to the data-efficient, adaptive, and versatile locomotion capabilities of soft millirobots. 35 3.2. Future Work The research presented in this dissertation provides a solid foundation for future explorations and advancements in the field of soft millirobots. While ongoing research into small-scale fabrication techniques may address reproducibility concerns83–85, the inherent variability and the performance decrease over prolonged use due to material degradations in soft millirobots underscore the continued need for adaptive control. The learning method and simulation presented in this thesis are demonstrated on a specific sheet-shaped magnetic soft millirobot, but they can be applied to different soft robot shapes and materials25 and various tasks in more complex environments, such as climbing, path following, and velocity control in 3D confined spaces15. The adaptability of these approaches opens up exciting possibilities toward the real-world applications of these robots, especially with the sim-to-real transfer learning method. Moreover, the developed simulation environment enables the exploration of alternative control and adaptation approaches. Furthermore, the successful application of sim-to-real transfer learning introduces the potential to plan robotic operations beforehand, such as in medical procedures. The effectiveness of sim-to-real transfer learning and adaptive control methods based on BO has been validated in various test cases; however, these methods may face challenges in dynamic environments with momentary changes owing to the episodic nature of BO. One potential solution would be using continuous control algorithms, such as deep reinforcement learning (DRL)55,86,87. However, these algorithms, 36 especially the neural network-based ones, require a larger training datasets. Therefore, as the next step, the simulation environment will be updated to integrate adhesion mechanics and fluidic interactions. This will enable modeling the robot's dynamics in scenarios involving sticky surfaces, such as biological tissues covered by mucus, and diverse fluid environments, such as blood. Moreover, simulation speed will be improved to explore planning and new continuous control algorithms. This can be achieved by limiting the update in the simulation to a local frame where the robot is located, thereby omitting unnecessary calculations in the rest of the simulated environment. The possibility of replacing the simulation environment with a deep neural network model will also be explored88. In addition to adaptive control, the GP-BO-based learning method can be further used to design the robots, i.e., their morphological and magnetic properties, and to learn actuation signals for a given task without requiring physical experiments55,87,89. Furthermore, the domain identification methods introduced in this research can be used for enhancing localization accuracy and mapping complex environments. As the field of soft millirobots continues to evolve, these future research directions will play a pivotal role in enhancing their adaptability and expanding their applicability across various domains. 37 Bibliography 1. Majidi, C. Soft Robotics: A Perspective - Current Trends and Prospects for the Future. Soft Robot 1, 5–11 (2014). 2. Polygerinos, P. et al. Soft Robotics: Review of Fluid-Driven Intrinsically Soft Devices; Manufacturing, Sensing, Control, and Applications in Human-Robot Interaction. Adv Eng Mater 19, 1700016 (2017). 3. Haddadin, S., De Luca, A. & Albu-Schäffer, A. Robot collisions: A survey on detection, isolation, and identification. IEEE Transactions on Robotics 33, 1292–1312 (2017). 4. Hughes, J. et al. Soft manipulators and grippers: A review. Frontiers Robotics AI 3, 1–12 (2016). 5. Gu, G. et al. A soft neuroprosthetic hand providing simultaneous myoelectric control and tactile feedback. Nature Biomedical Engineering 2021 7:4 7, 589–598 (2021). 6. Yin, J. et al. Wearable Soft Technologies for Haptic Sensing and Feedback. Adv Funct Mater 31, 2007428 (2021). 7. Li, M., Pal, A., Aghakhani, A., Pena-Francesch, A. & Sitti, M. Soft actuators for real-world applications. Nature Reviews Materials 2021 7:3 7, 235–249 (2021). 8. Reeder, J. T. et al. Soft, bioresorbable coolers for reversible conduction block of peripheral nerves. Science (1979) 377, 109– 115 (2022). 9. Cianchetti, M., Laschi, C., Menciassi, A. & Dario, P. Biomedical applications of soft robotics. Nature Reviews Materials 2018 3:6 3, 143–153 (2018). 38 10. Hines, L., Petersen, K., Lum, G. Z. & Sitti, M. Soft Actuators for Small-Scale Robotics. Advanced Materials 29, (2017). 11. Wang, B., Kostarelos, K., Nelson, B. J. & Zhang, L. Trends in Micro-/Nanorobotics: Materials Development, Actuation, Localization, and System Integration for Biomedical Applications. Advanced Materials 33, (2021). 12. Sitti, M. Miniature soft robots — road to the clinic. Nature Reviews Materials 2018 3:6 3, 74–75 (2018). 13. Wang, T. et al. Adaptive wireless millirobotic locomotion into distal vasculature. Nat Commun 13, 1–17 (2022). 14. Soon, R. H. et al. Pangolin-inspired untethered magnetic robot for on-demand biomedical heating applications. Nat Commun 14, 1–15 (2023). 15. Wu, Y., Dong, X., Kim, J. K., Wang, C. & Sitti, M. Wireless soft millirobots for climbing three-dimensional surfaces in confined spaces. Sci Adv 8, (2022). 16. Ren, Z. et al. Soft-bodied adaptive multimodal locomotion strategies in fluid-filled confined spaces. Sci Adv 7, (2021). 17. Zheng, Z. et al. Electrodeposited Superhydrophilic- Superhydrophobic Composites for Untethered Multi-Stimuli- Responsive Soft Millirobots. Advanced Science 10, (2023). 18. Pilz Da Cunha, M., Debije, M. G. & Schenning, A. P. H. J. Bioinspired light-driven soft robots based on liquid crystal polymers. Chem Soc Rev 49, 6568–6578 (2020). 19. Li, G. et al. Self-powered soft robot in the Mariana Trench. Nature 591, 66–71 (2021). 39 20. Son, D., Gilbert, H. & Sitti, M. Magnetically Actuated Soft Capsule Endoscope for Fine-Needle Biopsy. Soft Robot 7, 10–21 (2020). 21. Sitti, M. & Wiersma, D. S. Pros and Cons: Magnetic versus Optical Microrobots. Advanced Materials 32, (2020). 22. Demir, S. O. et al. Task space adaptation via the learning of gait controllers of magnetic soft millirobots. International Journal of Robotics Research 40, 1331–1351 (2021). 23. Daguerre, H. et al. A Localization Method for Untethered Small- Scale Robots Using Electrical Impedance Tomography. IEEE/ASME Transactions on Mechatronics 27, 3506–3516 (2022). 24. Hu, W., Lum, G. Z., Mastrangeli, M. & Sitti, M. Small-scale soft- bodied robot with multimodal locomotion. Nature 554, 81–85 (2018). 25. Zheng, Z. et al. Programmable aniso-electrodeposited modular hydrogel microrobots. Sci Adv 8, (2022). 26. Culha, U., Demir, S. O., Trimpe, S. & Sitti, M. Learning of Sub- optimal Gait Controllers for Magnetic Walking Soft Millirobots. in Robotics: Science and Systems (2020). doi:10.15607/RSS.2020.XVI.070. 27. Rus, D. & Tolley, M. T. Design, fabrication and control of soft robots. Nature 521, 467–475 (2015). 28. Della Santina, C., Bicchi, A. & Rus, D. On an Improved State Parametrization for Soft Robots with Piecewise Constant Curvature and Its Use in Model Based Control. IEEE Robot Autom Lett 5, 1001–1008 (2020). 40 29. Webster, R. J. & Jones, B. A. Design and kinematic modeling of constant curvature continuum robots: A review. International Journal of Robotics Research 29, 1661–1683 (2010). 30. Cianchetti, M., Renda, F., Giorelli, M., Laschi, C. & Calisti, M. Dynamic Model of a Multibending Soft Robot Arm Driven by Cables. IEEE Transactions on Robotics 30, 1109–1122 (2014). 31. Renda, F., Boyer, F., Dias, J. & Seneviratne, L. Discrete Cosserat Approach for Multisection Soft Manipulator Dynamics. IEEE Transactions on Robotics 34, 1518–1533 (2018). 32. Chenevier, J., González, D., Aguado, J. V., Chinesta, F. & Cueto, E. Reduced-order modeling of soft robots. PLoS One 13, (2018). 33. Goury, O. & Duriez, C. Fast, Generic, and Reliable Control and Simulation of Soft Robots Using Model Order Reduction. IEEE Transactions on Robotics 34, 1565–1576 (2018). 34. Huang, W., Huang, X., Majidi, C. & Jawed, M. K. Dynamic simulation of articulated soft robots. Nature Communications 2020 11:1 11, 1–9 (2020). 35. George Thuruthel, T., Ansari, Y., Falotico, E. & Laschi, C. Control Strategies for Soft Robotic Manipulators: A Survey. Soft Robot 5, 149–163 (2018). 36. Rich, S. I., Wood, R. J. & Majidi, C. Untethered soft robotics. Nature Electronics 2018 1:2 1, 102–112 (2018). 37. Kim, Y., Parada, G. A., Liu, S. & Zhao, X. Ferromagnetic Soft Continuum Robots. Sci. Robot vol. 4 https://www.science.org (2019). 38. Chin, K., Hellebrekers, T. & Majidi, C. Machine Learning for Soft Robotic Sensing and Control. Advanced Intelligent Systems 2, 1900171 (2020). 41 39. Hyatt, P., Wingate, D. & Killpack, M. D. Model-based control of soft actuators using learned non-linear discrete-time models. Frontiers Robotics AI 6, (2019). 40. George Thuruthel, T., Shih, B., Laschi, C. & Thomas Tolley, M. Soft Robot Perception Using Embedded Soft Sensors and Recurrent Neural Networks. Sci. Robot vol. 4 https://www.science.org (2019). 41. Bern, J. M., Schnider, Y., Banzet, P., Kumar, N. & Coros, S. Soft Robot Control with a Learned Differentiable Model. 2020 3rd IEEE International Conference on Soft Robotics, RoboSoft 2020 417–423 (2020) doi:10.1109/ROBOSOFT48309.2020.9116011. 42. Ghahramani, Z. Probabilistic Machine Learning and Artificial Intelligence. Nature on vol. 27 (2015). 43. Shahriari, B., Swersky, K., Wang, Z., Adams, R. P. & De Freitas, N. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE vol. 104 148–175 Preprint at https://doi.org/10.1109/JPROC.2015.2494218 (2016). 44. Rasmussen, C. Edward. & Williams, C. K. I. Gaussian Processes for Machine Learning. (MIT Press, 2006). 45. Calandra, R., Seyfarth, A., Peters, J. & Deisenroth, M. P. Bayesian optimization for learning gaits under uncertainty: An experimental comparison on a dynamic bipedal walker. Ann Math Artif Intell 76, 5–23 (2016). 46. Liao, T. et al. Data-efficient learning of morphology and controller for a microrobot. Proc IEEE Int Conf Robot Autom 2019-May, 2488–2494 (2019). 47. Marco, A., Hennig, P., Bohg, J., Schaal, S. & Trimpe, S. Automatic LQR tuning based on Gaussian process global 42 optimization. Proc IEEE Int Conf Robot Autom 2016-June, 270– 277 (2016). 48. Yang, B. et al. Learning flexible and reusable locomotion primitives for a microrobot. IEEE Robot Autom Lett 3, 1904– 1911 (2018). 49. Von Rohr, A., Trimpe, S., Marco, A., Fischer, P. & Palagi, S. Gait Learning for Soft Microrobots Controlled by Light Fields. in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 6199–6206 (2018). doi:10.1109/IROS.2018.8594092. 50. Kaufmann, E. et al. Champion-level drone racing using deep reinforcement learning. Nature 2023 620:7976 620, 982–987 (2023). 51. Song, Y., Romero, A., Müller, M., Koltun, V. & Scaramuzza, D. Reaching the limit in autonomous racing: Optimal control versus reinforcement learning. Sci Robot 8, eadg1462 (2023). 52. Bongard, J., Zykov, V. & Lipson, H. Resilient machines through continuous self-modeling. Science (1979) 314, 1118–1121 (2006). 53. Hwangbo, J. et al. Learning agile and dynamic motor skills for legged robots. Sci Robot 4, (2019). 54. Lee, Y. H. et al. Whole-Body Control and Angular Momentum Regulation using Torque Sensors for Quadrupedal Robots. J Intell Robot Syst 102, 1–15 (2021). 55. Choi, S. et al. Learning quadrupedal locomotion on deformable terrain. Sci Robot 8, (2023). 56. Yang, Y. et al. Data Efficient Reinforcement Learning for Legged Robots. Proc Mach Learn Res 100, 1–10 (2019). 43 57. Mahler, J. et al. Learning ambidextrous robot grasping policies. Sci Robot 4, (2019). 58. Lázaro-Gredilla, M., Lin, D., Swaroop Guntupalli, J. & George, D. Beyond imitation: Zero-shot task transfer on robots by learning concepts as cognitive programs. Sci Robot 4, (2019). 59. Fazeli, N. et al. See, feel, act: Hierarchical learning for complex manipulation skills with multisensory fusion. Sci Robot 4, (2019). 60. Ficuciello, F., Migliozzi, A., Laudante, G., Falco, P. & Siciliano, B. Vision-based grasp learning of an anthropomorphic hand-arm system in a synergy-based control framework. Sci Robot 4, (2019). 61. Hiller, J. & Lipson, H. Dynamic Simulation of Soft Multimaterial 3D-Printed Objects. https://home.liebertpub.com/soro 1, 88–101 (2014). 62. Shah, D. S. et al. A soft robot that adapts to environments through shape change. Nat Mach Intell 3, 51–59 (2020). 63. Tiryaki, M. E., Demir, S. O. & Sitti, M. Deep Learning-based 3D Magnetic Microrobot Tracking using 2D MR Images. IEEE Robot Autom Lett 7, 6982–6989 (2022). 64. Pane, S., Iacovacci, V., Sinibaldi, E. & Menciassi, A. Real-time imaging and tracking of microrobots in tissues using ultrasound phase analysis. Appl Phys Lett 118, 26 (2021). 65. Yan, X. et al. Multifunctional biohybrid magnetite microrobots for imaging-guided therapy. Sci Robot 2, (2017). 66. Iacovacci, V. et al. High-Resolution SPECT Imaging of Stimuli- Responsive Soft Microrobots. Small 15, (2019). 67. Reale, R., De Ninno, A., Businaro, L., Bisegna, P. & Caselli, F. High-throughput electrical position detection of single flowing 44 particles/cells with non-spherical shape. Lab Chip 19, 1818–1827 (2019). 68. Masner, A., Blasina, F. & Simini, F. Electrical impedance tomography for neonatal ventilation assessment: A narrative review. in Journal of Physics: Conference Series vol. 1272 (Institute of Physics Publishing, 2019). 69. Rapin, M. et al. Wearable Sensors for Frequency-Multiplexed EIT and Multilead ECG Data Acquisition. IEEE Trans Biomed Eng 66, 810–820 (2019). 70. Shi, X. et al. High-precision electrical impedance tomography data acquisition system for brain imaging. IEEE Sens J 18, 5974– 5984 (2018). 71. Darnajou, M. et al. High Speed EIT With Multifrequency Excitation Using FPGA and Response Analysis Using FDM. IEEE Sens J 20, 8698–8710 (2020). 72. Zhang, Y., Xiao, R. & Harrison, C. Advancing hand gesture recognition with high resolution electrical impedance tomography. in UIST 2016 - Proceedings of the 29th Annual Symposium on User Interface Software and Technology 843–850 (Association for Computing Machinery, Inc, 2016). doi:10.1145/2984511.2984574. 73. Demir, S. O., Tiryaki, M. E., Karacakol, A. C. & Sitti, M. Learning Soft Millirobot Multimodal Locomotion with Sim‐to‐ Real Transfer. Advanced Science (2024) doi:10.1002/advs.202308881. 74. Wang, C., Wu, Y., Dong, X., Armacki, M. & Sitti, M. In situ sensing physiological properties of biological tissues using wireless miniature soft robots. Sci Adv 9, (2023). 45 75. Duvenaud, D., Nickisch, H. & Rasmussen, C. E. Additive Gaussian Processes. in Proceedings of the 24th International Conference on Neural Information Processing Systems 226–234 (Curran Associates Inc., Granada, Spain, 2011). doi:10.5555/2986459.2986485. 76. Chen, Z. & Wang, B. How priors of initial hyperparameters affect Gaussian process regression models. Neurocomputing 275, 1702–1710 (2018). 77. Jones, D. R., Schonlau, M. & Welch, W. J. Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global Optimization 13, 455–492 (1998). 78. Brochu, E., Cora, V. M. & de Freitas, N. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. (2010). 79. Ibarz, J. et al. How to train your robot with deep reinforcement learning: lessons we have learned. International Journal of Robotics Research 40, 698–721 (2021). 80. Hiller, J. & Lipson, H. Dynamic Simulation of Soft Multimaterial 3D-Printed Objects. Soft Robot 1, 88–101 (2014). 81. Silvera-Tawil, D., Rye, D., Soleimani, M. & Velonaki, M. Electrical impedance tomography for artificial sensitive robotic skin: A review. IEEE Sens J 15, 2001–2016 (2015). 82. Adler, A. & Lionheart, W. R. B. Uses and abuses of EIDORS: an extensible software base for EIT. Physiol Meas 27, (2006). 83. Kim, Y., Yuk, H., Zhao, R., Chester, S. A. & Zhao, X. Printing ferromagnetic domains for untethered fast-transforming soft materials. Nature 558, 274–279 (2018). 46 84. Xu, T., Zhang, J., Salehizadeh, M., Onaizah, O. & Diller, E. Millimeter-scale flexible robots with programmable three- dimensional magnetization and motions. Sci Robot 4, (2019). 85. Alapan, Y., Karacakol, A. C., Guzelhan, S. N., Isik, I. & Sitti, M. Reprogrammable shape morphing of magnetic soft machines. Sci Adv 6, 6414–6432 (2020). 86. Heess, N. et al. Emergence of Locomotion Behaviours in Rich Environments. arXiv preprint arXiv:1707.02286 (2017) doi:10.48550/arXiv.1707.02286. 87. Yao, J. et al. Adaptive Actuation of Magnetic Soft Robots Using Deep Reinforcement Learning. Advanced Intelligent Systems 5, (2023). 88. Tariverdi, A. et al. A Recurrent Neural-Network-Based Real- Time Dynamic Model for Soft Continuum Manipulators. Front Robot AI 8, 631303 (2021). 89. Schaff, C., Sedal, A. & Walter, M. R. Soft Robots Learn to Crawl: Jointly Optimizing Design and Control with Sim-to-Real Transfer. in Robotics: Science and Systems (2022). 47 Publication 1 Learning of Sub-optimal Gait Controllers for Magnetic Walking Soft Millirobots 49 Abstract Untethered small-scale soft robots have promising applications in minimally invasive surgery, targeted drug delivery, and bioengineering applications as they can access confined spaces in the human body. However, due to highly nonlinear soft continuum deformation kinematics, inherent stochastic variability during fabrication at the small scale, and lack of accurate models, the conventional control methods cannot be easily applied. Adaptivity of robot control is additionally crucial for medical operations, as operation environments show large variability, and robot materials may degrade or change over time, which would have deteriorating effects on the robot motion and task performance. Therefore, we propose using a probabilistic learning approach for millimeter-scale magnetic walking soft robots using Bayesian optimization (BO) and Gaussian processes (GPs). Our approach provides a data-efficient learning scheme to find controller parameters while optimizing the stride length performance of the walking soft millirobot robot within a small number of physical experiments. We demonstrate adaptation to fabrication variabilities in three different robots and to walking surfaces with different roughness. We also show an improvement in the learning performance by transferring the learning results of one robot to the others as prior information. I. Introduction Soft-bodied robots are composed of functional soft materials exhibiting shape-programmable properties that allow passive/active structural compliance and large degrees of freedom, which are hard to achieve using 50 conventional rigid materials [16]. The research on soft robots is getting more attention owing to easier access to novel fabrication methods and functional materials, and potential high-impact medical and other applications [25]. Biologically inspired soft robots can be used to study their soft-bodied biological counterparts [17], and open new application areas in multi-terrain locomotion [4], adaptive manipulation [15, 28], and human-assistive wearable systems [41]. Soft robots also enable safe human-robot physical interaction due to their high compliance and limited output force, which normally require additional computational effort in conventional robotic systems [12]. Small-scale (i.e., millimeter) untethered soft robots have further potential usage in medicine owing to their ability to access to enclosed small spaces non-invasively [26, 35] and the embodiment of functionalized materials enabling targeted drug delivery and bio-sensing [5]. Despite their potential, the virtual infinite degrees of freedom, the lack of accurate models, fabrication variations, and non-linear behavior (e.g., hysteresis) render the application of conventional control methods challenging for soft robots [32]. So far, constant curvature (CC) models utilizing bending beam theories have been widely-used to approximately represent the deformation of continuum robots [42]. Alternatively, analytical and geometrically exact models have been suggested for continuum robots that are represented as simplified rods [29]. Finite element methods (FEM) provide numerical solutions to soft robot kinematics by utilizing a chain of rigid elements connected with tunable spring-damper mechanisms [21]. These kinematic models allow the implementation of static and dynamic controllers for continuum robots 51 on a larger scale [9]. However, these controllers typically depend on the continuous sensing of body deformations from embedded sensors and computationally heavy model solutions, which are conditions that may not be met for untethered soft robots at the small scales [30]. The dynamic task environment, complex deformation kinematics, fabrication- dependent performance variations, and actuation/sensing limitations have further impacts on the soft mobile robots targeting medical applications, which make adaptive and data-efficient control methods attractive for these robots [36]. In the case of uncertainty and lack of a parametric model that represents the system, data-driven control [13] and reinforcement learning [38, 19] provide promising alternatives over model-based designs in small-scale soft robotic systems. However, the need for data efficiency, i.e., the ability to learn from only a few experimental trials, presents a core challenge for such methods [6]. Conversely, Bayesian optimization (BO) [10, 34] allows for the maximization of a performance function using a small number of physical experiments. BO typically employs Gaussian processes (GPs) [27] as a probabilistic model of the latent objective function. While no explicit dynamics model is needed, GPs allow for incorporating information as probabilistic priors, thus reducing data requirements. There are emerging examples that demonstrate the application of this approach to optimize the locomotion performance of robots on different length scales [3, 44, 22]. Despite its potential to address the control challenge for untethered soft robots, there are only a few examples that apply this method such as in the gait exploration of a tensegrity system [31], and the optimization of an undulating motion of a 52 microrobot [40]. So far, a data-efficient procedure that adapts the learned controllers to different robots and environmental conditions for untethered small-scale soft robots has not been demonstrated. In this paper, we propose a learning procedure to find the controller parameters of magnetically actuated, untethered, soft millirobots (see Fig. 1) that generate optimum walking gaits within a small number of physical experiments. We specifically focus on these types of robots due to their biocompatible use of external magnetic actuation that supports multi-functionality in future medical tasks [14, 23] and the high- resolution magnetization methods that allow more complex deformation capabilities at the small scale [24, 7]. We produce three replicas of a previously reported soft millirobot demonstrating a hand-tuned walking gait aiming for medical applications [14] and test the repeatability of their results. We begin with finding the optimum walking gait controllers of our robots using BO with GPs; initially without any prior information about the correlation between the controller parameters and the robot performance. Later, we explore the controller parameter space of one robot and present a straightforward way in the context of BO to transfer prior information from the first robot to all three robots while finding their optimum walking gait controllers in a data-efficient fashion. We report the optimum controllers and walking gait performances in terms of achieved stride lengths for all three robots and compare the two learning approaches (i.e., with and without using prior information). We also transfer this information to adapt to the changes in the task environment by finding the controller parameters for walking on rough surfaces. 53 Figure 1. (a) The magneto-elastomer robot is rolled around a jig and magnetized with 𝐵 = 1.8𝑇 field (red arrow) with a 45° angle with respect to the y-axis. The unfolded robot maintains a circular magnetization profile along its body (blue arrows). (b) Photo of the experimental setup with 6 electromagnetic coils and a high-speed camera. (c) Image of the fabricated and magnetized soft millirobot. (d) Projected planar images showing the four consecutive states of the robot walking gait: (1) relaxed, (2) front-stance, (3) double-stance, and (4) back-stance. These images are placed with a separation on the y-axis for visual clarity, i.e., the robot does not jump in between states during the experiments. Numbers represent the four states. The organization of this paper is as follows. We describe the robot design and its walking gaits in Section II and introduce our learning approach in Section III. Section IV presents the experiments on learning of the optimum gait controllers without using the prior information, the generation of the prior controller information from one robot, and the optimization of the walking gaits by transferring the learned controllers to three robots and different locomotion surfaces. We discuss the experimental results in Section V and conclude our work in Section VI. 54 II. Robot Design and Gait Definition We followed the methods and materials reported in [14] and fabricated three magnetic soft millirobots with a 1:1 body mass ratio of Ecoflex 00-10 (Smooth-On Inc.) and neodymium-iron-boron (NdFeB) magnetic microparticles with around 5𝜇𝑚 diameter (MQP-15-7, Magnequench). We placed this pre-polymer mixture on a methyl methacrylate plate and cut the robots out of the cast using a high-resolution laser cutter (LPKF Protolaser U4) after the polymer is cured. Our robots had the final dimensions of length 𝐿 = 3.7𝑚𝑚, width 𝑤 = 1.5𝑚𝑚, and height ℎ = 185𝜇𝑚 as shown in Fig. 1-a. We separately folded the robots around a circular jig with a circumference equal to L and magnetized them within a magnetic field with a magnitude of 1.8𝑇 and orientation of 45° measured counterclockwise from the y-axis. Once the robots are unfolded from the jig, the magnetic particles maintained their magnetization orientation forming a circular profile along the longitudinal axis of the robot body (Fig. 1-a). We used these robots (i.e., robots 1, 2, and 3) with the same nominal material properties and dimensions for our experiments (see Fig. 1-c for a sample robot image). The walking gait of our robot is composed of four consecutive quasi- static states that are inspired by the planar quadrupedal bounding [1] and a caterpillar's inching motion [39]. These states are depicted as (1) relaxed, (2) front-stance, (3) double-stance, and (4) back-stance as shown in Fig. 1-d. We placed our magnetized robot along the y-axis of 55 Figure 2. (a) Walking gait control parameters during a single period of a sample motion (a single period of 1 𝑓⁄ = 90𝑚𝑠 for 𝑓 = 11𝐻𝑧 is normalized to 0-1 on the abscissa). The magnetic field 𝐵 is controlled on the y-z plane and shown with its y and z components (top) whose magnitude (middle) reaches 𝐵𝑚𝑎𝑥 and orientation (bottom) changes from 𝛼1 to 𝛼2. Dashed vertical lines represent the (1) relaxed, (2) front-stance, (3) double-stance, and (4) back-stance states of the walking gait. (b) The stride length 𝑆 performance of the previously reported robot (𝑟𝑜𝑏𝑜𝑡𝑅) vs. the performance of three replica robots using the same controller parameters (𝑟𝑜𝑏𝑜𝑡1,2,3). Each data point for each robot (𝑟𝑜𝑏𝑜𝑡1,2,3) represents the mean of 10 experiments and the error bars show the standard deviation. The performance of 𝑟𝑜𝑏𝑜𝑡𝑅 and the horizontal dashed line which represents the model prediction are adapted from [14]. the magnetic coil setup consisting of three orthogonal pairs of custom- made electromagnets (Fig. 1-b) that generated a 3-D uniform magnetic field within a 4×4×4cm3 space. We modulated the magnetic field on the y-z plane that coincided with the center of the test environment. We controlled four parameters to generate the walking gait: the maximum magnetic field magnitude (𝐵𝑚𝑎𝑥), the frequency of the actuation cycle 56 (𝑓), and two magnetic field orientation angles (𝛼1 and 𝛼2) measured counterclockwise from the y-axis. The plots in Fig. 2-a show the change of the control parameters during a single period of the motion for 𝐵𝑚𝑎𝑥 = 10𝑚𝑇, 𝑓 = 11𝐻𝑧, 𝛼1 = 30° and 𝛼2 = 60°, which are hand- tuned parameters reported in [14]. At the beginning of a single gait period, the robot started at a relaxed state for 0 ≤ 𝐵 ≤ 4𝑚𝑇. The robot tilted forward when 𝛼 = 𝛼1 and 𝐵 increased from 4𝑚𝑇 to 𝐵𝑚𝑎𝑥 = 10𝑚𝑇. While 𝐵 remained constant at 𝐵𝑚𝑎𝑥, the orientation of the magnetic field changed from 𝛼1 to 𝛼2 causing the robot to initially switch to the double- stance state and then to the back-stance state when 𝛼 = 𝛼2. Then, 𝐵 decreased while keeping the orientation of the magnetic field constant, and the robot gradually switched back to the relaxed state. For 𝐵 < 4𝑚𝑇, the robot assumed the relaxed state, and a single period of walking actuation ended when 𝐵 = 0𝑚𝑇. We reset 𝐵 at the end of every gait cycle to avoid jerky motion when 𝛼 changed from 𝛼1 to 𝛼2. In our experiments, the relaxed state was never skipped but its duration changed according to 𝑓. The consecutive images from a single walking gait period are shown in Fig. 1-d. We tracked the robot gait using a high-speed camera (Basler aCa2040-90uc, 60 frames per second (fps), 1pixel∼27𝜇𝑚 resolution) that is orthogonally placed to the axis of robot motion (Fig. 1-b). In every experiment, we calculated the stride length (𝑆) of the robot by tracking the average distance covered by its center of mass in 10 consecutive steps. To test the repeatability of the previously reported results in [14], we experimented with our fabricated robots using their suggested controller parameter sets. Fig. 2-b shows the stride length performances of our 57 robots (robot1,2,3) and compares them with the reported robot (robotR) performance (i.e., we calculated the stride length of robotR from the values reported in [14]) for 𝐵𝑚𝑎𝑥 = 10𝑚𝑇, 𝛼1 = 30°, 𝛼2 = 60°, and 2 ≤ 𝑓 ≤ 20𝐻𝑧. Our preliminary results revealed that:  In this scale, the gait performance showed clear inconsistency due to the variability during fabrication and environmental factors even though the same materials, methods, controller parameters, and walking surfaces are used in the fabrication and experimentation of the millimeter-scale soft robots.  Unlike the model prediction, the robot performance showed non- monotonic behavior along with increasing 𝑓, which rendered the design of a model-based gait controller unreliable.  In addition to the virtual infinite degrees of freedom inherited by the soft materials, the controller parameters existed in a continuous space, making the hand-tuning of these parameters within physical experiments impractical. These observations found the goals of our paper in which we address the necessity for a data-efficient controller learning system that is robust to the variabilities caused by the material, fabrication, and the task environment of the miniature scale, medical-oriented, untethered soft robots. 58 III. Learning Approach We aim to optimize the walking gait controller parameters to maximize the stride length 𝑆 of the robot. Therefore, we define the reward function as 𝑆: 𝛩 → ℝ, (1) which maps the parameter set 𝜃 = [𝐵𝑚𝑎𝑥 , 𝑓, 𝛼1, 𝛼2] to scalar reward values (i.e., the experimental stride length performance of a robot). According to the definition of the reward function, we formulate the parameter learning as the (global) optimization problem 𝜃∗ = argmax 𝜃∈Θ 𝑆(𝜃), (2) where Θ denotes the complete search space, 𝜃 is the parameter set, and 𝑆(𝜃) is the experimentally observed stride length performance of the robot for a given 𝜃. We define the range of the controller parameters based on the findings in [14] and the physical limitations of our magnetic actuation setup. Accordingly, 𝐵𝑚𝑎𝑥 is defined between 5𝑚𝑇 and 12𝑚𝑇, and the walking frequency, 𝑓, ranges from 𝑓𝑚𝑖𝑛 = 0.5𝐻𝑧 to 𝑓𝑚𝑎𝑥 = 20𝐻𝑧. We limit 𝛼1 and 𝛼2 to [10, 50]° and [40,80]° respectively and select values that satisfy 𝛼2 > 𝛼1 to generate the walking gait in Fig. 1-d. We use a step size of 1𝑚𝑇 for 𝐵, 1° for each 𝛼, and a variable step size of 0.25𝐻𝑧 for 𝑓 < 2𝐻𝑧 and 2𝐻𝑧 for 𝑓 ≥ 2𝐻𝑧, which yield a total number of 203520 possible parameter sets in Θ. 59 A. Gaussian Processes (GPs) The magnetic soft millirobots in our paper did not have an accurate kinematic or dynamics model (see Fig. 2-b). Therefore, it is necessary to approximate the reward function based on the data collected from physical experiments rather than numerical analysis. However, the physical data has inherent uncertainty due to the noise in the measurements and the variations during the experiments. To include these uncertainties in the model, overcome the sparsity in the data, and make probabilistic predictions at unobserved locations, we model the reward function 𝑆(𝜃) using GPs following the previous study in [40]: 𝑆(𝜃)~𝐺𝑃 𝜇(𝜃), 𝑘(𝜃, 𝜃′) , (3) However, as 𝑆(𝜃) can only be measured with noise, we define �̃� as �̃�(𝜃𝑖) = 𝑆(𝜃𝑖) + 𝑛𝑖, (4) where 𝑛𝑖 is zero-mean Gaussian noise with variance 𝜎𝑛 2 for each measurement 𝑖. A GP is a non-parametric model defined by its prior mean 𝜇(𝜃) and the covariance function 𝑐𝑜𝑣 𝑆(𝜃), 𝑆(𝜃′) = 𝑘(𝜃, 𝜃′), where 𝑘 is the kernel. During one run of BO, the GP model is sequentially updated with �̃�(𝜃) observed from experiments. We define one “learning run” as a run of BO until the desired stopping criterion is matched (e.g., a fixed number of experiments is reached). 60 From the experimental data 𝐷 = 𝜃𝑖 , �̃�(𝜃𝑖) 𝑖=1 𝑁 , the stride length of the robot for an unobserved 𝜃 can be predicted using the posterior mean and variance as follows. 𝜇𝑝𝑜𝑠𝑡(𝜃) = 𝜇(𝜃) + 𝑘𝑇(𝜃)𝐾−1𝑦, (5) 𝜎𝑝𝑜𝑠𝑡 2 (𝜃) = 𝑘(𝜃, 𝜃) − 𝑘𝑇(𝜃)𝐾−1𝑘(𝜃), (6) 𝑆𝑝𝑜𝑠𝑡(𝜃)|𝐷~𝑁(𝜇𝑝𝑜𝑠𝑡(𝜃), 𝜎𝑝𝑜𝑠𝑡 2 (𝜃), (7) where 𝑘(𝜃), 𝑦 ∈ ℝ𝑁 with 𝑘(𝜃𝑖) = 𝑘(𝜃, 𝜃𝑖), 𝑦𝑖 = �̃�(𝜃𝑖) − 𝜇(𝜃𝑖), and 𝐾 ∈ ℝ𝑁×𝑁 with 𝐾𝑖,𝑗 = 𝑘 𝜃𝑖 , 𝜃𝑗 + 𝛿𝑖,𝑗𝜎𝑛 2, where 𝛿𝑖,𝑗 is the Kronecker delta and 𝜎𝑛 2 is the noise in the collected data set. We select the squared exponential as the kernel function in the GPs, which is for the 1D case: 𝑘𝑆𝐸(𝜃, 𝜃′) = 𝜎𝑓 2 exp(−(𝜃 − 𝜃′)2 2𝑙𝑐 2⁄ ), (8) where 𝑙𝑐 is the length scale that defines the rate of variation in the modeled function for each dimension of the parameter space. Long length scales are used to model slowly-varying functions and short length scales are used to model quickly-varying functions. The signal variance 𝜎𝑓 2 describes the width of distribution, e.g., high 𝜎𝑓 2 means higher uncertainty in the predictions of the unobserved 𝜃. Hyperparameters of the GPs can be listed as the noise in the collected data 𝜎𝑛 2, length scale 𝑙𝑐 for each dimension of the parameter space ℝ𝑑𝑐, and signal variance 𝜎𝑓 2. To determine the value of 𝜎𝑛 2, we use the maximum variance found in the 61 experimental results shown in Fig. 2-b. We set the length scale 𝑙𝑐 to one- fourth of the total range of each corresponding parameter. We also set the signal variance 𝜎𝑓 2 to half of the body length of the robot so that the highest possible reward value (i.e., 𝐿 = 3.7𝑚𝑚) remained inside the 95% confidence interval of the prior. B. Bayesian Optimization (BO) We use BO to select the parameter set 𝜃𝑛𝑒𝑥𝑡 to be tested in the next step of the learning run using the acquisition function 𝛼𝑎𝑐𝑞(𝜃). 𝜃𝑛𝑒𝑥𝑡 = argmax 𝜃∈Θ 𝛼𝑎𝑐𝑞(𝜃). (9) In this work, we choose the expected improvement (EI) as the acquisition function 𝛼𝑎𝑐𝑞(𝜃) due to its better performance compared to its alternatives as demonstrated in [40]. EI seeks the parameter set for the next step where the expected improvement in reward function is the highest compared to the previously collected data: 𝛼𝑎𝑐𝑞(𝜃) = 𝔼 max 0, (𝑆(𝜃), 𝑆(𝜃∗) − 𝜉) , (10) 𝛼𝑎𝑐𝑞(𝜃) = (𝜇(𝜃) − 𝑆(𝜃∗) − 𝜉)Φ(𝑍) + 𝜎(𝜃)𝜙(𝑍), (11) where 𝑆(𝜃∗) is the highest reward function value collected so far, Φ and 𝜙 are the Gaussian cumulative density and probability density functions, respectively [2]. The term 𝑍 is described as 𝑍 = 𝑍(𝜃) = (𝜇(𝜃) − 𝑆(𝜃∗) − 𝜉) 𝜎(𝜃)⁄ , with 𝜇(𝜃) and 𝜎(𝜃) computed from Eq. (5) and Eq. (6). In Eq. (11), the two terms define the exploitation and the exploration weights of the BO, respectively. The balance between 62 these two terms is controlled by the hyperparameter 𝜉. As 𝜉 gets higher, BO focuses more on exploration and seeks the next parameter set in regions with high prediction uncertainty. Conversely, BO focuses more on exploitation and selects the next parameter set within a close range to already explored regions. We choose 𝜉 = 0.1 to balance the exploration and exploitation weights. C. Transfer of the Prior Mean In addition to the kernel (see Section III-A), the prior mean 𝜇(𝜃) must be chosen at the beginning of a BO run. Often, 𝜇 = 0 is the default choice for an uninformed prior. For the millirobot learning problem herein, we suggest and investigate the transfer of information from previous learning runs by setting the prior mean to the posterior mean of a previously trained GP model, such as from a different robot. In this way, we can approximately transfer the topology of the target function between robots, which is reasonable as long as the differences between the robots and the environment do not significantly alter the function shape. In this work, we adopt and compare both approaches of an uninformed prior (𝜇(𝜃) = 0) in Section IV-A, and the transfer of the posterior mean from robot 1's previous run to all three robots in Section IV-B. IV. Experiments We modulated the currents running through the electromagnetic coils and the resulting magnetic field by controlling six motor driver units (SyRen25) using an Arduino microcontroller running at 1.2 kHz. We regularly calibrated the magnetic actuation matrix inside the workspace, 63 i.e., the mapping between the applied electric current and the generated magnetic field, to maintain reliable and repeatable experiments. The learning process ran on a PC that additionally handled image processing and hardware communication tasks. One step of the learning run involved five steps: 1) BO selected a new parameter set 𝜃 that maximized the acquisition function based on the GP model, 2) The microcontroller regulated the magnetic field based on the selected 𝜃 and initiated the physical experiment, 3) The camera recorded the robot motion and measured the average stride length performance 𝑆, 4) The learning system updated the GP model using the newly collected data from the experiment, 5) The robot returned to its initial position for the next step. A. Optimization of the Walking Gait without the Prior To test our controller learning approach without prior information, i.e., 𝜇(𝜃) = 0, we experimented with all three robots in the same environmental conditions and limited the number of steps for each learning run to 20 experiments. We initialized the BO with the best controller parameter set reported in [14]. We performed three independent learning runs (i.e., 60 experiments in total) for every robot with the same initial state, whose results are shown in Fig. 3. Each data point represents the robot's stride length performance 𝑆 resulting from a different controller parameter set chosen by the BO at a given step of a 64 learning run. As BO actively chose sample locations (e.g., to explore unknown regions described in Section III), the variation in these data points was the desired behavior of the explorative learning algorithm. See Supplementary Video 1 for the gait performances of four different controller parameter sets for robot 1. We chose the optimum controller parameter sets (𝜃∗) from these learning runs and repeated the walking gait five times to collect statistical information about the stride length performances. The rows designated with “no prior” in Table I shows the values for 𝜃∗ and the resulting 𝑆 for each robot. It can be seen that the walking gait performances of the robots were significantly improved compared to the robot reported in [14]. Also, the standard deviation within these repeated experiments agreed with the previously reported values, platform. In this optimization approach, we achieved 86.6%, 94.7%, and 60.5% increase in 𝑆 for robots 1 to 3 respectively (i.e., compared to the 𝑆 of the previous robot shown in the last row of Table I). The difference between the optimum controller parameter values in Table I demonstrates the influence of the fabrication variabilities on the robot design and performance. Separate from the optimum stride length performance of each robot, we evaluated the overall performance of the learning runs based on the achieved stride length average of all of the tested controller parameters. This performance metric 𝑃𝐿𝑅 shows the overall quality of the learning run's parameter selection in terms of the average of all the 𝑆 and the standard deviation in 60 experiments (i.e., avg(𝑆) ± std). In these experiments, the learning run for robot 1 yielded 𝑃𝐿𝑅1 = 1.07 ± 0.80𝑚𝑚, 𝑃𝐿𝑅2 = 1.21 ± 0.88𝑚𝑚 for 65 robot 2, and 𝑃𝐿𝑅3 = 0.75 ± 0.64𝑚𝑚 for robot 3. Even though there were multiple individual 𝜃s within these runs (e.g., the optimums reported in Table I) that outperformed the previous study, the large standard deviation shows that the BO selected parameters that generated a wide range of performances. Figure 3. The learning of the controller parameters without utilizing the prior information within 20 physical experiments in 3 independent learning runs (shown as LR1-3) for (a) robot 1, (b) robot 2, and (c) robot 3. 66 B. Optimization of the Walking Gait with the Prior 1) Generation of the Prior Information To generate useful prior information, we constructed the posterior mean 𝜇𝑝𝑜𝑠𝑡(𝜃) for robot 1 using the BO and GP described in Section III. Initially, we adopted nine different controller parameters from [14] and collected the stride length information from repeated experiments in our setup. Fig. 4-a and 4-c show the two-dimensional projection of the approximation of 𝑆 function generated by the GP model (utilizing the same hyperparameters in Section III) based on these experiments. After the initial approximation, we used the BO to select new parameter sets from the unexplored parts of the 4-D search space and collected the experimental stride length performance information. We explored 123 different parameter sets in total by selectively isolating the search space dimensions. Initially, we fixed 𝛼1 = 30° and 𝛼2 = 60° and explored 18 different parameter values for 𝐵𝑚𝑎𝑥 and 𝑓. Then, we fixed 𝐵𝑚𝑎𝑥 = 10𝑚𝑇 and 𝑓 = 2𝐻𝑧 and explored 38 values for 𝛼1 and 𝛼2. We performed 17 additional tests for 𝛼1 and 𝛼2 for 𝐵𝑚𝑎𝑥 = 10𝑚𝑇 and 𝑓 = 8𝐻𝑧. Finally, we explored the complete search space for four parameters with 50 more tests. For all of these tests, we stopped the exploration when the BO converged in the sense of repetitively selecting similar 𝜃s.