Please use this identifier to cite or link to this item: http://dx.doi.org/10.18419/opus-11888
Author(s): Hoppe, Sabrina
Title: Improving sample-efficiency for model-free off-policy deep reinforcement learning in contact-rich manipulation
Date of publication: 2021
Document type: Dissertation
Pages: xxviii, 187
URI: http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-119058
http://elib.uni-stuttgart.de/handle/11682/11905
http://dx.doi.org/10.18419/opus-11888
Abstract: For centuries, humans have dreamed of intelligent machines that can move and behave like humans. With the industrial revolution, autonomously moving machinery came into existence. However, endowing such machines with the intelligence to behave or even learn autonomously remains a major challenge. As first steps in this direction have been taken, demand for intelligent robots has risen in many fields, including production automation, where robots may help to cope with increasingly flexible and rapidly changing manufacturing processes. Researchers in the field of reinforcement learning (RL) investigate how to design algorithms such that agents can learn autonomously. For instance, robots can be enabled to figure out by themselves the optimal way to perform production tasks. In this thesis, I focus on insertion tasks, which frequently occur in manufacturing. Today's algorithms for such tasks typically trade off sample efficiency against generalization capability. At one extreme, an algorithm makes an agent learn quickly by pre-defining a lot of structure or making specific assumptions about the task at hand, but this typically renders the algorithm very specific to that task. At the other extreme, model-free learning methods are very broadly applicable but typically require vast amounts of data to learn reasonable behavior. In this thesis, I start from flexible, general-purpose model-free RL algorithms and examine how adding small amounts of common sense or human prior knowledge can considerably speed up learning. Such an improvement can be characterized by the type of information that is used as well as by the approach chosen to make use of that information.
The types of information I use in this thesis include the robot itself, i.e., for example, dynamics information; prior knowledge about the task, for instance a coarse solution strategy that intuitively seems sensible to humans; and mathematical insights into the type and structure of the data that the agent has collected. I also suggest a number of methods to integrate such information into the learning process: as a way to make informed choices about new actions for the agent to try (so-called exploration), as criteria for an engineer on how to formally describe the task (i.e., how to design a suitable Markov decision process), and by choosing and adapting the type of function approximation used inside the learning process. All methods presented in this thesis have been evaluated on real-world robotic manipulation tasks that were derived or taken from industrial production plants. The results show that the proposed ways of making use of additional information significantly increase the efficiency of the learning process and can improve its stability even in adverse settings.
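To illustrate the class of algorithms the thesis builds on, the following is a minimal sketch of a model-free, off-policy update rule (tabular Q-learning) on a toy chain task loosely evoking an insertion problem. The 3-state dynamics, reward values, and hyperparameters are illustrative assumptions, not taken from the dissertation; deep RL methods replace the table with a neural network function approximator.

```python
import random

# Toy chain MDP: states 0..2, state 2 is terminal (e.g., "part inserted").
# Action 1 moves toward the goal, action 0 moves away. All values below are
# illustrative assumptions for this sketch.
N_STATES = 3
ACTIONS = [0, 1]
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Toy dynamics and reward: +1 on reaching the terminal state."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward

random.seed(0)
for _ in range(200):  # episodes
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy behavior policy (explores random actions).
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a_: Q[(s, a_)])
        s_next, r = step(s, a)
        # Off-policy target: bootstraps from the greedy value max_a' Q(s', a'),
        # regardless of which action the behavior policy takes next.
        td_target = r + GAMMA * max(Q[(s_next, a_)] for a_ in ACTIONS)
        Q[(s, a)] += ALPHA * (td_target - Q[(s, a)])
        s = s_next

# The learned greedy policy moves toward the goal in every non-terminal state.
policy = {s: max(ACTIONS, key=lambda a_: Q[(s, a_)]) for s in range(N_STATES - 1)}
```

Because the update target is independent of the behavior policy, such methods can reuse previously collected experience, which is exactly the property the thesis exploits to improve sample efficiency.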
Appears in collections: 05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Files in this item:
File                          Description  Size       Format
dissertation_sabrina (5).pdf               208.46 MB  Adobe PDF  View/Open


All resources in this repository are protected by copyright.