06 Fakultät Luft- und Raumfahrttechnik und Geodäsie

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/7

Search Results

Now showing 1 - 10 of 11
  • Item (Open Access)
    Forming a hybrid intelligence system by combining Active Learning and paid crowdsourcing for semantic 3D point cloud segmentation
    (2023) Kölle, Michael; Sörgel, Uwe (Prof. Dr.-Ing.)
    While tremendous advances have been achieved in recent years in the development of supervised Machine Learning (ML) systems such as Convolutional Neural Networks (CNNs), the most decisive factor for their performance is still the quality of the labeled training data from which the system is supposed to learn. This is why we advocate focusing more on methods for obtaining such data, which we expect to be more sustainable than establishing ever new classifiers in the rapidly evolving ML field. In the geospatial domain, however, the generation of training data for ML systems is still rather neglected in research, and experts typically end up occupied with such tedious labeling tasks. In our design of a system for the semantic interpretation of Airborne Laser Scanning (ALS) point clouds, we break with this convention and completely lift labeling obligations from experts. At the same time, human annotation is restricted to only those samples that actually justify manual inspection. This is accomplished by means of a hybrid intelligence system in which the machine, represented by an ML model, works actively and iteratively together with the human component through Active Learning (AL), which acts as a pointer to exactly those most decisive samples. Instead of having an expert label these samples, we propose to outsource this task to a large group of non-specialists, the crowd. But since it is rather unlikely that enough volunteers would participate in such crowdsourcing campaigns due to the tedious nature of labeling, we argue for attracting workers with monetary incentives, i.e., we employ paid crowdsourcing. Relying on such platforms, we typically have access to a vast pool of prospective workers, guaranteeing prompt completion of jobs. Thus, crowdworkers become human processing units that behave similarly to the electronic processing units of this hybrid intelligence system performing the tasks of the machine part. With respect to the latter, we not only evaluate whether an AL-based pipeline works for the semantic segmentation of ALS point clouds, but also shed light on the question of why it works. As crucial components of our pipeline, we test and enhance different AL sampling strategies in conjunction with both a conventional feature-driven classifier and a data-driven CNN classification module. In this regard, we aim to select AL points in such a manner that samples are not only informative for the machine, but also feasible for non-experts to interpret. These theoretical formulations are verified by various experiments in which we replace the frequently assumed but highly unrealistic error-free oracle with the simulated imperfect oracles we are always confronted with when working with humans. Furthermore, we find that the need for labeled data, which is already reduced through AL to a small fraction (typically ≪1 % of the Passive Learning training points), can be minimized even further when we reuse information from a given source domain for the semantic enrichment of a specific target domain, i.e., we utilize AL as a means for Domain Adaptation. As for the human component of our hybrid intelligence system, the special challenge we face is monetarily motivated workers with a wide variety of educational and cultural backgrounds and very different mindsets regarding the quality they are willing to deliver. Consequently, we are confronted with great quality inhomogeneity in the results received. Thus, when designing such campaigns, special attention to quality control is required in order to automatically reject submissions of low quality and to refine accepted contributions in the sense of the Wisdom of the Crowds principle. We further explore ways to support the crowd in labeling by experimenting with different data modalities (discretized point cloud vs. continuous textured 3D mesh surface), and also aim to shift the motivation from a purely extrinsic nature (i.e., payment) to a more intrinsic one, which we intend to trigger through gamification. Eventually, by casting these different concepts into the so-called CATEGORISE framework, we constitute the aspired hybrid intelligence system and employ it for the semantic enrichment of ALS point clouds of different characteristics, enabled through learning from the (paid) crowd.
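The query strategy at the heart of such an AL loop can be made concrete with a small sketch. The following is a minimal, hypothetical example of one common criterion (predictive-entropy ranking) for picking the next batch of points to send to the crowd oracle; the thesis tests and extends several sampling strategies, so this stands in for the general idea rather than the exact criterion used:

```python
import numpy as np

def entropy_sampling(proba: np.ndarray, batch_size: int) -> np.ndarray:
    """Return indices of the `batch_size` most uncertain samples.

    `proba` holds per-point class probabilities of shape (n_points, n_classes),
    e.g. from a point cloud classifier. High predictive entropy marks points
    whose labels would be most informative to acquire next.
    """
    eps = 1e-12  # guard against log(0)
    entropy = -np.sum(proba * np.log(proba + eps), axis=1)
    return np.argsort(entropy)[-batch_size:]

# One hypothetical AL iteration: train on the labeled pool, score the
# unlabeled pool, send the most uncertain points to the (crowd) oracle.
# `clf` is any classifier exposing fit/predict_proba (feature-driven or
# CNN-based); `labeled`/`unlabeled` are index arrays into the point cloud.
# clf.fit(X[labeled], y[labeled])
# query = unlabeled[entropy_sampling(clf.predict_proba(X[unlabeled]), 100)]
```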
  • Item (Open Access)
    Automatic model reconstruction of indoor Manhattan-world scenes from dense laser range data
    (2013) Budroni, Angela; Fritsch, Dieter (Prof. Dr.-Ing.)
    Three-dimensional modeling has always received a great deal of attention from computer graphics designers, and with the emphasis on existing urban scenarios it has become an important topic for the photogrammetric community and for architects as well. The generation of three-dimensional models of real objects requires both efficient techniques for acquiring visual information about the object characteristics and robust methods for computing the mathematical models in which this information can be stored. Photogrammetric techniques for measuring object features recover three-dimensional object profiles from conventional intensity images. Active sensors based on laser measurements directly deliver three-dimensional point coordinates of an object, providing a fast and reliable description of its geometric characteristics. For transforming laser range data into consistent object models, existing CAD software products provide valid support for manual approaches. However, the growing use of three-dimensional models in different fields of application brings into focus the need for automated methods of model generation. The goal of this thesis is the development of a new concept for the automatic computation of three-dimensional building models from laser data. The automatic modeling method aims at the reconstruction of building interiors with an orthogonal layout. For this purpose, two aspects are considered: the extraction of all surfaces that enclose the interior volume and the computation of the floor plan. As a final result, the three-dimensional model integrates the geometry and topology of the interior in terms of its boundary representation. The main idea underlying the automatic modeling is plane sweeping, a technique related to the concept of sweep representation used in computer graphics to generate solid models. A data segmentation driven by the sweep and controlled by a hypothesis-and-test approach assigns each laser point to a surface of the building interior. In the next step of the algorithm, the floor plan is recovered by cell decomposition based on split and merge. For a successful generation of the model, every activity of the reconstruction workflow must be taken into consideration: the acquisition of the laser data, the registration of the point clouds, the computation of the model, and the visualization of the results. The dissertation provides a full implementation of all activities of the automatic modeling pipeline. Moreover, due to the high degree of automation, it aims at contributing to the dissemination of three-dimensional models in different areas, and in particular in BIM processes for architecture applications.
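To illustrate the sweep idea, here is a minimal sketch of hypothesis-and-test plane sweeping under the Manhattan-world assumption: an axis-aligned plane is stepped through the cloud, and positions with strong point support are kept as candidate wall or floor surfaces. All parameters are illustrative assumptions; the thesis' full pipeline additionally assigns each point to a surface and recovers the floor plan by cell decomposition:

```python
import numpy as np

def sweep_plane_positions(points, axis=0, step=0.02, tol=0.01, min_support=500):
    """Sweep an axis-aligned plane through the cloud and hypothesis-test
    each position: positions supported by many (near-)coplanar points are
    kept as candidate wall/floor surfaces.

    points: (n, 3) array; axis: 0/1/2 for x/y/z; step: sweep increment in
    meters; tol: distance tolerance for a point to support a plane.
    """
    coords = points[:, axis]
    candidates = []
    for pos in np.arange(coords.min(), coords.max() + step, step):
        support = np.count_nonzero(np.abs(coords - pos) < tol)
        if support >= min_support:
            candidates.append((pos, support))
    # keep local maxima of support as accepted plane hypotheses
    return [pos for i, (pos, s) in enumerate(candidates)
            if s == max(sup for _, sup in candidates[max(0, i - 2):i + 3])]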
  • Item (Open Access)
    New methods for 3D reconstructions using high resolution satellite data
    (2021) Gong, Ke; Fritsch, Dieter (Prof. Dr.-Ing. habil. Prof. h.c.)
  • Item (Open Access)
    Mathematical methods for camera self-calibration in photogrammetry and computer vision
    (2013) Tang, Rongfu; Fritsch, Dieter (Prof. Dr.-Ing. habil.)
    Camera calibration is a central subject in photogrammetry and geometric computer vision. Self-calibration is a highly flexible and useful technique, and it plays a significant role in automatic camera interior/exterior orientation and image-based reconstruction. This thesis provides an intensive, mathematically grounded and synthetic study of camera self-calibration techniques in aerial photogrammetry, close range photogrammetry and computer vision. In aerial photogrammetry, many additional parameters (APs) for self-calibration are increasingly used without evident mathematical or physical foundations, and moreover they may be highly correlated with other correction parameters. In close range photogrammetry, high correlations exist between different terms of the ‘standard’ Brown self-calibration model, and the negative effects of those high correlations on self-calibration are not fully clear. While distortion compensation is essential in photogrammetric self-calibration, geometric computer vision is concerned with auto-calibration (also known as self-calibration), which calibrates the internal parameters regardless of distortion and of initial values for the internal parameters. Although camera auto-calibration from N≥3 views has been studied extensively over the last decades, it remains a difficult problem. The mathematical principle of self-calibration models in photogrammetry is studied synthetically. It is pointed out that photogrammetric self-calibration (or building photogrammetric self-calibration models) can, to a large extent, be considered a function approximation problem in mathematics: the unknown distortion function can be approximated by a linear combination of specific mathematical basis functions. With algebraic polynomials adopted, a whole family of Legendre self-calibration models is developed on the basis of the orthogonal univariate Legendre polynomials. The Weierstrass approximation theorem guarantees that the distortion of any frame-format camera can be effectively calibrated by a Legendre model of proper degree. The Legendre model can be considered a generalization of the historical polynomial models proposed by Ebner and Grün, over which the Legendre models of second and fourth order, respectively, should be preferred. From a mathematical viewpoint, however, algebraic polynomials are undesirable for self-calibration purposes due to the high correlations between polynomial terms. These high correlations are exactly those occurring in the Brown model in close range photogrammetry; they are in fact inherent in all self-calibration models using a polynomial representation, independent of block geometry. Based on the correlation analyses, a refined model of the in-plane distortion is proposed for close range camera calibration. After examining a number of mathematical basis functions, the Fourier series are suggested as the theoretically optimal basis functions for building self-calibration models in photogrammetry, and another family of Fourier self-calibration models is developed, whose mathematical foundations are Laplace's equation and the Fourier theorem. Considering the advantages and disadvantages of the physical and the mathematical self-calibration models, it is recommended that the Legendre or the Fourier model be combined with the radial distortion parameters in many calibration applications. A number of simulated and empirical tests are performed to evaluate the new self-calibration models. The airborne camera tests demonstrate that both the Legendre and the Fourier self-calibration models are rigorous, flexible, generic and effective for calibrating the distortion of digital frame airborne cameras of large, medium and small format, mounted in single- and multi-head systems (including the DMC, DMC II, UltraCamX, UltraCamXp and DigiCAM cameras). The advantages of the Fourier model result from the fact that it usually needs fewer APs and obtains a more reliable distortion calibration. The tests in close range photogrammetry show that, although it is highly correlated with the decentering distortion parameters, the principal point can be reliably and precisely located in a self-calibration process under appropriate image configurations. The refined in-plane distortion model is advantageous in reducing correlations with the focal length and improving its calibration. The good performance of the combined “Radial + Legendre” and “Radial + Fourier” models is illustrated. In geometric computer vision, a new auto-calibration solution is presented that needs only image correspondences and a zero (or known) skew parameter. This method is essentially based on the fundamental matrix and the three (dependent) constraints derived from the rank-2 essential matrix. Its main virtues are threefold. First, a recursive strategy is employed following a coordinate transformation; with an appropriate approximation, the recursion estimates the focal length and aspect ratio in advance and then calculates the principal point location. Second, the optimal geometric constraints are selected using error propagation analyses. Third, the final nonlinear optimization over the four internal parameters is performed via the Levenberg-Marquardt algorithm. This auto-calibration method is fast and efficient and obtains a unique calibration. Besides auto-calibration, a new idea is proposed for calibrating the focal length from two views without knowledge of the principal point coordinates. Compared to conventional two-view calibration techniques, which have to know the principal point shift a priori, this new analytical method is more flexible and more useful. Although the auto-calibration and the two-view calibration methods are not yet fully mature, their good performance is demonstrated in both simulated and practical experiments, and future refinements are discussed. It is hoped that this thesis not only introduces the relevant mathematical principles into the practice of camera self-calibration, but is also helpful for the communication between photogrammetry and geometric computer vision, which have many tasks and goals in common but use different mathematical tools.
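The core of the Legendre model, distortion expressed as a linear combination of orthogonal basis functions, can be sketched as follows. This is a simplified illustration assuming image coordinates normalized to [-1, 1] (the interval on which Legendre polynomials are orthogonal) and a plain bivariate basis; the thesis' actual term selection and its combination with radial distortion parameters are more involved:

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_distortion(x, y, coeff_x, coeff_y, degree):
    """Evaluate a Legendre-type in-plane distortion correction (dx, dy).

    x, y: image coordinates normalized to [-1, 1]; coeff_x/coeff_y: APs
    for the dx/dy series, one per bivariate basis term P_i(x) * P_j(y)
    with i + j <= degree.
    """
    terms = []
    for i in range(degree + 1):
        for j in range(degree + 1 - i):
            Pi = legendre.legval(x, [0] * i + [1])  # P_i(x)
            Pj = legendre.legval(y, [0] * j + [1])  # P_j(y)
            terms.append(Pi * Pj)
    basis = np.stack(terms, axis=-1)
    return basis @ np.asarray(coeff_x), basis @ np.asarray(coeff_y)
```

In a bundle adjustment, the coefficient vectors would be estimated alongside the orientation parameters; the orthogonality of the basis is what keeps correlations between terms low compared to plain polynomials.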
  • Item (Open Access)
    Automatische Interpretation von Semantik aus digitalen Karten im World Wide Web
    (2014) Luo, Fen; Fritsch, Dieter (Prof. Dr.-Ing.)
    A very large amount of spatially referenced data is available on the Internet, representing different parts of the world in the form of raster and vector maps. The information contained in these maps, however, cannot be found automatically, since it is encoded by means of specific map elements; its semantics only become explicit upon interpretation by a viewer. Yet map information should be interpretable not only by humans but also by machines, if only because of the sheer amount of data to be interpreted. The automatic derivation of semantics from maps is subsumed under the term automatic map interpretation: a process that makes the implicit knowledge of a map collection explicit. This thesis offers solutions to this problem in the form of map interpretation. The map interpretation in this work is performed on vector maps found on the Internet. For the targeted search for vector maps on the Internet, a dedicated web crawler is developed: a search engine that searches specifically for vector maps. It searches exclusively for the shapefile format, which has developed into a de facto standard in the GIS field and in which vector maps are mostly stored. To find as many shapefiles as possible, the search is run on servers on which the probability of finding shapefiles is high; these servers are identified beforehand through a Google search for the keyword "shapefile download". Map interpretation comprises methods for interpreting map objects, map types, and map scale. First, the method for interpreting the objects of a map is presented. The aim is to recognize objects automatically on the basis of their specific characteristics. Object recognition is based on the Self-Organizing Map (SOM), known from artificial intelligence. The map objects are grouped into classes such as building footprints or road networks. For each class, its characteristic features are determined and brought into a form accessible to the SOM, here a parameter vector. The parameter vectors form the input patterns that the SOM learns in its training phase. After the input patterns of all object classes have been learned, the parameter vector of each object on the map is computed and fed into the SOM; owing to the previously learned input patterns, each object can then be assigned to the corresponding object class. As a further method, the interpretation of the map type is presented. Maps are categorized by content and purpose into map types such as river maps, road maps, contour-line maps, etc. As with the interpretation of objects, the SOM is used here as well: input patterns representing the geometric characteristics of the map types are learned. The features result both from the structure of the individual objects and from the topology between the objects on a map. When a map is fed into the SOM, the SOM recognizes the corresponding map type on the basis of the learned input patterns. In addition, the file name of the map and the content of the web page on which it was found are available, and this work also investigates to what extent this additional information can help in interpreting the map type. The automatic interpretation of the map scale is, besides the interpretation of map objects and map types, a further method discussed in this work. The interpretation of the scale is pursued in two ways: via multiple representation and via levels of detail. In the first case, the scale can be derived from the respective representation, since an identical object is depicted on maps in different representations of varying fidelity to reality. In the second case, the scale can be derived from the levels of detail, based on the fact that maps of different scales are depicted with different degrees of detail.
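A minimal sketch of the SOM training underlying the object and map-type classification is given below. Grid size, decay schedules, and epoch count are illustrative assumptions; the thesis' specific parameter vectors encode the geometric characteristics of the map objects:

```python
import numpy as np

def train_som(patterns, grid=(8, 8), epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Train a minimal Self-Organizing Map on parameter vectors.

    patterns: (n, d) array of parameter vectors describing map objects
    (e.g. building footprints, road networks). Returns the learned
    (grid_h, grid_w, d) weight array; classification assigns a vector
    to the class associated with its best-matching unit.
    """
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.normal(size=(h, w, patterns.shape[1]))
    ij = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    for t in range(epochs):
        lr = lr0 * np.exp(-t / epochs)        # decaying learning rate
        sigma = sigma0 * np.exp(-t / epochs)  # shrinking neighborhood radius
        for x in patterns[rng.permutation(len(patterns))]:
            dist = np.linalg.norm(weights - x, axis=2)
            # best-matching unit: the grid cell whose weight is closest to x
            bmu = np.array(np.unravel_index(np.argmin(dist), (h, w)))
            nb = np.exp(-np.sum((ij - bmu) ** 2, axis=2) / (2 * sigma ** 2))
            weights += lr * nb[..., None] * (x - weights)
    return weights
```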
  • Item (Open Access)
CRBeDaSet: a benchmark dataset for high accuracy close range 3D object reconstruction
    (2023) Gabara, Grzegorz; Sawicki, Piotr
    This paper presents CRBeDaSet, a new benchmark dataset designed for evaluating close range, image-based 3D modeling and reconstruction techniques, together with the first empirical experiences of its use. The test object is a medium-sized building whose elevations are characterized by diverse surface textures. The dataset contains: the geodetic spatial control network (12 stabilized ground points determined using iterative multi-observation parametric adjustment) and the photogrammetric network (32 artificial signalized and 18 defined natural control points), measured using a Leica TS30 total station; 36 terrestrial, mainly convergent photos, acquired from elevated camera standpoints with a non-metric Nikon D5100 digital single-lens reflex camera (ground sample distance approx. 3 mm); the complete results of the bundle block adjustment with simultaneous camera calibration performed in the Pictran software package; and colored point clouds (ca. 250 million points) from terrestrial laser scanning, acquired with a Leica ScanStation C10 and post-processed in the Leica Cyclone™ SCAN software (ver. 2022.1.1), which were denoised, filtered, and classified according to the LoD3 standard (ca. 62 million points). Existing datasets and benchmarks are also described and evaluated in the paper. The proposed photogrammetric dataset was experimentally tested in the open-source application GRAPHOS and in the commercial suites ContextCapture, Metashape, PhotoScan, Pix4Dmapper, and RealityCapture. As a first experience in its evaluation, the difficulties and errors that occurred in the software used during digital processing of the dataset are shown and discussed. The proposed CRBeDaSet benchmark enables high accuracy (millimeter range) photogrammetric 3D object reconstruction in close range based on multi-view uncalibrated imagery, dense image matching techniques, and the generated dense point clouds.
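The quoted ground sample distance of approx. 3 mm follows from the standard GSD relation. A small worked check under assumed values: the D5100's roughly 4.8 µm pixel pitch (16 MP APS-C sensor) and an illustrative focal length and object distance, neither of which is stated in the abstract:

```python
def ground_sample_distance(pixel_pitch_m, distance_m, focal_length_m):
    """GSD = pixel pitch on the sensor projected to the object distance."""
    return pixel_pitch_m * distance_m / focal_length_m

# Assumed values: ~4.8 um pixel pitch, an 18 mm lens, and a camera-to-object
# distance of about 11 m; this combination reproduces a ~3 mm GSD.
gsd = ground_sample_distance(4.8e-6, 11.0, 18e-3)
print(f"GSD ≈ {gsd * 1e3:.1f} mm")  # ≈ 2.9 mm
```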
  • Item (Open Access)
    On the information transfer between imagery, point clouds, and meshes for multi-modal semantics utilizing geospatial data
    (2022) Laupheimer, Dominik; Haala, Norbert (apl. Prof. Dr.-Ing.)
    The semantic segmentation of the huge amounts of acquired 3D data has become an important task in recent years. Images and Point Clouds (PCs) are fundamental data representations, particularly in urban mapping applications. Textured meshes integrate both representations by wiring the PC and texturing the reconstructed surface elements with high-resolution imagery. Meshes are adaptive to the underlying mapped geometry due to their graph structure composed of non-uniform and non-regular entities. Hence, the mesh is a memory-efficient, realistic-looking 3D map of the real world. For these reasons, we primarily opt for semantic segmentation of meshes, which is still a widely overlooked topic in photogrammetry and remote sensing. In particular, we head for multi-modal semantics utilizing supervised learning. However, publicly available annotated geospatial mesh data was rare at the beginning of the thesis; therefore, mesh data had to be annotated first. To kill two birds with one stone, we aim for a multi-modal fusion that enables both multi-modal enhancement of entity descriptors and semi-automatic data annotation leveraging publicly available annotations of non-mesh data. We propose a novel holistic geometry-driven association mechanism that explicitly integrates entities of the modalities imagery, PC, and mesh. The established entity relationships between pixels, points, and faces enable the sharing of information across the modalities in a two-fold manner: (i) feature transfer (measured or engineered) and (ii) label transfer (predicted or annotated). The implementation follows a tile-wise strategy to facilitate scalability to large-scale data sets; at the same time, it enables parallel, distributed processing, reducing processing time. We demonstrate the effectiveness of the proposed method on the International Society for Photogrammetry and Remote Sensing (ISPRS) benchmark data sets Vaihingen 3D and Hessigheim 3D. Taken together, the proposed entity linking and subsequent information transfer inject great flexibility into the semantic segmentation of geospatial data. Imagery, PCs, and meshes can be semantically segmented with classifiers trained on any of these modalities, utilizing features derived from any of these modalities. In particular, we can semantically segment a modality by training a classifier on the same modality (direct approach) or by transferring predictions from other modalities (indirect approach). Hence, any established well-performing modality-specific classifier can be used for the semantic segmentation of these modalities, regardless of whether it follows an end-to-end learning or a feature-driven scheme. We perform an extensive ablation study on the impact of multi-modal handcrafted features for automatic 3D scene interpretation, both for the direct and the indirect approach. We discuss and analyze various Ground Truth (GT) generation methods. The semi-automatic labeling leveraging the entity linking achieves consistent annotation across modalities and reduces the manual labeling effort to a single representation. Note that the multiple epochs of the Hessigheim data, consisting of manually annotated PCs and semi-automatically annotated meshes, are a result of this thesis and are provided to the community as part of the Hessigheim 3D benchmark. To further reduce the labeling effort to a few instances on a single modality, we combine the proposed information transfer with active learning. We recruit non-experts for the tedious labeling task and analyze their annotation quality. Subsequently, we compare the resulting classifier performances to conventional passive learning using expert annotation. In particular, we investigate the impact of visualizing the mesh instead of the PC on the annotation quality achieved by non-experts. In summary, we accentuate the mesh and its utility for multi-modal fusion, GT generation, multi-modal semantics, and visualization purposes.
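The point-to-face direction of the described label transfer can be sketched with a simple geometry-driven association. The snippet below is a simplified stand-in (nearest labeled points per face centroid, majority vote) for the thesis' explicit, tile-wise entity linking between pixels, points, and faces; all parameters are illustrative:

```python
import numpy as np
from scipy.spatial import cKDTree

def transfer_labels_points_to_faces(points, point_labels, face_centroids,
                                    k=5, max_dist=0.5):
    """Label mesh faces from an annotated point cloud (label transfer).

    Each face centroid collects its k nearest labeled points within
    `max_dist` (meters) and takes a majority vote over their class ids.
    Faces with no nearby points keep label -1 (unlabeled).
    """
    tree = cKDTree(points)
    dists, idx = tree.query(face_centroids, k=k)
    labels = np.full(len(face_centroids), -1, dtype=int)
    for f, (d, i) in enumerate(zip(dists, idx)):
        valid = point_labels[i[d < max_dist]]
        if valid.size:
            labels[f] = np.bincount(valid).argmax()
    return labels
```

Run in the opposite direction (faces to points, or pixels to points via the camera geometry), the same association would serve the feature-transfer case.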
  • Item (Open Access)
    Concept and performance evaluation of a novel UAV-borne topo-bathymetric LiDAR sensor
    (2020) Mandlburger, Gottfried; Pfennigbauer, Martin; Schwarz, Roland; Flöry, Sebastian; Nussbaumer, Lukas
    We present the sensor concept and first performance and accuracy assessment results of a novel lightweight topo-bathymetric laser scanner designed for integration on Unmanned Aerial Vehicles (UAVs), light aircraft, and helicopters. The instrument is particularly well suited for capturing river bathymetry in high spatial resolution as a consequence of (i) the low nominal flying altitude of 50-150 m above ground level resulting in a laser footprint diameter on the ground of typically 10-30 cm and (ii) the high pulse repetition rate of up to 200 kHz yielding a point density on the ground of approximately 20-50 points/m². The instrument features online waveform processing and additionally stores the full waveform within the entire range gate for waveform analysis in post-processing. The sensor was tested in a real-world environment by acquiring data from two freshwater ponds and a 500 m section of the pre-Alpine Pielach River (Lower Austria). The captured underwater points featured a maximum penetration of two times the Secchi depth. On dry land, the 3D point clouds exhibited (i) a measurement noise in the range of 1-3 mm; (ii) a fitting precision of redundantly captured flight strips of 1 cm; and (iii) an absolute accuracy of 2-3 cm compared to terrestrially surveyed checkerboard targets. A comparison of the refraction corrected LiDAR point cloud with independent underwater checkpoints exhibited a maximum deviation of 7.8 cm and revealed a systematic depth-dependent error when using a refraction coefficient of n = 1.36 for time-of-flight correction. The bias is attributed to multi-path effects in the turbid water column (Secchi depth: 1.1 m) caused by forward scattering of the laser signal at suspended particles. Due to the high spatial resolution, good depth performance, and accuracy, the sensor shows a high potential for applications in hydrology, fluvial morphology, and hydraulic engineering, including flood simulation, sediment transport modeling, and habitat mapping.
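The refraction correction mentioned above can be sketched as follows, assuming a locally horizontal water surface; the actual processing uses a proper water-surface model, and the paper reports that the constant n = 1.36 still leaves a depth-dependent bias in turbid water:

```python
import numpy as np

def refraction_correct(p_surface, d_air, slant_range_water, n=1.36):
    """Correct an underwater LiDAR echo for refraction and the slower
    propagation of light in water.

    p_surface: (3,) point where the beam enters the water; d_air: (3,)
    unit direction of the incoming beam in air (pointing downward);
    slant_range_water: apparent below-surface range from time of flight
    (in-air scale); n: refraction index of water. Assumes a horizontal
    water surface with upward normal (0, 0, 1).
    """
    normal = np.array([0.0, 0.0, 1.0])
    cos_in = -d_air @ normal
    sin_in = np.sqrt(max(0.0, 1.0 - cos_in ** 2))
    sin_out = sin_in / n                       # Snell: n1 sin a1 = n2 sin a2
    cos_out = np.sqrt(1.0 - sin_out ** 2)
    # unit tangential direction in the plane of incidence
    t = (d_air + cos_in * normal) / (sin_in if sin_in > 0 else 1.0)
    d_water = sin_out * t - cos_out * normal   # refracted beam direction
    # geometric range = measured range / n (light travels slower in water)
    return p_surface + (slant_range_water / n) * d_water
```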
  • Item (Open Access)
    Dense image matching for close range photogrammetry
    (2016) Wenzel, Konrad; Fritsch, Dieter (Prof. Dr.-Ing.)
    Dense image matching enables the computation of 3D surfaces from at least two images by solving the correspondence problem for each pixel. From the correspondence information and the known camera geometry, depth can be reconstructed by intersecting the viewing rays in space. Dense stereo matching is used, for example, in stereo camera systems in robotics and the automotive sector, where depth information is computed at high frequency and used for tasks such as scene understanding and machine control. The extension to multi-view stereo enables the reconstruction of surfaces from more than two images. In combination with current developments in automatic orientation determination, complex scenes with arbitrary camera configurations can be captured without further prior information. This makes surface acquisition with off-the-shelf cameras feasible for applications such as heritage preservation or surveying. The particular challenges of dense image matching are weak textures and repetitive patterns, which lead to ambiguities in the matching. The matching method used should resolve these reliably and also be robust against radiometric and projective differences. Scenes with strong depth variations, for example foreground objects against a distant background, should be processable without losing sharp edges or details. Mismatches and false correspondences caused by moving objects should be detected automatically. This thesis presents a multi-view stereo method that reconstructs dense point clouds for a given set of images and their orientations without requiring prior information about the scene. It scales to large datasets of complex scenes with strong depth and scale variations. The method is based on a multi-baseline approach in which, for each image, disparity maps for several stereo models are computed using a hierarchical Semi-Global Matching method. The resulting disparity maps are then used in a multi-stereo triangulation step to compute a dense point cloud. A subsequent point cloud fusion and filtering step merges and validates the per-image point clouds in order to eliminate outliers and redundant points. The first chapter gives an introduction to the topic and the goals of this work. The second chapter discusses the state of research in comparison to the method presented here. The method itself is treated in detail in the following three chapters: the third chapter covers the multi-baseline approach for the image-wise extraction of point clouds, the fourth chapter discusses the selection of favorable stereo models with respect to geometric configurations, and the fifth chapter presents the post-processing step for point cloud fusion and filtering. The seventh chapter provides a summary with regard to the limits of the method as well as possible extensions.
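The triangulation step that turns a disparity map of one rectified stereo model into 3D points can be sketched as follows; the thesis' multi-baseline method computes many such models per image with hierarchical Semi-Global Matching and then fuses and filters the merged point clouds:

```python
import numpy as np

def disparity_to_points(disparity, f, baseline, cx, cy):
    """Triangulate a rectified stereo disparity map into 3D points.

    disparity: (h, w) array in pixels (values <= 0 mark invalid matches);
    f: focal length in pixels; baseline: stereo base in meters;
    cx, cy: principal point in pixels. Depth follows Z = f * B / d,
    and X, Y re-project each pixel through the pinhole model.
    """
    h, w = disparity.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = disparity > 0
    Z = np.where(valid, f * baseline / np.where(valid, disparity, 1.0), np.nan)
    X = (u - cx) * Z / f
    Y = (v - cy) * Z / f
    return np.dstack([X, Y, Z])[valid]  # (n_valid, 3) point coordinates
```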