Please use this identifier to cite or link to this item: http://dx.doi.org/10.18419/opus-12668
Full metadata record
DC field: value (language)
dc.contributor.advisor: Haala, Norbert (apl. Prof. Dr.-Ing.)
dc.contributor.author: Laupheimer, Dominik
dc.date.accessioned: 2023-01-19T14:15:26Z
dc.date.available: 2023-01-19T14:15:26Z
dc.date.issued: 2022 (de)
dc.identifier.other: 1831550253
dc.identifier.uri: http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-126879 (de)
dc.identifier.uri: http://elib.uni-stuttgart.de/handle/11682/12687
dc.identifier.uri: http://dx.doi.org/10.18419/opus-12668
dc.description.abstract: The semantic segmentation of the huge amount of acquired 3D data has become an important task in recent years. Images and Point Clouds (PCs) are fundamental data representations, particularly in urban mapping applications. Textured meshes integrate both representations by wiring the PC and texturing the reconstructed surface elements with high-resolution imagery. Meshes are adaptive to the underlying mapped geometry due to their graph structure composed of non-uniform and non-regular entities. Hence, the mesh is a memory-efficient, realistic-looking 3D map of the real world. For these reasons, we primarily opt for semantic segmentation of meshes, which is still a widely overlooked topic in photogrammetry and remote sensing. In particular, we aim for multi-modal semantics utilizing supervised learning. However, publicly available annotated geospatial mesh data was scarce at the beginning of the thesis. Therefore, mesh data had to be annotated beforehand. To kill two birds with one stone, we aim for a multi-modal fusion that enables both multi-modal enhancement of entity descriptors and semi-automatic data annotation leveraging publicly available annotations of non-mesh data.
We propose a novel holistic geometry-driven association mechanism that explicitly integrates entities of the modalities imagery, PC, and mesh. The established entity relationships between pixels, points, and faces enable the sharing of information across the modalities in a two-fold manner: (i) feature transfer (measured or engineered) and (ii) label transfer (predicted or annotated); an illustrative sketch of such a label transfer follows the metadata listing below. The implementation follows a tile-wise strategy to facilitate scalability to large-scale data sets. At the same time, it enables parallel, distributed processing, reducing processing time. We demonstrate the effectiveness of the proposed method on the International Society for Photogrammetry and Remote Sensing (ISPRS) benchmark data sets Vaihingen 3D and Hessigheim 3D. Taken together, the proposed entity linking and subsequent information transfer inject great flexibility into the semantic segmentation of geospatial data. Imagery, PCs, and meshes can be semantically segmented with classifiers trained on any of these modalities, utilizing features derived from any of these modalities. In particular, we can semantically segment a modality by training a classifier on the same modality (direct approach) or by transferring predictions from other modalities (indirect approach). Hence, any established well-performing modality-specific classifier can be used for semantic segmentation of these modalities, regardless of whether it follows an end-to-end learning or a feature-driven scheme.
We perform an extensive ablation study on the impact of multi-modal handcrafted features for automatic 3D scene interpretation, both for the direct and the indirect approach. We discuss and analyze various Ground Truth (GT) generation methods. The semi-automatic labeling leveraging the entity linking achieves consistent annotation across modalities and reduces the manual labeling effort to a single representation. Please note that the multiple epochs of the Hessigheim data, consisting of manually annotated PCs and semi-automatically annotated meshes, are a result of this thesis and are provided to the community as part of the Hessigheim 3D benchmark. To further reduce the labeling effort to a few instances on a single modality, we combine the proposed information transfer with active learning. We recruit non-experts for the tedious labeling task and analyze their annotation quality. Subsequently, we compare the resulting classifier performances to conventional passive learning using expert annotation. In particular, we investigate the impact of visualizing the mesh instead of the PC on the annotation quality achieved by non-experts. In summary, we accentuate the mesh and its utility for multi-modal fusion, GT generation, multi-modal semantics, and visualization purposes. (en)
dc.language.iso: en (de)
dc.rights: info:eu-repo/semantics/openAccess (de)
dc.subject.ddc: 000 (de)
dc.subject.ddc: 550 (de)
dc.subject.ddc: 620 (de)
dc.title: On the information transfer between imagery, point clouds, and meshes for multi-modal semantics utilizing geospatial data (en)
dc.type: doctoralThesis (de)
ubs.dateAccepted: 2022-09-09
ubs.fakultaet: Luft- und Raumfahrttechnik und Geodäsie (de)
ubs.institut: Institut für Photogrammetrie (de)
ubs.publikation.seiten: 149 (de)
ubs.publikation.typ: Dissertation (de)
ubs.thesis.grantor: Luft- und Raumfahrttechnik und Geodäsie (de)
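Note: The abstract above describes a geometry-driven association between pixels, points, and faces that enables feature and label transfer across the modalities. The sketch below illustrates only the general idea of a point-to-face label transfer under strong simplifying assumptions (a nearest-centroid association and a per-face majority vote, no occlusion handling, no imagery); it is not the holistic, tile-wise mechanism developed in the thesis, and all names are hypothetical.

    # Minimal sketch (illustration only, not the thesis implementation):
    # transfer per-point class labels from an annotated point cloud to mesh
    # faces by associating each point with its nearest face centroid and
    # taking a majority vote per face.
    import numpy as np
    from scipy.spatial import cKDTree

    def transfer_labels_to_faces(points, point_labels, face_vertices, n_classes, unlabeled=-1):
        """points: (N, 3) coordinates, point_labels: (N,) integer class ids in [0, n_classes),
        face_vertices: (F, 3, 3) triangle corner coordinates."""
        centroids = face_vertices.mean(axis=1)           # (F, 3) one centroid per face
        tree = cKDTree(centroids)                        # spatial index over face centroids
        _, face_idx = tree.query(points)                 # nearest face centroid for each point
        votes = np.zeros((len(centroids), n_classes), dtype=np.int64)
        np.add.at(votes, (face_idx, point_labels), 1)    # accumulate per-face label votes
        face_labels = votes.argmax(axis=1)               # majority vote per face
        face_labels[votes.sum(axis=1) == 0] = unlabeled  # faces that received no points
        return face_labels

    # Toy usage (purely illustrative):
    # faces = np.random.rand(200, 3, 3); pts = np.random.rand(5000, 3)
    # lbls = np.random.randint(0, 4, size=5000)
    # face_lbls = transfer_labels_to_faces(pts, lbls, faces, n_classes=4)

In this simplified view, the same point-to-face association could be reused in the opposite direction, i.e., to transfer predicted face labels back to the points (the indirect approach mentioned in the abstract).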
Appears in Collections: 06 Fakultät Luft- und Raumfahrttechnik und Geodäsie

Files in This Item:
File                          Description   Size       Format
dissertation_laupheimer.pdf                 328.8 MB   Adobe PDF


All items in this repository are protected by copyright.