06 Fakultät Luft- und Raumfahrttechnik und Geodäsie

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/7

  • Item (Open Access)
    Design and development of a calibration solution feasible for series production of cameras for video-based driver-assistant systems
    (2022) Nekouei Shahraki, Mehrdad; Haala, Norbert (apl. Prof. Dr.)
    In this study, we reviewed the current techniques and methods in photogrammetry - especially close-range photogrammetry - with a focus on camera calibration. We reviewed the evolving field of video-based driver-assistant systems, their requirements and their applications. Focusing on fisheye cameras and a general omnidirectional projection, we extended an existing camera calibration model to address our needs and functionality requirements. These extensions enable the camera calibration model to be used in real-time embedded mobile systems with low processing power. We also introduced the free-function model as a flexible and advantageous model for camera distortion modelling. This is a new approach that models the overall image distortion together with the local lens distortions, which are estimated using a standard model during the calibration process. Applied to different lens designs, the free-function model achieves good calibration accuracy by capturing highly local lens distortions, benefiting from the flexibility of the model. We introduced optimization strategies for recalculation and image rectification, which also minimize the required processing power and device memory. This brings advantages to a variety of computational platforms such as FPGAs, x86 and ARM processors, and makes it possible to benefit from a variety of parallel-processing techniques. The model can be used at runtime and is well suited as a calibration model for a variety of machine vision solutions. We also discussed several important requirements for accurate camera calibration that we later used in the hardware test-stand design phase. We designed and developed two different test stands in order to realize the specifications and geometrical features of multiple-view, test-field-based camera calibration, referred to as bundle-block calibration.
One of their special geometrical characteristics is a uniform point distribution, which corresponds to uniform motion. Such a point distribution is beneficial when using calibration models such as the free-function model, which allow local lens distortion to be modelled with good accuracy and quality across the whole image. A key feature of these test stands is the capability of performing camera/sensor alignment testing, which is very important for verifying the geometrical alignment of the internal mechanical elements of each camera. Using automated machines and algorithms in test-stand calibration increased the stability and accuracy of the calibration and thus ensured the quality and speed of camera calibration. These test stands are capable of performing automatic camera calibration, suitable for applications such as the series production of cameras. As an accuracy and flexibility evaluation of the free-function model, we tested it on real-world data using a stereo camera with large added local distortions, capturing images of a vehicle ahead under conditions similar to those of real-world use cases. By performing the camera calibration, we compared the calibration results and accuracy parameters of the free-function model to those of a conventional calibration model. Using these calibration results, we generated a set of disparity maps and compared their density and availability, especially in the areas where the local distortion was present. This test compares the capabilities of the proposed model to conventional ones in real-world situations where large optical distortions may be present that cannot easily be modelled with conventional calibration models.
The higher modelling capability and accuracy of the free-function model generally benefit functions that use the disparity map or the derived 3D information as part of their input data, potentially leading to better functionality, or even availability, of these functions when local distortions are present in the image. There are many more use cases in photogrammetry and computer vision where higher calibration accuracy is beneficial, for example on low-cost optics whose distortions cannot easily be modelled with classical models. All of these use cases could benefit from the flexibility and modelling accuracy of the free-function model.
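The contrast between a global polynomial distortion model and the free-function idea of locally interpolated corrections can be sketched as follows. The `grid_correction` helper, the grid size, and the coefficients are illustrative assumptions, not the thesis's actual parameterization:

```python
import numpy as np

def radial_distort(xy, k1, k2):
    """Conventional Brown-style radial distortion on normalized coordinates."""
    r2 = np.sum(xy**2, axis=1, keepdims=True)
    return xy * (1.0 + k1 * r2 + k2 * r2**2)

def grid_correction(xy, grid, extent=1.0):
    """Free-function-style correction: a per-cell 2D offset grid,
    bilinearly interpolated, can capture local distortions that a
    global polynomial cannot."""
    h, w, _ = grid.shape
    # map [-extent, extent] into grid index space
    u = (xy[:, 0] + extent) / (2 * extent) * (w - 1)
    v = (xy[:, 1] + extent) / (2 * extent) * (h - 1)
    u0 = np.clip(np.floor(u).astype(int), 0, w - 2)
    v0 = np.clip(np.floor(v).astype(int), 0, h - 2)
    fu, fv = u - u0, v - v0
    off = (grid[v0, u0]     * ((1 - fu) * (1 - fv))[:, None]
         + grid[v0, u0 + 1] * (fu * (1 - fv))[:, None]
         + grid[v0 + 1, u0] * ((1 - fu) * fv)[:, None]
         + grid[v0 + 1, u0 + 1] * (fu * fv)[:, None])
    return xy + off

pts = np.array([[0.1, 0.2], [-0.5, 0.4]])
distorted = radial_distort(pts, k1=-0.2, k2=0.05)
grid = np.zeros((5, 5, 2))          # all-zero grid -> identity correction
corrected = grid_correction(distorted, grid)
```

In a real calibration, the grid offsets would themselves be estimated in the bundle adjustment; the sketch only shows why a lookup-plus-interpolation scheme is cheap enough for embedded runtime use.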
  • Item (Open Access)
    On the information transfer between imagery, point clouds, and meshes for multi-modal semantics utilizing geospatial data
    (2022) Laupheimer, Dominik; Haala, Norbert (apl. Prof. Dr.-Ing.)
    The semantic segmentation of the huge amount of acquired 3D data has become an important task in recent years. Images and Point Clouds (PCs) are fundamental data representations, particularly in urban mapping applications. Textured meshes integrate both representations by wiring the PC and texturing the reconstructed surface elements with high-resolution imagery. Meshes are adaptive to the underlying mapped geometry due to their graph structure composed of non-uniform and non-regular entities. Hence, the mesh is a memory-efficient, realistic-looking 3D map of the real world. For these reasons, we primarily opt for semantic segmentation of meshes, a topic that has so far been widely overlooked in photogrammetry and remote sensing. In particular, we head for multi-modal semantics utilizing supervised learning. However, publicly available annotated geospatial mesh data was rare at the beginning of this thesis; therefore, mesh data had to be annotated beforehand. To kill two birds with one stone, we aim for a multi-modal fusion that enables both multi-modal enhancement of entity descriptors and semi-automatic data annotation leveraging publicly available annotations of non-mesh data. We propose a novel holistic geometry-driven association mechanism that explicitly integrates entities of the modalities imagery, PC, and mesh. The established entity relationships between pixels, points, and faces enable the sharing of information across the modalities in a two-fold manner: (i) feature transfer (measured or engineered) and (ii) label transfer (predicted or annotated). The implementation follows a tile-wise strategy to facilitate scalability to large-scale data sets; at the same time, it enables parallel, distributed processing, reducing processing time. We demonstrate the effectiveness of the proposed method on the International Society for Photogrammetry and Remote Sensing (ISPRS) benchmark data sets Vaihingen 3D and Hessigheim 3D.
Taken together, the proposed entity linking and subsequent information transfer inject great flexibility into the semantic segmentation of geospatial data. Imagery, PCs, and meshes can be semantically segmented with classifiers trained on any of these modalities utilizing features derived from any of these modalities. Particularly, we can semantically segment a modality by training a classifier on the same modality (direct approach) or by transferring predictions from other modalities (indirect approach). Hence, any established well-performing modality-specific classifier can be used for semantic segmentation of these modalities - regardless of whether they follow an end-to-end learning or feature-driven scheme. We perform an extensive ablation study on the impact of multi-modal handcrafted features for automatic 3D scene interpretation - both for the direct and indirect approach. We discuss and analyze various Ground Truth (GT) generation methods. The semi-automatic labeling leveraging the entity linking achieves consistent annotation across modalities and reduces the manual label effort to a single representation. Please note that the multiple epochs of the Hessigheim data consisting of manually annotated PCs and semi-automatically annotated meshes are a result of this thesis and provided to the community as part of the Hessigheim 3D benchmark. To further reduce the labeling effort to a few instances on a single modality, we combine the proposed information transfer with active learning. We recruit non-experts for the tedious labeling task and analyze their annotation quality. Subsequently, we compare the resulting classifier performances to conventional passive learning using expert annotation. In particular, we investigate the impact of visualizing the mesh instead of the PC on the annotation quality achieved by non-experts. 
In summary, we accentuate the mesh and its utility for multi-modal fusion, GT generation, multi-modal semantics, and visualization purposes.
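The label-transfer leg of such an entity linking can be illustrated with a minimal sketch: per-point labels are propagated to mesh faces by a majority vote over the nearest annotated points. The function name, the brute-force neighbour search, and the toy data are assumptions for illustration; the thesis's geometry-driven association is considerably more elaborate:

```python
import numpy as np
from collections import Counter

def transfer_labels(points, point_labels, face_centroids, k=3):
    """Transfer per-point labels to mesh faces: each face collects the
    labels of its k nearest annotated points and takes a majority vote.
    Brute-force nearest neighbours for clarity; a real pipeline would
    use a spatial index and a tile-wise scheme for scalability."""
    face_labels = np.empty(len(face_centroids), dtype=point_labels.dtype)
    for i, c in enumerate(face_centroids):
        d = np.linalg.norm(points - c, axis=1)
        nearest = point_labels[np.argsort(d)[:k]]
        face_labels[i] = Counter(nearest.tolist()).most_common(1)[0][0]
    return face_labels

# two point clusters with labels 1 and 2, one face centroid in each
pts = np.array([[0, 0, 0], [0.1, 0, 0], [5, 5, 0], [5.1, 5, 0]], float)
lbl = np.array([1, 1, 2, 2])
cent = np.array([[0.05, 0, 0], [5.05, 5, 0]])
faces = transfer_labels(pts, lbl, cent, k=2)
```

The reverse direction (faces to points, or faces to pixels via the texture atlas) follows the same pattern with the roles of source and target entities swapped.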
  • Item (Open Access)
    Radargrammetric DSM generation by semi-global matching and evaluation of penalty functions
    (2022) Wang, Jinghui; Gong, Ke; Balz, Timo; Haala, Norbert; Sörgel, Uwe; Zhang, Lu; Liao, Mingsheng
    Radargrammetry is a useful approach to generate Digital Surface Models (DSMs) and an alternative to InSAR techniques, which are subject to temporal or atmospheric decorrelation. Stereo image matching in radargrammetry refers to the process of determining homologous points in two images. The performance of image matching influences the final quality of the DSM used for spatio-temporal analysis of landscapes and terrain. In SAR image matching, local matching methods are commonly used but usually produce sparse and inaccurate homologous points, adding ambiguity to the final products; global or semi-global matching methods are seldom applied even though they yield more accurate and denser homologous points. To fill this gap, we propose a hierarchical semi-global matching (SGM) pipeline to reconstruct DSMs in forested and mountainous regions using stereo TerraSAR-X images. In addition, three penalty functions were implemented in the pipeline and evaluated for effectiveness. To compare the accuracy and efficiency of our SGM dense matching method with a local matching method, the normalized cross-correlation (NCC) local matching method was also applied to generate DSMs using the same test data. The accuracy of the radargrammetric DSMs was validated against an airborne photogrammetric reference DSM and compared with the accuracy of NASA's 30 m SRTM DEM. The results show that the SGM pipeline produces DSMs whose height accuracy and computing efficiency exceed those of the SRTM DEM and the NCC-derived DSMs. The penalty function adopting the Canny edge detector yields a higher vertical precision than the other two evaluated penalty functions. SGM is a powerful and efficient tool for producing high-quality DSMs from stereo spaceborne SAR images.
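The SGM cost aggregation with its P1/P2 penalty scheme, the core of any such pipeline, can be sketched for a single aggregation path. The toy cost volume is invented, and the paper's Canny-based penalty adaptation is only noted in a comment:

```python
import numpy as np

def sgm_aggregate_1d(cost, P1=1.0, P2=8.0):
    """Semi-global matching cost aggregation along one scanline
    (one of the 8 or 16 SGM paths). cost: (width, ndisp) matching
    costs. P1 penalises disparity changes of +/-1, P2 larger jumps;
    an edge-adaptive variant (as with the Canny-based penalty
    function above) would lower P2 at image edges."""
    w, nd = cost.shape
    L = np.zeros_like(cost)
    L[0] = cost[0]
    for x in range(1, w):
        prev = L[x - 1]
        m = prev.min()
        shifted_lo = np.concatenate(([np.inf], prev[:-1])) + P1
        shifted_hi = np.concatenate((prev[1:], [np.inf])) + P1
        L[x] = cost[x] + np.minimum.reduce(
            [prev, shifted_lo, shifted_hi, np.full(nd, m + P2)]) - m
    return L

# toy scanline: the cheap disparity shifts from 1 to 2 halfway along
c = np.full((4, 5), 10.0)
c[0, 1] = c[1, 1] = c[2, 2] = c[3, 2] = 0.0
agg = sgm_aggregate_1d(c)
path = agg.argmin(axis=1)  # smoothed disparity estimate per pixel
```

A full SGM implementation sums `L` over all aggregation paths before taking the per-pixel minimum; the `- m` normalization keeps the accumulated costs bounded.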
  • Item (Open Access)
    Individual tree detection in urban ALS point clouds with 3D convolutional networks
    (2022) Schmohl, Stefan; Narváez Vallejo, Alejandra; Sörgel, Uwe
    Since trees are a vital part of urban green infrastructure, automatic mapping of individual urban trees is becoming increasingly important for city management and planning. Although deep-learning-based object detection networks are the state-of-the-art in computer vision, their adaptation to individual tree detection in urban areas has scarcely been studied. Some existing works have employed 2D object detection networks for this purpose; however, these have used three-dimensional information only in the form of projected feature maps. In contrast, we exploited the full 3D potential of airborne laser scanning (ALS) point clouds by using a 3D neural network for individual tree detection. Specifically, a sparse convolutional network was used for 3D feature extraction, feeding both semantic segmentation and circular object detection outputs, which were combined for further increased accuracy. We demonstrate the capability of our approach on an urban topographic ALS point cloud with 10,864 hand-labeled ground truth trees. Our method achieved an average precision of 83% with respect to the common 0.5 intersection-over-union criterion. 85% of the stems were found correctly with a precision of 88%, while the individual tree detections covered the tree area with an F1 score of 92%. Thereby, we outperformed traditional delineation baselines and recent detection networks.
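The 0.5 intersection-over-union criterion applied to circular tree detections can be made concrete with a small sketch of circle IoU; the function is an illustrative assumption, not the authors' evaluation code:

```python
import numpy as np

def circle_iou(c1, r1, c2, r2):
    """IoU of two circles given as centre (x, y) and radius, matching
    the circular detections described above. The intersection is the
    standard two-circle lens area."""
    d = np.hypot(c1[0] - c2[0], c1[1] - c2[1])
    if d >= r1 + r2:                      # disjoint circles
        inter = 0.0
    elif d <= abs(r1 - r2):               # one circle inside the other
        inter = np.pi * min(r1, r2) ** 2
    else:                                 # partial overlap (lens area)
        a1 = r1**2 * np.arccos((d**2 + r1**2 - r2**2) / (2 * d * r1))
        a2 = r2**2 * np.arccos((d**2 + r2**2 - r1**2) / (2 * d * r2))
        tri = 0.5 * np.sqrt((-d + r1 + r2) * (d + r1 - r2)
                            * (d - r1 + r2) * (d + r1 + r2))
        inter = a1 + a2 - tri
    union = np.pi * (r1**2 + r2**2) - inter
    return inter / union

# a detection counts as correct if IoU with a ground-truth tree >= 0.5
iou_same = circle_iou((0, 0), 1.0, (0, 0), 1.0)   # identical -> 1.0
iou_far = circle_iou((0, 0), 1.0, (3, 0), 1.0)    # disjoint -> 0.0
```

Average precision then follows from greedily matching detections to ground-truth trees at this threshold and sweeping the detection confidence.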
  • Item (Open Access)
    Integration of geometric computer vision, endoscopy and computed tomography for 3D modeling of gyroscopic instruments
    (2022) Zhan, Kun; Fritsch, Dieter (Prof. Dr.-Ing. habil. Prof. h. c.)
    3D digitization is of vital importance for the safekeeping and promotion of cultural heritage assets. To the public, cultural heritage generally means old buildings, ancient statues or unearthed relics. However, the objects to be digitized also include tools and instruments that were widely applied in past decades, even though they have since been replaced by more advanced technologies. We call these technical instruments and artifacts Tech Heritage (TH). Gyroscopes are one group of such fascinating instruments, with a history dating back 200 years. The main characteristics of gyroscopes with respect to 3D digitization are (1) a highly complex structure; (2) a composition of different materials; (3) the importance of not only the surfaces but also the internal structures. These features mean that no single methodology can meet the demands of their 3D digitization. To fulfill the requirements of gyroscopes in our research, photogrammetry, endoscopy and Computed Tomography (CT) are introduced for complete 3D digitization. With colored point clouds or textured meshes as results, photogrammetry mainly serves the global surface reconstruction of the object. For cavities, holes or other parts that a regular camera can hardly access, endoscopy is applied for a local 3D reconstruction as a supplement. As internal structures are also important, X-ray computed tomography is utilized for volumetric 3D digitization. These three 3D sensor datasets are then integrated into a complete 3D model. Additionally, the registration method should be adaptive to the data characteristics such as geometry, point cloud density, etc. In this thesis, the 3D reconstructions with each method as well as the data fusion are investigated. 1. Firstly, we study the stability and reliability of camera calibration before 3D reconstruction with photogrammetry and endoscopy.
As the standard pre-calibration solution, Zhang's method suffers from instability due to correlations between the calibration parameters. To reduce this effect, the image configuration should be well considered, with adequate oblique angles, distance differences and roll angles for a convergent image block. In our research, a quantitative analysis is implemented by a statistical approach using large bundles of images, with calibrations computed from randomly chosen image subsets. In addition, the recovered expected values of the parameters are utilized as ground truth to scrutinize the individual influencing factors of the imaging configuration. 2. Secondly, the 3D reconstruction processes are investigated with practical implementations. For endoscopic 3D reconstruction, the data acquisition process is the first challenge, resulting from image blur, which may be caused by hand shake, as well as from small overlap. An imaging assistance setup and a mixed image-and-video strategy are adopted in our research as solutions. With accurate calibration information and the improved image quality and configuration, we optimize the entire pipeline via the Structure-from-Motion (SfM) method. As for CT 3D reconstruction, a stack of X-ray images, carrying attenuation information, is collected from different perspectives of the object. All reconstructed slices are integrated into a uniform 3D coordinate system to construct the complete 3D volumetric representation. 3. Thirdly, data registration methods are proposed for the different data characteristics. To register 3D datasets with little overlap, such as photogrammetric and endoscopic point clouds, a Gauss-Helmert model with manually picked control points is applied for transformation estimation with precision assessments.
To take advantage of pair-wise point cloud registration research, point cloud conversion and surface extraction are implemented from the volumetric CT data. For the registration of CT and photogrammetry data, two cases can be distinguished regarding the completeness of the CT surface representation. If the surface material is completely represented in the CT data, we can directly project the color information from the photogrammetric images onto the CT surface after both datasets are transformed into the same coordinate system. In this way, we combine the high precision of the CT data with the rich texture information. Where low-density surface material causes an incomplete representation of the CT surface, the transformation is estimated via primitive-based virtual control points from both surface datasets. With the determined transformation, the photogrammetric model can then be integrated with the CT model into a complete 3D representation. 4. Finally, in terms of 3D model expression, point clouds become too voluminous when high precision is required and offer limited interaction possibilities. Therefore, the point clouds need to be vectorized into Constructive Solid Geometry (CSG) models to enable easier human-computer interaction. This process can be done precisely by careful manual work, via a Random Sample Consensus (RANSAC)-based geometric fitting process, or even with a deep learning strategy using an end-to-end trained framework. The vectorized 3D model can be applied in AR/VR-related applications to make full use of the 3D digitization work. In this research, three entirely different sensors are studied for a fused 3D reconstruction for the first time. Within the workflow, the practical application of endoscopy is fully investigated. The integration methods are adaptively designed according to the characteristics of each sensor as well as of the reconstructed object.
This provides more possibilities and ideas for the digitization of different types of cultural heritage.
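The control-point-based transformation estimation used to register the photogrammetric and endoscopic point clouds can be approximated with a closed-form rigid alignment (Horn/Umeyama style). This is a simplified stand-in, since the thesis uses a Gauss-Helmert adjustment that additionally delivers precision assessments for the estimated parameters:

```python
import numpy as np

def estimate_rigid(src, dst):
    """Closed-form rigid (rotation + translation) alignment of
    corresponding control points via SVD of the cross-covariance
    (Kabsch/Horn). src, dst: (N, 3) arrays of matched points."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # guard against a reflection in the least-squares solution
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t

# synthetic check: rotate about z and translate, then recover
rng = np.random.default_rng(0)
src = rng.normal(size=(6, 3))
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0],
                   [np.sin(a),  np.cos(a), 0],
                   [0,          0,         1]])
t_true = np.array([1.0, -2.0, 0.5])
dst = src @ R_true.T + t_true
R, t = estimate_rigid(src, dst)
```

A Gauss-Helmert formulation solves the same alignment as an adjustment with condition equations, which is what makes the rigorous precision assessment of the transformation parameters possible.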