Stereoscopic videos: data generation, image synthesis and motion analysis

Date

2025

Abstract

Videos are an essential data source for computer vision and an important form of media and entertainment. While many research works consider monocular videos, i.e. videos captured with a single camera, there is less focus on stereoscopic videos, i.e. videos captured with two cameras, although they correspond directly to the human binocular vision system. In this thesis, we discuss stereoscopic videos in detail and focus on three main topics: data generation, image synthesis and motion analysis. In the first main topic, we discuss data generation for stereoscopic video tasks in computer vision. We first introduce a novel large-scale stereoscopic video dataset with ground truth for the stereo matching, optical flow and scene flow tasks, and propose a high-detail evaluation methodology that can assess predictions at fine structures such as grass or hair. Based on this, we introduce a benchmark website for the evaluation and comparison of future methods. We further analyze the results of 14 initial methods from the dense matching literature and investigate the influence of their model architectures on benchmark performance. Additionally, we propose an augmentation strategy with snow particle effects that can be used to create additional data from existing datasets. In the second main topic, we introduce image synthesis for stereo conversion, i.e. generating stereoscopic videos from monocular videos. We present a method that performs disparity-aware warping, consistent foreground-background compositing and background-aware inpainting, and combines these steps with a temporal consistency strategy that integrates information from additional video frames. Several experiments not only show that our approach outperforms existing methods both visually and quantitatively by a large margin, but also analyze our model design choices. Further, by adding extensions for user interaction to our model, we demonstrate that our approach is directly applicable to current practices in 3D movie production. In the third main topic, we cover motion analysis for stereoscopic videos by introducing a novel method for scene flow prediction. Our model predicts scene flow from three frames of a stereoscopic video by combining forward predictions with the SE(3) matrix inverses of backward predictions in a fusion module, which leads to strong improvements over baseline models and highly competitive benchmark results. Further experiments demonstrate model robustness and compare architectures, scene flow parametrizations and fusion strategies. With contributions in these three main topics, this dissertation advances both datasets and algorithms in the context of stereoscopic videos.
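
To illustrate the snow-particle augmentation idea at a basic level, the following sketch composites soft white flakes of random position and size onto an image. The function name `add_snow` and all parameters are hypothetical, and the augmentation described in the thesis is more elaborate; this only shows a simple compositing step.

```python
import numpy as np

def add_snow(img, num_particles=200, max_radius=3.0, opacity=0.8, rng=None):
    """Overlay simple synthetic snow particles on an RGB uint8 image.

    Hypothetical, unvectorized sketch: each particle is a soft white disk
    blended onto the image with a Gaussian falloff from its centre.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    out = img.astype(np.float32)
    ys, xs = np.mgrid[0:h, 0:w]

    for _ in range(num_particles):
        cy, cx = rng.uniform(0, h), rng.uniform(0, w)
        r = rng.uniform(1.0, max_radius)
        # soft falloff from the particle centre gives a round, slightly blurry flake
        dist2 = (ys - cy) ** 2 + (xs - cx) ** 2
        alpha = opacity * np.exp(-dist2 / (2.0 * r * r))
        out = out * (1.0 - alpha[..., None]) + 255.0 * alpha[..., None]

    return out.clip(0, 255).astype(np.uint8)
```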
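
The stereo-conversion pipeline builds on disparity-aware warping. The sketch below shows one common way such a warp can be realized: forward-warping the left view into the right view with a per-pixel disparity map, resolving occlusion conflicts with a simple z-buffer and returning a hole mask for a later inpainting step. The function name and the exact warping scheme are assumptions for illustration, not the thesis implementation.

```python
import numpy as np

def warp_left_to_right(left_img, disparity):
    """Forward-warp a left view into a right view using a per-pixel disparity map.

    Minimal sketch: pixels with larger disparity (closer to the camera) win
    occlusion conflicts via a z-buffer; unfilled target pixels are reported
    in a hole mask so they can be inpainted afterwards.
    """
    h, w = disparity.shape
    right_img = np.zeros_like(left_img)
    zbuf = np.full((h, w), -np.inf)          # nearest surface seen per target pixel
    hole_mask = np.ones((h, w), dtype=bool)  # True where no source pixel landed

    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.round(xs - disparity).astype(int)  # right-view x-coordinate of each left pixel
    valid = (xt >= 0) & (xt < w)

    for y, x, x_new, d in zip(ys[valid], xs[valid], xt[valid], disparity[valid]):
        if d > zbuf[y, x_new]:               # closer pixel overwrites a farther one
            zbuf[y, x_new] = d
            right_img[y, x_new] = left_img[y, x]
            hole_mask[y, x_new] = False

    return right_img, hole_mask
```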
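
For the scene flow model, backward motion predictions are inverted as SE(3) transforms before being fused with forward predictions. The sketch below illustrates the closed-form SE(3) inverse and a simple hand-crafted blend of the two estimates; the thesis uses a learned fusion module, so the fixed-weight SLERP/linear interpolation here is only a stand-in, and both function names are hypothetical.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def invert_se3(T):
    """Closed-form inverse of a 4x4 SE(3) transform [R t; 0 1]: returns [R^T -R^T t; 0 1]."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

def fuse_forward_backward(T_fwd, T_bwd, w=0.5):
    """Blend a forward SE(3) motion estimate with the inverse of a backward estimate.

    Stand-in for a learned fusion module: rotations are interpolated with SLERP,
    translations linearly, both with a fixed weight `w`.
    """
    T_bwd_inv = invert_se3(T_bwd)
    rotations = Rotation.from_matrix(np.stack([T_fwd[:3, :3], T_bwd_inv[:3, :3]]))
    slerp = Slerp([0.0, 1.0], rotations)      # interpolate between the two rotations
    T_fused = np.eye(4)
    T_fused[:3, :3] = slerp([w]).as_matrix()[0]
    T_fused[:3, 3] = (1.0 - w) * T_fwd[:3, 3] + w * T_bwd_inv[:3, 3]
    return T_fused
```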
