HEAL DSpace

Integrating scene priors in Multi-View Stereo (MVS)

DSpace/Manakin Repository

Show simple item record

dc.contributor.author Σταθοπούλου, Ελισάβετ-Κωνσταντίνα el
dc.contributor.author Stathopoulou, Elisavet-Konstantina en
dc.date.accessioned 2023-01-12T10:16:04Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/56640
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.24338
dc.rights Attribution-NonCommercial-ShareAlike 3.0 Greece
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/gr/
dc.subject Υπολογισμός βάθους el
dc.subject Πολυεικονική ανακατασκευή el
dc.subject Σημασιολογική κατάτμηση el
dc.subject Τρισδιάστατη ανακατασκευή el
dc.subject PatchMatch en
dc.subject Multiple view stereo en
dc.subject Semantic segmentation en
dc.subject 3D reconstruction en
dc.subject Depth estimation en
dc.title Ενσωμάτωση δεσμεύσεων στην πολυεικονική ανακατασκευή el
dc.title Integrating scene priors in Multi-View Stereo (MVS) en
dc.contributor.department Εργαστήριο Φωτογραμμετρίας - Laboratory of Photogrammetry el
heal.type doctoralThesis
heal.classification Φωτογραμμετρία el
heal.classification Photogrammetry en
heal.dateAvailable 2024-01-11T22:00:00Z
heal.language en
heal.access embargo
heal.recordProvider ntua el
heal.publicationDate 2022-09
heal.abstract Image-based 3D reconstruction addresses the problem of generating 3D representations of scenes given overlapping 2D images as observations. It is one of the fundamental topics in photogrammetry and computer vision, spanning many decades of research. In recent decades, many robust algorithms have been introduced for pixel depth estimation and 3D reconstruction, achieving great results in various applications. However, there are still several open challenges and room for improvement toward efficient, complete, and accurate 3D reconstruction in real-world scenarios. Geometric 3D reconstruction is closely related to scene understanding, another active topic in computer vision research that has seen tremendous growth due to recently developed deep learning algorithms. Indeed, advanced scene prior cues can potentially support efficient 3D reconstruction and vice versa. However, semantic reasoning directly in 3D space is non-trivial, mainly due to the limited availability of training data and the computational complexity; by contrast, algorithms for 2D semantic segmentation are mature enough to obtain robust results, and the existence of large-scale datasets facilitates the generalization of the trained models. In this dissertation, both 3D reconstruction and semantic segmentation are comprehensively studied and interlinked; the main open challenges and limitations are identified, and innovative, easy-to-implement solutions for real-world scenarios are proposed.
In the field of semantic segmentation, a new benchmark with ground-truth (GT) semantic maps of pixel-level accuracy for historic building facades, 3DOM Semantic Facade, is introduced, acknowledging the lack of existing high-resolution benchmarks for similar purposes. Using this benchmark, a straightforward pipeline for model training based on state-of-the-art learning algorithms is proposed, and the inference results are experimentally evaluated on unseen data. Moreover, a new functionality is built upon the open-source and widely used MVS pipeline OpenMVS to enable label transfer from 2D to 3D, yielding semantically enriched dense point clouds. At the same time, selective (class-specific) reconstruction is made possible based on the semantic label of each scene pixel; in this way, only the areas of interest are reconstructed, according to the needs of each application. These functionalities are domain-independent and can thus be generalized to any MVS scenario for which semantic segmentation maps are available.
Regarding depth estimation and reconstruction, this thesis focuses on the multi-view stereo (MVS) part of 3D reconstruction; it proposes methods to integrate advanced scene priors into the process in order to obtain high-quality, complete 3D point clouds. Depth estimation typically relies on correspondence search between image pixels based on visual appearance, with similarity commonly measured using photometric consistency metrics. A variety of robust algorithms exist for efficient correspondence search and subsequent depth reconstruction in both stereo (two-view) and multi-view scenarios. Yet, certain limitations regarding scene geometry (slanted surfaces, occlusions), material properties (repetitive patterns; textureless, reflective, or transparent surfaces), and acquisition conditions remain challenging.
The main goal of the thesis is to develop novel, practical approaches for confronting the matching ambiguities that inevitably arise on large, non-Lambertian surfaces due to the nature of photometric consistency costs. The first proposed method exploits semantic priors as cues for the 3D scene structure: a novel strategy guides depth propagation across such challenging surfaces in a PatchMatch-based framework using RANSAC-based plane hypotheses in 3D space. A novel, adaptive cost function is then introduced to combine the prior hypotheses with the standard photometric cost and adaptively promote more reliable depth estimates across the image. In the experimental evaluation on the ETH3D benchmark as well as on custom scenes, the proposed algorithm achieved consistently better results than the baseline method in point cloud completeness without sacrificing accuracy. Given the growing availability of semantically segmented data, this approach can be applied in a variety of scenarios, indoor and outdoor. However, in real-world applications it is not always trivial to obtain such semantic cues for every scene; a large amount of additional GT data may be required, and model training or fine-tuning is often a laborious task. Thus, an alternative, generic, domain-independent solution is also proposed, guided only by local structure and textureness cues. Based on a quadtree decomposition of the image, pixels with similar color attributes are grouped together. As in the previous method, planar hypotheses are extracted in 3D space, guided by the quadtree blocks. The adaptive cost function is also used here to support PatchMatch depth propagation. Results on the entire training and test sets of the ETH3D dataset demonstrate the effectiveness of the proposed approach and show a clear improvement in performance scores with respect to the baseline method, while being competitive with other state-of-the-art algorithms. To further demonstrate the applicability of the new method under varying scenarios, two more custom datasets were considered, on which similar improvements were achieved. The proposed methodologies are integrated into the well-established, open-source framework OpenMVS to promote usability and reproducibility. en
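The 2D-to-3D label transfer and class-specific reconstruction described in the abstract can be pictured as back-projecting each dense point into the views that observe it and voting over the corresponding 2D semantic labels. The Python sketch below illustrates that idea under simplified assumptions (calibrated pinhole cameras, a naive in-bounds check instead of proper visibility/occlusion handling); it is not the OpenMVS implementation, and all function and variable names are hypothetical.

```python
# Hypothetical sketch (not the thesis/OpenMVS code): majority-vote transfer of 2D
# semantic labels to a dense point cloud, with optional class-specific filtering.
import numpy as np

def project(point, K, R, t):
    """Project a 3D point with a pinhole camera; return integer (u, v) or None."""
    p_cam = R @ point + t
    if p_cam[2] <= 0:                       # point behind the camera
        return None
    uvw = K @ p_cam
    return int(round(uvw[0] / uvw[2])), int(round(uvw[1] / uvw[2]))

def transfer_labels(points, views, keep_classes=None):
    """points: (N, 3) array. views: list of dicts holding K, R, t and a 2D 'labels' map.
    Returns (points, labels); if keep_classes is given, only those classes are kept."""
    labels = np.full(len(points), -1, dtype=int)
    for i, X in enumerate(points):
        votes = []
        for view in views:
            uv = project(X, view["K"], view["R"], view["t"])
            if uv is None:
                continue
            u, v = uv
            h, w = view["labels"].shape
            if 0 <= v < h and 0 <= u < w:
                votes.append(int(view["labels"][v, u]))
        if votes:
            labels[i] = np.bincount(votes).argmax()   # majority vote across views
    if keep_classes is not None:                       # selective (class-specific) output
        mask = np.isin(labels, list(keep_classes))
        return points[mask], labels[mask]
    return points, labels
```

In practice a per-view visibility or depth-consistency test would replace the naive in-bounds check, but the voting principle behind the semantically enriched point cloud is the same.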
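The adaptive cost used by both proposed methods can be thought of as a weighted blend of the photometric matching cost and a planar-prior term, with the weight driven by how reliable photometry is locally (e.g. a textureness measure). The following sketch is a simplified, hypothetical illustration of such a blend, not the exact formulation of the thesis; parameter names and values are assumptions for illustration only.

```python
# Hypothetical sketch (not the thesis formulation): blend a photometric matching cost
# with a planar-prior cost, weighting the prior more on weakly textured pixels.
import numpy as np

def adaptive_cost(photo_cost, depth, prior_depth, textureness, sigma_d=0.05, alpha=5.0):
    """photo_cost: photometric cost of the candidate depth (e.g. 1 - NCC).
    depth: candidate depth for the pixel; prior_depth: depth induced by the RANSAC
    plane (or quadtree block) hypothesis; textureness: local texture measure in
    [0, 1]. sigma_d and alpha are illustrative tuning parameters."""
    # Prior term: grows as the candidate depth deviates from the plane hypothesis.
    prior_cost = 1.0 - np.exp(-((depth - prior_depth) ** 2) / (2.0 * sigma_d ** 2))
    # Weight: weak texture -> trust the prior; strong texture -> trust photometry.
    w = np.exp(-alpha * textureness)
    return (1.0 - w) * photo_cost + w * prior_cost

# Example: on a textureless wall (textureness ~ 0), a candidate consistent with the
# plane hypothesis scores much lower than one far from it, even though the
# photometric cost is equally uninformative for both.
print(adaptive_cost(photo_cost=0.9, depth=2.01, prior_depth=2.00, textureness=0.02))
print(adaptive_cost(photo_cost=0.9, depth=2.50, prior_depth=2.00, textureness=0.02))
```

Within a PatchMatch loop, such a blended cost lets plane-consistent depth candidates propagate across weakly textured regions while leaving well-textured pixels governed by the photometric term.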
heal.advisorName Γεωργόπουλος, Ανδρέας el
heal.advisorName Georgopoulos, Andreas en
heal.committeeMemberName Georgopoulos, Andreas en
heal.committeeMemberName Remondino, Fabio en
heal.committeeMemberName Karantzalos, Konstantinos en
heal.committeeMemberName Ioannidis, Charalambos en
heal.committeeMemberName Doulamis, Anastasios en
heal.committeeMemberName Fusiello, Andrea en
heal.committeeMemberName Pateraki, Maria en
heal.academicPublisher Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Αγρονόμων και Τοπογράφων Μηχανικών. Τομέας Τοπογραφίας. Εργαστήριο Φωτογραμμετρίας el
heal.academicPublisherID ntua
heal.numberOfPages 234 σ. el
heal.fullTextAvailability false


Files in this item

The following licenses are associated with this item:

This item appears in the following Collection(s)


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 Greece.