dc.contributor.author | Σταθοπούλου, Ελισάβετ-Κωνσταντίνα | el |
dc.contributor.author | Stathopoulou, Elisavet-Konstantina | en |
dc.date.accessioned | 2023-01-12T10:16:04Z | |
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/56640 | |
dc.identifier.uri | http://dx.doi.org/10.26240/heal.ntua.24338 | |
dc.rights | Αναφορά Δημιουργού - Μη Εμπορική Χρήση - Παρόμοια Διανομή 3.0 Ελλάδα | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/gr/ | * |
dc.subject | Υπολογισμός βάθους | el |
dc.subject | Πολυεικονική ανακατασκευή | el |
dc.subject | Σημασιολογική κατάτμηση | el |
dc.subject | Τρισδιάστατη ανακατασκευή | el |
dc.subject | PatchMatch | en |
dc.subject | Multiple view stereo | en |
dc.subject | Semantic segmentation | en |
dc.subject | 3D reconstruction | en |
dc.subject | Depth estimation | en |
dc.title | Ενσωμάτωση δεσμεύσεων στην πολυεικονική ανακατασκευή | el |
dc.title | Integrating scene priors in Multi-View Stereo (MVS) | en |
dc.contributor.department | Εργαστήριο Φωτογραμμετρίας - Laboratory of Photogrammetry | el |
heal.type | doctoralThesis | |
heal.classification | Φωτογραμμετρία | el |
heal.classification | Photogrammetry | en |
heal.dateAvailable | 2024-01-11T22:00:00Z | |
heal.language | en | |
heal.access | embargo | |
heal.recordProvider | ntua | el |
heal.publicationDate | 2022-09 | |
heal.abstract | Image-based 3D reconstruction addresses the problem of generating 3D representations of scenes given overlapping 2D images as observations. It is one of the fundamental topics in photogrammetry and computer vision, spanning many decades of research. Over the years, many robust algorithms have been introduced for pixel depth estimation and 3D reconstruction, achieving great results in various applications. However, several open challenges remain, and there is room for improvement toward efficient, complete, and accurate 3D reconstruction in real-world scenarios. Geometric 3D reconstruction is closely related to scene understanding, another active topic in computer vision research that has seen tremendous growth thanks to recently developed deep learning algorithms. Indeed, advanced scene prior cues can potentially support efficient 3D reconstruction and vice versa. However, semantic reasoning directly in 3D space is non-trivial, mainly due to the limited availability of training data and the computational complexity involved; in contrast, algorithms for 2D semantic segmentation are mature enough to deliver robust results, and the existence of large-scale datasets facilitates the generalization of the trained models. In this dissertation, both 3D reconstruction and semantic segmentation are comprehensively studied and interlinked; the main open challenges and limitations are identified, and innovative, easy-to-implement solutions for real-world scenarios are proposed. In the field of semantic segmentation, acknowledging the lack of existing high-resolution benchmarks for similar purposes, a new benchmark with ground-truth (GT) semantic maps of pixel-level accuracy for historic building facades, 3DOM Semantic Facade, is introduced. Using this benchmark, a straightforward pipeline for model training based on state-of-the-art learning algorithms is proposed, and the inference results are experimentally evaluated on unseen data. 
Moreover, a new functionality is built upon the open-source and broadly used MVS pipeline OpenMVS to enable label transfer from 2D to 3D, yielding semantically enriched dense point clouds. At the same time, selective (class-specific) reconstruction is made possible based on the semantic label of each scene pixel; in this way, only the areas of interest are reconstructed, according to the needs of each application. These functionalities are domain-independent and can thus be generalized to any MVS scenario for which semantic segmentation maps are available. Regarding depth estimation and reconstruction, this thesis focuses on the multi-view stereo (MVS) part of the 3D reconstruction pipeline; it proposes methods to integrate advanced scene priors into the process in order to obtain high-quality and complete 3D point clouds. Depth estimation typically relies on correspondence search between image pixels based on visual appearance, commonly measured using photometric consistency metrics. A variety of robust algorithms exist for efficient correspondence search and subsequent depth reconstruction in both stereo (two-view) and multi-view scenarios. Yet, certain limitations regarding scene geometry (slanted surfaces, occlusions), material properties (repetitive patterns; textureless, reflective, or transparent surfaces), and acquisition conditions remain challenging. The main goal of the thesis is the development of novel, practical approaches for confronting the matching ambiguities that inevitably arise on large, non-Lambertian surfaces due to the nature of photometric consistency costs. The first proposed method exploits semantic priors as important cues for the 3D scene structure: a novel strategy guides the depth propagation across such challenging surfaces in a PatchMatch-based scenario, using RANSAC-based plane hypotheses in 3D space. 
Then, a novel, adaptive cost function is introduced to combine the prior hypotheses with the standard photometric cost and adaptively promote more reliable depth estimates across the image. In the experimental evaluation on the ETH3D benchmark as well as on custom scenes, the proposed algorithm achieved consistently better results than the baseline method in point cloud completeness without sacrificing accuracy. Given the growing availability of semantically segmented data, this approach can be applied in a variety of scenarios, both indoor and outdoor. However, in real-world applications it is not always trivial to obtain such semantic cues for every scene; a large amount of additional GT data may be required, and model training or fine-tuning is often a laborious task. Thus, an alternative, generic, and domain-independent solution is also proposed, guided only by local structure and textureness cues. Through quadtree decomposition of the image, pixels with similar color attributes are grouped together. As in the previous method, planar hypotheses are extracted in 3D, here guided by the quadtree blocks. The adaptive cost function is also used to support PatchMatch depth propagation. Results on the entire training and test sets of the ETH3D dataset demonstrate the effectiveness of the proposed approach and show a clear improvement in performance scores with respect to the baseline method, while being competitive with other state-of-the-art algorithms. To further demonstrate the applicability of the new method under varying scenarios, two more custom datasets were considered, on which similar improvements were achieved. The proposed methodologies are integrated into the well-established, open-source framework OpenMVS to promote usability and reproducibility. | en |
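The RANSAC-based plane hypotheses mentioned in the abstract can be illustrated with a minimal NumPy sketch: repeatedly sample three 3D points, hypothesize the plane they span, and keep the hypothesis with the most inliers. The function name, iteration count, and inlier threshold are illustrative assumptions, not taken from the thesis implementation.

```python
import numpy as np

def ransac_plane(points, iters=200, thresh=0.01, rng=None):
    """Fit a plane to Nx3 points with a simple RANSAC loop.

    Returns (normal, d) for the plane n . x + d = 0 that maximizes the
    inlier count. Illustrative sketch only; thresholds are arbitrary.
    """
    rng = rng or np.random.default_rng(0)
    best_inliers, best_plane = 0, None
    for _ in range(iters):
        # Sample 3 distinct points and derive the plane they span.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:          # degenerate (collinear) sample
            continue
        normal = normal / norm
        d = -normal.dot(p0)
        # Count points within `thresh` of the hypothesized plane.
        dist = np.abs(points @ normal + d)
        inliers = int((dist < thresh).sum())
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane

# Noisy samples from the plane z = 0, plus a few gross outliers.
rng = np.random.default_rng(1)
pts = np.c_[rng.uniform(-1, 1, (100, 2)), rng.normal(0, 0.002, 100)]
pts = np.vstack([pts, rng.uniform(-1, 1, (10, 3))])
n, d = ransac_plane(pts)
print(np.abs(n[2]))  # normal is close to +/- (0, 0, 1)
```

In an MVS setting, the inlier plane supplies a depth/normal hypothesis for pixels on weakly textured surfaces, which PatchMatch propagation can then test against neighboring estimates.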
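The adaptive blending of a plane-prior cost with the standard photometric cost can be sketched per pixel as below. The exponential weighting by a textureness measure and the parameter `alpha` are plausible assumptions chosen for illustration; they are not the exact formulation used in the thesis.

```python
import numpy as np

def adaptive_cost(photo_cost, prior_cost, textureness, alpha=10.0):
    """Blend photometric and plane-prior matching costs per pixel.

    In well-textured regions the photometric cost dominates; where
    texture is weak (and matching is ambiguous) the weight shifts
    toward the plane-prior cost. Illustrative sketch only: the
    weighting scheme and `alpha` are assumptions.
    """
    w = np.exp(-alpha * textureness)   # w -> 1 as texture vanishes
    return (1.0 - w) * photo_cost + w * prior_cost

# Texture-poor pixel: the high (unreliable) photometric cost is damped
# and the prior cost dominates.
c_flat = adaptive_cost(photo_cost=0.9, prior_cost=0.1, textureness=0.01)
# Well-textured pixel: the photometric cost dominates.
c_tex = adaptive_cost(photo_cost=0.9, prior_cost=0.1, textureness=0.5)
print(c_flat, c_tex)
```

The design intent is that the same depth hypothesis can win the PatchMatch cost comparison on a textureless wall (via the prior term) without distorting estimates in regions where photometric matching is already reliable.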
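The quadtree decomposition used by the generic, semantics-free variant can be sketched as a recursive split of the image into blocks that are color-homogeneous. The variance criterion, thresholds, and grayscale simplification below are illustrative assumptions, not the thesis code.

```python
import numpy as np

def quadtree_blocks(img, x=0, y=0, size=None, var_thresh=25.0, min_size=4):
    """Recursively split a square grayscale image into homogeneous blocks.

    Returns a list of (x, y, size) leaves whose intensity variance falls
    below `var_thresh`, or that reached `min_size`. Illustrative only:
    the homogeneity test and thresholds are assumptions.
    """
    if size is None:
        size = img.shape[0]
    block = img[y:y + size, x:x + size]
    if size <= min_size or block.var() <= var_thresh:
        return [(x, y, size)]
    h = size // 2
    leaves = []
    for dx, dy in ((0, 0), (h, 0), (0, h), (h, h)):
        leaves += quadtree_blocks(img, x + dx, y + dy, h, var_thresh, min_size)
    return leaves

# Toy 16x16 image: flat left half, strong horizontal gradient on the right.
img = np.zeros((16, 16))
img[:, 8:] = np.arange(8) * 30
leaves = quadtree_blocks(img)
print(len(leaves))
```

Each leaf groups pixels with similar color attributes; in the proposed pipeline such blocks would seed 3D planar hypotheses, with large leaves naturally covering the textureless regions where photometric matching is weakest.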
heal.advisorName | Γεωργόπουλος, Ανδρέας | el |
heal.advisorName | Georgopoulos, Andreas | en |
heal.committeeMemberName | Georgopoulos, Andreas | en |
heal.committeeMemberName | Remondino, Fabio | en |
heal.committeeMemberName | Karantzalos, Konstantinos | en |
heal.committeeMemberName | Ioannidis, Charalambos | en |
heal.committeeMemberName | Doulamis, Anastasios | en |
heal.committeeMemberName | Fusiello, Andrea | en |
heal.committeeMemberName | Pateraki, Maria | en |
heal.academicPublisher | Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Αγρονόμων και Τοπογράφων Μηχανικών. Τομέας Τοπογραφίας. Εργαστήριο Φωτογραμμετρίας | el |
heal.academicPublisherID | ntua | |
heal.numberOfPages | 234 σ. | el |
heal.fullTextAvailability | false | |