Detection of marker genes for the response of the human organism to airborne industrial pollution using microarrays

Δελαστίκ, Έκτωρ - Ξαβιέ; de Lastic, Hector - Xavier

dc.contributor.author	Δελαστίκ, Έκτωρ - Ξαβιέ	el
dc.contributor.author	de Lastic, Hector - Xavier	en
dc.contributor.author	de Lastic, Hector - Xavier	fr
dc.date.accessioned	2019-04-01T08:31:53Z
dc.date.available	2019-04-01T08:31:53Z
dc.date.issued	2019-04-01
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/48554
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.16519
dc.rights	Attribution-NonCommercial 3.0 Greece	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc/3.0/gr/	*
dc.subject	Bioinformatics	el
dc.subject	Pollution	el
dc.subject	Microarrays	el
dc.subject	Gene response	el
dc.subject	Bioinformatics	en
dc.subject	Pollution	en
dc.subject	Gene response	en
dc.subject	Microarrays	en
dc.title	Detection of marker genes for the response of the human organism to airborne industrial pollution using microarrays	el
dc.title	Identification of gene markers for the response of a human organism to airborne industrial pollution with the use of microarray technology	en
heal.type	bachelorThesis
heal.classification	Bioinformatics	el
heal.language	el
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2016-10-20
heal.abstract	The primary aim of this work is to investigate whether there are genes that are differentially expressed in the presence of chronic industrial-type environmental pollution in the atmosphere, drinking water, food, etc., using microarray data. For this study, the data set GSE60767, published in the public GEO database, was selected. It is a study conducted at the Institute of Experimental Medicine AS CR in the Czech Republic, as part of a series of studies on the effects of air pollution. This data set contains leukocyte expression data from Illumina microarrays. The reasons for choosing it were: • It is a comparative study between the cities of Prague and Ostrava, the latter being considered one of the most heavily polluted areas in the European Union. • It comprises a remarkably high number of samples (468 samples over 6 sampling periods), which increases the statistical power of the analysis. • It concerns a fairly homogeneous population in terms of dietary habits and living conditions, limiting the number of variables that could blur the results. • The donors' work environment ensures that they are exposed to the general pollution profile experienced by most of the population, and the homogeneity of the donor group in sex, age and work environment further reduces the number of variables that would complicate the analysis (more details are given in the description of the sampling procedure). The analysis of the data and the extraction of differentially expressed genes were carried out from scratch in R v3.3.1 with open-source Bioconductor packages, which makes the analysis easier to inspect and verify.
The GSE60767 data were first retrieved from the public GEO repository with the R package «GEOquery» and then analysed thoroughly with the packages «limma» and «SVA». This process identified genes that are differentially expressed between the two cities, although the measured biological signal is necessarily attenuated by the correction for batch effects, which was required because of an error made during the experimental procedure, as discussed in Chapter 3. After extracting lists of statistically significant differentially expressed genes, the BioInfoMiner platform by e-NiOS was used to investigate their roles in the Gene Ontology, Human Phenotype Ontology, MGI Mammalian Phenotype Ontology and Reactome Pathways Ontology. Comparing these was intended to identify, on the one hand, connections between the results of the analysis and epidemiological data and, on the other, candidate hub genes that play a central role in several biological processes and often have elevated biological importance. The results are presented in detail in Chapter 4. Next, we examined whether some subset of the identified hub genes could serve as a marker of the organism's systemic response to a polluted environment and of the effectiveness of this response or adaptation. Because of the errors in the experimental method and the weak biological signal they induce, success in this part of the research was limited and mainly statistical in character. Nevertheless, an optimal network of hub genes was identified that brings out the difference between the profiles of the two cities to a lower degree than, but within the same order of magnitude as, the discriminative power achieved with the genes differentially expressed between the two cities.
Finally, the gene lists were compared with the existing literature through the Comparative Toxicogenomics Database (CTD, http://ctdbase.org/), in order to highlight overlaps between the results of our research and already documented associations between differential gene expression and airborne particulate matter, as well as B(a)P.	el
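The abstract notes that the batch-effect correction, made necessary by an error in the experimental procedure, weakens the measured difference between the cities. The toy sketch below, with made-up values for a single gene, shows one way this can happen when processing batches are partly confounded with city. It uses naive per-batch mean-centering, a deliberately crude stand-in rather than the SVA/limma procedure used in the thesis; the batch labels and expression values are hypothetical.

```python
from statistics import mean

# Hypothetical (city, batch, expression) values for ONE gene. Batch labels are
# made up: most Prague samples ran in batch "A", most Ostrava samples in "B".
samples = [
    ("Prague", "A", 5.0), ("Prague", "A", 5.2), ("Prague", "A", 4.8),
    ("Prague", "B", 6.0),
    ("Ostrava", "A", 6.1),
    ("Ostrava", "B", 7.0), ("Ostrava", "B", 7.2), ("Ostrava", "B", 6.8),
]


def city_difference(rows):
    """Mean expression in Ostrava minus mean expression in Prague."""
    ostrava = [v for city, _, v in rows if city == "Ostrava"]
    prague = [v for city, _, v in rows if city == "Prague"]
    return mean(ostrava) - mean(prague)


def center_by_batch(rows):
    """Naive batch correction: subtract each batch's mean from its own samples.

    Unlike SVA or ComBat, this does not protect the city variable at all.
    """
    batches = {batch for _, batch, _ in rows}
    batch_mean = {b: mean(v for _, batch, v in rows if batch == b) for b in batches}
    return [(city, batch, v - batch_mean[batch]) for city, batch, v in rows]


raw = city_difference(samples)
corrected = city_difference(center_by_batch(samples))
print(f"Ostrava - Prague before batch centering: {raw:.4f}")        # prints 1.5250
print(f"Ostrava - Prague after batch centering:  {corrected:.4f}")  # prints 0.7875
```

With these invented values the city difference roughly halves after centering, because part of what looks like a batch effect is actually the city effect. Methods such as SVA or ComBat can include the group of interest in the model to protect it, but when batch and group overlap heavily, part of the signal can still be lost or become harder to estimate.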
heal.abstract	The primary aim of this study is to investigate whether there are genes that are differentially expressed in the presence of prolonged exposure to environmental pollution of an industrial kind (in the air, water, food, etc.), using microarray data. This study focuses on the data set GSE60767, published in the open-access genomics database GEO. It is a study carried out at the Institute of Experimental Medicine AS CR in the Czech Republic, as part of a series of studies concerning the effects of environmental pollution. This data set contains expression data from leukocytes obtained with Illumina microarrays. The reasons for choosing it were: • It is a comparative study between the cities of Prague and Ostrava, the latter considered to be one of the most heavily polluted areas within the European Union. • It contains an uncommonly high number of samples (468 samples obtained in three discrete sampling periods), which increases the statistical power of our analysis. • It concerns a relatively homogeneous population in terms of diet and living conditions, limiting the number of probable confounding variables. • The donors' work environment ensures that they are exposed to the same general pollution profile as most of the population. At the same time, the donors' uniformity in terms of gender, age and line of employment further reduces the confounding variables (a more in-depth description is given where the sampling procedure is explained). Analysis and extraction of differentially expressed genes were carried out entirely in R v3.3.1 with open-source Bioconductor packages, which makes the analysis easier both to follow and to verify. The GSE60767 data were first retrieved from the open-access GEO repository using the R package “GEOquery” and then analysed with the packages “limma” and “SVA”. 
Through this procedure, genes differentially expressed between the two cities were identified, although their biological signal was inevitably weakened by the batch-effect correction that an error in the experimental process made necessary, as explained in greater detail in Chapter 3. Following the extraction of statistically significant differentially expressed genes, the BioInfoMiner platform created by e-NiOS was used to explore their roles in the Gene Ontology, Human Phenotype Ontology, MGI Mammalian Phenotype Ontology and Reactome Pathways Ontology, so that comparing the results could reveal connections with existing epidemiological data, as well as genes playing a central role in more than one biological process (hub genes), which often have elevated biological importance. We then explored whether a subset of the hub genes could serve as a marker of the systemic response to a polluted environment. Due to the limitations of the study and the resulting weak biological signal, success in this endeavor was limited and mainly statistical in nature. Nevertheless, we identified an optimal subset of hub genes that brings out the difference between the expression profiles of the two cities to a lesser degree than, yet within the same order of magnitude as, the resolution attained by directly using the genes differentially expressed between the two cities. Lastly, the gene lists were compared to the existing literature through the Comparative Toxicogenomics Database (CTD, http://ctdbase.org/) to illustrate overlaps between our results and already-established links between differential gene expression and exposure to particulate matter and benzo(a)pyrene (“B(a)P”).	en
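The abstract does not say how the CTD overlaps were quantified. One common way to ask whether the overlap between two gene lists is larger than chance is a one-sided hypergeometric test; the sketch below assumes that approach, with a hypothetical gene-universe size and placeholder gene identifiers rather than any real results.

```python
from math import comb


def overlap_pvalue(universe_size, de_genes, ctd_genes):
    """One-sided hypergeometric test for the overlap of two gene sets.

    universe_size: N, number of genes that could appear in both lists
                   (e.g. genes measured on the array); hypothetical here.
    de_genes:      the n "drawn" genes (differentially expressed list).
    ctd_genes:     the K "success" genes (genes linked to an exposure in CTD).
    Both sets are assumed to be subsets of the universe.
    Returns (observed overlap k, P(X >= k)).
    """
    de, ctd = set(de_genes), set(ctd_genes)
    k = len(de & ctd)
    n, K, N = len(de), len(ctd), universe_size
    # Sum the hypergeometric probabilities of every overlap at least as large as k.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return k, tail / comb(N, n)


# Placeholder identifiers in a hypothetical 20-gene universe.
de_list = ["GENE1", "GENE2", "GENE3", "GENE4", "GENE5"]
ctd_list = ["GENE1", "GENE2", "GENE3", "GENE10"]
k, p = overlap_pvalue(20, de_list, ctd_list)
print(f"overlap = {k}, p = {p:.4f}")  # prints: overlap = 3, p = 0.0320
```

In R, which the thesis uses, the same tail probability is `phyper(k - 1, K, N - K, n, lower.tail = FALSE)`. When many exposures or gene lists are tested, the resulting p-values would also need a multiple-testing correction.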
heal.advisorName	Γεωργακίλας, Αλέξανδρος	el
heal.committeeMemberName	Γεωργακίλας, Αλέξανδρος	el
heal.committeeMemberName	Χατζηϊωάννου, Αχιλλέας	el
heal.committeeMemberName	Παπαγιάννης, Αλέξανδρος	el
heal.academicPublisher	National Technical University of Athens. School of Applied Mathematical and Physical Sciences. Department of Physics	el
heal.academicPublisherID	ntua
heal.numberOfPages	222 p.
heal.fullTextAvailability	true