HEAL DSpace

Focused crawling ethnopharmacological references with active and reinforcement learning

dc.contributor.author Κοντογιάννης, Ανδρέας el
dc.date.accessioned 2021-03-26T07:11:45Z
dc.date.available 2021-03-26T07:11:45Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/53128
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.20826
dc.rights Default License
dc.subject Focused crawling en
dc.subject Reinforcement learning en
dc.subject Active learning en
dc.subject Deep learning en
dc.subject Model selection en
dc.title Focused crawling ethnopharmacological references with active and reinforcement learning en
heal.type bachelorThesis
heal.classification Intelligent Systems
heal.language el
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2021-03-12
heal.abstract Ethnopharmacology is the scientific study of ethnic groups and their use of herbal medicines. As a particular field of traditional medicine, it is now widely considered a promising form of alternative medicine, complementary to conventional Western medicine. However, searching for and documenting indigenous knowledge about the use of specific plant properties is a very challenging task for experts, given the volume of information shared through the ethnopharmacological literature. Scientific research requires researchers to be able to search efficiently for documents relevant to their subject, and such challenges can be framed as focused web search problems. To support experts, we propose the use of intelligent focused search systems, known as focused crawlers. Typically, such a system receives as input a few initial seed documents/URLs and, optionally, some keywords, all of which are relevant to a predefined search topic. The goal of a focused crawler is to discover and output as many relevant web pages as possible. In this thesis, we develop intelligent focused crawlers that serve as support tools for ethnopharmacological research. We propose a two-stage Machine Learning focused crawler that follows a Researcher-Apprentice paradigm. In the first stage, we propose Active Learning (AL): the system is trained to identify relevant documents by requesting feedback from the researcher when needed. In the second stage, we propose Reinforcement Learning (RL), treating the focused crawler as an intelligent agent. The agent estimates the long-term value of following each available URL and selects the most promising ones. Within the RL framework, we model the focused crawler environment as a Markov Decision Process (MDP) in which the states and the actions of the agent share a common representation.
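The first-stage loop described above, in which the system asks the researcher for a label only when needed, can be sketched with margin-based uncertainty sampling, a common active-learning query strategy. This is a minimal sketch under stated assumptions, not the thesis's implementation: documents are assumed to be already encoded as numeric feature vectors, labels are +1 (relevant) or -1 (irrelevant), the researcher is modelled by a hypothetical `oracle` callable, and a plain perceptron stands in for the SVM behind a model such as MarginSVM. All function names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Labeled = List[Tuple[Vector, int]]  # (feature vector, +1 relevant / -1 irrelevant)


def score(w: Vector, b: float, x: Vector) -> float:
    """Linear decision value w.x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def most_uncertain(w: Vector, b: float, pool: List[Vector]) -> int:
    """Margin-based uncertainty sampling: index of the pool item closest to the boundary."""
    return min(range(len(pool)), key=lambda i: abs(score(w, b, pool[i])))


def perceptron_fit(data: Labeled, dim: int, max_epochs: int = 100) -> Tuple[List[float], float]:
    """Stand-in learner (NOT an SVM): a perceptron trained from scratch on the labels."""
    w, b = [0.0] * dim, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            if y * score(w, b, x) <= 0:  # misclassified or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every labelled document is on the correct side
            break
    return w, b


def active_learning(seed: Labeled, pool: List[Vector],
                    oracle: Callable[[Vector], int], budget: int):
    """Query the hypothetical oracle (the researcher) for at most `budget` labels."""
    labeled = list(seed)
    pool = list(pool)  # do not mutate the caller's pool
    dim = len(labeled[0][0])
    w, b = perceptron_fit(labeled, dim)
    for _ in range(min(budget, len(pool))):
        x = pool.pop(most_uncertain(w, b, pool))
        labeled.append((x, oracle(x)))  # feedback requested only for this document
        w, b = perceptron_fit(labeled, dim)
    return w, b, labeled
```

Retraining from scratch after every label keeps the sketch short; a practical system would more likely query in batches and use a stronger classifier, which is where models such as MarginSVM or DoubleLSTM would come in.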
The representation features consist of word embeddings of publication titles, statistical features extracted from the link structure, keywords, and/or the relevance predictions of the models pretrained in the first stage. Additionally, we consider cases where the AL model trained in the first stage is used as the reward function. We evaluate two search problems: a general one, based on initial seed documents, and a more specific one, based on initial seed documents along with keywords. We compare 6 AL models, including MarginSVM and DoubleLSTM, 3 shared state-action representations (General, Keyword, and Only NLP Representation), and 2 RL agents: the Deep Q-Network (DQN) and the Double DQN (DDQN). The two-stage focused crawler, with either the DQN or the DDQN agent, is more effective than baseline methods such as random crawling and a greedy deterministic focused crawler that we define. Finally, comparing our method in the more specific setting against an estimate of a researcher's real-time performance, it achieves 5.14 times the efficiency and 3.31 times the effectiveness of the expert. en
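The difference between the two RL agents compared above lies in how the one-step bootstrap target is computed. The sketch below assumes, as a simplification, that each candidate URL in the frontier is described by a feature vector phi(s, a) built from the shared state-action representation, so a single Q-function scores every candidate. `linear_q`, `dqn_target`, `ddqn_target`, and `select_url` are hypothetical names, a linear function replaces the neural network, and the reward (which, as in the thesis, could come from the first-stage AL model) is simply passed in as a number. DQN lets the target network both choose and evaluate the best next action, while Double DQN chooses it with the online network and evaluates it with the target network, which reduces overestimation of action values.

```python
import random
from typing import Callable, List, Sequence

Features = Sequence[float]              # phi(s, a): shared state-action features of one URL
QFunction = Callable[[Features], float]


def linear_q(weights: Sequence[float]) -> QFunction:
    """Hypothetical linear Q-function; a real agent would use a neural network."""
    return lambda phi: sum(w * x for w, x in zip(weights, phi))


def dqn_target(reward: float, next_actions: List[Features], q_target: QFunction,
               gamma: float, done: bool) -> float:
    """DQN: y = r + gamma * max_a' Q_target(s', a')."""
    if done or not next_actions:
        return reward
    return reward + gamma * max(q_target(phi) for phi in next_actions)


def ddqn_target(reward: float, next_actions: List[Features], q_online: QFunction,
                q_target: QFunction, gamma: float, done: bool) -> float:
    """Double DQN: the online network selects a', the target network evaluates it."""
    if done or not next_actions:
        return reward
    best = max(next_actions, key=q_online)
    return reward + gamma * q_target(best)


def select_url(frontier: List[Features], q_online: QFunction, epsilon: float,
               rng: random.Random) -> int:
    """Epsilon-greedy choice of the frontier URL to fetch next (returns its index)."""
    if rng.random() < epsilon:
        return rng.randrange(len(frontier))
    return max(range(len(frontier)), key=lambda i: q_online(frontier[i]))
```

For example, with `q_online = linear_q([1.0, 0.0])`, `q_target = linear_q([0.0, 1.0])`, reward 1.0, gamma 0.5, and next actions `[[1.0, 0.0], [0.0, 2.0]]`, the DQN target is 2.0 while the Double DQN target is 1.0, because the action preferred by the online network is valued at 0.0 by the target network.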
heal.advisorName Ποταμιάνος, Αλέξανδρος el
heal.committeeMemberName Τσανάκας, Παναγιώτης el
heal.committeeMemberName Ρουσσάκη, Ιωάννα el
heal.committeeMemberName Ποταμιάνος, Αλέξανδρος el
heal.academicPublisher National Technical University of Athens. School of Electrical and Computer Engineering
heal.academicPublisherID ntua
heal.numberOfPages 144 p.
heal.fullTextAvailability false

