dc.contributor.author | Mesolongitis, Konstantinos | en |
dc.contributor.author | Μεσολογγίτης, Κωνσταντίνος | el |
dc.date.accessioned | 2023-05-31T10:53:30Z | |
dc.date.available | 2023-05-31T10:53:30Z | |
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/57782 | |
dc.identifier.uri | http://dx.doi.org/10.26240/heal.ntua.25479 | |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Greece | *
dc.rights | Attribution-NoDerivs 3.0 Greece | *
dc.rights.uri | http://creativecommons.org/licenses/by-nd/3.0/gr/ | * |
dc.subject | Machine Learning | en |
dc.subject | Reinforcement Learning | en |
dc.subject | Fuel Oil Consumption Minimization | en |
dc.title | Total fuel oil consumption minimization using reinforcement learning | en |
heal.type | bachelorThesis | |
heal.classification | Machine Learning | en |
heal.classification | Reinforcement Learning | en |
heal.language | en | |
heal.access | free | |
heal.recordProvider | ntua | el |
heal.publicationDate | 2022-09-01 | |
heal.abstract | The purpose of this research was to determine whether Reinforcement Learning (RL) could effectively assist an autonomous ship in minimizing the Total Fuel Oil Consumption (TFOC). A second, equally important objective, contingent on the first, was to identify the most suitable RL agent for this task. In this particular case study, the environment consisted of the sea area that the ship can explore, the existing boundaries (such as islands, rocky islets and land, where access is forbidden) and the weather conditions. The first part concerned the construction of the environment. A grid-world environment was selected, and the test route represented a short journey between two ports in the area of the Faroe Islands: Torshavn and Krambatangi. There was no particular reason for choosing this exact course, other than the fact that it was the first constructed environment; since it was capable of acting as a proof of concept for the needs of the project, it was retained as the final environment. The boundaries were transformed using a computationally cost-effective technique developed for this task. To achieve this, some accuracy regarding the positions of the boundaries was sacrificed. This decision was made in light of the massive number of computations executed during a Reinforcement Learning experiment, and it did not substantially affect the problem at hand. Since permitting an actual ship to explore its environment for the sake of learning is an unnecessary and costly enterprise (ship rental, fuel costs, crew-related costs, ...), relevant simulations are implemented instead. A reliable fuel consumption model was therefore mandatory for the success of this project. To obtain one, data from a shipping company were used to train an Artificial Neural Network (ANN). A Long Short-Term Memory (LSTM) network was implemented, as it has proven superior at tracking time-series outcomes, especially when previous observations are not independent from the current ones. The following features were used as inputs: Speed Over Ground, Significant Wave Height, Draught AFT, Draught FWD and Distance Over Ground. A weather-approximating function was designed and applied to the environment. A storm center interfering with the minimum-distance route was chosen, in order to examine whether the agent was capable of learning to choose the longer but cheaper route, the one on which the minimum amount of fuel is consumed. Since the environment's state space was very large, a Deep Q-Network (DQN) agent appeared to be the best option for this project: a value-approximating agent was required in order to predict the values of states that had not been experienced before. Nevertheless, the limits of a tabular Q-learning agent were also put to the test, and the improved, more sophisticated Rainbow agent was employed as well, in order to identify the best available option. The results revealed the dominance of the Rainbow agent: the Q-learning agent, as expected, was limited by the size of the state space, while the DQN agent was extremely unstable and sensitive to hyperparameter tuning. This application also demonstrates the feasibility of minimizing the TFOC using RL. | en
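The abstract names the five input features of the LSTM fuel model but not its implementation. As an illustration only, here is a minimal sketch of such a fuel-consumption regressor, assuming a PyTorch implementation; the class name, layer sizes and window length are hypothetical choices, not taken from the thesis.

```python
import torch
import torch.nn as nn

class FuelConsumptionLSTM(nn.Module):
    """Sketch of an LSTM regressor mapping a window of voyage
    observations to a fuel-consumption estimate."""
    def __init__(self, n_features: int = 5, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features); the five features named in the
        # abstract: Speed Over Ground, Significant Wave Height,
        # Draught AFT, Draught FWD, Distance Over Ground.
        out, _ = self.lstm(x)
        # Predict from the hidden state of the last time step.
        return self.head(out[:, -1, :])

# Hypothetical usage: a batch of 8 sequences, 20 time steps each.
model = FuelConsumptionLSTM()
window = torch.randn(8, 20, 5)
fuel_estimate = model(window)  # shape (8, 1)
```

The LSTM is a natural fit here for the reason the abstract gives: consecutive voyage observations are correlated, so a recurrent state helps the model carry that context forward.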
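The agent comparison (Q-learning, DQN, Rainbow) is likewise described only at a high level. The following is a minimal sketch of the DQN side, assuming a one-hot state encoding for the grid world and an epsilon-greedy exploration policy; the network architecture and helper names are illustrative assumptions, not the thesis code.

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Sketch of a DQN value network for a grid world: a one-hot
    encoded state is mapped to one Q-value per action."""
    def __init__(self, n_states: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_states, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state_onehot: torch.Tensor) -> torch.Tensor:
        return self.net(state_onehot)

def epsilon_greedy(q_net: QNetwork, state_onehot: torch.Tensor,
                   n_actions: int, epsilon: float) -> int:
    # Explore a random action with probability epsilon,
    # otherwise act greedily on the predicted Q-values.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(state_onehot).argmax())
```

This value-approximating setup is what lets the agent estimate the worth of grid cells it has never visited, which a tabular Q-learning agent, as the abstract notes, cannot do once the state space grows large.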
heal.advisorName | Papalambrou, George | en |
heal.committeeMemberName | Papadopoulos, Christos | en |
heal.committeeMemberName | Themelis, Nikolaos | en |
heal.academicPublisher | National Technical University of Athens. School of Naval Architecture and Marine Engineering. Division of Marine Engineering | el |
heal.academicPublisherID | ntua | |
heal.numberOfPages | 75 p. | el |
heal.fullTextAvailability | false | |