Defense Committee: JÚLIO CÉSAR MENDES DE RESENDE

A MASTER'S DEFENSE committee has been registered by the program.
STUDENT: JÚLIO CÉSAR MENDES DE RESENDE
DATE: 26/10/2022
TIME: 14:00
LOCATION: Videoconference
TITLE:

Deep Reinforcement Learning: Combining Techniques to Improve the FQF Algorithm


KEY WORDS:
Reinforcement Learning, Deep Learning, Distributional Reinforcement Learning.

PAGES: 92
MAJOR AREA: Exact and Earth Sciences
AREA: Computer Science
SUMMARY:

Reinforcement learning algorithms allow agents to learn from experience, without requiring prior knowledge. For this reason, they have been widely adopted, and using low- and medium-complexity digital games as benchmark environments has become common practice. In 2013, a new algorithm called DQN (Deep Q-Network) had a major impact on the research community by reaching human-level results in several Atari 2600 games using artificial neural networks. New lines of research followed, and many derived algorithms were proposed. Among them, FQF (Fully Parameterized Quantile Function) stands out, having become the state of the art among non-distributed agents in the Atari 2600 domain. However, FQF does not yet match the results of a human expert in every evaluated game. Since artificial intelligence can detect patterns that humans cannot perceive, we believed that better results than the current ones could still be obtained. Therefore, in this work, we surveyed related work and selected three improvements that had proven successful in algorithms proposed before FQF, in order to combine them with FQF and evaluate them together, seeking to improve the algorithm. The improvements applied to FQF are: three-step temporal-difference targets, the Munchausen approach, and prioritized experience replay. Combining the three improvements yielded eight algorithms, which were evaluated in five MinAtar games. According to the analyzed metrics, the version of FQF that uses all three improvements outperformed the original FQF in every experiment, thus offering the scientific community a more promising version of the algorithm.
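One of the improvements listed above is the three-step temporal-difference target. The following Python sketch shows one common way an n-step return (here n = 3) can be combined with the quantile values of a distributional agent such as FQF: the discounted sum of the next n rewards shifts every bootstrap quantile, which is scaled by gamma^n. It is an illustration under stated assumptions, not the author's implementation: the function name `n_step_quantile_target` and its argument layout are hypothetical, the bootstrap quantiles are assumed to come from a target network evaluated at the greedy action, and the Munchausen term and prioritized replay are not covered.

```python
from typing import List, Sequence


def n_step_quantile_target(
    rewards: Sequence[float],
    dones: Sequence[bool],
    next_quantiles: Sequence[float],
    gamma: float = 0.99,
    n: int = 3,
) -> List[float]:
    """Hypothetical helper: n-step target for a set of quantile values.

    rewards[k] and dones[k] describe the k-th transition after time t.
    next_quantiles holds the quantile values of the bootstrap state
    s_{t+n}, assumed to be produced by a target network.
    """
    discounted_return = 0.0
    discount = 1.0
    for k in range(n):
        discounted_return += discount * rewards[k]
        discount *= gamma
        if dones[k]:
            # Episode ended inside the n-step window: no bootstrapping,
            # every quantile collapses to the observed return.
            return [discounted_return for _ in next_quantiles]
    # After n steps, discount == gamma ** n; shift and scale each quantile.
    return [discounted_return + discount * z for z in next_quantiles]


if __name__ == "__main__":
    # 1 + 0.5 * 0 + 0.25 * 1 = 1.25, then 1.25 + 0.125 * z for each quantile.
    print(n_step_quantile_target([1.0, 0.0, 1.0], [False, False, False],
                                 [0.5, 2.0], gamma=0.5))  # [1.3125, 1.5]
```

Using a longer horizon propagates reward information faster than a one-step target, at the cost of relying on rewards collected by a possibly older policy stored in the replay buffer.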


COMMITTEE MEMBERS:
Internal Member - 1777390 - CAROLINA RIBEIRO XAVIER
External to the Institution - DENIS FERNANDO WOLF - USP
Chair - 1985872 - EDIMILSON BATISTA DOS SANTOS
External to the Program - 1652537 - MARCOS ANTONIO DE MATOS LAIA
News item registered on: 05/10/2022 13:50
SIGAA | NTInf - Núcleo de Tecnologia da Informação | Copyright © 2006-2024 - UFSJ