Conference proceedings
PetroBERT: A Domain Adaptation Language Model for Oil and Gas Applications in Portuguese
Date
2022-01-01
Registered in:
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v. 13208 LNAI, p. 101-109.
ISSN: 1611-3349 (electronic); 0302-9743 (print)
DOI: 10.1007/978-3-030-98305-5_10
Scopus EID: 2-s2.0-85127159496
Author
Universidade Estadual Paulista (UNESP)
Petróleo Brasileiro S.A. - Petrobras
Centro de Pesquisas da Petróleo Brasileiro S.A. - CENPES/Petrobras
Institution
Abstract
This work proposes PetroBERT, a BERT-based model adapted to the oil and gas exploration domain in Portuguese. PetroBERT was pre-trained on the Petrolês corpus and a private daily drilling report corpus, starting from the multilingual BERT and BERTimbau checkpoints. The proposed model was evaluated on NER and sentence classification tasks and achieved promising results, showing its potential for this domain. To the best of our knowledge, this is the first BERT-based model for the oil and gas context.
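The domain adaptation described in the abstract amounts to continued masked-language-model pre-training of an existing BERT checkpoint on domain text. The sketch below illustrates this general recipe; it assumes the Hugging Face Transformers and Datasets libraries, uses the public BERTimbau checkpoint (neuralmind/bert-base-portuguese-cased) as the starting point, and uses a placeholder file name and illustrative hyperparameters that are not taken from the paper.

```python
# Hedged sketch of domain-adaptive MLM pre-training (not the authors' exact setup).
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from BERTimbau; the paper also adapts multilingual BERT the same way.
checkpoint = "neuralmind/bert-base-portuguese-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# "domain_corpus.txt" is a placeholder for the domain text
# (e.g. Petrolês plus drilling reports, which are not publicly available).
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    # Truncate to a fixed length; the real pre-training length is a design choice.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT-style masking: 15% of tokens are masked for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="petrobert-mlm", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()

# The adapted encoder can then be fine-tuned for NER or sentence classification.
model.save_pretrained("petrobert-mlm")
tokenizer.save_pretrained("petrobert-mlm")
```

After this continued pre-training step, the saved encoder would be loaded with a task-specific head (token classification for NER, sequence classification for sentence classification) and fine-tuned on labeled domain data, mirroring the evaluation tasks mentioned in the abstract.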