Conference proceedings
PetroBERT: A Domain Adaptation Language Model for Oil and Gas Applications in Portuguese
Date
2022-01-01
Registered in:
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v. 13208 LNAI, p. 101-109.
ISSN: 1611-3349 (electronic); 0302-9743 (print)
DOI: 10.1007/978-3-030-98305-5_10
Scopus EID: 2-s2.0-85127159496
Author
Universidade Estadual Paulista (UNESP)
Petróleo Brasileiro S.A. - Petrobras
Centro de Pesquisas da Petróleo Brasileiro S.A. - CENPES/Petrobras
Institution
Abstract
This work proposes PetroBERT, a BERT-based model adapted to the oil and gas exploration domain in Portuguese. PetroBERT was pre-trained on the Petrolês corpus and a private daily drilling report corpus, starting from the multilingual BERT and BERTimbau checkpoints. The proposed model was evaluated on NER and sentence classification tasks and achieved promising results, showing its potential for this domain. To the best of our knowledge, this is the first BERT-based model for the oil and gas context.
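The domain adaptation described in the abstract amounts to continued masked-language-model pre-training of an existing BERT checkpoint on domain text. The sketch below illustrates this general recipe; it assumes the Hugging Face Transformers and Datasets libraries, uses the public BERTimbau checkpoint (neuralmind/bert-base-portuguese-cased) as the starting point, and uses a placeholder file name and illustrative hyperparameters that are not taken from the paper.

```python
# Hedged sketch of domain-adaptive MLM pre-training (not the authors' exact setup).
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from BERTimbau; the paper also adapts multilingual BERT the same way.
checkpoint = "neuralmind/bert-base-portuguese-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# "domain_corpus.txt" is a placeholder for the domain text
# (e.g. Petrolês plus drilling reports, which are not publicly available).
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    # Truncate to a fixed length; the real pre-training length is a design choice.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT-style masking: 15% of tokens are masked for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="petrobert-mlm", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()

# The adapted encoder can then be fine-tuned for NER or sentence classification.
model.save_pretrained("petrobert-mlm")
tokenizer.save_pretrained("petrobert-mlm")
```

After this continued pre-training step, the saved encoder would be loaded with a task-specific head (token classification for NER, sequence classification for sentence classification) and fine-tuned on labeled domain data, mirroring the evaluation tasks mentioned in the abstract.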