Dissertation
Arterial: um modelo inteligente para a prevenção ao vazamento de informações de prontuários eletrônicos utilizando processamento de linguagem natural
Date
2021-12-21
Author
Goldschmidt, Guilherme
Abstract
Over the past decade, healthcare security breaches have increased steadily. A study on patient privacy and data security showed that 94% of hospitals had at least one security breach in the past two years, and in most cases the attacks originated from internal actors. It is therefore essential that healthcare organizations protect sensitive information such as test results, diagnoses, prescriptions, surveys, and personal customer information. A leak of sensitive data can cause great economic loss and/or damage to the organization's image. In Brazil, the General Law for the Protection of Personal Data (LGPD) also regulates various aspects of the protection of personal information. Information protection systems such as firewalls, intrusion detection and prevention systems (IDS/IPS), and virtual private networks (VPN) have been taking shape over the last few years. However, these technologies work well on well-defined, structured, and constant data, unlike medical records, which contain free-text fields. Data Leakage Prevention Systems (DLPS) complement these technologies by helping to identify, monitor, and protect sensitive data and to reduce the risk of leaking it. Conventional DLP solutions, however, rely only on signature comparisons and/or static comparisons. We therefore propose a model based on newer technologies such as Natural Language Processing (NLP), Named Entity Recognition (NER), and Artificial Neural Networks (ANN) to extract information and recognize entities more accurately, thereby contributing new perspectives to the literature and to the scientific community. Three approaches were implemented and tested: two based on ANN and a third based on machine learning algorithms. The approach based on machine learning algorithms achieved 98.0% accuracy, 86.0% recall, and a 91.0% F1-score.
Keywords: Electronic Health Record