Detecting secrets in source code: using machine learning to reduce false positives

Title Detecting secrets in source code: using machine learning to reduce false positives
Publication Type thesis
School or College College of Engineering
Department Computing
Author Saha, Aakanksha
Date 2019
Description Private and public Git repositories often contain sensitive information in their source code, including Application Programming Interface (API) keys, asymmetric private keys, passwords, and connection strings. This information can be found by scanning source code and configuration files with pattern matching. However, scanning repositories often produces a large number of false positives, that is, entries wrongly identified as secrets. Our research aims to reduce the number of false positives by combining a regular-expression-based approach with machine learning. The regular expressions identify different types of potential secrets hard-coded in the source code, such as RSA private keys (private keys based on the RSA algorithm), API keys, client secrets, and generic passwords, whereas machine learning distinguishes a real secret from a false positive. Combining the two techniques allows secrets to be identified first and the likely false positives among them to be filtered out afterwards. We apply a trained voting classifier, an ensemble of major supervised machine learning algorithms, to a dataset of approximately two thousand test examples. With this combination, we achieve a precision of 84% and a recall of 89%. We also evaluate our machine learning model using a precision-recall curve, which an operator can use to find the optimal trade-off between precision and recall. Our results provide strong motivation for using machine learning to improve the efficiency and efficacy of secret detection tools.
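The two-stage idea in the description (regular expressions propose candidates, then an ensemble votes on each one) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the patterns, the three hand-written voters, the thresholds, and every function name are hypothetical, and the voters are simple rules standing in for the trained supervised classifiers that the thesis combines in its voting classifier.

```python
import math
import re
from collections import Counter

# Hypothetical patterns; the regular expressions used in the thesis are not given here.
CANDIDATE_PATTERNS = {
    "rsa_private_key": re.compile(r"-----BEGIN RSA PRIVATE KEY-----"),
    "generic_assignment": re.compile(
        r"(?:password|passwd|secret|api_?key|token)\w*\s*[:=]\s*['\"]([^'\"]+)['\"]",
        re.IGNORECASE,
    ),
}


def find_candidates(text):
    """Stage 1: return (kind, value) pairs flagged by the regular expressions."""
    found = []
    for kind, pattern in CANDIDATE_PATTERNS.items():
        for match in pattern.finditer(text):
            # Use the captured value when the pattern has a group, else the whole match.
            value = match.group(1) if pattern.groups else match.group(0)
            found.append((kind, value))
    return found


def shannon_entropy(value):
    """Bits per character; random-looking strings score higher."""
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Toy voters (assumptions): each one is a stand-in for a trained classifier.
def vote_entropy(value):
    return shannon_entropy(value) >= 3.0


def vote_length(value):
    return len(value) >= 12


def vote_not_placeholder(value):
    return not re.search(r"example|changeme|placeholder|your[_-]|xxx", value, re.IGNORECASE)


VOTERS = (vote_entropy, vote_length, vote_not_placeholder)


def secret_score(value):
    """Fraction of voters that consider the candidate a real secret."""
    return sum(1 for voter in VOTERS if voter(value)) / len(VOTERS)


def is_likely_secret(value, threshold=0.5):
    """Stage 2: keep a candidate only if its score exceeds the threshold."""
    return secret_score(value) > threshold


def precision_recall(predicted, actual):
    """Precision and recall for parallel lists of boolean labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Raising or lowering `threshold` trades precision against recall, which is the kind of choice a precision-recall curve lets an operator make. A lower threshold keeps more candidates (higher recall, more false positives), and a higher one discards more (higher precision, more missed secrets).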
Type Text
Publisher University of Utah
Dissertation Name Master of Science
Language eng
Rights Management (c) Aakanksha Saha
Format application/pdf
Format Medium application/pdf
ARK ark:/87278/s6gb859f
Setname ir_etd
ID 1714238
Reference URL https://collections.lib.utah.edu/ark:/87278/s6gb859f