A Multivariate Bayesian Approach to Modeling Vulnerability Discovery in the Software Security Lifecycle Open Access
Downloadable ContentDownload PDF
Software vulnerabilities that enable well-known exploit techniques for committing computer crimes are preventable, but they continue to be present in releases. When Blackhats (i.e., malicious researchers) discover these vulnerabilities they oftentimes release corresponding exploit software and malware. If vulnerabilities—or discoveries of them—are not prevented, mitigated, or addressed, customer confidence could be reduced. In addressing the issue, software-makers must choose which mitigation alternatives will provide maximal impact and use vulnerability discovery modeling (VDM) techniques to support their decision-making process. In the literature, applications of these techniques have used traditional approaches to analysis and, despite the dearth of data, have not included information from experts and do not include influential variables describing the software release (SR) (e.g., code size and complexity characteristics) and security assessment profile (SAP) (e.g., security team size or skill). Consequently, they have been limited to modeling discoveries over time for SR and SAP scenarios of unique products, whose results are not readily comparable without making assumptions that equate all SR and SAP combinations under study. This research takes an alternative approach, applying Bayesian methods to modeling the vulnerability-discovery phenomenon. Relevant data were obtained from expert judgment (i.e., information elicited from security experts in structured workshops) and from public databases. The open-source framework, MCMCBayes, was developed to perform Bayesian model averaging (BMA). It combines predictions of interval-grouped discoveries by performance-weighting results from six variants of the non-homogeneous Poisson process, two regression models, and two growth-curve models. Utilizing expert judgment also enables forecasting expected discoveries over time for arbitrary SR and SAP combinations, thus helping software-makers to better understand the effects of influential variables they control on the phenomenon. This requires defining variables that describe arbitrary SR and SAP combinations as well as constructing VDM extensions that parametrically scale results from a defined baseline SR and SAP to the arbitrary SR and SAP of interest. Scaling parameters were estimated using elicited multivariate data gathered with a novel paired comparison approach. MCMCBayes uses the multivariate data with the BMA model for the baseline to perform predictions for desired SR and SAP combinations and to demonstrate how multivariate VDM techniques could be used. The research is applicable to software-makers and persons interested in applications of expert-judgment elicitation or those using Bayesian analysis techniques with phenomena having non-decreasing counts over time.