Using Machine Learning Algorithms to Predict Technology-Based Small Business Commercialization
The Small Business Innovation Research (SBIR) program was established in 1982 in response to a decline in United States (U.S.) research and development (R&D) productivity. The SBIR program aims to provide small businesses an avenue to commercialize products and services. It is structured into three phases: Phase I, a 6-month Concept Development phase; Phase II, a 2-year Prototype Development phase; and Phase III, a Commercialization phase in which small businesses pursue commercialization objectives with non-SBIR program funding. The purpose of the SBIR program is to provide small businesses with funding assistance for innovative ideas with commercialization potential. Prior research employs only logistic regression models and investigates the factors that contribute to SBIR Phase III success using NIH SBIR data, Navy SBIR data, and National Aeronautics and Space Administration (NASA) SBIR data. This praxis uses data from USASPENDING.gov to build upon and extend previous studies and to identify characteristics of small businesses with Phase III success. Six machine learning algorithms are explored: Random Forest, J48, Random Tree, Reduced Error Pruning Tree, Logistic Model Tree, and Logistic Regression. Prior studies informed small businesses of their likelihood ratios; the Random Forest models, however, provide information about companies that have shown a propensity for Phase III success. Partial dependence plots are also created to show the effect of small business features on Phase III success. Decision tree and logistic regression models are compared against each other to evaluate their performance in accurately predicting SBIR Phase III success. A Small Business Commercialization prediction model is developed using Department of Defense, NASA, and Department of Homeland Security SBIR program data.
Results indicate that the Random Forest algorithm outperforms all other algorithms and can predict SBIR Phase III success with 93.4% accuracy for Small Business Commercialization. Predictive models identify the funding requesting agency (within the major agency) associated with small business Phase III success. Phase III success can be accurately predicted using small business characteristics, major agency, and funding requesting agency data. The variables, ordered from most to least correlated with SBIR Phase III success, are: Annual Revenue, Number of Employees, Women Owned Small Business, Minority Owned Small Business, Region, Major Agency, and Funding Requesting Agency. This ordering indicates, in descending order, the features that matter most for Phase III success. Further analysis reveals that Annual Revenue and Number of Employees are the most important features for Phase III success. With the results of these models, small business managers can use their small business characteristics to select the SBIR funding requesting agency most favorable for Phase III success.
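
The partial dependence analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the praxis's implementation: the firm records, the `toy_model` scoring function, and its coefficients are all hypothetical stand-ins for a fitted classifier and are invented purely for illustration. The sketch shows only the general technique: force one feature to each value on a grid for every record, then average the model's predictions.

```python
from statistics import mean

# Hypothetical records: (annual revenue in $M, number of employees).
# These values are invented for illustration; they are not praxis data.
FIRMS = [(0.5, 4), (1.2, 9), (3.0, 25), (7.5, 60), (12.0, 110)]


def toy_model(revenue_musd, employees):
    """Stand-in for a fitted classifier's predicted probability of success.

    The coefficients are arbitrary assumptions, not results from the praxis.
    """
    score = 0.05 * revenue_musd + 0.004 * employees
    return min(1.0, max(0.0, score))  # clip to a valid probability


def partial_dependence(model, rows, feature_index, grid):
    """Average model output over all rows with one feature fixed per grid value."""
    curve = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # overwrite the feature of interest
            predictions.append(model(*modified))
        curve.append((value, mean(predictions)))
    return curve


revenue_grid = [0.0, 5.0, 10.0]
pd_revenue = partial_dependence(toy_model, FIRMS, 0, revenue_grid)
for value, avg in pd_revenue:
    print(f"revenue={value:5.1f} $M -> mean predicted probability {avg:.3f}")
```

Plotting the resulting pairs for a feature such as Annual Revenue gives a curve of the kind a partial dependence plot displays. With a real model, the function passed as `model` would be the trained classifier's probability output rather than a hand-written formula.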