Forecasting resource utilization in large cloud computing systems is essential for maintaining high-quality service and meeting service level agreements (SLAs) while remaining cost-effective and profitable. Accurately forecasting long-term resource utilization is challenging because utilization is dynamic and fluctuates over short time intervals. A forecasting model was developed using multivariate time series k-nearest neighbor (MTS k-NN) regression to forecast optimal CPU and memory utilization, incorporating additional explanatory (exogenous) variables that contribute to resource utilization. The forecasting effectiveness and accuracy of the MTS k-NN model were evaluated using real-world data from the Google trace. The model was compared with two classical statistical time-series forecasting models, autoregressive integrated moving average (ARIMA) and autoregressive integrated moving average with exogenous variables (ARIMAX), and with six other machine learning algorithms: support vector machine (SVM), artificial neural network (ANN), multivariate adaptive regression splines (MARS), classification and regression trees (CART), random forest (RF), and naïve Bayes (NB). The results indicate that CPU and memory utilization are strongly linked to the exogenous variables, and that adding exogenous variables to the time series is a significant factor in forecasting resource utilization that should not be overlooked; they also demonstrate the accuracy of the proposed approach. The MTS k-NN model could be applied in real-world scenarios to forecast optimal resource utilization, supporting efficient management of cloud computing systems and helping to overcome resource utilization challenges, especially capacity planning and performance management, while maintaining quality of service (QoS) and satisfying SLAs.
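The abstract does not specify the model's feature construction, distance measure, or value of k, so the following is only a minimal sketch of multivariate k-NN regression for one-step-ahead forecasting, not the authors' implementation. It assumes that the features are the previous `lags` CPU and memory values plus exogenous variables observed at the forecast time, that distance is Euclidean, that neighbours are averaged without weights, and that all inputs are already on comparable scales. The function names `build_lagged_dataset` and `knn_forecast`, and the toy data, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def build_lagged_dataset(
    cpu: Sequence[float],
    mem: Sequence[float],
    exog: Sequence[Sequence[float]],
    lags: int,
) -> Tuple[List[Vector], List[Vector]]:
    """Turn aligned CPU, memory and exogenous series into (features, targets).

    Assumption: features at time t are the previous `lags` CPU and memory
    values plus the exogenous variables observed at t; the target is
    the pair (cpu[t], mem[t]).
    """
    if not (len(cpu) == len(mem) == len(exog)):
        raise ValueError("series must be aligned and of equal length")
    features, targets = [], []
    for t in range(lags, len(cpu)):
        x = list(cpu[t - lags:t]) + list(mem[t - lags:t]) + list(exog[t])
        features.append(x)
        targets.append([cpu[t], mem[t]])
    return features, targets


def knn_forecast(
    train_x: Sequence[Vector],
    train_y: Sequence[Vector],
    query: Vector,
    k: int,
) -> Vector:
    """Average the targets of the k training rows closest to `query`."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training rows")
    # Euclidean distance from the query to every training row
    # (features are assumed to be pre-scaled; no normalization here).
    dists = [math.dist(row, query) for row in train_x]
    nearest = sorted(range(len(train_x)), key=lambda i: dists[i])[:k]
    n_targets = len(train_y[0])
    # Unweighted mean of the neighbours' (cpu, mem) targets.
    return [sum(train_y[i][j] for i in nearest) / k for j in range(n_targets)]


# Toy, hypothetical utilization series (fractions of capacity).
cpu = [0.20, 0.25, 0.30, 0.28, 0.35, 0.40, 0.38, 0.45]
mem = [0.50, 0.52, 0.55, 0.54, 0.58, 0.60, 0.59, 0.63]
# Hypothetical exogenous variable, e.g. a scaled count of running tasks.
exog = [[0.1], [0.2], [0.3], [0.3], [0.4], [0.5], [0.5], [0.6]]

X, Y = build_lagged_dataset(cpu, mem, exog, lags=2)
# Hold out the last row and forecast its (cpu, mem) from the others.
pred = knn_forecast(X[:-1], Y[:-1], X[-1], k=2)
```

Dropping the `exog` columns from the feature vectors turns this into a purely autoregressive k-NN, which is one simple way to probe the abstract's claim that the exogenous variables matter; in practice the features would also need scaling and k would be tuned on held-out data.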