Predictive Analysis Applied to Milk Cooling Using Regression Models and a Synthetic Dataset
Abstract
Predictive analysis plays a crucial role in optimizing agro-industrial processes such as milk cooling, which is essential for preserving milk quality. This study investigates the application of multiple regression models to predict critical variables in the milk cooling process, using a synthetic dataset of 10,000 samples. The dataset was structured to reflect key parameters such as milk volume and initial temperature, drawing on a reference technical document on the numerical simulation of milk cooling \cite{rezende2021numerical}. Twenty regression algorithms were trained and evaluated to predict three targets: actual cooling time, heat flux, and simulated cooling time. The results show that decision-tree-based models (e.g., Gradient Boosting, LightGBM, Random Forest) achieved high accuracy ($R^2 > 0.99$) in predicting cooling times and good performance ($R^2 > 0.97$) for heat flux. The study highlights the utility of synthetic datasets for developing and evaluating predictive models in a controlled environment, and provides insights for understanding and potentially optimizing the milk cooling process.
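As a rough illustration of the workflow summarized in the abstract, the sketch below builds a small synthetic dataset from milk volume and initial temperature and scores two tree-based regressors by $R^2$. It is not the study's actual pipeline. The lumped-capacitance (Newton's law of cooling) formula, the physical constants, the sampling ranges, the noise level, and the function names `make_synthetic_dataset` and `evaluate_models` are all assumptions introduced for illustration only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Illustrative constants (assumptions, not values taken from the study).
RHO_MILK = 1.03    # kg/L, approximate milk density
CP_MILK = 3930.0   # J/(kg K), approximate specific heat of milk
UA = 500.0         # W/K, hypothetical overall conductance of the tank
T_COOLANT = 0.0    # deg C, hypothetical coolant temperature
T_TARGET = 4.0     # deg C, hypothetical storage temperature


def make_synthetic_dataset(n_samples=10_000, seed=0):
    """Draw (volume, initial temperature) pairs and a noisy cooling time in hours."""
    rng = np.random.default_rng(seed)
    volume_l = rng.uniform(100.0, 1000.0, n_samples)   # assumed range, litres
    t_initial = rng.uniform(30.0, 37.0, n_samples)     # assumed range, deg C
    # Assumed lumped-capacitance model: time for the milk to reach T_TARGET.
    tau_s = RHO_MILK * volume_l * CP_MILK / UA
    ratio = (t_initial - T_COOLANT) / (T_TARGET - T_COOLANT)
    time_h = tau_s * np.log(ratio) / 3600.0
    # 1% multiplicative Gaussian noise (an arbitrary choice).
    time_h = time_h * (1.0 + rng.normal(0.0, 0.01, n_samples))
    X = np.column_stack([volume_l, t_initial])
    return X, time_h


def evaluate_models(seed=0):
    """Fit two tree-based regressors and return their R^2 on a held-out split."""
    X, y = make_synthetic_dataset(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "RandomForest": RandomForestRegressor(n_estimators=50, random_state=seed),
        "GradientBoosting": GradientBoostingRegressor(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    for name, r2 in evaluate_models().items():
        print(f"{name}: R^2 = {r2:.4f}")
```

Because the target here comes from a smooth closed-form expression with little noise, high $R^2$ values are expected almost by construction. Scores from such a sketch indicate how well a model recovers the generating formula, not how it would perform on measured data.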
Article Details
Upon receipt of accepted manuscripts, authors will be invited to complete a copyright license to publish the paper. At least the corresponding author must send the signed copyright form for publication. It is a condition of publication that authors grant an exclusive license to the INFOCOMP Journal of Computer Science. This ensures that requests from third parties to reproduce articles are handled efficiently and consistently, and it allows the article to be disseminated as widely as possible. In assigning the copyright license, authors may use their own material in other publications, provided that the INFOCOMP Journal of Computer Science is acknowledged as the original place of publication.
References
\bibitem{rezende2021numerical}
REZENDE, R. P.; ANDRADE, E. T. de; CORREA, J. L. G.; MAGALHÃES, R. R. Numerical simulation applied to milk cooling. \textit{Revista Engenharia na Agricultura}, Viçosa, MG, v. 29, p. 122-128, Jul. 2021. ISSN 2175-6813. DOI: 10.13083/reveng.v29i1.9527. Available at: \url{https://www.reveng.ufv.br/reveng/article/view/9527}. Accessed on: [Access Date].
\bibitem{shmueli2010to}
SHMUELI, G. (2010). To explain or to predict? \textit{Statistical Science}, 25(3), p. 289-310.
\bibitem{jordon2022synthetic}
JORDON, J.; YOON, J.; VAN DER SCHAAR, M. (2022). Synthetic data: A conceptual and practical guide. \textit{Patterns}, 3(6), 100516.
\bibitem{hastie2009elements}
HASTIE, T.; TIBSHIRANI, R.; FRIEDMAN, J. (2009). \textit{The Elements of Statistical Learning: Data Mining, Inference, and Prediction}. 2nd ed. New York: Springer Science \& Business Media. (Springer Series in Statistics).
\bibitem{zhao2014temperature}
ZHAO, B. (2014). Temperature-coupled field analysis of LPG tank under fire based on wavelet finite element method. \textit{Journal of Thermal Analysis and Calorimetry}, 117(1), p. 413-422.
\bibitem{nimdum2015experimental}
NIMDUM, P.; PATAMAPROHM, B.; RENARD, J.; VILLALONGA, S. (2015). Experimental method and numerical simulation demonstrate non-linear axial behaviour in composite filament wound pressure vessel due to thermal expansion effect. \textit{International Journal of Hydrogen Energy}, 40(39), p. 13231-13241.
% New references added for the Methodology section
\bibitem{scikitlearn}
PEDREGOSA, F. et al. (2011). Scikit-learn: Machine Learning in Python. \textit{Journal of Machine Learning Research}, 12, p. 2825-2830.
\bibitem{chen2016xgboost}
CHEN, T.; GUESTRIN, C. (2016). XGBoost: A Scalable Tree Boosting System. In: \textit{Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining}, San Francisco, CA, USA, p. 785-794.
\bibitem{ke2017lightgbm}
KE, G. et al. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In: \textit{Advances in Neural Information Processing Systems 30 (NIPS 2017)}, Long Beach, CA, USA, p. 3146-3154.
\bibitem{altman1992introduction}
ALTMAN, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. \textit{The American Statistician}, 46(3), p. 175-185.
\bibitem{breiman2001random}
BREIMAN, L. (2001). Random forests. \textit{Machine Learning}, 45(1), p. 5-32.
\bibitem{freund1997decision}
FREUND, Y.; SCHAPIRE, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. \textit{Journal of Computer and System Sciences}, 55(1), p. 119-139.
\bibitem{friedman2001greedy}
FRIEDMAN, J. H. (2001). Greedy function approximation: a gradient boosting machine. \textit{Annals of Statistics}, 29(5), p. 1189-1232.
\bibitem{drucker1997support}
DRUCKER, H. et al. (1997). Support vector regression machines. In: \textit{Advances in Neural Information Processing Systems 9 (NIPS 1996)}, Denver, CO, USA, p. 155-161.
\bibitem{goodfellow2016deep}
GOODFELLOW, I.; BENGIO, Y.; COURVILLE, A. (2016). \textit{Deep Learning}. Cambridge, MA: MIT Press.
\bibitem{montgomery2021introduction}
MONTGOMERY, D. C.; PECK, E. A.; VINING, G. G. (2021). \textit{Introduction to Linear Regression Analysis}. 6th ed. Hoboken, NJ: John Wiley \& Sons.
% New references added for the Further Discussion section
\bibitem{pan2009survey}
PAN, S. J.; YANG, Q. (2009). A Survey on Transfer Learning. \textit{IEEE Transactions on Knowledge and Data Engineering}, 22(10), p. 1345-1359.
\bibitem{lundberg2017unified}
LUNDBERG, S. M.; LEE, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. In: \textit{Advances in Neural Information Processing Systems 30 (NIPS 2017)}, Long Beach, CA, USA, p. 4765-4774.
\bibitem{ribeiro2016should}
RIBEIRO, M. T.; SINGH, S.; GUESTRIN, C. (2016). ``Why Should I Trust You?'': Explaining the Predictions of Any Classifier. In: \textit{Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining}, San Francisco, CA, USA, p. 1135-1144.