Predictive Analysis Applied to Milk Cooling Using Regression Models and a Synthetic Dataset

Luiz Carlos Brandão Junior
Ricardo Rodrigues Magalhães

Abstract

Predictive analysis plays a crucial role in optimizing agro-industrial processes such as milk cooling, which is essential for maintaining milk quality. This study investigates the application of multiple regression models to predict critical variables in the milk cooling process, using a synthetic dataset of 10,000 samples. The dataset was structured to reflect key parameters such as milk volume and initial temperature, drawing on information from a reference technical study on the numerical simulation of milk cooling (REZENDE et al., 2021). Twenty regression algorithms were trained and evaluated to predict the actual cooling time, the heat flux, and the simulated cooling time. The results show that decision-tree-based models (e.g., Gradient Boosting, LightGBM, Random Forest) achieved high accuracy ($R^2 > 0.99$) when predicting cooling times and good performance ($R^2 > 0.97$) for heat flux. This study highlights the utility of synthetic datasets for developing and evaluating predictive models in a controlled environment, providing valuable insights for understanding and potentially optimizing the milk cooling process.
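As an illustrative sketch of the evaluation pipeline described in the abstract (not the authors' actual dataset or code), the snippet below generates a hypothetical synthetic milk-cooling dataset from two of the cited parameters (milk volume and initial temperature), fits two of the named tree-based regressors with scikit-learn, and scores them with R². The feature ranges and the target formula are assumptions for demonstration only; the real dataset's generating process is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
n = 10_000  # same sample count as the paper's synthetic dataset

# Hypothetical input features (ranges are assumptions, not the paper's):
volume = rng.uniform(100, 2000, n)       # milk volume (L)
t_initial = rng.uniform(30.0, 37.0, n)   # initial milk temperature (deg C)
t_target = 4.0                           # assumed target storage temperature (deg C)

# Hypothetical target: cooling time grows with volume and with the gap
# to the target temperature, plus Gaussian noise.
cooling_time = 0.05 * volume + 8.0 * (t_initial - t_target) + rng.normal(0, 5, n)

X = np.column_stack([volume, t_initial])
X_train, X_test, y_train, y_test = train_test_split(
    X, cooling_time, test_size=0.2, random_state=0
)

# Fit two of the tree-based models cited in the abstract and report R^2.
for model in (GradientBoostingRegressor(random_state=0),
              RandomForestRegressor(n_estimators=100, random_state=0)):
    model.fit(X_train, y_train)
    r2 = r2_score(y_test, model.predict(X_test))
    print(f"{type(model).__name__}: R^2 = {r2:.3f}")
```

On data this close to linear, both models recover most of the explainable variance; with a real, noisier dataset the train/test split and a fixed random seed remain the essential ingredients for a fair comparison across the twenty algorithms.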

Article Details

How to Cite
Brandão Junior, L. C., & Rodrigues Magalhães, R. (2025). Predictive Analysis Applied to Milk Cooling Using Regression Models and a Synthetic Dataset. INFOCOMP Journal of Computer Science, 24(2). Retrieved from https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/5222
Section
Machine Learning and Computational Intelligence

References

REZENDE, R. P.; ANDRADE, E. T. de; CORREA, J. L. G.; MAGALHÃES, R. R. Numerical simulation applied to milk cooling. Revista Engenharia na Agricultura, Viçosa, MG, v. 29, p. 122-128, Jul. 2021. ISSN 2175-6813. DOI: 10.13083/reveng.v29i1.9527. Available at: https://www.reveng.ufv.br/reveng/article/view/9527. Accessed on: [Access Date].

SHMUELI, G. (2010). To explain or to predict? Statistical Science, 25(3), p. 289-310.

JORDON, J.; YOON, J.; VAN DER SCHAAR, M. (2022). Synthetic data: A conceptual and practical guide. Patterns, 3(6), 100516.

HASTIE, T.; TIBSHIRANI, R.; FRIEDMAN, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer Science & Business Media. (Springer Series in Statistics).

ZHAO, B. (2014). Temperature-coupled field analysis of LPG tank under fire based on wavelet finite element method. Journal of Thermal Analysis and Calorimetry, 117(1), p. 413-422.

NIMDUM, P.; PATAMAPROHM, B.; RENARD, J.; VILLALONGA, S. (2015). Experimental method and numerical simulation demonstrate non-linear axial behaviour in composite filament wound pressure vessel due to thermal expansion effect. International Journal of Hydrogen Energy, 40(39), p. 13231-13241.

PEDREGOSA, F. et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, p. 2825-2830.

CHEN, T.; GUESTRIN, C. (2016). XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, p. 785-794.

KE, G. et al. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In: Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, p. 3146-3154.

ALTMAN, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46(3), p. 175-185.

BREIMAN, L. (2001). Random forests. Machine Learning, 45(1), p. 5-32.

FREUND, Y.; SCHAPIRE, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), p. 119-139.

FRIEDMAN, J. H. (2001). Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5), p. 1189-1232.

DRUCKER, H. et al. (1997). Support vector regression machines. In: Advances in Neural Information Processing Systems 9 (NIPS 1996), Denver, CO, USA, p. 155-161.

GOODFELLOW, I.; BENGIO, Y.; COURVILLE, A. (2016). Deep Learning. Cambridge, MA: MIT Press.

MONTGOMERY, D. C.; PECK, E. A.; VINING, G. G. (2021). Introduction to Linear Regression Analysis. 6th ed. Hoboken, NJ: John Wiley & Sons.

PAN, S. J.; YANG, Q. (2010). A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), p. 1345-1359.

LUNDBERG, S. M.; LEE, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. In: Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, p. 4765-4774.

RIBEIRO, M. T.; SINGH, S.; GUESTRIN, C. (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, p. 1135-1144.