COMPARISON OF THE PERFORMANCE OF REGRESSION-SPECIFIC AND MULTI-PURPOSE ALGORITHMS

Nasir Usman; Darniati Darniati; Rosnani Rosnani; Musdalifa Thamrin; Nurahmad Nurahmad; Nurdiansyah Nurdiansyah; Muhammad Faisal

doi:10.59003/nhj.v4i8.1274

Authors

Nasir Usman STMIK Profesional Makassar
Darniati Darniati STMIK Profesional Makassar
Rosnani Rosnani STMIK Profesional Makassar
Musdalifa Thamrin STMIK Profesional Makassar
Nurahmad Nurahmad Universitas Handayani Makassar
Nurdiansyah Nurdiansyah Universitas Dipa Makassar
Muhammad Faisal Universitas Muhammadiyah Makassar

Keywords:

Regression-Specific, Multi-Purpose Algorithms, Comparison Technique, Boston Housing Dataset

Abstract

Regression is a data science method for modeling the relationship between independent variables and a dependent variable. This study compares the performance of several regression algorithms on the Boston Housing Dataset, which consists of 506 samples, split 80% for training and 20% for testing. Performance was evaluated using Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R²). All algorithms were implemented with the default hyperparameter settings provided by the scikit-learn library to ensure a fair comparison. The results showed that multi-purpose algorithms, particularly Gradient Boosting Machines (GBM) and Random Forest, achieved the best performance, with R² values of 0.92 and 0.89, respectively, and lower error values. Conversely, regression-specific algorithms such as Linear Regression and Ridge Regression recorded R² values of approximately 0.67, while the k-Nearest Neighbors algorithm had the lowest performance, with an R² of 0.65. Multi-purpose algorithms proved more effective for datasets with complex non-linear patterns, while regression-specific algorithms were better suited to linear data patterns. These findings provide guidance for practitioners in selecting algorithms based on data characteristics and analysis objectives.
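As an illustration of the evaluation protocol described in the abstract, the following is a minimal sketch of an 80/20 split and the four reported metrics, written in plain Python. It is not the authors' code: the study itself used scikit-learn, and the function names `split_indices` and `regression_metrics`, the random seed, and the toy target values are all assumptions introduced here for illustration.

```python
import math
import random


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into train and test parts.

    Hypothetical helper: the seed and the truncating split are assumptions.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for two equal-length sequences.

    Assumes y_true is not constant, otherwise R^2 is undefined.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# 506 samples, as in the Boston Housing Dataset, split 80% / 20%.
train_idx, test_idx = split_indices(506)

# Toy values for illustration only; these are not results from the study.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 9.5]
metrics = regression_metrics(y_true, y_pred)
print(len(train_idx), len(test_idx), metrics)
```

Under these assumptions, a higher R² and lower MAE, MSE, and RMSE on the held-out 20% indicate a better fit, which is the basis on which the abstract ranks the algorithms.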
