COMPARISON OF THE PERFORMANCE OF REGRESSION-SPECIFIC AND MULTI-PURPOSE ALGORITHMS
DOI: https://doi.org/10.59003/nhj.v4i8.1274
Keywords: Regression-Specific, Multi-Purpose Algorithms, Comparison Technique, Boston Housing Dataset
Abstract
Regression is a data science method for modeling the relationship between independent variables and a dependent variable. This study compares the performance of regression-specific and multi-purpose algorithms on the Boston Housing Dataset, which consists of 506 samples split into 80% for training and 20% for testing. Performance was evaluated with Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R²). To ensure a fair comparison, all algorithms were run with the default hyperparameter settings provided by the Scikit-learn library. The results show that multi-purpose (versatile) algorithms, particularly Gradient Boosting Machines (GBM) and Random Forest, performed best, with R² values of 0.92 and 0.89, respectively, and lower error values. Conversely, regression-specific algorithms such as Linear Regression and Ridge Regression recorded R² values of approximately 0.67, while the k-Nearest Neighbors algorithm performed worst, with an R² of 0.65. Multi-purpose algorithms proved more effective for datasets with complex non-linear patterns, whereas regression-specific algorithms were better suited to linear data patterns. These findings can guide practitioners in selecting algorithms according to data characteristics and analysis objectives.
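The 80/20 split and the four evaluation metrics named in the abstract can be illustrated with a short plain-Python sketch. This is not the study's code: the helper names `split_80_20` and `regression_metrics`, the fixed seed, and the choice to round the test share up are illustrative assumptions, and only the standard library is used. Scikit-learn offers equivalents (`train_test_split`, `mean_absolute_error`, `mean_squared_error`, `r2_score`). Here RMSE is the square root of MSE, and R² is computed as 1 − SS_res / SS_tot.

```python
import math
import random


def split_80_20(n_samples, seed=0):
    """Hypothetical helper: shuffle sample indices and split them 80% train / 20% test.

    The test share is rounded up, so 506 samples give 404 train and 102 test indices.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split reproducible
    n_test = math.ceil(0.2 * n_samples)
    return indices[n_test:], indices[:n_test]


def regression_metrics(y_true, y_pred):
    """Hypothetical helper: return MAE, MSE, RMSE and R^2 for paired true/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n  # mean absolute error
    mse = sum(r * r for r in residuals) / n  # mean squared error
    rmse = math.sqrt(mse)  # same units as the target variable
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot  # coefficient of determination
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


if __name__ == "__main__":
    train_idx, test_idx = split_80_20(506)
    print(len(train_idx), len(test_idx))
    y_true = [3.0, -0.5, 2.0, 7.0]
    y_pred = [2.5, 0.0, 2.0, 8.0]
    print(regression_metrics(y_true, y_pred))
```

With the toy values above, the metrics come out as MAE = 0.5, MSE = 0.375, and R² ≈ 0.949. Because RMSE undoes the squaring in MSE, it is the easier of the two to read in the units of the target (for example, house prices).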
License
Copyright (c) 2025 Nasir Usman

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
NHJ is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Articles in this journal are Open Access articles published under the Creative Commons CC BY-NC-SA License. This license permits use, distribution, and reproduction in any medium for non-commercial purposes only, provided the original work and source are properly cited.
Any derivative of the original must be distributed under the same license as the original.