Machine learning: a bibliometric analysis




Keywords: machine learning, Big Data analysis, bibliometric analysis, prediction.


Objective: Present an overview of scientific articles published in the last ten years on the topic of machine learning (ML), with an emphasis on predictive algorithms.

Method/approach: Bibliometric analysis, supported by the PRISMA protocol, evaluating authors, universities, and countries in terms of productivity, citations, and thematic focus, based on a sample of 773 articles from the Scopus and Web of Science databases published from 2013 to May 2023.

Originality/value: The literature lacks studies that consolidate articles on ML and Big Data. This research helps fill that gap and supports the design of future actions and research.

Main results: In the ML bibliometric corpus, the study identified the most cited authors with the greatest number of publications, the most productive countries and universities, the journals with the most publications and citations, the areas of knowledge with the most publications, and the most prestigious articles. For ML themes and domains, it identified the main keyword co-occurrences, emerging themes (grouped into five clusters), and word clouds built from titles and abstracts. Studies on the impact of data acquisition and predictive analysis represent opportunities for future research.

Theoretical/methodological contributions: The PRISMA protocol enabled the identification of relevant articles and their quantitative and qualitative analysis, consolidating scientific knowledge on the topic.

Social/managerial contributions: The study helps company managers and researchers understand the maturity of research on ML and Big Data when assessing the feasibility of investments to gain competitive advantage from these technologies.



Author Biographies

Emerson Martins, CEETEPS – State Center for Technological Education Paula Souza / São Paulo (SP) – Brazil

Master in Management and Technology in Production Systems (CEETEPS) and Researcher at the IT Strategic Management Research Group (CEETEPS/CNPq)

Napoleao Verardi Galegale, CEETEPS – State Center for Technological Education Paula Souza / São Paulo (SP) – Brazil

PhD in Controllership and Accounting (FEA/USP), Master in Production Engineering (POLI/USP), Professor and Researcher at UPEP/CEETEPS and FEA/PUC-SP, leader of the IT Strategic Management Research Group (CEETEPS/CNPq), and Business Consultant



How to Cite

Martins, E., & Galegale, N. V. (2023). Machine learning: a bibliometric analysis. International Journal of Innovation, 11(3), e24056.