a bibliometric analysis
Keywords: machine learning, Big Data analysis, bibliometric analysis, prediction.
Objective: To present an overview of the scientific articles on machine learning (ML) published over the last ten years, with an emphasis on predictive algorithms.
Method/approach: A bibliometric analysis, supported by the PRISMA protocol, of a sample of 773 articles retrieved from the Scopus and Web of Science databases and published between 2013 and May 2023. Authors, universities and countries were evaluated in terms of productivity, bibliographic citations and thematic focus.
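The abstract does not detail how the two databases were combined or how productivity was tallied. As a minimal sketch, assuming each export has already been parsed into dictionaries with hypothetical `doi`, `year`, `authors`, `country` and `citations` fields, PRISMA-style duplicate removal followed by publication and citation counts per author and per country could look like this. The records are toy data, not the study's sample, and the year-only window is a simplification of the 2013 to May 2023 range.

```python
from collections import Counter

# Hypothetical toy exports; real Scopus / Web of Science records carry many more fields.
scopus = [
    {"doi": "10.1000/A", "year": 2019, "authors": ["Silva A.", "Lee B."], "country": "Brazil", "citations": 12},
    {"doi": "10.1000/b", "year": 2012, "authors": ["Lee B."], "country": "China", "citations": 40},
]
wos = [
    {"doi": "10.1000/a ", "year": 2019, "authors": ["Silva A.", "Lee B."], "country": "Brazil", "citations": 11},
    {"doi": "10.1000/c", "year": 2021, "authors": ["Kumar C."], "country": "India", "citations": 5},
]


def merge_and_screen(*sources, first_year=2013, last_year=2023):
    """Merge database exports, keep one record per DOI, then apply the time window."""
    unique = {}
    for source in sources:
        for rec in source:
            # DOIs are case-insensitive, so normalise before comparing; the first source wins.
            unique.setdefault(rec["doi"].strip().lower(), rec)
    return [rec for rec in unique.values() if first_year <= rec["year"] <= last_year]


def productivity(records, key):
    """Count publications and sum citations per author (list field) or country (string field)."""
    pubs, cites = Counter(), Counter()
    for rec in records:
        values = rec[key] if isinstance(rec[key], list) else [rec[key]]
        for value in values:
            pubs[value] += 1
            cites[value] += rec["citations"]
    return pubs, cites


corpus = merge_and_screen(scopus, wos)
author_pubs, author_cites = productivity(corpus, "authors")
country_pubs, country_cites = productivity(corpus, "country")
print(len(corpus), author_pubs.most_common(1), country_cites.most_common(1))
```

Keeping the first record per DOI means citation counts come from whichever database is passed first; a real analysis would need an explicit rule for reconciling differing Scopus and Web of Science counts.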
Originality/value: The literature lacks studies that consolidate articles on ML and Big Data. This research helps to fill that gap and supports the design of future actions and research.
Main results: In the ML bibliometric corpus, the analysis identified the most cited authors with the largest number of publications, the most productive countries and universities, the journals with the most publications and citations, the knowledge areas with the most publications, and the most prestigious articles. Regarding ML themes and domains, it identified the main keyword co-occurrences, emerging themes (grouped into five clusters), and word clouds built from titles and abstracts. Studies on the impact of data acquisition and on predictive analysis represent opportunities for future research.
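The abstract does not say which software produced the keyword co-occurrences. As a plain-Python illustration on toy keyword lists (the `articles` data and the `keyword_cooccurrence` function are hypothetical), the basic count works like this: for every article, each unordered pair of distinct, case-normalised keywords is counted once. Grouping the resulting network into themes, as was done for the five clusters, is a further step not shown here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author-keyword lists, one per article (toy data, not the study's corpus).
articles = [
    ["machine learning", "big data", "prediction"],
    ["Machine Learning", "big data", "supply chain"],
    ["machine learning", "prediction"],
]


def keyword_cooccurrence(keyword_lists):
    """Count how many articles contain each unordered pair of keywords."""
    pairs = Counter()
    for keywords in keyword_lists:
        # Normalise case and drop duplicates within an article, then sort so
        # that ("a", "b") and ("b", "a") map to the same pair.
        unique = sorted({k.strip().lower() for k in keywords})
        pairs.update(combinations(unique, 2))
    return pairs


cooc = keyword_cooccurrence(articles)
for (a, b), n in cooc.most_common(3):
    print(f"{a} + {b}: {n}")
```

The pair counts can serve as edge weights in a keyword network, which is the usual input for the clustering and mapping steps in bibliometric tools.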
Theoretical/methodological contributions: The PRISMA protocol enabled the identification of relevant articles and their quantitative and qualitative analysis, consolidating scientific knowledge on the topic.
Social/managerial contributions: The study makes it easier for company managers and researchers to understand the maturity of research on ML and Big Data, helping them judge whether investing in these technologies to gain competitive advantage is feasible.
Copyright (c) 2023 Authors
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.