Big data analytics applied to health services: a literature review

Abstract: The purpose of this study is to understand the concepts and evolution of big data analytics applied to health services, considering activities that involve the diagnosis, treatment, and management of the patient. The literature review, consulting the databases Science Direct, Scopus and Web of Science and employing the keywords health analytics and big data


Introduction
Organizational contexts, especially environments containing a large and often-overlooked volume of data and information, require techniques and tools capable of analyzing and interpreting this volume of data so as to improve the performance of certain activities. Known as big data, this computer science problem has been widely discussed and employed, giving a real meaning to information storage (Shafqat et al., 2020).
Volume, variety, velocity and veracity are the defining characteristics of big data applications (Oussous et al., 2018). The public health context, the reach and coverage of the Single Health System, and the private clinics, laboratories, and specialized centers produce a large volume of data and information that may be employed in predictive analyses that aim for strategic decision-making.
The research problem is to understand the concepts and evolution that surround health analytics based on big data analytics, whose data management and interpretation may generate several challenges related to size, format, heterogeneity, and sample incongruity (Zhou et al., 2019), allowing for interpretations represented by knowledge and expressed as decision-making actions in patient treatment and management.

Big data
Collecting and handling big data is a challenge but also an opportunity for an in-depth analysis, with an economic perspective, of the emerging problems in healthcare.
Several authors define and elucidate the concept of big data. In the view of Manjika (2011), it is a data set whose size is beyond the capacity of traditional data storage tools to collect, store, manage, and analyze for information extraction. The literature often defines big data in terms of data size, with large and multidimensional data as its central characteristics (Emani, Cullot, Nicolle, 2015).
Big data requires significant resources and robust methods and technologies that are capable of cleaning, processing, analyzing, protecting, and providing granular access to big, evolving data sets from different providers and institutions. It is directly influenced by emerging technologies like cloud computing and internet of things (Oussous et al, 2018).
Moreover, the aforementioned authors point out that the data treated by big data solutions may be structured, that is, data from which information or knowledge can be obtained directly, yet unstructured and semi-structured data can also be analyzed to extract information, which highlights the heterogeneity of big data. Due to this complex nature, traditional business intelligence tools cannot match big data applications.
Big data is characterized by the volume, seeing that digital data are continuously generated by different information technology devices and their growth is extremely accelerated, illustrating the velocity. These data need to be processed to extract relevant information and construct insights. Finally, it is characterized by the variety, due to the heterogeneous nature of the data (Oussous et al, 2018).
Authors like Gandomir and Haider (2015) present other characteristics that concisely define big data:
• Vision: definition of a goal;
• Verification: data processed according to specifications;
• Validation: fulfillment of the goal;
• Value: pertinence of the information;
• Complexity: evolution and relationship of the data;
• Immutability: it may be permanent, as long as it is managed.
Due to their characteristics and processing methods, big data applications present high hardware and software requirements. Seeing that the actions of collecting, integrating, and storing represent challenges in data management, it is essential to have a reliable process of information extraction and significantly lower defenses in order to ensure efficiency (Emani, Cullot, Nicolle, 2015).
Given this complexity, the challenge of dealing with big data lies in making it manageable, that is, facilitating the extraction of reliable information for an acceptable cost, at which point the aggregation of data from different encoded sources entails efficiency gains in terms of storage, access, management, and security (Oussous et al., 2018).
With that understanding, the next steps will explore how big data analytics based on analytical techniques allows achieving significant findings that may reveal noteworthy strategic conditions in management, operations, businesses, and especially sustainable competitive advantages.

Health analytics through big data analytics
Linking health analytics to big data applications may contribute to the collective knowledge, track the results of prevention-based strategies, and improve the efficiency in patient management. This context, overlooked by managers, engineers, and policy-makers, may represent an avenue to be explored in healthcare.
The big data generated in the context of health analytics may represent a way to personalize services, constantly seeking efficiency (Nambiar et al., 2013). Starting from different, heterogeneous data sources, the analyses show remarkable benefits, such as the adjustment of online medical prescriptions, adaptation of health plans to the population's symptoms, tracking of the evolution of diseases, enhancements in hospital treatments and operations, and the reduction of healthcare expenses (Oussous et al., 2018).
Analytical techniques lead to big data analytics, whose range of tools and storage resources may contribute to its development. The authors Emani, Cullot and Nicolle (2015) point to a few techniques that stand out due to data variety:
• Association rules: seeking relationships between entities
• Machine learning: learning complex patterns and making intelligent, machine-based decisions
• Data mining: combining statistics, machine learning, and database management
• Cluster analysis: employed in unsupervised machine learning, its purpose is to divide the data into smaller groups with the same set of still-unknown characteristics
• Crowdsourcing: employed in the collection of data, metadata, and/or resources seeking to enhance the current data semantics
• Text analytics: analyzing large collections of texts to extract information, with possible uses in topic modeling and question answering.
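As an illustration of the first technique in the list above, the following minimal sketch computes the two usual association-rule measures, support and confidence, over a toy data set. The patient records, the function names, and the `min_support` threshold are hypothetical and chosen only for illustration; they do not come from the reviewed publications.

```python
from itertools import combinations

# Hypothetical toy data: each set lists conditions recorded for one patient.
records = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "obesity"},
    {"hypertension"},
    {"diabetes", "obesity"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= r for r in records) / len(records)


def confidence(antecedent, consequent, records):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    base = support(antecedent, records)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), records) / base


def pair_rules(records, min_support=0.5):
    """Return (a, b, support, confidence) for both directions of each
    item pair whose joint support reaches min_support."""
    items = sorted(set().union(*records))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, records)
        if s >= min_support:
            rules.append((a, b, s, confidence({a}, {b}, records)))
            rules.append((b, a, s, confidence({b}, {a}, records)))
    return rules


if __name__ == "__main__":
    for a, b, s, c in pair_rules(records):
        print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

Restricting the search to item pairs keeps the sketch short; a production analysis would normally rely on a dedicated frequent-itemset algorithm rather than this exhaustive enumeration.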
Besides the analytical techniques, the term analytics gathers data from management information systems, operational research, and statistics.
Notably, interested parties (including doctors, patients, healthcare organizations, governments, and universities) are in some way invested in maximizing the potential of big data-based health analytics. However, due to the lack of qualified professionals, they often resort to data management automation from other organizations, which hinders quick responses to existing demands.
In addition to the hardware and software requirements, health analytics through big data analytics requires qualified professionals capable of integrating data from several sources and applying the analytical techniques that allow information extraction.

Methodology
The systematic review of the literature adapts the Prisma and Methodi Ordinatio methodologies, working with materials already published and found through a search in the databases.
The search used keywords without time restrictions, because the phenomenon under study is limited.
The results included published articles, which were examined to verify whether they met the research question and the phenomenon studied. The intention of the systematic literature review is to find variables that can be analyzed in a given context, taking into account the research question. Methodi Ordinatio, developed by Pagani, Kovaleski and Resende (2015), was applied to rank articles by scientific relevance, taking into account impact factor, year of publication, and number of citations.
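The ranking step can be sketched as follows. The formula used here is the InOrdinatio index as it is commonly stated for Methodi Ordinatio, InOrdinatio = IF/1000 + alpha * (10 - (research year - publication year)) + citations; this paper does not restate the formula, so treat it, the weight `alpha`, and the sample papers as assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str            # hypothetical placeholder titles
    impact_factor: float  # journal impact factor (0 if none)
    year: int             # publication year
    citations: int        # citation count at the time of the search


def in_ordinatio(paper, research_year, alpha=10):
    """InOrdinatio index as commonly stated for Methodi Ordinatio (assumed form).

    alpha (1 to 10) is the weight the researcher gives to recency.
    """
    recency = 10 - (research_year - paper.year)
    return paper.impact_factor / 1000 + alpha * recency + paper.citations


def rank(papers, research_year, alpha=10):
    """Sort papers from highest to lowest InOrdinatio."""
    return sorted(
        papers,
        key=lambda p: in_ordinatio(p, research_year, alpha),
        reverse=True,
    )


if __name__ == "__main__":
    portfolio = [
        Paper("Paper A", impact_factor=4.0, year=2019, citations=5),
        Paper("Paper B", impact_factor=2.0, year=2015, citations=60),
        Paper("Paper C", impact_factor=0.0, year=2020, citations=0),
    ]
    for p in rank(portfolio, research_year=2020):
        print(p.title, round(in_ordinatio(p, 2020), 3))
```

A high `alpha` favors recent work, while a low `alpha` lets citation counts dominate the ordering.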
Aiming to compose a portfolio of papers promoting new reflections, this study restricted itself to finding publications that specifically approach health analytics and big data analytics. Its strategy was based on stages that concisely enabled the fulfillment of the research problem:
1st stage: establishing the research purpose: understanding the concepts and evolution of big data analytics applied to health analytics.
2nd stage: defining the keyword combination: health analytics, big data analytics.
5th stage: selecting the publication types: papers and conference papers.
The following section exhibits the papers found, discussing the techniques employed and the contexts to which they were applied, in addition to how the big data analytics process was established with a link to health analytics.

Results
Systematizing the relevant findings, the searches on the Science Direct, Scopus and Web of Science databases returned, respectively, 21, 14 and 16 publications that approach the theme at hand. Checking for duplicate results led to the exclusion of 12 works, reaching a portfolio of 37 publications, one of which was removed because it was not presented at the event to which it was submitted. The final database search and the gross results are described in Figure 1.
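The duplicate check described above can be sketched as a merge of the per-database result lists keyed on a normalized title. The normalization rule, the function names, and the sample titles are hypothetical; the study does not describe how duplicates were actually detected.

```python
import re


def normalize(title):
    """Lowercase the title and collapse punctuation and whitespace,
    so that near-identical titles from different databases match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_results(results_by_db):
    """Return (unique_records, n_duplicates), keeping the first
    occurrence of each normalized title."""
    seen = {}
    total = 0
    for db, titles in results_by_db.items():
        for title in titles:
            total += 1
            seen.setdefault(normalize(title), (db, title))
    unique = list(seen.values())
    return unique, total - len(unique)


if __name__ == "__main__":
    # Hypothetical search results, not the records of this review.
    results = {
        "Science Direct": ["Big Data in Health", "Health Analytics: A Survey"],
        "Scopus": ["big data in health.", "Reality Mining"],
        "Web of Science": ["Reality mining", "Health analytics - a survey"],
    }
    unique, duplicates = merge_results(results)
    print(len(unique), "unique records,", duplicates, "duplicates removed")
```

Title matching alone can miss duplicates with retitled versions, so a real screening would typically also compare identifiers such as DOIs where available.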
Afterward, each study was analyzed to ascertain specific information, namely:
• Number of publications per year;
• Main authors and their publications;
• Countries that publish about the theme;
• Main journals;
• Field of the study object.
Figure 1 (flow diagram; stages: Identification, Eligibility, Included): records identified through database searching (n = 51); additional records identified through other sources (n = 0); records after duplicates removed (n = 12); records screened (n = 39); studies included in quantitative synthesis (meta-analysis) (n = 37).
Graph 1 presents the number of publications per year on the Scopus database.
Graph 1 - Number of publications per year
Source: research data (2020)
Graph 2 displays the main authors and their number of publications.

Graph 2 - Main authors and their number of publications
Source: research data (2020)
Graph 3 presents the main fields of research development that had studies on the theme published by international journals, pointing to the percentage corresponding to each field.
Graph 3 - Fields of the study objects
Source: research data (2020)

Discussion
In an initial analysis, the publications found illustrate that the theme itself has been significantly approached over the last five years, highlighting its current relevance due to its relation to computer science. The technological evolution of the use of analytical techniques and big data accompanies fields like engineering, business, and medicine. However, the number of studies decreases considerably when it comes to big data analytics in the health field.
Galetsi, Katsaliaki and Kumar (2020), in a study published in the International Journal of Information Management called Big data analysis in the health sector, theoretical framework, techniques and perspectives, presented a systematic overview of the literature in order to determine how big data analytics has been able to improve the health domain. Supported by resource-based theory, they identified the sources of big data and the analytical techniques that allow big data capabilities to create value. Using content analysis, they found that the most popular analytical techniques that scientists use to make meaningful interpretations of data are modeling, machine learning, data mining, visualization, and statistical analysis.
The Biobank is a database of the National Health Service that provides valuable information about human health divided into data related to demographics, health, and diseases.
The authors sought to develop the analytical technique of machine learning geared toward the main determining factors of mental illnesses, with a specific focus on depression. Other analytical challenges included resource selection, data harmonization, and data analysis.
The analysis of the information from the Biobank employed demographic, clinical, biological, and genomic data, as well as data from questionnaires and images, specifically magnetic resonance, comprising 3,297 morphometric measurements. Afterward, it developed a decision support system to detect and prevent common diseases.
A study published in the Proceedings - IEEE Symposium on Computer-Based Medical Systems was concerned with the service quality, efficiency, and sustainability of the healthcare and social assistance systems, arguing that big data analytics and its technologies can potentially process and analyze these data in order to produce relevant ideas and aid the decision-making process.
Their purpose was to present a methodology to approach big data analytics geared toward integrated care.
At the same IEEE Symposium mentioned above, Balaji, Patil and Macgregor (2017) presented the Artemis project applied to a hospital in Toronto, Canada, involving big data analytics employed to feed the system of clinical decision support. The authors revealed opportunities and challenges related to intensive pediatric care, seeking to improve medical care in low-resource locations based on remote, real-time patient monitoring.
Personalized healthcare based on health policies and big data analytics applied to holistic records were the focus of the study conducted by Kbioassist (2017), a work environment that provides analytical experiments and complex analyses of health data, aiming for evidence-based decision-making. The authors reported the case of a pathology laboratory, whose performance is described in operational terms.
Published in the journal Advances in Intelligent Systems and Computing, the paper of Kakhki, Singh and Loyd (2015) focused on patient activation, that is, patients who perform an active role in their health management and care and who are confident in their capability to manage their health. The authors discussed this aspect considering big data analytics to identify, monitor, and enhance the health organization in patient activation, promoting health analytics.
Considering the mobile technologies utilized by individuals, one study presented the analysis of behavioral data as a method through which personalized healthcare and monitoring become tools to improve the quality of healthcare. Employing mobile phones, the behavioral analysis developed a social sensor fed automatically with daily data, which were analyzed through techniques of machine learning, data mining, and predictive analysis. These techniques identify atypical behaviors and the initiatives that health centers may develop.
The same authors also highlighted, in another publication with the same research object, that reality mining enables monitoring, a method that may follow the aging of the population and reduce impacts in medical care cases. Reality mining contributes to the predictability of a population in the event of an epidemic.
Comprising diverse techniques in different contexts, big data analytics applied to health analytics may significantly contribute to stages like diagnosis, treatment, and monitoring of several diseases. Information technology improves medical decisions in terms of time, cost, and options, considering the relations between data produced and inferred from different sources, enabling a better way to conduct and manage healthcare.

Final considerations
Fulfilling its purpose, this paper explored the concepts and evolution of health analytics based on big data analytics. Nations that have a wide-coverage healthcare system may employ big data analytical techniques to guide health analytics focusing on the operational and strategic aspects of health policies.
The examined publications revealed significant aspects represented by technology, applied context, and analytical techniques, as well as their relation to patient treatment and management, providing healthcare professionals with analyses of cause, risk, prediction, electronic records, and decision-making support.
The contexts to which big data analysis was applied included hospitals, pathology laboratories, patient activation management, integrated care, government health databases, holistic health records and electronic health records.
The analysis of all publications indexed in the Science Direct, Scopus and Web of Science databases with the keywords health analytics and big data analytics showed that the number of publications is small due to the complexity of the problem. However, an evolution in the investigations has been perceived since 2018. One limitation is the behavior of the keywords in the databases, which remain confined to the analytical context. In addition, few articles were published in journals with a high impact factor, which is an opportunity for researchers to develop new investigations on the subject.
The challenge of health analytics based on big data analytics lies in the complexity of data from different sources. However, it is possible to establish health analytics in systems that complement the central one, enabling the application of analytical techniques capable of influencing the time of treatment and the decisions involved in patient management.