Research data: basis of future research analytics

doi:10.15406/mojph.2021.10.00356

MOJ Public Health

eISSN: 2379-6383

Mini Review Volume 10 Issue 2

Sathish Muthu,^1,2

Madhan Jeyaraman,^2,3 Girinivasan Chellamuthu²

¹Department of Orthopaedics, Government Medical College & Hospital, Dindigul, Tamil Nadu, India
²Research Associate, Orthopaedic Research Group, Coimbatore, Tamil Nadu, India
³Department of Orthopaedics, School of Medical Science & Research, Sharda University, New Delhi, India

Correspondence: Sathish Muthu, Department of Orthopaedics, Government Medical College & Hospital, Dindigul, Tamil Nadu, India, Tel +91 9600856806

Received: December 08, 2020 | Published: April 8, 2021

Citation: Muthu S, Jeyaraman M, Chellamuthu G. Research data: basis of future research analytics. MOJ Public Health. 2021;10(2):35-38. DOI: 10.15406/mojph.2021.10.00356

Abstract

We live in an era of data explosion, and there is an urgent need to make high-quality scholarly research data available so that technologies such as Artificial Intelligence (AI) can derive more from it than publications alone. AI, often portrayed as the sentience of machines, has the potential, with greater computational power and advanced analytical methods, to deliver personalised health care to individuals with greater probability. Although various data sources such as electronic medical records, insurance company records and financial records can be used for such purposes, research data remains the key building block from which deep learning algorithms can generate meaningful results. Hence, we discuss the significance of, the need for, and the various methods of making high-quality scholarly research data available for the future, so that intricate and potentially unknown patterns hidden in it can be identified by harnessing the potential of AI. We also recommend that research data deposition be made a necessary prerequisite for the publication of the results derived from it.

Introduction

Any research output that has been collected, observed and analysed to arrive at a result constitutes research data.¹ Research data goes beyond the entries made in a spreadsheet: it includes the raw inputs, processed data, algorithms, protocols, methods, materials, photographs, etc. It is an essential component of research and is needed to reproduce a given scientific output. We live in an era of data explosion, and there is an urgent need to make high-quality scholarly research data available so that technologies such as Artificial Intelligence (AI) can derive more from it than publications alone. AI, often portrayed as the sentience of machines, has the potential, with greater computational power and advanced analytical methods, to deliver personalised health care to individuals with greater probability.² Although various data sources such as electronic medical records, insurance company records and financial records can be used for such purposes, research data remains the key building block from which deep learning algorithms can generate meaningful results.^3,4 Hence, we discuss the significance of, the need for, and the various methods of making high-quality scholarly research data available for the future, so that intricate and potentially unknown patterns hidden in it can be identified by harnessing the potential of AI.

Research data management

The lifecycle of research data is neither static nor isolated. It does not end with the creation, processing, analysis, representation and publication of the data; it also includes preservation and availability for verification and future reuse, as shown in Figure 1.⁵ Data management is the efficient handling of data along this lifecycle, ensuring that the data is collected in an understandable way so that other researchers can test its validity or re-analyse it from a different perspective. The most critical part of data management is to preserve the data and make it available to others by providing access to data deposited in discipline-specific repositories. Publications are no longer considered the only output of research; data itself is now regarded as an important output. This blurs the line between publications and data and has led to a growing number of data journals, such as Scientific Data from Nature⁶ and GigaScience from Oxford Academic,⁷ which serve as data banks for future research analysis.

Figure 1 Data life cycle.

Why deposit research data?

Orthopaedic surgery has evolved through the various techniques and technologies developed in recent decades, but high-quality evidence to support their use in everyday practice is lacking owing to ethical and cost concerns. This provides the ground for solutions derived from the deep learning approaches of AI. Although multiple clinical data registries maintain high-quality health care data, the data behind currently published research remains a critical element for analysis through AI. Hence, research data sharing remains the way forward for scientific progress. It allows for validation, replication, re-analysis, re-interpretation, new analysis and inclusion in meta-analyses. It increases the reproducibility and credibility of the research.⁸ It increases the value of the investment made in funding scientific research. It also reduces the burden on authors of managing data access requests. Linking the research data to the associated publication increases its visibility and ensures greater recognition.

Principles of human research data deposition

Appropriate ethics committee approval and patient consent, in accordance with all applicable local laws, must be obtained before patient-related data is shared in the public domain.⁹ Data sharing should never compromise participant privacy. Data that could identify a participant, such as name, physical address, birth date or contact information, should not be included in a research data deposition. Even data that does not directly identify a participant may be inappropriate to share when it becomes identifying in combination, for example data from a small group within a vulnerable population or a private group. Taking the steps necessary to de-identify research data is always recommended. National and international agencies have put forth various guidelines on research data deposition on these grounds.^10–16
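
The two privacy checks described above, removing direct identifiers and flagging combinations of fields that single out a small group, can be sketched in Python. This is a minimal illustration and not a validated anonymisation procedure: the field names (`name`, `address`, `birth_date`, `contact`, `age_band`, `district`), the group-size threshold `k=3` and the records themselves are assumptions made for the example.

```python
from collections import Counter

# Hypothetical direct identifiers; a real study would take this list from
# its ethics approval and the applicable local regulations.
DIRECT_IDENTIFIERS = {"name", "address", "birth_date", "contact"}


def strip_direct_identifiers(record):
    """Return a copy of the record without direct identifier fields."""
    return {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}


def small_groups(records, quasi_identifiers, k=3):
    """Return value combinations shared by fewer than k records.

    Such combinations may identify a participant indirectly even though
    no single field does.
    """
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Synthetic records, for illustration only.
records = [
    {"name": "A", "birth_date": "1980-01-01", "age_band": "40-49", "district": "X", "outcome": 1},
    {"name": "B", "birth_date": "1981-02-02", "age_band": "40-49", "district": "X", "outcome": 0},
    {"name": "C", "birth_date": "1979-03-03", "age_band": "40-49", "district": "X", "outcome": 1},
    {"name": "D", "birth_date": "1950-04-04", "age_band": "70-79", "district": "Y", "outcome": 0},
]

deidentified = [strip_direct_identifiers(r) for r in records]
risky = small_groups(deidentified, ["age_band", "district"], k=3)
print(risky)  # {('70-79', 'Y'): 1}
```

In this toy example the single participant in the 70-79 age band of district Y would still be recognisable after the name and birth date are removed, so that record would need further generalisation or suppression before deposition.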

Methods of data deposition

Data repository

All research data and the related metadata for the reported findings are best managed by deposition in a data repository. The data can be deposited in a specialty-specific repository that accepts specific structured data types or in a cross-disciplinary repository that accepts various data types. However, generalisation from cross-disciplinary repositories remains challenging, which makes specialty-specific repositories the ideal mode of data deposition.¹⁷

Supporting files

Although repositories are the preferred method of research data availability, authors can also provide the research data as a supporting file linked to the research publication. Authors should use formats that are standard to their discipline to allow wide dissemination.

Choosing a data repository

Data repositories remain the preferred method of research data deposition. The FAIR (Findable, Accessible, Interoperable, Reusable) data principles guide the selection of an ideal data repository, which is a critical step in achieving the goals of data deposition.¹⁸

Findable

To make sure others can find the data, it must be hosted by a stable, recognised repository that assigns a globally unique persistent identifier, such as a DOI, to the research data so that it is findable for future human and machine use. To ensure findability, all the fields that contribute to the metadata record must be completed.
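
A completeness check of a metadata record before deposition could look like the sketch below. The list of required fields is hypothetical, since each repository defines its own metadata schema, and the regular expression is only a rough syntactic test for a DOI; it does not confirm that the identifier resolves.

```python
import re

# Hypothetical required fields; each repository defines its own schema.
REQUIRED_FIELDS = ("title", "creators", "description", "keywords", "licence", "identifier")

# Rough syntactic check for a DOI; it does not confirm that the DOI resolves.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def metadata_problems(metadata):
    """List missing or empty required fields and a malformed identifier, if any."""
    problems = [f"missing: {field}" for field in REQUIRED_FIELDS if not metadata.get(field)]
    identifier = metadata.get("identifier", "")
    if identifier and not DOI_PATTERN.match(identifier):
        problems.append(f"identifier is not a DOI: {identifier}")
    return problems


# The DOI of this article is used only as an example of a well-formed
# identifier; a deposited dataset would receive its own DOI.
record = {
    "title": "Research data: basis of future research analytics",
    "creators": ["Muthu S", "Jeyaraman M", "Chellamuthu G"],
    "description": "",
    "keywords": ["research data", "artificial intelligence"],
    "identifier": "10.15406/mojph.2021.10.00356",
}
print(metadata_problems(record))  # ['missing: description', 'missing: licence']
```

An empty list means that every required field is filled and the identifier is at least syntactically a DOI.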

Accessible

Granting access to medical research data raises ethical concerns, and hence open sharing may not always be possible. However, the specific research data supporting a publication can be made available with an appropriate level of security.

Interoperable

For integrative analysis by humans and machines, data must be deposited in an open file format using a standard vocabulary. Discipline-specific repositories dictate specific file formats and vocabularies to maintain the interoperability of the research data.
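
As an illustration, the sketch below checks values against a controlled vocabulary and serialises the rows as CSV, one example of an open file format. The vocabulary, the column names and the sample rows are assumptions made for the example; an actual discipline-specific repository would prescribe its own terms and formats.

```python
import csv
import io

# Hypothetical controlled vocabulary; a discipline-specific repository
# would prescribe the actual terms.
VOCABULARY = {
    "sex": {"female", "male", "unknown"},
    "side": {"left", "right", "bilateral"},
}


def vocabulary_errors(rows):
    """Return (row_index, column, value) for every value outside the vocabulary."""
    return [
        (index, column, row[column])
        for index, row in enumerate(rows)
        for column, allowed in VOCABULARY.items()
        if row[column] not in allowed
    ]


def to_csv(rows, fieldnames):
    """Serialise rows as CSV text, an open and widely readable format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Synthetic rows, for illustration only.
rows = [
    {"participant": "P001", "sex": "female", "side": "left"},
    {"participant": "P002", "sex": "M", "side": "right"},
]

print(vocabulary_errors(rows))  # [(1, 'sex', 'M')]
print(to_csv(rows[:1], ["participant", "sex", "side"]))
```

The second row uses "M" where the assumed vocabulary expects "male", which is the kind of inconsistency that prevents datasets from different sources from being analysed together.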

Reusable

Research data that is made findable, accessible and interoperable is generally fit for reuse. Sometimes additional documentation is required alongside it to make the data understandable, and thus reusable, for anyone who is not familiar with it.

A sample list of discipline-specific and inter-disciplinary repositories available for data deposition for research involving orthopaedic spine surgery is shown in Table 1. Registries such as FAIRsharing¹⁹ and re3data²⁰ give information on the data repositories available for a chosen discipline, along with the list of journals supporting their use.

Data Repository	How article and data are linked	Dataset Size Limits	Repository URL
Discipline Specific Repository - Spine:
ClinicalTrials.gov (NCT)	Authors should specify NCT accession numbers	1 GB per dataset	http://clinicaltrials.gov/
Neuroimaging Informatics Tools and Resources Collaboratory (NITRC)	Authors should specify NITRC accession numbers.	Image Repository	http://www.nitrc.org/
Neuroscience Information Framework (NIF)	Authors should mention Research Resource IDentifier (RRID)	Image/Dataset Repository	http://www.neuinfo.org/
OpenNeuro	Authors should specify OpenNeuro accession numbers	Image Repository	http://www.openneuro.org
Inter-disciplinary Repository:
Mendeley Data	Mendeley Data banners will be shown on ScienceDirect when the repository has data for the article	10 GB per dataset	https://data.mendeley.com/
Harvard Dataverse	Some journals have a dedicated Dataverse repository set up for authors to upload their data that belongs with the article. Authors should include the dataset DOI in the article.	10 GB per dataset	https://dataverse.harvard.edu/
Figshare	Authors should include the dataset DOI in the article.	1 TB per dataset	http://figshare.com/
Dryad Digital Repository	Authors should include the dataset DOI in the article.	300 GB per dataset	https://datadryad.org/stash
Open Science Framework	Authors should include the dataset DOI in the article.	5 GB per dataset	https://osf.io/
Zenodo	Authors should include the dataset DOI in the article.	50 GB per dataset	https://zenodo.org/

Table 1 Sample list of discipline-specific and inter-disciplinary repositories available for data deposition for research involving the spine
URL, uniform resource locator; GB, gigabyte; TB, terabyte
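
The size limits in Table 1 can be used to shortlist inter-disciplinary repositories for a dataset of a given size, as in the sketch below. The limits are copied from the table as listed and may have changed since publication; treating 1 TB as 1000 GB is an assumption, and the function name `repositories_for` is introduced here for illustration.

```python
# Size limits in GB as listed in Table 1; 1 TB is taken as 1000 GB (an assumption).
SIZE_LIMITS_GB = {
    "Mendeley Data": 10,
    "Harvard Dataverse": 10,
    "Figshare": 1000,
    "Dryad Digital Repository": 300,
    "Open Science Framework": 5,
    "Zenodo": 50,
}


def repositories_for(dataset_size_gb):
    """Return repositories whose listed limit fits the dataset, smallest limit first."""
    fitting = [
        (limit, name)
        for name, limit in SIZE_LIMITS_GB.items()
        if dataset_size_gb <= limit
    ]
    return [name for _, name in sorted(fitting)]


print(repositories_for(40))  # ['Zenodo', 'Dryad Digital Repository', 'Figshare']
```

Size is only one criterion; the FAIR considerations discussed earlier and the journal's own requirements would still decide the final choice.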

Concerns in data deposition

If data cannot be deposited publicly for ethical or security reasons, restricted access can be given to researchers and reviewers under specific conditions. Research data deposition raises data protection issues that need adequate attention. Sharing data over the internet may be a concern when the dataset is too large to be feasibly hosted by a repository, and this needs to be resolved on a case-by-case basis.²¹ When data is obtained from a third party, further restrictions apply to the availability of the research data.

Challenges in orthopaedic surgery

Even if data deposition is made mandatory for research publications in orthopaedics, certain challenges remain in making the data useful for AI-based analysis. First, uniform, appropriately labelled dataset templates have to be established for universal use in orthopaedic surgery research.²² Second, research in orthopaedics frequently involves image-based analysis, which needs manual labelling of the data for classification before machine learning (ML) can occur.^23–25 Although unsupervised algorithms have been developed to allow ML models to analyse and classify such image-based data, training datasets of poor quality and insufficient quantity can lead to erroneous decisions, thereby reducing the validity of the models' output.²⁶ Finally, many AI-based algorithms are trained and validated for use within a single institution, and hence transferring them to universal application, where they can undergo continuous learning and evolution from newly available datasets, remains a challenge.²⁷

Directions for the future

With advances in the field of Artificial Intelligence (AI) and with appropriate research data availability, computer-based algorithms can perform intricate and extremely complex analyses to detect previously unknown patterns in the data. Machine learning (ML) is one such advancement of AI, based on artificial neural networks, which involves the construction and application of statistical algorithms that make observations from existing data and continuously learn to create a predictive model based on that data. Various ML-based models have been developed to assist surgeons in decision making^28–30 and in predicting treatment outcomes and estimating their probability of failure^31–34 on an individual basis. The validity of the generated conclusions increases with the availability of high-quality baseline research data.³⁵ Hence, the deposition of research data must be considered an essential step in every research publication to extend the scope of the research beyond its limits.
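
The idea of an algorithm that learns a predictive model from existing data can be shown with a deliberately small sketch. It fits a logistic regression, which can be viewed as a single artificial neuron and stands in here for the much larger models referred to in the text, by gradient descent. The feature values and outcomes are synthetic and carry no clinical meaning; the learning rate and the number of epochs are arbitrary choices for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_loss(weight, bias, xs, ys):
    """Average negative log-likelihood of the labels under the model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(weight * x + bias)
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def fit(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit weight and bias by batch gradient descent on the log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        residuals = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        weight -= learning_rate * sum(r * x for r, x in zip(residuals, xs)) / n
        bias -= learning_rate * sum(residuals) / n
    return weight, bias


def predict(weight, bias, x):
    """Predict the binary outcome for a single feature value."""
    return 1 if sigmoid(weight * x + bias) >= 0.5 else 0


# Synthetic toy data: one standardised feature and a binary outcome.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

loss_before = mean_log_loss(0.0, 0.0, xs, ys)
weight, bias = fit(xs, ys)
loss_after = mean_log_loss(weight, bias, xs, ys)
predictions = [predict(weight, bias, x) for x in xs]
print(loss_after < loss_before, predictions == ys)
```

Each pass over the data reduces the error of the model slightly, which is the sense in which such algorithms learn from existing data; the quality of what is learned depends directly on the quality of the data supplied.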

Conclusion

With the continuous evolution of the computational capacity of AI, high-quality scholarly data remains the essential prerequisite for increasing the validity of its predictive outcomes. Although the technology is still in its infancy, which prevents its full-fledged integration and implementation into the health care system, building the necessary baseline dataset through research data deposition would help to harness its potential for patient care in the near future. Hence, we recommend that research data deposition be made a necessary prerequisite for the publication of the results derived from it.

Acknowledgments

The authors thank Miss Helen Nmesomachi Wokem for painstakingly carrying out the word processing of the manuscript and the other formatting necessary for its completion.