Mini Review Volume 10 Issue 2
1Department of Orthopaedics, Government Medical College & Hospital, Dindigul, Tamil Nadu, India
2Research Associate, Orthopaedic Research Group, Coimbatore, Tamil Nadu, India
3Department of Orthopaedics, School of Medical Science & Research, Sharda University, New Delhi, India
Correspondence: Sathish Muthu, Department of Orthopaedics, Government Medical College & Hospital, Dindigul, Tamil Nadu, India, Tel +91 9600856806
Received: December 08, 2020 | Published: April 8, 2021
Citation: Muthu S, Jeyaraman M, Chellamuthu G. Research data: basis of future research analytics. MOJ Public Health. 2021;10(2):35-38. DOI: 10.15406/mojph.2021.10.00356
Living in an era of data explosion, there is an urgent need to make high-quality scholarly research data available to technologies like Artificial Intelligence (AI) so that more can be made of it than publications alone. AI, portrayed as the sentience of machines, has the potential, with greater computational power and advanced analytical methods, to deliver personalised health care to individuals with greater probability. Although various data sources such as electronic medical records, insurance company records, financial records, etc. can be used for such purposes, research data remains the key building block from which deep learning algorithms can generate meaningful results. Hence, we discuss the significance of, need for, and various methods of making high-quality scholarly research data available for the future, so that intricate and potentially unknown patterns hidden in the data can be identified by harnessing the potential of AI. We also recommend that research data deposition be made a necessary prerequisite for the publication of the results derived from the data.
Any research output that has been collected, observed and analysed to arrive at a result constitutes the research data.1 Research data goes beyond the entries made in a spreadsheet. It includes the raw inputs, processed data, algorithms, protocols, methods, materials, photographs, etc. It is an essential component of research, needed for the reproduction of a given scientific output. Living in an era of data explosion, there is an urgent need to make high-quality scholarly research data available to technologies like Artificial Intelligence (AI) so that more can be made of it than publications alone. AI, portrayed as the sentience of machines, has the potential, with greater computational power and advanced analytical methods, to deliver personalised health care to individuals with greater probability.2 Although various data sources such as electronic medical records, insurance company records, financial records, etc. can be used for such purposes, research data remains the key building block from which deep learning algorithms can generate meaningful results.3,4 Hence, we discuss the significance of, need for, and various methods of making high-quality scholarly research data available for the future, so that intricate and potentially unknown patterns hidden in the data can be identified by harnessing the potential of AI.
Research data management
The lifecycle of research data is neither static nor isolated. It does not end with the creation, processing, analysis, representation, and publication of the data; it also includes preservation and availability for verification and reuse in the future, as shown in Figure 1.5 Data management is an efficient way of handling data along its lifecycle to ensure that the data is collected in an understandable way, so that other researchers can use it to test its validity or to re-analyse it from a different perspective. The most critical part of data management is to preserve the data and make it available to others by providing access to data deposited in discipline-specific repositories. Publications are no longer considered the sole output of research; data in itself is now considered an important output. This blurs the line between publications and data and has led to an increasing number of data journals, such as Scientific Data from Nature6 and GigaScience from Oxford Academic,7 which serve as data banks for future research analysis.
Why deposit research data?
Orthopaedic surgery has evolved with various techniques and technologies developed in recent decades, but high-quality evidence to support their usage in everyday practice is lacking due to various ethical and cost concerns. This provides the necessary ground for solutions derived from the deep learning approaches of AI. Although multiple clinical data registries maintain high-quality health care data, the data behind currently published research remains a critical element for analysis through AI. Hence, research data sharing remains the way forward for scientific progress. Research data sharing allows for validation, replication, re-analysis, re-interpretation, new analysis or inclusion in a meta-analysis. It increases the reproducibility and credibility of the research.8 It increases the value of the investment made in funding scientific research. It also reduces the burden on authors of managing data access requests. Linking the research data to the associated publication increases the visibility of the work and ensures greater recognition.
Principles of human research data deposition
Appropriate ethics committee approval, along with patient consent in accordance with all applicable local laws, must be sought before sharing patient-related data in the public domain.9 Data sharing should never compromise participant privacy. Data that could identify a participant, such as name, physical address, birth date, contact information, etc., should not be included in a research data deposition. Even data that does not directly identify a participant may be inappropriate to share when used in combination, for example data from a small group within a vulnerable population or a private group. Taking the steps necessary to de-identify the research data before deposition is always recommended. Various guidelines on research data deposition have been put forth on these grounds by national and international agencies.10–16
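The two privacy concerns above, direct identifiers and small identifiable groups, can be illustrated with a minimal Python sketch. The field names (`name`, `age_band`, `district`, and so on), the sample records and the group-size threshold are all hypothetical and are not drawn from any guideline cited here; a real deposition would follow the rules of the approving ethics committee and the target repository.

```python
from collections import Counter

# Hypothetical field names, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "address", "birth_date", "phone"}
QUASI_IDENTIFIERS = ("age_band", "sex", "district")


def strip_direct_identifiers(records):
    """Return copies of the records without directly identifying fields."""
    return [
        {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
        for record in records
    ]


def small_groups(records, min_group_size=5):
    """Return quasi-identifier combinations shared by fewer than min_group_size records.

    Such combinations may allow re-identification even after names are removed.
    The threshold is an assumption and should be set by local policy.
    """
    counts = Counter(
        tuple(record.get(field) for field in QUASI_IDENTIFIERS) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < min_group_size}


# Fabricated sample records, used only to exercise the functions.
records = [
    {"name": "A", "address": "x", "birth_date": "1965-02-01", "phone": "0",
     "age_band": "50-59", "sex": "F", "district": "North", "outcome": "fused"},
    {"name": "B", "address": "y", "birth_date": "1967-07-12", "phone": "1",
     "age_band": "50-59", "sex": "F", "district": "North", "outcome": "fused"},
    {"name": "C", "address": "z", "birth_date": "1958-11-30", "phone": "2",
     "age_band": "60-69", "sex": "M", "district": "South", "outcome": "non-union"},
]

cleaned = strip_direct_identifiers(records)
flagged = small_groups(cleaned, min_group_size=2)
```

Here `flagged` lists the combinations that would need to be coarsened, suppressed or shared only under restricted access before deposition.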
Data repository
All the research data and the related metadata for the reported findings are best managed by deposition in a data repository. The data can be deposited in a specialty-specific repository that accepts specific structured data types or in a cross-disciplinary repository that accepts various data types. However, generalisation from cross-disciplinary repositories remains challenging, which makes specialty-specific repositories the ideal mode of data deposition.17
Supporting files
Although repositories are the preferred method of research data availability, authors can also provide the research data as a supporting file linked to the research publication. Authors should use formats that are standard to their discipline to allow wide dissemination.
For the management of research data, data repositories remain the most preferred method of data deposition. FAIR data principles provide the necessary guidelines in the selection of an ideal data repository which is a critical step to achieve the goals of data deposition.18
To make sure others can find our data, we must ensure that it is hosted by a stable, recognised repository that assigns a globally unique persistent identifier, such as a DOI, to the research data so that it is findable for future human and machine use. To ensure the findability of our research data, all the necessary fields that contribute to the metadata record must be filled.
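As a rough illustration of the findability requirement, the sketch below checks that a metadata record has every field filled and that its identifier has the general shape of a DOI. The list of required fields is an assumption made for the example (each repository defines its own schema), the pattern tests syntax only and not whether the DOI resolves, and the identifier shown is this article's DOI from the citation line, used purely as a format example.

```python
import re

# Assumed minimal field list; real repositories publish their own metadata schema.
REQUIRED_FIELDS = (
    "title", "creators", "description", "publication_year", "license", "identifier",
)

# General DOI shape: "10.", a 4 to 9 digit registrant prefix, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def metadata_problems(record):
    """Return a list of human-readable problems found in a metadata record."""
    problems = [
        f"missing field: {field}" for field in REQUIRED_FIELDS if not record.get(field)
    ]
    identifier = record.get("identifier")
    if identifier and not DOI_PATTERN.match(identifier):
        problems.append(f"identifier does not look like a DOI: {identifier}")
    return problems


complete = {
    "title": "Example spine surgery outcomes dataset",  # hypothetical dataset
    "creators": ["Example Author"],
    "description": "De-identified follow-up scores.",
    "publication_year": 2021,
    "license": "CC-BY-NC",
    "identifier": "10.15406/mojph.2021.10.00356",
}
incomplete = dict(complete, license="", identifier="doi:abc")

complete_problems = metadata_problems(complete)
incomplete_problems = metadata_problems(incomplete)
```

A check of this kind could be run before submission so that empty fields are caught by the depositor rather than by later users of the data.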
Granting access to medical research data has its ethical concerns and hence open sharing may not be possible all the time. However, specific research data supporting the publication can be made available with an appropriate level of security.
For an integrative analysis by humans and machines, data deposition must be made in an open file format using a standard vocabulary. Specific file formats and vocabularies are dictated by discipline-specific repositories to maintain the interoperability of the research data.
Research data that is made findable, accessible and interoperable is fit for reuse. Sometimes additional documentation may be required alongside it to make the data understandable, and thus reusable, to anyone who is not familiar with the data provided. A sample list of discipline-specific and inter-disciplinary repositories available for data deposition for research involving orthopaedic spine surgery is shown in Table 1. Registries such as FAIRsharing19 and re3data20 give information on the data repositories available for a discipline of choice, along with the list of journals supporting their use.
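One way to picture an open file format paired with a documented vocabulary is a plain CSV file accompanied by a JSON data dictionary. The sketch below is only an assumption about how this might look: the variable names, descriptions and units are hypothetical, and a discipline-specific repository would prescribe its own formats and controlled terms.

```python
import csv
import io
import json

# Hypothetical variables; the vocabulary of a real repository would replace these.
DATA_DICTIONARY = {
    "participant_id": {"description": "Study-assigned pseudonymous identifier",
                       "type": "string"},
    "vas_back_pain": {"description": "Visual analogue scale score for back pain",
                      "type": "integer", "unit": "points (0-10)"},
    "followup_months": {"description": "Time since surgery at assessment",
                        "type": "integer", "unit": "months"},
}


def to_csv(rows):
    """Serialise rows as CSV text with columns in data-dictionary order.

    csv.DictWriter raises ValueError if a row contains a column that the
    data dictionary does not document.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(DATA_DICTIONARY), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Fabricated sample rows, used only to exercise the function.
rows = [
    {"participant_id": "P001", "vas_back_pain": 7, "followup_months": 0},
    {"participant_id": "P001", "vas_back_pain": 2, "followup_months": 12},
]

csv_text = to_csv(rows)
dictionary_json = json.dumps(DATA_DICTIONARY, indent=2)
```

Depositing `csv_text` together with `dictionary_json` keeps the data readable without proprietary software and tells a later user, human or machine, what each column means.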
| Data Repository | How article and data are linked | Dataset Size Limits | Repository URL |
| --- | --- | --- | --- |
| Discipline Specific Repository - Spine: | | | |
| ClinicalTrials.gov (NCT) | Authors should specify NCT accession | 1 GB per dataset | http://clinicaltrials.gov/ |
| Neuroimaging Informatics Tools and Resources Collaboratory (NITRC) | Authors should specify NITRC accession numbers. | Image Repository | http://www.nitrc.org/ |
| Neuroscience Information Framework (NIF) | Authors should mention Research Resource IDentifier (RRID) | Image/Dataset Repository | http://www.neuinfo.org/ |
| OpenNeuro | Authors should specify OpenNeuro accession numbers | Image Repository | http://www.openneuro.org |
| Inter-disciplinary Repository: | | | |
| Mendeley Data | Mendeley Data banners will be shown on ScienceDirect when the repository has data | 10 GB per dataset | https://data.mendeley.com/ |
| Harvard Dataverse | Some journals have a dedicated Dataverse repository set up for authors to upload their data that belongs with the article. Authors should include the dataset DOI in the article. | 10 GB per dataset | https://dataverse.harvard.edu/ |
| Figshare | Authors should include the dataset DOI in the article. | 1 TB per dataset | http://figshare.com/ |
| Dryad Digital Repository | Authors should include the dataset DOI in the article. | 300 GB per dataset | https://datadryad.org/stash |
| Open Science Framework | Authors should include the dataset DOI in the article. | 5 GB per dataset | https://osf.io/ |
| Zenodo | Authors should include the dataset DOI in the article. | 50 GB per dataset | https://zenodo.org/ |
Table 1 Sample list of discipline-specific and inter-disciplinary repositories available for data deposition for research involving spine
URL, uniform resource locator; GB, gigabyte; TB, terabyte
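The per-dataset size limits in Table 1 lend themselves to a simple screening step when choosing a repository. The Python sketch below copies the limits as listed in the table and returns the repositories that could hold a dataset of a given size. Treating 1 TB as 1000 GB is an assumption, repositories listed without a numeric limit are left out, and current limits should be confirmed with each repository before deposition.

```python
# Per-dataset limits in GB as listed in Table 1 (1 TB assumed to be 1000 GB).
SIZE_LIMITS_GB = {
    "ClinicalTrials.gov": 1,
    "Mendeley Data": 10,
    "Harvard Dataverse": 10,
    "Figshare": 1000,
    "Dryad Digital Repository": 300,
    "Open Science Framework": 5,
    "Zenodo": 50,
}


def repositories_that_fit(dataset_size_gb):
    """Return, in alphabetical order, the repositories whose limit covers the dataset."""
    return sorted(
        name for name, limit in SIZE_LIMITS_GB.items() if dataset_size_gb <= limit
    )


# Example: a hypothetical 20 GB imaging dataset.
candidates = repositories_that_fit(20)
```

Size is only one criterion; the choice between a discipline-specific and an inter-disciplinary repository still depends on the considerations discussed above.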
If the data cannot be deposited publicly for ethical or security reasons, restricted access can be given to researchers and reviewers under specific conditions. Research data deposition has its own data protection issues, which need to be given adequate attention. Sharing data over the internet may be a concern when a dataset is too large to be feasibly hosted by a repository; such situations need to be resolved on a case-by-case basis.21 In the case of data obtained from a third party, further restrictions apply to the availability of the research data.
Even if data deposition is made mandatory for research publications in orthopaedics, certain challenges remain in making the deposited data useful for AI-based analysis. First, uniform, appropriately labelled dataset templates have to be established for universal use in orthopaedic surgery research.22 Second, research in orthopaedics frequently involves image-based analysis, which needs manual labelling of the data for classification before machine learning (ML) can occur.23–25 Although unsupervised algorithms have been developed to allow ML models to analyse and classify such image-based data, training datasets of poor quality or insufficient quantity can lead to erroneous decisions, reducing the validity of the models' output.26 Finally, many AI-based algorithms are trained and validated for use within a single institution; transferring them to universal application, so that they continue to learn and evolve from newly available datasets, remains a challenge.27
With advances in the field of Artificial Intelligence (AI) and appropriate research data availability, computer-based algorithms can perform intricate and extremely complex analyses to detect previously unknown patterns in the data. Machine learning (ML) is one such advancement of AI; it involves the construction and application of statistical algorithms, artificial neural networks among them, that make observations from existing data and continuously learn to create a predictive model based on the data. Various ML-based models have been developed to assist surgeons in decision making28–30 and in predicting the outcomes of the treatment offered and estimating its probability of failure31–34 on an individual basis. The strength and reliability of the generated conclusions increase with the availability of high-quality baseline research data.35 Hence, the deposition of research data must be considered an essential step in every research publication to extend the scope of the research beyond its limits.
With the continuous evolution in the computational capacity of AI, high-quality scholarly data remains the essential prerequisite for increasing the validity of its predictive outcomes. While this technology is still in its infancy, which prevents its full-fledged integration and implementation into the health care system, building the necessary baseline dataset through research data deposition would help to harness its potential for patient care in the near future. Hence, we recommend that research data deposition be made a necessary prerequisite for the publication of the results derived from the data.
The authors esteem and appreciate Miss Helen Nmesomachi Wokem for having painstakingly done the word processing of the manuscript and the other formatting necessary for its completion.
The authors declare that there are no conflicts of interest.
None.
©2021 Muthu, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.