Review data

It is important that research data made available via DORIS can be understood and reused by others. Research data disseminated through DORIS must therefore undergo careful review.

Both the metadata and the data described in SND’s documentation system DORIS need to be reviewed before they can be published. Data descriptions in DORIS are handled by both researchers and reviewers. At present, SND also performs a final check of each incoming data description and dataset before publication, but this step will gradually be phased out.

Checking files and delivery

  1. Virus scan: Delivered files must not contain viruses. 
  2. Readability check: It must be possible to open and read the delivered files.
  3. Completeness check: The delivery must include all the data intended for dissemination, along with the documentation necessary to reuse the data. 
  4. Format check: The delivered files should be in a suitable format for reuse and dissemination. If a file format is not considered suitable, it may be supplemented with file versions converted into a format better suited for sharing and/or long-term preservation.  

The material to be shared and/or described must be research data. This refers to digital material that can form the basis for scientific analysis, regardless of research field. Data not originally collected for research purposes can still be shared via DORIS if they may be used for scientific analysis – for example, long-term environmental monitoring data or various types of public authority data.

If the material does not qualify as research data, the researcher should be contacted and referred to a more appropriate repository. Figures and tables containing supplementary information for research articles are not considered research data in the sense of complete datasets, as supplementary material typically constitutes only a part of the full data. A dataset may sometimes consist of only a few columns; even then, it is better to make the data visible in a data description than to share nothing at all.

Download the data 

How you access the files depends on your organization’s storage solution in relation to DORIS.  

If your higher education institution does not have its own storage connected to DORIS, the files are temporarily stored in SND’s storage, SND CARE, and you can download them directly via DORIS. If the data are stored locally, your own institutional access routines apply.

Currently, the limit for files uploaded through the data description form is 2 GB per file. Larger files are uploaded to SND CARE via an SFTP account. If a researcher needs help setting up an SFTP account, you (or the researcher) can contact SND at incoming@snd.se for assistance.

For larger data volumes, an automated check may be needed to confirm that the delivery is complete. Tools that compare file checksums are used for this purpose.
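A checksum-based completeness check could look roughly like the sketch below. It assumes that the researcher has supplied a list of expected SHA-256 checksums; the manifest format (a mapping from relative file path to checksum) and the function names `sha256_of` and `check_delivery` are hypothetical and are not part of DORIS or SND CARE.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_delivery(root, manifest):
    """Compare the files under `root` with a manifest {relative_path: expected_sha256}.

    The manifest format is an assumption made for this sketch.
    Returns (missing, mismatched, unexpected) as sorted lists of relative paths.
    """
    root = Path(root)
    found = {
        p.relative_to(root).as_posix(): p
        for p in root.rglob("*")
        if p.is_file()
    }
    # Listed in the manifest but not delivered.
    missing = sorted(name for name in manifest if name not in found)
    # Delivered, but the content differs from what the manifest expects.
    mismatched = sorted(
        name
        for name, expected in manifest.items()
        if name in found and sha256_of(found[name]) != expected.lower()
    )
    # Delivered but not listed in the manifest.
    unexpected = sorted(name for name in found if name not in manifest)
    return missing, mismatched, unexpected
```

Three empty lists suggest that the delivery matches the manifest. Any entry under missing or mismatched is a reason to contact the researcher before continuing the review.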

Note: If the researcher has already shared the data in another repository, portal, or website where the dataset has been assigned a DOI, they do not need to upload the data files again in DORIS. Instead, the researcher should create a data description that links to the published dataset via the DOI and provide only the metadata needed to make the dataset searchable on Researchdata.se. Your role as a reviewer is then primarily to review the metadata. However, if you have access to the data, it is useful to review the files as well, particularly to assess whether the documentation needs to be supplemented with additional information in DORIS.

Scan for viruses  

Use antivirus software to automatically scan the submitted files. If you suspect that any files contain a virus, they must be deleted or replaced. This should be done in consultation with the researcher who submitted them.

Note: Many combinations of data and/or code can trigger false positives in antivirus programs. You are encouraged to contact your local IT department for support with virus scanning tools.

Files stored in SND CARE have already been scanned for viruses.

Check readability and file format 

Review the data to ensure that they are understandable and reusable. Every dataset is unique, and the review process will therefore differ from case to case. 

Are all necessary files included?

It can sometimes be difficult to determine, based on the existing metadata in the form, whether all necessary files have been submitted. A dataset may include several files, and if there is not enough information about its contents, the researcher should be contacted for clarification about the various files and associated documentation.

Researchers often submit only a portion of the data produced during a research project in a data description. As long as the material is reusable, the publication process can continue. However, researchers can be encouraged to share more than what is required by, for example, a journal, to support secondary users and enable new research questions.

Has the researcher uploaded all data related to the project/study, or are they planning to upload additional datasets later?

Are the files in a suitable format?

If the study includes many files, you may choose to review a sample of them, but it is recommended that you open at least a few files of each file type. Also check the file formats. The Swedish National Archives, in its FormatE project, recommends certain criteria for archival file formats, and formats that meet these criteria can also be considered suitable for long-term preservation. This information is currently available only in Swedish.
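Picking a few files of each file type can be done by hand, but for large deliveries a small script may help. The following is a minimal sketch that groups files by extension and draws a repeatable sample from each group; the function name `sample_per_type` and the default of three files per type are assumptions chosen for illustration, not an SND recommendation.

```python
import random
from collections import defaultdict
from pathlib import Path


def sample_per_type(root, per_type=3, seed=0):
    """Pick up to `per_type` files for each file extension under `root`.

    Returns {extension: [relative paths]}; files without an extension are
    grouped under the empty string. A fixed seed keeps the sample repeatable.
    """
    root = Path(root)
    by_type = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_type[path.suffix.lower()].append(path.relative_to(root).as_posix())

    rng = random.Random(seed)
    return {
        ext: sorted(rng.sample(names, min(per_type, len(names))))
        for ext, names in sorted(by_type.items())
    }
```

The keys of the result also give a quick overview of which file formats the delivery contains, which can be compared against the formats considered suitable for sharing and preservation.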

Ensure that the files are free from irrelevant content (e.g., formatting or example variables that are unused or unrelated to the research results). Examples of such formatting include colour highlights, formulas, macros, or embedded items such as images or text boxes in Excel files. If such formatting must remain – for example, colour-coding in text data – it is important that the purpose of the formatting is clearly explained.
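Some of this content can be detected without opening the file in Excel. The sketch below is a partial check only: it relies on the fact that .xlsx and .xlsm files are ZIP packages in which VBA macros are stored as xl/vbaProject.bin and embedded images under xl/media/. It does not detect colour highlights, formulas, or text boxes, and the function name `inspect_workbook` is hypothetical.

```python
import zipfile


def inspect_workbook(path):
    """Report macros and embedded media in an .xlsx or .xlsm file.

    Relies on the Office Open XML package layout: VBA macros are stored in
    xl/vbaProject.bin and embedded images under xl/media/.
    """
    with zipfile.ZipFile(path) as package:
        names = package.namelist()
    return {
        "has_macros": "xl/vbaProject.bin" in names,
        "embedded_media": sorted(n for n in names if n.startswith("xl/media/")),
    }
```

Older binary .xls files are not ZIP packages, so `zipfile.ZipFile` raises `zipfile.BadZipFile` for them; such files still need to be opened and checked manually.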

How are file and folder names and any databases structured?

It is not enough that the files can be opened: they must also be understandable, and it should be easy to navigate any folder structures or databases. Files and folders should be named in a consistent and meaningful way. This is especially important for large datasets with many files. File names, content, and the relationships between files may need to be explained in a supplementary documentation file.

In most cases, the file names are those used by the researcher. What you should generally check is whether the names are reasonably self-explanatory and consistently applied. Keep in mind that file names are visible in the catalogue entry. 
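A simple script can point out file names that deserve a closer look, although it cannot judge whether a name is self-explanatory. The sketch below is only an illustration: the naming convention it tests (letters, digits, dot, underscore, and hyphen) is an assumption and not an SND requirement, and the function name `review_file_names` is hypothetical. It only reports names and never renames anything.

```python
import re
from collections import defaultdict

# Assumed convention for illustration only: letters, digits, dot, underscore, hyphen.
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def review_file_names(names):
    """Flag file names worth a closer look; nothing is renamed.

    Returns (unusual, case_clashes):
      unusual      - names containing characters outside the assumed convention
      case_clashes - groups of names that differ only in upper/lower case
    """
    unusual = sorted(n for n in names if not SAFE_NAME.match(n))

    by_folded = defaultdict(list)
    for n in names:
        by_folded[n.casefold()].append(n)
    case_clashes = sorted(
        sorted(group) for group in by_folded.values() if len(group) > 1
    )
    return unusual, case_clashes
```

A flagged name is not necessarily wrong. For example, names containing spaces or Swedish characters are reported as unusual under this assumed convention, and whether they should be changed is a question to raise with the researcher.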

Be cautious about changing file names without first checking with the researcher. There may be references to the file names in accompanying documentation, structured metadata, or even within the data files themselves. Note that changing file names can break code. All changes should therefore be approved by the researcher and, if necessary, documented in the Notes field in DORIS.

Important! Variable names and similar content within the data files should not be changed unless the researcher has been involved in the process.

Are the metadata sufficient for publication and reuse?

The minimum metadata requirements for publication via DORIS are automatically met if all mandatory fields in the data description are completed. However, it is up to you as a reviewer to assess whether additional metadata are needed. You may be able to supplement the metadata with information from the accompanying documentation, but remember that what seems like a large task for you may not be difficult or time-consuming for the researcher, or vice versa.

Is the documentation sufficient?

What qualifies as sufficient documentation varies depending on the material and may differ between disciplines and data types. Documentation is often essential for secondary users to understand the research material.