Recommended file formats

All digital file formats risk becoming obsolete over time. If this happens, future software may no longer be able to read or display the file contents correctly. To reduce the risk of files becoming unreadable, researchers should choose file formats that are likely to remain usable in the long term. Suitable formats for sharing and long-term preservation should be:

  • common and widely used
  • well-documented (that is, a technical specification is available that explains how the format stores information)
  • open or non-proprietary.

In some cases, data may be stored in a format that is not the most open option but that is a recognized standard within a specific research domain – in other words, a format commonly used and supported by most researchers in the field. In such cases, it may be advisable to publish the data both in the format that is widely used within the discipline and, where possible, in a format more suitable for long-term preservation.

The same considerations apply to the long-term preservation of data as to data publication: choose formats that are likely to be supported in the future. Preferably, the format should be one that can be opened with a range of different software applications.

The Swedish National Archives has published a set of criteria for archival formats, which can also be used as guidance for selecting formats suitable for long-term preservation. (The information is currently in Swedish only.)

Recommended file formats in DORIS

SND has evaluated and compiled a list of formats suitable for research data intended for publication in a research data catalogue. The recommended formats are divided into two categories: formats for sharing and formats for long-term archival preservation.

The formats recommended for sharing are widely used and compatible with open-source software. This list also includes some proprietary formats that meet these criteria.

The table also suggests formats suitable for long-term preservation in archives or similar environments. Preserving data in these formats allows for future analysis using different software, even if not all features of the original format are retained.

Note: Data can be shared in more than one format – for example, one format optimised for sharing and immediate reuse by other researchers, and another format optimised for long-term preservation. When converting files between formats, it is important to retain the original files and include them in the preserved dataset.
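As an illustration of the note above, the following is a minimal Python sketch of one possible conversion, assuming the original data is an SQLite file and the additional copy is delimited text. The function name `export_table_to_csv` and the `original` and `converted` folder names are hypothetical choices for this example, not a layout prescribed by SND.

```python
import csv
import shutil
import sqlite3
from pathlib import Path


def export_table_to_csv(db_path: Path, table: str, dataset_dir: Path) -> Path:
    """Export one SQLite table to CSV and keep a copy of the original file."""
    # Hypothetical folder layout: originals and converted files side by side.
    original_dir = dataset_dir / "original"
    converted_dir = dataset_dir / "converted"
    original_dir.mkdir(parents=True, exist_ok=True)
    converted_dir.mkdir(parents=True, exist_ok=True)

    # Retain the untouched original file in the dataset.
    shutil.copy2(db_path, original_dir / db_path.name)

    target = converted_dir / f"{table}.csv"
    conn = sqlite3.connect(db_path)
    try:
        # The table name is assumed to come from a trusted source.
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        header = [column[0] for column in cursor.description]
        with open(target, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)   # column names as the first row
            writer.writerows(cursor)  # one CSV row per table row
    finally:
        conn.close()
    return target
```

A call such as `export_table_to_csv(Path("survey.sqlite"), "responses", Path("dataset"))` would leave both the SQLite file and the CSV export in the dataset folder. Data types and constraints defined in the database are not carried over to the CSV file, which is one reason to keep the original.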

More information and file format guides on Researchdata.se

For more information, see the file format pages on Researchdata.se. There you will also find links to SND’s dedicated file format guides. Where available, the table below includes links to relevant pages on Researchdata.se.

If you are reviewing data provided in formats not listed in the table, please contact the SND office at snd@snd.se for advice on suitable approaches for sharing and long-term preservation.
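A first pass over a dataset can be sketched in Python as below. This is only a rough aid under stated assumptions: the two sets are a partial excerpt of the text and database rows of the table, the names `PRESERVATION`, `SHARING_ONLY`, and `classify` are hypothetical, and a file extension does not prove the actual format (for example, `.pdf` alone cannot distinguish PDF from PDF/A).

```python
from pathlib import Path

# Partial excerpt of extensions suggested for both sharing and preservation.
PRESERVATION = {
    ".txt", ".docx", ".odt", ".html", ".md", ".xml",
    ".xlsx", ".ods", ".csv", ".tsv", ".siard", ".sqlite", ".rdata", ".rda",
}

# Partial excerpt of extensions suggested for sharing only.
# ".pdf" is placed here because PDF/A cannot be recognised by extension.
SHARING_ONLY = {".pdf", ".sgml", ".rtf", ".sav", ".por", ".dta"}


def classify(filename: str) -> str:
    """Classify a file by its extension against the excerpted lists."""
    extension = Path(filename).suffix.lower()
    if extension in PRESERVATION:
        return "sharing and preservation"
    if extension in SHARING_ONLY:
        return "sharing only"
    return "not listed"


for name in ["survey.csv", "survey.SAV", "scan.xyz"]:
    print(name, "->", classify(name))
```

Files reported as "not listed" are the ones for which it may be worth contacting the SND office.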

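Suggested formats

For each type of data below, the formats suggested for sharing are listed first, followed by the formats suggested for long-term preservation.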

Text

For sharing

  • ASCII (.txt), Unicode (.txt)
  • MS Word (.docx)
  • OpenDocument Text (.odt)
  • PDF (.pdf), PDF/A (.pdf)
  • HTML (.html)
  • Markdown (.md)
  • XML (.xml)
  • SGML (.sgml)
  • Rich Text Format (.rtf)

For long-term preservation

  • ASCII (.txt), Unicode (.txt)
  • MS Word (.docx)
  • OpenDocument Text (.odt)
  • PDF/A (.pdf)
  • HTML (.html)
  • Markdown (.md)
  • XML (.xml)

Databases, spreadsheets, and statistical data

For sharing

  • Microsoft Excel, formally Office Open XML Workbook format (.xlsx)
  • OpenDocument Spreadsheet (.ods)
  • Delimited text format (usually called .csv or .tsv)
  • SQL syntax in a text file (.sql)
  • SIARD (.siard)
  • SQLite (usually called .db, .db3, .sqlite)
  • SPSS (.sav, .por)
  • STATA (.dta)
  • R (.rdata, .rda)

For long-term preservation

  • Microsoft Excel, formally Office Open XML Workbook format (.xlsx)
  • OpenDocument Spreadsheet (.ods)
  • Delimited text (usually called .csv or .tsv)
  • SIARD (.siard)
  • SQLite (usually called .sql, .db, .sqlite)
  • R (.rdata, .rda)

Images 

For sharing

Raster images

  • TIFF (.tif)
  • JPEG2000 (.jp2)
  • PNG (.png)
  • JPEG (.jpg)

Vector images

  • Scalable Vector Graphics (.svg)

For long-term preservation

Raster images

  • TIFF (.tif)
  • JPEG2000 (.jp2)
  • PNG (.png)
  • JPEG (.jpg)

Vector images

  • Scalable Vector Graphics (.svg)

Video

For sharing

  • Lossless AVI (.avi)
  • Matroska (.mkv)
  • MPEG-1 (.mpg, .mpeg, …)
  • MPEG-2 (.mpg, .mpeg, …)
  • MPEG-4 H.264 (.mp4)
  • MPEG-4 Part 14/MP4 (.mp4)
  • QuickTime File Format, QTFF (.mov)

For long-term preservation

  • Lossless AVI (.avi)
  • Matroska (.mkv)
  • MPEG-1 (.mpg, .mpeg, …)
  • MPEG-2 (.mpg, .mpeg, …)
  • MPEG-4 H.264 (.mp4)
  • MPEG-4 Part 14/MP4 (.mp4)

Audio

For sharing

  • Waveform Audio (.wav)
  • Broadcast Wave Format (.bwf)
  • Audio Interchange File Format (.aif, .aiff)
  • Free Lossless Audio Codec (.flac)
  • Matroska (.mka)
  • MPEG-1, MPEG-2 (.mpg, .mpeg, …)
  • MPEG-1 Audio Layer III (.mp3)
  • Advanced Audio Coding (.aac)
  • Ogg Vorbis (.ogg)

For long-term preservation

  • Waveform Audio (.wav)
  • Broadcast Wave Format (.bwf)
  • Audio Interchange File Format (.aif, .aiff)
  • Free Lossless Audio Codec (.flac)
  • Matroska (.mka)
  • MPEG-1, MPEG-2 (.mpg, .mpeg, …)

Spatial data

For sharing

  • OGC GeoPackage (.gpkg)
  • ESRI Shapefile (.shp)
  • GeoJSON (.geojson)
  • Keyhole Markup Language (.kml)
  • GeoTIFF (.tif, .tiff)
  • ESRI GRID (.adf, .asc, .grd)
  • Digital Elevation Model (DEM) Format (.dem)
  • Geography Markup Language (.gml)
  • NetCDF (.nc)
  • MapInfo (.tab, .dat)
  • MapInfo Interchange Format (.mif, .mid)
  • CSV (.csv)

For long-term preservation

  • OGC GeoPackage (.gpkg)
  • ESRI Shapefile (.shp)
  • GeoJSON (.geojson)
  • Keyhole Markup Language (.kml)
  • GeoTIFF (.tif, .tiff)
  • Digital Elevation Model (DEM) Format (.dem)
  • NetCDF (.nc)
  • CSV (.csv)

Photogrammetry and 3D data 

For sharing

  • Wavefront OBJ (.obj)
  • X3D (.x3d) – the ASCII version
  • AutoCAD DXF (.dxf)
  • COLLADA (.dae)
  • Stanford PLY (.ply)
  • Universal 3D Format (.u3d)
  • VRML (.vrml)
  • Filmbox File (.fbx)
  • CSV (for point clouds)
  • STL (STereoLithography format, for triangular facets)

For long-term preservation

  • Wavefront OBJ (.obj)
  • X3D (.x3d) – the ASCII version

Markup language

  • HTML (.html)
  • JSON (.json)
  • XML (.xml)

    RDF

    • W3C standards