Use file formats that are open, standard, and well documented.
Stable file formats are unlikely to become obsolete or orphaned, or to depend on abandonware (software or hardware that its creator no longer maintains).
The table below lists examples of stable, preferred file formats for common data types.
| Data type | Preferred file format examples |
| --- | --- |
| Containers | TAR, GZIP, ZIP |
| Databases | XML, CSV |
| Geospatial | SHP, DBF, GeoTIFF, NetCDF |
| Moving images | MOV, MPEG, AVI, MXF |
| Sounds | WAVE, AIFF, MP3, MXF |
| Statistics | ASCII, DTA, POR, SAS, SAV |
| Still images | TIFF, JPEG2000, PDF, PNG, GIF, BMP |
| Tabular data | CSV |
| Text | XML, PDF/A, HTML, ASCII, UTF-8 |
| Web archive | WARC |
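As one illustration of saving tabular data in a preferred format, the following is a minimal sketch that writes records to CSV and reads them back using Python's standard `csv` module. The sample records and the helper names `to_csv_text` and `from_csv_text` are hypothetical, chosen only for this example; they stand in for whatever data your own software exports.

```python
import csv
import io

# Hypothetical sample records standing in for data exported from other software.
rows = [
    {"site": "A", "temp_c": 12.5, "note": "clear, calm"},
    {"site": "B", "temp_c": 9.0, "note": 'wind "gusty"'},
]


def to_csv_text(records):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_csv_text(text):
    """Parse CSV text back into a list of dicts (every value is a string)."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


csv_text = to_csv_text(rows)
restored = from_csv_text(csv_text)
```

To write to disk instead of a string, the same writer can be given a file opened with `open(path, "w", newline="", encoding="utf-8")`. Note that CSV stores everything as text, so numeric columns come back as strings and need to be converted when the file is read.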
Using stable file formats makes your data more replicable, easier to combine with other datasets, and much more likely to remain accessible in the future. Stable file formats have a long history of access and use. Some file formats even predate personal computers: data in the form of comma-separated values was supported as early as 1972 (as “list-directed input/output”).
Stable file formats are those that are:
- Non-proprietary. Non-proprietary file formats are usable on many different operating systems, and on different versions of those operating systems, and are not tied to a specific software product or manufacturer. When working with proprietary software, you may need to export your data into a stable file format.
- Uncompressed. Lossy compression algorithms make files smaller by discarding ‘nonessential’ information from your data. The resulting lower-quality images or sounds can affect how your data is analyzed and the results of your work. If you work with a raw format but save and share only the compressed version, your work may no longer be reproducible.
- Unencrypted. Encryption keys can be lost, and encryption algorithms can change or fall out of support, leaving your data unreadable.
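The point about compression can be illustrated with a small sketch. It assumes a hypothetical list of measurements and uses rounding as a simple stand-in for a lossy codec, compared against a lossless round trip through Python's standard `gzip` module. The names `raw`, `encode`, and `decode` are illustrative only.

```python
import gzip

# Hypothetical measurements standing in for raw data.
raw = [20.123456, 20.123789, 19.998001]


def encode(values):
    """Serialize values as UTF-8 text, one value per line."""
    return "\n".join(repr(v) for v in values).encode("utf-8")


def decode(blob):
    """Parse the UTF-8 text produced by encode() back into floats."""
    return [float(line) for line in blob.decode("utf-8").splitlines()]


# Lossless round trip: gzip restores the original bytes exactly.
lossless = decode(gzip.decompress(gzip.compress(encode(raw))))

# Lossy stand-in: rounding discards detail that cannot be recovered later.
lossy = [round(v, 2) for v in raw]

lossless_matches = lossless == raw
lossy_matches = lossy == raw
```

In this sketch `lossless_matches` is true and `lossy_matches` is false: once detail has been discarded, an analysis run on the reduced values cannot be checked against the original measurements unless the raw data was also kept.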