What data formats are available?
When you request a data download from the Data Portal, you have three file formats to choose from.
NetCDF (.nc)
NetCDF files are perhaps the most useful and popular format used by ocean scientists. They can include a large number of variables in the same compressed file, along with a lot of associated metadata. Because of this, they are sometimes referred to as self-documenting data files.
If you are going to request more than a few days of data, especially from cabled streamed instruments, and you plan on using Matlab, R, or Python to analyze your data, this is probably the best format to choose.
For more information on how to use these data files, please check out our tutorials on how to use NetCDF data files in Matlab or Python (coming soon).
You can also use NASA’s free Panoply software tool to investigate the contents of a NetCDF file. Panoply allows you to quickly see a list of all variables in the file, as well as the metadata for each variable, and the global attributes (metadata) for the file. You can also use the tool to create quick plots of the included data.
CSV (.csv)
CSV (comma-separated values) files are another common format used by many oceanographers because they can be easily opened in programs like Excel and Matlab. The files themselves are just text files, which means you can also read them in a basic text editor or view them on the command line.
However, because they are simple text files, they also tend to be very large. You can compress them using utilities like tar or gzip for transferring between computers, but they generally need to be uncompressed before you can work with them.
For example, if you request one week of CTD data from the Oregon Offshore Cabled Shallow Profiler Mooring (which is a cabled instrument with high-resolution data), you will have approximately four 50 MB files in your response. Note that the Data Portal automatically splits CSV data files so they aren’t any larger than 50 MB, because many programs can’t handle larger files.
- You can easily examine most CSV files in Excel simply by opening them.
- Matlab can also read CSV files using load() or csvread(), but these functions require the data file to contain only numeric values. The data files produced by the Data Portal include text strings, so you will need to use the newer readtable() function or ezread() from the Matlab File Exchange.
- In Python, you can use csv.DictReader() or pandas.read_csv().
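As a rough illustration of the Python option, the sketch below uses csv.DictReader() to read several CSV files that share a header, such as the 50 MB pieces of a split download, and treats them as one sequence of rows. It also opens gzip-compressed files directly with the standard gzip module. The helper names open_text() and read_csv_parts(), the file names, and the column names are all hypothetical stand-ins; the actual headers in a Data Portal file will differ.

```python
import csv
import gzip
import os
import tempfile


def open_text(path):
    """Open a plain or gzip-compressed CSV file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="", encoding="utf-8")
    return open(path, "r", newline="", encoding="utf-8")


def read_csv_parts(paths):
    """Yield one dict per data row from several CSV files with the same header."""
    for path in paths:
        with open_text(path) as handle:
            for row in csv.DictReader(handle):
                yield row


# Build two small stand-in files so the sketch runs on its own.
# These column names are assumptions, not the Data Portal's real headers.
workdir = tempfile.mkdtemp()
part1 = os.path.join(workdir, "part1.csv")
part2 = os.path.join(workdir, "part2.csv.gz")
with open(part1, "w", newline="", encoding="utf-8") as f:
    f.write("time,temperature,instrument\n2017-06-01T00:00:00Z,10.5,CTD-A\n")
with gzip.open(part2, "wt", newline="", encoding="utf-8") as f:
    f.write("time,temperature,instrument\n2017-06-01T00:00:01Z,10.7,CTD-A\n")

rows = list(read_csv_parts([part1, part2]))

# DictReader returns every value as a string, so convert numeric columns yourself.
temperatures = [float(row["temperature"]) for row in rows]
print(len(rows), temperatures)
```

Because DictReader keeps text columns as plain strings, mixed text and numeric files of the kind described above can be read without special handling; only the columns you want to compute with need converting.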
JSON (.json)
JSON (JavaScript Object Notation) is a relatively new format that stores data as structured objects, which are easily read by JavaScript or Python code. Unlike CSV files, JSON files can nest data, so they can include metadata alongside each variable or measurement, much as NetCDF files do. However, JSON files are text-based like CSV files, so they can easily become very large. Thus you will likely only want to use them when requesting data for very short time frames or when you know there are only a few data points.
In Python, you can use json.load() or pandas.read_json() to read in a JSON file.
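A minimal sketch of the json.load() approach follows. It assumes the file holds a list of records, each with a time and a temperature field; that layout and those field names are hypothetical, so check the structure of your own file before relying on it.

```python
import json
import os
import tempfile

# Hypothetical stand-in data: a list of records with assumed field names.
sample = [
    {"time": "2017-06-01T00:00:00Z", "temperature": 10.5},
    {"time": "2017-06-01T00:00:01Z", "temperature": 10.7},
]

# Write the stand-in file so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f)

# json.load() parses the whole file into ordinary Python lists and dicts.
with open(path, "r", encoding="utf-8") as f:
    records = json.load(f)

temperatures = [record["temperature"] for record in records]
mean_temperature = sum(temperatures) / len(temperatures)
print(len(records), mean_temperature)
```

Note that json.load() reads the entire file into memory at once, which is one more reason to keep JSON requests to short time frames.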
— Last revised on June 30, 2017 —