How do I download data from the Data Portal?
- Navigate to the data catalog page or plot data from a data stream of interest
- Click the download icon
- Choose a time range, email address to which the data will be delivered, and format (NetCDF, JSON, or CSV)
- If you click “include provenance”, you will also receive .json files describing how the data were created (algorithm and calibration information, data source, and any functions applied to the data). These can be opened and read using a text editor or custom scripts.
- If you click “Include Annotations” you will receive a file containing any information about the data added by the data evaluation team, including events or errors that were noted during the deployment period.
- .csv files contain only the data and column labels, not the full header information found in NetCDF files.
- Once you click “Download”, you should see a confirmation on screen saying “Your request has been received!” and indicating the email address to which your data link will be sent.
- You will receive an email at your user account email address when your download is complete, usually within 30 seconds to a minute. Larger downloads (e.g., more than a year of data) will take longer.
- The email you receive will come from email@example.com; if you do not receive your email in a reasonable amount of time, check your spam folder and make sure that this address is in your “safe senders” list.
- The email will contain two links: one leads to a THREDDS server, which is optimized to provide several means of parsing, viewing, and aggregating NetCDF files. The other link leads to a public Apache HTTP server, which is the optimal way to download .txt, .json, or .csv files (i.e. any file that doesn’t end in “.nc” or “.ncml”).
- Your user folder will contain a data directory named using the time you sent your request and the instrument and data stream requested.
- The data directory contains multiple NetCDF files (.nc), because the OOI software worker threads split the data requests into smaller chunks in order to return the data faster. The directory will also contain .json provenance files (if you checked that option), text confirmation files that are created when the system is fulfilling the data request, and an .ncml file that allows aggregation of the individual NetCDF files. Any files labeled “failure.json” indicate that no valid data or calibration values were found within the time range of that particular chunk of data that the worker thread was assigned.
- The data directory may contain data from an instrument you did not select. This occurs only when data from that associated instrument were used to build an L2 data product, which requires data from two or more separate data streams (e.g., corrected current speed from a Bulk Meteorology instrument, which requires information from an associated current meter as well as the Met sensor).
- To download individual files, either go to the Apache server and click on a file, or go to the THREDDS server, click on a file, and choose “HTTPServer”, which should immediately start the download.
- To aggregate all data from multiple individual .nc files into a single larger file, go to the THREDDS server, click on the .ncml file, and choose “NetcdfSubset” (NCSS; last link under the “Access” heading). This loads a new page that displays the NetCDF Subset Service options. Choose a time range and the variables you would like to output, then pick NetCDF-3 or .csv from the “Output Format” pulldown menu in the lower right. Depending on the size of the download and the number of .nc files being aggregated, the subset service may take a long time to load (up to 5 minutes for datasets over 1 GB). If the service takes an unreasonable amount of time or times out, choose a shorter time span to aggregate, or use a custom script to download and aggregate the data.
- If you have any issues or encounter an unexpected error, contact the Help Desk.
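
The steps above mention custom scripts for reading provenance files and for working with the contents of a data directory. The following is a minimal sketch of that idea in Python, assuming the data directory has already been copied to local disk. The function names (`preferred_server`, `summarize_download`, `load_provenance`) are hypothetical and not part of any OOI tool. The sketch also assumes that failure files have names ending in “failure.json” and that every other .json file in the directory is a provenance file; adjust these rules if your download is named differently.

```python
import json
from pathlib import Path

# File endings that the steps above say are best handled through THREDDS.
THREDDS_SUFFIXES = (".nc", ".ncml")


def preferred_server(filename):
    """Return which of the two emailed links suits this file name."""
    if filename.lower().endswith(THREDDS_SUFFIXES):
        return "THREDDS"
    return "Apache"


def summarize_download(data_dir):
    """Group the files in one downloaded data directory by their role."""
    summary = {"netcdf": [], "ncml": [], "failures": [],
               "provenance": [], "other": []}
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file():
            continue
        name = path.name.lower()
        if name.endswith(".ncml"):
            summary["ncml"].append(path.name)
        elif name.endswith(".nc"):
            summary["netcdf"].append(path.name)
        elif name.endswith("failure.json"):
            # Assumption: failure files are identified by this name ending.
            summary["failures"].append(path.name)
        elif name.endswith(".json"):
            # Assumption: the remaining .json files are provenance files.
            summary["provenance"].append(path.name)
        else:
            summary["other"].append(path.name)
    return summary


def load_provenance(data_dir):
    """Parse each provenance .json file into a Python object, keyed by name."""
    parsed = {}
    for name in summarize_download(data_dir)["provenance"]:
        with open(Path(data_dir) / name, encoding="utf-8") as handle:
            parsed[name] = json.load(handle)
    return parsed
```

Calling `summarize_download` on a local copy of the data directory returns a dictionary of file names, which makes it easy to see how many NetCDF chunks were produced and whether any chunk reported a failure. The sketch makes no assumption about the internal layout of the provenance files; `load_provenance` only parses them so they can be inspected.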
— Last revised on August 15, 2016 —