
How do I use raw formatted data?

Please select an instrument type below to view more information about how to utilize that particular data format.


Broadband Hydrophone
Broadband Hydrophone (HYDBB) files are currently in .mseed format. Tools for opening these files can be found on the IRIS site (http://ds.iris.edu/ds). Eventually these files will be converted into FLAC and .wav formats, which can be opened and played in most audio player software.


Cabled Bioacoustic Sonar Data
Cabled bioacoustic sonar data (ZPLSC-B; modified Kongsberg EK-60 echosounders) are in vendor format .raw files. These files can be opened using EchoView or vendor software.


Cabled Deep Profiler (MMP)
You must have Python 2.7 installed in order to run this package.

Download mi-dataset from the oceanobservatories GitHub page: https://github.com/oceanobservatories/mi-dataset. There are two options for downloading the parsers: 1) download the archive directly from the site and unzip it into a folder, or 2) use git to clone the repository (i.e. enter "git clone https://github.com/oceanobservatories/mi-dataset.git" in a terminal window).

To install mi-dataset, change into the working directory where you want to save the repository. On both Unix and Windows systems this is done with the cd command (on Windows, dir only lists a directory's contents).

$ cd /Users/michaesm/Documents/
$ git clone https://github.com/oceanobservatories/mi-dataset.git
$ cd mi-dataset/
$ pip install -r requirements.txt
$ pip install msgpack-python 
$ pip install . 

This will install the mi-dataset package into your Python environment.

To use mi-dataset, there is a utility included in the package that takes a driver and a list of raw data files as inputs and parses the files into one of several formats: csv, json, pd-pickle (a pandas DataFrame in Python), or xr-pickle (an xarray Dataset in Python). The --fmt flag selects the file format to save the parsed data to. The --out flag selects the directory that you want to save the output files to.

$ python utils/parse_file.py --help
Usage: parse_file.py [OPTIONS] DRIVER [FILES]...
Options:
  --fmt [csv|json|pd-pickle|xr-pickle]
  --out PATH
  --help                          Show this message and exit.

A good way to decide which driver is needed to parse a specific raw data file is to check the ingestion CSV files in the ooi-integration GitHub repository. Once you find the raw data directory that you would like to parse in an ingestion CSV, note the 'uframe_route' value in the left-hand column. A lookup table at https://github.com/ooi-data-review/parse_spring_files/blob/master/uframe_routes.csv maps each 'uframe_route' to the proper driver that you will need to run in mi-dataset.
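Programmatically, that lookup is just a search through the table's rows. Here is a minimal sketch using Python's standard csv module; the in-memory table stands in for the real uframe_routes.csv, and the column names are assumptions about its layout:

```python
import csv
import io

# Stand-in for uframe_routes.csv; the real file maps each uframe_route to an
# mi-dataset driver (the column names here are assumptions, not the file's).
routes_csv = io.StringIO(
    "uframe_route,driver\n"
    "Ingest.ctdbp-cdef-dcl_recovered,"
    "mi.dataset.driver.ctdbp_cdef.dcl.ctdbp_cdef_dcl_recovered_driver\n"
)

def driver_for(route, csvfile):
    """Return the mi-dataset driver mapped to a given uframe_route, or None."""
    for row in csv.DictReader(csvfile):
        if row["uframe_route"] == route:
            return row["driver"]
    return None

print(driver_for("Ingest.ctdbp-cdef-dcl_recovered", routes_csv))
# -> mi.dataset.driver.ctdbp_cdef.dcl.ctdbp_cdef_dcl_recovered_driver
```

In practice you would open the downloaded uframe_routes.csv instead of the in-memory sample and adjust the column names to match the real file.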

Here is an example of how to parse the following Ingestion CSV: https://github.com/ooi-integration/ingestion-csvs/blob/master/CE09OSSM/CE09OSSM_R00001_ingest.csv#L17

On line 17 of the above CSV file, you can see the following row:

uframe_route, filename_mask, reference_designator, data_source
Ingest.ctdbp-cdef-dcl_recovered, /omc_data/whoi/OMC/CE09OSSM/R00001/cg_data/dcl27/ctdbp1/*.ctdbp1.log, CE09OSSM-RID27-03-CTDBPC000, recovered_host


The raw data for the reference designator CE09OSSM-RID27-03-CTDBPC000 from the recovered deployment #1 for the Washington Offshore Surface Mooring (CE09OSSM) is located at: /omc_data/whoi/OMC/CE09OSSM/R00001/cg_data/dcl27/ctdbp1/*.ctdbp1.log

This directory corresponds to the web directory at https://rawdata.oceanobservatories.org/files/CE09OSSM/R00001/. Download the raw data files that you want to look at from that link.

For this example, I downloaded the file at https://rawdata.oceanobservatories.org/files/CE09OSSM/R00001/cg_data/dcl27/ctdbp1/20150412.ctdbp1.log

Next, go to the lookup table at https://github.com/ooi-data-review/parse_spring_files/blob/master/uframe_routes.csv and search for 'Ingest.ctdbp-cdef-dcl_recovered'. You will find that the mi-dataset driver that maps to this uframe_route is 'mi.dataset.driver.ctdbp_cdef.dcl.ctdbp_cdef_dcl_recovered_driver'.

 
$ cd /Users/michaesm/Documents/mi-dataset
$ python utils/parse_file.py --fmt csv --out ./parsed mi.dataset.driver.ctdbp_cdef.dcl.ctdbp_cdef_dcl_recovered_driver /Users/michaesm/Downloads/20150412.ctdbp1.log

The parse_file.py script will parse the raw data file '20150412.ctdbp1.log' with the driver 'mi.dataset.driver.ctdbp_cdef.dcl.ctdbp_cdef_dcl_recovered_driver' and save the parsed data as a CSV in the /Users/michaesm/Documents/mi-dataset/parsed directory.
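After parsing, it can be handy to sanity-check the CSV output before analysis. Here is a minimal sketch using Python's standard csv module; the sample columns below are illustrative only, since the actual schema depends on the driver:

```python
import csv
import io

# A tiny stand-in for a CSV written by parse_file.py; the column names are
# illustrative, not the driver's actual output schema.
sample = io.StringIO(
    "time,temperature,conductivity,pressure\n"
    "2015-04-12T00:00:00,8.91,3.25,24.8\n"
    "2015-04-12T00:15:00,8.87,3.24,24.9\n"
)

def summarize(csvfile):
    """Return (record count, column names) for a parsed CSV."""
    reader = csv.DictReader(csvfile)
    rows = list(reader)
    return len(rows), reader.fieldnames

count, columns = summarize(sample)
print(count, columns)  # 2 ['time', 'temperature', 'conductivity', 'pressure']
```

For a real run, open the file from your --out directory (e.g. with open('parsed/your_file.csv') as f: summarize(f)) and check that the record count and columns look plausible.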


Cabled HD video data
Cabled HD video data (CAMHD) are in two formats, both of which can be opened and played in most video player software (VLC, QuickTime, Windows Media Player, etc.). Uncompressed full-HD video files are in .mov format; these are very large and may take a long time to download. Smaller files created using lossless compression are in .mp4 format; these are still large but are more easily downloaded and played.


Glider Data
Uncabled glider data are in vendor-formatted files: .tbd, .sbd, .dbd, and .pd0 (for the DVL ADCP files). They can be opened using the TWR Slocum glider software, or various other tools. Some of the tools can be found via the TWR forum (https://datahost.webbresearch.com/; requires registration and login), and some open-access code has been made available by Rutgers University (http://marine.rutgers.edu/~kerfoot/slocum/gliders.php). If you have more specific questions, please contact the Help Desk.


Streamed Cabled Data
For streamed cabled data (.dat files), the Digi computer that handles the streaming data adds a timestamp to every file, which slightly changes the format. To strip out that timestamp and return the file to vendor or original format, follow the steps below:

  • Step 1: Install bbe (http://bbe-.sourceforge.net). For example, if you have Homebrew installed, in a bash shell type: brew install bbe
  • Step 2: Create an output directory in your working directory: mkdir -p output
  • Step 3: Download your desired raw binary data files from https://rawdata.oceanobservatories.org/files/ and put them in your working directory
  • Step 4: Iterate through the raw binary data files in your working directory (in this example they end in .dat):

    for file in ./*.dat
    do
        # Step 4.1: Rename the output file
        convertedfile="$(basename "$file" .dat)_converted.dat"
        # Step 4.2: Strip out the timestamps
        bbe -b "/\xa3\x9d\x7a/:16" -e "D" -o "./output/$convertedfile" "$file"
    done


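If bbe is not available on your system, the same operation can be done with a short Python script. This sketch mirrors the bbe command above: it assumes, per that command, that each Digi timestamp is a 16-byte block beginning with the marker bytes a3 9d 7a, and it deletes every such block:

```python
MARKER = b"\xa3\x9d\x7a"  # start of each 16-byte Digi timestamp block
BLOCK = 16

def strip_timestamps(data: bytes) -> bytes:
    # Delete every 16-byte block that begins with MARKER,
    # mirroring: bbe -b "/\xa3\x9d\x7a/:16" -e "D"
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i:i + len(MARKER)] == MARKER:
            i += BLOCK  # skip the whole timestamp block
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

# Example: two payload chunks separated by one timestamp block
raw = b"AAAA" + MARKER + bytes(13) + b"BBBB"
print(strip_timestamps(raw))  # b'AAAABBBB'
```

To process a whole directory, read each .dat file in binary mode, pass its contents through strip_timestamps, and write the result to a *_converted.dat file in ./output, just as the bash loop does.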

Uncabled Mooring Data
Uncabled mooring data, including wire-following profiler moorings (aka MMP, which stands for Modified McLane Profiler), are in the formats described in OMC_Data_Format_2016-05-25.txt. Not all instruments are currently included there. More are being added, but if you notice a missing instrument type or have questions about the data you find on the archive, contact the Help Desk.


Uncabled Surface Piercing Profiler
Raw uncabled surface piercing profiler mooring data (any platform with the suffix "SP", for Surface Piercing) can be saved into easily accessible .mat files via a MATLAB routine developed by WET Labs called 'rdWETLabs_2015_04.m'. This script reads in a directory of profiler text files (found in the "extract" subdirectory, e.g. https://rawdata.oceanobservatories.org/files/CE01ISSP/D00001/extract/) that contain all of the data extracted from the binary files output directly by the profiler, and aggregates them into one MAT file. The MAT file contains the following variables:

  • ACS: WETLabs ac-s
  • ADCP: VELPT Aquadopp
  • CTD:
  • OCR: SPKIR – Satlantic OCR507
  • OPT: DOSTA
  • PARS: PARAD – WETLabs PARS
  • TRIP: FLORT – WETLabs ECO Triplet
  • SNA: SUNA Nitrate
  • HMR: Heading, pitch, roll
  • SBE: Fast pressure
  • WM: Winch status
  • SUM: Wave summary
  • WB: Wave Burst

If you have trouble running the MATLAB routine or run into versioning issues, please contact the Help Desk.

— Last revised on August 15, 2016 —