Understanding data streams and parameters
In the OOI, each instrument deployed throughout the system — whether it’s a CTD or DO sensor on a glider or mooring, an ADCP, a seismic sensor, or a hydrophone — is given a unique Reference Designator code. You can learn how to decipher a Reference Designator to figure out where an instrument is deployed within the OOI network.
However, if you look up an instrument in the Data Portal’s main catalog, you will find that each instrument often has 2 or more entries in the system.
So why is this the case? And how can one find the actual data products or parameters?
The short answer: data streams are the primary way the Data Portal delivers data, and each instrument can have multiple data streams available. Each data stream contains a collection of parameters, so users can select the streams that best fit their needs.
What is a Data Stream?
A data stream is a collection of parameters produced by the OOI system. “Parameter” is the generic term, covering both science and engineering measurements, while “data product” refers more specifically to science measurements. Data streams are the fundamental way data is provided and cataloged by the OOI data system. They were designed by OOI scientists to group a reasonable number of associated measurements together.
Each stream is defined by its type (science or engineering), delivery method, and content. The stream identifier is a code that uniquely defines the stream’s content. All instruments that produce the same stream will have the same set of parameters included.
For example, if you search for the 2-Wavelength Fluorometer on the Irminger Sea Apex Surface Mooring, 2 streams will be found:
- telemetered_flord-g-ctdbp-p-dcl-instrument
- recovered-host_flord-g-ctdbp-p-dcl-instrument-recovered
Both of these are science data streams containing fluorometry data from that instrument, but they differ in their delivery method: one is telemetered, while the other is recovered-host.
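To make the structure of these two entries concrete, here is a minimal Python sketch that splits a catalog entry into its delivery method and stream name. It assumes the two parts are joined by the first underscore, as in the entries listed above. The function name `split_stream_entry` is hypothetical and is not part of any OOI tool.

```python
def split_stream_entry(entry):
    """Split a 'method_stream-name' entry on its first underscore only.

    Assumption: the delivery method comes first and contains no underscore.
    Any later underscores are kept as part of the stream name.
    """
    method, sep, stream = entry.partition("_")
    if not sep or not method or not stream:
        raise ValueError(f"not a method_stream entry: {entry!r}")
    return method, stream


entries = [
    "telemetered_flord-g-ctdbp-p-dcl-instrument",
    "recovered-host_flord-g-ctdbp-p-dcl-instrument-recovered",
]
for entry in entries:
    method, stream = split_stream_entry(entry)
    print(f"{method:15s} {stream}")
```

Splitting on the first underscore only is a deliberate choice here, since some stream names contain underscores themselves.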
There are generally three kinds of streams, as defined by the Delivery Method:
- telemetered – Data delivered via satellite to the system in (near) real-time
- recovered – Data added to the system after the instrument is recovered (3 months after deployment for gliders, or 6 to 12 months for moorings). There can be multiple kinds of recovered data:
  - recovered-host – Recovered data stored on a platform data logger
  - recovered-inst – Recovered data stored on the instrument itself
  - recovered-wfp – Recovered data stored on the wire-following profiler’s data logger
- streamed – This is used for cabled instruments that stream data in real-time via electro-optical cable
For additional information, please check out the Data Terms section in the Glossary.
Generally, instruments will have either a matching telemetered/recovered pair of streams (or multiple pairs when data is available from data loggers and the instrument), or they will have a single “streamed” data stream.
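The pairing described above can be sketched in a few lines of Python. This is only an illustration: the function names `delivery_kind` and `stream_layout` and the returned labels are hypothetical, and the only delivery methods it knows are the ones listed on this page.

```python
def delivery_kind(method):
    """Map a delivery method to one of the three general kinds."""
    if method in ("telemetered", "streamed"):
        return method
    # recovered-host, recovered-inst and recovered-wfp are all "recovered"
    if method == "recovered" or method.startswith("recovered-"):
        return "recovered"
    raise ValueError(f"unknown delivery method: {method!r}")


def stream_layout(methods):
    """Describe which general layout an instrument's streams follow."""
    kinds = {delivery_kind(m) for m in methods}
    if kinds == {"streamed"}:
        return "single streamed stream"
    if kinds == {"telemetered", "recovered"}:
        return "telemetered/recovered pair(s)"
    return "other: " + ", ".join(sorted(kinds))


print(stream_layout(["telemetered", "recovered-host"]))
print(stream_layout(["telemetered", "recovered-host", "recovered-inst"]))
print(stream_layout(["streamed"]))
```

Instruments that do not follow the general pattern fall through to the "other" label rather than raising an error, since the pattern is described as typical, not guaranteed.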
The last part of the stream identifier generally tells you what you will find in the stream, that is, whether it includes instrument data, engineering, diagnostic, metadata, or some other set of derived data products.
The biggest difference you are likely to see between telemetered and recovered data streams is that they may cover different time periods (depending on issues with real-time telemetry or instrument storage) or have different levels of data density. Recovered data typically have a higher resolution. You can find details about the telemetered vs. recovered sampling rate in the Observation and Sampling Approach document.
What is a Parameter?
As noted above, a Data Stream is really just a collection of various parameters, or Data Products. You can view the list of parameters when generating quick plots in the data portal, or by downloading the data to your local machine and opening the file.
If you download a NetCDF file, you can view the metadata associated with each parameter. This will include a description, units, fill value, and a long (aka English) name. In addition, the data_product_identifier typically includes a common name for the product (e.g. density) as well as its processing level (e.g. L0, L1, L2). For a description of each of these levels, please check out the Glossary.
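As a small illustration, the Python sketch below separates a data_product_identifier into its common name and processing level. It assumes the identifier has the form NAME_Lx (for example, DENSITY_L2), which is an assumption made for this example and not a documented guarantee. The function name `split_data_product_identifier` is hypothetical.

```python
import re

# Assumed form: a name, an underscore, then "L" and a single digit.
_DPI_PATTERN = re.compile(r"(?P<name>.+)_(?P<level>L\d)")


def split_data_product_identifier(dpi):
    """Return (name, level), or (dpi, None) if no level suffix is found."""
    match = _DPI_PATTERN.fullmatch(dpi)
    if match is None:
        return dpi, None
    return match.group("name"), match.group("level")


name, level = split_data_product_identifier("DENSITY_L2")
print(name, level)
```

Identifiers that do not match the assumed form are returned unchanged with a level of `None`, so the function can be applied to every parameter in a file without failing.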
Most streams include scientific or engineering data products collected by an instrument, though calibration data is also sometimes included. Many streams also include derived parameters, like seawater salinity or density, that are calculated using other data products. And some streams consist primarily of derived data products, like the streams for hourly averaged METBK data (telemetered_metbk-hourly), which include a number of flux calculations, and the 15-second BOTPT stream (botpt_nano_sample_15s), which includes predicted tides and detided bottom pressure.
When you examine a specific data stream, you will find that many parameters have corresponding “quality” parameters that indicate whether specific values pass the automated QC checks. For more information, please see Interpreting QC variables and results.
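If you want to see which parameters in a stream have a corresponding quality parameter, a rough Python sketch like the one below can help. The `_qc_results` suffix and the parameter names in the example list are assumptions made for illustration, so check the variable names in your own file and adjust the `suffixes` argument accordingly. The function name `pair_quality_parameters` is hypothetical.

```python
def pair_quality_parameters(names, suffixes=("_qc_results",)):
    """Return {parameter: [quality parameters]} for parameters that have one.

    Assumption: a quality parameter is named after the parameter it
    describes, plus one of the given suffixes.
    """
    present = set(names)
    pairs = {}
    for name in names:
        found = [name + s for s in suffixes if name + s in present]
        if found:
            pairs[name] = found
    return pairs


# Hypothetical parameter names, for illustration only.
names = ["time", "temperature", "temperature_qc_results", "salinity"]
for parameter, quality in pair_quality_parameters(names).items():
    print(parameter, "->", ", ".join(quality))
```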
Why are there multiple Data Streams?
Our goal is to get as much data as possible to the community without too much processing or removal of data, so that individual researchers can make their own determinations on quality. Including multiple streams for each instrument may increase the complexity of the system (compared with consolidating real-time and recovered data into a single stream, for example), but it keeps the provenance of the data clear, so researchers who want to look into differences that arise through different telemetry methods can do so.
— Last revised on July 12, 2017 —