Jupyter Notebook Produces Quality Flags for pH Data

OOI uses the SAMI2-pH sensor from Sunburst Sensors, LLC to measure seawater pH across its arrays. Assessing the quality of data from this instrument is an involved process because the instrument produces multiple parameters that are then used to calculate the seawater pH. These measurements are subject to different sources of error, and those errors can propagate through the calculations to produce an erroneous seawater pH value. Based on the vendor documentation and the MATLAB code Sunburst provides to convert the raw measurements, OOI data team members have created a set of rules that use those measurements to flag the pH data as pass, suspect, or fail.
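
The idea of rule-based flagging can be illustrated with a minimal Python sketch. It assumes QARTOD-style flag values (1 = pass, 3 = suspect, 4 = fail). The measurement names and limits below are hypothetical placeholders chosen for illustration; they are not the rules or thresholds used in the OOI notebook or the Sunburst code.

```python
# QARTOD-style flag values: 1 = pass, 3 = suspect, 4 = fail
PASS, SUSPECT, FAIL = 1, 3, 4

# Hypothetical limits for illustration only (not OOI or vendor values).
# name -> ((fail_low, fail_high), (suspect_low, suspect_high))
LIMITS = {
    "reference_intensity": ((500, 4000), (1000, 3500)),
    "blank_ratio": ((0.5, 1.5), (0.8, 1.2)),
}


def flag_sample(sample, limits=LIMITS):
    """Return the worst flag found across the intermediate measurements."""
    flag = PASS
    for name, (fail_range, suspect_range) in limits.items():
        value = sample[name]
        # Any measurement outside its fail range fails the whole sample.
        if not fail_range[0] <= value <= fail_range[1]:
            return FAIL
        # Outside the narrower suspect range downgrades the sample to suspect.
        if not suspect_range[0] <= value <= suspect_range[1]:
            flag = SUSPECT
    return flag


samples = [
    {"reference_intensity": 2000, "blank_ratio": 1.0},  # all within limits
    {"reference_intensity": 800, "blank_ratio": 1.0},   # low but not failing
    {"reference_intensity": 2000, "blank_ratio": 2.0},  # outside fail range
]
flags = [flag_sample(s) for s in samples]
```

Because the flag is derived from the intermediate measurements rather than from the final pH, a sample can fail even when its calculated pH looks plausible.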

The resulting flags can be used to remove failed data from further analysis. They can also help generate annotations for further Human-in-the-Loop (HITL) QC checks, which in turn help refine the quality metrics for the data. OOI team member Chris Wingard (OSU) has written up the QC process as a Python Jupyter notebook. This notebook and other example notebooks are freely available to the scientific community via the OOI GitHub site (within the OOI Data Team Python toolbox accessed from https://oceanobservatories.org/community-tools/).

In this notebook, Wingard shows how the quality rules can be used to remove bad pH data from a time series and then to create annotations. The impact of using these flags is shown with a set of before-and-after plots of the seawater pH as a function of temperature. The quality-controlled data can then be used to estimate the seasonal cycle of pH, which is used to set climatological quality control flags.
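
A rough sketch of those two uses of the flags, written in plain Python: dropping failed samples, and grouping consecutive failed samples into time spans that could seed an annotation. The function names and the flag value 4 for "fail" are assumptions made for this example and are not taken from the notebook.

```python
def drop_failed(times, ph, flags, fail=4):
    """Keep only the (time, pH) pairs whose flag is not the fail value."""
    return [(t, p) for t, p, f in zip(times, ph, flags) if f != fail]


def flagged_intervals(times, flags, target=4):
    """Group consecutive samples carrying the target flag into (start, end) spans."""
    intervals = []
    start = end = None
    for t, f in zip(times, flags):
        if f == target:
            if start is None:
                start = t  # open a new span
            end = t        # extend the current span
        elif start is not None:
            intervals.append((start, end))  # close the span
            start = None
    if start is not None:
        intervals.append((start, end))  # span runs to the end of the record
    return intervals


# Toy record: times are sample indices, flags follow 1 = pass, 3 = suspect, 4 = fail
times = [0, 1, 2, 3, 4, 5]
ph = [8.05, 9.40, 9.55, 8.02, 7.70, 6.10]
flags = [1, 4, 4, 1, 3, 4]

good = drop_failed(times, ph, flags)
spans = flagged_intervals(times, flags)
```

Each span gives a start and end time that a reviewer could inspect and, if warranted, turn into an annotation.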

The example shown here uses data from a pH sensor on the Oregon Inshore Surface Mooring (CE01ISSM) near surface instrument frame (NSIF), deployed at 7 m depth (site depth is 25 m).

Figure 25: pH data from the Oregon Inshore Surface Mooring (CE01ISSM) near surface instrument frame (NSIF). Good data are shown in black, failed data in red. Note that simple range tests on the final calculated pH are often not enough to distinguish good from failed data. The automated QC processing examines intermediate measurements and fails the data when those measurements fall outside acceptable ranges, because such errors propagate to the final pH value.
Figure 26: Good data together with annual cycles (red) constructed from the available good data from initial deployment through 2021. Data that fall more than three standard deviations from the climatology are flagged as suspect. Simple range tests for suspect (cyan) and failed (magenta) data are also shown. The annual cycle at this site is strongly influenced by summer upwelling, winter storms, and river plumes. The summer decrease in pH is consistent with cold, relatively acidic upwelled water high in CO2 (see e.g., Evans et al., 2011).
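
The three-standard-deviation test can be sketched as follows. This example bins good data by calendar month and uses the monthly mean and standard deviation as a simple stand-in for the annual cycle; that binning is an assumption made here for brevity and may differ from how the notebook constructs its climatology. The function names are hypothetical, and flag values follow the QARTOD convention (1 = pass, 2 = not evaluated, 3 = suspect).

```python
from statistics import mean, stdev


def monthly_climatology(months, values):
    """Return {month: (mean, std)} from good data grouped by calendar month."""
    groups = {}
    for m, v in zip(months, values):
        groups.setdefault(m, []).append(v)
    # A standard deviation needs at least two samples in the month.
    return {m: (mean(v), stdev(v)) for m, v in groups.items() if len(v) > 1}


def climatology_flags(months, values, clim, n_std=3.0):
    """Flag values more than n_std standard deviations from the monthly mean."""
    flags = []
    for m, v in zip(months, values):
        if m not in clim:
            flags.append(2)  # no climatology for this month: not evaluated
            continue
        mu, sigma = clim[m]
        flags.append(3 if abs(v - mu) > n_std * sigma else 1)
    return flags


# Toy "good" record: higher pH in January, lower pH in July
good_months = [1, 1, 1, 1, 7, 7, 7, 7]
good_ph = [8.10, 8.12, 8.08, 8.10, 7.80, 7.82, 7.78, 7.80]
clim = monthly_climatology(good_months, good_ph)

# New observations: two typical values, a July outlier, and a month with no climatology
new_flags = climatology_flags([1, 7, 7, 3], [8.11, 7.81, 8.10, 8.00], clim)
```

Building the climatology only from data that already passed the automated checks keeps failed values from inflating the standard deviation and widening the suspect limits.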


Evans, W., B. Hales, and P. G. Strutton (2011), Seasonal cycle of surface ocean pCO2 on the Oregon shelf, J. Geophys. Res., 116, C05012, doi:10.1029/2010JC006625.