Processing an Ocean of Data: OOI Insights from the NSF CI Compass Workshop

At the NSF CI Compass virtual workshop, “Data Management: From Instrument to First Storage,” Jeff Glatstein, OOI Senior Manager of Cyberinfrastructure at WHOI, shared key insights into the challenges and advancements in handling large-scale ocean data.

OOI manages enormous data volumes: 175 billion rows of numerical data, 2.8 petabytes of raw data, 13,000 hours of video, and over 2 million digital stills. In the last quarter alone, 36 terabytes of data were delivered to researchers, highlighting the complexity of managing continuous, multi-source data streams.

Jeff introduced the JupyterHub environment, which allows researchers to download and process data directly while supporting FAIR metadata standards. He also addressed challenges such as storage costs, technical debt, and the need for scalable infrastructure as data demands grow.

Recent upgrades, including Cassandra improvements and enhanced monitoring systems, have improved efficiency. Future efforts focus on GPU use for AI applications, building a third-generation data center, and ensuring cybersecurity and disaster recovery with geographically distributed storage.

Jeff emphasized the importance of collaboration between observatories and the standardization of data formats to improve integration and sharing. His presentation highlighted the ongoing work required to make ocean data accessible, secure, and valuable for research.

The workshop underscored the critical role facilities like OOI play in advancing ocean science, offering the tools and infrastructure necessary to manage complex data and foster collaboration across the research community.

To learn more about OOI’s data management strategies, view Jeff Glatstein’s full slide deck from the NSF CI Compass workshop.

Slide: “OOI Summary” (credit: Jeffrey Glatstein)

Slide: “Data Lifecycle: Acquisition” (credit: Jeffrey Glatstein)