Processing an Ocean of Data: OOI Insights from the NSF CI Compass Workshop

OOI Summary

At the NSF CI Compass virtual workshop, “Data Management: From Instrument to First Storage,” Jeff Glatstein, OOI Senior Manager of Cyberinfrastructure at WHOI, shared key insights into the challenges and advancements in handling large-scale ocean data.

OOI manages enormous data volumes: 175 billion rows of numerical data, 2.8 petabytes of raw data, 13,000 hours of video, and over 2 million digital stills. In the last quarter alone, 36 terabytes of data were delivered to researchers, highlighting the complexity of managing continuous, multi-source data streams.

Jeff introduced the Jupyter Hub environment, which allows researchers to download and process data directly while supporting FAIR metadata standards. He also addressed challenges like storage costs, technical debt, and the need for scalable infrastructure as data demands grow.

Recent upgrades, including Cassandra improvements and enhanced monitoring systems, have improved efficiency. Future efforts focus on GPU use for AI applications, building a third-generation data center, and ensuring cybersecurity and disaster recovery with geographically distributed storage.

Jeff emphasized the importance of collaboration between observatories and the standardization of data formats to improve integration and sharing. His presentation highlighted the ongoing work required to make ocean data accessible, secure, and valuable for research.

The workshop highlighted the critical role facilities like OOI play in advancing ocean science, offering the tools and infrastructure necessary to manage complex data and foster collaboration across the research community.

To learn more about OOI’s data management strategies, check out Jeff Glatstein’s full slide deck from the NSF Workshop. View it here.

[caption id="attachment_35610" align="alignnone" width="640"]OOI Summary (c): Jeffrey Glatstein[/caption]
[caption id="attachment_35607" align="alignnone" width="640"]Data Lifecycle: Acquisition (c): Jeffrey Glatstein[/caption]

Sharing OOI’s Cyberinfrastructure with NSF COMPASS Fellows

On Thursday, March 7, 2024, Senior Manager of Cyberinfrastructure and OOI Data Lead Jeffrey Glatstein introduced a group of 18 students and seven instructors in the National Science Foundation’s CI Compass Fellowship Program (CICF) for Undergraduates to the challenges of collecting, distributing, and safeguarding OOI’s vast amount of data. Student Fellows in CICF learn about real-world cyberinfrastructure challenges and how to begin solving them for NSF Major Facilities.

Glatstein first gave the group an overview of the type, amount, and diversity of data OOI collects from more than 900 instruments. The collected data consist of 135 billion rows of numerical data, 1.2 petabytes of raw data, nearly 10,000 hours of high-definition video, 327,000 hours of audio recordings, and 1.28 million digital still images.

He then shared OOI’s data delivery track record. Since 2016, OOI has responded to 987 million requests for calculated data, providing 333 terabytes of data.

[media-caption path="https://oceanobservatories.org/wp-content/uploads/2024/03/240307_OOI-Jeff-Glatstein_3.png" link="#"]Senior Manager of Cyberinfrastructure and OOI Data Lead Jeffrey Glatstein (green highlighted screen) discussed OOI’s cyberinfrastructure with 18 COMPASS Fellows and their instructors. Credit: CICF.[/media-caption]

Glatstein then went into detail about OOI’s extensive cybersecurity measures, which include multiple storage sites with duplicate data so that if the primary system goes down, it can be quickly restored. This is particularly important for OOI because it stores data for posterity, for future use in long-term time series.

The students were amazed and engaged, as demonstrated by multiple questions after the presentation. Said Glatstein, “I enjoy sharing our work with students. It helps open their eyes to career opportunities in unexpected areas, like the ocean sciences, and their questions provides another way into looking at the work that we do.”

Jupyter Hub Town Hall: March 6, 1 pm ET

Curious about Jupyter Hub, which gives users access to computational environments and resources in their own workspaces on shared resources?

OOI is hosting a virtual town hall on Wednesday, March 6, 2024, at 1 pm Eastern, where you can learn how researchers and educators are using OOI’s Jupyter Hub in their research and classrooms. During this one-hour town hall, OOI Data Lead Jeffrey Glatstein, OOIFB Chair and Queens College Assistant Professor Dax Soule, and OOI Data Expert Stace Beaulieu will give hands-on demonstrations of OOI Jupyter Hub. The emphasis will be on practical applications of this important resource, with plenty of time to ask questions.

Mark your calendar!  Register to add this important resource to your repertoire.

  • Date: Wednesday, March 6, 2024
  • Time: 1-2 pm Eastern
  • Location: Zoom Webinar
  • Register here.

Demo of Data Explorer New Features Video Available

In case you missed the latest demonstration of the newest features of Data Explorer, you can watch it here. Axiom Data Science Senior Software Engineer Brian Stone and OOI’s Senior Manager of Cyberinfrastructure Jeffrey Glatstein explain the latest additions to Data Explorer, which include a beta display of high-definition video streams, additional differentiation between the Axial Seamount and Oregon Margin Regional Cabled Array assets, human-in-the-loop quality control flag display, and two ADCP instruments that were previously not visualized.

[embed]https://youtu.be/BcMi3lHSUB0[/embed]

OOI Virtual Town Hall: Demo of Data Explorer Latest Features

Join Axiom Data Science Senior Software Engineer Brian Stone and OOI’s Senior Manager of Cyberinfrastructure Jeffrey Glatstein as they demonstrate the latest features of Data Explorer and answer your questions about how you might use these features in your research. Advance registration is required. Please register here.

OOI Data to be Archived by NOAA’s National Centers for Environmental Information

NOAA’s National Centers for Environmental Information (NCEI) and the Woods Hole Oceanographic Institution (WHOI) established a Cooperative Research and Development Agreement (CRADA) to share high-quality oceanic data collected from the National Science Foundation (NSF)-funded Ocean Observatories Initiative’s instrument arrays. The goal of the partnership is to archive and deliver the initiative’s data for continued research on ocean processes.

“WHOI is pleased to be working with NCEI for the long-term preservation of data produced by the Ocean Observatories Initiative,” said Jeffrey Glatstein, Senior Manager of Cyberinfrastructure at WHOI. “The initiative is a science-driven ocean observing network that delivers real-time data from more than 900 instruments to address critical science questions regarding the world’s oceans. Given the long-term timeframe of the program and the impact of having a continual record of measurements, the archiving of this data is a significant step in making these data available for researchers in the future.”

“Under this partnership agreement, NOAA expects to be provided at least 30 years of high-quality oceanographic data produced by the Ocean Observatories Initiative, commissioned in 2017, for preservation and stewardship,” said Jason Cooper, NCEI’s Archivist.

NCEI will be responsible for acquiring and managing the required IT storage for the data that WHOI will provide, which is expected to amount to roughly seven terabytes. NCEI will ensure that the metadata associated with the data meet federal and international standards, such as those for storage, preservation, and accessibility. In addition to providing the data that have been collected to date, the agreement also calls for WHOI to transfer an additional 710 gigabytes of data annually for the next ten years.
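As a rough illustration of the storage figures above, the short Python sketch below adds the annual transfers to the initial volume. It assumes decimal units (1 terabyte = 1,000 gigabytes) and a constant 710 gigabytes each year; the function name and constants are illustrative and are not part of the agreement.

```python
# Rough projection of the archive volume described in the agreement.
# Assumptions (not from the source): decimal units and a constant
# annual transfer of 710 GB for each of the ten years.

INITIAL_TB = 7.0     # "roughly seven terabytes" collected to date
ANNUAL_GB = 710.0    # additional data transferred each year
YEARS = 10           # duration of the annual transfers
GB_PER_TB = 1000.0   # assumption: 1 TB = 1,000 GB


def projected_archive_tb(years: int) -> float:
    """Return the cumulative archive size in TB after `years` annual transfers."""
    return INITIAL_TB + years * ANNUAL_GB / GB_PER_TB


if __name__ == "__main__":
    for year in range(YEARS + 1):
        print(f"year {year:2d}: {projected_archive_tb(year):5.2f} TB")
```

Under these assumptions, the archive would roughly double over the ten-year period, from about 7 terabytes to about 14 terabytes.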

WHOI is an independent, non-profit organization dedicated to ocean research, exploration, and education. WHOI and its Oregon State University (OSU) and University of Washington (UW) partners designed and now manage the instrument arrays involved in this CRADA partnership, which collect chemical, biological, geophysical, and physical measurements in the global ocean from below the seafloor to the surface.

The CRADA allows NCEI and WHOI to achieve their common goal of supporting research related to ocean and atmospheric processes by sharing their findings with the scientific community, policymakers, and the general public.

Data Explorer Receives Environmental Business Journal Award

The Environmental Business Journal (EBJ) has recognized the Ocean Observatories Initiative (OOI) Data Explorer with one of its annual Information Technology awards for its ability to manage and visualize data.

Tetra Tech will accept the award at the EBJ awards ceremony in March 2023 for its role in designing open-source software to support the management, accessibility, and visualization of ocean data. Axiom Data Science, a Tetra Tech company, worked with the OOI Cyberinfrastructure team to develop both front-end and back-end systems for data management and visualization of OOI data feeds.

The OOI, funded by the National Science Foundation, delivers real-time data from sensors in the Atlantic and Pacific Oceans to address critical questions regarding the world’s oceans. Axiom Data Science played a foundational role in designing and operationalizing the Data Explorer, the primary gateway for discovering, visualizing, and accessing OOI data. The Data Explorer makes it possible to search across data points, download full datasets, and compare datasets across regions and disciplines for more than 900 instruments in near real time. In 2022, Axiom Data Science upgraded the OOI cyberinfrastructure to improve the OOI’s ability to serve ultra-high-resolution data streams from next-generation ocean instrumentation that spans the ocean floor to the sea surface.

“We are honored to have worked with Axiom Data Science to make OOI’s vast amount of data accessible and useable and congratulate them on this recognition of their exceptional work,” said Jeffrey Glatstein, OOI’s Data Delivery Lead and Senior Manager of Cyberinfrastructure. “The capabilities of Data Explorer are only beginning to be realized and will serve the ocean community for years to come.”