---------------------------------------------------------------------------------------------------------------------------
NOTES --- NOTES --- NOTES --- NOTES --- NOTES --- NOTES --- NOTES --- NOTES
---------------------------------------------------------------------------------------------------------------------------
OOI CI Release 1 LCO Review
Notepad
This is now the active Etherpad for this presentation.
This pad is intended for shared notes and discussions. Official tracking of questions and topics under discussion (or yet to be discussed) is on http://etherpad.com/ooici-r1lco-trackpad.
You may enter your name in the box at the top right if you want to be identified. (This is not required.)
PRELIMINARY FINDINGS
****************************************** PAST TOPICS *********************************************
Q&A SESSION
Meeting retrospective:
Ingolf: The audience was taken from the highest level to the lowest level.
The review (process) itself had a great positive impact on the team.
No immediate questions.
--------------------------------------------------------------------------------------------------------------------------------------
ELABORATION PHASE PLAN
Link to elaboration plan:
http://www.oceanobservatories.org/spaces/download/attachments/21205028/2010-02-25_CI+Task+List.xls
Are all elaboration prototype efforts geared towards the IOOS integration effort?
John: This is one of the principal ways to direct the work; another is the 3 sensors that need to be integrated for release 1. These are the two described system level scenarios.
Is there a notion of what the scope is for LCA (for the IOOS integration)?
yes (John has it)
Is there an identified statement what drives the detailed risk level associated with technical tasks?
John: Is informed by the high level risk items but has not yet been formalized.
Given limited physical resources (sensors), how can interfaces be tested to them?
John: expects to be developing this strategy over elaboration phase and the rest of the release, by taking exemplar sensors and going through the process of instrument activation, in conversation with Marine IOs. There is an interim strategy to directly connect a capability container at the vendor's site (or Marine IO's site) to the instrument and remotely connecting to the capability container from the CI sites.
INTEGRATION AND TEST/VERIFICATION PLAN
Who is in charge of the integration of CI with Marine IO components?
John: The initial steps are at the level of CI, then problem level integration is managed by COL. Bill: There is a notion of formal deployment and acceptance of a CI release, which is COL's responsibility.
What is the validation step with users?
John says: The different milestones (LCO, LCA, IOC, Release) have different stakeholders and are under different authority. The problem level acceptance process is 30 days after a CI release milestone.
INTERFACE MANAGEMENT
Are the EA models CI diagrams or OOI diagrams? In other words, have these been coordinated w/ the marine IOs?
Michael writes: The current EA model is a central OOI model, which is server based, has exclusive authoring locks for SEs, and central consistency and archiving. The model is split into compartments. There is currently one OOI-level part, authored by OL (R. Howard), coordinated among all IOs, and an OOI-CI part, authored by CI (managed by M. Meisinger), internal to CI. There are cross links from the OOI diagrams and elements to the CI elements. In addition, DOORS requirements have been imported as third compartment into the model, with cross-references to CI design elements.
The two Marine IOs are not yet using the same mechanism.
Have the OOI level EA diagrams been coordinated and vetted with the Marine IOs?
Jack says: Yes, and they are subject to configuration control.
Need for Inter-IO risk assignments and cross-links?
Matt says: This is currently being talked about on the PM and SE level.
RISK REGISTER/MANAGEMENT
Explain highest risks currently:
Shared domain vocabularies
What else can be done to mitigate if prototyping is not immediately effective?
There is a comprehensive plan in the works. These technologies are not in the scope of release 1, but rather 2-3, so that immediate action is not necessarily required.
Distinguish strategic and tactical risks. At any time, risks can be pulled forward and resources invested to mitigate.
How often does the risk board meet?
Matt says: Every two weeks in alternation with the change control board. This is too often but required currently in the startup phase. Detailed technical risks are not immediately part of this process, but are expected to be included within the next 6 months.
Instrument integration strategy?
This original risk was split into two (the other is instrument management strategy).
The risk targets the articulation of the strategy and the architectural elements. Work currently going on at OL level and in the CI Office of S&A is targeting this risk.
CYBERPOPS AND NETWORK
What in the network design is being done to address cybersecurity?
Michael writes: The DIF architecture and implementation(s) directly address cybersecurity. A DIF is a closed network (directly on Layer-2 hardware or on the application Layer-7) that requires explicit enrollment of only trusted identities. Stacking multiple DIF networks on top of one another makes it possible to increase or reduce the scope of reachable trusted participants.
Are there additional measures being done with hardware?
Matt says: The inner ring is an MPLS cloud that enables complete governance of resources. Redundant circuits are planned and can be governed. Very strict routing and direct control.
At the distribution sites are firewalls towards the internet
Within the network are mechanisms of intrusion detection and monitoring software
On the software level, the choice of all internal communication through AMQP (the Exchange and the cap container) enables the enforcement of governance and security.
Do firewalls limit throughput?
Are you involved with advanced networking activities such as in joint Tecs (?)
Matt says: I am the network expert, with extensive experience operating highly available, secure, scalable systems. The operations manager will take over. This is not a research part of OOI-CI, it is a deployment challenge "only". Leverage relations with NW GigaPop, FutureNet.
Release-1 roll out, Woods Hole network access point?
Matt says: CI has responsibility for putting a (smaller) Acquisition CyberPop there. Bandwidth coming in is limited by satellite, but there are large bursts of data when ships come back from service cycles. CG has a choice of computing there, but all data are recovered to Portland, the central reliable archive. There is also a network connection from Woods Hole into the high bandwidth national circuit network.
--------------------------------------------------------------------------------------------------------------------------------------
USER EXPERIENCES
iTunes organizing metaphor, used as a "container" - possible UI target for release 1
Software agent (Dorthy): living on searches. They do crawling continuously and update the result set. In our data world, imagine a DAP server infrastructure that continuously fetches new datasets and metadata on behalf of the community.
Roy: The OOI team needs to start thinking about how to implement a search strategy: limiting search or expanding search. Faceted searches limit search in a certain way.
Event response behaviors, advanced search mechanisms: They will first have an operational presence in the system; then OOI will make it available to its users and create situational presence.
Evaluate trust: crowdsourcing. Trust can be created by accumulation (multiple untrusted individuals reporting a coherent story)
Software is not a user (in the way that an application is not a user of an operating system).
Matthew says: Computing is changing. Up until recently the only things we did were viewing and manipulating. Now we are seeing a third core verb emerging: participation. We have to start thinking of software as "participating". This piece of software is acting on behalf of an institution, organization, or individual: the agent. The agent is a participating resource (besides information and taskable resources).
Did CI look at all at methods allowing users to define and design their own user interfaces?
Susanne: This is the idea we are getting at. GoogleWave is along this way.
Matthew: Find a way of providing user provided interfaces within the system so that other users can make them part of their toolchain.
Ingolf: Interaction interface specifications are key to enabling these ideas.
Do you need an OOI approval to get data or an app on an OOI-iTunes?
Matthew says: The reason why we support different facilities is to support different communities with different policies and governance. The OOI-store most probably will require OOI certification. For a public store (reapplying the same mechanisms), it is the community's decision to set the policy for certification etc.
The following is quite interesting especially the live video inclusion in the back portion. Look at how the orientation of the camera is calculated on the fly. Between this and Google Earth/Ocean we have some very significant geo-referenced application platforms to operate on top of.
Virtual Earth + Seadragon + Photosynth = http://www.ted.com/talks/blaise_aguera.html
DATA MANAGEMENT SUBSYSTEM DEMO
Which version of PyDap? David says: New version, 3.0rc8
Caching the whole data set or just the request? Matt says: For now, the whole data set. We have a model for representing partial sets.
What are the criteria for notification? Paul H says: In this case just arrival of a new data set. But has been more refined in the past, and (John says) can be very refined.
Which entity/activity generates the notification? Paul H says: The event notification service sends it out. Arrivals of data at the data store is the event that triggers the notification.
Suggestion for service notifications going directly to the web page using a URL. Paul H says: Users can do that with very little trouble (so we can show that going forward).
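The notification flow described above (arrival of a data set at the data store triggers an event, and the notification service sends it out to subscribers) can be sketched roughly as follows. This is a minimal in-process illustration, not the prototype's code: the class names, the "dataset_arrived" event type, and the payload fields are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class NotificationService:
    """Hypothetical stand-in for the event notification service."""

    def __init__(self) -> None:
        # event type -> list of subscriber callbacks
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, callback: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(callback)

    def publish(self, event_type: str, payload: dict) -> int:
        """Deliver the payload to every subscriber; return how many were notified."""
        for callback in self._subscribers[event_type]:
            callback(payload)
        return len(self._subscribers[event_type])


class DataStore:
    """Hypothetical data store that raises an event whenever a data set arrives."""

    def __init__(self, notifier: NotificationService) -> None:
        self._notifier = notifier
        self._datasets: Dict[str, list] = {}

    def store(self, name: str, data: list) -> None:
        self._datasets[name] = data
        # Arrival at the store is the event that triggers the notification.
        self._notifier.publish("dataset_arrived", {"name": name, "size": len(data)})


received: List[dict] = []
notifier = NotificationService()
notifier.subscribe("dataset_arrived", received.append)
store = DataStore(notifier)
store.store("sst_2010_02", [1.0, 2.0, 3.0])
```

More refined criteria (as mentioned by Paul H and John) would amount to filtering on the payload before calling the subscriber.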
The data Exchange prototype shows some core concepts of development of distributed systems and deployment of distributed systems. Developers can run a set of multiple distributed processes on their local development environment (e.g. their desktop) in a constrained environment. The system is still a fully capable distributed system, communicating asynchronously. The operational deployment is similar, only that different physical servers host multiple instances of each service.
Matthew says: This enables OOI to apply pod-based deployment and update (the eBay model): Replace 10% of service instances with a new version, then observe and test for a period of time whether this version survives. Then replace a 50% pod and see if this survives, and then do the rest. This substantially reduces the risk of exposing users and the community to new versions.
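The pod-based update (10%, then 50%, then the rest) can be sketched as a staged rollout with a health check between stages. This is only an illustration under assumptions: the function name, the `healthy` callback, and the rollback-on-failure behavior are hypothetical and were not stated in the discussion.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    instances: List[str],
    new_version: str,
    healthy: Callable[[List[str]], bool],
    stages: Sequence[float] = (0.10, 0.50, 1.00),
) -> bool:
    """Upgrade instances in growing pods; restore the old fleet if a stage fails.

    `instances` holds one version label per service instance and is
    modified in place. Returns True if every stage survived.
    """
    original = list(instances)
    upgraded = 0
    for fraction in stages:
        # Cumulative number of instances that should run the new version.
        target = min(len(instances), max(1, round(len(instances) * fraction)))
        for i in range(upgraded, target):
            instances[i] = new_version
        upgraded = target
        # Assumed policy: observe the fleet, roll everything back on failure.
        if not healthy(instances):
            instances[:] = original
            return False
    return True


fleet = ["v1"] * 10
observed: List[int] = []


def always_healthy(current: List[str]) -> bool:
    observed.append(current.count("v2"))
    return True


ok = staged_rollout(fleet, "v2", always_healthy)
```

In a real deployment the health check would cover a period of observation rather than a single call.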
--------------------------------------------------------------------------------------------------------------------------------------
DATA MANAGEMENT SUBSYSTEM
data set: a fixed bounded thing covering time and space
data stream: blocks of information coming back unbounded in space and time
This is a limited definition, which is helpful for the scope of this document. It does not represent our full understanding of this idea.
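As a rough illustration of the two working definitions above, a data set can be modeled as a fixed, bounded record and a data stream as an unbounded generator of blocks. All names, fields, and the placeholder values here are hypothetical, chosen only to show the bounded/unbounded contrast.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class DataSet:
    """A fixed, bounded thing covering time and space."""
    time_range: Tuple[float, float]                  # (start, end)
    bounding_box: Tuple[float, float, float, float]  # (west, south, east, north)
    values: Tuple[float, ...]


def data_stream(start: float = 0.0, step: float = 1.0) -> Iterator[Tuple[float, float]]:
    """Unbounded stream of (timestamp, value) blocks; it never terminates."""
    t = start
    while True:
        yield (t, t * 0.5)  # placeholder value, not a real measurement
        t += step


def snapshot(
    stream: Iterator[Tuple[float, float]],
    n: int,
    bounding_box: Tuple[float, float, float, float],
) -> DataSet:
    """Cut a bounded data set (n >= 1 blocks) out of an unbounded stream."""
    blocks = list(itertools.islice(stream, n))
    return DataSet(
        time_range=(blocks[0][0], blocks[-1][0]),
        bounding_box=bounding_box,
        values=tuple(value for _, value in blocks),
    )


ds = snapshot(data_stream(), 4, (0.0, 0.0, 1.0, 1.0))
```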
What user data management policies can be controlled in Release 1?
Michael writes: Probably very few. This is mostly in the scope of R2.
Where will this fail? David says: Goes back to inventory, building that on Redis.
Inventory is a registry? Basically yes. Representation of COI registry, extending it (or restricting it).
COI Registry is also Redis? That is what we are working with, but not the final choice.
It has some fault tolerant capabilities, relating to master-slave. Matt says: More in Michael's realm.
What role does Bill Howe's work play in the system? David says: For release 1, we are targeting the necessary IOOS functionality. Probably will be too ambitious to go after that; CDM will cover R-1 needs. But we are after the kind of separation of concerns supported by Bill H's approaches. Matt says: There is an underlying notion that we can get away with a few canonical forms that can address a few functions. (Informed by ERDDAP work, which showed plausibility.) The fundamental jump is to ask: can we have a first-class representation of topological structures and a corresponding implementation? Bill H provides degenerate cases and n-dimensional structure. With Redis set capabilities and formal representations of topological representations, this could be really good by the end of Release 3. We understand this is an unsolved problem, but with Roy's guidance and Bill's technology we think
What is our definition of data, and is the system sensitive to that? We also use models, and you're proposing dealing with them. Matt says: We're trying to solve higher level problem beyond even models. Went through unsuccessful exercise in 2009, which helped us recognize the problem, difficulty of achieving it in this time period. David says: Collaboration lets us show this at scale, for both observations and models. (Voltages and assays.) Roy says: Note this is not parameter-specific. David says: Is more based on topology.
Said you have to get model right, high risk. For first release for this group, how soon does it have to be right? John says:
SENSING AND ACQUISITION SUBSYSTEM
Instrument providers means? Michael says: This is speaking loosely as a broad class (operators, scientists, etc.), not meant as vendors per se.
Recommend reviewing consistency of the proposed Release 1 functionality with the L4 requirements in DOORS (specifically ensuring the release tags are up-to-date).
Will the entire use-case for direct access be implemented in R1? To include the UI and the OMS interface? Arjuna and Michael say: The UI is not really a part of Release 1, but very limited UI and management interfaces (not of final deployed quality) will necessarily be present.
Where will an example like SIAM run? John says: 2 components, driver and middleware. The drivers, to the extent they are ported, will run in a capability container close to the instrument. The middleware aspects will be distributed to components like the Instrument Agent, which also may be a part of the capability container. Michael says: Capability Containers will be of different flavors and level of functionality, as needed in the particular case—some may be written in C and operating in the marine platform, for example. Arjuna says: We could have some of these components packaged for use in the AUVs, for example.
Will the
--------------------------------------------------------------------------------------------------------------------------------------
COMMON EXECUTION INFRASTRUCTURE DEMO
How often do the Amazon services perform unexpectedly? And is there an identified risk/concern over the reliability and availability of these services?
Discussion of the conversion that would be necessary to port our code to other platforms (whereas these commands can be run on several platforms). This yielded the information that the Microsoft environment (supported apps) has evolved significantly from its state 10 months ago.
Where is it being run from? Tim says: On the cloud. Matt says: Nice to keep all the management contexts operational even when a site goes down. Michael says: First and foremost is to make sure resources are highly available. Matt says: Implications on capitalization of equipment. Redundancy investments are not needed, they can be taken from the cloud; this is a core goal of release 1.
While we have redundancy of services, we want redundancy of persistent data. What is our data consistency model? Matt says: Eventually want consistent data basic model. Used ERDDAP deployed across multiple computers in the cloud. Many applications are not necessarily designed for sharing, so an acquisition phase is needed in the technology integration phase to operate at scale.
Jack says: Calculation of reliability information requires count of number of machines. John says: Reliability shouldn't be a function of individual machines.
Amazon avoided because open cost model (IO charges) is unpredictable. Matt says: What we're doing to fix that is to establish a peering relationship with 10Gig circuit off OOI backbone while minimizing transit costs. Talking about separate deal with Amazon to not charge ingress/egress. (Educational model was original driver for that.) Kate says: This helps with both cost and performance.
--------------------------------------------------------------------------------------------------------------------------------------
COMMON EXECUTION INFRASTRUCTURE SUBSYSTEM
There are existing technologies that can do all of this, like Condor. Have you considered adopting one of those existing technologies, developed by experts? Michael says: Yes, I have just introduced how everything fits in, but Kate Keahey is our expert who will be presenting the details of the technologies we are applying.
How does the Elastic Processing Unit (EPU) load balancing work?
Michael writes: The EPU controller receives work request messages and routes it to worker instances. It is also possible that work messages are not routed through the controller, but that workers take work messages from a work "message queue" in the COI Exchange directly.
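The two dispatch modes Michael describes (controller-routed work versus workers pulling from a shared work queue) can be contrasted in a small sketch. Everything here is hypothetical: the class names are invented, the round-robin routing policy is an assumption (the notes do not say how the controller chooses a worker), and a standard-library queue stands in for a queue in the COI Exchange.

```python
import queue
from itertools import cycle
from typing import List


class Worker:
    """Hypothetical worker instance that records the messages it handled."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.done: List[int] = []

    def handle(self, message: int) -> None:
        self.done.append(message)


class Controller:
    """Push mode: the controller receives work messages and routes them."""

    def __init__(self, workers: List[Worker]) -> None:
        self._next_worker = cycle(workers)  # assumed round-robin policy

    def submit(self, message: int) -> None:
        next(self._next_worker).handle(message)


def pull_all(work_queue: "queue.Queue[int]", workers: List[Worker]) -> None:
    """Pull mode: workers take messages from the queue, bypassing the controller."""
    for worker in cycle(workers):
        try:
            message = work_queue.get_nowait()
        except queue.Empty:
            return
        worker.handle(message)


push_workers = [Worker("a"), Worker("b")]
controller = Controller(push_workers)
for msg in range(4):
    controller.submit(msg)

pull_workers = [Worker("c"), Worker("d")]
shared: "queue.Queue[int]" = queue.Queue()
for msg in range(3):
    shared.put(msg)
pull_all(shared, pull_workers)
```

In the pull mode, adding a worker increases capacity without the controller having to know about it, which is one reason a shared work queue suits elastic scaling.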
Is the CEI only used for elastic scaling on demand with very short reaction times?
Michael writes: No, not only. In Release-1, the biggest stress case is actually the initial start and control of the core CI services, with most emphasis on high availability and very little scaling to demand. The reaction times do not matter that much there. Nonetheless, new instances need to be started automatically in case existing instances fail or need to be taken down for maintenance.
Why should "will it work with COI?" be an issue? Can't COI create wrappers to smooth over any disconnects?
Can you define what you mean by a harness? Kate says: A set of applications that can test the various services. Matt says: Key point is that it takes a lot of processing units to generate demand for the other processing units.
One of the things that confused me was that you built a platform with (that). Kate says: Do they exist independently, such that we can configure at will? Do they run Linux? Tim says: Number of deployable types you can host is very small, suggesting be limited to that? Matt says: The answer is at a higher level. Characterized Infrastructure and Platform as-a-service -- the ones that limited the context were in the latter category. These are significant community resources; we will project ours as local services within those platforms. But we are projecting our applications in their native form to the larger services. We are an IaaS service ourselves; our strategy is to operate all these as one. Collaboration strategy goes uphill into PaaS when the community demands.
Example of Platform-as-a-Service: Microsoft Azure, Google Hadoop.
Planner is policy-configurable, what is method for configuring this, policy from yesterday? Michael says: Yes, they are related. Yesterday we focused on access policy and commitments. Another type of policy is resource scheduling, a subclass of policy following the agent pattern. The dots can be connected but not directly. Kate says: The demo after the break will show how this works directly. Michael says: Everything we heard yesterday is applicable. Matt says: The notion of being able to process new processes on a set of nodes, and the notion of enrolling in a community and having it managed -- you can see we can manage resources across cloud environments, and use domains of authority. Now a community that wants to do a joint campaign with models and assets can be provisioned and enroll in a secure group, not a localized space. Resources like instruments and models can be bound into this, along with analysis and simulation functions, on demand and without speaking to network administrators anywhere (or anyone else!). The interactive ocean observing will be carried out by communities ganging together, with our support.
Agrees on platform as a service issue. Service model has implications on computing model and systems work under it. Does it map properly to what applications need? Applications on uncertainty analysis of data coming out of sensors require very tightly coupled codes running in close coordination. Kate says: It has been proven that some science applications do well on these services, the question is how large. When the rubber hits the road we'll have to see if it will scale, hopefully sooner rather than later. The role of IaS is not to develop IaaS, but to help it happen. We're trying to adopt what's out there. Known challenges: networking to Amazon, for example.
Question is, large majority of applications will run nicely, for which model is great. Kate says: What about other case? Working on these other models. Matt says: 1) Set of assumptions projected on us -- need to articulate what those are for us, how they should be projected. 2) Projecting into tightly coupled resources requirements that we are not trying to solve. We do want to couple to that as if it's a process. It should look to us like a high-availability or operational unit; we don't schedule it, we tell TeraGrid to execute it. Through container environment? No, that's the agent, but we are not permitted to support the developing of models, but in R4 we can coordinate and control models. Same design we can do with instrument device. Agent handles presentation of device to network, some devices present themselves. Modeling app running on BlueWaters can be set up and coupled, but we're not making any statement on how it's managed on BlueWaters.
Suggestion: Lots of work going on everywhere, felt we were re-inventing things—talk about those other things specifically, to present a much better picture. Matt says: Network will show computational resources in multiple locations, representing clusters. I want to know where we want to be 5 years from now. Design for intent, build for demand. Drawings go way out in terms of time, but technologies are chosen for what's there. You'll see us do more design work, but that's to make the integration decisions.
Assume demos from yesterday is how agents architect.
General Question - is there a trace from L4 requirements to architectural elements and if so, where is that documented?
Michael writes: The L4 CEI subsystem requirements refer to the high level services of the CEI, which are the work packages of the CEI WBS. Specific technical naming within the architecture and design applies, such that the actual wording of the requirements might not use the same language as the architecture models. We have not yet established these trace links in the Enterprise Architect tool for CEI (for effort reasons), but will do so by LCA.
Question and Answer Period - Wednesday
What are the explicit objectives for the release - more detail, and what is usable by release one?
Data distribution network usable by modelers (users of release 1). Target community will to be users start with the first users of needs.
Will there be beta users? Starting from LCA work with early adopter communities. Notion of alpha and beta releases. Specific dates not set yet. Suggestion: collect specific people to be initial testers. Specifically IOOS and sensors are initial cases.
Question of charge - Committee evaluation should be from the view of the stakeholders.
Focus prototyping on high risk elements first.
Scope may be deferred because of risk. What might be deferred first? Highest risk elements are pulled forward first. Low risk elements can be deferred.
By what criteria were risks determined? - Looked at dependency risks, technical risks, estimated effort to accomplish. Evaluation of risk has been ongoing and intense. Primary driver of how work is scheduled and organized. What are the factors which create the risk? Those factors not articulated in this review yet. We did not provide overview of risk. Risk registry being used for high level assessment. Risk registry does not yet go down to the details of the risk as we start development. Documentation of risk at lower levels is not there yet. Committee would like to see a walk through of the procedures for one of the prototypes presented.
Does service have human interface? If so what is that expression? Many services may not necessarily be visible to most users, but all services will be designed to be accessed directly by sophisticated users.
There is a risk in adopting a standard (even if you have prototyped in the area). The risk is that it is dead-end in the evolution (Think CORBA, which one program I was on adopted). The more mature a standard is, the less of one type risk it has, but a higher risk that it is end-of-life (think S-100 Bus). So, the question is: has OOI (CI) addressed this risk, what level of risk is this (low/medium/high). Considering the 20-year operational life, what is being considered to mitigate this risk?
Suggestion for future presentations and communications; more emphasis upon working relationships, collaborations, discussions with on-demand computing activities and approaches being done by TeraGrid, OSG, Condor, etc. This will help show that OOI is not only aware of similar work (and is working with others), but also to emphasize the fact that OOI is not reinventing or duplicating efforts or be viewed as "rolling their own" approach.
--------------------------------------------------------------------------------------------------------------------------------------
COMMON OPERATING INFRASTRUCTURE DEMONSTRATION
How do you handle the naming of the resources? Matt says: Not using an address, using a name as a stand-in. It is a controlled namespace.
Does this show the federated architecture yet? Michael says: Yes, in the CC demonstration there were interceptors and agents; it did not show much of the actual services themselves.
Good laboratory demonstrations. What about at scale? Matt says: Haven't put it under load yet. But it's built for scale, and we're building for distribution for scale. Is a good question for architectural review.
Michael says/writes: The message broker architecture is proven to allow for massive scaling of individual processes and message volume. The COI architecture is applicable to various degrees to interactions in the system. (1) The most extensive are governance interactions (negotiations). They occur only infrequently. (2) Then there are service invocations. They have less governance cost and can be scaled. (3) There can be raw message level interactions without any additional overhead.
COMMON OPERATING INFRASTRUCTURE SUBSYSTEM
Will large bandwidth data (e.g. HD video, hydrophones) also utilize the messaging infrastructure? Or is there a different "special purpose" mechanism?
Michael writes: Setting up the use of high bandwidth (e.g. start streaming, change angle) requires a very small interaction beforehand, which is where messaging with governance is applied. The actual use (sending large bandwidth data) does not have to go through governance and messaging (but can). It could be routed at the network level.
There is a mechanism to go outside the infrastructure for these messages? Michael says: Yes, once the control function is executed, there is no requirement that every message go through the infrastructure. Matt says: It turns out you can send more messages through this kind of messaging system without problems than people think.
Since this is a Release 1 activity, what information do you need to make the decision of which it will be and when do you need to receive the information?
Slide 29: What does 'Allocate PubSub' mean?
Michael writes: These are the actions that take place before actually using the messaging system. This means to subscribe to a "queue" in the messaging system, or to register as a publisher with the messaging system (or roughly opening a "connection" to another endpoint). In this diagram it occurs after registering (enrolling) as a communicator with the Exchange.
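The ordering Michael describes (first enroll as a communicator with the Exchange, then allocate pub/sub, and only then use the messaging system) can be sketched as below. This is a toy in-memory model under assumptions: the `Exchange` class, its method names, and the identities are hypothetical and do not represent the COI Exchange or any AMQP client API.

```python
from typing import Callable, Dict, List, Set


class Exchange:
    """Hypothetical in-memory exchange enforcing enroll-before-allocate."""

    def __init__(self) -> None:
        self._enrolled: Set[str] = set()
        self._subscribers: Dict[str, List[Callable[[str], None]]] = {}
        self._publishers: Dict[str, Set[str]] = {}

    def enroll(self, identity: str) -> None:
        """Register (enroll) an identity as a communicator."""
        self._enrolled.add(identity)

    def _require_enrolled(self, identity: str) -> None:
        if identity not in self._enrolled:
            raise PermissionError(f"{identity} is not enrolled")

    def allocate_subscription(
        self, identity: str, queue_name: str, callback: Callable[[str], None]
    ) -> None:
        """'Allocate PubSub', subscriber side: subscribe to a queue."""
        self._require_enrolled(identity)
        self._subscribers.setdefault(queue_name, []).append(callback)

    def allocate_publisher(self, identity: str, queue_name: str) -> None:
        """'Allocate PubSub', publisher side: register as a publisher."""
        self._require_enrolled(identity)
        self._publishers.setdefault(queue_name, set()).add(identity)

    def publish(self, identity: str, queue_name: str, message: str) -> None:
        """Actual use of the messaging system, allowed only after allocation."""
        if identity not in self._publishers.get(queue_name, set()):
            raise PermissionError(f"{identity} may not publish to {queue_name}")
        for callback in self._subscribers.get(queue_name, []):
            callback(message)


exchange = Exchange()
inbox: List[str] = []
exchange.enroll("sensor-1")
exchange.enroll("archive")
exchange.allocate_subscription("archive", "ctd", inbox.append)
exchange.allocate_publisher("sensor-1", "ctd")
exchange.publish("sensor-1", "ctd", "sample-001")
```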
For the selected technologies, what relationships do you have with the various vendors to ensure any and all issues with their products are handled in a timely manner? (addressed elsewhere?)
What are the benefits of implementing two versions of the capability container? Michael says: Shows we're not limited on implementation technologies, which will change. We want to take advantage of elements in both architectural packages; this gives direct access to the capabilities of the language. Matt says: This also enables the user base to be able to develop their own functionalities and services in a supported language.
How much of the architecture has been prototyped and how many of the major risks have been addressed? (see next)
How many of those technologies have been integrated into prototypes, and how much of the architecture has been prototyped? Michael says: We've split the architecture into elements that are evaluated based on risk. From those risks, we developed prototypes (during pilot period and inception phase) to address them. We have prototyped broker, DIF, Capability Container, initial FIPA work, Rules engine in that context, Redis attribute store, and several other prototypes.
Of those high risk items, how many have been addressed? John says: Will address explicitly Thursday; many high risk elements have been reduced to medium or low. Matt says: Almost every risk on the Risks slide has been touched, some multiple times. The order hasn't changed in the mitigation, but we understand the technologies more.
In designing the architecture, have we considered how to leverage existing off-the-shelf technology (a lot of this is relatively conventional)? Michael says: OOI is designed to be an integration project of existing technologies. We have advanced visions to make it transformative, which drives many COI elements, but most of the rest includes existing technologies to be integrated. You see this much more in other subsystems—for example, iRODS in the Data Management subsystem is a large body of functionality that we merely must interface.
It sounds like we've done a lot of work, why are they still high risk? Matt says: Evaluate risk on 2 dimensions, likelihood and consequence. The latter is high, even when likelihood is moderate, so that keeps risks high.
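Matt's point (a high consequence keeps a risk rated high even when likelihood is only moderate) can be illustrated with a simple two-dimensional score. The 1 to 5 scales, the multiplication rule, and the thresholds below are assumptions for illustration only; the notes do not say how the OOI risk register actually computes ratings.

```python
def risk_level(likelihood: int, consequence: int) -> str:
    """Rate a risk from two 1-5 scores (hypothetical scale and thresholds)."""
    if not (1 <= likelihood <= 5 and 1 <= consequence <= 5):
        raise ValueError("scores must be between 1 and 5")
    score = likelihood * consequence
    if score >= 12:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


# Moderate likelihood (3) with the highest consequence (5) still rates "high".
moderate_likelihood_high_consequence = risk_level(3, 5)
```

Under such a scheme, prototyping that lowers likelihood from 4 to 3 would not change the rating of a risk whose consequence stays at 5.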
Slide 43 (2940-00063 OV6 COI) seems to represent a lot of complexity risk? Michael says: This is a layered diagram, showing all the different parts of the stack. It could be drawn more simply, or with more detail. Von says: This represents a deeper level of thinking about issues before it is handed off to the coders than may usually be the case. Matt says: Think about multiple aspects: number of processes involved, possibility of communication faults that require addressing,.... One of the things this reflects is the transition from the Application developer owning all these pieces, to moving them into the infrastructure, and the necessary allocation of responsibility to all these lower level processes. (Note the nature of the banking system, and the EC2 processes working through queues, in case processes go down.) Michael says: The number of distributed processes in the drawing is actually not that high. Most of the arrows represent the equivalent of event handler calls in the capability container.
The other part is examining where fault recovery plays a role, because those will kill you if not recovered from.
Will resource registry (for searching) be distributed, or highly centralized? Michael says: We are thinking of an implementation like a Redis key-value store, with additional registration overlaid, then the back end can be one or several registries (potentially federated so that a discovery service discovers both). What consistency guarantees are being considered? Matt says: Local consistency appears to be fine for most data. Underlying model has notion of an owner, serial access/consistency means material can be brought back to owner. Michael says: Versioning enables consistency based on keyed versions.
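Michael's remark that versioning enables consistency based on keyed versions can be sketched with a small versioned key-value registry. This is an in-memory illustration, not Redis and not the COI registry design: the class, its methods, and the optimistic `expected_version` check are hypothetical.

```python
from typing import Any, Dict, List, Optional


class VersionedRegistry:
    """Hypothetical key-value registry that keeps every version of a value."""

    def __init__(self) -> None:
        # key -> list of values; version n is stored at index n - 1
        self._store: Dict[str, List[Any]] = {}

    def current_version(self, key: str) -> int:
        return len(self._store.get(key, []))

    def put(self, key: str, value: Any, expected_version: Optional[int] = None) -> int:
        """Append a new version; reject the write if the caller's view is stale."""
        if expected_version is not None and expected_version != self.current_version(key):
            raise ValueError(f"stale write to {key!r}")
        self._store.setdefault(key, []).append(value)
        return self.current_version(key)

    def get(self, key: str, version: Optional[int] = None) -> Any:
        """Return the latest value, or the value at an explicit version."""
        versions = self._store[key]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{key!r} has no version {version}")
        return versions[version - 1]


registry = VersionedRegistry()
v1 = registry.put("instrument/ctd-1", {"state": "registered"})
v2 = registry.put("instrument/ctd-1", {"state": "active"}, expected_version=1)
```

Reading by (key, version) gives every consumer the same answer regardless of later writes, which is the sense in which keyed versions support consistency here.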
In a big remote sensing program that required campaigns to recompute data products, the entire output of a campaign had to be consistent. So a heads-up as to the complexity: multiple keys may be needed, for example. (Robert H.)
Nice to see building on proven technologies. You're building the architecture and various prototypes, how do you validate them? Do you give test cases to community to try out? Michael says: We are validating through continuous iterations and community involvement. It is the intent to make these systems exposed to the external users. Matt says: The reason for scheduling some of these activities and layers at this stage is to enable the presence of the consumer.
Depending on what's running in the container, Java may be the best choice, for example for compatibility. Matt says: At this point, we're focusing on internals; performance can be optimized later. There is the notion of a service that fronts a stand-alone application like BLAST, avoiding the need to touch the code. That also isolates the networks: service wrappers wall off the application and provide a facade.
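The service-wrapper idea can be sketched as a facade that runs an unmodified program as a child process and exposes only a request/response call. This is an illustration under stated assumptions, not the project's wrapper: the class name `CommandLineService` is hypothetical, and a tiny Python one-liner stands in for the stand-alone application.

```python
import subprocess
import sys


class CommandLineService:
    """Facade that exposes a stand-alone program as a callable service."""

    def __init__(self, argv):
        self._argv = list(argv)  # fixed command line of the wrapped program

    def handle(self, request_text):
        """Send the request on stdin and return the program's stdout."""
        result = subprocess.run(
            self._argv,
            input=request_text,
            capture_output=True,
            text=True,
            check=True,  # a non-zero exit status raises CalledProcessError
        )
        return result.stdout


# Hypothetical stand-in for an unmodified application: a one-liner that
# upper-cases whatever it reads from standard input.
UPPERCASE_APP = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

service = CommandLineService(UPPERCASE_APP)
reply = service.handle("acgt")
```

Callers only see `handle()`; the wrapped program's code, libraries and network access stay behind the facade.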
Service should be implemented as a function of need. User should not have to change their workflow. If in Matlab, should be able to use that tool. How easy is it to support a whole slew of applications, with all those libraries? Matt says: We don't have to put all our processing code inside those applications, things like CILogon show we are moving to communication channels as the connection point. Allows the OOI network to be onboard with the local computer.
So it will appear as a VPN? Matt says: Effectively. Either you use the browser, which provides security context, or you'll bring a process onto the box like a VPN client.
A lot of external technologies are open source. What relationships have you pursued with vendors? Matt says: Our choice of technologies focuses on growing vibrance, resting on open source crowd use (responsiveness, community size).
--------------------------------------------------------------------------------------------------------------------------------------
CI ARCHITECTURE
PDF of diagrams is at http://www.oceanobservatories.org/spaces/download/attachments/21205028/2010-02-22_EA-Model_Diagrams.pdf (tested and working!)
Navigable HTTP export is at http://ooici.net/eaexport
Release 1 includes platform direct access? (does not include platform agent, right?)
Michael writes: Neither the platform agent nor the instrument agent is required to implement direct access to instruments (serial, IP) and host platforms (terminal, ssh). Direct access to both instruments and platforms is in the scope of R-1.
What role does CI Architecture have in the change control process?
Michael writes: The elements of the CI architecture are under change control. This includes the narrative with illustrations and each individual specification drawing. As such, every change to the architecture needs to be approved in the change control process, typically around milestones such as LCO and LCA in a packet. Depending on the scope of the change, different CCB levels are involved. The architecture covers all relevant design decisions and interfaces. Therefore the architecture is the central mechanism for baselining and change control. The architecture documentation ends at a certain level of detail (fine design), which is left to the implementer and not directly subject to change control.
Does the group have a Communications person to develop the "Story"?
Michael writes: The architecture team develops the story from the architecture point of view; they also relate the story to requirements and to the architecture elements. The project scientist projects the views of the community. A PR-quality presentation of the story will need to be developed by the communications team; its roots are coming together now.
How much detail in the Enterprise Architect - All the drawings? How fine a detail, and when does the effort to track and document get in the way and become counterproductive? Who is the primary audience for this? It appears as though there is too much detail and effort on documentation. John writes: The drawings in Enterprise Architect are developed to the point that the development team can pick up those descriptions and produce software to create a working system, based on the descriptions. (The development team contributes to the development of the architectural detail, as it refines the common understanding of what is necessary.) There are two primary audiences: the development team inside the project, including the architecture team, who need to understand each other's work and the shared view of the system; and the external reviewers and observers at meetings like these, who may want or need to understand the organization of the system. The emphasis on communication artifacts results from the number of project participants, from many different institutions, who must collaborate on the development of the system.
How does the group "track" and learn about new technologies? Is there a formal plan for this or is it more serendipitous? Maybe some sort of technology working group or effort which identifies a small subset of folks who periodically explore and discuss opportunities. John writes: On the logistical level, technologies are tracked in a large shared spreadsheet on Google docs. (A copy of the spreadsheet was distributed as part of the review artifacts.) The activity is both formal and informal. Key aspects include the identification of leaders in each discipline, who bring their own technologies and expertise, and their awareness of other related technologies; research and discussion with each other and the related communities, as development proceeds; and the continued management of our shared knowledge through the common spreadsheet. In addition, the workshops held over the last few years have been explicitly constructed to elicit additional knowledge from a wider community.
OOI Integrated Observatory (OOI ION)
OOI ION: What data products will be ingested, streamed and cataloged?
OOI ION: What does provide initial instruments integration mean? What instruments, when for Release 1? John writes: Three prototype instruments are being identified to validate the Release 1 system. Throughout Release 1, we will develop the instrument interface software (including the Instrument Agents, representing a higher level of abstract access to the instruments, and the software drivers for the instruments) and will be testing that software with the prototype instruments. Instrument control and data acquisition will be possible. The actual user interfaces for defining an instrument and moving it through its life-cycle (with various testing and deployment steps) will not be part of R-1. The actual deployment of instrument agents on a platform and the interaction with the platform agent are not in the scope of R-1 either.
OOI ION: Are there identifiable or incremental capabilities that can be "rolled out" prior to the Release 1? That is, are there capabilities that can be deployed or tested by users before the release is finalized? Perhaps at Alpha or Beta?
Michael writes: The Data Exchange is a collaboration project with NOAA/IOOS and targets an early deployment of CI technologies and functions to a limited audience. This could be an avenue to elicit user feedback before the actual release.
There was mention of no single/centralized data store. What is the plan/strategy to archive data over the life-span of the project (20+ years). Will all data collected over the life of the observatory be available "on-demand"?
JBG writes: Data store is not single/centralized, but there is an extensive strategy for archiving data over time. Eventually we anticipate, and rely on, a national data storage paradigm taking ownership of data storage.
This means no satellite data? What about model data? Matt says: We are not capitalizing the development or storage of models; that is not in our scope. Satellite data transiting across our network is cached according to popularity in a period of time, but aged, uninteresting data sets will be phased out. We understand the sizes involved, and the storage is being managed in a progressive way. Long term storage will be carried by the responsible community.
How many L4 requirements are being addressed in R1? Out of how many total?
Network Deployment Diagram: Mid-Atlantic should be Pioneer Array; Southern Sea should be Southern Ocean.
List of release 1 products looks like laundry list, not clear how much will be done in year 1? Michael says: Subsystem presentations will help go into this detail. Matt says: We could focus on any 2 of these for the entire year, but consider the system a progressive effort, with the shape of this and subsequent releases determined by this team that includes the reviewer.
Long term data requirements for OOI are huge. The assumption that we're addressing this sufficiently should be more fully evaluated. Matt says: We believe that the scope of networking and distribution infrastructure, and work with partners, realizes an easy way to manage growth as needed, while the cost of storage continues to halve every year (not abating).
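Taking the "halve every year" figure at face value, the arithmetic behind this argument can be sketched as follows. The constant yearly volume and the unit prices are assumptions made for illustration only; they are not project numbers.

```python
def unit_cost(initial_cost, years, yearly_factor=0.5):
    """Cost of one unit of storage after `years` of yearly price scaling."""
    return initial_cost * yearly_factor ** years


def total_cost(initial_cost, yearly_volume, years, yearly_factor=0.5):
    """Total spend when `yearly_volume` new units are bought every year."""
    return sum(
        yearly_volume * unit_cost(initial_cost, year, yearly_factor)
        for year in range(years)
    )


# Hypothetical example: one unit of new storage per year for 20 years,
# at a starting price of 1.0 per unit.
twenty_year_spend = total_cost(initial_cost=1.0, yearly_volume=1.0, years=20)
```

Under these assumptions the 20-year spend stays below twice the first year's cost. If the yearly volume itself doubled every year, the spend per year would stay flat instead, so the conclusion depends on how fast the data volume grows.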
Is there a provision for intermediate smaller releases in between? Matt says: Yes, bug fixes are possible, but to the extent these take resources, they come out of contingency. John says: We will have the capability embedded in our system to fix bugs quickly and roll out releases. Matt says: There is another level of more detailed iteration (not shown) in which the pace of work is established every 8 weeks (very timeboxed). Michael says: Whole process is supported by service oriented architecture, incremental roll out of different service versions is possible.
Unified process is sensitive to granularity of use cases and length of cycles. There is a disconnect between design elements and the high-level use cases. What is the connection between high-level use cases and the granularity of specific capabilities? Michael says: Subsystem presentations will speak to more detailed use cases; more material exists behind the present material.
How closely are the users involved in the process of developing use cases? Michael says: There have been extensive workshops with many users which have resulted in considerable material developed and integrated into much of this material. Workshop page at http://www.oceanobservatories.org/spaces/display/WS/ consolidates this information, in consistent sets of resources that are fully documented. Matt says: From workshops we went to prototypes to reflect combinations of technologies and concerns, to try to avoid dead ends. These prototyping results are summarized at http://www.oceanobservatories.org/spaces/display/CIDev/Prototyping/
There was a statement with respect to off-cycle releases and how that would be addressed. The answer mentioned use of contingency and diverting development resources. Once a release is deployed, why would that effort/budget not be covered by the O&M effort? John writes: Even if the issue can be addressed in the short term,
--------------------------------------------------------------------------------------------------------------------------------------
USE CASES
Discussion of normalizing data model didn't address what normalization is and why OOI wants to do it. Would an ontology obviate the need to normalize the OOI data model to a particular, existing data model (such as CDM)?
John G writes: The purpose is to make it possible to work with all the data sets using a consistent data structure and semantic content (terminology). Ontologies are potentially a very useful representation framework for standardizing the semantic content, but community-specific knowledge must be encoded in the ontologies. People have worked on describing standards and data structure using ontologies (using ontologies as a language to describe the data models), but this is at best extremely challenging. (Would be happy to discuss this further.)
Michael writes: Normalization is the activity of transforming encoding (what are the bytes), format and structure, and the semantic interpretation into the OOI selected canonical form. The necessity of doing this is (a) to be able to transform from any data representation into any other data representation, and (b) to reason about any kind of dataset in a uniform way. I don't think that the use of ontologies "on-the-fly" already gets us there. However, ontologies are very useful to support the transformations in and out of the canonical form.
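Point (a) can be illustrated with a hub-and-spoke converter: each format needs only one reader into and one writer out of the canonical form, instead of a converter for every pair of formats. The sketch below is an assumption-laden toy. The canonical form (a list of dicts with float values) and all function names are hypothetical and unrelated to the actual OOI canonical form or CDM.

```python
import csv
import io
import json

# Canonical form assumed for this sketch only: a list of records, each a
# dict mapping a variable name to a float value.


def csv_to_canonical(text):
    rows = csv.DictReader(io.StringIO(text))
    return [{name: float(value) for name, value in row.items()} for row in rows]


def json_to_canonical(text):
    return [
        {name: float(value) for name, value in record.items()}
        for record in json.loads(text)
    ]


def canonical_to_csv(records):
    # Assumes at least one record, whose keys define the columns.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


def canonical_to_json(records):
    return json.dumps(records)


READERS = {"csv": csv_to_canonical, "json": json_to_canonical}
WRITERS = {"csv": canonical_to_csv, "json": canonical_to_json}


def convert(text, source, target):
    """Convert between any two registered formats via the canonical form."""
    return WRITERS[target](READERS[source](text))


as_json = convert("temp,salinity\n10.5,35.0\n", "csv", "json")
round_trip = convert(as_json, "json", "csv")
```

Adding a third format means adding one reader and one writer; every existing format can then be converted to and from it. Point (b) follows because any analysis written against the canonical records works for every registered format.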
****************************************** UPCOMING TOPICS **************************************
--------------------------------------------------------------------------------------------------------------------------------------
- Index to Presentations (including links to slide presentations) at http://www.oceanobservatories.org/spaces/display/syseng/Index+to+Presentation+Outlines
From chat session:
February 25, 2010 8:28 (unnamed): The Amazon EC2 Service Level Agreement commitment is 99.95% availability for each Amazon EC2 Region.
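For scale, an availability percentage can be turned into an allowed amount of downtime. The sketch below assumes the figure is measured over a 365-day year, which the chat note does not state; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Maximum downtime in hours that still meets the availability target."""
    return period_hours * (1 - availability_percent / 100)


yearly_downtime = allowed_downtime_hours(99.95)
```

Under that assumption, 99.95% availability allows about 4.38 hours of downtime per year.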
PRELIMINARY FINDINGS
****************************************** PAST TOPICS *********************************************
INTEGRATION AND TEST/VERIFICATION PLAN
Who is in charge of the integration of CI with Marine IO components?
John: The initial steps are at the level of CI, then program-level integration is managed by COL. Bill: There is a notion of formal deployment and acceptance of a CI release, which is COL's responsibility.
What is the validation step with users?
John says: The different milestones (LCO, LCA, IOC, Release) have different stakeholders and are under different authority. The program-level acceptance process is 30 days after a CI release milestone.
INTERFACE MANAGEMENT
Are the EA models CI diagrams or OOI diagrams? In other words, have these been coordinated w/ the marine IOs?
Michael writes: The current EA model is a central OOI model, which is server based, has exclusive authoring locks for SEs, and central consistency and archiving. The model is split into compartments. There is currently one OOI-level part, authored by OL (R. Howard), coordinated among all IOs, and an OOI-CI part, authored by CI (managed by M. Meisinger), internal to CI. There are cross links from the OOI diagrams and elements to the CI elements. In addition, DOORS requirements have been imported as a third compartment into the model, with cross-references to CI design elements.
The two Marine IOs are not yet using the same mechanism.
Have the OOI level EA diagrams been coordinated and vetted with the Marine IOs?
Jack says: Yes, and they are subject to configuration control.
Need for Inter-IO risk assignments and cross-links?
Matt says: This is currently being talked about on the PM and SE level.
RISK REGISTER/MANAGEMENT
Explain highest risks currently:
Shared domain vocabularies
What else can be done to mitigate if prototyping is not immediately effective?
There is a comprehensive plan in the works. These technologies are not in the scope of release 1, but rather 2-3, so that immediate action is not necessarily required.
Distinguish strategic and tactical risks. At any time, risks can be pulled forward and resources invested to mitigate.
How often does the risk board meet?
Matt says: Every two weeks in alternation with the change control board. This is too often but required currently in the startup phase. Detailed technical risks are not immediately part of this process, but are expected to be included within the next 6 months.
Instrument integration strategy?
This original risk was split into two (the other is instrument management strategy).
The risk targets the articulation of the strategy and the architectural elements. Work currently going on at the OL level and in the CI Office of S&A is targeting this risk.
CYBERPOPS AND NETWORK
What in the network design is being done to address cybersecurity?
Michael writes: The DIF architecture and implementation(s) directly address cybersecurity. A DIF is a closed network (directly on Layer-2 hardware or on the application Layer-7) that requires explicit enrollment of only trusted identities. Stacking multiple DIF networks on top of one another makes it possible to widen or narrow the scope of reachable trusted participants.
Are there additional measures being done with hardware?
Matt says: The inner ring is an MPLS cloud that enables complete governance of resources. Redundant circuits are planned and can be governed. Very strict routing and direct control.
At the distribution sites are firewalls towards the internet
Within the network are mechanisms of intrusion detection and monitoring software
On the software level, the choice of all internal communication through AMQP (the Exchange and the cap container) enables the enforcement of governance and security.
Do firewalls limit throughput?
Are you involved with advanced networking activities such as in joint Tecs (?)
Matt says: I am the network expert, with extensive experience operating highly available, secure, scalable systems. The operations manager will take over. This is not a research part of OOI-CI; it is "only" a deployment challenge. We leverage relations with NW GigaPop, FutureNet.
Release-1 roll out, Woods Hole network access point?
Matt says: CI has the responsibility of putting a (smaller) Acquisition CyberPop there. Bandwidth coming in is limited by satellite, but there is a large burst of data when ships come back from service cycles. CG has the choice of computing there, but all data are recovered to Portland, the central reliable archive. There is also a network connection from Woods Hole into the high bandwidth national circuit network.
--------------------------------------------------------------------------------------------------------------------------------------
USER EXPERIENCES
iTunes organizing metaphor, used as "container" - possible UI target for release 1
Software agent (Dorthy): living on searches. They do crawling continuously and update the result set. In our data world, imagine a DAP server infrastructure that continuously fetches new datasets and metadata on behalf of the community.
Roy: The OOI team needs to start thinking about how to implement a search strategy: limiting search or expanding search. Faceted searches limit the search in a certain way.
Event response behaviors, advanced search mechanisms: They will first have an operational presence in the system; then OOI will make it available to its users and create situational presence.
Evaluate trust: crowdsourcing. Trust can be created by accumulation (multiple untrusted individuals reporting a coherent story)
Software is not a user (in the way that an application is not a user of an operating system).
Matthew says: Computing is changing. Up until recently the only things we did were viewing and manipulating. Now we are seeing a third core verb emerging: participation. We have to start thinking of software as "participating". This piece of software is acting on behalf of an institution, organization, individual. The agent. The agent is a participating resource (besides information and taskable resources).
Did CI look at all at methods allowing users to define and design their own user interfaces?
Susanne: This is the idea we are getting at. GoogleWave is along this way.
Matthew: Find a way of providing user provided interfaces within the system so that other users can make them part of their toolchain.
Ingolf: Interaction interface specifications are key to enabling these ideas.
Do you need an OOI approval to get data or an app on an OOI-iTunes?
Matthew says: The reason why we support different facilities is to support different communities with different policies and governance. The OOI-store most probably will require OOI certification. A public store (reapplying the same mechanisms) is the decision of the community to set the policy for certification etc.
The following is quite interesting especially the live video inclusion in the back portion. Look at how the orientation of the camera is calculated on the fly. Between this and Google Earth/Ocean we have some very significant geo-referenced application platforms to operate on top of.
Virtual Earth + Seadragon + Photosynth = http://www.ted.com/talks/blaise_aguera.html
DATA MANAGEMENT SUBSYSTEM DEMO
Which version of PyDap? David says: New version, 3.0rc8
Caching the whole data set or just the request? Matt says: For now, the whole data set. We have a model for representing partial sets.
What are the criteria for notification? Paul H says: In this case just arrival of a new data set. But has been more refined in the past, and (John says) can be very refined.
Which entity/activity generates the notification? Paul H says: The event notification service sends it out. The arrival of data at the data store is the event that triggers the notification.
Suggestion for service notifications going directly to the web page using a URL. Paul H says: Users can do that with very little trouble (so we can show that going forward).
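The notification flow described in this demo (a data set arrives at the data store, the event notification service sends it out to subscribers) can be sketched in-process as a small publish/subscribe hub. This is not the demo's code: the class names, the event name `dataset.arrived` and the data set name are hypothetical, and a real deployment would send messages between processes rather than call functions directly.

```python
from collections import defaultdict


class NotificationService:
    """Tiny in-process publish/subscribe hub."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # event type -> callbacks

    def subscribe(self, event_type, callback):
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, payload):
        for callback in self._subscribers[event_type]:
            callback(payload)


class DataStore:
    """Stores data sets and announces each arrival."""

    def __init__(self, notifications):
        self._notifications = notifications
        self._datasets = {}

    def put(self, name, data):
        self._datasets[name] = data
        # The arrival itself is the event that triggers the notification.
        self._notifications.publish("dataset.arrived", {"name": name})


inbox = []  # stands in for a user's notification channel
hub = NotificationService()
hub.subscribe("dataset.arrived", inbox.append)
store = DataStore(hub)
store.put("sst-2010-02-25", [10.5, 10.7])
```

More refined criteria, as mentioned by Paul H and John, would amount to subscribers filtering on the payload or subscribing to more specific event types.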
The Data Exchange prototype shows some core concepts of developing and deploying distributed systems. Developers can run a set of multiple distributed processes in their local development environment (e.g. their desktop) in a constrained environment. The system is still a fully capable distributed system, communicating asynchronously. The operational deployment is similar, except that different physical servers host multiple instances of each service.
Matthew says: This enables OOI to apply pod-based deployment and update (the eBay model): replace 10% of service instances with a new version, then watch and test for a period of time to see if this version survives. Then replace a 50% pod and see if this survives, and then do the rest. This substantially reduces the risk of exposing users and the community to new versions.
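The 10% / 50% / rest sequence can be sketched as a loop over growing pods with a health check after each stage. The rollback-on-failure behavior, the function names and the health checks are assumptions for illustration; the notes only say that each stage is observed to see whether the new version "survives".

```python
def staged_rollout(instances, new_version, is_healthy, stages=(0.10, 0.50, 1.00)):
    """Upgrade instances in growing pods; roll back if a stage fails.

    instances: dict of instance name -> version, updated in place.
    is_healthy: callable(instances) -> bool, the survival check per stage.
    Returns True if the rollout completed, False if it was rolled back.
    """
    original = dict(instances)
    names = sorted(instances)
    upgraded = 0
    for fraction in stages:
        target = max(1, round(len(names) * fraction))
        for name in names[upgraded:target]:
            instances[name] = new_version
        upgraded = max(upgraded, target)
        if not is_healthy(instances):
            instances.update(original)  # restore every instance
            return False
    return True


def count_upgraded(fleet, version="v2"):
    return sum(1 for current in fleet.values() if current == version)


fleet = {f"svc-{i}": "v1" for i in range(10)}
completed = staged_rollout(fleet, "v2", is_healthy=lambda f: True)

fragile = {f"svc-{i}": "v1" for i in range(10)}
# Hypothetical health check: the new version fails once 5 instances run it.
rolled_back = not staged_rollout(
    fragile, "v2", is_healthy=lambda f: count_upgraded(f) < 5
)
```

In the second run only the first pod (one instance out of ten) ever serves users on the new version before the failure at the 50% stage restores the old one.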
--------------------------------------------------------------------------------------------------------------------------------------
DATA MANAGEMENT SUBSYSTEM
data set: a fixed bounded thing covering time and space
data stream: blocks of information coming back unbounded in space and time
This is a limited definition, which is helpful for the scope of this document. It does not represent our full understanding of this idea.
What user data management policies can be controlled in Release 1?
Michael writes: Probably very few. This is mostly in the scope of R2.
Where will this fail? David says: Goes back to inventory, building that on Redis.
Inventory is a registry? Basically yes. Representation of COI registry, extending it (or restricting it).
COI Registry is also Redis? That is what we are working with, but not the final choice.
It has some fault tolerant capabilities, relating to master-slave. Matt says: More in Michael's realm.
What role does Bill Howe's work play in the system? David says: For release 1, we are targeting the necessary IOOS functionality. It will probably be too ambitious to go after that; CDM will cover R-1 needs. But we are after the kind of separation of concerns supported by Bill H's approaches. Matt says: There is an underlying notion that we can get away with a few canonical forms that can address a few functions. (Informed by ERDDAP work, which showed plausibility.) The fundamental jump is to ask whether we can have a first-class representation of topological structures and a corresponding implementation. Bill H provides degenerate cases and n-dimensional structure. With Redis set capabilities and formal representations of topological representations, this could be really good by the end of Release 3. We understand this is an unsolved problem, but with Roy's guidance and Bill's technology we think
What is our definition of data, and is the system sensitive to that? We also use models, and you're proposing dealing with them. Matt says: We're trying to solve higher level problem beyond even models. Went through unsuccessful exercise in 2009, which helped us recognize the problem, difficulty of achieving it in this time period. David says: Collaboration lets us show this at scale, for both observations and models. (Voltages and assays.) Roy says: Note this is not parameter-specific. David says: Is more based on topology.
Said you have to get model right, high risk. For first release for this group, how soon does it have to be right? John says:
SENSING AND ACQUISITION SUBSYSTEM
Instrument providers means? Michael says: This is speaking loosely as a broad class (operators, scientists, etc.), not meant as vendors per se.
Recommend reviewing consistency of the proposed Release 1 functionality with the L4 requirements in DOORS (specifically ensuring the release tags are up-to-date).
Will the entire use-case for direct access be implemented in R1? To include the UI and the OMS interface? Arjuna and Michael say: The UI is not really a part of Release 1, but very limited UI and management interfaces (not of final deployed quality) will necessarily be present.
Where will an example like SIAM run? John says: 2 components, driver and middleware. The drivers, to the extent they are ported, will run in a capability container close to the instrument. The middleware aspects will be distributed to components like the Instrument Agent, which also may be a part of the capability container. Michael says: Capability Containers will be of different flavors and level of functionality, as needed in the particular case—some may be written in C and operating in the marine platform, for example. Arjuna says: We could have some of these components packaged for use in the AUVs, for example.
Will the
--------------------------------------------------------------------------------------------------------------------------------------
COMMON EXECUTION INFRASTRUCTURE Demo
How often do the Amazon services perform unexpectedly? And is there an identified risk/concern over the reliability and availability of these services?
Discussion of the work that would be necessary to convert our code to other platforms (whereas these commands can be run on several platforms). This yielded the information that the Microsoft environment (supported apps) has evolved significantly from its state 10 months ago.
Where is it being run from? Tim says: On the cloud. Matt says: Nice to keep all the management contexts operational even when site went down. Michael says: First and foremost is to make sure resources are there highly available. Matt says: Implications on capitalization of equipment. Redundancy investments are not needed, they can be taken from the cloud; this is core goal of release 1.
While we have redundancy of services, we want redundancy of persistent data. What is our data consistency model? Matt says: Eventually want consistent data basic model. Used ERDDAP deployed across multiple computers in the cloud. Many applications are not necessarily designed for sharing, so an acquisition phase is needed in the technology integration phase to operate at scale.
Jack says: Calculation of reliability information requires count of number of machines. John says: Reliability shouldn't be a function of individual machines.
Amazon avoided because open cost model (IO charges) is unpredictable. Matt says: What we're doing to fix that is to establish a peering relationship with 10Gig circuit off OOI backbone while minimizing transit costs. Talking about separate deal with Amazon to not charge ingress/egress. (Educational model was original driver for that.) Kate says: This helps with both cost and performance.
--------------------------------------------------------------------------------------------------------------------------------------
COMMON EXECUTION INFRASTRUCTURE SUBSYSTEM
There are existing technologies that can do all of this, like Condor. Have you considered adopting one of those existing technologies, developed by experts? Michael says: Yes, I have just introduced how everything fits in, but Kate Keahey is our expert who will be presenting the details of the technologies we are applying.
How does the Elastic Processing Unit (EPU) load balancing work?
Michael writes: The EPU controller receives work request messages and routes them to worker instances. It is also possible that work messages are not routed through the controller, but that workers take work messages from a work "message queue" in the COI Exchange directly.
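The two options Michael describes (the controller routing each message, or workers pulling from a shared work queue) can be contrasted in a small sketch. Python's in-process `queue.Queue` and threads stand in for the message queue in the COI Exchange and for worker instances; round-robin is just one assumed routing policy, and all names are hypothetical.

```python
import queue
import threading


def route_round_robin(work_items, worker_names):
    """Controller-style routing: assign items to workers in turn."""
    return {
        item: worker_names[index % len(worker_names)]
        for index, item in enumerate(work_items)
    }


def run_workers(work_items, worker_count):
    """Queue-style: workers pull items themselves; return who handled what."""
    work_queue = queue.Queue()
    for item in work_items:
        work_queue.put(item)

    handled = {}  # item -> name of the worker that took it
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                item = work_queue.get_nowait()
            except queue.Empty:
                return  # queue drained, worker stops
            with lock:
                handled[item] = name

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled


routed = route_round_robin(["a", "b", "c"], ["w0", "w1"])
pulled = run_workers(list(range(20)), worker_count=3)
```

With the pull model the balance emerges from how fast each worker drains the queue, so which worker handles which item is not predictable; with controller routing the assignment is decided up front.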
Is the CEI only used for elastic scaling on demand with very short reaction times?
Michael writes: No, not only. In Release-1, the biggest stress case is actually the initial start and control of the core CI services, with most emphasis on high availability and very little scaling to demand. The reaction times do not matter that much there. Nonetheless, new instances need to be started automatically in case existing instances fail or need to be taken down for maintenance.
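The high-availability case (start replacements automatically when instances fail or are taken down) amounts to keeping a target number of instances running. A minimal sketch, assuming a hypothetical `Supervisor` class and a caller-supplied function that starts an instance; it says nothing about how the CEI actually provisions or monitors instances.

```python
import itertools


class Supervisor:
    """Keeps a target number of service instances running."""

    def __init__(self, target_count, start_instance):
        self._target = target_count
        self._start = start_instance  # callable() -> id of a new instance
        self.running = set()

    def reconcile(self):
        """Start instances until the target count is met; return new ids."""
        started = []
        while len(self.running) < self._target:
            instance_id = self._start()
            self.running.add(instance_id)
            started.append(instance_id)
        return started

    def report_failure(self, instance_id):
        """Drop a failed (or retired) instance and start a replacement."""
        self.running.discard(instance_id)
        return self.reconcile()


# Hypothetical instance factory that just hands out numbered names.
ids = itertools.count(1)
supervisor = Supervisor(target_count=3, start_instance=lambda: f"vm-{next(ids)}")
initial = supervisor.reconcile()
replacements = supervisor.report_failure("vm-2")
```

Scaling to demand would change `target_count` over time; for the Release-1 stress case described above the target stays fixed and only failures trigger new starts.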
Why should "will it work with COI?" be an issue? Can't COI create wrappers to smooth over any disconnects?
Can you define what you mean by a harness? Kate says: A set of applications that can test the various services. Matt says: Key point is that it takes a lot of processing units to generate demand for the other processing units.
One of the things that confused me was that you built a platform with (that). Kate says: Do they exist independently, such that we can configure at will? Do they run Linux? Tim says: Number of deployable types you can host is very small, suggesting be limited to that? Matt says: The answer is at a higher level. Characterized Infrastructure and Platform as-a-service -- the ones that limited the context were in the latter category. These are significant community resources, we will project ours as local services within those platforms. But we are projecting our applications in their native form to the larger services. We are an IaaS service ourselves; our strategy is to operate all these as one. Collaboration strategy goes uphill into PaaSs when community demands.
Example of Platform-as-a-Service: Microsoft Azure, Google Hadoop.
Planner is policy-configurable, what is method for configuring this, policy from yesterday? Michael says: Yes, they are related. Yesterday we focused on access policy and commitments. Another type of policy is resource scheduling, a subclass of policy following the agent pattern. The dots can be connected but not directly. Kate says: The demo after the break will show how this works directly. Michael says: Everything we heard yesterday is applicable. Matt says: The notion of being able to process new processes on a set of nodes, and the notion of enrolling in a community and have it managed -- you can see we can manage resources across cloud environments, and use domains of authority. Now a community that wants to do a joint campaign with models and assets can be provisioned and enroll in a secure group, not a localized space. Resources like instruments and models can be bound into this, along with analysis and simulation functions, on demand and without speaking to network administrators anywhere (or anyone else!). The interactive ocean observing will be carried out by communities ganging together, with our support.
Agrees on platform as a service issue. Service model has implications on computing model and systems work under it. Does it map properly to what applications need? Applications on uncertainty analysis of data coming out of sensors require very tightly coupled codes running in close coordination. Kate says: It has been proven that some science applications do well on these services, the question is how large. When the rubber hits the road we'll have to see if it will scale, hopefully sooner rather than later. The role of IaS is not to develop IaaS, but to help it happen. We're trying to adopt what's out there. Known challenges: networking to Amazon, for example.
Question is, large majority of applications will run nicely, for which model is great. Kate says: What about other case? Working on these other models. Matt says: 1) Set of assumptions projected on us -- need to articulate what those are for us, how they should be projected. 2) Projecting into tightly coupled resources requirements that we are not trying to solve. We do want to couple to that as if it's a process. It should look to us like a high-availability or operational unit; we don't schedule it, we tell TeraGrid to execute it. Through container environment? No, that's the agent, but we are not permitted to support the developing of models, but in R4 we can coordinate and control models. Same design we can do with instrument device. Agent handles presentation of device to network, some devices present themselves. Modeling app running on BlueWaters can be set up and coupled, but we're not making any statement on how it's managed on BlueWaters.
Suggestion: Lots of work going on everywhere, felt we were re-inventing things—talk about those other things specifically, to present a much better picture. Matt says: Network will show computational resources in multiple locations, representing clusters. I want to know where we want to be 5 years from now. Design for intent, build for demand. Drawings go way out in terms of time, but technologies are chosen for what's there. You'll see us do more design work, but that's to make the integration decisions.
Assume demos from yesterday is how agents architect.
General Question - is there a trace from L4 requirements to architectural elements and if so, where is that documented?
Michael writes: The L4 CEI subsystem requirements refer to the high level services of the CEI, which are the work packages of the CEI WBS. Specific technical naming within the architecture and design applies, such that the actual wording of the requirements might not use the same language as the architecture models. We have not yet established these trace links in the Enterprise Architect tool for CEI (for effort reasons), but will do so by LCA.
Question and Answer Period - Wednesday
What are the explicit objectives for the release (in more detail), and what is usable by Release 1?
Data distribution network usable by modelers (users of Release 1). Target community will be the users; start with the needs of the first users.
Will there be beta users? Starting from LCA work with early adopter communities. Notion of alpha and beta releases. Specific dates not set yet. Suggestion: collect specific people to be initial testers. Specifically IOOS and sensors are initial cases.
Question of charge - Committee evaluation should be from the view of the stakeholders.
Focus prototyping on high risk elements first.
Scope may be deferred because of risk. What might be deferred first? Highest risk elements are pulled forward first. Low risk elements can be deferred.
By what criteria were risks determined? - Looked at dependency risks, technical risks, estimated effort to accomplish. Evaluation of risk has been ongoing and intense. Primary driver of how work is scheduled and organized. What are the factors which create the risk? Those factors not articulated in this review yet. We did not provide overview of risk. Risk registry being used for high level assessment. Risk registry does not yet go down to the details of the risk as we start development. Documentation of risk lower levels is not there yet. Committee would like to see a walk through of the procedures for one of the prototypes presented.
Does service have human interface? If so, what is that expression? Many services may not necessarily be visible to most users, but all services will be designed to be accessed directly by sophisticated users.
There is a risk in adopting a standard (even if you have prototyped in the area). The risk is that it is a dead end in the evolution (think CORBA, which one program I was on adopted). The more mature a standard is, the less of one type of risk it has, but the higher the risk that it is end-of-life (think S-100 Bus). So, the question is: has OOI (CI) addressed this risk, and what level of risk is this (low/medium/high)? Considering the 20-year operational life, what is being considered to mitigate this risk?
Suggestion for future presentations and communications; more emphasis upon working relationships, collaborations, discussions with on-demand computing activities and approaches being done by TeraGrid, OSG, Condor, etc. This will help show that OOI is not only aware of similar work (and is working with others), but also to emphasize the fact that OOI is not reinventing or duplicating efforts or be viewed as "rolling their own" approach.
--------------------------------------------------------------------------------------------------------------------------------------
COMMON OPERATING INFRASTRUCTURE DEMONSTRATION
How do you handle the naming of the resources? Matt says: Not using an address, using a name as a stand-in. It is a controlled namespace.
Does this show the federated architecture yet? Michael says: Yes, in the CC demonstration there were interceptors and agents; it did not show much of the actual services themselves.
Good laboratory demonstrations. What about at scale? Matt says: Haven't put it under load yet. But it's built for scale, and we're building for distribution for scale. Is a good question for architectural review.
Michael says/writes: The message broker architecture is proven to allow for massive scaling of individual processes and message volume. The COI architecture is applicable to various degrees to interactions in the system. (1) The most extensive are governance interactions (negotiations). They occur only infrequently. (2) Then there are service invocations. They have less governance cost and can be scaled. (3) There can be raw message level interactions without any additional overhead.
COMMON OPERATING INFRASTRUCTURE SUBSYSTEM
Will large bandwidth data (e.g. HD video, hydrophones) also utilize the messaging infrastructure? Or is there a different "special purpose" mechanism?
Michael writes: Setting up the use of high bandwidth (e.g. start streaming, change angle) requires a very small interaction beforehand, which is where messaging with governance is applied. The actual use (sending large bandwidth data) does not have to go through governance and messaging (but can). It could be routed at the network level.
There is a mechanism to go outside the infrastructure for these messages? Michael says: Yes, once the control function is executed, there is no requirement that every message go through the infrastructure. Matt says: It turns out you can send more messages through this kind of messaging system without problems than people think.
Since this is a Release 1 activity, what information do you need to make the decision of which it will be and when do you need to receive the information?
Slide 29: What does 'Allocate PubSub' mean?
Michael writes: These are the actions that take place before actually using the messaging system. This means to subscribe to a "queue" in the messaging system, or to register as a publisher with the messaging system (or roughly opening a "connection" to another endpoint). In this diagram it occurs after registering (enrolling) as a communicator with the Exchange.
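Illustrative sketch of the ordering described above: a communicator first enrolls with the Exchange, and only then allocates pub/sub by subscribing to a queue or registering as a publisher. This is a toy in-memory model; the Exchange class and its method names are hypothetical and do not reflect the COI API or any broker library.

```python
class Exchange:
    """Toy in-memory exchange; all names here are assumptions."""

    def __init__(self):
        self.enrolled = set()
        self.subscribers = {}  # queue name -> list of callbacks
        self.publishers = {}   # queue name -> set of communicator names

    def enroll(self, name):
        # Step 1: register (enroll) as a communicator.
        self.enrolled.add(name)

    def _require_enrolled(self, name):
        if name not in self.enrolled:
            raise PermissionError(f"{name} must enroll before allocating pub/sub")

    def subscribe(self, name, queue_name, callback):
        # Step 2a: "Allocate PubSub" as a subscriber.
        self._require_enrolled(name)
        self.subscribers.setdefault(queue_name, []).append(callback)

    def register_publisher(self, name, queue_name):
        # Step 2b: "Allocate PubSub" as a publisher.
        self._require_enrolled(name)
        self.publishers.setdefault(queue_name, set()).add(name)

    def publish(self, name, queue_name, message):
        # Step 3: actual use of the messaging system.
        if name not in self.publishers.get(queue_name, set()):
            raise PermissionError(f"{name} is not a publisher on {queue_name}")
        for callback in self.subscribers.get(queue_name, []):
            callback(message)
```

The allocation steps happen once per communicator and queue; after that, publishing involves no further registration.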
For the selected technologies, what relationships do you have with the various vendors to ensure any and all issues with their products are handled in a timely manner? (addressed elsewhere?)
What are the benefits of implementing two versions of the capability container? Michael says: Shows we're not limited on implementation technologies, which will change. We want to take advantage of elements in both architectural packages, this gives direct access to the capabilities of the language. Matt says: This also enables the user base to be able to develop their own functionalities and services in a supported language.
How much of the architecture has been prototyped and how many of the major risks have been addressed? (see next)
How many of those technologies have been integrated into prototypes, and how much of the architecture has been prototyped? Michael says: We've split the architecture into elements that are evaluated based on risk. From those risks, we developed prototypes (during pilot period and inception phase) to address them. We have prototyped broker, DIF, Capability Container, initial FIPA work, Rules engine in that context, Redis attribute store, and several other prototypes.
Of those high risk items, how many have been addressed? John says: Will address explicitly Thursday; many high risk elements have been reduced to medium or low. Matt says: Almost every risk on the Risks slide has been touched, some multiple times. The order hasn't changed in the mitigation, but we understand the technologies more.
In designing the architecture, have we considered how to leverage existing off-the-shelf technology (a lot of this is relatively conventional)? Michael says: OOI is designed to be an integration project of existing technologies. We have advanced visions to make it transformative, which drives many COI elements, but most of the rest includes existing technologies to be integrated. You see this much more in other subsystems—for example, iRODS in the Data Management subsystem is a large body of functionality that we merely must interface.
It sounds like we've done a lot of work, why are they still high risk? Matt says: Evaluate risk on 2 dimensions, likelihood and consequence. The latter is high, even when likelihood is moderate, so that keeps risks high.
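Illustrative sketch of a two-dimensional risk rating of the kind Matt describes. The three-level scales and the score thresholds are assumptions chosen for the example, not the scheme used in the project's risk registry.

```python
# Assumed ordinal scales; the project's actual scales are not given in these notes.
LEVELS = {"low": 1, "moderate": 2, "high": 3}


def risk_rating(likelihood, consequence):
    """Combine likelihood and consequence into a single rating."""
    score = LEVELS[likelihood] * LEVELS[consequence]
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"
```

Under these assumed thresholds, risk_rating("moderate", "high") is "high": reducing likelihood from high to moderate through prototyping does not lower the rating while the consequence stays high.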
Slide 43 (2940-00063 OV6 COI) seems to represent a lot of complexity risk? Michael says: This is a layered diagram, showing all the different parts of the stack. It could be drawn more simply, or with more detail. Von says: This represents a deeper level of thinking about issues before it is handed off to the coders than may usually be the case. Matt says: Think about multiple aspects: number of processes involved, possibility of communication faults that require addressing, ... One of the things this reflects is the transition from the Application developer owning all these pieces, to moving them into the infrastructure, and the necessary allocation of responsibility to all these lower level processes. (Note the nature of the banking system, and the EC2 processes working through queues, in case processes go down.) Michael says: The number of distributed processes in the drawing is actually not that high. Most of the arrows represent the equivalent of event handler calls in the capability container.
The other part is examining where fault recovery plays a role, because those will kill you if not recovered from.
Will resource registry (for searching) be distributed, or highly centralized? Michael says: We are thinking of an implementation like a Redis key-value store, with additional registration overlaid, then the back end can be one or several registries (potentially federated so that a discovery service discovers both). What consistency guarantees are being considered? Matt says: Local consistency appears to be fine for most data. Underlying model has notion of an owner, serial access/consistency means material can be brought back to owner. Michael says: Versioning enables consistency based on keyed versions.
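Illustrative sketch of consistency based on keyed versions, as mentioned in Michael's answer. A plain dictionary stands in for the key-value store (no Redis client is used), and the class name, method names and the stale-write check are assumptions made for the example.

```python
class VersionedRegistry:
    """Toy registry: every write creates a new (key, version) entry."""

    def __init__(self):
        self._store = {}   # (key, version) -> value
        self._latest = {}  # key -> latest version number

    def put(self, key, value, expected_version=None):
        current = self._latest.get(key, 0)
        # Optional optimistic check: reject writes based on an outdated version.
        if expected_version is not None and expected_version != current:
            raise ValueError(f"stale write: {key} is at version {current}")
        version = current + 1
        self._store[(key, version)] = value
        self._latest[key] = version
        return version

    def get(self, key, version=None):
        # Readers can pin a version to get a consistent view, or take the latest.
        if version is None:
            version = self._latest[key]
        return version, self._store[(key, version)]
```

Because old versions are never overwritten, a reader that records the version it used can later re-read exactly the same entry, even if the owner has since published updates.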
A big remote sensing program that required campaigns for recomputation of data product, the entire output of the campaign has to be consistent. So heads up as to complexity, need for multiple keys for example. (Robert H.)
Nice to see building on proven technologies. You're building the architecture and various prototypes, how do you validate them? Do you give test cases to community to try out? Michael says: We are validating through continuous iterations and community involvement. It is the intent to make these systems exposed to the external users. Matt says: The reason for scheduling some of these activities and layers at this stage is to enable the presence of the consumer.
Depending on what's running in the container, Java may be best choice. For example, for compatibility. Matt says: At this point, we're focusing on internals, performance can be optimized later. The notion of a service to front a stand-alone application like Blast, avoiding the need to touch the code. That also isolates the networks, service wrappers wall off and provide a facade.
Service should be implemented as a function of need. User should not have to change their workflow. If in Matlab, should be able to use that tool. How easy is it to support a whole slew of applications, with all those libraries? Matt says: We don't have to put all our processing code inside those applications, things like CILogon show we are moving to communication channels as the connection point. Allows the OOI network to be onboard with the local computer.
So it will appear as a VPN? Matt says: Effectively. Either you use the browser, which provides security context, or you'll bring process onto the box like a VPN client.
A lot of external technologies are open source. What relationships have you pursued with vendors? Matt says: Our choice of technologies focuses on growing vibrance, resting on open source crowd use (responsiveness, community size).
--------------------------------------------------------------------------------------------------------------------------------------
CI ARCHITECTURE
PDF of diagrams is at http://www.oceanobservatories.org/spaces/download/attachments/21205028/2010-02-22_EA-Model_Diagrams.pdf (tested and working!)
Navigable HTTP export is at http://ooici.net/eaexport
Release 1 includes platform direct access? (does not include platform agent, right?)
Michael writes: Neither the platform agent nor the instrument agent is required to implement direct access to instruments (serial, IP) and host platforms (terminal, ssh). Both direct access to instruments and direct access to platforms are in the scope of R-1.
What role does CI Architecture have in the change control process?
Michael writes: The elements of the CI architecture are under change control. This includes the narrative with illustrations and each individual specification drawing. As such, every change to the architecture needs to be approved in the change control process, typically around milestones such as LCO and LCA in a packet. Depending on the scope of the change, different CCB levels are involved. The architecture covers all relevant design decisions and interfaces. Therefore the architecture is the central mechanism for baselining and change control. The architecture documentation ends at a certain level of detail (fine design), which is left to the implementer and not directly subject to change control.
Does the group have a Communications person to develop the "Story"?
Michael writes: The architecture team develops the story from the architecture point of view; they also relate the story to requirements and to the architecture elements. The project scientist projects the views of the community. A PR-quality presentation of the story will need to be developed by the communications team; its roots are coming together now.
How much detail in the Enterprise Architect - All the drawings? How fine a detail, and when does the effort to track and document get in the way and become counterproductive? Who is the primary audience for this? It appears as though there is too much detail and effort on documentation. John writes: The drawings in Enterprise Architect are developed to the point that the development team can pick up those descriptions and produce software to create a working system, based on the descriptions. (The development team contributes to the development of the architectural detail, as it refines the common understanding of what is necessary.) There are two primary audiences: the development team inside the project, including the architecture team, who need to understand each other's work and the shared view of the system; and the external reviewers and observers at meetings like these, who may want or need to understand the organization of the system. The emphasis on communication artifacts results from the number of project participants, from many different institutions, who must collaborate on the development of the system.
How does the group "track" and learn about new technologies? Is there a formal plan for this or is it more serendipitous? Maybe some sort of technology working group or effort which identifies a small subset of folks who periodically explore and discuss opportunities. John writes: On the logistical level, technologies are tracked in a large shared spreadsheet on Google docs. (A copy of the spreadsheet was distributed as part of the review artifacts.) The activity is both formal and informal. Key aspects include the identification of leaders in each discipline, who bring their own technologies and expertise, and their awareness of other related technologies; research and discussion with each other and the related communities, as development proceeds; and the continued management of our shared knowledge through the common spreadsheet. In addition, the workshops held over the last few years have been explicitly constructed to elicit additional knowledge from a wider community.
OOI Integrated Observatory (OOI ION)
OOI ION: What data products will be ingested, streamed and cataloged?
OOI ION: What does provide initial instruments integration mean? What instruments, when for Release 1? John writes: Three prototype instruments are being identified to validate the Release 1 system. Throughout Release 1, we will develop the instrument interface software (including the Instrument Agents, representing a higher level of abstract access to the instruments, and the software drivers for the instruments) and will be testing that software with the prototype instruments. Instrument control and data acquisition will be possible. The actual user interfaces of defining an instrument and moving it through its life-cycle (with various testing and deployment steps) will not be part of R-1. The actual deployment of instrument agents on a platform and the interaction with the platform agent are not in the scope of R-1 either.
OOI ION: Are there identifiable or incremental capabilities that can be "rolled out" prior to the Release 1? That is, are there capabilities that can be deployed or tested by users before the release is finalized? Perhaps at Alpha or Beta?
Michael writes: The Data Exchange is a collaboration project with NOAA/IOOS and targets an early deployment of CI technologies and functions to a limited audience. This could be an avenue to elicit user feedback before the actual release.
There was mention of no single/centralized data store. What is the plan/strategy to archive data over the life-span of the project (20+ years). Will all data collected over the life of the observatory be available "on-demand"?
JBG writes: Data store is not single/centralized, but there is an extensive strategy for archiving data over time. Eventually we anticipate, and rely on, a national data storage paradigm taking ownership of data storage.
This means no satellite data? What about model data? Matt says: We are not capitalizing the development/storage of models; it is not in our scope. Satellite data transiting across our network is cached according to popularity in a period of time, but aged uninteresting data sets will be phased out. We understand the sizes involved, and the storage is being managed in a progressive way. Long term storage will be carried by the responsible community.
How many L4 requirements are being addressed in R1? Out of how many total?
Network Deployment Diagram: Mid-Atlantic should be Pioneer Array; Southern Sea should be Southern Ocean.
List of release 1 products looks like laundry list, not clear how much will be done in year 1? Michael says: Subsystem presentations will help go into this detail. Matt says: We could focus on any 2 of these for the entire year, but consider the system a progressive effort, with the shape of this and subsequent releases determined by this team that includes the reviewer.
Long term data requirements for OOI are huge. Assumption that we're addressing this sufficiently should be more fully evaluated. Matt says: We believe that the scope of networking and distribution infrastructure, and work with partners, realizes an easy way to manage growth as needed, while the cost of storage continues to halve every year (not abating).
Is there a provision for intermediate smaller releases in between? Matt says: Yes, bug fixes are possible, but to the extent these take resources, they come out of contingency. John says: We will have the capability embedded in our system to fix bugs quickly and roll out releases. Matt says: There is another level of more detailed iteration (not shown) in which the pace of work is established every 8 weeks (very timeboxed). Michael says: Whole process is supported by service oriented architecture, incremental roll out of different service versions is possible.
Unified process is sensitive to granularity of use cases and length of cycles. Disconnect between design elements and the high-level use cases. What is connection between high level use cases and granularity of specific capabilities? Michael says: Subsystem presentations will speak to more detailed use cases; more material exists behind the present material.
How closely are the users involved in the process of developing use cases? Michael says: There have been extensive workshops with many users which have resulted in considerable material developed and integrated into much of this material. Workshop page at http://www.oceanobservatories.org/spaces/display/WS/ consolidates this information, in consistent sets of resources that are fully documented. Matt says: From workshops we went to prototypes to reflect combinations of technologies and concerns, to try to avoid dead ends. These prototyping results are summarized at http://www.oceanobservatories.org/spaces/display/CIDev/Prototyping/
There was a statement with respect to off-cycle releases and how that would be addressed. The answer mentioned use of contingency and diverting development resources. Once a release is deployed, why would that effort/budget not be covered by the O&M effort? John writes: Even if the issue can be addressed in the short term,
--------------------------------------------------------------------------------------------------------------------------------------
USE CASES
Discussion of normalizing data model didn't address what normalization is and why OOI wants to do it. Would an ontology obviate the need to normalize the OOI data model to a particular, existing data model (such as CDM)?
John G writes: The purpose is to make it possible to work with all the data sets using a consistent data structure and semantic content (terminology). Ontologies are potentially a very useful representation framework for standardizing the semantic content, but community-specific knowledge must be encoded in the ontologies. People have worked on describing standards and data structure using ontologies (using ontologies as a language to describe the data models), but this is at best extremely challenging. (Would be happy to discuss this further.)
Michael writes: Normalization is the activity of transforming encoding (what are the bytes), format and structure, and the semantical interpretation into the OOI selected canonical form. The necessity of doing this is (a) to be able to transform from any data representation into any other data representation, and (b) to reason about any kind of dataset in a uniform way. I don't think that the use of ontologies "on-the-fly" already gets us there. However, ontologies are very useful to support the transformations in and out of the canonical form.
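Illustrative sketch of point (a) above: with a canonical form, each representation needs only one decoder and one encoder, and any-to-any conversion is the composition of the two. The two formats, the field names and the canonical record layout are invented for the example and are not the OOI canonical form.

```python
import json

# Hypothetical canonical form: {"time": str, "temperature_c": float}


def decode_csv(text):
    """Invented format 'csv': 'time,temperature in Celsius'."""
    time, temp_c = text.split(",")
    return {"time": time, "temperature_c": float(temp_c)}


def decode_json_f(text):
    """Invented format 'json_f': JSON with temperature in Fahrenheit."""
    rec = json.loads(text)
    return {"time": rec["t"], "temperature_c": (rec["temp_f"] - 32.0) * 5.0 / 9.0}


def encode_csv(rec):
    return f'{rec["time"]},{rec["temperature_c"]:.1f}'


def encode_json_f(rec):
    temp_f = round(rec["temperature_c"] * 9.0 / 5.0 + 32.0, 1)
    return json.dumps({"t": rec["time"], "temp_f": temp_f})


DECODERS = {"csv": decode_csv, "json_f": decode_json_f}
ENCODERS = {"csv": encode_csv, "json_f": encode_json_f}


def convert(data, source, target):
    """Any-to-any conversion by passing through the canonical form."""
    return ENCODERS[target](DECODERS[source](data))
```

Adding a new representation means writing one decoder and one encoder, rather than a converter to and from every existing representation. The decoders also show the three aspects Michael lists: encoding and structure (CSV versus JSON) and semantic interpretation (Fahrenheit versus Celsius).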
****************************************** UPCOMING TOPICS **************************************
--------------------------------------------------------------------------------------------------------------------------------------
############################################################################