1. The document discusses bringing the Open Science Grid (OSG) capabilities to the Virtual Cell (VCell) software to enable running computationally intensive VCell jobs on the OSG infrastructure.
2. It explores different approaches to deploying and monitoring VCell jobs on the OSG including using Condor, Globus Toolkit, and other OSG services.
3. While progress was made, fully integrating the two systems within 10 weeks proved challenging and would require ongoing work to resolve architectural decisions and achieve a dynamic view of the grid.
3. VCell Software Architecture (web-based distributed client/server framework). [Diagram labels: Client; Connection Manager; Server Manager; JMS Broker (SonicMQ); Database (Oracle); Batch Scheduler (PBSPro); Database Service; Data Export Service; Simulation Data Service; Simulation Dispatch Service; Simulation Worker Service; Compute Cluster (Compiled Simulation Jobs); Storage Cluster; Servers at CCAM]
8. VCell meets OSG: VCell Architecture. [Diagram labels: Client; Connection Manager; Server Manager; JMS Broker (SonicMQ); Database (Oracle); Batch Scheduler (PBSPro); Database Service; Data Export Service; Simulation Data Service; Simulation Dispatch Service; Simulation Worker Service; Compute Cluster (Compiled Simulation Jobs); Storage Cluster; Servers at CCAM; Outside Firewall; OSG Services; OSG; OSG Web service]
19. Example Job Count chart for BNL_ATLAS_1. Source: Gratia
Editor's Notes
In my project, I don't really care much about the client, except perhaps for notifying it and returning some progress information, which is still far off.
- The Connection Manager maintains the active communication with the client and notifies the JMS broker.
- JMS is what keeps VCell running. It keeps the state of running jobs in queues and respawns a job if something fails, keeps track of client nodes (the compute cluster), and ensures messages get delivered.
- PBS runs and monitors the jobs on the compute cluster.
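The queue-and-respawn behaviour described in this note can be sketched in a few lines of Python. This is a toy model only, not the SonicMQ or VCell API: `run_with_requeue`, `flaky`, the job names, and the retry limit of three attempts are all hypothetical, chosen just to show a failed job going back onto the queue.

```python
from collections import deque


def run_with_requeue(jobs, run_job, max_attempts=3):
    """Toy broker loop: a job that fails is put back on the queue."""
    queue = deque((job, 1) for job in jobs)  # (job id, attempt number)
    done, failed = [], []
    while queue:
        job, attempt = queue.popleft()
        try:
            run_job(job, attempt)
            done.append(job)
        except RuntimeError:
            if attempt < max_attempts:
                queue.append((job, attempt + 1))  # respawn the job
            else:
                failed.append(job)  # give up after max_attempts
    return done, failed


def flaky(job, attempt):
    # Hypothetical worker: "sim-2" fails on its first attempt only.
    if job == "sim-2" and attempt == 1:
        raise RuntimeError("worker lost")


done, failed = run_with_requeue(["sim-1", "sim-2", "sim-3"], flaky)
print(done, failed)
```

Because the failed job rejoins the back of the queue, it finishes after the jobs that were already waiting, which is the kind of state a broker has to track on behalf of the client.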
A VO (Virtual Organization) is basically a loose set of users, usually affiliated with a core organization, though not everyone is. A site is one of two subtypes: a Compute Element (CE) or a Storage Element (SE).
That is, we need to agree on a few standards if we want to have a functioning grid.
VDT – Virtual Data Toolkit
- Forms the client and gateway infrastructure by taking a subset of tools like Condor, Globus and others
- Pretty self-sufficient
GSI – Grid Security Infrastructure [http://www.globus.org/security/overview.html]
- Provides single sign-on for users on the grid. Every user is identified via a certificate provided by DOE (X.509 format); a third-party Certificate Authority is used to certify the link between the public key and the user.
Globus Toolkit – A set of tools used by a lot of grid sites to manage the workflow and monitor jobs.
GridFTP – GridFTP is like multi-threaded FTP. If your regular file transfer is a pipe, GridFTP is like a collection of those pipes that makes a hose.
WSRF – A framework that uses XML to represent objects on the grid, such as computing resources and jobs, as resources. The nice thing about WSRF is that it helps to maintain state on both ends. An analogue would be the RESTful representation of objects on the web using XML or JavaScript objects.
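To make the "collection of pipes" picture for GridFTP concrete, here is a minimal Python sketch of dividing one file into byte ranges, one per parallel stream. It illustrates the idea only and is not the GridFTP protocol or any Globus API; `split_ranges` and the sizes used are hypothetical.

```python
def split_ranges(size, streams):
    """Divide `size` bytes into up to `streams` contiguous (offset, length) ranges."""
    if streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        # The first `extra` streams carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:  # skip empty ranges when the file is smaller than `streams`
            ranges.append((offset, length))
        offset += length
    return ranges


# A 10-byte "file" spread over 4 streams.
print(split_ranges(10, 4))
```

Each range could then be sent over its own connection and written at its offset on the receiving side, so the streams together move the whole file.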
This is a small section of Condor Status output from a pool. It can be used when sending jobs to ensure they are placed on the right type of system.
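As a rough sketch of how such status output could guide job placement, the Python below filters a status listing for idle, unclaimed slots of a given operating system and architecture. The sample text is made up for illustration and only assumes a column layout of name, OpSys, Arch, State, and Activity; the host names and the helper `idle_slots` are hypothetical, and real pools may format their output differently.

```python
# Made-up sample in an assumed "Name OpSys Arch State Activity ..." layout.
SAMPLE = """\
Name                     OpSys   Arch   State     Activity LoadAv Mem  ActvtyTime
slot1@node01.example.org LINUX   X86_64 Unclaimed Idle     0.000  2048 0+00:10:04
slot2@node01.example.org LINUX   X86_64 Claimed   Busy     1.000  2048 0+01:02:33
slot1@node02.example.org WINDOWS INTEL  Unclaimed Idle     0.010  1024 0+03:15:00
"""


def idle_slots(text, opsys, arch):
    """Return names of unclaimed, idle slots matching `opsys` and `arch`."""
    slots = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or truncated lines
        name, f_opsys, f_arch, state, activity = fields[:5]
        if (f_opsys, f_arch, state, activity) == (opsys, arch, "Unclaimed", "Idle"):
            slots.append(name)
    return slots


matches = idle_slots(SAMPLE, "LINUX", "X86_64")
print(matches)
```

In practice the same constraint is usually stated in the job's submit description and the scheduler does the matching, so a filter like this is mainly useful for getting a quick view of what the pool currently offers.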