SlideShare a Scribd company logo
1 of 41
Download to read offline
Vas Vasiliadis
vas@uchicago.edu
February 28, 2024
Advanced Administration Topics
Agenda
• Managing nodes in a production deployment
• GCS troubleshooting
• Customizing domains
• Managing roles on endpoints and collections
• Customizing identity mapping
• Implementing sharing policies
• Additional storage gateway options
• Performance tuning
2
Adding DTNs to your
endpoint
3
Multi-node DTN behavior
• Transfer tasks sent to nodes in round-robin fashion
• Active nodes can receive transfer tasks
• Tasks on inactive node will pause until active again
• GCS manager assistant service
– Synchronizes configuration among nodes in the endpoint
– Stores encrypted configuration values in Globus service
4
GCSv5 deployment key
5
Adding a node requires just two commands
$ sudo globus-connect-server node setup --deployment-key THE_KEY
$ sudo systemctl restart apache2
Copy the deployment key
from the first node (DTN) to
every other node
Node setup pulls configuration from Globus service
Check your DTN cluster status:
globus-connect-server node list
Migrating/refreshing
DTNs
7
Migrating an endpoint to a new host (DTN)
• An endpoints is a logical construct è replace host
system without disrupting the endpoint
– Avoid replicating configuration data (esp. for guest collections!)
– Maintain continuity for custom apps, automation scripts, etc., that
use the endpoint UUID
• 1. Add new node to endpoint à 2. remove original node
• Again, deployment key is required
– Export node configuration with node setup --export-node
– Import on new DTN using node setup --import-node
Troubleshooting
Globus Connect Server
9
Before asking for help…
• self-diagnostic can identify many issues
– Are services running? GCS manager/assistant, GridFTP server
• Connectivity is a common cause
– Can Globus connect to the GCS Manager service?
– Is the DTN control channel reachable?
– Can the DTN establish data channel connection?
docs.globus.org/globus-connect-server/v5.4/troubleshooting-guide
…and we’re always here for you: support@globus.org
10
Customizing GCS
domains
11
GCS domain configuration (default)
12
/var/lib/globus-connect-server/gcs-
manager/etc/httpd/conf.d/G_COLL_UUID
Domains on DTN
vhost: Management API
abc.abc.data.globus.org
vhost: Mapped Collection
m-abc.abc.data.globus.org
vhost: Guest Collection
g-abc.abc.data.globus.org
/var/lib/globus-connect-server/gcs-
manager/etc/httpd/conf.d/M_COLL_UUID
/var/lib/globus-connect-server/gcs-
manager/etc/httpd/conf.d/EP_UUID
Multiple (sub)domains exist in a GCS deployment
• Endpoint
– e4faec.75bc.data.globus.org
• Mapped (m-...) and guest (g-...) collections
– m-8dd2b7.e4faec.75bc.data.globus.org
– g-e7b189.e4faec.75bc.data.globus.org
• Subdomains are distinct Apache vhosts
/var/lib/globus-connect-server/gcs-manager/etc/httpd/conf.d
• Management API for the GCS Manager service is at:
https://e4faec.75bc.data.globus.org/api
Customizing GCS domains
• Set up DNS record
– Avoid using FQDN for the DTN
– activedata.uchicago.edu and *.activedata.uchicago.edu (see below)
• Put SSL certificate/key on DTN
• As endpoint owner or admin run: endpoint domain update
--domain activedata.uchicago.edu
--certificate-path ...
--private-key-path ...
--wildcard ß important; otherwise collections use data.globus.org
--managed ß really important for certs/keys to be sync’d across DTNs
• Assuming --wildcard, domains for collections will look like…
– m-8dd2b7.activedata.uchicago.edu
– g-e7b189.activedata.uchicago.edu
Managing roles
15
Subscriptions and endpoint roles
• Subscription(s) configured for an institution
• Multiple Subscription Managers per subscription
• Subscription Manager ties endpoint to subscription
– Results in a “subscribed” endpoint
• Assign additional roles for endpoint management
– Administrator, Manager, Monitor
Be identity-, role-, and permission-aware
• Default: Only endpoint owner can configure an endpoint
• Delegate administrator role to other sysadmins
– Best practice: Delegate to a Globus group, not individuals
• Check identity using the session command
• Check resource permissions on storage gateways and
collections with --include-private-policies option
docs.globus.org/globus-connect-server/v5.4/reference/role
Collection roles
• Mapped collections have same roles as endpoints
• Guest collections add “Access Manager” role
– Critical for automation
• Any Auth client can assume endpoint/collection role
– Particularly useful for scripts that manage large deployments
– e.g., script to list guest collection information:
gist.github.com/vasv/cdb8607e2bfab08634b5aa99389e87c7
• Roles may be granted to Auth (app ) clients for…
– …group management
– …flow execution/monitoring
– …any other Globus resource that has role-based access control
Customizing/extending
identity mapping
19
Mapping identities to local accounts
• Default: Strip identity domain (everything after “@”)
– e.g., userX@globusdemo.org maps to local account userX
– Best for campus identities w/synchronized local accounts
• Use --identity-mapping option on storage gateway
– Specify expression in a JSON document
– Execute a custom script
docs.globus.org/globus-connect-server/v5.4/identity-mapping-guide
Simple custom mapping example
Note: Requires the storage
gateway to accept identities
from two domains
{
"DATA_TYPE":
"expression_identity_mapping#1.0.0",
"mappings": [
{
"source": "{username}",
"match": "42032579@wassamottau.edu",
"output": "vas",
"ignore_case": false,
"literal": false
},
{
"source": "{username}",
"match": "(.*)@uchicago.edu",
"output": "{0}",
"ignore_case": false,
"literal": false
}
]
}
Otherwise, default behavior
local user à domain username
Map
42032579@wossamottau.edu
to local user vas
“Hijacking” ID mapping for autoprovisioning
• Useful in large user communities; esp. where other
automated account management processes exist
1: Get username from input identity
2: If no local user with username, create user
3: Add local user to map file
• Use sample script in docs as a starting point
22
Implementing sharing
policies
23
Sharing restrictions
• Guest collections may be created in any directory
accessible by the collection, by any authorized local
account
• You can restrict who can share…
o --sharing-user-allow
--sharing-user-deny
o --posix-sharing-group-allow
o --posix-sharing-group-deny
• …and what they can share…
o --sharing-restrict-paths (specify JSON PathRestrictions)
Restrictive/specific sharing policies
• Setting policies for specific user/path combinations
$ globus-connect-server sharing-policy create 
--user myuser –user youruser 
--read /reference --read-write /cui/mysecrets
• Sharing policies cannot override restrictions on the
underlying storage gateway
{
"DATA_TYPE": "path_restrictions#1.0.0”,
"read_write": ["/home/"],
"none": ["/cui"]
}
#FAIL
Due to storage gateway
--restrict-paths policy
Limit whom data owners can share with
• Authentication policies limit guest collection access
to identities from specific domain(s)
• Attach auth policy to mapped collection
• Explicitly include/exclude identity domains
• Domains used to filter permissions when authorizing
access to a guest collection
26
Create auth policy and attach to collection
$ globus-connect-server auth-policy create 
> --include *.edu --include globus.org 
> "Allow sharing internally" 
> "R&E Sharing Policy"
Authentication Policy ID: 45ff23ed-43a8-438c-aaa8-e8e36708756e
$ globus-connect-server collection update 
> --guest-auth-policy-id 45ff23ed-43a8-438c-aaa8-e8e36708756e 
> 56c3dff0-d827-4f11-91f3-b0704c53aa4c
Allowed sharee domains
Apply policy to this collection
Additional storage
gateway options
28
High Assurance attributes of interest
• Typical settings:
– Access limited to single authentication domain
– Auth timeout typically reflects other institutional policies
• Detailed audit logs: /var/log/gridftp-audit.log
– Recall: regular logs in /var/log/gridftp.log
• MFA requirements
– Requires that IdP provides the acr and/or amr claims
Configuring a “private” data channel
• Default: data interface is set to the DTN’s public IP
address (see data_interface in
/etc/gridftp.d/globus-connect-server)
• Create /etc/gridftp.d/STORAGE_GATEWAY_ID
• Set data_interface PRIVATE_INTERFACE_IP_ADDRESS
• Replicate on every DTN (files in /etc/gridftp.d/ are
not sync'd between nodes by Globus)
30
Supporting non-POSIX systems
• Update your GCS packages
• Add the appropriate storage gateway
– Non-POSIX systems require add-on connector subscription(s)
• Gateway configuration options vary by connector
– e.g., specify bucket name(s) for AWS S3
• Collection authentication options vary by connector
– e.g., provide user access key and secret key for AWS S3
– Credentials must grant appropriate permissions
– Mapped collection may not actually “map” to local user account
Accessing non-POSIX
systems: AWS S3
(and S3-compatible systems)
32
On performance…
33
Globus is fast (and secure, and reliable), but…
72.8Gbps
Your observed performance will depend on…
• Data Transfer Node (CPU, RAM, bus, NIC, …)
• Network (devices, path quality, latency, …)
• Storage (hardware, attach mode, …)
• Dataset make-up (file#, size, tree depth, …)
– Remember: LoSF == Great sadness
• Strange things people do (one transfer/file …1M files)
• …?
35
Interpreting reported performance
36
A more accurate
speed measurement
(expect wide variance)
“Effective” includes
service overhead
(primarily to guarantee
data integrity!)
You should have Great Expectations
37
Dataset Size Transfer Time: 1min Transfer Time: 5min Transfer Time: 20min Transfer Time: 1hr
10 PB 1,333.33 Tbps 266.67 Tbps 66.67 Tbps 22.22 Tbps
1PB 133.33 Tbps 26.67 Tbps 6.67 Tbps 2.22 Tbps
100TB 13.33 Tbps 2.67 Tbps 666.67 Gbps 222.22 Gbps
10TB 1.33 Tbps 266.67 Gbps 66.67 Gbps 22.22 Gbps
1TB 133.33 Gbps 26.67 Gbps 6.67 Gbps 2.22 Gbps
100GB 13.33 Gbps 2.67 Gbps 666.67 Mbps 222.22 Mbps
10GB 1.33 Gbps 266.67 Mbps 66.67 Mbps 22.22 Mbps
1GB 133.33 Mbps 26.67 Mbps 6.67 Mbps 2.22 Mbps
100MB 13.33 Mbps 2.67 Mbps 0.67 Mbps 0.22 Mbps
ESnet EPOC target for all DOE labs
(requires at least a 10G connection)
Science DMZ: Network configuration best practice
38
Source
security
filters
Destination
security
filters
Destination
Science DMZ
Source
Science DMZ
Source
Border Router
Destination
Border Router
Source Router Destination Router
User
Organization
DATA
CONTROL
Physical Control Path
Logical Control Path
Physical Data Path
Logical Data Path
Port 443
(configurable)
Ports 50000-51000*
(default)
Data Transfer
Node (DTN)
Data Transfer
Node (DTN)
* Not actively listening; only used when transfer is in progress; may be restricted to private network
Please see TCP ports reference: https://docs.globus.org/resource-provider-guide/#open-tcp-ports_section
ESnet
makes
magic
happen
Globus transfer performance is a team sport
• Network use parameters: concurrency, parallelism
• Maximum, Preferred values for each
• Transfer considers source and destination endpoint settings
min(
max(preferred src, preferred dest),
max src,
max dest
)
• Also be aware of pipelining effects
• Service limits, e.g. concurrent requests
40
Globus network use parameters
• May only be changed on managed endpoints
• Modify via the web app: Console à Endpoints tab
• Modify via Globus Connect Server CLI
– Run globus-connect-server endpoint modify
• Strong recommendation: Do not change network use
parameters before establishing baseline performance
41

More Related Content

Similar to Advanced Globus System Administration Topics

Connecting Your System to Globus (APS Workshop)
Connecting Your System to Globus (APS Workshop)Connecting Your System to Globus (APS Workshop)
Connecting Your System to Globus (APS Workshop)Globus
 
Introduction to Globus for System Administrators (GlobusWorld Tour - UMich)
Introduction to Globus for System Administrators (GlobusWorld Tour - UMich)Introduction to Globus for System Administrators (GlobusWorld Tour - UMich)
Introduction to Globus for System Administrators (GlobusWorld Tour - UMich)Globus
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System AdministratorsGlobus
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
 
Globus for System Administrators (GlobusWorld Tour - Columbia University)
Globus for System Administrators (GlobusWorld Tour - Columbia University)Globus for System Administrators (GlobusWorld Tour - Columbia University)
Globus for System Administrators (GlobusWorld Tour - Columbia University)Globus
 
Tutorial: Introduction to Globus for System Administrators
Tutorial: Introduction to Globus for System AdministratorsTutorial: Introduction to Globus for System Administrators
Tutorial: Introduction to Globus for System AdministratorsGlobus
 
Globus Endpoint Migration and Advanced Administration Topics
Globus Endpoint Migration and Advanced Administration TopicsGlobus Endpoint Migration and Advanced Administration Topics
Globus Endpoint Migration and Advanced Administration TopicsGlobus
 
Globus Endpoint Administration (GlobusWorld Tour - STFC)
Globus Endpoint Administration (GlobusWorld Tour - STFC)Globus Endpoint Administration (GlobusWorld Tour - STFC)
Globus Endpoint Administration (GlobusWorld Tour - STFC)Globus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System AdministratorsGlobus
 
Globus for System Administrators (GlobusWorld Tour - UCSD)
Globus for System Administrators (GlobusWorld Tour - UCSD)Globus for System Administrators (GlobusWorld Tour - UCSD)
Globus for System Administrators (GlobusWorld Tour - UCSD)Globus
 
Globus for System Administrators (CHPC 2019 - South Africa)
Globus for System Administrators (CHPC 2019 - South Africa)Globus for System Administrators (CHPC 2019 - South Africa)
Globus for System Administrators (CHPC 2019 - South Africa)Globus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System AdministratorsGlobus
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformGlobus
 
Globus Endpoint Setup and Configuration - XSEDE14 Tutorial
Globus Endpoint Setup and Configuration - XSEDE14 TutorialGlobus Endpoint Setup and Configuration - XSEDE14 Tutorial
Globus Endpoint Setup and Configuration - XSEDE14 TutorialGlobus
 
Automating Research Data Flows and an Introduction to the Globus Platform
Automating Research Data Flows and an Introduction to the Globus PlatformAutomating Research Data Flows and an Introduction to the Globus Platform
Automating Research Data Flows and an Introduction to the Globus PlatformGlobus
 
Globus Connect Server v5 Q&A Briefing
Globus Connect Server v5 Q&A BriefingGlobus Connect Server v5 Q&A Briefing
Globus Connect Server v5 Q&A BriefingGlobus
 
Globus Command Line Interface (APS Workshop)
Globus Command Line Interface (APS Workshop)Globus Command Line Interface (APS Workshop)
Globus Command Line Interface (APS Workshop)Globus
 
Important work-arounds for making ASS multi-lingual
Important work-arounds for making ASS multi-lingualImportant work-arounds for making ASS multi-lingual
Important work-arounds for making ASS multi-lingualAxel Faust
 

Similar to Advanced Globus System Administration Topics (20)

Connecting Your System to Globus (APS Workshop)
Connecting Your System to Globus (APS Workshop)Connecting Your System to Globus (APS Workshop)
Connecting Your System to Globus (APS Workshop)
 
Introduction to Globus for System Administrators (GlobusWorld Tour - UMich)
Introduction to Globus for System Administrators (GlobusWorld Tour - UMich)Introduction to Globus for System Administrators (GlobusWorld Tour - UMich)
Introduction to Globus for System Administrators (GlobusWorld Tour - UMich)
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System Administrators
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Globus for System Administrators (GlobusWorld Tour - Columbia University)
Globus for System Administrators (GlobusWorld Tour - Columbia University)Globus for System Administrators (GlobusWorld Tour - Columbia University)
Globus for System Administrators (GlobusWorld Tour - Columbia University)
 
Tutorial: Introduction to Globus for System Administrators
Tutorial: Introduction to Globus for System AdministratorsTutorial: Introduction to Globus for System Administrators
Tutorial: Introduction to Globus for System Administrators
 
Globus Endpoint Migration and Advanced Administration Topics
Globus Endpoint Migration and Advanced Administration TopicsGlobus Endpoint Migration and Advanced Administration Topics
Globus Endpoint Migration and Advanced Administration Topics
 
Globus Endpoint Administration (GlobusWorld Tour - STFC)
Globus Endpoint Administration (GlobusWorld Tour - STFC)Globus Endpoint Administration (GlobusWorld Tour - STFC)
Globus Endpoint Administration (GlobusWorld Tour - STFC)
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System Administrators
 
Globus for System Administrators (GlobusWorld Tour - UCSD)
Globus for System Administrators (GlobusWorld Tour - UCSD)Globus for System Administrators (GlobusWorld Tour - UCSD)
Globus for System Administrators (GlobusWorld Tour - UCSD)
 
Globus for System Administrators (CHPC 2019 - South Africa)
Globus for System Administrators (CHPC 2019 - South Africa)Globus for System Administrators (CHPC 2019 - South Africa)
Globus for System Administrators (CHPC 2019 - South Africa)
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System Administrators
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus Platform
 
Globus Endpoint Setup and Configuration - XSEDE14 Tutorial
Globus Endpoint Setup and Configuration - XSEDE14 TutorialGlobus Endpoint Setup and Configuration - XSEDE14 Tutorial
Globus Endpoint Setup and Configuration - XSEDE14 Tutorial
 
Automating Research Data Flows and an Introduction to the Globus Platform
Automating Research Data Flows and an Introduction to the Globus PlatformAutomating Research Data Flows and an Introduction to the Globus Platform
Automating Research Data Flows and an Introduction to the Globus Platform
 
Globus Connect Server v5 Q&A Briefing
Globus Connect Server v5 Q&A BriefingGlobus Connect Server v5 Q&A Briefing
Globus Connect Server v5 Q&A Briefing
 
Azure storage deep dive
Azure storage deep diveAzure storage deep dive
Azure storage deep dive
 
Globus Command Line Interface (APS Workshop)
Globus Command Line Interface (APS Workshop)Globus Command Line Interface (APS Workshop)
Globus Command Line Interface (APS Workshop)
 
Important work-arounds for making ASS multi-lingual
Important work-arounds for making ASS multi-lingualImportant work-arounds for making ASS multi-lingual
Important work-arounds for making ASS multi-lingual
 

More from Globus

Instrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowInstrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowGlobus
 
Building Research Applications with Globus PaaS
Building Research Applications with Globus PaaSBuilding Research Applications with Globus PaaS
Building Research Applications with Globus PaaSGlobus
 
Reliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesReliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesGlobus
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusGlobus
 
An Introduction to Globus for Researchers
An Introduction to Globus for ResearchersAn Introduction to Globus for Researchers
An Introduction to Globus for ResearchersGlobus
 
Introduction to Research Automation with Globus
Introduction to Research Automation with GlobusIntroduction to Research Automation with Globus
Introduction to Research Automation with GlobusGlobus
 
Introduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersGlobus
 
Introduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersGlobus
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Globus
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeGlobus
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New UsersGlobus
 
Working with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsGlobus
 
Globus Automation
Globus AutomationGlobus Automation
Globus AutomationGlobus
 
Introduction to Globus
Introduction to GlobusIntroduction to Globus
Introduction to GlobusGlobus
 
Working with Globus Platform Services
Working with Globus Platform ServicesWorking with Globus Platform Services
Working with Globus Platform ServicesGlobus
 
Using Globus to Streamline Research at Scale
Using Globus to Streamline Research at ScaleUsing Globus to Streamline Research at Scale
Using Globus to Streamline Research at ScaleGlobus
 
Introduction to Globus for Researchers
Introduction to Globus for ResearchersIntroduction to Globus for Researchers
Introduction to Globus for ResearchersGlobus
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New UsersGlobus
 

More from Globus (18)

Instrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowInstrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a Flow
 
Building Research Applications with Globus PaaS
Building Research Applications with Globus PaaSBuilding Research Applications with Globus PaaS
Building Research Applications with Globus PaaS
 
Reliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesReliable, Remote Computation at All Scales
Reliable, Remote Computation at All Scales
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using Globus
 
An Introduction to Globus for Researchers
An Introduction to Globus for ResearchersAn Introduction to Globus for Researchers
An Introduction to Globus for Researchers
 
Introduction to Research Automation with Globus
Introduction to Research Automation with GlobusIntroduction to Research Automation with Globus
Introduction to Research Automation with Globus
 
Introduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for Researchers
 
Introduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for Developers
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
 
Working with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and Portals
 
Globus Automation
Globus AutomationGlobus Automation
Globus Automation
 
Introduction to Globus
Introduction to GlobusIntroduction to Globus
Introduction to Globus
 
Working with Globus Platform Services
Working with Globus Platform ServicesWorking with Globus Platform Services
Working with Globus Platform Services
 
Using Globus to Streamline Research at Scale
Using Globus to Streamline Research at ScaleUsing Globus to Streamline Research at Scale
Using Globus to Streamline Research at Scale
 
Introduction to Globus for Researchers
Introduction to Globus for ResearchersIntroduction to Globus for Researchers
Introduction to Globus for Researchers
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
 

Recently uploaded

Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 

Advanced Globus System Administration Topics

  • 1. Vas Vasiliadis vas@uchicago.edu February 28, 2024 Advanced Administration Topics
  • 2. Agenda • Managing nodes in a production deployment • GCS troubleshooting • Customizing domains • Managing roles on endpoints and collections • Customizing identity mapping • Implementing sharing policies • Additional storage gateway options • Performance tuning 2
  • 3. Adding DTNs to your endpoint 3
  • 4. Multi-node DTN behavior • Transfer tasks sent to nodes in round-robin fashion • Active nodes can receive transfer tasks • Tasks on inactive node will pause until active again • GCS manager assistant service – Synchronizes configuration among nodes in the endpoint – Stores encrypted configuration values in Globus service 4
  • 6. Adding a node requires just two commands $ sudo globus-connect-server node setup --deployment-key THE_KEY $ sudo systemctl restart apache2 Copy the deployment key from the first node (DTN) to every other node Node setup pulls configuration from Globus service Check your DTN cluster status: globus-connect-server node list
  • 8. Migrating an endpoint to a new host (DTN) • An endpoints is a logical construct è replace host system without disrupting the endpoint – Avoid replicating configuration data (esp. for guest collections!) – Maintain continuity for custom apps, automation scripts, etc., that use the endpoint UUID • 1. Add new node to endpoint à 2. remove original node • Again, deployment key is required – Export node configuration with node setup --export-node – Import on new DTN using node setup --import-node
  • 10. Before asking for help… • self-diagnostic can identify many issues – Are services running? GCS manager/assistant, GridFTP server • Connectivity is a common cause – Can Globus connect to the GCS Manager service? – Is the DTN control channel reachable? – Can the DTN establish data channel connection? docs.globus.org/globus-connect-server/v5.4/troubleshooting-guide …and we’re always here for you: support@globus.org 10
  • 12. GCS domain configuration (default) 12 /var/lib/globus-connect-server/gcs- manager/etc/httpd/conf.d/G_COLL_UUID Domains on DTN vhost: Management API abc.abc.data.globus.org vhost: Mapped Collection m-abc.abc.data.globus.org vhost: Guest Collection g-abc.abc.data.globus.org /var/lib/globus-connect-server/gcs- manager/etc/httpd/conf.d/M_COLL_UUID /var/lib/globus-connect-server/gcs- manager/etc/httpd/conf.d/EP_UUID
  • 13. Multiple (sub)domains exist in a GCS deployment • Endpoint – e4faec.75bc.data.globus.org • Mapped (m-...) and guest (g-...) collections – m-8dd2b7.e4faec.75bc.data.globus.org – g-e7b189.e4faec.75bc.data.globus.org • Subdomains are distinct Apache vhosts /var/lib/globus-connect-server/gcs-manager/etc/httpd/conf.d • Management API for the GCS Manager service is at: https://e4faec.75bc.data.globus.org/api
  • 14. Customizing GCS domains • Set up DNS record – Avoid using FQDN for the DTN – activedata.uchicago.edu and *.activedata.uchicago.edu (see below) • Put SSL certificate/key on DTN • As endpoint owner or admin run: endpoint domain update --domain activedata.uchicago.edu --certificate-path ... --private-key-path ... --wildcard ß important; otherwise collections use data.globus.org --managed ß really important for certs/keys to be sync’d across DTNs • Assuming --wildcard, domains for collections will look like… – m-8dd2b7.activedata.uchicago.edu – g-e7b189.activedata.uchicago.edu
  • 16. Subscriptions and endpoint roles • Subscription(s) configured for an institution • Multiple Subscription Managers per subscription • Subscription Manager ties endpoint to subscription – Results in a “subscribed” endpoint • Assign additional roles for endpoint management – Administrator, Manager, Monitor
  • 17. Be identity-, role-, and permission-aware • Default: Only endpoint owner can configure an endpoint • Delegate administrator role to other sysadmins – Best practice: Delegate to a Globus group, not individuals • Check identity using the session command • Check resource permissions on storage gateways and collections with --include-private-policies option docs.globus.org/globus-connect-server/v5.4/reference/role
  • 18. Collection roles • Mapped collections have same roles as endpoints • Guest collections add “Access Manager” role – Critical for automation • Any Auth client can assume endpoint/collection role – Particularly useful for scripts that manage large deployments – e.g., script to list guest collection information: gist.github.com/vasv/cdb8607e2bfab08634b5aa99389e87c7 • Roles may be granted to Auth (app ) clients for… – …group management – …flow execution/monitoring – …any other Globus resource that has role-based access control
  • 20. Mapping identities to local accounts • Default: Strip identity domain (everything after “@”) – e.g., userX@globusdemo.org maps to local account userX – Best for campus identities w/synchronized local accounts • Use --identity-mapping option on storage gateway – Specify expression in a JSON document – Execute a custom script docs.globus.org/globus-connect-server/v5.4/identity-mapping-guide
  • 21. Simple custom mapping example Note: Requires the storage gateway to accept identities from two domains { "DATA_TYPE": "expression_identity_mapping#1.0.0", "mappings": [ { "source": "{username}", "match": "42032579@wassamottau.edu", "output": "vas", "ignore_case": false, "literal": false }, { "source": "{username}", "match": "(.*)@uchicago.edu", "output": "{0}", "ignore_case": false, "literal": false } ] } Otherwise, default behavior local user à domain username Map 42032579@wossamottau.edu to local user vas
  • 22. “Hijacking” ID mapping for autoprovisioning • Useful in large user communities; esp. where other automated account management processes exist 1: Get username from input identity 2: If no local user with username, create user 3: Add local user to map file • Use sample script in docs as a starting point 22
  • 24. Sharing restrictions • Guest collections may be created in any directory accessible by the collection, by any authorized local account • You can restrict who can share… o --sharing-user-allow --sharing-user-deny o --posix-sharing-group-allow o --posix-sharing-group-deny • …and what they can share… o --sharing-restrict-paths (specify JSON PathRestrictions)
  • 25. Restrictive/specific sharing policies • Setting policies for specific user/path combinations $ globus-connect-server sharing-policy create --user myuser –user youruser --read /reference --read-write /cui/mysecrets • Sharing policies cannot override restrictions on the underlying storage gateway { "DATA_TYPE": "path_restrictions#1.0.0”, "read_write": ["/home/"], "none": ["/cui"] } #FAIL Due to storage gateway --restrict-paths policy
  • 26. Limit whom data owners can share with • Authentication policies limit guest collection access to identities from specific domain(s) • Attach auth policy to mapped collection • Explicitly include/exclude identity domains • Domains used to filter permissions when authorizing access to a guest collection 26
  • 27. Create auth policy and attach to collection $ globus-connect-server auth-policy create > --include *.edu --include globus.org > "Allow sharing internally" > "R&E Sharing Policy" Authentication Policy ID: 45ff23ed-43a8-438c-aaa8-e8e36708756e $ globus-connect-server collection update > --guest-auth-policy-id 45ff23ed-43a8-438c-aaa8-e8e36708756e > 56c3dff0-d827-4f11-91f3-b0704c53aa4c Allowed sharee domains Apply policy to this collection
  • 29. High Assurance attributes of interest • Typical settings: – Access limited to single authentication domain – Auth timeout typically reflects other institutional policies • Detailed audit logs: /var/log/gridftp-audit.log – Recall: regular logs in /var/log/gridftp.log • MFA requirements – Requires that IdP provides the acr and/or amr claims
  • 30. Configuring a “private” data channel • Default: data interface is set to the DTN’s public IP address (see data_interface in /etc/gridftp.d/globus-connect-server) • Create /etc/gridftp.d/STORAGE_GATEWAY_ID • Set data_interface PRIVATE_INTERFACE_IP_ADDRESS • Replicate on every DTN (files in /etc/gridftp.d/ are not sync'd between nodes by Globus) 30
  • 31. Supporting non-POSIX systems • Update your GCS packages • Add the appropriate storage gateway – Non-POSIX systems require add-on connector subscription(s) • Gateway configuration options vary by connector – e.g., specify bucket name(s) for AWS S3 • Collection authentication options vary by connector – e.g., provide user access key and secret key for AWS S3 – Credentials must grant appropriate permissions – Mapped collection may not actually “map” to local user account
  • 32. Accessing non-POSIX systems: AWS S3 (and S3-compatible systems) 32
  • 34. Globus is fast (and secure, and reliable), but… 72.8Gbps
  • 35. Your observed performance will depend on… • Data Transfer Node (CPU, RAM, bus, NIC, …) • Network (devices, path quality, latency, …) • Storage (hardware, attach mode, …) • Dataset make-up (file#, size, tree depth, …) – Remember: LoSF == Great sadness • Strange things people do (one transfer/file …1M files) • …? 35
  • 36. Interpreting reported performance 36 A more accurate speed measurement (expect wide variance) “Effective” includes service overhead (primarily to guarantee data integrity!)
  • 37. You should have Great Expectations 37 Dataset Size Transfer Time: 1min Transfer Time: 5min Transfer Time: 20min Transfer Time: 1hr 10 PB 1,333.33 Tbps 266.67 Tbps 66.67 Tbps 22.22 Tbps 1PB 133.33 Tbps 26.67 Tbps 6.67 Tbps 2.22 Tbps 100TB 13.33 Tbps 2.67 Tbps 666.67 Gbps 222.22 Gbps 10TB 1.33 Tbps 266.67 Gbps 66.67 Gbps 22.22 Gbps 1TB 133.33 Gbps 26.67 Gbps 6.67 Gbps 2.22 Gbps 100GB 13.33 Gbps 2.67 Gbps 666.67 Mbps 222.22 Mbps 10GB 1.33 Gbps 266.67 Mbps 66.67 Mbps 22.22 Mbps 1GB 133.33 Mbps 26.67 Mbps 6.67 Mbps 2.22 Mbps 100MB 13.33 Mbps 2.67 Mbps 0.67 Mbps 0.22 Mbps ESnet EPOC target for all DOE labs (requires at least a 10G connection)
  • 38. Science DMZ: Network configuration best practice 38 Source security filters Destination security filters Destination Science DMZ Source Science DMZ Source Border Router Destination Border Router Source Router Destination Router User Organization DATA CONTROL Physical Control Path Logical Control Path Physical Data Path Logical Data Path Port 443 (configurable) Ports 50000-51000* (default) Data Transfer Node (DTN) Data Transfer Node (DTN) * Not actively listening; only used when transfer is in progress; may be restricted to private network Please see TCP ports reference: https://docs.globus.org/resource-provider-guide/#open-tcp-ports_section
  • 40. Globus transfer performance is a team sport • Network use parameters: concurrency, parallelism • Maximum, Preferred values for each • Transfer considers source and destination endpoint settings min( max(preferred src, preferred dest), max src, max dest ) • Also be aware of pipelining effects • Service limits, e.g. concurrent requests 40
  • 41. Globus network use parameters • May only be changed on managed endpoints • Modify via the web app: Console à Endpoints tab • Modify via Globus Connect Server CLI – Run globus-connect-server endpoint modify • Strong recommendation: Do not change network use parameters before establishing baseline performance 41