How to get the maximum performance from your AEP server. This session discusses ways to improve the execution time of short-running jobs and how to properly configure the server based on the expected number of users and the average size and duration of individual jobs. It includes examples of job pooling, database connection sharing, and parallel subprotocol tuning, and covers when to use cluster, grid, or load-balanced configurations, along with memory and CPU sizing guidelines.
2. The information on the roadmap and future software development efforts is
intended to outline general product direction and should not be relied on in making
a purchasing decision.
3. Content
• Tuning for different types of protocols
• Quick protocols
– Protocol Job Pooling
• Using PoolIDs
• Database connection pooling
• Long protocols
– Profiling protocols
– Tuning parallel subprotocols
– Disk I/O
• Server specifications
– General guidelines
– Cluster, Grid, and Load balancing
• When is it right and how do you choose?
4. Short Running: General Guidelines
• Job Pooling and blocking requests
– Use Database connection sharing
• Report templates
– “HTML Template” or “Pilotscript” components
– Much faster
– Harder to maintain
– Ideal for reports that rarely change
• Pilotscript is faster than Java is faster than Perl
• Minimize disk I/O
• Hashmap values instead of “Join Data From …”
– Use Cache Persistence mode in SQL Select for each Data
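The "hashmap instead of Join Data From" guideline can be illustrated outside the platform. This Python sketch (illustrative only, not Pilotscript) shows why building a lookup table once beats scanning the secondary data for every incoming record:

```python
# Illustrative sketch (Python, not Pilotscript): a one-time hashmap build
# versus re-scanning a secondary table for every incoming record.

reference = [("id%d" % i, i * 10) for i in range(10000)]  # secondary data

# Join-style approach: linear scan per record -> O(n) per lookup
def join_lookup(key):
    for k, v in reference:
        if k == key:
            return v
    return None

# Hashmap approach: build the dict once, then O(1) per lookup
lookup = dict(reference)

assert join_lookup("id42") == lookup["id42"] == 420
```

The same trade-off applies on the server: cache the secondary data once (e.g. with Cache Persistence mode) and do cheap keyed lookups per record, rather than re-joining on every pass.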
5. Job Pooling
• Each job execution occurs in a single scisvr process
– Isolated memory
– One bad protocol cannot crash the server
• Without job pooling, each job spawns a new process
• With job pooling, jobs with the same pool ID can reuse idle processes
6. Job Pooling Performance
• Prevent reloading system files and configuration data
• Reuse allocated memory
• Skip initialization
• Fast running protocols see substantial improvement
• Longer protocols do not see much improvement
10. Job Pooling Disadvantages
• Some components may not reinitialize correctly
– Can be difficult to track down these errors
• Stale resources can cause subsequent protocol failure
– Example: persistent DB connections that have timed out at the DB
• Ties up memory resources
– The AEP server manages this and will shut down job pools when memory
resources begin to get low
• Can tie up 3rd party licenses if they are not properly released
• Hard to get a good grasp of how much memory is really being used
• Not as useful for Windows servers with “full” impersonation
11. Job Pooling Memory limits
• Under heavy memory usage, pooled processes will shut down
– 80% total RAM usage
– 15% total RAM usage for an individual process
– Example: A server has 8 GB of RAM
• Idle pooled processes will shut down when RAM usage reaches 6.4 GB
• If an individual idle process reaches 1.2 GB, it will shut down
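The two thresholds above can be written as a small calculator. This is a sketch of the stated rules only (80% of total RAM for all pooled processes, 15% for one idle process), reproducing the 8 GB example:

```python
# Sketch of the pool shutdown thresholds described above: idle pools are
# reclaimed at 80% total RAM usage, and a single idle process is reclaimed
# when it reaches 15% of total RAM.

def pool_limits(total_ram_gb):
    return {
        "total_threshold_gb": total_ram_gb * 0.80,       # all pooled processes
        "per_process_threshold_gb": total_ram_gb * 0.15,  # one idle process
    }

limits = pool_limits(8)  # the 8 GB server from the example
# -> total threshold 6.4 GB, per-process threshold 1.2 GB
```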
12. Debugging
• http://<server>:<port>/scitegic/managepools?action=debug
– Shows each pool by ID.
• Configuration
• Processes that belong to the pool
– PID
– Owner (impersonation only)
– Number of times the server has executed jobs (including warm ups)
– State
• Queue
– Apache Process/Threads that are waiting for a server in this pool
13. Using Job Pooling From Clients
• 9.0:
– Set the __poolID parameter on the Implementation tab of
the top level protocol
– Share the same __poolID with related protocols
14. Using Job Pooling From Clients
• 8.5
– Pro Client
• Automatic based on jobID
– Create Protocol Link…
• Add __poolID as a parameter to your URL
– http://<server>:<port>/auth/launchjob?_protocol=ABC&__poolID=MyPool
– Reporting Forms
• Add __poolID using “Hidden Form Data”
– Protocol Function
• use “Application ID” or “Pool ID” parameters
– Web Port and Reporting Protocol Links
• Add __poolID as a parameter to your protocol
– Client SDKs
• Pass in __poolID as a parameter when you call the LaunchXXX() methods
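The URL-based launch shown above can be composed programmatically. This is a hedged sketch: the `/auth/launchjob` endpoint, `_protocol`, and `__poolID` parameters come from the slide, while the server name, port, and protocol name are placeholders:

```python
# Sketch: building the launchjob URL with __poolID as a query parameter.
# Server, port, and protocol name below are placeholders, not real values.
from urllib.parse import urlencode

def launch_url(server, port, protocol, pool_id):
    params = urlencode({"_protocol": protocol, "__poolID": pool_id})
    return "http://%s:%s/auth/launchjob?%s" % (server, port, params)

url = launch_url("myserver", 9944, "ABC", "MyPool")
# e.g. http://myserver:9944/auth/launchjob?_protocol=ABC&__poolID=MyPool
```

Related protocols that share the same `__poolID` value will reuse each other's idle pooled processes.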
15. Database Connection Sharing
• Connection Timeout
– Keeps the connection open while scisvr is idle
– Supported by ODBC and JDBC data sources
16. Report Templates
• Web applications should consider using templates.
– HTML Template component
• Uses Velocity template engine
– Pilotscript text processing
• Extremely fast
• Good for reports that rarely change format
– Faster, but harder to maintain
– Difficult to handle images
• Typical timings:
– Table component and Viewer: 1.5 seconds
– HTML Template and Viewer: 0.7 seconds
– Pilotscript text manipulation: 0.05 seconds
• Use the reporting collection to create the original report, then view the source and convert to a template
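The speed difference comes from filling a fixed template instead of rebuilding the report layout per request. This Python sketch uses `string.Template` as a stand-in for the Velocity-based HTML Template component (illustrative only; it is not the platform's template engine):

```python
# Illustrative only: Python's string.Template standing in for the
# Velocity-based HTML Template component. The report layout is fixed
# once; each request only substitutes values into it.
from string import Template

report = Template("<h1>$title</h1><p>Records processed: $count</p>")
html = report.substitute(title="Daily QC Report", count=128)

assert "Records processed: 128" in html
```

This is also why templates suit reports whose format rarely changes: any layout change means editing the template source by hand.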
17. Long Running: General Guidelines
• Profile protocols for bottlenecks using Ctrl-T timings
• Disk I/O Performance
– Consider improving network disk I/O
– Minimize large scale disk I/O
• Use parallel subprotocols to speed up slow sections
– Offload large calculations to additional servers
– Make use of clusters and grids to spread out processing
• Make remote requests asynchronous or batched when possible
• Download large datasets and process locally
• Create custom readers to minimize excess data reading
– Don’t read 100000 records only to use the first 10 records.
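The "custom reader" guideline amounts to reading lazily and stopping early. A minimal Python sketch of the idea (the generator stands in for a reader component that only pulls what downstream consumes):

```python
# Sketch of "don't read 100000 records only to use the first 10":
# a lazy reader that stops pulling from the source once downstream
# has taken as many records as it needs.
from itertools import islice

def read_records(source):
    for line in source:          # reads one record at a time, on demand
        yield line.strip()

source = ("record %d\n" % i for i in range(100000))
first_ten = list(islice(read_records(source), 10))  # only ~10 reads occur

assert len(first_ten) == 10
```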
18. Component Performance Timings
• Displays either percentage or total time for each
component.
– Subprotocols display total time of internal components plus
overhead
• Press Control-T or Right-Click->Show Process Times
• Useful to track down bottlenecks
• Times are only approximately accurate.
– Timings on Linux in particular are susceptible to discrepancies
19. Disk I/O
• Performance of your disk I/O has huge impact
• Linux: Consider switching from NFS to IBM’s GPFS
– Much more scalable
– Much faster
• Minimize large disk read/writes.
20. Parallel Subprotocols
• Allow parallel execution across multiple CPUs and multiple servers or
cluster/grid nodes
• Work by batching incoming data records and sending out to server list for
processing
• General guidelines:
– Each batch should take a minimum of 10 seconds to see a performance benefit; the longer the better!
– Overhead: 1-3 seconds per batch
• Serializing and deserializing input and output data records
• Launching
• Polling for completion
– 2 processes per CPU as starting point
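The guidelines above can be turned into a rough wall-time model. This sketch assumes a fixed per-batch overhead (the 1-3 seconds quoted above) and batches running `processes` at a time; it is an estimate of the stated rule of thumb, not a measurement:

```python
# Rough model of the batching guideline: each batch pays a fixed overhead
# (serialization, launch, polling), so batches must run long enough
# relative to that overhead for parallelism to pay off.

def estimated_wall_time(records, sec_per_record, batch_size,
                        processes, overhead_per_batch=2.0):
    batches = -(-records // batch_size)       # ceiling division
    batch_time = batch_size * sec_per_record + overhead_per_batch
    waves = -(-batches // processes)          # batches run `processes` at a time
    return waves * batch_time

serial = 10000 * 0.01                              # 100 s run serially
parallel = estimated_wall_time(10000, 0.01, 1000, 8)
# 10 batches of ~12 s across 8 processes -> ~24 s wall time
```

Shrinking the batch size below the 10-second guideline makes the 2-second overhead dominate and erodes the speedup.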
21. Parallel Subprotocol Mechanism
• Modifies and launches “Parallel Subprotocol Template”
• Input data records are serialized, then shipped to remote
server
• Data is deserialized, processed, then serialized again
• Shipped back to original server and deserialized
• 4 Cache read/write events!
– Avoid sending large data records
– Consider sending file references instead
– For instance: with Imaging collection
22. Parallel Subprotocol Debugging
• Most remote errors are swallowed up
• Look in <root>/logs/messages/scitegicerror_scisvr.log of
the remote server to see error stacks
• Run with “Debugging” option
– use Shift-Left click or Shift-F5
– Debugging messages will show errors and status from the
subprotocol batches
23. Server Guidelines
• Predict and analyze your usage
– Type of application
– Number of simultaneous users
• Good starting point
– 2 active jobs per CPU
– RAM: Minimum 1 GB per active job + 2 GB for system processes
– Local disk for temporary files
– GPFS instead of NFS
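The starting-point figures above (2 active jobs per CPU; 1 GB RAM per active job plus 2 GB for system processes) can be expressed as a small sizing sketch. It encodes only the stated rules of thumb, not a definitive sizing method:

```python
# Sketch of the starting-point sizing rules: 2 active jobs per CPU,
# 1 GB RAM per active job plus 2 GB for system processes.

def starting_spec(active_jobs):
    cpus = -(-active_jobs // 2)    # 2 active jobs per CPU (ceiling)
    ram_gb = active_jobs * 1 + 2   # 1 GB per job + 2 GB for the system
    return cpus, ram_gb

cpus, ram = starting_spec(16)      # e.g. 16 concurrent active jobs
# -> 8 CPUs, 18 GB RAM as a starting point
```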
24. Deployment Options
• Single Server
– Multiple CPUs
– Ideal for most applications
• Cluster (Linux)
– Distributes individual protocols to remote nodes
– Simple grid
– Ideal for ad-hoc analysis servers that occasionally require heavy processing
• Slower launch times than single server.
• Better data processing scalability
• Grid (Linux)
– Queues individual protocols via 3rd party grid software
– Tested on OGE, PBS, LSF. Custom option is available
– Ideal for large scale processing with very long application run times
• Slowest launch times
• Best data processing scalability
25. Deployment Options
• Load Balanced (Windows and Linux)
– Multiple identical single servers behind a 3rd party HTTP proxy
– Each individual request is distributed
– Protocol DB is READ-ONLY
• All changes are made through packages
– Parallel subprotocols do NOT distribute across nodes
– Ideal for canned applications that have large numbers of users
• Launch times are comparable to single server
• High scalability and high availability
• NOT useful as an ad-hoc server
• Cannot be used to build models (due to read-only Protocol DB)
26. Summary
• Optimization of protocol performance is application dependent
• For fast running protocols
– Look at Job Pooling and Report Templates
– Avoid checkpoints and caches
• For long running protocols
– Use component timings to profile
– Parallelize whenever possible
– Batch and asynchronous remote requests
– Configure Disk I/O for maximum performance
• Deployment options for different applications