Accelrys Catalog is a powerful new technology for creating an index of the protocols and components within your organization. You will learn about strategies for indexing and how search capabilities can be deployed to professional client and Web Port end users. You will also learn how to use this technology to find out about system usage to aid with system upgrades, server consolidations, and general system maintenance. The protocol validation capability in the admin portal allows administrators to create standard reports on server usage characteristics. You will learn how to report on violations of IT policies (e.g. around security), bad protocol authoring practices, or missing or incomplete protocol documentation. Developers will also learn how to extend and customize the rules used to create these reports.
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
1. (ATS6-PLAT02) Accelrys Catalog and
Protocol Validation
Ton van Daelen
Director, Platform Product Management
ton.vandaelen@accelrys.com
Dana Honeycutt
R&D Lead
dana.honeycutt@accelrys.com
2. The information on the roadmap and future software development efforts is
intended to outline general product direction and should not be relied on in making
a purchasing decision.
3. Content: Accelrys Catalog and Protocol Validation
• What is it?
• Who is it for?
• How does it work?
• What’s behind it?
• How does it really work?
• How do I maintain it?
• How do I troubleshoot?
• Show me more details!
4. The Size of the Challenge
• 10-100 Pro client users
• 50-1000 Web users
• 1-10 servers
• -> 5000+ protocols to be managed
5. What is Accelrys Catalog?
• A searchable index of the component and protocol
database (XMLDB) on Accelrys Enterprise Platform (AEP)
servers
• A Google-like text search facility in the Pipeline Pilot
client and Web Port
• A query form and results browser in the Administration
Portal
6. Who is it for?
• PP Pro Client User: Personal Productivity
– Search from Pro Client
– Examples that use the ‘Http Connector’ component
– PilotScript referencing ‘rsplit()’
– Protocols using MAO data
• Web Port User
– Search from Web Port
– Web Port protocols containing specific terms or phrases
• AEP Administrator
– Administer: generate index, set index update schedule
– Structured Search from Admin Portal: protocols not run in 6+ months, protocols by specific author, protocols with many versions
– Validation reports: ‘canned’ reports about policy / best practice / security violations
[Diagram labels: Catalog, Xml, Xml log]
7. Demo: Pro Client Search
• See how Protocol Authors can find components and
protocols to speed up building of protocols and facilitate
re-use
8. How does it work? PP Client: text search
Search text
Search results
13. Admin Use Cases
• General queries. Find protocols:
– with components that are deprecated (ad hoc / report)
– not run in n days
– not changed in n days
– by client type (pro client, web port, web service, Notebook,
Isentris, …)
– with components with GUID x
– with SQL components with specific DSN
16. How does it work?
• The inner workings of Accelrys Catalog
17. Introduction to Text Searching
• Unstructured or
minimally-
structured searches
– Think “Google”
– Keyword-based,
non-relational; wide
range of user input
– Based on lookups
using pre-built word
(token) indexes
18. Introduction to Text Searching (cont’d)
• Strategies to make searches more effective
– Stop word removal: and, the, by, for, of, …
– Stemming: started → start, clusters → cluster, etc.
– Synonym aliasing: oncology=cancer, MB=megabyte, etc.
(supported but only minimally implemented; extensible)
– Language-specific document and query processing (support for
Asian languages)
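The strategies above can be sketched in a few lines of Python. This is only an illustration of the general idea of a token index with stop word removal, stemming, and synonym aliasing. It is not how Solr or Accelrys Catalog implements them: the suffix-stripping rule is deliberately naive and the sample documents are made up.

```python
import re
from collections import defaultdict

# Word lists taken from the examples on the slide; real lists are much longer.
STOP_WORDS = {"and", "the", "by", "for", "of"}
SYNONYMS = {"cancer": "oncology", "mb": "megabyte"}  # alias -> canonical term


def stem(token):
    """Naive suffix stripping (assumption); real stemmers are far more careful."""
    for suffix in ("ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase, tokenize, drop stop words, stem, then apply synonym aliases."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    stems = [stem(t) for t in tokens]
    return [SYNONYMS.get(s, s) for s in stems]


def build_index(docs):
    """Build a pre-built word (token) index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every analyzed query term."""
    terms = analyze(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Made-up documents for illustration only.
docs = {
    "p1": "Started clusters for the oncology screen",
    "p2": "Cancer data by size in MB",
}
index = build_index(docs)
```

Because the same `analyze` function is applied to documents and queries, a search for "cancer" also finds the document that only mentions "oncology".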
19. What’s behind it? Apache Solr
• Open source text search server
• Part of Apache Software Foundation
• Uses and extends Lucene Java search
library
• Hosted by Tomcat in AEP
• http://lucene.apache.org/solr/
20. Solr: Under the Hood…
• Schema
– XML specification of document fields and their types
– Specifies how fields are tokenized and processed for indexing
• Solr config file
– XML specification of query and result set processing rules
– E.g. field weights
• Optional auxiliary files
– Stop words, synonyms, protected words (unstemmed)
• Host application container
– For AEP this is Tomcat
21. Tokenization and Filtering
• Tokenization options in Solr
– Break on whitespace
– Break on all non-letter characters
– Break on case changes (for CamelCaseTokenization)
– Break on character set changes (alphanum/ideographic/katakana)
• Additional filters
– Lowercase filter: converts all characters to lowercase
– CJK bigram filter: outputs adjacent character pairs for Asian languages
– Stem filter: applies stemming rules (many language-specific variants)
• Field indexing and query processing use same tokenization
– Better search results may be obtained by using slightly different analysis for indexing
versus querying
• See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
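As a rough illustration of the tokenization options and filters listed above, the following Python functions mimic each behavior with regular expressions. They are simplified stand-ins, not the Solr tokenizer or filter classes, and all function names are hypothetical.

```python
import re


def whitespace_tokens(text):
    """Break on whitespace only."""
    return text.split()


def letter_tokens(text):
    """Break on all non-letter characters (digits and punctuation are dropped)."""
    return re.findall(r"[^\W\d_]+", text)


def camel_case_tokens(text):
    """Break on case changes, e.g. for CamelCase names; digits become own tokens."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", text)


def lowercase_filter(tokens):
    """Convert all characters to lowercase."""
    return [t.lower() for t in tokens]


def cjk_bigrams(text):
    """Output adjacent character pairs (simplified; assumes an all-CJK string)."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]
```

For example, `lowercase_filter(camel_case_tokens("CamelCaseTokenization"))` yields `["camel", "case", "tokenization"]`, so a query for "tokenization" can match a component named in camel case.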
23. Creating the Catalog Index
• XML Database = Component/Protocol Database
• For each item in XMLDB, an indexing protocol
– reads the item from the database
– creates data record properties corresponding to Solr fields
– joins in statistics from usage log
– converts the data record to a JSON “document”
– POSTs the document to Apache/Tomcat/Solr via HTTP
• Weighting
– Protocol name and description have higher weight
– Proximity has higher weight
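The indexing steps above can be sketched in Python as follows. This is not the actual indexing protocol, which is a Pipeline Pilot protocol. The record layout, the join on protocol name, and the update URL are all assumptions made for illustration. The field names follow the Catalog schema fields listed in this deck.

```python
import json
import urllib.request


def to_solr_document(item, usage_by_name):
    """Map one XMLDB item to Solr fields and join in usage statistics.

    `item` and `usage_by_name` are hypothetical dict layouts; usage is joined
    on the protocol name (an assumption).
    """
    stats = usage_by_name.get(item["name"], {})
    return {
        "name": item["name"],
        "path": item["path"],
        "type": item["type"],
        "author": item.get("author", ""),
        "runcount": stats.get("runcount", 0),
    }


def build_update_request(docs, update_url):
    """Serialize documents to JSON and wrap them in an HTTP POST request."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Made-up sample data; the URL is a placeholder, not a real AEP endpoint.
items = [{"name": "Load Data", "path": "Protocols/Demo/Load Data",
          "type": "protocol", "author": "jdoe"}]
usage = {"Load Data": {"runcount": 7}}
docs = [to_solr_document(i, usage) for i in items]
request = build_update_request(docs, "http://solr.example.invalid/update")
# urllib.request.urlopen(request) would send it; omitted because no server exists here.
```

Building the request separately from sending it makes it easy to inspect the JSON payload before anything reaches the server.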
24. Some Catalog Fields (defined in schema)
• name: protocol or component (or parameter) name
• path: location in XMLDB
• type: “component” or “protocol” (or “parameter”)
• parameters: names of top-level parameters
• author: user who created protocol/component
• lastsaveddate: date protocol/component last changed
• runcount: number of times protocol has been run
• lastrundate: date protocol was last run
• uses: list of components used by protocol
• alltext: composite field for free text search
25. Configuring Accelrys Catalog
• Configuration (admin portal)
– AEP servers to index
– Indexing schedule
• Note
– Indexer runs as scheduled service
– Indexing takes ~1 to 3 minutes
per 1000 XMLDB items
– Two index copies; users can
continue search while index is
rebuilt
– Tomcat and Solr automatically
installed and launched with
Apache
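As a back-of-the-envelope check, the quoted rate of roughly 1 to 3 minutes per 1000 XMLDB items can be turned into a rough estimate for a given database size. The function below is only this arithmetic. Actual times depend on server speed, load, and protocol complexity.

```python
def estimate_indexing_minutes(num_items, min_per_1000=1.0, max_per_1000=3.0):
    """Return a (low, high) estimate in minutes from the per-1000-item rate."""
    thousands = num_items / 1000
    return (thousands * min_per_1000, thousands * max_per_1000)


# The 5000+ protocols from "The Size of the Challenge" slide as an example.
low, high = estimate_indexing_minutes(5000)
```

For 5000 items this gives an estimate of 5 to 15 minutes.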
26. Limitations
• Usage info depends on protocol name, so protocols saved under generic names (e.g. “Protocol 1”) are a problem
• No indexing at runtime – it can take a day before index is
updated
27. Searching Remote Servers
• Support for 8.0 and 8.5 servers
– Not all xmldb features supported
– Supported for Admin Search, not Web Port or PP
• Configure at Catalog Settings page
– Remote server username must have admin privileges
28. How do I troubleshoot (indexing)?
• Status message in admin portal
– Persistent error may indicate need to adjust settings or insufficient
user privileges for remote server
– Check server log files for details
• Time to update index can vary widely (from ~7 minutes to ~8
hours in actual tests)
– Settings can be adjusted to trade off indexing time against parameter
value search functionality
– Server speed, server load, number and complexity of protocols in
XMLDB all affect indexing time
29. Global Properties page of Admin Portal
• Only go to this page if there is a persistent problem with indexing
• Set Package to Accelrys/Accelrys Catalog
• Disabling parameter indexing in Admin Search reduces indexing time by
50+%
– Set EnableParameterIndexing to “False”
• “Chunk size” settings trade off memory against indexing speed
– Decrease if indexer reports out-of-memory errors:
– ParallelBatchSize: number of components/protocols processed by each sub-job
of indexer to mitigate memory footprint of Component Reader (default: 40)
– NumComponentsPerGroup: number of protocol/component documents sent
to Solr in a single HTTP POST (default: 10)
– NumParametersPerGroup: number of parameter documents sent to Solr in a
single HTTP POST (default: 150). Only relevant if EnableParameterIndexing is
“True”.
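The effect of the group-size settings can be pictured with a small Python sketch. It is not the indexer's code. It only shows how a list of documents splits into groups of a given size and how many HTTP POSTs result, using the default values quoted above.

```python
import math


def chunked(items, size):
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def posts_needed(num_docs, group_size):
    """Number of HTTP POSTs required to send `num_docs` documents."""
    return math.ceil(num_docs / group_size)


# Defaults from the slide: NumComponentsPerGroup=10, NumParametersPerGroup=150.
component_groups = list(chunked(list(range(25)), 10))
```

A smaller group size means less memory held per request but more round trips to Solr, which is the trade-off described above.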
30. How do I troubleshoot (searching)?
• If search results are not what you expect…
– Use Raw Query Output example protocol
• Connect to Pro Client as admin user
• Open Raw Query Output under Protocols/Utilities/…
• Set Query Catalog parameters and run
• Inspect output to see how Solr processed your query
– To really dig deep: use Solr admin page
• In Firefox (not IE), go to http://aepserver:aepport/appcatalog/admin
• Click on appcat1 or appcat2
• Try Query, Analysis, and Schema Browser tools
32. More Example Queries
• MAO type:Component
– Any components referencing ‘MAO’
• uses:"Xml Reader" AND NOT author:Accelrys
– Components/protocols that have an xml reader and are not authored by Accelrys
• lastrundate:[* TO NOW-6MONTH]
– Last run six or more months ago
• runcount:0
– Never been run
NOTES:
• Field names case-sensitive. Field values not.
• Phrases require quoting (double quotes); single words do not.
• Boolean and other reserved words must be uppercase.
• Examples in help text
• Definitive list in schema.xml (Solr admin page)
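For readers who want to try such queries against Solr directly, the sketch below shows how a query string might be URL-encoded into a request. The base URL is a placeholder and the choice of returned fields is an assumption. `q`, `defType`, `rows`, `fl`, and `wt` are standard Solr request parameters, and `edismax` is the query parser the links section says Accelrys Catalog uses.

```python
from urllib.parse import urlencode, parse_qs


def build_query_url(base_url, query, rows=10, fields=("name", "path")):
    """Return a Solr select URL with the query safely URL-encoded."""
    params = {
        "q": query,                # the query text, e.g. one of the examples above
        "defType": "edismax",      # Extended DisMax query parser
        "rows": rows,              # maximum number of results to return
        "fl": ",".join(fields),    # fields to include in each result
        "wt": "json",              # response format
    }
    return base_url + "?" + urlencode(params)


# Placeholder host; not a real AEP or Solr endpoint.
url = build_query_url(
    "http://solr.example.invalid/select",
    'uses:"Xml Reader" AND NOT author:Accelrys',
)
```

Encoding matters here because the phrase query contains spaces, quotes, and a colon, all of which must survive the trip to the server unchanged.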
33. Relevant Components and Protocols
• You may need admin privileges to see these
• Database and Application Integration/Admin/Catalog/Utilities/Internals/Query Catalog
– Use this for custom queries within a protocol
– Supports faceted queries, exposes all schema fields
• Protocols/Utilities/Accelrys/Administration/Catalog/Update Catalog
Index
– Main indexing protocol; normally launched by scheduler
– Launches a separate remote job for each indexed server
• Protocols/Utilities/Accelrys/Administration/Catalog/Utilities folder
– Raw Query Output: Demonstrates use of Query Catalog component; useful for
debugging searches
– Schedule Catalog Indexing: Generates Catalog Settings admin page
35. Protocol Validation Use Cases …
• Bad design practices. Find protocols that:
– have shortcuts as copies
– have saved checkpoints
– store passwords
– have components that are owner access only
– don’t have top level parameters (Web Port)
– have component with absolute file paths
• Bad documentation practices. Find protocols that:
– don’t have help text (or default help)
– have components with missing captions
37. Analyze Validation report
• Performs analysis of the protocol validation report created for the Validation Report page of the Administration Portal
38. Links for Advanced Topics
• Schema and Tokenization
– http://wiki.apache.org/solr/SchemaXml
– http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
– http://docs.lucidworks.com/display/solr/Tokenizers
• Solr query syntax and parsing
– http://wiki.apache.org/solr/SolrQuerySyntax
– http://wiki.apache.org/solr/ExtendedDisMax [used by Accelrys Catalog]
• Joins in Solr [how Search Catalog does parameter value searching]
– http://wiki.apache.org/solr/Join
• Faceting
– http://wiki.apache.org/solr/SolrFacetingOverview
– http://wiki.apache.org/solr/SimpleFacetParameters
– http://searchhub.org/2009/09/02/faceted-search-with-solr/
– Not exposed in UI; use Query Catalog component
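To give a feel for what a faceted query returns, the sketch below builds standard Solr facet parameters and converts the facet section of a JSON response into a dictionary. It talks to Solr conventions in general, not to the Query Catalog component, whose parameters are not shown here. The sample response is made up for illustration.

```python
def facet_params(query, field):
    """Request parameters for counting matches per value of `field`."""
    return {
        "q": query,
        "rows": 0,             # only the counts are wanted, not the documents
        "facet": "true",
        "facet.field": field,
        "wt": "json",
    }


def facet_counts(response, field):
    """Convert Solr's flat [value, count, value, count, ...] list to a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Made-up response fragment in the default flat layout of Solr's JSON writer.
sample_response = {
    "facet_counts": {"facet_fields": {"author": ["alice", 12, "bob", 3]}}
}
counts = facet_counts(sample_response, "author")
```

Faceting on a field such as `author` answers questions like "how many protocols does each author own" in a single request.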
39. Summary
• Accelrys Catalog is a powerful search technology built into
AEP
• With Protocol Validation this provides critical tools for
administering enterprise deployments
• Plan for 9.0 upgrade now
• Relevant talks
– (ATS6-DEV01) What’s new for Developers in AEP 9.0
– (ATS6-Roadmap01) Platform Roadmap
– (ATS6-PLAT07) Managing AEP in an enterprise environment