Emerging Technologies for Information Governance: We explore the trends facing information officers and how new technologies, such as auto-classification, are helping to solve critical Information Governance issues.
3. PAGE3
www.DocuLynx.com
Challenges
Key Information Management Challenges
• 40% of a knowledge worker’s time is spent managing documents. – Gartner
• 15%-35% of a knowledge worker’s time is spent searching for information.
• 15% of the work week is spent reproducing information that already exists. – IDC 2010
5. PAGE5
Trends: Cloud
• New deployment architectures are emerging (private, public, and hybrid)
• Security concerns about what is in the cloud
• Less IT control over content and systems
6. PAGE6
Trends: Mobile
• Mobile clients are increasingly replacing traditional modes of information access
• Availability of legacy data on mobile devices
• New security concerns surrounding information access and access controls
7. PAGE7
Trends: Security
• Firewalls become transparent with ever more forms of collaboration and information exchange
• Emergence of new platforms (cloud) and devices
• Need to identify and secure sensitive information, including corporate IP
• Demand for new security concepts
8. PAGE8
Trends: Big Data
• Volume, complexity, variety, and velocity
• Inclusion of unstructured data in varied formats / 360° view of the business
• Demand for new applications, including mash-ups and analytics capabilities with access to all relevant information
9. PAGE9
Trends: Legal and Regulatory Compliance
• Ever-increasing array of external regulations and internal policies
• Applied to emerging platforms
• Courts require increasing transparency of information for legal matters
• A "one-size-fits-all" ECM strategy is not an effective solution
“Many organizations have fragmented approaches and platforms to support their ECM needs. Past Gartner surveys have shown that many organizations have an average over six different ECM products in place while large organizations can have as many as 20.”
-Kenneth Chin, Gartner
10. PAGE10
“Enterprises need to build an information governance program (and supporting platforms -author) with policies and processes to better manage all the content in the enterprise as well as outside the enterprise as social media, mobile, and cloud technologies increase the breadth of where the content exists.”
-Mick MacComascaigh, Gartner
13. PAGE13
Adapted Information Governance Model: Simplification
• Integration
– Process; Content; People
– 360° view of information
– Addressing media discontinuity to leverage all relevant informational value in the enterprise
• Classification of information
– For compliance controls
– For security controls
– For discoverability
Diagram: Classify & Integrate, with Policy Integration and Process Transparency
14. PAGE14
Adapted Information Governance Model: Business Value & Legal & Compliance Risk
• Integration
– Process; Content; People
– 360° view of information
– Addressing media discontinuity to leverage all relevant informational value in the enterprise
Diagram: Classify & Integrate, with Policy Integration and Process Transparency
15. PAGE15
Today’s Information-Intensive Application Environments
• Dynamic and information-intensive processes
• High degree of ad-hoc collaboration between knowledge workers
• Incremental and progressive responses from knowledge workers
• Event-driven
• Given context
Source: December 28, 2009, “Dynamic Case Management
— An Old Idea Catches New Fire” Forrester report
16. PAGE16
Challenges
• Processing takes too much time
• Workers can’t access all the data and forms they need
• Workers can’t effectively collaborate and share knowledge
• Compliance is an issue
• Cases don’t follow a fixed sequence or structure
• Hard to track who did what, when, and why
• No opportunity for improvement
• Existing structures prevent flexibility
• Can’t identify patterns and learn from them
19. PAGE19
Every Level Requires Different Flexibility and Performance
Diagram levels: Applications (individual apps, standard apps); Process and Integration; Business Models (new customers, new markets, innovation, new products); Flexibility and Collaboration
21. PAGE21
Next Generation Platform: Content Services Platform (CSP)
docHarbor / docHaven
• ... is available on all platforms (from Windows to mainframe)
• ... can be installed on-site, in hybrid, and in full cloud environments
• ... is also available in SaaS and volume-based editions
• ... is the platform for information modernization & virtualization
• ... from ECM solution to information virtualization
• ... ideal for the instantiation of CCAs
22. PAGE22
Core Components: Integration and Federation
docHarbor / docHaven
– Make information available to business processes
– Don’t shut down a repository unless you want to
– Don’t be forced to migrate to a proprietary system and lose independence
– Connectors can hook in documents, data & workflows
– A centralized hub manages all communications
23. PAGE23
Core Components: User Experience
docHarbor / docHaven
– Existing interfaces can be used and new capabilities plugged in
– New interfaces can be developed with rapid prototyping
– Based on open standards:
  • Win
  • Java
  • Web
– Supports the development of mashups
24. PAGE24
Core Components: Content Automation
docHarbor / docHaven
– Work routing automation
– Business process automation
  • Scanning to ILM
– Content-centric workflows
– Ability to tie in enterprise-standard workflow engines as needed
– Extraction automation
25. PAGE25
Core Components: Compliance and Analytics
docHarbor / docHaven
– Support for compliance policies (internal and external regulatory)
– Support for eRecords management structures
– Process monitoring
– Audit trails
– Support for content analytics on existing content
29. PAGE29
Adapted Information Governance Model: IT Efficiency & Legal/Compliance Risk
• Classification of information and long-term preservation
– For compliance controls
– For security controls
– For discoverability
Diagram: Classify & Integrate, with Policy Integration and Process Transparency
30. PAGE30
Not All Information Is Equal
Some documents or e-mails contain sensitive information that must be retained for compliance or legal reasons, while other data is rarely used or redundant. The challenge is to identify which is which.
Adaptive auto-classification covers:
• Unstructured data and e-mails
• Documents and records
• ERP data and SharePoint objects
Classification is an enabler for:
• Archiving
• Encryption
• Third-party applications
Combining classification and archiving is a big step forward for any enterprise faced with:
• Ongoing regulatory demands
• Staggering data storage costs
• Outdated or paper-based archiving processes
Diagram: information types (e-mails, documents, data, voice, web content, video)
32. PAGE32
Identifying the Right Files
Only classification enables automation:
• Critical information that should not be moved to the cloud
• Sensitive information that needs to be encrypted before it is moved to the cloud
• Important information that needs to be archived
• Passive information that is rarely used and could be moved to IRM immediately
DocuClassify
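Once every file carries one of the four categories above, choosing what to do with it becomes a lookup rather than a manual decision. The sketch below shows that idea; the category names, action names, and file paths are hypothetical labels chosen for illustration, not DocuClassify identifiers.

```python
from enum import Enum
from typing import Dict


class InfoCategory(Enum):
    """The four categories from the slide (labels are illustrative)."""
    CRITICAL = "critical"    # should not be moved to the cloud
    SENSITIVE = "sensitive"  # encrypt before moving to the cloud
    IMPORTANT = "important"  # needs to be archived
    PASSIVE = "passive"      # rarely used; candidate for IRM


# Hypothetical mapping from category to the automated action.
ACTIONS = {
    InfoCategory.CRITICAL: "keep_on_premises",
    InfoCategory.SENSITIVE: "encrypt_then_move_to_cloud",
    InfoCategory.IMPORTANT: "archive",
    InfoCategory.PASSIVE: "move_to_irm",
}


def plan_actions(files: Dict[str, InfoCategory]) -> Dict[str, str]:
    """Return the action to apply to each classified file path."""
    return {path: ACTIONS[category] for path, category in files.items()}


if __name__ == "__main__":
    classified = {
        "/share/rnd/formula.docx": InfoCategory.CRITICAL,
        "/share/hr/salaries.xlsx": InfoCategory.SENSITIVE,
        "/share/finance/invoice_0815.pdf": InfoCategory.IMPORTANT,
        "/share/old/meeting_2009.ppt": InfoCategory.PASSIVE,
    }
    for path, action in plan_actions(classified).items():
        print(f"{path} -> {action}")
```

Keeping the mapping in a single table means a policy change (for example, archiving passive data instead of moving it to IRM) touches one entry rather than the code that walks the file system.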
33. PAGE33
Identifying the Right Files
Attaching the classification directly to the file creates:
• Transparency: tell from the outside what is inside
• Synergies: every user and every application can use the information
• Security: easy recognition of outliers and hotspots
• Efficiency: quick grouping for joint processing
• Uniformity: the same principle across the entire organisation
34. PAGE34
Classification Framework
Performance, Precision and Flexibility
• Metadata-based classification delivers high performance on large volumes of data
• Pattern matching is fast and exact when the classification criteria can be expressed as regular expressions
• Machine learning with linguistic-statistical analysis is universally applicable and delivers strong results with fuzzy classification criteria
Partnerships for Information Governance with the likes of:
• KPMG
• Deloitte
• Fontis International
Diagram: metadata-based, pattern-matching, and machine-learning methods
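The trade-off between the first two methods can be sketched in a few lines: a metadata rule decides from the path alone, while a pattern rule has to read the content but gives an exact answer. This is a minimal illustration, assuming a hypothetical HR share path and a US Social Security number pattern (the speaker notes mention such numbers as a typical regex criterion); the class names are invented for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for US Social Security numbers (NNN-NN-NNNN).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class FileInfo:
    path: str     # metadata: available without opening the file
    content: str  # only needed when a content rule has to run


def classify_by_metadata(info: FileInfo) -> Optional[str]:
    """Fast rule: looks at metadata only (here, a hypothetical HR share path)."""
    if info.path.lower().startswith("/share/hr/"):
        return "HR_Record"
    return None


def classify_by_pattern(info: FileInfo) -> Optional[str]:
    """Exact rule: a regular expression applied to the file content."""
    if SSN_PATTERN.search(info.content):
        return "Contains_SSN"
    return None


def classify(info: FileInfo) -> str:
    # Cheap metadata rule first; fall back to content pattern matching.
    return classify_by_metadata(info) or classify_by_pattern(info) or "Unclassified"


if __name__ == "__main__":
    print(classify(FileInfo("/share/misc/notes.txt", "Employee SSN: 123-45-6789")))
```

Ordering the rules from cheapest to most expensive mirrors the slide's point: most files can be decided on metadata, so content is only read for the remainder. The third method, machine learning, covers criteria that neither a path nor a regular expression can express.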
35. PAGE35
Identifying the Right Files
Simplified Creation of Rules
Screenshot callouts: document class; example of a metadata-based criterion; example of a comparison with a training set; example of a criterion with regular expressions; assignment of classification properties
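"Comparison with a training set" means the system learns what each document class looks like from examples and assigns new files to the closest class. The product's linguistic-statistical method is not described here, so the sketch below swaps in a deliberately simple technique: a bag-of-words profile per class compared by cosine similarity. The class names and training texts are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> Counter:
    """Lower-cased word counts; a crude stand-in for linguistic analysis."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(training_set: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Sum the word counts of each class's examples into one profile.

    A summed profile points in the same direction as the class centroid,
    so cosine similarity treats both the same way.
    """
    profiles: Dict[str, Counter] = {}
    for doc_class, examples in training_set.items():
        profile: Counter = Counter()
        for text in examples:
            profile.update(tokenize(text))
        profiles[doc_class] = profile
    return profiles


def predict(profiles: Dict[str, Counter], text: str) -> str:
    """Assign the class whose profile is most similar to the document."""
    words = tokenize(text)
    return max(profiles, key=lambda c: cosine(profiles[c], words))


if __name__ == "__main__":
    profiles = train({
        "Invoice": ["invoice total amount due payment", "invoice number amount vat payment"],
        "Applicant_Records": ["applicant resume interview candidate",
                              "candidate application cover letter applicant"],
    })
    print(predict(profiles, "Please find the invoice and the amount due"))
```

A production classifier would add stop-word handling, weighting, and a confidence threshold so that documents resembling no class stay unclassified; here, a document with no overlapping words simply falls to the first class.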
36. PAGE36
Document Classes and the DocuClassify Cube®
The Document Class Model
(1) Every information object carries metadata with it (e.g., location, name, creator). The actual set can vary with the type of information.
(2) A classification rule assigns a document type (DT) to the information object. Each DT comes with a set of properties that are used by the classification rule. The property values enrich the metadata of the information object.
(3) Based on the classification expressed in the property values of a specific information object, any application can trigger an action (e.g., archiving) for that object. This could be other dg suite modules or third-party applications.
Diagram: Information Object (File, Mail, SP, ...) with its Metadata (1) → Classification Rules → Document Class with Classification Properties (2) → Property Values → (Third-Party) Actions (3)
Document Class | Metadata | Properties | Actions
Invoice | Location | Retention | If (Retention > 0) then archive
Engineering Plan (CAD) | Location, User, Project ID | RestrictedAccess | If (RestrictedAccess) then block access on mobile devices
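The three numbered steps and the two table rows can be modelled as a small data structure: a document class writes its property values into the object's metadata, and independent action rules read only those values. This is a minimal sketch; the retention value of 10 years and the action names are hypothetical, and it is not the DocuClassify Cube® data model itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class InformationObject:
    """A file, mail, or SharePoint object and the metadata it carries (1)."""
    name: str
    metadata: dict = field(default_factory=dict)


@dataclass
class DocumentClass:
    """A document class and the classification properties it assigns (2)."""
    name: str
    properties: dict


def apply_class(obj: InformationObject, doc_class: DocumentClass) -> None:
    """Enrich the object's metadata with the class name and property values."""
    obj.metadata["DocumentClass"] = doc_class.name
    obj.metadata.update(doc_class.properties)


# (3) Actions: each inspects the property values and returns an action name or None.
def archive_if_retained(obj: InformationObject) -> Optional[str]:
    return "archive" if obj.metadata.get("Retention", 0) > 0 else None


def block_mobile_if_restricted(obj: InformationObject) -> Optional[str]:
    return "block_mobile_access" if obj.metadata.get("RestrictedAccess") else None


ACTIONS: List[Callable[[InformationObject], Optional[str]]] = [
    archive_if_retained,
    block_mobile_if_restricted,
]


def triggered_actions(obj: InformationObject) -> List[str]:
    """Run every action rule and collect the ones that fire."""
    results = (action(obj) for action in ACTIONS)
    return [name for name in results if name is not None]


if __name__ == "__main__":
    invoice = InformationObject("inv_0001.pdf", {"Location": "/finance"})
    apply_class(invoice, DocumentClass("Invoice", {"Retention": 10}))
    print(invoice.metadata, triggered_actions(invoice))
```

Because the action rules depend only on property values, not on how the class was assigned, a third-party application can act on the classification without knowing anything about the classification rules.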
37. PAGE37
Assign Classification Properties Directly to the Information Object
Leverage ADS, security descriptors, or e-mail headers
BC / AC example:
Security Setting: Confidential
Compliance: 5-year retention
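For e-mail, "attaching the classification to the object" can mean writing the property values into message headers, so any mail client or gateway can read them without a separate database. The sketch uses Python's standard `email` package; the `X-Classification-` header prefix is a hypothetical naming convention, not one the slide specifies.

```python
from email.message import EmailMessage
from typing import Dict

# Hypothetical prefix for custom classification headers.
HEADER_PREFIX = "X-Classification-"


def tag_message(msg: EmailMessage, properties: Dict[str, str]) -> None:
    """Write each classification property into its own custom header."""
    for key, value in properties.items():
        header = HEADER_PREFIX + key
        del msg[header]  # remove any earlier value; no error if absent
        msg[header] = value


def read_tags(msg: EmailMessage) -> Dict[str, str]:
    """Collect the classification properties back from the headers."""
    return {
        name[len(HEADER_PREFIX):]: str(value)
        for name, value in msg.items()
        if name.startswith(HEADER_PREFIX)
    }


if __name__ == "__main__":
    msg = EmailMessage()
    msg["Subject"] = "Q3 figures"
    tag_message(msg, {"Security-Setting": "Confidential", "Compliance": "5-year retention"})
    print(read_tags(msg))
```

Deleting the header before setting it matters because assigning to a message header appends rather than replaces, so re-classifying a message would otherwise leave two conflicting values.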
38. PAGE38
Adaptive Auto Classification Process Steps:
• Identification of an eRecords schema
  • Document classes
  • Retention periods
• The system is trained to identify documents
• Files are automatically assigned to a document class
  • Automatically
  • Auditable
  • High precision
• Execute the rule set and classify documents
  • Development = 5 years
  • Human Resources = 10 years
  • Etc.
• Documents with a retention period > 0 move to the archive
• The retention period is recognized and taken over by the archive
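The last two steps, moving documents with a retention period greater than zero and handing that period to the archive, can be illustrated by computing a "retain until" date at ingestion time. This sketch uses the two example classes from the slide; counting retention from the creation date and the `ArchiveEntry` structure are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Retention periods in years per document class, from the slide's examples.
RETENTION_YEARS = {"Development": 5, "Human Resources": 10}


@dataclass
class ArchiveEntry:
    path: str
    document_class: str
    retain_until: date  # the retention the archive "takes over"


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def to_archive(path: str, document_class: str, created: date) -> Optional[ArchiveEntry]:
    """Return an archive entry if the class has a positive retention period."""
    years = RETENTION_YEARS.get(document_class, 0)
    if years <= 0:
        return None  # not archived
    return ArchiveEntry(path, document_class, add_years(created, years))


if __name__ == "__main__":
    print(to_archive("/share/dev/spec.docx", "Development", date(2020, 1, 15)))
```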
39. PAGE39
Taxonomy Results Based on Information from Fontis:
Rule Identifier (Fontis) | Document Class Name | Relevant for Country | Retention Period (Years)
SUP HUM 002 | Compensation_Records | Russia | 25
SUP HUM 002 | Compensation_Records | Australia | 10
SUP HUM 002 | Compensation_Records | Global | 10
SUP HUM 007 | Employee_Lists | Global | 5
SUP HUM 004 | Health_HazMat | Global | 40
SUP HUM 001 | Applicant_Records | US | 3
SUP HUM 001 | Applicant_Records | Germany | 0.5
SUP HUM 001 | Applicant_Records | Global | 10
• Taxonomy is performed
• Document classes are defined
• Retention times are defined
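Because some classes have country-specific periods and others only a "Global" one, a retention lookup needs a rule for countries that are not listed. The sketch below encodes the table and assumes, without confirmation from the source, that an unlisted country falls back to the Global entry.

```python
from typing import Dict, Optional, Tuple

# Retention periods (years) from the taxonomy table, keyed by
# (rule identifier, country). "Global" is treated as the fallback (an assumption).
RETENTION: Dict[Tuple[str, str], float] = {
    ("SUP HUM 002", "Russia"): 25,
    ("SUP HUM 002", "Australia"): 10,
    ("SUP HUM 002", "Global"): 10,
    ("SUP HUM 007", "Global"): 5,
    ("SUP HUM 004", "Global"): 40,
    ("SUP HUM 001", "US"): 3,
    ("SUP HUM 001", "Germany"): 0.5,
    ("SUP HUM 001", "Global"): 10,
}


def retention_years(rule_id: str, country: str) -> Optional[float]:
    """Country-specific period if defined, else the Global period, else None."""
    return RETENTION.get((rule_id, country), RETENTION.get((rule_id, "Global")))


if __name__ == "__main__":
    print(retention_years("SUP HUM 002", "Russia"))  # country-specific entry
    print(retention_years("SUP HUM 002", "France"))  # falls back to Global
```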
40. PAGE40
Adaptive Auto Classification Rules
• Select the document class according to the taxonomy results
• Select the list of classification properties to be applied
• Enter the IF criteria to match:
  • HSM metadata
  • Content
  • AD-based
Rule structure: DOCUMENT CLASSES → IF CRITERIA → THEN CLASSIFICATION PROPERTY
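The IF/THEN rule structure can be sketched as a list of rules, each naming a document class, a set of metadata criteria that must all match, and the properties to assign. Here criteria are glob patterns over metadata fields such as server and path; the field names, the glob syntax, and the "first matching rule wins" policy are assumptions for illustration, not the product's rule format.

```python
import fnmatch
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    document_class: str
    criteria: Dict[str, str]       # metadata field -> glob pattern (hypothetical)
    properties: Dict[str, object]  # THEN: classification properties to assign

    def matches(self, metadata: Dict[str, str]) -> bool:
        # IF: every criterion must match (logical AND); missing fields fail.
        return all(
            fnmatch.fnmatchcase(metadata.get(name, ""), pattern)
            for name, pattern in self.criteria.items()
        )


def classify(metadata: Dict[str, str], rules: List[Rule]) -> Dict[str, object]:
    """Apply the first matching rule; return an empty dict if none matches."""
    for rule in rules:
        if rule.matches(metadata):
            return {"DocumentClass": rule.document_class, **rule.properties}
    return {}


if __name__ == "__main__":
    rules = [Rule("Employee_Lists", {"server": "hrfs*", "path": "*/staff/*.xlsx"}, {"Retention": 5})]
    print(classify({"server": "hrfs01", "user": "carol", "path": "/data/staff/list.xlsx"}, rules))
```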
41. PAGE41
File / SharePoint Object Migration
Sources: SharePoint, Office, …
Via docuOffice to target archives with variable retention periods: 1 year, 5 years, 10 years, 25 years, …
1. Selection
2. Transport
3. Report Generation
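The three migration steps can be simulated end to end: select files that have a retention period, route each to a target archive, and report what moved. The sketch assumes a routing policy that the slide does not state: each file goes to the shortest archive whose period still covers its required retention, and files needing more than 25 years are reported as unplaced.

```python
from bisect import bisect_left
from typing import Dict, List, Optional

# Retention buckets of the target archives shown on the slide, in years.
ARCHIVE_PERIODS = [1, 5, 10, 25]


def target_archive(retention_years: float) -> Optional[int]:
    """Smallest archive bucket that covers the required retention (assumed policy)."""
    i = bisect_left(ARCHIVE_PERIODS, retention_years)
    return ARCHIVE_PERIODS[i] if i < len(ARCHIVE_PERIODS) else None


def migrate(files: Dict[str, float]) -> dict:
    """Simulate the three steps for files mapped to their retention in years."""
    # 1. Selection: only files whose retention period is greater than zero.
    selected = {path: r for path, r in files.items() if r > 0}

    # 2. Transport (simulated): group each selected file under its target archive.
    moved: Dict[int, List[str]] = {}
    unplaced: List[str] = []
    for path, r in selected.items():
        archive = target_archive(r)
        if archive is None:
            unplaced.append(path)  # no bucket long enough
        else:
            moved.setdefault(archive, []).append(path)

    # 3. Report generation.
    return {"selected": len(selected), "moved": moved, "unplaced": unplaced}


if __name__ == "__main__":
    print(migrate({"a.docx": 3, "b.xlsx": 0, "c.pdf": 25, "d.dwg": 40, "e.msg": 0.5}))
```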
42. PAGE42
Use Cases for Classification
• Intelligent storage optimization through classification
• More compliance through automated archiving of relevant data
• Higher security through access control with Dynamic Access Control and/or automated encryption
• Upload of files to SharePoint based on classification
• Upload of files into a DMS/ECM based on classification
• Classification as an enabler for cloud approaches
• Increased efficiency of pre-culling for eDiscovery
Enablement: DLP, Rights Management, Archiving, SharePoint Mgr, Cloud deployment, eDiscovery
44. PAGE44
Enterprise Information Archive
Unified Approach for Long-term, Audit-Proof Preservation
Sources (live data and archive data): Other Apps (production and transaction systems, …); ERP (SAP, JDEdwards, MS Dynamics, …); File Systems (Windows NTFS, NetApp, …); Mail (MS Exchange, IBM Lotus, …); SharePoint (Office, SharePoint, …)
Analyse, Classify, Enterprise Search
• Enables iterative search for discovery and supports the initial phases of the EDRM model, including legal hold
• Mobile access
• Federated search across live and archive data
• Fulfills all compliance requirements regarding long-term, audit-proof preservation
• Enterprise-wide retention management methods can be established
• Dramatic reduction in storage demand and associated costs due to single instancing and de-duplication
Diagram labels: eDiscovery, Policy Integration, Single Instancing, Central Policies, Central Retention, Create Records, Tagging and Indexing, Search Across and Within
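Single instancing, one of the sources of the storage savings listed above, stores identical content once and lets every occurrence point to it. A common way to detect identical content is a cryptographic hash; the sketch below uses SHA-256 from Python's standard `hashlib` as an illustrative choice, not as a description of how the archive product implements de-duplication.

```python
import hashlib
from typing import Dict


class SingleInstanceStore:
    """Stores each distinct content once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}  # digest -> content (stored once)
        self.refs: Dict[str, str] = {}     # object id -> digest

    def add(self, object_id: str, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)  # keep only the first copy
        self.refs[object_id] = digest
        return digest

    def get(self, object_id: str) -> bytes:
        return self.blobs[self.refs[object_id]]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())


if __name__ == "__main__":
    store = SingleInstanceStore()
    attachment = b"%PDF quarterly report" * 100
    store.add("mail/1001/report.pdf", attachment)
    store.add("mail/1002/report.pdf", attachment)  # same attachment, new mail
    print(len(store.refs), "objects,", store.stored_bytes(), "bytes stored")
```

The same attachment sent to many mailboxes, or copied to a file share, is then stored once while each message keeps its own reference.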
45. PAGE45
Unified Archive in Action
Business Challenges
• Growth in e-mail has resulted in 30% volume expansion per annum, requiring greater Exchange capacity and resulting in greater administrative complexity
• Regulations require 30-year retention of manufacturing information from ERP and CAD systems
• eDiscovery needs required the ability to bring together e-mails, manufacturing data, CAD data, and transactional information from a search and legal hold perspective
Solution
• Phased implementation of Mail, ERP for SAP, and Connect for:
  • 8500 mailboxes
  • 27 SAP systems
  • COLD documents / CAD system documents
Impact
• Overall infrastructure savings for Exchange and SAP systems
• Cost-effective and compliant back-up windows
• Compliance requirements met from an IT perspective
• Intuitive end-user search across all information, including mobile access
46. PAGE46
DocuLynx Portfolio
Addressing Information Governance and Lifecycle Challenges
Enterprise Information Infrastructure Solutions
• Docu360°: a Content Services Platform for Smart Applications
• Active Information Archiving Platform
o DocuSuite
o DocHarbor/Haven
• DocuClassify
• DocuSearch
Enterprise Information Application Solutions
• AP Automation
• E-Signature
Information Conversion Services (e.g., scanning)
DocuLynx specializes in adaptive auto-classification, a technology that analyzes and categorizes information to identify sensitive information for enterprise-wide compliance, legal hold, improved access, and more cost-effective storage. Dataglobal specializes in classifying a wide variety of enterprise data, including e-mails, documents, files, records, ERP data, file servers, SharePoint objects, and video. As part of the auto-classification process, companies can establish higher security through access control and/or automated encryption.
Classification is also an enabler for actions such as cloud or on-premises archiving: data is classified continuously based on its security level and tagged appropriately, then moved transparently into the cloud or archive.
As a leader in cloud-based archiving solutions, DocuLynx sees this union of classification and archiving as a monumental step forward for any enterprise faced with ongoing regulatory demands, staggering data storage costs and outdated or paper-based archiving processes.
In a typical enterprise, around 70% of data is unstructured.
Classification facilitates intelligent movement of files, based on file analysis results and classification, to appropriate storage locations, including archives.
The result is policy-based, audit-proof long-term preservation of key information.
We work together with Fontis International.
Fontis delivers the gold standard for retention periods as a subscription-based Legal Data Service.
Classification is based on a taxonomy (typically a preparation project of 1-2 days).
automatic
auditable
Classification is an enabler for third-party applications such as access control and encryption.
Example: archiving (in the cloud or on premises)
The system is trained – files are automatically classified
Classification Property/Value defines retention time per document class
Fontis International delivers the gold standard of Legal Data Service. Example: retention period by document class
Defines retention time per document class
Classification is based on a taxonomy, built in a preparation project
Classification is the enabler for third-party applications and next steps (archiving, access control, encryption)
Big advantage: automatic processing of terabytes of legacy data
The result is a sustainable information governance framework that can stand up to the most intense scrutiny.
With input:
Document classes
Classification properties / values
IF: a list of criteria is met. Criteria can be external metadata such as server name, user name, path and file name, or any combination of these.
We can also look into the file and make content-based decisions.
Typical content-based decisions: regular expressions, strings such as Social Security numbers
More effective: adaptive auto-classification (a trained system automatically identifies which trained document class a file belongs to)
Retention periods by country and by document source
Rules are configured to classify all data.
Customization takes about 1-2 days on average.
Next step: I can run the configured system on my existing data.
Classification remains with the file and is an enabler for further actions
Classification allows for highly granular file management
Automatic – ideal for legacy data
After classification comes the follow-on process.
The example for our demo is archiving based on classification properties/values.