Thinking differently about data…
6th August 2014
• Why? – change, old methods, useful
• What? – new, FCC, not all data is equal
• How? – new fabric - TCSV
• SCV
• Open
• Speed to value (tactical & strategic)
• No compromise on quality/integrity
• Complementary (to kit & thinking)
Why?
Different
Think
Different
We’re on a mission to
make data play nicely

Scale
Speed
Agility
The data journey…
Data
Information
Knowledge
Action
Source Domain
Business Domain
Data Domain
Data
Information
Knowledge
Action
Data
Payload
Sources
‘Raw’ Formats
Data
Payload
‘Raw’ Formats
Data
Payload
Data
Payload
Data
Payloads
Data
Payloads
Data Payload Tools
Data
Payloads
Data
Payloads
What?
A new data fabric…
Time Context Signal Value
Sales
{
"name": "Salesforce",
"followers_count": 39061,
"friends_count": 12986,
"listed_count": 917
}
Harvested at 2013-10-01 17:35:00
{
"category": "Company",
"talking_about_count": 58550,
"username": "healthyx",
"likes": 1985655,
"link": "http://healthyx"
}
Harvested at 2013-10-01 19:12:00
<performance>
<account>
Healthy X Limited
</account>
<cam>nutrigum – branding</cam>
<data>
<date v="2013-10-01">
<impr>14000</impr>
<clk>1500</clk>
<cnv>10</cnv>
</date>
</data>
</performance>
Adserver
Sales <filterTags>Nutrigum</filterTags>
<tagStats>
<tag>~SOURCE~t</tag>
<tagDisplayName>
TWITTER
</tagDisplayName>
<matchCount>71</matchCount>
<popularity>
<popularityCount>
<timeInterval>
2013-10-01
</timeInterval>
<count>2.0</count>
<normalizedCount>
2.0</normalizedCount>
</popularityCount>
</popularity>
</tagStats>
eCRM
Date,productId,userId,number,ppu
2013-10-01,123,321,2,5.00
2013-10-01,123,521,1,5.00
2013-10-01,333,444,2,15.00
2013-10-01,854,111,1,20.00
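As a minimal sketch of what parsing one of these sources into Time, Context, Signal, Value might look like, the Python below turns the eCRM rows into TCSV points. The Point class, the parse_ecrm function and the choice of which columns count as context and which as signals are assumptions made for illustration, not the DataShaka parser.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    time: str            # T: when the value applies
    context: frozenset   # C: (context type, context) pairs
    signal: str          # S: what is being measured
    value: float         # V: the measurement


ECRM_CSV = """Date,productId,userId,number,ppu
2013-10-01,123,321,2,5.00
2013-10-01,123,521,1,5.00
2013-10-01,333,444,2,15.00
2013-10-01,854,111,1,20.00
"""

# Assumption: these columns describe a row, the others measure something.
CONTEXT_COLUMNS = ("productId", "userId")
SIGNAL_COLUMNS = ("number", "ppu")


def parse_ecrm(text, source="eCRM"):
    """Emit one TCSV point per measured column of every CSV row."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        context = frozenset(
            [("Source", source)] + [(col, row[col]) for col in CONTEXT_COLUMNS]
        )
        for signal in SIGNAL_COLUMNS:
            points.append(Point(row["Date"], context, signal, float(row[signal])))
    return points


points = parse_ecrm(ECRM_CSV)
```

Each CSV row becomes two points (number and ppu) that share the same time and context, so the source schema no longer matters once the data is in this shape.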
Some Data…
There is more than
one way to skin this
rabbit…
Business domain
EDW
Source domain ETL
Acquisition
Queries
Couple to Conform
warehouse
mart
mart
Legacy CRM
Agency Media Plan
REST
SOAP
ODBC
Email
Indicates schema coupled operation. Changes in 1 lead to changes in many others.
Select
Calculate
Select
Calculate
Select
Join
Select
Aggregate
Split
Join
Sort
Quality Control
Select
Aggregate
Calculate
Select
Aggregate
Join
Select
Aggregate
Calculate
Select
Join
Aggregate
Select
Join
Calculate
Aggregate
Business domain
Hadoop
Source domain HDFS
Acquisition
Queries
Couple to Conform
Legacy CRM
Agency Media Plan
REST
SOAP
Sqoop
Email
Indicates schema coupled operation. Changes in 1 lead to changes in many others.
Select
Calculate
Select
Calculate
Select
Join
Select
Aggregate
Split
Join
Sort
Quality Control
Map Reduce
Job
Storage
TCSV
Data Domain
Source domain
Acquisition
Legacy CRM
Agency Media Plan
REST
SOAP
ODBC
Email
Storage
Parse
Business
domain
Quality Control
Enrichment
Data Acceptance
Calculation
Indicates schema coupled operation
Queries
Unparse
Pure TCSV operation
In practice…
How?
Options to deploy…
IRI
Kantar
Millward
Brown
Mindshare
Finance
Nielsen
Dispatches
Litmus
Kantar
Ireland
1. Cloud
Client Team
Client Firewall
All data is harvested as normal into the DataShaka platform.
The DataShaka platform is as secure as Azure.
IRI
Kantar
Millward
Brown
Mindshare
Nielsen
Dispatches
Litmus
Kantar
Ireland
2. Cloud
Encrypted Data Harvest
Finance
Client Team
Client Firewall
Encryption
Agent
Delivery
Decryption*
Data from sensitive sources inside (or outside) the client
environment is encrypted at the value level (TCSV’s V) and
decrypted on delivery of data. Everything else is handled
normally. This is a hybrid solution because there are agents in
the client environment that would need to be managed.
* Can be provided in Excel
and Browser based systems
or through an agent that
decrypts data files for use
by a third party
application.
Data is encrypted at all times when outside of the client environment
IRI
Kantar
Millward
Brown
Mindshare
Nielsen
Dispatches
Litmus
Kantar
Ireland
3. Hybrid
Cloud VM in VPN
(Private Cloud)
Finance
Client Team
Client Firewall
VM
VM
VM
DataShaka produce a managed set of VMs within the client
VPN. This utilises the platform as is for ‘normal’ data but
hosts finance data in a secure environment. The DISQ
registry is used to present a single interface in a secure way
from both sets.
This could be extended to an entire private cloud version of
the DataShaka platform.
IRI
Kantar
Millward
Brown
Mindshare
Nielsen
Dispatches
Litmus
Kantar
Ireland
4. On Premise Appliance
Finance
Client Team
Client Firewall
The DataShaka platform is provided on managed hardware
to be run within your data centre.
IRI
Kantar
Millward
Brown
Mindshare
Nielsen
Dispatches
Litmus
Kantar
Ireland
5. On Premise ‘your tin’
Finance
Client Team
Client Firewall
The DataShaka platform is provided to be installed on
hardware you manage yourself.
Integrity, Privacy,
Security, Availability…
TCSV Tools
Infrastructure
IPSA IPSA
TCSV
Methodology
O2 ~ DataShaka and Security: Integrity, Privacy, Security and Availability (IPSA)
Stack
(Methodology)
Tools Infrastructure
1. Integrity
(accuracy & consistency)
Consilience allows for data in different places
while retaining one single unified conceptual set.
CAMO describes the methodology for consilience
and TCSV is a CAMO. All of this is built to support
integrity and availability.
There are specific tools for checking data
taxonomy and for missing data. We provide
tools for Data Acceptance Testing (DAT)
As embodied on Windows Azure the DataShaka
tools are constructed to respect the integrity of
the underlying methodology. Each data
operation is recorded to provide full provenance.
2. Privacy
(of client data)
n/a
TCSV tools are designed to work with TCSV in a
content agnostic way. As mentioned in
methodology, privacy is a content specific
concern. Tools for processing TCSV can be
used to perform operations supportive of
privacy such as removal of PII.
The DataShaka platform is fully tenanted by
clients, with no cross pollination of data.
3. Security
(right people + right data)
In TCSV each point is uniquely identifiable by its
signature of T,C,S&V and sub-sets are similarly
identifiable. As such, TCSV is ideally suited to
embodiment within a system of individual point
level security/access control and above.
n/a
The DataShaka platform takes advantage of the
built-in security of Windows Azure. We use
Azure in a tenanted manner preventing cross
pollination or action between accounts.
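One way to picture point-level identification is to derive a stable signature from T, C, S and V and grant access per signature. The sketch below is only an illustration: the signature function, the use of SHA-256 over a canonical JSON form, and the grants table are assumptions, not the platform's actual mechanism.

```python
import hashlib
import json


def signature(time, context, signal, value):
    """Return a stable SHA-256 hex digest for one TCSV point."""
    canonical = json.dumps(
        # Sorting the context makes the signature independent of key order.
        {"T": time, "C": sorted(context.items()), "S": signal, "V": value},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = signature("2014-06-01", {"Brand": "Nutrigum", "Source": "Twitter"}, "Followers", 1200)
b = signature("2014-06-01", {"Source": "Twitter", "Brand": "Nutrigum"}, "Followers", 1200)
c = signature("2014-06-01", {"Brand": "Nutrigum", "Source": "Twitter"}, "Followers", 1201)

# Hypothetical access table: which point signatures each user may read.
grants = {"analyst": {a}}


def can_read(user, point_signature):
    return point_signature in grants.get(user, set())
```

Here a and b are the same point written with its context in a different order, so they share a signature, while c differs only in V and is treated as a different point.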
4. Availability
(to SLA)
As with integrity the ‘unification’ methodology is
built for full availability of the unified set. As a
mutable set, enrichment is non-destructive giving
full availability to pre and post enriched queries.
n/a
Reliant on infrastructure SLAs
Quality…
Content Agnosticism alongside quality and matching
TCSV Tools – Content Agnostic
• Enrichment
• Taxonomy Rules
• Missing Data Rules
• DAT
• Query
• Combine
Mutable Chaordic TCSV Set
External Tools – Content Specific
• Matching
• Statistical Models
• Machine Learning
• Content Agnostic Tools work on a ‘100% match’
basis.
• They use configuration files to make queries and
apply rules to TCSV.
• TCSV has Natural Relationships and Natural
Connections built in. The tools help with interpretive
connections.
• External tools use content specific techniques to
establish matches and rules.
• Text Mining
• Statistical Modelling
• Fuzzy Logic
• Machine Learning
• These can be more traditional MDM tools
• Deceased Suppressions
• Address/Person Matching
• Fuzzy Matching
• External Tools generate rules and new TCSV to enrich
and manipulate the TCSV set.
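A content agnostic rule driven by configuration could look something like the sketch below, which checks a TCSV set for missing data. The configuration keys (required_signals, times) and the missing_data function are hypothetical; the values come from the Adserver sample, with cnv left out on purpose to trigger the rule.

```python
import json

# Hypothetical rule configuration; the real configuration format is not shown in the deck.
CONFIG = json.loads(
    '{"required_signals": ["impr", "clk", "cnv"], "times": ["2013-10-01"]}'
)

tcsv_set = [
    {"T": "2013-10-01", "C": {"Source": "Adserver"}, "S": "impr", "V": 14000},
    {"T": "2013-10-01", "C": {"Source": "Adserver"}, "S": "clk", "V": 1500},
]


def missing_data(points, config):
    """List every (time, signal) pair the config requires but the set lacks."""
    present = {(p["T"], p["S"]) for p in points}
    return [
        (t, s)
        for t in config["times"]
        for s in config["required_signals"]
        if (t, s) not in present
    ]


gaps = missing_data(tcsv_set, CONFIG)
```

The rule only compares times and signals exactly (a '100% match'), so it never needs to know what the signals mean.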
DataShaka & Hadoop…
Raw Data
TCSV and Hadoop
IM Post on one way of doing it: http://www.datashaka.com/blog/techie/2014/02/how-do-you-get-an-elephant-to-speak-tcsv-hdinsight
This uses Hive, which allows SQL-like queries against Hadoop.
Another option on vanilla Hadoop: because HDFS is a file system, TCSV can be thought of in terms of files. Using parsers to turn raw data into TCSV removes the unhelpful differences and semi-structures the data. This lets you take advantage of the consilience of TCSV while keeping the massive parallelism of Hadoop. TCSV can, of course, also be stored outside HDFS and accessed via API or DISQ.
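To show why TCSV files suit a MapReduce job, here is a small in-process sketch of a map step and a reduce step. The tab-separated line layout, the key=value context encoding and the function names are assumptions for illustration; a real job would run on Hadoop rather than in a single Python process. The values are the eCRM rows shown earlier.

```python
from collections import defaultdict

# Assumed on-disk form: T <tab> C <tab> S <tab> V, with C as key=value pairs.
TCSV_LINES = [
    "2013-10-01\tproductId=123;userId=321\tnumber\t2",
    "2013-10-01\tproductId=123;userId=521\tnumber\t1",
    "2013-10-01\tproductId=333;userId=444\tnumber\t2",
    "2013-10-01\tproductId=854;userId=111\tnumber\t1",
]


def map_line(line):
    """Map step: emit (productId, units) for every 'number' point."""
    time, context, signal, value = line.split("\t")
    ctx = dict(pair.split("=", 1) for pair in context.split(";"))
    if signal == "number":
        yield ctx["productId"], float(value)


def reduce_sum(pairs):
    """Reduce step: total the values for each key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


units_by_product = reduce_sum(
    pair for line in TCSV_LINES for pair in map_line(line)
)
```

Because every line has the same four parts, the map step needs no source-specific schema, and lines can be split across workers freely.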
Query
MapReduce
Alternatives Alternatives
API
DISQ
Single Customer View
A customer exists in the ‘real world’
In data, a customer is represented by a set of
identifying features
These features include location, device, and
many other useful things.
These features change over time for any
individual customer
Because it is Content Agnostic and
connectionist, TCSV captures a customer,
indeed any discrete entity, and all of its
features as they change over time.
One point in time 4 sources
Twitter Handle
Name
Device
Mobile Number
User id
100% matches are
automatically connected
as ‘C’ is held uniquely in 1
unified set.
Interpretive connections
can be made using TCSV
interpretation.
These sources share ‘id’,
‘mobile’ number and
device. As such,
connections can be
added.
External Tool
Rule derived externally,
used to add new
connections
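The automatic connection of 100% matches can be sketched as grouping sources that share an identical context. The Python below uses a small union-find for that; the source names and feature values are invented purely for illustration, and connect is a hypothetical helper rather than a DataShaka tool.

```python
from collections import defaultdict


def connect(records):
    """Group sources that share at least one identical (type, value) feature."""
    parent = {name: name for name in records}

    def find(x):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}
    for name, features in records.items():
        for feature in features.items():
            if feature in first_seen:
                parent[find(name)] = find(first_seen[feature])
            else:
                first_seen[feature] = name

    groups = defaultdict(set)
    for name in records:
        groups[find(name)].add(name)
    return sorted(sorted(group) for group in groups.values())


# Made-up records from four sources at one point in time.
records = {
    "twitter": {"handle": "@example", "device": "phone-1"},
    "crm": {"user_id": "321", "mobile": "07000 000000"},
    "web": {"user_id": "321", "device": "phone-1"},
    "survey": {"name": "A. Nother"},
}
groups = connect(records)
```

The web record shares a user id with crm and a device with twitter, so those three are connected, while survey has no exact match and would need an interpretive connection or an externally derived rule.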
2014-06-01
Likes Nutrigum
Brand
16000
Signal
Time
Context
Context Type
Value
1200
Value
Followers
Signal
Time
Brand
1200
Context
Context
Type
Value
Nutrigum
Enriched as not available in source data
Signal
Followers
Handle
@Ntrgm
Context
Context
Type
Context Twitter
Source
Context
Type
NB: Source often
added as context,
context can be
anything that might
be useful
2014-06-01
Time
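The enrichment shown on this slide (adding the Brand context to a Twitter Followers point) might be sketched as follows. The enrich function and the dictionary layout are assumptions; the point is only that enrichment adds new points and leaves the harvested ones untouched.

```python
def enrich(points, match, extra_context):
    """Return new points carrying extra context; the originals are not modified."""
    enriched = []
    for p in points:
        # Enrich only points whose context contains every matching pair.
        if match.items() <= p["C"].items():
            enriched.append({**p, "C": {**p["C"], **extra_context}})
    return enriched


tcsv_set = [
    {
        "T": "2014-06-01",
        "C": {"Handle": "@Ntrgm", "Source": "Twitter"},
        "S": "Followers",
        "V": 1200,
    },
]

added = enrich(tcsv_set, {"Handle": "@Ntrgm"}, {"Brand": "Nutrigum"})
unified = tcsv_set + added
```

Queries against the unified set can then reach the point either as harvested or with the Brand context attached.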
Open…
Platform
SmartView
Platform
Tools Tools Tools
SmartView
Platform
Tools Tools
Open
Tools
SmV
Proving the technology Tools in other stacks TCSV for everyone
API API API
Data Landscaping…
Distant
Close
Dark
Light
• External
• Can Use
• Internal
• Can Use
• External
• Can’t Use
• Internal
• Can’t Use
Access
Education
Distant
Close
Dark
Light
CRF requests
Build CRM
Self Serve Reporting
“Single Source Of Truth” data
team reports
Web Data
Red Nose Day Web Data
• World changing fast (obvs)
• Old methods are not fit for purpose (to become a digital player)
• Time to think different (to coin a phrase)
• Why we made the decisions we made
• Data as FCC
• All data is not equal
• Exploiting information for value
• Data as fuel not a brake to an organisation (useful)
• Data as a service
• The data supply chain problem
• Flow of clean, curated, useful data
• Conformity (first) not move crap around
• SCV ‘story’
• Reducing costs
• Driving revenue (through better personalisation/enhanced provision)
• How to continue to be relevant to new markets
• Sensor networks - IoT/M2M
• Deliver & exploit faster
• Power your transformation AND Drive value quickly/Quick wins (Rapid POC)
• No need to trade integrity (incl Quality) for agility (false compromise)
• Complementary to existing infrastructure & partners – TD, HW, Trillium etc (don’t slag off Hadoop)
• Plug ins (security)
• Want to be more than phones – a platform to sell other stuff
• Potential low cost architecture to leverage (Linux)
• Open agenda
Not to be presented
Efficiency and
Learning through
data
Efficiency through tooling and automation
Handles ever-increasing and ever-changing data
Comic Relief data team provide data products
• Self Serve
• Single Source Of Truth
‘Every’ team can use and learn from data
E.g. Marketing/Campaign
Including self serve query
Better informed marketing and campaigns
drive better charitable actions and more donations
Flexible and quality controlled
data acquisition for ever-changing sources
Easy access to quality data
Controlled, rational
easy to maintain
‘Data Lake’
5. Storage
6. Query
8. Reporting
7. Unparsing
4. Quality
1. Recording Action
2. Acquisition
10. Decision
Making
9. Analytics
3. Parsing
Store
Harvest
“Everything is a source...”
http
file
FTP
email
API
market
place
secure
server
Unify
DISQ
Unstructured
Relational
Graph
In Memory
Document Store
File System
Big Table
Deliver
Enterprise
Data Store
Time
T
Unified
Data
Context
C
Signal
S
Value
V

More Related Content

What's hot

Open source stak of big data techs open suse asia
Open source stak of big data techs   open suse asiaOpen source stak of big data techs   open suse asia
Open source stak of big data techs open suse asiaMuhammad Rifqi
 
Continuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseContinuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseDataWorks Summit
 
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka MeetupUsing Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka MeetupStratio
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureKhalid Salama
 
Architecture of Big Data Solutions
Architecture of Big Data SolutionsArchitecture of Big Data Solutions
Architecture of Big Data SolutionsGuido Schmutz
 
3 guiding priciples to improve data security
3 guiding priciples to improve data security3 guiding priciples to improve data security
3 guiding priciples to improve data securityKeith Braswell
 
Ibm big data ibm marriage of hadoop and data warehousing
Ibm big dataibm marriage of hadoop and data warehousingIbm big dataibm marriage of hadoop and data warehousing
Ibm big data ibm marriage of hadoop and data warehousing DataWorks Summit
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsKamalika Dutta
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
The 5 Keys to a Killer Data Lake
The 5 Keys to a Killer Data LakeThe 5 Keys to a Killer Data Lake
The 5 Keys to a Killer Data LakeDataWorks Summit
 
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...Denodo
 
Data Science Out of The Box : Case Studies in the Telecommunication by Anand ...
Data Science Out of The Box : Case Studies in the Telecommunication by Anand ...Data Science Out of The Box : Case Studies in the Telecommunication by Anand ...
Data Science Out of The Box : Case Studies in the Telecommunication by Anand ...Data Con LA
 
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESBData Integration Alternatives: When to use Data Virtualization, ETL, and ESB
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESBDenodo
 
Data Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsData Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsSnapLogic
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Data Aggregation, Curation and analytics for security and situational awareness
Data Aggregation, Curation and analytics for security and situational awarenessData Aggregation, Curation and analytics for security and situational awareness
Data Aggregation, Curation and analytics for security and situational awarenessDataWorks Summit/Hadoop Summit
 
Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud ...
Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud ...Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud ...
Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud ...DataWorks Summit
 

What's hot (20)

Open source stak of big data techs open suse asia
Open source stak of big data techs   open suse asiaOpen source stak of big data techs   open suse asia
Open source stak of big data techs open suse asia
 
Continuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseContinuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the Enterprise
 
Big Data Tech Stack
Big Data Tech StackBig Data Tech Stack
Big Data Tech Stack
 
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka MeetupUsing Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
 
Big data summary_v2.1
Big data summary_v2.1Big data summary_v2.1
Big data summary_v2.1
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft Azure
 
Architecture of Big Data Solutions
Architecture of Big Data SolutionsArchitecture of Big Data Solutions
Architecture of Big Data Solutions
 
3 guiding priciples to improve data security
3 guiding priciples to improve data security3 guiding priciples to improve data security
3 guiding priciples to improve data security
 
Ibm big data ibm marriage of hadoop and data warehousing
Ibm big dataibm marriage of hadoop and data warehousingIbm big dataibm marriage of hadoop and data warehousing
Ibm big data ibm marriage of hadoop and data warehousing
 
Big Data Landscape 2016
Big Data Landscape 2016Big Data Landscape 2016
Big Data Landscape 2016
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time Systems
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
The 5 Keys to a Killer Data Lake
The 5 Keys to a Killer Data LakeThe 5 Keys to a Killer Data Lake
The 5 Keys to a Killer Data Lake
 
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
 
Data Science Out of The Box : Case Studies in the Telecommunication by Anand ...
Data Science Out of The Box : Case Studies in the Telecommunication by Anand ...Data Science Out of The Box : Case Studies in the Telecommunication by Anand ...
Data Science Out of The Box : Case Studies in the Telecommunication by Anand ...
 
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESBData Integration Alternatives: When to use Data Virtualization, ETL, and ESB
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
 
Data Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsData Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management Requirements
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Data Aggregation, Curation and analytics for security and situational awareness
Data Aggregation, Curation and analytics for security and situational awarenessData Aggregation, Curation and analytics for security and situational awareness
Data Aggregation, Curation and analytics for security and situational awareness
 
Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud ...
Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud ...Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud ...
Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud ...
 

Viewers also liked

Kako Je Nastala Fontana
Kako Je Nastala FontanaKako Je Nastala Fontana
Kako Je Nastala Fontanaudfontana
 
The Quality Of Online Social Relationships, The
The Quality Of Online Social Relationships, TheThe Quality Of Online Social Relationships, The
The Quality Of Online Social Relationships, Theaccordionpolka
 
Radio Medij Jednakih Mogucnosti Za Sve
Radio   Medij Jednakih Mogucnosti Za SveRadio   Medij Jednakih Mogucnosti Za Sve
Radio Medij Jednakih Mogucnosti Za Sveudfontana
 
Poi politika
Poi politikaPoi politika
Poi politikaudfontana
 
Life in a fast moving tech company
Life in a fast moving tech companyLife in a fast moving tech company
Life in a fast moving tech companyRichard Edwards
 
Iskustva Iz Italije
Iskustva Iz ItalijeIskustva Iz Italije
Iskustva Iz Italijeudfontana
 
Rodno budzetiranje
Rodno budzetiranjeRodno budzetiranje
Rodno budzetiranjeudfontana
 
Prirucnik za uvodjenje principa rodne ravnopravnosti u javne politike
Prirucnik za uvodjenje principa rodne ravnopravnosti u javne politikePrirucnik za uvodjenje principa rodne ravnopravnosti u javne politike
Prirucnik za uvodjenje principa rodne ravnopravnosti u javne politikeudfontana
 
Istrazivanje bh novinari
Istrazivanje bh novinariIstrazivanje bh novinari
Istrazivanje bh novinariudfontana
 
4 slides for Strata (if we win...)
4 slides for Strata (if we win...)4 slides for Strata (if we win...)
4 slides for Strata (if we win...)Richard Edwards
 
Specificni Programi Za Razlicite Ciljne Grupe
Specificni Programi Za Razlicite Ciljne GrupeSpecificni Programi Za Razlicite Ciljne Grupe
Specificni Programi Za Razlicite Ciljne Grupeudfontana
 
Ljudskapravaosobasainvaliditetom
LjudskapravaosobasainvaliditetomLjudskapravaosobasainvaliditetom
Ljudskapravaosobasainvaliditetomudfontana
 

Viewers also liked (18)

Kako Je Nastala Fontana
Kako Je Nastala FontanaKako Je Nastala Fontana
Kako Je Nastala Fontana
 
LaComunity-CoSession Infonomia
LaComunity-CoSession InfonomiaLaComunity-CoSession Infonomia
LaComunity-CoSession Infonomia
 
The Quality Of Online Social Relationships, The
The Quality Of Online Social Relationships, TheThe Quality Of Online Social Relationships, The
The Quality Of Online Social Relationships, The
 
Aka
AkaAka
Aka
 
Radio Medij Jednakih Mogucnosti Za Sve
Radio   Medij Jednakih Mogucnosti Za SveRadio   Medij Jednakih Mogucnosti Za Sve
Radio Medij Jednakih Mogucnosti Za Sve
 
Poi politika
Poi politikaPoi politika
Poi politika
 
Life in a fast moving tech company
Life in a fast moving tech companyLife in a fast moving tech company
Life in a fast moving tech company
 
Iskustva Iz Italije
Iskustva Iz ItalijeIskustva Iz Italije
Iskustva Iz Italije
 
Rodno budzetiranje
Rodno budzetiranjeRodno budzetiranje
Rodno budzetiranje
 
Upf 2 Març True Brands
Upf 2 Març  True BrandsUpf 2 Març  True Brands
Upf 2 Març True Brands
 
Prirucnik za uvodjenje principa rodne ravnopravnosti u javne politike
Prirucnik za uvodjenje principa rodne ravnopravnosti u javne politikePrirucnik za uvodjenje principa rodne ravnopravnosti u javne politike
Prirucnik za uvodjenje principa rodne ravnopravnosti u javne politike
 
Istrazivanje bh novinari
Istrazivanje bh novinariIstrazivanje bh novinari
Istrazivanje bh novinari
 
4 slides for Strata (if we win...)
4 slides for Strata (if we win...)4 slides for Strata (if we win...)
4 slides for Strata (if we win...)
 
True Brand. UPF. Cast
True Brand. UPF. CastTrue Brand. UPF. Cast
True Brand. UPF. Cast
 
Specificni Programi Za Razlicite Ciljne Grupe
Specificni Programi Za Razlicite Ciljne GrupeSpecificni Programi Za Razlicite Ciljne Grupe
Specificni Programi Za Razlicite Ciljne Grupe
 
Pavimenti in resina
Pavimenti in resinaPavimenti in resina
Pavimenti in resina
 
Ljudskapravaosobasainvaliditetom
LjudskapravaosobasainvaliditetomLjudskapravaosobasainvaliditetom
Ljudskapravaosobasainvaliditetom
 
Pasapalabra de química
Pasapalabra de químicaPasapalabra de química
Pasapalabra de química
 

Similar to O2 060814

Confluent:AWS - GameDay.pptx
 Confluent:AWS - GameDay.pptx Confluent:AWS - GameDay.pptx
Confluent:AWS - GameDay.pptxAhmed791434
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorialrustd
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchSheetal Pratik
 
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...Dataconomy Media
 
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...StampedeCon
 
Reliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoTReliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoTGuido Schmutz
 
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Denodo
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Denodo
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsStreamsets Inc.
 
Accelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success StoriesAccelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success StoriesCambridge Semantics
 
Unlock value with Confluent and AWS.pptx
Unlock value with Confluent and AWS.pptxUnlock value with Confluent and AWS.pptx
Unlock value with Confluent and AWS.pptxAhmed791434
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amirydatastack
 
Rackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWSRackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWSAmazon Web Services
 
TUW-ASE-Summer 2014: Data as a Service – Concepts, Design & Implementation, a...
TUW-ASE-Summer 2014: Data as a Service – Concepts, Design & Implementation, a...TUW-ASE-Summer 2014: Data as a Service – Concepts, Design & Implementation, a...
TUW-ASE-Summer 2014: Data as a Service – Concepts, Design & Implementation, a...Hong-Linh Truong
 
Virdatint Distributed Data Virtualization Basics_2.6
Virdatint Distributed Data Virtualization Basics_2.6Virdatint Distributed Data Virtualization Basics_2.6
Virdatint Distributed Data Virtualization Basics_2.6Virdatint
 
How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...
How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...
How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...Amazon Web Services
 
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...Amazon Web Services
 
TUW-ASE Summer 2015: Data as a Service - Models and Data Concerns
TUW-ASE Summer 2015: Data as a Service - Models and Data ConcernsTUW-ASE Summer 2015: Data as a Service - Models and Data Concerns
TUW-ASE Summer 2015: Data as a Service - Models and Data ConcernsHong-Linh Truong
 

Similar to O2 060814 (20)

Confluent:AWS - GameDay.pptx
 Confluent:AWS - GameDay.pptx Confluent:AWS - GameDay.pptx
Confluent:AWS - GameDay.pptx
 
HIPAA Compliance in the Cloud
HIPAA Compliance in the CloudHIPAA Compliance in the Cloud
HIPAA Compliance in the Cloud
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorial
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbench
 
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
 
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
 
Reliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoTReliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoT
 
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)

O2 060814

  • 1. Thinking differently about data… 6th August 2014
  • 2. • Why? – change, old methods, useful • What? – new, FCC, not all data is equal • How? – new fabric - TCSV • SCV • Open • Speed to value (tactical & strategic) • No compromise on quality/integrity • Complementary (to kit & thinking)
  • 4.
  • 7. We’re on a mission to make data play nicely
  • 10.
  • 13. What?
  • 14. A new data fabric…
  • 16. Sales
  • 17. Some Data…
        { "name":“Salesforce" "followers_count": 39061, "friends_count": 12986, "listed_count": 917, }
        Harvested at 2013-10-01 17:35:00
        { "category": "Company", "talking_about_count": 58550, "username": "healthyx", "likes": 1985655, "link": "http://healthyx" }
        Harvested at 2013-10-01 19:12:00
        <performance> <account>Healthy X Limited</account> <cam>nutrigum – branding</cam> <data> <date v=“2013-10-01”> <impr>14000</impr> <clk>1500</clk> <cnv>10</cnv> </date> </data> </performance>
        Adserver
        Sales
        <filterTags>Nutrigum</filterTags> <tagStats> <tag>~SOURCE~t</tag> <tagDisplayName>TWITTER</tagDisplayName> <matchCount>71</matchCount> <popularity> <popularityCount> <timeInterval>2013-10-01</timeInterval> <count>2.0</count> <normalizedCount>2.0</normalizedCount> </popularityCount> </popularity> </tagStats>
        eCRM
        Date,productId,userId,number,ppu
        2013-10-01,123,321,2,5.00
        2013-10-01,123,521,1,5.00
        2013-10-01,333,444,2,15.00
        2013-10-01,854,111,1,20.00
  • 18. There is more than one way to skin this rabbit…
  • 19. [Diagram] Domains: Source domain, Business domain. Sources: Legacy CRM, Agency Media Plan. Acquisition: REST, SOAP, ODBC, Email. Other labels: ETL, EDW, warehouse, mart, mart, Queries, Couple to Conform. Operations (many repeated): Select, Calculate, Join, Aggregate, Split, Sort, Quality Control. Legend: Indicates schema coupled operation. Changes in 1 lead to changes in many others.
  • 20. [Diagram] Domains: Source domain, Business domain. Sources: Legacy CRM, Agency Media Plan. Acquisition: REST, SOAP, Sqoop, Email. Other labels: Hadoop, HDFS, Storage, Map Reduce Job, Queries, Couple to Conform. Operations (some repeated): Select, Calculate, Join, Aggregate, Split, Sort, Quality Control. Legend: Indicates schema coupled operation. Changes in 1 lead to changes in many others.
  • 21. [Diagram] TCSV. Domains: Data Domain, Source domain, Business domain. Sources: Legacy CRM, Agency Media Plan. Acquisition: REST, SOAP, ODBC, Email. Other labels: Storage, Parse, Quality Control, Enrichment, Data Acceptance, Calculation, Queries, Unparse. Legend: Indicates schema coupled operation; Pure TCSV operation.
  • 23.
  • 24. How?
  • 26. IRI Kantar Millward Brown Mindshare Finance Nielsen Dispatches Litmus Kantar Ireland 1. Cloud Client Team Client Firewall All data is harvested as normal into the DataShaka platform. The DataShaka platform is as secure as Azure.
  • 27. IRI Kantar Millward Brown Mindshare Nielsen Dispatches Litmus Kantar Ireland 2. Cloud Encrypted Data Harvest Finance Client Team Client Firewall Encryption Agent Delivery Decryption* Data from sensitive sources inside (or outside) the client environment is encrypted at the value level (TCSV’s V) and decrypted on delivery of data. Everything else is handled normally. This is a hybrid solution because there are agents in the client environment that would need to be managed. * Can be provided in Excel and Browser based systems or through an agent that decrypts data files for use by a third party application. Data is encrypted at all times when outside of the client environment
  • 28. IRI Kantar Millward Brown Mindshare Nielsen Dispatches Litmus Kantar Ireland 3. Hybrid Cloud VM in VPN (Private Cloud) Finance Client Team Client Firewall VM VMVM DataShaka produce a managed set of VMs within the client VPN. This utilises the platform as is for ‘normal’ data but hosts finance data in a secure environment. The DISQ registry is used to present a single interface in a secure way from both sets. This could be extended to an entire private cloud version of the DataShaka platform.
  • 29. IRI Kantar Millward Brown Mindshare Nielsen Dispatches Litmus Kantar Ireland 4. On Premise Appliance Finance Client Team Client Firewall The DataShaka platform is provided on managed hardware to be run within your data centre.
  • 30. IRI Kantar Millward Brown Mindshare Nielsen Dispatches Litmus Kantar Ireland 5. On Premise ‘your tin’ Finance Client Team Client Firewall The DataShaka platform is provided to be installed on hardware you manage yourself.
  • 33. O2 ~ DataShaka and Security: Integrity, Privacy, Security and Availability (IPSA)
        1. Integrity (accuracy & consistency)
           Stack (Methodology): Consilience allows for data in different places while retaining one single unified conceptual set. CAMO describes the methodology for consilience and TCSV is a CAMO. All of this is built to support integrity and availability.
           Tools: There are specific tools for checking data taxonomy and for missing data. We provide tools for Data Acceptance Testing (DAT).
           Infrastructure: As embodied on Windows Azure the DataShaka tools are constructed to respect the integrity of the underlying methodology. Each data operation is recorded to provide full provenance.
        2. Privacy (of client data)
           Stack (Methodology): n/a
           Tools: TCSV tools are designed to work with TCSV in a content agnostic way. As mentioned in methodology, privacy is a content specific concern. Tools for processing TCSV can be used to perform operations supportive of privacy such as removal of PII.
           Infrastructure: The DataShaka platform is fully tenanted by clients, with no cross pollination of data.
        3. Security (right people + right data)
           Stack (Methodology): In TCSV each point is uniquely identifiable by its signature of T, C, S & V and sub-sets are similarly identifiable. As such, TCSV is ideally suited to embodiment within a system of individual point level security/access control and above.
           Tools: n/a
           Infrastructure: The DataShaka platform takes advantage of the built-in security of Windows Azure. We use Azure in a tenanted manner preventing cross pollination or action between accounts.
        4. Availability (to SLA)
           Stack (Methodology): As with integrity the ‘unification’ methodology is built for full availability of the unified set. As a mutable set, enrichment is non-destructive giving full availability to pre and post enriched queries.
           Tools: n/a
           Infrastructure: Reliant on infrastructure SLAs.
  • 35. Content Agnosticism alongside quality and matching TCSV Tools – Content Agnostic • Enrichment • Taxonomy Rules • Missing Data Rules • DAT • Query • Combine Mutable Chaordic TCSV Set External Tools – Content Specific • Matching • Statistical Models • Machine Learning • Content Agnostic Tools work on a ‘100% match’ basis. • They use configuration files to make queries and apply rules to TCSV. • TCSV has Natural Relationships and Natural Connections built in. The tools help with interpretive connections. • External tools use content specific techniques to establish matches and rules. • Text Mining • Statistical Modelling • Fuzzy Logic • Machine Learning • These can be more traditional MDM tools • Deceased Suppressions • Address/Person Matching • Fuzzy Matching • External Tools generate rules and new TCSV to enrich and manipulate the TCSV set.
  • 37. Raw Data TCSV and Hadoop IM Post on one way of doing it: http://www.datashaka.com/blog/techie/2014/02/how-do-you-get-an-elephant-to-speak-tcsv-hdinsight This uses Hive, which allows SQL-like queries against Hadoop. Another option on vanilla Hadoop: because HDFS stores files, TCSV can be thought of in terms of files. Using parsers to turn raw data into TCSV removes the unhelpful differences and semi-structures the data. This allows you to take advantage of the consilience of TCSV while maintaining the massive parallelism of Hadoop. TCSV can, of course, be stored outside HDFS and accessed via API or DISQ. Diagram labels: Query, MapReduce, Alternatives, Alternatives, API, DISQ.
  • 39. A customer exists in the ‘real world’. In data, a customer is represented by a set of identifying features. These features include location, device, and many other useful things. These features change over time for any individual customer. Because it is Content Agnostic and connectionist, TCSV captures a customer, indeed any discrete entity, and all of its features as they change over time.
  • 40. One point in time. 4 sources. Features: Twitter Handle, Name, Device, Mobile Number, User id. 100% matches are automatically connected as ‘C’ is held uniquely in 1 unified set. Interpretive connections can be made using TCSV interpretation. These sources share ‘id’, ‘mobile number’ and Device. As such, connections can be added.
  • 41. External Tool: a rule derived externally is used to add new connections.
  • 44.
  • 46. Platform SmartView Platform Tools Tools Tools SmartView Platform Tools Tools Open Tools SmV Proving the technology Tools in other stacks TCSV for everyone API API API
  • 48. Distant Close Dark Light • External • Can Use • Internal • Can Use • External • Can’t Use • Internal • Can’t Use
  • 49. Distant Close Dark Light • External • Can Use • Internal • Can Use • External • Can’t Use • Internal • Can’t Use Access Education
  • 51.
  • 52. • World changing fast (obvs) • Old methods are not fit for purpose (to become a digital player) • Time to think different (to coin a phrase) • Why we made the decisions we made • Data as FCC • All data is not equal • Exploiting information for value • Data as fuel not a brake to an organisation (useful) • Data as a service • The data supply chain problem • Flow of clean, curated, useful data • Conformity (first) not move crap around • SCV ‘story’ • Reducing costs • Driving revenue (through better personalisation/enhanced provision) • How continue to be relevant to new markets • Sensor networks- IoT/M2M • Deliver & exploit faster • Power your transformation AND Drive value quickly/Quick wins (Rapid POC) • No need to trade integrity (incl Quality) for agility (false compromise) • Complementary to existing infrastructure & partners – TD, HW, Trillium etc (don’t slag off Hadoop) • Plug ins (security) • Want to be more than phones – a platform to sell other stuff • Potential low cost architecture to leverage (Linux) • Open agenda Not to be presented
  • 53. Efficiency and Learning through data Efficiency through tooling and automation Handles ever-increasing and ever-changing data Comic Relief data team provide data products • Self Serve • Single Source Of Truth ‘Every’ team can use and learn from data E.g. Marketing/Campaign Including self serve query Better informed marketing and campaigns drive better charitable actions and more donations Flexible and quality controlled data acquisition for ever-changing sources Easy access to quality data Controlled, rational easy to maintain ‘Data Lake’
  • 54. 1. Recording Action 2. Acquisition 3. Parsing 4. Quality 5. Storage 6. Query 7. Unparsing 8. Reporting 9. Analytics 10. Decision Making
  • 55. Harvest, Unify, Store, Deliver. Harvest (“Everything is a source...”): http, file, FTP, email, API, market place, secure server. Store: Unstructured, Relational, Graph, In Memory, Document Store, File System, Big Table. Other labels: DISQ, Enterprise Data Store. Unified Data: Time (T), Context (C), Signal (S), Value (V).
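
Slides 21 and 37 describe parsing raw data into TCSV so that unhelpful differences between formats disappear. A minimal Python sketch of that idea, using the eCRM CSV sample from slide 17, might look like this. The `Point` type, the `parse_csv` helper, the added `source` context entry, and the choice of which columns count as context rather than signal are all assumptions made for illustration; the slides do not show DataShaka's actual parsers.

```python
import csv
import io
from typing import NamedTuple


class Point(NamedTuple):
    """One TCSV point (hypothetical representation)."""
    time: str            # T: when the value applies
    context: frozenset   # C: (key, value) pairs describing what the point is about
    signal: str          # S: what is being measured
    value: str           # V: the measurement itself, kept as text


def parse_csv(text, time_col, context_cols, source):
    """Turn every cell that is neither time nor context into one TCSV point."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        context = frozenset(
            [("source", source)] + [(col, row[col]) for col in context_cols]
        )
        for col, cell in row.items():
            if col == time_col or col in context_cols:
                continue
            points.append(Point(row[time_col], context, col, cell))
    return points


# eCRM sample taken from slide 17.
ECRM = """Date,productId,userId,number,ppu
2013-10-01,123,321,2,5.00
2013-10-01,123,521,1,5.00
2013-10-01,333,444,2,15.00
2013-10-01,854,111,1,20.00
"""

# Assumption: productId and userId are context; number and ppu are signals.
points = parse_csv(ECRM, "Date", ["productId", "userId"], "eCRM")
```

Under these assumptions each of the four rows yields two points (one for `number`, one for `ppu`), so every source, whatever its raw format, ends up in the same four-part shape.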
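
Slide 35 says the content-agnostic tools work on a ‘100% match’ basis and are driven by configuration files, and slide 33 says enrichment is non-destructive. The sketch below shows one way that combination could work in Python. The rule format, the `enrich` function, and the product-to-brand mapping are hypothetical; they are not taken from the DataShaka tools.

```python
import json
from typing import NamedTuple


class Point(NamedTuple):
    """One TCSV point (hypothetical representation)."""
    time: str
    context: frozenset   # (key, value) pairs
    signal: str
    value: str


# Hypothetical rule file: exact-match conditions plus context to add.
RULES_JSON = """
[
  {"match": {"productId": "123"}, "add": {"brand": "Nutrigum"}}
]
"""


def enrich(points, rules):
    """Return the original points plus enriched copies; nothing is overwritten."""
    added = []
    for point in points:
        ctx = dict(point.context)
        for rule in rules:
            # '100% match': every key in the rule must equal the context exactly.
            if all(ctx.get(k) == v for k, v in rule["match"].items()):
                new_ctx = frozenset({**ctx, **rule["add"]}.items())
                added.append(point._replace(context=new_ctx))
    return list(points) + added


points = [
    Point("2013-10-01", frozenset({("productId", "123")}), "number", "2"),
    Point("2013-10-01", frozenset({("productId", "333")}), "number", "2"),
]
enriched = enrich(points, json.loads(RULES_JSON))
```

Because the originals are kept alongside the enriched copies, queries written before the rule existed still see the same points, which is one reading of "full availability to pre and post enriched queries".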
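
Slide 40 says that 100% matches between sources are connected automatically because context is held uniquely in one unified set. One way to picture this is to group records that share any exact feature value, as in the Python sketch below. The union-find approach, the `connect` function, and the sample records are illustrative assumptions, not a description of how TCSV does it internally.

```python
from collections import defaultdict


def connect(records):
    """Group record indices that share at least one exact (feature, value) pair."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}
    for idx, features in enumerate(records):
        for pair in features.items():
            if pair in first_seen:
                # Exact match on a feature value: merge the two groups.
                parent[find(idx)] = find(first_seen[pair])
            else:
                first_seen[pair] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Four hypothetical sources at one point in time (all values invented).
records = [
    {"twitter_handle": "@jo", "device": "phone-1"},
    {"name": "Jo Bloggs", "user_id": "321", "mobile": "07700 900001"},
    {"user_id": "321", "device": "phone-1"},
    {"name": "Sam Smith", "mobile": "07700 900002"},
]
groups = connect(records)
```

Here the third record shares `user_id` with the second and `device` with the first, so the first three records fall into one group while the fourth stays separate. Fuzzy or statistical matching, which slide 35 assigns to external tools, is deliberately out of scope.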

Editor's Notes

  1. A21 story
  2. Well, sorry to pop that balloon, I’m going to talk about the elephant in the room of Big Data. The fact that extracting the value from BD is not a simple switch flick. It is non-trivial and it is difficult. Why?
  3. People talk idly about bringing all different kinds of data together to generate new insight but the truth is that this is hard to do.
  4. People talk idly about bringing all different kinds of data together to generate new insight but the truth is that this is hard to do.
  5. Hard to do at scale, at speed, at low cost and with agility (all of which are critical ioho)
  6. People talk idly about bringing all different kinds of data together to generate new insight but the truth is that this is hard to do.
  7. Wave RFP This is what we do! (and more)
  8. Repeat key for TCSV
  9. People talk idly about bringing all different kinds of data together to generate new insight but the truth is that this is hard to do.
  10. And self serve
  11. Too broad, ‘all things…’. H&P are buying this. Is the market big enough? Can we take a big enough % (lack of competition etc.)? Where do we have an unfair advantage? Focus down into TCSV as quickly as possible. ETL for unstructured data.
  12. Data supply chain