5. Security in the SaaS world
• Security policies and requirements are developed for on-premises solutions.
• In many cases SaaS applications are initiated by the business.
• SaaS providers implement ‘some’ security, but does it fit my needs?
• Limited control over and visibility into what users are doing in the cloud.
• No visibility into anomalies across different applications.
6. Security in the SaaS world
Three possible outcomes when assessing a SaaS provider:
• Requirements met by the SaaS provider → ACCEPTABLE
• Requirements met by adding a control → COMPENSATED
• Does not meet requirements → SHOWSTOPPER: change the architecture or adjust expectations
Src: http://www.gartner.com/webinar/3100619
7. Evolution in security
• Transport: IP firewalling, segmentation
• Protocol inspection: proxies, deep inspection
• Application protection: MDM, web application firewalls
• Data Centric Audit & Protection (DCAP): CASB, SPSM, CDPG
Drivers: unmanaged devices, shadow IT, company data spread over multiple providers.
How to protect the DATA?
Note the trend of ABAC in DEV.
10. CASB (Gartner)
• On-premises or cloud-based security policy enforcement points,
• placed between cloud service consumers and cloud service providers,
• to combine and interject enterprise security policies as cloud-based resources are accessed.
• CASBs consolidate multiple types of security policy enforcement.
http://www.gartner.com/it-glossary/cloud-access-security-brokers-casbs
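The idea of a policy enforcement point sitting between users and SaaS providers can be sketched in a few lines. The rules and field names below are invented for illustration; they do not come from any real CASB product:

```python
# Hypothetical CASB-style policy checks applied before a request is
# forwarded to the SaaS provider. Rules and field names are invented.
POLICIES = [
    # Block downloads from unmanaged devices.
    lambda req: req["device_managed"] or req["action"] != "download",
    # Only allow access from approved countries.
    lambda req: req["user_country"] in {"BE", "NL", "DE"},
]

def enforce(request: dict) -> bool:
    """Return True only if every enterprise policy allows the request."""
    return all(policy(request) for policy in POLICIES)

# A download from an unmanaged device is blocked:
blocked = enforce({"device_managed": False, "action": "download", "user_country": "BE"})
```

A real broker applies many such policies inline (proxy mode) or retroactively via the provider's APIs, which is exactly the real-time vs retroactive split on the next slide.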
11. Options to add security
Three options, positioned between the SaaS and IaaS/PaaS worlds and ranging from REALTIME to RETROACTIVE:
• SPSM – SaaS Platform Security Management: user activity monitoring, data discovery, DLP, remediation
• CASB – Cloud Access Security Brokers: usage discovery, user activity monitoring, DLP (passive and active), user activity blocking (real time), data discovery, SSO
• CDPG – Cloud Data Protection Gateway: encryption, tokenization, masking
Vendors: http://www.gartner.com/webinar/3100619
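The CDPG capabilities (encryption, tokenization, masking) all aim to keep sensitive values out of the provider's hands. A minimal tokenization sketch, with an in-memory dict standing in for a real token vault (a production gateway would use a hardened vault, not this):

```python
import secrets

class TokenVault:
    """Toy tokenization: replace a sensitive value with a random token and
    keep the mapping on-premises (here just a dict, not a real vault)."""

    def __init__(self):
        self._by_token = {}   # token -> original value
        self._by_value = {}   # original value -> token (same value, same token)

    def tokenize(self, value: str) -> str:
        if value in self._by_value:
            return self._by_value[value]
        token = "tok_" + secrets.token_hex(8)
        self._by_token[token] = value
        self._by_value[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._by_token[token]

vault = TokenVault()
iban = "BE68 5390 0754 7034"          # illustrative sensitive value
token = vault.tokenize(iban)          # this is what the SaaS provider sees
```

Unlike encryption, the token carries no mathematical relation to the original value, so the SaaS provider can store and index it without ever being able to recover the real data.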
18. Call to action
• Detect shadow IT today (=High Risk)
• Start controlling access to SaaS applications
• Get visibility over user activity in SaaS applications
• Protect your company data in SaaS applications
22. PaaS?
• Provides a platform for:
• Development (cloud native apps)
• Content distribution (media / CDN)
• Internet of Things
• Automation
• Data processing & analytics
23. Data and data analytics?
From raw data collection up the value chain:
• Data collection (Big Data, IoT)
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics
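The step from descriptive to predictive analytics can be shown with a toy example. The sales figures below are made up, and the "prediction" is the simplest possible one (a least-squares trend line), purely to illustrate the difference between summarizing the past and extrapolating it:

```python
# Illustrative monthly sales figures (made up).
sales = [100, 110, 125, 138, 152]

# Descriptive analytics: summarize what happened.
average = sum(sales) / len(sales)

# Predictive analytics in its simplest form: fit a straight line
# through the history (least squares) and extrapolate one month ahead.
n = len(sales)
x_mean = (n - 1) / 2
y_mean = average
slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(sales)) \
        / sum((x - x_mean) ** 2 for x in range(n))
intercept = y_mean - slope * x_mean
forecast = intercept + slope * n   # prediction for month 5
```

Real predictive analytics replaces the straight line with statistical or machine learning models, but the shape of the question is the same: learn from history, then predict.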
25. Debunking myths on data & analytics in the cloud
• Myth #1 – Predictive analytics & big data are just BI on steroids
• Myth #2 – All my data needs to go to the cloud! Y0u f00lz cr4zy?
• Myth #3 – You need to hold 3 PhD’s to do predictive analytics
Myth Debunking Flowchart: Myth confirmed? – No → Is it plausible? – No → done. A “Yes” at either step → blow everything up.
26. Agenda
• PaaS?
• Myth #1
“Predictive analytics & big data are just BI on steroids”
• Myth #2
• Myth #3
• Conclusions
27. New in the data landscape…
1. “Big” data
2. “Artificial Intelligence” & learning from data
3. Fast & ubiquitous network connectivity
Evolution of data
28. (R)Evolution in data, the questions & tooling
Increasing degree of intelligence and value: standard reports → ad-hoc reports → query & drilldown → alerts → statistical analysis → forecasting/extrapolation → predictive modeling → optimization.
• Descriptive analytics – the traditional BI questions, answered with ETL tools, SQL & variants (big data, or not): What happened? How many? How often? Where? Where exactly is the problem? What actions are needed?
• Predictive analytics – a new type of questions, answered with new tooling, ELT, machine learning, …: Why is this happening? What if these trends continue? What will happen next? What is the best that can happen?
29. (R)Evolution & convolution: Big Data, Traditional BI & Predictive Analytics
• The BI and Predictive Analytics worlds are converging:
• BI platforms gain extensions towards Big Data and advanced-analytics operations
• Big Data tooling gets SQL-like interfaces: Drill, Impala, Hive, SparkSQL, HAWQ, Presto, Vortex, …
• Big Data tooling can do descriptive and predictive analytics: MLlib, H2O, Oryx, Mahout, SAMOA, FlinkML, …
30. Agenda
• PaaS?
• Myth #1
• Myth #2
“All my data needs to go to the cloud! Y0u f00lz cr4zy?”
• Myth #3
• Conclusions
31. On-premises or cloud?
• Advantages of cloud:
• Start fast & fail fast
• Easy consumption of created data models
• Democratic in pricing & availability of algorithms
• Attention points for cloud (mostly exceptions!):
• Data privacy: legislation ↔ provider
• Data volume & velocity: bandwidth
33. Conclusion
• Compliant solutions are available through the provider
• Subsetting & anonymization are easily possible with data transfer tools
34. Agenda
• PaaS?
• Myth #1
• Myth #2
• Myth #3
“You need to hold 3 PhD’s to do advanced analytics”
• Conclusions
35. Predictive Analytics
• Azure ML Studio has a low learning curve
• Modular, drag & drop
• Pre-built machine learning algorithms with meaningful default settings
• Use case: very easy to publish a “predictive engine” for your own applications
• Do you need expert knowledge?
• Is the out-of-the-box 70% accuracy sufficient, or do you need 95% prediction accuracy?
36. Example: predicting Belgian house prices
Model           | Features                                                                      | Prediction accuracy
Linear 1        | Just based on m2 living area                                                  | 48.40%
Linear 2        | m2 living area & postal code                                                  | 69.43%
Linear 3        | m2 living area, postal code, # bedrooms, house type                           | 70.36%
Decision Tree 1 | m2 living area, postal code, # bedrooms, house type                           | 70.41%
Linear 4        | Linear in: postal code, # bedrooms, house type; 3rd power in: m2 living area  | 71.17%
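The "Linear 4" idea – linear in most features, 3rd power in living area – can be sketched with ordinary least squares. The data below is synthetic and the postal code / house type features are left out for brevity, so the numbers are illustrative only, not the deck's real Belgian dataset:

```python
import numpy as np

# Synthetic stand-in for the "Linear 4" model: linear in # bedrooms,
# 3rd-degree polynomial in living area. All numbers are made up.
rng = np.random.default_rng(42)
m2 = rng.uniform(60, 300, size=500)              # living area
bedrooms = rng.integers(1, 6, size=500)
price = (900 * m2 + 0.004 * m2**3 + 15_000 * bedrooms
         + rng.normal(0, 20_000, size=500))      # noisy "observed" prices

# Design matrix: intercept, m2, m2^2, m2^3, bedrooms.
X = np.column_stack([np.ones_like(m2), m2, m2**2, m2**3, bedrooms])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

pred = X @ coef
r2 = 1 - np.sum((price - pred) ** 2) / np.sum((price - price.mean()) ** 2)
```

The model stays "linear" in the regression sense (linear in its coefficients), yet the m2³ column lets it capture the non-linear effect of living area – the same trick that lifted Linear 4 above Linear 3 in the table.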
38. Conclusions
• Three valid use cases for data in the cloud:
• Reporting & analytics on big data sets, with new types of intelligence
• Storing and synchronizing (subsets of) your data in the cloud
• Adding intelligence to existing applications you develop
• Advantages of cloud:
• Easy to start, quick to get to results, fast decommissioning once completed
• Democratization of tools & algorithms lowers the starting threshold
• Xylos can help with:
• advanced expertise (data scientists)
• data collection & storage expertise
• data consumption / visualization expertise
Other segments
Explain that the segmentation is organized per functionality
From now on: All is CASB
PaaS in a nutshell: the provider gives you everything you need, a platform, to get started with applications & data…
What do you use that platform for?
- You see some example use cases here
- They all have in common that you still need to do something “developish” to get it to work. The platform gives you the building blocks to construct a solution – it is not a ready to use piece of software by itself, that’s the SaaS world…
There are many interesting use cases for all these scenarios that you can use Platform As A Service for, but we’ll focus on data in the next 15 mins.
Collect data
Then do some basic statistics on that collected data; that is descriptive analytics, like “what was the average revenue per sale last year”.
You can also go further and not only look at the data from the past, but also learn from it… and based on what you learn, you can make predictions of the future… that is the realm of predictive analytics with methods like statistical learning or machine learning.
This is of course very related to other trends such as Internet of Things and Big Data…
-> You probably want to collect information from your things
-> If there are a lot of things, it can become a big data problem, where again you can do descriptive & predictive analytics.
Important:
-> Internet of Things discussion is mostly about hardware, which is nice & fun – we like that a lot as geeks – but the business value is of course in what you do with the data … what you do with the data is analytics.
-> Big Data platforms provide you with the tools to process terabytes & petabytes of data, but the real value is not in the tooling or the development inside these tools – also interesting & necessary – but rather, again, in what you do, this time with the humongous amount of data…
So it is all related, but for us, the analytics is in the center… it’s the driver why you do IoT or Big Data…
Myth 1 -- big data & analytics is nothing new… hrmmmppfffff
Myth 2 – All your data needs to go to the cloud, are you crazy? … Crazy… yes… the rest…. Hrmmmppfffff
Myth 3 – In order to do predictive analytics, you need to have at least 3 PhD’s in mathematics…. Hrmmmppfffff
So, let’s approach this MythBuster style, and see if we can blow something up…
Big Data
-> Not per se the volume, which is why some people prefer to drop the word “big”… But certainly the VELOCITY of the new data (social media, clickstream), and the VARIETY aspects. Is this useful? That is a different story, but there is certainly a difference in how data is treated & what strategic value it has, as opposed to earlier.
Artificial Intelligence, like self-driving cars and computers that will take over the world…
-> Much more compute power available so now it is feasible to do training of algorithms and continuously learning from data & making predictions on the spot
This is all related to cheaper hardware (storage for “big” data, compute for AI) but adding to that the fact that we have fast network connectivity almost everywhere, makes every device a possible data source & data consumer.
There is certainly something going on…
Some people call it a revolution, others will call it an evolution, but certainly there is a difference in the tooling and in the questions you can ask.
Subtle but important difference: ETL versus ELT
-> ETL: typically in a data warehouse context, where you take the data, structure it and then do things with it
-> ELT: more the data lake approach: store the raw data, and see if and how you structure it afterwards – that is because inserting structure upfront limits what you can do with the data afterwards, it can insert some bias in what you can do with the data.
Example: if you structurally remove timestamps, you cannot ask any time related questions anymore afterwards = TRIVIAL
More subtle: if you remove faulty records, then you cannot ask details about how often corrupt data was sent, or how often a device failed…
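Both examples can be made concrete in a few lines. The device records and field names below are made up; the point is that the ETL store can no longer answer time-related or failure-related questions, while the ELT store can:

```python
# Made-up device events as they arrive; field names are illustrative.
raw_events = [
    {"device": "A", "timestamp": "2016-05-01T10:00", "value": 21.5},
    {"device": "B", "timestamp": "2016-05-01T10:01", "value": None},  # faulty
    {"device": "A", "timestamp": "2016-05-01T10:02", "value": 22.0},
]

# ETL: structure is imposed before loading -- timestamps dropped,
# faulty records filtered out.
etl_store = [{"device": e["device"], "value": e["value"]}
             for e in raw_events if e["value"] is not None]

# ELT: load everything raw, decide on structure later.
elt_store = list(raw_events)

# A question asked only afterwards: "how often did a device send corrupt data?"
# Answerable from the ELT store; impossible from the ETL store.
failures = sum(1 for e in elt_store if e["value"] is None)
```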
Conclusion: they have different origins, but are growing closer together – however, there are also some differences in philosophy and approach.
ML tools open source: http://journalofbigdata.springeropen.com/articles/10.1186/s40537-015-0032-1
Advantages:
- You can get started right away, without prior installation of hardware or software. If it doesn’t work or the experiment you are doing fails, you can get rid of it equally fast. The example shown was created & published in 2 hours (!).
- No upfront investment in hardware/software, and the pricing is actually cheap, there is a good value for money.
- Because your data & data processing is in the cloud, you can easily publish it to applications, mobile devices, 3rd parties, …
Attention points:
- Data privacy: just to be clear, the major cloud platform vendors provide a better security layer than any of you can ever do. This is not an attention point about whether your data can be leaked or hacked – but really about government regulation that prevents certain types of data to be placed internationally or across European borders. This CAN be an issue, typically for highly regulated verticals such as healthcare or government. For most of you, it will most likely NOT be a legal issue but rather a “trust” issue between you and the provider. In my personal opinion (yet IANAL), this is a transient problem – if this regulation impedes innovation, then either the regulation will change, or the major cloud providers will work around it. The market opportunity is too big to just let it pass.
- Data volume & velocity CAN be an attention point, in particular if you need a lot of data (volume) or fast data (velocity) to be copied e.g. on-premises. Getting a lot of data inside the cloud is usually easy, getting it out requires more thought. This CAN be an issue in case you need access to raw, unprocessed data. Typically, if you need to store in your on-premises BI platform or data warehouse just summarized data, then there is not really an issue.
Many possible outputs: blob storage, a data base, data warehouse, data lake…
You can upload the data directly to the target, in the file format that is necessary to do so…
Or, you can use the Azure Data Factory to do the necessary transformations on your data. It uses an on-premises component to capture the data and put it in the destination of your choice.
Finally, for streaming data there are also options but let’s not go into details here.
The transformation of the data is key to understand the myth – in this process you can:
- take subsets of data – so certainly not all your data needs to go the cloud
- anonymize your data – typically, for descriptive statistics you need aggregated data (so no specifics about individual items), and for predictive statistics you typically need numeric data per case/person, without knowing who that person is.
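A sketch of that subsetting & pseudonymization step, run on-premises before the upload. The records, field names and salt below are invented for illustration:

```python
import hashlib

# Invented customer records; purely illustrative.
records = [
    {"name": "An Peeters", "postal_code": "2000", "m2": 120, "price": 285_000},
    {"name": "Jan Maes",   "postal_code": "9000", "m2": 95,  "price": 240_000},
    {"name": "Els Claes",  "postal_code": "2000", "m2": 150, "price": 330_000},
]

SALT = b"keep-this-secret-on-premises"   # never leaves your network

def pseudonymize(name: str) -> str:
    """Salted hash: the cloud side can still group rows per person,
    but cannot recover who that person is."""
    return hashlib.sha256(SALT + name.encode()).hexdigest()[:12]

# Upload only a subset of rows (one region) and only the columns the
# model needs, with the identity replaced by a pseudonym.
upload = [{"id": pseudonymize(r["name"]), "m2": r["m2"], "price": r["price"]}
          for r in records if r["postal_code"] == "2000"]
```

So indeed: not all your data needs to go to the cloud, and what does go up does not need to identify anyone.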
To summarize
It looks like a scene from the Mad Max movie, but it really is Mythbusters…
Here, the “intelligence” is that we do not explicitly tell our program to multiply, for example, €1,000 per m2 and subtract/add a predefined value based on the postal code. Instead, based on the data we feed it, we let the algorithm decide by itself what the appropriate prices per m2 are, or the added value of having more bedrooms, without defining this “business logic” ourselves.
We see that out of the box, literally in a 2-minute exercise, we can publish a model that gets approximately 50% of the predictions right. This can be enough for your application – for example, if you are writing a game and need your virtual characters/enemies to predict what the human player characters are going to do… if you get 50% right, it will already be a very tough game to play…
If you bring in experts, they will probably tell you to start playing around with:
- More data (increase accuracy to about 70%)
- Use different models (slight increase compared to just more data)
- Start tweaking & tuning the model complexity
The latter is really a battle to conquer every additional percentage point of prediction accuracy. Obviously, to get high degrees of accuracy, you will need more skilled people…
Adding more intelligence to your application can be done very easily… so the myth is partially true…