Session presented at SQLSaturday Paris 2014
-----
This session covers Data Governance, mainly around the Power BI platform (without neglecting on-prem considerations). How do you avoid « dataset hell »? What are the best practices for sharing queries? Who is this famous Data Steward? What is his role? How do you choose the right person? … We will try to answer these questions and give you some directions, with a few examples, during this session
4. @Djeepy1 | @Fleid_bi | @GUSS_France
Jean-Pierre Riehl
Membre du Board
http://blog.djeepy1.net
@djeepy1
Pure-Player Microsoft
• Practice Collaboration
• Practice Data & Business Intelligence
• Practice Infrastructure
• Practice Développement
MVP SQL Server
MCSE : SQL Server 2012
MCPD : Enterprise Application
Microsoft Certified Trainer
Jean-Pierre Riehl
Practice Manager Data & BI – AZEO
MVP SQL Server
President at GUSS
Florian Eiden
Managing Consultant, Data & Analytics - Cellenza
MVP SQL Server
Board Member at GUSS
Who are we ?
GUSS : PASS France chapter
Webcasts, Conferences, Afterworks
.Pro
Next event :
SQLSaturday Paris 2014
September 13th
Tour Montparnasse, Paris
English-speaking track
Self-Service BI
Corporate BI
• Managed
• Data Warehouse
• Company-wide
Team BI
• Shared
• Models
• Department-wide
Self-Service BI
• Quick & Easy
• Personal data
• Document-centric
Writer’s block
also known as
White Worksheet Syndrome
Issue #1
?
Too much Data !
« I want the Employee’s List »
– Duplicates
– Wrong sources
– Bad Data
– Poor or Bad description
– Etc.
Issue #2
Features and tools
Analyze
Visualize
Share
Question
Q&A
Mobility
Discover
Search, access, and transform
public and internal data sources
with Power Query
Share datasets and workbooks
refreshable from on-premises
and cloud based data sources,
with Power BI Sites
Easy data modeling and
lightning fast in-memory
analytics with Power Pivot
Bold new interactive data
visualizations with Power View
and Power Map
Ask questions and get
immediate answers with natural
language query
Mobile access through HTML5
and touch optimized apps
Scalable | Manageable | Trusted
Power BI - Big Picture
[Architecture diagram. Labels shown: Power BI O365 Tenant; Power BI Admin Center; Data Catalog; Power BI Sites; Q&A; Power Query (cloud); Data Refresh; Index; Search; Excel (Power Query, Power Pivot, Power View, Power Map); data sources, Cloud and On-Prem (SQL, Oracle, External Data, …); Data Management Gateway]
Ideas of costs
Q&A
All Inclusive
(i.e. including Office 2013 ProPlus Licences)
               Power BI*                                On-Prem
Data Sources   Gateways & Data Sources                  Shared Data Sources (SSRS), Office Data Connection
Datasets       Queries                                  Shared Datasets (SSRS), Power Pivot for SharePoint
Models         Power Pivot, Data Management Gateway     Power Pivot for SharePoint
Dashboards     Power View                               Power View (BISM), SSRS over Power Pivot
There was some SSBI before Power BI
Power BI vs. On-Prem
* Many more features to come
• Tools are only a part of the solution
• Good formula : People + Processes + Tools
– « Data governance is between 80 and 95%
communication » - Dec 2006 Data Governance Conference
• We have the tools, let’s talk about the rest…
If all you have is a hammer…
• Wikipedia: Stewardship is an ethic that
embodies the responsible planning and
management of resources.
A Steward?
Data Steward of Gondor…
Not our idea, see Matthew Roche for complaints
IT : Information Technology
My pretty typical organization
Piercing items
Slashing items
Bludgeoning items
Armors & Shield
Business Divisions Functional Units
Finance HR Legal
Where are stewards needed in the org?
Piercing items
Slashing items
Bludgeoning items
Armors & Shield
Business Divisions Functional Units
Finance HR Legal
IT : Information Technology
My organization : actual perception
Piercing items
Slashing items
Bludgeoning items
Armors & Shield
Business Divisions Functional Units
Finance HR Legal
IT
Well, let’s be honest about what it looks like
Piercing items
Slashing items
Bludgeoning items
Armors & Shield
Business Divisions Functional Units
Finance HR Legal
IT
For maximum results: local initiatives
Piercing items
Slashing items
Bludgeoning items
Armors & Shield
Business Divisions Functional Units
Finance HR Legal
IT
• Why :
– Specific to your company, to be defined in your master plan
• How :
– “Responsible planning and management of resources”
• What :
– Elect data stewards that will enable, teach, police
Let’s get back to our steward
Slashing items
• Skills
– Interpersonal skills
– Good personal organization
– Data-awareness
• Data lifecycle specific to the company
• General understanding of BI/data technologies
• Data merging, cleaning, metadata maintenance
– Training in tools used in the company
• A chosen career path
– It’s an actual job, usually part time
– But not just an additional task in the schedule!
Required skills
The Journey of a Data Steward
• Help to find data
– Manage the Data Lake
– Create Data Sources
– Facilitate exploration
– Manage metadata
The Journey of a Data Steward
• Manage new data
– Find new Data Sources
– Find new Datasets
• Verify new datasets
– Check for Accuracy
– Check for duplicates
– Fix sources and queries
• Use of Workflows
Data Workflows
Create
Derive
Approve
Data Hub
Models, OData, Reports, DWH, MDM, etc.
Publish
Sandbox
Enhance
Discovery
Data
Steward
Analyst
Developer
The Journey of a Data Steward
• Certify
– Ensure Corporate Policies
• Train & Teach
– Help for modeling
– Help for analysis
Information Management Platform
IT
Developers
Data
Steward
Importance of relations
Business
Users
Tools Tools Tools
Information Management Platform
Sales
IT
And reality is more complex
Mktg Production
• Tools are nothing without people and
processes
• Governance is different in every company
– Decided and sponsored by the executives, inscribed in
a global strategy
– Adapted to your organization
– The Data Steward as the local implementation of it
A matter of governance
1. Build an Information Management Platform
2. Identify your processes & Org Chart
3. Write the Data Steward « Job Profile »
4. Identify the right people for the job
5. Leverage Self-Service BI
• Ask your local experts
How to start tomorrow ?
• A Data Culture
– See Satya Nadella
April 15th 2014 presentation in SF
• To at last step up in the
knowledge pyramid!
– Machine learning o/
All this for what?
>> Let’s dive in with the current state of data governance
You know that slide.
Our concerns today are on the right side, with Self-Service BI and its natural extension, Team BI (from two people on, we can call it a team).
SSBI is not new; I have been using that slide for a few years
The idea behind SSBI is to help people answer business questions by themselves: doing analysis, authoring dashboards, etc.
As BI Consultants, when we work on data strategies with our customers, big or small, we encourage people to use data
Empowering users requires Governance. You have to set some rules and policies to help people and show them how to act.
Regarding our topic, we are talking about Data Governance.
You do not want to turn your data assets into something chaotic where everybody pushes or pulls data.
So what are the issues?
Analyst’s Block
The problem is not knowing « Which data to get ? ».
We, developers, IT People, we do not have any ideas. Do you remember when you wanted to develop your first Power Query demo ?
No, the problem is « where is it » ? « How can I get it » ?
Is this man silly? In a world of Big Data, is that guy really telling us about « too much data »?
Note, we are not talking about Data but « datasets ».
Duplicates, wrong source, copy of copy of copy…
And the issue is to have enough information to choose between these datasets.
We can apply the Pareto Principle. 20% of queries are the good ones
Imagine your company has a policy about protecting some data (anonymization, security)
What if I search for « salaries »?
It is an information issue; notice that you have the same one in your corporate portal
Governance implies rules are respected.
What’s in Power BI for Office 365? Let me walk through the new features which include a number of new capabilities within Excel 2013 and Office 365.
Powerful Self-Service BI in Excel 2013: We are taking our most powerful business intelligence solutions and building them directly into Excel. These solutions package the data discovery, analysis and visualization process into one self-service BI solution, which is essential for business users who are looking to get bigger returns on their data. Features include:
For data search and discovery, we’re introducing Power Query, formerly Project codename “Data Explorer.” We’ve created a data search engine so customers can query data from within their business and from external data sources on the Web, all within Excel. We’re working with partners to provide a private version of this search engine so businesses can customize the engine and index the data sources they commonly access. Power Query also cleans and merges data sets from multiple sources, enabling IT and BI users to focus on data insights rather than data management.
For analyzing and modeling data we will continue to offer Power Pivot. Power Pivot enables customers to create flexible models within Excel that can process large data sets very quickly using SQL Server’s in-memory database. Customers can customize the model as needed all within Excel – no extra coding needed.
For visualizing and exploring data we introduced Power View and Power Map, formerly Project codename “Geoflow.”
Using Power View, customers can manipulate data and compile it into charts, graphs, and other visual means. Great for presentations and reports.
Power Map is a 3D data visualization tool for mapping, exploring and interacting with geographic and temporal data. Customers can visually plot up to a million rows of data in 3D on Bing Maps, view data in a geographic space, and share findings through screenshot slides and cinematic, guided video tours.
Collaborate and stay connected with Office 365: While all of these tools enable great self-service BI, asking business users to work within a BI silo significantly decreases the potential value of their data to the entire organization. That’s why we’ve made all of these Excel capabilities available in the cloud in Office 365, so customers can share and access their BI models across the desktop, Web and devices, all in a trusted, managed environment.
To share insights and help customers get answers quickly, we’ve created BI Sites. Within their organization’s trusted environment, BI users can quickly create workspaces in Office 365 to share worksheets with colleagues, collaborate over insights and results, and quickly find data and reports. A couple key features:
We’ve incorporated a natural language query engine that IT can customize to help their users search for specific datasets quickly and easily.
We’ve created a Data Management Gateway, which allows IT to build connections to internal data sources so reports that are published to BI Sites in Office 365 will refresh either on-demand or on a scheduled basis, ensuring that users are always looking at the latest view of their data.
To better manage data, Power BI for Office 365 empowers a business’s IT organization to help its users become their own data stewards. This means that users can grant access to their BI Sites and published models based on their colleague’s credentials. In addition, we’re introducing a number of useful different diagnostic tools so IT can see which data sources are commonly accessed and which are sitting idle and can be decommissioned.
To enable users to stay connected to their data wherever they are, we’ve created a connected BI experience. BI users can access and receive live updates on their reports through their browser with HTML5 or through a mobile application designed for their device.
Enable organizations to extend their existing investments for their on premise data warehouses and operational systems as well as cloud based data sources and Hadoop clusters to create powerful, trusted and easy to use self-service BI solutions that can also monitor employee data access and usage.
Microsoft is uniquely positioned to deliver this solution, which we believe outpaces the rest of the market, in two ways.
Connected Platform: While some vendors offer self-service BI and visualization tools, we are the only vendor that delivers a complete data platform of connected technologies from relational data warehouses like SQL Server to big data stores like HDInsight to end user productivity tools like Office, to enable rapid insights from any data (structured or unstructured), in the tools that they know and love. Power BI for Office 365 brings together the best of Windows Azure and Office 365, bringing Big Data to a billion users.
BI & IT Management Together: Power BI for Office 365 is the industry’s first BI solution that brings self-service BI and essential IT management tools into one service. This allows businesses to leverage internal and external data in a safe environment, increasing collaboration around data insights.
These are public prices. Price depends on your contract (Enterprise agreement, Academic, etc.).
Contact your sales representative
It’s just to give you an idea, to allow a quick comparison with other tools, and to help you judge whether it fits your needs, your company, etc.
Search for data (corporate)
Join with Other Data (local/web)
Share the « dataset »
Publish Power Pivot
Search for it
>> Thank you JP, all this is really impressive!
But I can’t be the only one thinking that if all you have is a hammer, everything looks like a nail
1
MS is one of the best software companies, and we can’t blame them for seeing software as the solution to everything, even if the new CEO Satya Nadella seems willing to follow a different path, with a focus on productivity that may not limit itself to tools.
Anyway if your only plan to solve the poor customer relationship habits in your company is deploying Dynamics, you’re in for disappointment…
2
The good way to look at any problem in an organization is to use that formula
Ok, equip your people with the best of tools (see Joel Spolsky’s tests), but you have to check that they know what to do with them, how to do it, and why (in the reverse order, see Simon Sinek at TED for that)
>> JP has given us the tools, let’s see the rest
>> Let’s start with people, and define what a data steward is and what role he holds in data governance
>> And we may begin by defining stewardship with a good Wikipedia definition
Responsible planning and management of resources. In our case: data artifacts
That’s a perfect concept because it would address all the issues that JP was talking about earlier. What about deploying stewards across the company to do just that?
Data Steward of Gondor: if you forget the end of the story, it is ultimately a better image than an alternative such as picturing a flight attendant.
>>Why? Because flight attendants are from the Airline company, not from the passenger group, and it’s a crucial distinction that we will clarify right now
>> To do so, let’s picture an organization doing some business in… apparently weapons!
Usual BUs (those making products and money) and Functional units (those cost centers that we should get rid of… a topic for another time!)
>> Now let’s put our stewards on the map!
And that would be at the interface between IT and units, where they can do their « responsible planning and management of resources » locally, where it is needed, resolving the issues that JP told us about, and more
>> but let’s add clarity to this org chart, and represent more exactly what everyone thinks about IT
Not really pleasant for us IT grunts, but that’s the way it is, at least in medium to big size companies
>> Let’s translate that and see what that means for our stewardship program
And that’s the message you will send if you compare your stewards to flight attendants… or rather, if you push them from IT onto each business team and try to impose « Data Governance » solely as part of some kind of corporate rule of law
>> It’s not going to be easy for your stewards to operate in that context… The answer to that? Well, stewards should be locally grown!
Only then will you have acceptance, and change will be able to happen.
Because people are resistant enough to change naturally. Don’t pile on that by forcing a new program down their throats
To be clear: successful data governance programs are local initiatives, adapted to local needs and constraints, structured around a common, central set of goals (bottom-up), not the opposite
>> Just a little warning before continuing on what all that will mean for people in business units « elected » to become stewards
>> Because we talked about how IT can be seen as the Evil, but we are not alone! ;)
>> let’s go back on track and talk about the data steward
>> We’ve defined his place in the organization, how he has to be sourced, but what is his role?
A good technique to define an activity: again, the « start with why » from Simon Sinek.
So why are you deploying a stewardship program in your company? And I’m not solely talking about the issues we have identified, I’m talking about your global strategy, such as Data Culture, fighting against the obscurantism rampant in the company, turning data into an asset that will serve as a differentiating factor, equipping a lean transformation, etc…
How you are going to achieve that goal: well in our case it’s quite clear
What : the data stewards!
That’s to remind yourself that if tools are not an end in themselves, just naming data stewards outside of a more global, structured effort is just as useless
The data steward is an actual manifestation of the data governance plan, more alive than a 300-page Word document
>> Enable, teach, police, ok, but if they are locally grown, will they have the required skills? What are they?
>> So what makes a good data steward
Interpersonal skills of course: he is caught between the hammer and the anvil. He will deal with irresolvable situations, where he won’t be able to satisfy everyone, and sometimes not anyone. No ego should be involved in these matters.
Good organization: keeping track of every demand made, every modification made, who will be impacted by what…
If the first two are prerequisites, the two others can be acquired on the job, even if that means a slow start
Usually every team already knows who the best candidate is: the only Business Analyst who knows SQL well, or is the king of PivotTables, and who knows how to get that data from that system (but check that the flag is ‘N’)…
But what should not be forgotten is that Data Steward is an actual job, and as such it should be written into the job description of the individual who will assume that role, going all the way to having yearly objectives redefined to match the new responsibilities and, if part time, having a clear allocation of time across tasks.
>> Well you have the right people, with the right skills, and the right set of goals, but that’s all too theoretical…
>> JP, maybe you can lead us through typical tasks in the day of our data steward?
Florian doesn’t understand that picture. He is astonished every time we get to this slide
But I guess most of you get the idea. That city is well known for its stewardship. I think the guy on his white horse fits the description Florian gave of a Data Steward.
So let’s see what the daily job of a Data Steward is.
Note : do you recognize that guy ? I think it is not a good Data Steward (even if it’s a famous one)
Help to find data (existing or new)
Manage the Data Lake of the company: he knows the data (silos, services, dwh, big data, etc.)
Create Data Source if needed
Manage Metadata on Queries / Data sources
Facilitate exploration: it’s the Enabler part of the Data Steward. He helps people, connects them together, etc.
So, he needs to know what is happening! (transition)
Look for new data sources / datasets / models / dashboards
(remember, the Data Steward must know everything that is happening to be able to help later on)
So, how does he know what’s going on?
From Users
From IT
At scale, Data Stewards need to share with each other: Data Steward Council/Community (for example on an Enterprise Social Network such as Yammer).
Verifying
Check for Accuracy
Fix source and queries
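To make the « check for duplicates » step more concrete, here is a minimal Python sketch. It is only an illustration: the employee records, the field names and the normalization rule (trim and lowercase) are assumptions made up for this example, and nothing here is a Power BI or Power Query feature.

```python
from collections import defaultdict


def find_duplicates(rows, key_fields):
    """Group rows sharing the same normalized key; keep groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        # Assumed normalization rule: trim whitespace and lowercase each key field.
        key = tuple(str(row[field]).strip().lower() for field in key_fields)
        groups[key].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical « Employee's List » merged from two exports.
employees = [
    {"name": "Alice Martin", "email": "alice@example.com", "source": "HR export"},
    {"name": "alice martin ", "email": "Alice@example.com", "source": "CRM export"},
    {"name": "Bob Durand", "email": "bob@example.com", "source": "HR export"},
]

duplicates = find_duplicates(employees, ["name", "email"])
for key, group in duplicates.items():
    print(key, "appears in:", [row["source"] for row in group])
```

The point is less the code than the habit: the steward decides which fields identify a record and which source wins when two copies disagree.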
Examples of data governance workflows:
- creating a report, deriving reports and shared Datasets
- reusing a Power Pivot model
- creating a Power Pivot model
--> you need a conductor and metrics
Certify -> sources and queries
There are two meanings to “certify”. Certifying a dataset or a data source indicates that an item is “the good one”. In Power BI, it is a feature: it is used to promote a query over others. It is “the good one” and Power BI helps users find it (cf. Power Query Search Ribbon).
Check for security & compliance
Solve Issue #3
Train & Teach: as the Data Steward is close to users and helps them with many data-related scenarios, why not use the Data Steward to solve modeling or analysis issues?
Remember, we have said that Data Steward must be Data-Aware and need to have some « analysis » skills ;-)
Create Data Source (show IT Admin Center / sources / show how to create one)
+Expose data as OData
Qualify a Data Source
+ show recently created data sources
Qualify a Query
+ modify metadata of a query
As we saw, the Data Steward must work with Users. He must also work with IT people.
Relational skills are very important. He is a Facilitator, an Enabler, so he needs good communication skills.
With his central role, he is also a good intermediary between Business Users and IT
Issues here are
-data stewards working together
-everybody must work with IT
-having a “multi-tenant” Platform
Data Steward Council / Center of Excellence
Common strategy/philosophy (Data Culture)
>> Now that we have seen some specific, day to day tasks of the data steward, we can take a more exhaustive look at where he interacts in the data life cycle of the company
>> Well, we all know where all data is born: on users’ devices first, and in the OLTP databases serving their applications second
>> That part is then copied into the central data warehouse (or whatever system you use for centralization, cleaning and historization of your data)
Note that if you don’t have a data warehouse, well, that is the first step of your data governance plan (with an option on MDM if required)
Because, NB: self-service BI won’t help you if you don’t have a BI platform. It will just be self-service chaos
>>
>> And from there back to the users, via the reporting solution or the ad-hoc tools (Excel on SSAS for example)
Nothing new here, we are in charted territory, but that’s only the data lifecycle of the corporate, conformed, consolidated data
>>
>> Because in reality we all know that data is coming from all over the place:
a CSV file from a colleague shared on a network drive
a table full of statistics imported from Wikipedia,
a listing of today’s activities exported from the CRM to Excel…
And it can quickly become a pain:
« I can’t match the states population with sales because they don’t use the same identifier! »
« The headcount that this regional manager gave me in this file doesn’t match the one displayed on the HR dashboard! »
…
>> That’s when Data Steward should step in, and manage data sources in his perimeter
>> The data steward will be tasked to answer that with his local knowledge of data and the organization
You can use this transcoding table between state names and codes if you want. You don’t know how to do that? Let me show you
Yes, this regional manager always counts North East employees in his number when he shouldn’t; here is the way to reconcile everything
You don’t have access to that source? Let me forward that to the right person
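The transcoding-table answer above can be sketched in a few lines of Python. This is a hedged illustration, not the steward's actual tooling: the three states, the sales and population figures and the function name `sales_per_capita` are all made up for the example.

```python
def sales_per_capita(sales_by_code, population_by_name, name_to_code):
    """Join two datasets through a transcoding table; report names that cannot be matched."""
    result, unmatched = {}, []
    for name, population in population_by_name.items():
        code = name_to_code.get(name)
        if code is None or code not in sales_by_code:
            # Keep track of what could not be matched instead of silently dropping it.
            unmatched.append(name)
            continue
        result[code] = sales_by_code[code] / population
    return result, unmatched


# Hypothetical transcoding table maintained by the data steward.
name_to_code = {"California": "CA", "New York": "NY", "Texas": "TX"}

# Made-up figures: sales are keyed by state code, population by state name.
sales_by_code = {"CA": 500.0, "NY": 300.0}
population_by_name = {"California": 100, "New York": 150, "Texas": 200}

ratios, unmatched = sales_per_capita(sales_by_code, population_by_name, name_to_code)
print(ratios)
print("No sales found for:", unmatched)
```

Returning the unmatched names matters as much as the join itself: it is the list the steward takes back to the owner of the source.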
>> The data steward will manage data assets, not at the scale of the organization (that’s IT’s job), but at the scale of his team
>> and that’s what should be clearly defined in his job description!
In that perimeter, he will handle the data source lifecycle management
He will have to be trained for that and equipped to do it
>> In conclusion…
Learn from the field, your field
Build your governance from your experience: a bottom-up process built around goals defined at the strategic level