Learn best practices to make your organization a center of BI excellence! I’ll walk you through lessons learned during our implementation of an enterprise end-to-end BI solution, which is discussed in the “Records Management Firm Saves $1 Million, Gains Faster Data Access with Microsoft BI” case study published by Microsoft. Working experience with dimensional modeling and the Microsoft BI stack is assumed.
Best Practices for Implementing Enterprise BI Solution
1. Best Practices for Implementing Enterprise BI Solution
Teo Lachev, Prologika
teo.lachev@prologika.com
2. Why BI projects fail
• 70-80% of corporate BI projects fail (Gartner http://bit.ly/YRi028)
• Top reasons
Poor communication between IT and Business
Failure to ask the right questions
Other reasons from my experience
Business doesn’t know about BI
Inexperience and lack of technical knowledge
“When all you have is a hammer…”
Data inaccuracy
Performance degradation with large datasets
3. Agenda
• Share best practices and lessons learned
BI architecture
Data warehouse design
ETL
Semantic layer
Presentation layer
• Assumptions
Experience with Microsoft BI and database design
• Microsoft case study
Records Management Firm Saves $1 Million
http://bit.ly/15exUpM
Most performance practices here relate to biggish data
5. About me
• Consultant, author, and mentor with focus on Microsoft BI
• Owner of Prologika – BI consulting and training company based in Atlanta (www.prologika.com)
• Microsoft SQL Server MVP for 10 years
• Leader of Atlanta BI group (atlantabi.sqlpass.org)
6. Use a phased approach
• Identify critical success factors
• Break project into phases
• Phase 1
• Most important
• Scope it relatively small
• Sets foundation
• Business process to model
• First conformed dimensions
• A few fact tables
7. Use classic BI solution architecture
[Diagram: data flows through four layers]
Data Sources
ETL (Integration Services): data is extracted from data sources, transformed, and loaded into DW
Data Warehouse: data is stored in a dimensional schema consisting of dimension and fact tables
Semantic Layer (Multidimensional OR Tabular): great performance, business calculations, single version of truth, client support, security, isolation
Presentation Layer: standard reporting, ad-hoc reporting, dashboards (ad-hoc reports, operational reports, dashboards, third party tools)
Reporting paths labeled in the diagram: transactional reporting; historical & trend reporting
10. Check your environment
• I/O
BACKUP DATABASE [ContosoRetailDW] TO DISK='NUL';
Or use tools such as Iometer or CrystalDiskMark
I/O should be above 500 MB/sec
• Network speed
select * from <some fact table>
(consider discarding query results)
Num rows/sec = row count/execution time (sec)
Aim for > 100K rows/sec
• Virtualization
Disk pass-through enabled
Dedicated resources
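The network speed check above is simple arithmetic, and a small helper makes it easy to repeat after each test query. This is a minimal sketch: the function names and the sample figures (12 million rows read in 80 seconds) are hypothetical, and only the formula and the 100K rows/sec target come from the slide.

```python
def rows_per_second(row_count: int, execution_seconds: float) -> float:
    """Num rows/sec = row count / execution time (sec)."""
    if execution_seconds <= 0:
        raise ValueError("execution time must be positive")
    return row_count / execution_seconds


def meets_target(rate: float, target: float = 100_000) -> bool:
    """True when the measured rate is above the target (100K rows/sec by default)."""
    return rate > target


# Hypothetical measurement: 12 million fact rows returned in 80 seconds.
rate = rows_per_second(12_000_000, 80.0)
print(f"{rate:,.0f} rows/sec, above target: {meets_target(rate)}")
```

Discarding query results on the client, as the slide suggests, keeps rendering time out of the measurement so the number reflects the network and server rather than the grid.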
12. Star schema is your best friend
• Your dimensional model is foundation
• Design it with end user in mind
• Avoid normalization
• Avoid summarized tables
• Use smart keys (YYYYMMDD) or [date] keys for Date tables
• Use referential integrity
Teo’s insight: The fact that Tabular supports more flexible relationships doesn’t mean that star schema is obsolete - just the opposite.
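To make the smart key idea concrete, here is a minimal sketch of converting between a calendar date and a YYYYMMDD integer key. The function names are hypothetical; only the YYYYMMDD format comes from the slide.

```python
from datetime import date


def to_date_key(d: date) -> int:
    """Encode a date as a YYYYMMDD integer, e.g. 2012-03-09 -> 20120309."""
    return d.year * 10000 + d.month * 100 + d.day


def from_date_key(key: int) -> date:
    """Decode a YYYYMMDD integer; raises ValueError for an impossible date."""
    year, rest = divmod(key, 10000)
    month, day = divmod(rest, 100)
    return date(year, month, day)


key = to_date_key(date(2012, 3, 9))
print(key, from_date_key(key))
```

One appeal of this format is that the key is readable on its own and sorts in date order, so range filters on the key behave like range filters on the date.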
13. Optimize physical storage
• Set database recovery to Simple
• Index considerations
Clustered index on the DateKey column in fact tables
Other indexes as needed
• File groups
One file group per large table
Files on different drives
Avoid using the Primary file group
14. Use partitioning
• Partition large tables (above 50 GB)
Partition switching
Better manageability
Partition elimination when querying data
Good read: “Partitioned Table and Index Strategies Using SQL Server 2008” whitepaper by Ron Talmage
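Partition elimination is easier to reason about with a small model of how a key maps to a partition. The sketch below assumes monthly partitions on a YYYYMMDD DateKey and RANGE RIGHT boundaries, where each boundary value belongs to the partition on its right. The monthly grain, the function names, and the sample dates are assumptions for illustration, not something the slide prescribes.

```python
from bisect import bisect_right


def monthly_boundaries(start_year: int, start_month: int, count: int) -> list[int]:
    """Return `count` first-of-month boundary values as YYYYMMDD integers."""
    keys = []
    year, month = start_year, start_month
    for _ in range(count):
        keys.append(year * 10000 + month * 100 + 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


def partition_number(boundaries: list[int], date_key: int) -> int:
    """1-based partition for a key under RANGE RIGHT semantics."""
    return bisect_right(boundaries, date_key) + 1


boundaries = monthly_boundaries(2011, 11, 4)
print(boundaries)
print(partition_number(boundaries, 20120115))
```

A query that filters on a DateKey range only needs the partitions whose numbers fall between those of the two endpoint keys, which is the idea behind partition elimination.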
15. Use compression
• Consider page compression above 1 TB
• 50-80% saving in disk space
• To estimate storage savings:
Use SSMS Data Compression Wizard
sp_estimate_data_compression_savings stored procedure
EXEC sp_estimate_data_compression_savings 'dbo', 'FactResellerSales', 1, NULL, 'PAGE'
Good read: “Data Compression: Strategy, Capacity Planning and Best Practices” whitepaper by Sanjay Mishra
17. Consider merge design pattern
• More efficient than SSIS transforms
• More flexible than SSIS lookups
• Easier to maintain
[Diagram: merge design pattern data flow]
Data Sources (LOB, database, files) -> incremental extraction -> Staging Database
Staging Database: select a,b from st1 inner join st2 where... -> work table
work table -> stored procedure with T-SQL merge statement -> dimension or fact table (Data Warehouse)
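The last step of the pattern applies the work table to the target with matched and not-matched logic. The sketch below models that logic in plain Python dictionaries so the behavior is easy to follow. It is an illustration of merge semantics, not the stored procedure from the project, and the table and column names (dim_product, ProductCode, Name) are hypothetical.

```python
def merge_rows(target: dict, work_rows: list[dict], key: str) -> tuple[int, int]:
    """Apply work-table rows to the target, keyed by a business key.

    Not matched -> insert; matched and changed -> update; unchanged -> skip.
    Returns (inserted, updated).
    """
    inserted = updated = 0
    for row in work_rows:
        business_key = row[key]
        existing = target.get(business_key)
        if existing is None:
            target[business_key] = dict(row)
            inserted += 1
        elif existing != row:
            target[business_key] = dict(row)
            updated += 1
    return inserted, updated


dim_product = {"MB1": {"ProductCode": "MB1", "Name": "Mountain Bike 1"}}
work_table = [
    {"ProductCode": "MB1", "Name": "Mountain Bike 1 (2012)"},  # changed row
    {"ProductCode": "MB2", "Name": "Mountain Bike 2"},         # new row
]
result = merge_rows(dim_product, work_table, "ProductCode")
print(result, sorted(dim_product))
```

Keeping the insert and update decisions in one set-based statement is what lets the database do the work in a single pass over the work table.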
18. Consider Operational Data Store
• ODS advantages
• Offloads transactional data
• Maintains data history
• Smarter “staging” database
Start_Date   End_Date     Store      Product
1/1/2010     5/1/2010     Atlanta    Mountain Bike 1
5/2/2010     3/8/2012     Atlanta    Mountain Bike 2
3/9/2012     12/31/9999   Norcross   Mountain Bike 2
…
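Because each ODS row carries a Start_Date and End_Date, the state at any point in time can be recovered with a range check. Here is a minimal sketch using the three sample rows from the slide; it assumes they describe the history of a single entity, and the function names are hypothetical.

```python
from datetime import date, datetime


def parse_us_date(text: str) -> date:
    """Parse the m/d/yyyy format used in the sample rows."""
    return datetime.strptime(text, "%m/%d/%Y").date()


# (Start_Date, End_Date, Store, Product); 12/31/9999 marks the current row.
HISTORY = [
    ("1/1/2010", "5/1/2010", "Atlanta", "Mountain Bike 1"),
    ("5/2/2010", "3/8/2012", "Atlanta", "Mountain Bike 2"),
    ("3/9/2012", "12/31/9999", "Norcross", "Mountain Bike 2"),
]


def as_of(history, when: date):
    """Return the row whose date range contains `when`, or None."""
    for start, end, store, product in history:
        if parse_us_date(start) <= when <= parse_us_date(end):
            return {"Store": store, "Product": product}
    return None


print(as_of(HISTORY, date(2011, 6, 15)))
print(as_of(HISTORY, date(2012, 3, 9)))
```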
19. Index considerations
• Eliminate read locks
• Indexes: ALLOW_PAGE_LOCKS = OFF and ALLOW_ROW_LOCKS = OFF
• View hints WITH (NOLOCK) or SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
• Drop non-clustered indexes and constraints
With massive updates (10% or more)
Enables non-logged load
• Consider COLUMNSTORE indexes when queries aggregate data
20. Take advantage of partitioning
• Consider partition switching
Fast incremental load
Parallel partition load
Faster updates
• Use Manage Partition Wizard to generate
Switch in/out scripts
Staging table
Sliding window
For parallel partition load, change the table lock escalation
ALTER TABLE … SET ( LOCK_ESCALATION = AUTO)
To find the table lock escalation:
SELECT lock_escalation_desc FROM sys.tables WHERE name = '<table name>'
24. Choose semantic layer wisely
• Decision checkpoints
• Data volumes
• Complexity
• Scenarios for considering Multidimensional
Data warehousing
Large data volumes
Complex models
• Scenarios for considering Tabular
Promoting PowerPivot models to organizational models
Rapid development for simple models
Transactional reporting? (be careful)
25. Optimize Multidimensional
• Don’t be afraid of biggish data
• Avoid complex scope assignments
• Centralize business logic
• Consider fast storage
• Consider single cube
26. Tabular Considerations
• Improve your design experience http://bit.ly/106iKjt
• Small dataset during dev
• Disable automatic calculation
• Remove unnecessary columns
• Be careful about transactional reporting
• No cross-fact table support
• Performance degradation with big data - http://bit.ly/136h60U
[Diagram: Dim Date, Fact Orders, Fact Receipts]
27. Partition when it makes sense
• Partition large measure groups (above 100 million rows)
Mostly management technique
Useful for incremental processing
Partition slice: ~50 million rows
• Automate with partition generator
http://bit.ly/partitiongenerator
• Use SQL views to wrap tables
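The sizing guidance above translates into a quick estimate of how many partitions a measure group needs. This is a rough sketch that assumes both figures are row counts; the function name and the 520 million row example are hypothetical.

```python
def partitions_needed(
    row_count: int,
    partition_threshold: int = 100_000_000,  # partition only above this size
    slice_size: int = 50_000_000,            # approximate rows per partition
) -> int:
    """Estimate the partition count for a measure group of `row_count` rows."""
    if row_count <= partition_threshold:
        return 1  # small enough to leave as a single partition
    return -(-row_count // slice_size)  # ceiling division


# Hypothetical measure group with 520 million rows.
print(partitions_needed(520_000_000))
```

In practice the slices would follow a natural boundary such as month or year rather than an exact row count, so treat the result as a starting point for choosing the grain.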
28. When to use self-service BI?
• Know your end users
Power users
Financial analysts
• When does self-service BI make sense?
Waiting for organizational BI to happen
Ideate and promote lateral thinking
Consider 80/20 rule
80% organizational BI
20% self-service BI
30. Dashboards
“A dashboard is a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance.”
Stephen Few, “Information Dashboard Design” book
34. Consider your dashboard options
PerformancePoint
Pros: Designed for scorecards and KPIs; supporting views (reports, Excel spreadsheets, PP reports); decomposition tree; customizable
Cons: BI pro-oriented; no “wow” effect
Power View
Pros: Highly interactive; easy to implement; end user-oriented
Cons: No extensibility; no mobile support yet (but promised); currently requires Silverlight (MS working on HTML5)
Excel Services
Pros: Use Excel pivot reports; easy to implement; reports updatable in SP 2013
Cons: Reports not updatable in SP 2010; no “wow” effect
Reporting Services reports
Pros: Highly customizable; rich visualizations
Cons: Require report experience; reports not updatable; drillthrough requires new reports
35. Summary
• I shared proven practices and tips from past experience
• Keep things simple but have sound design
• How to contact me:
• Email: teo.lachev@prologika.com
• Web: www.prologika.com
• Blog: http://prologika.com/cs/blogs/
• Newsletter: http://prologika.com/Newsroom/News.aspx