This talk, given at Data Center Dynamics on July 12, 2013, summarizes the importance of predictive modeling for recapturing lost cooling and power capacity in the data center. It also describes some results from a recent case study that Future Facilities conducted at an Equinix data center in the Bay Area.
Why predictive modeling is essential for managing a modern computing facility
1. Why predictive modeling is
essential for managing a
modern computing facility
Jonathan G Koomey, Ph.D.
http://www.koomey
Research Fellow, Steyer-Taylor Center for Energy
Policy and Finance, Stanford University
Data Center Dynamics
San Francisco, CA
July 12, 2013
3. The business problem
• Data centers deliver computing services
that generate business value (i.e., profits)
• Decisions about IT deployment over the
facility life almost never take business
value fully into account, because of
– siloed departments and budgets
– misplaced incentives
– imperfect foresight
4. The data center problem
• Facilities are built using an estimate of
compute capacity that is never realized
• IT deployment decisions after construction
almost never go according to plan
• The result: lost capacity due to
fragmentation, resulting in stranded capex
and high cost per computation
5. Capacity fragments over time
The actual IT configuration will differ from the design assumptions. These differences fragment
space, power, cooling, and networking resources and ultimately limit data center capacity.
Source: Future Facilities
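The fragmentation effect can be illustrated with a toy calculation. The sketch below is not from the talk: the zone names, all numbers, and the helper functions are hypothetical. It assumes each zone can host only as many cabinets as its scarcest resource (space, power, or cooling) allows, so whatever is left over in the other resources is stranded.

```python
from dataclasses import dataclass

@dataclass
class Zone:
    name: str
    free_slots: int          # empty cabinet positions
    free_power_kw: float     # unallocated power
    free_cooling_kw: float   # unallocated cooling

def usable_cabinets(zone: Zone, kw_per_cabinet: float) -> int:
    """Cabinets that fit, limited by the scarcest of space, power, and cooling."""
    by_power = int(zone.free_power_kw // kw_per_cabinet)
    by_cooling = int(zone.free_cooling_kw // kw_per_cabinet)
    return min(zone.free_slots, by_power, by_cooling)

def stranded_power_kw(zone: Zone, kw_per_cabinet: float) -> float:
    """Free power that cannot be used because space or cooling runs out first."""
    return zone.free_power_kw - usable_cabinets(zone, kw_per_cabinet) * kw_per_cabinet

# Hypothetical illustration values, not measurements from any facility.
KW_PER_CABINET = 5.0
zones = [
    Zone("A", free_slots=10, free_power_kw=20.0, free_cooling_kw=50.0),
    Zone("B", free_slots=2, free_power_kw=40.0, free_cooling_kw=40.0),
]

# Pooled view: treat the whole floor as one bucket of each resource.
pooled = min(
    sum(z.free_slots for z in zones),
    int(sum(z.free_power_kw for z in zones) // KW_PER_CABINET),
    int(sum(z.free_cooling_kw for z in zones) // KW_PER_CABINET),
)
# Zone-by-zone view: resources in one zone cannot serve another.
actual = sum(usable_cabinets(z, KW_PER_CABINET) for z in zones)

print(f"pooled estimate: {pooled} cabinets, zone-by-zone: {actual} cabinets")
for z in zones:
    print(z.name, "stranded power (kW):", stranded_power_kw(z, KW_PER_CABINET))
```

In this made-up example the pooled totals suggest twice as many cabinets as the zone-by-zone count allows, which is the kind of gap a spreadsheet of facility totals hides.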
6. My focus today
• What is a model?
– Uses of models
– Making a model
• Why predictive modeling is essential for
avoiding stranded capex in data centers
• Case study: Predictive modeling for
Equinix
7. “An explicit model is a laboratory
for the imagination.”
–Anthony Starfield et al., How to Model It.
8. The Bay Model, Sausalito, CA
http://www.spn.usace.army.mil/Missions/Recreation/BayModelVisitorCenter.aspx
9. Everyone uses models, most badly
• Usually informal models
• Intuitive but not necessarily accurate
– Ignoring physics and interdependencies
– Ignoring effects of actions on lost capacity and
business value
• Need to be more formal!
10. Uses of formal models
• Organize
– thinking
– data
– assumptions
– terminology
– communication between teams
• Learn about complex systems
– Intuition usually isn’t enough!
• Test alternative choices to aid planning
11. Making a model
• Understand first principles
– Key drivers
– Functional relationships
• Formalize using equations or physical
structures
• Test against reality
– measure and calibrate
• Then (and only then) use model to test
alternatives!
12. Accurate calibration requires…
• Real-time measurements
• Comparison of model results to
measurement
• Understanding of physical reasons for
differences
• Adjustment of model parameters,
accounting for physical reality (can’t just
hard wire results!)
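A minimal sketch of what calibrating a physical parameter (rather than hard-wiring results) might look like. It is an illustration, not the method used in the talk or by any particular tool. It assumes the steady-state sensible-heat balance ΔT = P / (ρ · cp · V̇) for each cabinet and a single unknown: the fraction of nominal airflow that actually reaches the equipment. The function names and the synthetic "measurements" are hypothetical.

```python
AIR_DENSITY = 1.2   # kg/m^3, approximate for room-temperature air
AIR_CP = 1005.0     # J/(kg*K), approximate specific heat of air

def nominal_rise_k(power_w: float, airflow_m3s: float) -> float:
    """Temperature rise across a cabinet if it received its full nominal airflow."""
    return power_w / (AIR_DENSITY * AIR_CP * airflow_m3s)

def calibrate_airflow_fraction(powers_w, airflows_m3s, measured_rises_k) -> float:
    """Least-squares fit of one physical parameter: delivered / nominal airflow.

    Model: measured_rise = nominal_rise / fraction. Fitting k = 1 / fraction is
    a one-parameter linear least-squares problem with a closed-form solution.
    """
    x = [nominal_rise_k(p, v) for p, v in zip(powers_w, airflows_m3s)]
    k = sum(xi * mi for xi, mi in zip(x, measured_rises_k)) / sum(xi * xi for xi in x)
    if k <= 0.0:
        raise ValueError("fit implies non-positive airflow; check the measurements")
    fraction = 1.0 / k
    # Respect physical reality: more airflow than nominal points to a wrong
    # model or bad data, not to a parameter value that should be accepted.
    if fraction > 1.0 + 1e-9:
        raise ValueError(f"fitted fraction {fraction:.3f} is not physically plausible")
    return fraction

# Synthetic data for illustration only: generated from the model itself with a
# known fraction of 0.8, so the fit should recover that value.
powers = [4000.0, 6000.0, 8000.0]      # W
airflows = [0.4, 0.5, 0.6]             # m^3/s, nominal
true_fraction = 0.8
measured = [nominal_rise_k(p, v) / true_fraction for p, v in zip(powers, airflows)]

fitted = calibrate_airflow_fraction(powers, airflows, measured)
print(f"fitted airflow fraction: {fitted:.3f}")
```

The plausibility check is the point of the exercise: when the fit lands outside the physically possible range, the difference needs a physical explanation before any parameter is changed.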
16. Key data center issues
• Constraints
– Reliability
– Power
– Cooling
– Space
– Networking
• Interdependencies between
– Constraints
– Business objectives
17. A complete model of a data center
should include…
• Characteristics of equipment
– Physical dimensions and location
– Operating characteristics (e.g., utilization)
– Power use/efficiency curves
– Equipment and building level air flows
• Characteristics of the physical space
– #, type, capacity, and location of vents/fans
– Obstructions (e.g., stray boxes and cabling)
– Modifications in the envelope
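One way to make such an inventory explicit is to hold it in typed records. The sketch below is only a guess at a workable layout: every class and field name is hypothetical, and the linear idle-to-max power curve is a simplifying assumption, not something stated in the talk.

```python
from dataclasses import dataclass, field

@dataclass
class Cabinet:
    name: str
    position_m: tuple[float, float]   # (x, y) location on the floor
    height_u: int                     # physical size in rack units
    utilization: float                # 0.0 to 1.0
    idle_power_w: float
    max_power_w: float
    airflow_m3s: float                # air drawn through the equipment

    def power_w(self) -> float:
        # Assumption: power rises linearly from idle to max with utilization.
        return self.idle_power_w + self.utilization * (self.max_power_w - self.idle_power_w)

@dataclass
class Vent:
    position_m: tuple[float, float]
    kind: str                         # e.g. "perforated tile"
    airflow_m3s: float                # air supplied to the room

@dataclass
class Room:
    cabinets: list[Cabinet] = field(default_factory=list)
    vents: list[Vent] = field(default_factory=list)
    obstructions: list[str] = field(default_factory=list)   # e.g. stray boxes, cabling

    def total_power_w(self) -> float:
        return sum(c.power_w() for c in self.cabinets)

    def airflow_balance_m3s(self) -> float:
        """Supplied minus demanded airflow; negative means the room is short of air."""
        supplied = sum(v.airflow_m3s for v in self.vents)
        demanded = sum(c.airflow_m3s for c in self.cabinets)
        return supplied - demanded

# Hypothetical illustration values.
room = Room(
    cabinets=[
        Cabinet("R1", (1.0, 2.0), 42, 0.5, 100.0, 300.0, 0.2),
        Cabinet("R2", (1.0, 3.2), 42, 0.25, 200.0, 600.0, 0.25),
    ],
    vents=[Vent((0.4, 2.0), "perforated tile", 0.3),
           Vent((0.4, 3.2), "perforated tile", 0.3)],
    obstructions=["cable bundle under tile near R2"],
)
print("total IT power (W):", room.total_power_w())
print("airflow balance (m^3/s):", round(room.airflow_balance_m3s(), 3))
```

A room-level balance like this ignores where the air actually goes, which is exactly what a physics-based model adds on top of the inventory.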
18. An accurate model also requires
• Real-time measurement (i.e., DCIM) of
– Temperature
– Air flows
– Power use
• Periodic calibration to reflect changed
conditions over time
• Performance and financial metrics to judge
progress
19. and all of these things need to
be tracked in real time for the
life of the facility!
21. Characteristics of Equinix facility
• Case study, Spring 2013
• Colocation facility in the SF Bay Area
• Floor 1, modeled white space: 8,750 sq ft
• Total facility floor space: 42,000 sq ft
• Details on infrastructure
– 2 ft raised floor for airflow delivery
– 42” false ceiling return plenum
– 12 AHUs, N+2 redundancy
23. Predictive IT deployment
• How can Equinix identify void capacity for clients?
• Void capacity can be reclaimed!
• Simulating IT changes prior to installation will:
– Increase thermal resilience
– Enable additional cabinet power to be utilized
[Figure: Managing IT Deployment (Projected Configuration From Current)]
Source: Future Facilities
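The idea of simulating a change before installing it can be shown with a deliberately crude stand-in for a full airflow simulation. The sketch below assumes a single cabinet with fixed airflow and the sensible-heat relation ΔT = P / (ρ · cp · V̇); the function names, the exhaust temperature limit, and all input values are hypothetical and are not taken from the Equinix case study.

```python
AIR_DENSITY = 1.2   # kg/m^3, approximate for room-temperature air
AIR_CP = 1005.0     # J/(kg*K), approximate specific heat of air

def exhaust_temp_c(supply_c: float, load_w: float, airflow_m3s: float) -> float:
    """Exhaust temperature for a cabinet fed at supply_c with the given airflow."""
    return supply_c + load_w / (AIR_DENSITY * AIR_CP * airflow_m3s)

def check_deployment(supply_c, current_load_w, added_load_w, airflow_m3s, max_exhaust_c):
    """Compare exhaust temperature before and after a proposed load increase."""
    before = exhaust_temp_c(supply_c, current_load_w, airflow_m3s)
    after = exhaust_temp_c(supply_c, current_load_w + added_load_w, airflow_m3s)
    return {"before_c": before, "after_c": after, "ok": after <= max_exhaust_c}

# Hypothetical what-if runs: the same cabinet, two candidate additions.
small = check_deployment(supply_c=20.0, current_load_w=3000.0, added_load_w=2000.0,
                         airflow_m3s=0.4, max_exhaust_c=35.0)
large = check_deployment(supply_c=20.0, current_load_w=3000.0, added_load_w=6000.0,
                         airflow_m3s=0.4, max_exhaust_c=35.0)

for label, result in (("add 2 kW", small), ("add 6 kW", large)):
    verdict = "within limit" if result["ok"] else "exceeds limit"
    print(f"{label}: {result['before_c']:.1f} C -> {result['after_c']:.1f} C ({verdict})")
```

A real predictive model replaces the one-line heat balance with a calibrated simulation of the whole room, but the workflow is the same: run the proposed configuration first, then decide.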
25. Conclusions
• Data centers are complex systems, changing
constantly over time
– Like a game of Tetris
– Fragmentation leads to lost capacity
• Monitoring and measurement are not
enough!
• Much lost capacity can be reclaimed using
predictive modeling and state of the art tools,
with support of DCIM measurements
• Don’t turn knobs without knowing the likely
results!
26. References
• Koomey, Jonathan, Kenneth G. Brill, W. Pitt Turner, John R. Stanley, and Bruce Taylor. 2007. A simple model for determining true total cost of ownership for data centers. Santa Fe, NM: The Uptime Institute. September. <http://www.uptimeinstitute.org/>
• Koomey, Jonathan. 2008. "Worldwide electricity used in data centers." Environmental Research Letters. vol. 3, no. 034008. September 23. <http://stacks.iop.org/1748-9326/3/034008>
• Koomey, Jonathan. 2008. Turning Numbers into Knowledge: Mastering the Art of Problem Solving. 2nd ed. Oakland, CA: Analytics Press. <http://www.analyticspress.com>
• Koomey, Jonathan. 2011. Growth in data center electricity use 2005 to 2010. Oakland, CA: Analytics Press. August 1. <http://www.analyticspress.com/datacenters.html>
• Stanley, John, and Jonathan Koomey. 2009. The Science of Measurement: Improving Data Center Performance with Continuous Monitoring and Measurement of Site Infrastructure. Oakland, CA: Analytics Press. October 23. <http://www.analyticspress.com/scienceofmeasurement.html>
• Starfield, Anthony M., Karl A. Smith, and Andrew L. Bleloch. 1990. How to Model It: Problem Solving for the Computer Age. New York, NY: McGraw-Hill, Inc.