Rick Donohue
Product Owner, Enterprise Monitoring
Rick.Donohue@Gettyimages.com
206.925.6526
Thank you
Editor's Notes
A little about Getty Images for those of you who may not be familiar. Imagery is the universal language of our time, and Getty Images is the global leader in visual communications. We distribute award-winning stills, footage, video, music and multimedia. We operate through various channels, predominantly Gettyimages.com, iStockphoto.com and thinkstockphotos.com.
Getty has in-house and freelance contributors around the world providing rich editorial content (essentially, newsworthy content) on the top issues of our time, presented in a way everyone can understand.
Our sports coverage is unparalleled. Getty is the exclusive provider of content for FIFA, the IOC and many, many more.
Our contributors can get into locations around the world that other agencies and photographers literally cannot, giving us unparalleled access to the best shots.
And our creative content pushes boundaries, defining and living on the edge of the visual trends of our time.
Finally, we have the most exhaustive collection of archival content anywhere in the world.
Last year, we launched our embed feature, which enables anyone to share the vast majority of our content for free for non-commercial purposes. I encourage you to head over to Gettyimages.com and start using embed.
Getty Images Stream for iPhone and iPad, available in the App Store
Naturally, to showcase this superb content, we have some equally amazing websites: iStockphoto.com and our flagship brand, Gettyimages.com, among others, including our consumer-oriented photos.com. And as you know, behind the pretty pixels is lots of hardware and lots and lots of code.
What does enterprise monitoring mean at Getty?
What’s working, what isn’t, and the scope of impact
Reliable: accurate alerts and uptime
Decomposition of services: visualizations of system activity
Accessible to anyone, from engineers to tech managers and business leaders
Want to do alerting differently than ‘just’ SCOM, Nagios, Zabbix, etc.
Entered into the process ‘eyes wide open’, knowing that this is a very hard nut to crack
Splunk gave us a huge leg up in meeting our goals
Went all-in on Splunk
Focal point of our monitoring/alerting architecture
All data in one place, even from other tools (SolarWinds, Site24x7, etc.)
Integrates with other tools and workflows (alerting, incident workflow, etc.), including ServiceNow (SNOW)
Democratization of data - anyone can visualize and consume the data
1 tool, many uses. Fewer apps to support
Vitals:
1 search head (4 in broken cluster)
9 indexers; 2 job servers
~1.4 TB/day; ~350k searches/day
Highly optimized and tuned to perform on a small hardware footprint: slow, but it works
Running Splunk 6.0.3
Recent expansion from 9 to 20 indexers
Upgrade to latest rev coming
Different from what I’ve seen in the market
Early adoption came from app dev; not much from TS.
Different from many businesses
App dev built out heavily and tended to have faster resolution times thanks to Splunk correlations, etc.
TS leadership pointed to that success and said, ‘now we must follow’
~40% of users have a Splunk window open at any time
1.0 with a shotgun approach
State tracking v1
Created 1sy/2sy alerting, chasing P1 and P2 events
Focused on hardware, not services: ‘Great, X is broken, but what does it mean to the business?’
Spotty coverage map
Random charts for every group; only a few people understood what was what
No holistic coverage for service stacks (think end user experience)
Time to start doing it better:
Now logically building our own schema based on our business and user experiences
NOC dashboard
Event-driven architecture and alerting schema
Integrating multiple other tools with Splunk:
Site24x7
SolarWinds
Keynote
ServiceNow
CRM Data
Splunk is not a magic solution – none of the solutions are.
It takes hard work; most of the value comes from knowing your data
Hard work, too, in deeply understanding your business and its systems and integrations
Building a roll-up framework and logic that any alert fits into.
A skeleton of our business and all alerts and dashboards slot in to fill a need
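One way to picture the roll-up framework described above is a tree of services in which every alert attaches to a node and each parent takes on the worst status beneath it. The sketch below is only an illustration under assumptions: the `Node` class, the three severity names and the example service names are hypothetical, not Getty's actual schema.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Assumed severity scale for illustration; a higher number means worse.
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}

@dataclass
class Node:
    """One element of the business skeleton: a service or a component under it."""
    name: str
    children: list[Node] = field(default_factory=list)
    alerts: list[str] = field(default_factory=list)  # severities of alerts firing here

    def add(self, child: Node) -> Node:
        self.children.append(child)
        return child

    def status(self) -> str:
        """Roll up: a node is as bad as its worst alert or its worst child."""
        candidates = list(self.alerts) + [c.status() for c in self.children]
        return max(candidates, key=SEVERITY.__getitem__, default="ok")

def impacted(node: Node, path: tuple[str, ...] = ()):
    """Yield the path to every node whose rolled-up status is not ok."""
    here = path + (node.name,)
    if node.status() != "ok":
        yield " > ".join(here)
    for child in node.children:
        yield from impacted(child, here)

# Hypothetical skeleton: alerts slot into whichever node they describe.
site = Node("website")
search = site.add(Node("search"))
checkout = site.add(Node("checkout"))
payments_db = checkout.add(Node("payments-db"))
payments_db.alerts.append("critical")
search.alerts.append("warning")

for line in impacted(site):
    print(line)
```

Because status is computed from the leaves upward, a hardware-level alert on `payments-db` surfaces as a degraded `checkout` and `website`, which is the "what does it mean to the business" view the earlier hardware-only alerting lacked.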
Current state
Splunk Tech Add-Ons add much value
Limit the need for, or complement, other integration points
Adopted Splunk Enterprise Security (ES) for InfoSec
Heavily relied on for ITSM
Measuring impact of ITSM process
Change Management
Must-have tool; can’t release without it
Data center move: monitoring performance from one DC to the other
Those who didn’t want it now really see the value
A small Nagios install ‘watches the watchmen’
Limited need for SCOM (NOC workflow; we have a roadmap to change that)
How is our new code performing from different data centers and different geos?
Reduced testing time – roll the code, see the impact
Enables us to be more aggressive, move faster, bring value to customers faster than we could otherwise
Takes the ‘hope’ out of the equation
Compare real user metrics from logs with Keynote
Speeds time to production, enables us to take calculated risks
Fits with the agile way of working
Ensure new code doesn’t negatively impact performance or user experience
Immediate visibility into real-time code health and performance stats
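The "roll the code, see the impact" check above can be sketched as a comparison of response times before and after a release. This is a minimal illustration, not the searches actually used: the function names, the 95th-percentile choice and the 10% tolerance are all assumptions, and the sample timings are made up.

```python
def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1..100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]

def release_regressed(before_ms, after_ms, pct=95, tolerance=0.10):
    """True if the post-release percentile is more than `tolerance` worse."""
    baseline = percentile(before_ms, pct)
    current = percentile(after_ms, pct)
    return current > baseline * (1 + tolerance)

# Hypothetical response times (ms) sampled before and after a deploy.
before = [100, 110, 120, 130]
after = [100, 110, 120, 150]
print(release_regressed(before, after))
```

The same comparison could be run per data center or per geography by grouping the samples first, which matches the data center move use case mentioned earlier.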
Tracking our performance
- Incidents
- Change
- Monitored vs. not
Knowing what’s out there
“…would love to have known that chart existed during the P1”
Lessons Learned
Being the monitoring team, not the dashboard building team
Teach users to fish…but still need to fish for them at times
Query hygiene: avoid index=* searches over all time or in real time
Tech talk
Splunk has been a good partner: on-site trainings and huge assistance, even in Calgary
Comes naturally to some, but not all
Convincing people to drop their old tools
Expand your hardware footprint as use grows; DON’T WAIT!
LOTS of tuning and optimization to ‘keep the lights on’ without more infrastructure
Just buy the damn boxes; don’t invest in a huge learning curve for optimization. Follow best practices.
Feeling the pain of not investing in infrastructure
Poor experience due to a small hardware footprint
Biggest limitation is I/O
Not building out the infrastructure underneath will result in fragmentation: users find new tools if performance isn’t there.
Encourage people to share searches and content they have built. Learn by stealing!
Weekly ‘Splunk Tech Talk’ meeting – open forum to help new and expert users
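The query hygiene lesson above (wildcard index, all time, real time) lends itself to a simple lint pass over saved search strings. The sketch below is an assumption-laden illustration: `hygiene_issues` is a hypothetical helper, it inspects only the search text, and it treats a missing `earliest=` modifier as unbounded even though a time picker may bound the search in practice. It relies on the `earliest=` time modifier and the `rt` prefix for real-time windows.

```python
import re

def hygiene_issues(search: str) -> list[str]:
    """Return a list of hygiene problems found in a search string."""
    issues = []
    text = search.lower()

    # index=* (optionally quoted) scans every index.
    if re.search(r'\bindex\s*=\s*"?\*', text):
        issues.append("wildcard index")
    elif not re.search(r'\bindex\s*=', text):
        issues.append("no index specified")

    # Assumption: no earliest= in the text means the search may run over all time.
    earliest = re.search(r'\bearliest\s*=\s*"?([^\s"]+)', text)
    if earliest is None:
        issues.append("no earliest time bound")
    elif earliest.group(1).startswith("rt"):
        issues.append("real-time window")

    return issues

# Hypothetical searches to check.
for s in ["index=* error", "index=web earliest=-15m status=500"]:
    print(s, "->", hygiene_issues(s))
```

A check like this could feed the weekly tech talk: flagged searches become concrete examples for teaching users rather than a reason to take access away.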