Video: https://youtu.be/gBI5vm5d25o
How working with big data in the cloud makes cost a primary concern (quality attribute) that you need to take care of
20. What do users actually need?
“You can’t always get
what you want,
But if you try,
sometimes, you might find
you get what you need”
The Rolling Stones
Head of devops found it amusing
* So caveat emptor: buyer beware
I met Alon Fliess, CTO of CodeValue, who had this neat idea
Pay as you go
Unfortunately, for cloudoscope the main cost factor (by a long shot) is VM uptime, so we abandoned it
But some of the concepts and ideas we had when we built it stuck in my mind
There are many concerns (“quality attributes”); solution cost is one of them
We will see how big data changes this and makes cost orientation something to “worry” about
I’ll present how cost orientation in the cloud is more complex
I will demonstrate that with several examples from day-to-day life
First, let’s see how it works in the “traditional”, on-premise setting
~7500 MB/sec in the performance POC
Actual workloads are more taxing
There’s a difference between the synthetic loads in a POC and the ones in a real system
But you get more experienced with the technology and you can squeeze it for more
So that’s big data and cost on-premise…
Let’s meet AppsFlyer
The system handles very high loads
Generating even more messages than it gets
Data sinks in two areas
5+ billion sessions/day (5 out of 7 billion events)
30+ million installs a day
The ball grows bigger…
The bill gets bigger as you get more traffic – maybe it is time to reconsider
Redshift challenges
concurrency in the face of more reports
1 week activity = 3 months retention (@ same granularity level)
Cost-oriented partitioning in Google BigQuery
Move from the default (by month, by event)
to by account, by day
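Why the partitioning scheme matters for cost can be sketched with a toy model. This is plain Python, not the BigQuery API. It assumes cost is proportional to bytes scanned, and that a query reads every partition containing at least one row it needs. The rows, account names, and sizes are made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (account, event_type, day, size_in_bytes). Not real data.
ROWS = [
    ("acme", "install", date(2024, 1, 1), 100),
    ("acme", "session", date(2024, 1, 1), 400),
    ("acme", "session", date(2024, 1, 2), 400),
    ("zeta", "install", date(2024, 1, 1), 100),
    ("zeta", "session", date(2024, 1, 1), 400),
    ("zeta", "session", date(2024, 1, 15), 400),
]


def partition_sizes(rows, key):
    """Total bytes per partition for a given partition-key function."""
    sizes = defaultdict(int)
    for row in rows:
        sizes[key(row)] += row[3]
    return sizes


def bytes_scanned(rows, key, wanted):
    """Bytes read when every partition holding a wanted row is scanned in full."""
    sizes = partition_sizes(rows, key)
    touched = {key(row) for row in rows if wanted(row)}
    return sum(sizes[p] for p in touched)


def by_month_by_event(row):
    _, event, day, _ = row
    return (day.year, day.month, event)


def by_account_by_day(row):
    account, _, day, _ = row
    return (account, day)


def acme_on_jan_1(row):
    """A typical report query: one account, one day."""
    return row[0] == "acme" and row[2] == date(2024, 1, 1)


old_scan = bytes_scanned(ROWS, by_month_by_event, acme_on_jan_1)
new_scan = bytes_scanned(ROWS, by_account_by_day, acme_on_jan_1)
print(old_scan, new_scan)  # 1800 500
```

In this toy data set the per-account, per-day layout reads less than a third of the bytes for the same report, because the query no longer drags in other accounts' data for the whole month.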
Need to understand what our customers actually need – the technical solution might be way simpler
Offloading from on-demand to scheduled reports
Get all clients data together (one pass over data) – then break it
Store for long term on S3 (clients can take data when they need it)
Auto expire (don’t let it grow too much)
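The "one pass, then break it" and "auto expire" ideas above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual pipeline: the event fields, file names, and the 30-day limit are hypothetical, and an in-memory dict stands in for the S3 bucket (in practice, S3 lifecycle rules can handle expiry).

```python
from collections import defaultdict
from datetime import date, timedelta


def split_by_client(events):
    """Read the event stream once and bucket each record by client."""
    per_client = defaultdict(list)
    for event in events:
        per_client[event["client"]].append(event)
    return dict(per_client)


def expire_old_reports(reports, today, max_age_days):
    """Keep only reports no older than max_age_days (the auto-expire step)."""
    cutoff = today - timedelta(days=max_age_days)
    return {name: created for name, created in reports.items() if created >= cutoff}


# Hypothetical events: one scan serves every client's scheduled report.
events = [
    {"client": "acme", "installs": 3},
    {"client": "zeta", "installs": 5},
    {"client": "acme", "installs": 2},
]
per_client = split_by_client(events)

# Hypothetical stored reports, keyed by object name with their creation date.
reports = {
    "acme/2024-01-01.csv": date(2024, 1, 1),
    "acme/2024-03-01.csv": date(2024, 3, 1),
}
kept = expire_old_reports(reports, today=date(2024, 3, 10), max_age_days=30)
print(sorted(per_client), sorted(kept))
```

The point of the single pass is that the expensive scan is paid once per schedule, not once per client request.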
Cost orientation does not equal cheap, just (hopefully) cheaper
Small cost * big number = big bucks
Understand the cost structure of the services you use
Fixed?
Relative?
Understand what users are actually looking for
Evolve the solution
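The "small cost * big number" point and the fixed versus relative distinction can be made concrete with a small calculation. The prices here are made-up placeholders, not any provider's price list. Only the rough event volume (about 7 billion events a day) comes from the figures above. Amounts are kept as integer millionths of a dollar so the arithmetic stays exact.

```python
MICRO = 1_000_000  # one dollar expressed in millionths of a dollar


def monthly_cost(fixed, unit_price, units):
    """Fixed part (paid regardless of traffic) plus a part relative to usage."""
    return fixed + unit_price * units


# Hypothetical prices, for illustration only.
fixed_fee = 500 * MICRO  # flat $500 per month
unit_price = 1           # one millionth of a dollar per event

# About 7 billion events a day over a 30-day month.
events_per_month = 7_000_000_000 * 30

total = monthly_cost(fixed_fee, unit_price, events_per_month)
relative_part = total - fixed_fee
print(total // MICRO, relative_part // MICRO)  # 210500 210000
```

Even at a millionth of a dollar per event, the usage-relative part dwarfs the fixed fee at this volume, which is why it pays to know which parts of a service's bill are fixed and which scale with traffic.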