21. 21
Once you find a copy, it needs a curator
Sizing (don’t use all 10 TB of prod to test)
But your sample must represent the entirety of the dataset.
Representative curation is futile with most datasets (unknown unknowns).
Sizing means you restrict your tests to what you left in.
Sizing hides performance issues (e.g., a missing index).
So maybe it’s not worth it…
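Inspecting the query plan is one way to catch a missing index even on a small, sized-down copy, because the plan does not depend on how long the query happens to take. A minimal sketch using Python's built-in `sqlite3` module, assuming a hypothetical `orders` table (on Postgres the equivalent tool is `EXPLAIN`). The exact plan wording varies by SQLite version.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return " | ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
# A tiny, "sized" sample: any query against it is fast, index or not.
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 50, float(i)) for i in range(500)],
)

sql = "SELECT * FROM orders WHERE customer_id = ?"
before = query_plan(conn, sql, (7,))  # full table scan: no index on customer_id yet
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, sql, (7,))   # now a search using the new index
print("before:", before)
print("after: ", after)
```

On 500 rows both plans run in microseconds; on the full production table the first one is a scan of every row, which a timing-based test on the sample would never reveal.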
22. 22
Once you find a copy, it needs a curator
Sanitize it!
Can’t have SSNs and credit card numbers in test.
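A minimal, format-preserving masking sketch, assuming SSNs appear as `123-45-6789` and card numbers as 13 to 19 digits optionally separated by spaces or dashes. The helper name and patterns are illustrative only; real PII discovery needs far more than two regexes, and the card pattern will also mask any other long run of digits.

```python
import re

# Assumed formats: SSN as 123-45-6789; card numbers as 13-19 digits,
# optionally separated by single spaces or dashes.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def _x_out(match: re.Match) -> str:
    # Replace every digit with 'X' but keep separators, so the masked value
    # still has the shape downstream code expects.
    return re.sub(r"\d", "X", match.group())

def mask_pii(text: str) -> str:
    """Mask SSN- and card-number-shaped substrings (hypothetical helper)."""
    text = SSN_RE.sub(_x_out, text)
    return CARD_RE.sub(_x_out, text)

print(mask_pii("SSN 123-45-6789, card 4111 1111 1111 1111"))
```

Keeping separators and lengths intact means code that parses or validates these fields by shape still exercises the same paths in test.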
23. 23
Once you find a copy, it needs a curator
Delete!
Old data smells funny.
27. 27
The sum of the mess is worth more than its parts
There are 5,475 secondary copies with no load; can we leverage them for testing?
Fix: Let CF manage your data.
31. 31
How do you fill in that hand-wavy part in the middle?
32. 32
Putting the E in Enterprise
Buy a CDM Product
Actifio, Delphix, ViPR
Great if they support your workloads and you can consume the form factors they deliver!
33. 33
Based on technology to allow layered writes
Layered FS (Docker, Docker, Docker)?
Clones, Linked Clones, VM Snaps
Writable Snapshots (FlexClone, XtremIO, LVM Snaps)
Building is harder than buying
BYO
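The layered-write idea behind these options can be shown with a toy in-memory analogy: Python's `collections.ChainMap` sends writes to its first mapping while reads fall through to the rest, so many clones can share one base copy. This only illustrates copy-on-write layering; it is not how overlay filesystems or array snapshots are implemented, and the `make_clone` helper and sample rows are hypothetical.

```python
from collections import ChainMap

# Hypothetical shared base copy (think: one sanitized snapshot of prod).
base = {"user:1": {"name": "Ada"}, "user:2": {"name": "Grace"}}

def make_clone(base_layer: dict) -> ChainMap:
    """Writable clone: writes go to a fresh top layer, reads fall through to base."""
    return ChainMap({}, base_layer)

clone_a = make_clone(base)
clone_b = make_clone(base)

# Replace whole values rather than mutating nested dicts in place:
# clone_a["user:1"]["name"] = ... would modify the shared base object.
clone_a["user:1"] = {"name": "Ada (edited)"}
clone_b["user:3"] = {"name": "Linus"}

# Caveat: del clone_a["user:2"] raises KeyError, because ChainMap only deletes
# from its first layer; layered filesystems record deletions as "whiteouts".

print(clone_a["user:1"], clone_b["user:1"], base["user:1"])
print(len(clone_a.maps[0]), "key(s) stored in clone_a's own layer")
```

Each clone costs only the size of its own changes, which is why layered approaches make it cheap to hand every test run its own writable copy.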
34. 34
cf create-service
Snap Prod VM
Spin up VM
Allocate IP
Sanitize Data in PG
cf push demo
Test
Dispose
AMI and Postgres Demo
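One way to wire the demo steps together so the environment is always disposed of, even when a step or the tests fail. This is a sketch with hypothetical no-op step functions; in practice each step would call the relevant tool (for example the `cf` CLI or a cloud provider's snapshot API).

```python
from typing import Callable, List, Optional, Tuple

Ctx = dict
Step = Tuple[str, Callable[[Ctx], None]]

def noop(ctx: Ctx) -> None:
    """Placeholder for a real step (hypothetical)."""

# Step names mirror the slide; the functions are placeholders.
DEMO_STEPS: List[Step] = [
    ("cf create-service", noop),
    ("snap prod VM", noop),
    ("spin up VM", noop),
    ("allocate IP", noop),
    ("sanitize data in PG", noop),
    ("cf push demo", noop),
    ("test", noop),
]

def run_pipeline(steps: List[Step], dispose: Callable[[Ctx], None],
                 ctx: Optional[Ctx] = None) -> Ctx:
    """Run steps in order; always dispose, even if a step raises."""
    ctx = {} if ctx is None else ctx
    ctx.setdefault("log", [])
    try:
        for name, fn in steps:
            ctx["log"].append(name)
            fn(ctx)
    finally:
        # Runs on success and on failure, so no test environment is leaked.
        ctx["log"].append("dispose")
        dispose(ctx)
    return ctx

result = run_pipeline(DEMO_STEPS, dispose=noop)
print(" -> ".join(result["log"]))
```

Putting disposal in `finally` is the design choice that matters here: a failed test still releases the VM, the IP, and the sanitized copy.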
Much sleuthing and many failed attempts to generate legit test data later…
Act II
ACT III
I have a customer who hasn’t refreshed test data in three years.
Representing the entirety of the dataset means accounting for things like previous schemas, rows missing later-added (additive) fields, FKs, etc. Is selecting those records going to cause issues? What about formats assumed in the data itself? (But surely no one stores encoded information in their database.)
Does everyone know the data well enough to know what is representative? (No.)
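Sampling table by table is one way to hit the FK problem: a sampled child row may reference a parent that was never sampled. A minimal sketch, assuming hypothetical `customers`/`orders` rows held as dicts, that samples parents first and then keeps every child referencing them. It handles a single FK level only and does nothing about previous schemas or encoded formats.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, int]

def fk_closed_sample(parents: List[Row], children: List[Row], k: int,
                     fk: str = "customer_id", seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Sample k parent rows, then keep every child row that references one of them."""
    rng = random.Random(seed)  # seeded so the test subset is reproducible
    chosen = rng.sample(parents, k)
    ids = {p["id"] for p in chosen}
    return chosen, [c for c in children if c[fk] in ids]

def orphans(children: List[Row], parents: List[Row], fk: str = "customer_id") -> List[Row]:
    """Child rows whose foreign key points at a parent missing from the subset."""
    ids = {p["id"] for p in parents}
    return [c for c in children if c[fk] not in ids]

# Hypothetical data: 100 customers, each with 10 orders.
customers = [{"id": i} for i in range(100)]
orders = [{"id": n, "customer_id": n % 100} for n in range(1000)]

sample_customers, sample_orders = fk_closed_sample(customers, orders, k=10)
print(len(sample_customers), len(sample_orders), len(orphans(sample_orders, sample_customers)))
```

The `orphans` check is also useful on its own as a sanity test against any subset produced by other tooling.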