3. Geode Use Cases
● Global Data Cache: Hotel Marketing
● Microservice Enabler: Any Software-Driven Enterprise
● Transaction Processing: Rail Ticketing
● Event Processing: Credit Card Fraud Detection
● Data Aware Compute Grid: Investment Performance Reporting
● Streaming Data Capture: Vehicle Fleet Operations
4. Geode for Global Data Caching: Background
• Customer systems rely heavily on “property data”
• The information does not change very often and is heavily queried.
• The systems that use the data are globally dispersed, either at the
actual properties or in an in-region data center.
• The system of record for the property data is a DB running on a
mainframe in the HQ data center.
• All regions need all property data.
6. Geode as a Micro Service Enabler: Background
Situation: any modern, software-centric enterprise ...
Company recognizes the need to modernize in order to
• deliver new products and experiences faster
• generally to be more agile
However ....
There is a huge existing infrastructure that
• must continue to function
• can’t be modernized all at once
7. Micro Services Require Their Own Data Store
To Deliver Value Faster, You Need All of These ...
● Agile Development Practices
● DevOps Approach and Platform (e.g. Cloud Foundry)
● Agile Architecture (Micro Services)
and ..
● An Agile Data Store
See https://martinfowler.com/articles/microservices.html#DecentralizedDataManagement
8. Why Micro Services Require Their Own Data Store

Agile Requirements
• Each service controls its own data store
• Time to market more important than strict governance
• NoSQL API

Legacy Requirements
• Many apps share the DB
• Strict governance of schema
• SQL API
10. Geode for Transaction Processing: Background
● Rail ticketing system
● Very high volume, high concurrency
● The available seats on a trip can be highly contended
● To avoid selling the same seat many times, transactions must be used
to update "seats-available"
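The booking invariant can be sketched with a minimal stand-in: "seats-available" must be decremented atomically so that two concurrent bookings can never both claim the last seat. In Geode this check-and-update would run inside a transaction (or a server-side function) against a partitioned region; the class below is hypothetical and uses `ConcurrentMap.compute` only to model the atomicity, not the Geode API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Stand-in sketch of the seat-booking invariant. The names
// (SeatInventory, book, remaining) are illustrative, not Geode API.
public class SeatInventory {
    private final ConcurrentMap<String, Integer> seatsAvailable = new ConcurrentHashMap<>();

    public SeatInventory(String tripId, int seats) {
        seatsAvailable.put(tripId, seats);
    }

    // Returns true if a seat was booked, false if the trip is sold out.
    // compute() runs the check-and-decrement atomically per trip key.
    public boolean book(String tripId) {
        final boolean[] booked = {false};
        seatsAvailable.compute(tripId, (trip, left) -> {
            if (left == null || left == 0) return left; // sold out
            booked[0] = true;
            return left - 1;
        });
        return booked[0];
    }

    public int remaining(String tripId) {
        return seatsAvailable.getOrDefault(tripId, 0);
    }
}
```

The same check-then-decrement, done without atomicity, is exactly how the same seat gets sold twice under high concurrency.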
15. Geode as a Data Aware Compute Grid: Overview
● Customer already does monthly portfolio performance statements but
now wants to add online capabilities.
● This is a fairly heavy computation that was traditionally handled as a
long-running batch job.
● Computing rate of return requires you to know the price of each holding
in the portfolio on every day. This is potentially a lot of data!
16. The Portfolio Performance Reporting Process

Holdings History (this format preferred for storage):

Date        Symbol  Holdings
01/01/2018  ACME    100
06/30/2018  ACME    150

Daily Values (this format required for computation):

Date        Symbol  Holdings  Value
01/01/2018  ACME    100       13,200
01/02/2018  ACME    100       13,420
01/03/2018  ACME    100       13,370
...         ...     ...       ...
18. Geode as a Data Aware Compute Grid: Blueprint
19. Geode for Streaming Data: Background
19
● Customer operates a fleet of vehicles that constantly produce data like
location and engine diagnostics.
● The challenge is to turn the stream of data into useful business insights
like "Where is this vehicle now?"
21. Geode: Much More than a Cache!
● Global Data Cache: Hotel Marketing
● Microservice Enabler: Any Software-Driven Enterprise
● Transaction Processing: Rail Ticketing
● Event Processing: Credit Card Fraud Detection
● Data Aware Compute Grid: Investment Performance Reporting
● Streaming Data Capture: Vehicle Fleet Operations
This is a caching scenario but it’s a globally distributed cache.
Remarks
Data is as current as possible given limitations imposed by the network
Note that the push approach means that the customer will never wait while property data is retrieved
We also get the traditional benefit of caching: a dramatic reduction in the read traffic to the DB, which in turn saves money
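The read-traffic reduction can be sketched with a minimal stand-in for a read-through cache: on a miss the value is loaded from the expensive system of record, on a hit the DB is never touched. In Geode this role is played by a Region backed by a CacheLoader; the class and names below are hypothetical and only illustrate the access pattern.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in sketch of a read-through cache. Repeated reads of the
// same key cost exactly one trip to the system of record.
public class PropertyCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger dbReads = new AtomicInteger();

    // Hypothetical loader standing in for the mainframe DB query.
    private String loadFromDb(String propertyId) {
        dbReads.incrementAndGet();
        return "details-for-" + propertyId;
    }

    // On a miss, load and cache; on a hit, serve from memory.
    public String get(String propertyId) {
        return cache.computeIfAbsent(propertyId, this::loadFromDb);
    }

    public int dbReadCount() {
        return dbReads.get();
    }
}
```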
We inevitably need a second data store
Of course that creates another problem (data synchronization)
Geode is certainly not acting as a cache in this situation but as a fully independent data store.
Remarks About Consistency
This solution obviously involves some eventual consistency model, which brings its own challenges, BUT
often the actual business model already assumes eventual consistency
the alternative, complete synchronization, can be catastrophic for performance
there is a great paper about this tradeoff by Pat Helland called "Building on Quicksand"
Other Ways to Do This
There are many ways to do this - for example using Geode write-behind
Directing reads to Geode but writes to the original SOR
At any rate, this is certainly not a cache. This is Geode as a full data store complete with queries and transactions.
Ability to spread data by a chosen attribute, such as rail line, allows the solution to scale by exploiting opportunities to divide and conquer.
Server-side functions mean booking transactions can last microseconds, reducing contention.
Historical Note: This sort of use case drove the early development of GemFire. Scenarios like this are why Geode prioritizes consistency over other qualities. For example, it always performs synchronous replication so that updates will not be lost if there is a failure.
The inputs to the model, called "features", will change as a result of a card swipe.
Examples of features might include “location of last card swipe”, “spend in last hour/day/week”, “current account balance”, “merchant period to date charges”.
Partition by CC #
GemFire Function contains the code.
Regions contain the data
Remarks
Again, we see partitioned regions playing a vital role in parallelizing the problem and enabling scale.
Note that the data needed for generating the features and evaluating the model are all in the same process.
Alternative Approach
- If the features can be updated asynchronously, then an AEQ (Asynchronous Event Queue) can be used
Similar Systems
- Event Processing == High Speed Decision Making
A similar pattern is also used in online ad serving, where the "event" is actually serving a web page and processing the event means selecting the best ad to embed in the page.
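The feature maintenance described above can be sketched with a stand-in: each card number maps to its feature state, and because Geode would partition the region by card number, the swipe event, the feature update, and the model evaluation all happen in the same process. The feature chosen here ("spend over the last N swipes") and the threshold rule are hypothetical illustrations, not a real model or the Geode API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Stand-in sketch of co-located feature maintenance for fraud scoring.
public class FraudFeatures {
    private static final int WINDOW = 3; // hypothetical sliding window size

    private final Map<String, Deque<Double>> recentSpend = new HashMap<>();

    // Record a swipe and return the updated windowed spend for the card.
    public double onSwipe(String cardNumber, double amount) {
        Deque<Double> window = recentSpend.computeIfAbsent(cardNumber, k -> new ArrayDeque<>());
        window.addLast(amount);
        if (window.size() > WINDOW) window.removeFirst();
        return window.stream().mapToDouble(Double::doubleValue).sum();
    }

    // Trivial stand-in for "evaluate the model": flag high windowed spend.
    public boolean suspicious(String cardNumber, double threshold) {
        Deque<Double> window = recentSpend.getOrDefault(cardNumber, new ArrayDeque<>());
        return window.stream().mapToDouble(Double::doubleValue).sum() > threshold;
    }
}
```

Because everything keyed by one card lives together, the swipe-to-decision path never crosses the network.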
This is an in-memory, data-aware map-reduce!
Partition by Symbol - the holdings are sliced by symbol and "scattered" to the nodes where the price data resides
Each node computes the daily values for its portion of the symbols
The daily values are gathered into the Rate of Return Service where the final calculation occurs (that's the reduce part)
Remarks
This showed a more advanced problem where there was no way to completely parallelize either by symbol or by customer.
In contrast with a pure compute grid approach, or even an HDFS-based approach,
we are not touching disk to read in prices - they just sit there in memory
we are also not moving prices over the network
So that is how Geode can be used to do data-aware map reduce tasks
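The scatter/gather computation above can be sketched as a stand-in: holdings are grouped by symbol (the "scatter" that Geode's partitioning gives for free), each symbol's daily values are computed next to its price series (the map step), and the per-symbol results are summed into a portfolio value per day (the reduce step). The class, method names, and data shapes below are hypothetical illustrations, not the Geode function-execution API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch of the data-aware map-reduce.
public class DailyValues {
    // Map step: value = holdings * price, per day, for one symbol.
    // In Geode this would run on the node holding that symbol's prices.
    public static Map<String, Double> valuesForSymbol(
            int holdings, Map<String, Double> dailyPrices) {
        Map<String, Double> values = new HashMap<>();
        dailyPrices.forEach((date, price) -> values.put(date, holdings * price));
        return values;
    }

    // Reduce step: sum per-symbol daily values into portfolio totals,
    // as the Rate of Return Service would after gathering results.
    public static Map<String, Double> reduce(List<Map<String, Double>> perSymbol) {
        Map<String, Double> totals = new HashMap<>();
        for (Map<String, Double> values : perSymbol) {
            values.forEach((date, v) -> totals.merge(date, v, Double::sum));
        }
        return totals;
    }
}
```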
Data is Partitioned by Vehicle
The actual telemetry data is not kept in the grid for very long
AEQs are used to process the stream and compute useful summaries and projections
Those in turn drive dashboards.
Remarks
Yet again, intelligent partitioning allows the work to be parallelized
The use of the AEQ in a very high volume streaming ingest system is essential because you don't want anything to slow down the basic "put" operations
This system illustrates a common principle. Use Geode to hold the "right now" view of the business and use a different technology for analytic and BI tasks that require history.
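The ingest pattern above can be sketched with a stand-in: raw telemetry "puts" only enqueue the event (so the write path stays fast), while a separate drain step folds events into a small "right now" summary, here the latest known location per vehicle. In Geode the queue role is played by an AEQ feeding an AsyncEventListener; the classes below are hypothetical and model only the decoupling.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in sketch of fast-path ingest with asynchronous summarization.
public class TelemetryIngest {
    record Telemetry(String vehicleId, String location) {}

    private final BlockingQueue<Telemetry> queue = new ArrayBlockingQueue<>(1024);
    private final Map<String, String> lastLocation = new ConcurrentHashMap<>();

    // Fast path: just enqueue, never compute.
    public void put(String vehicleId, String location) {
        queue.add(new Telemetry(vehicleId, location));
    }

    // Async path: drain the queue and update the "right now" view.
    public void drain() {
        Telemetry t;
        while ((t = queue.poll()) != null) {
            lastLocation.put(t.vehicleId(), t.location());
        }
    }

    public String whereIs(String vehicleId) {
        return lastLocation.get(vehicleId);
    }
}
```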
And there we have Geode for streaming data ingest.
I hope this has been informative and that you've learned something new about how Geode is used.
Mostly I hope you now see that Geode is much more than a cache.
Thank You!