Informix Update - This presentation was given at IBM Data Server Day on 22 May in Stockholm by Simon David, Technical Product Manager, Competitive Technologies & Enablement, Informix Development
2. Please Note:
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal
without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product direction
and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or
legal obligation to deliver any material, code or functionality. Information about potential future
products may not be incorporated into any contract. The development, release, and timing of any
future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard
IBM benchmarks in a controlled environment. The actual throughput or
performance that any user will experience will vary depending upon many
factors, including considerations such as the amount of multiprogramming
in the user's job stream, the I/O configuration, the storage configuration,
and the workload processed. Therefore, no assurance can be given that an
individual user will achieve results similar to those stated here.
2012
4. Why Smart Meters Need Informix TimeSeries
What challenges are being faced in the Energy & Utilities Sector today?
What is a Smart Meter and how can it help?
How does Informix TimeSeries fit in?
Case studies
–1M Oncor PoC
–35M Internal benchmark
–100M AMT Sybex benchmark
4 22 May 2012 2012
5. Consumers need Smart Meters
Samuel Palmisano, chief executive officer of International Business Machines Corp., said
improving the U.S. electric-transmission grid depends on providing better information to
consumers. Companies shouldn’t wait for government to set standards for data and
technologies to create a "smart grid," which lets consumers monitor their energy use and take
conservation steps that can save energy and money. [September 21, 2010, at the Gridwise
Global Forum in Washington]
6. Energy Usage Issues
Emission reduction goals:
– EU 20% emissions reduction by 2020 as compared to 1990.
– UK: 60% reduction by 2050.
Long lead times for new, “clean” energy supply.
Lasting legacy of energy inefficiency:
– 80% of refrigerators bought in 2007 will be in use in 2020.
– Less than 1/3 of industrial infrastructure will be replaced by 2020.
– Over 20% of cars bought in 2007 will still be on the road in 2020.
Household efficiency a priority:
– 25-30% of carbon emissions are from regular households.
– 80% of home energy usage is heating.
– EC projects 27% savings through efficiency in buildings.
7. Why Smart Meters
Access to near-real-time electricity usage information.
Better control and management of electricity usage.
Enable retail electric providers to develop and offer new, innovative plans
that will lower consumer bills.
Help make smarter decisions and change behaviours to help reduce
consumption, or modify usage patterns.
Smart meter often refers to an electrical meter, but it can also mean a device
measuring natural gas or water consumption.
8. Who is Using Smart Meters
Utility Companies:
– In the U.S. – stimulus money used for smart meters.
– Main drive is not reducing billing costs.
– Better analysis of usage patterns.
– Can different tariffs change energy consumption?
Consumers:
– Looking to reduce energy costs.
– Wanting to improve their green credentials.
Governments:
– Need to show improvements in emissions.
– Want to reduce energy consumption/reliance.
9. Smart Meters Solve Real Problems
Real time information on Energy Usage.
Gain control over personal energy usage
– Modify electrical consumption:
• California study – reductions 5.7% to 8.7%.
• Norwegian study - reductions of 9%.
• UK study reduction of 12%.
• Oncor Texas, reductions of 5%-10%.
Power companies:
– Develop new innovative rate plans.
– Avoid building new plants.
– Avoid buying power from other sources.
– Meet Green standards.
– Reliable power restored quicker after outages.
10. Data issues with Smart Meters
Data Issues - Terabytes of new Data:
– Ability to bring on new meters.
– Store data for new regulatory reasons.
– Analyse usage.
– Automatically Read Meters.
New Data, New Applications:
– Billing
– Portal
– Compliance
– New Analytics
– Combine Meter and Weather data
12. What is Time Series Data?
Time series data is:
– A set of data where each item is time-stamped
• Think of an array where each element can be indexed by time or
by a timestamp
“Give me the Jan 1st element from time series “X””
Most useful when a range of data is normally read
“Give me the Jan 1st thru Jan 10th elements from time series “X””
Access to one time series is usually completed before moving to the
next time series.
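The array analogy above can be sketched in a few lines of Python. This is an illustration only: the `TimeSeries` class and its `put`, `element` and `clip` methods are hypothetical names chosen for this sketch, not the Informix API.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


class TimeSeries:
    """Toy time series: values kept sorted by timestamp (hypothetical, not Informix)."""

    def __init__(self):
        self._stamps = []   # sorted timestamps
        self._values = []   # values, in the same order as _stamps

    def put(self, stamp, value):
        # Insert while keeping both lists ordered by time.
        i = bisect_left(self._stamps, stamp)
        self._stamps.insert(i, stamp)
        self._values.insert(i, value)

    def element(self, stamp):
        # "Give me the Jan 1st element": exact lookup by timestamp.
        i = bisect_left(self._stamps, stamp)
        if i < len(self._stamps) and self._stamps[i] == stamp:
            return self._values[i]
        return None

    def clip(self, start, end):
        # "Give me the Jan 1st thru Jan 10th elements": one contiguous slice.
        lo = bisect_left(self._stamps, start)
        hi = bisect_right(self._stamps, end)
        return list(zip(self._stamps[lo:hi], self._values[lo:hi]))


x = TimeSeries()
for day in range(1, 16):
    x.put(datetime(2010, 1, day), day * 1.5)

jan1 = x.element(datetime(2010, 1, 1))
jan1_to_10 = x.clip(datetime(2010, 1, 1), datetime(2010, 1, 10))
```

Because the data is ordered by time, a range read is a single contiguous slice rather than a set of scattered lookups.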
13. How are Time Series Used?
They access the data by time range
– Look at a range of data in the past
– Make predictions about a range in the future
Their analysis is often very proprietary
Many keep large volumes of data online
Many take in huge volumes of data each second
All these markets use relational data as well
All need to combine their relational data with time series data
14. Key Strengths of Informix Timeseries
Performance
– Extremely fast data access
• Data layout optimized on disk
– Handles operations hard or impossible to do in standard SQL
Space Savings
– Can be over 70% space savings over standard relational layout
Toolkit approach allows users to develop their own algorithms
– Algorithms run in the database to leverage buffer pool
Conceptually closer to how users think of time series
16. Same Table Stored as a Time Series
Meter_ID Origin 00:00 00:30 01:00 01:30 ...
1 2010-06-01 (1.3,0...15.6) (1.6,0...15.5) (1.4,0...15.5) (1.4,0...15.4)
2 2010-06-01 (0.4,0...12.3) (0.3,0...12.3) (0.2,0...12.2) (0.5,0...12.3)
3 2010-06-01 (0,3.5... 13.6) (0,4.3... 12.2)
There are only as many rows as meters
Each row is very long and grows as data is inserted
Very fast access to a timeslot once the Meter_ID is selected
Very fast to read time-ordered set of values
17. Informix TimeSeries
A “timeseries” datatype is available in Informix
– First introduced by Illustra in 1996
Additional “objects” associated with timeseries:
– “Calendar” datatype
• For defining when data can be collected
– Row types
• For defining what should be collected
– Containers
• For defining where the data should be stored
– Several Support tables:
• Calendar, tsinstancestable, tscontainertable
18. Key Concepts: Regular Time Series
Data collected uniformly over time intervals is a “regular” time series
– For example: daily, hourly, etc...
A regular time series always has exactly one record per interval
If an interval is missing data then:
– Missing data on an existing page takes up (a little) space
– If all the intervals for a page are missing data then the page takes no space
Values in one interval typically do not carry into the next
Can be thought of as an array of data
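A minimal Python sketch of why a regular time series behaves like an array: with a known origin and interval, a timestamp converts directly to an array position, so the timestamp itself never has to be stored. `ORIGIN`, `INTERVAL`, `offset_of` and `stamp_of` are names assumed for this illustration and are not part of Informix.

```python
from datetime import datetime, timedelta

ORIGIN = datetime(2010, 1, 1)        # assumed origin of the series
INTERVAL = timedelta(minutes=30)     # assumed regular interval


def offset_of(stamp):
    """Array position of the interval that contains `stamp`."""
    return (stamp - ORIGIN) // INTERVAL


def stamp_of(offset):
    """Start time of the interval stored at `offset`."""
    return ORIGIN + offset * INTERVAL


# One slot per interval; None marks an interval with missing data.
readings = [None] * 48
readings[offset_of(datetime(2010, 1, 1, 0, 30))] = 1.6
readings[offset_of(datetime(2010, 1, 1, 1, 0))] = 1.4
```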
19. Key Concepts: Irregular Time Series
Irregular time series also use intervals of time, however:
– Unlike regular time series, irregular time series can store more than one
record into a given time interval
• For instance, multiple alerts can occur in the same second
– Missing data never takes any room on disk
Values in an irregular time series can be treated in two ways:
– Values may persist until next value arrives (stair step)
• Total usage counter
– Values are only valid at their given time point and do not “persist” (discrete)
• Power outage alert
Can also be thought of as an array of data
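The two ways of reading an irregular time series can be sketched in Python as follows. The sample data and the function names `stair_step` and `discrete` are made up for this illustration; they only mirror the behaviour described above and are not Informix routines.

```python
from bisect import bisect_right

# (second, value) pairs sorted by time; more than one record may share a second.
counter = [(10, 100.0), (25, 104.5), (25, 104.9), (60, 110.2)]


def stair_step(series, t):
    """Latest value at or before t persists until the next one arrives."""
    stamps = [s for s, _ in series]
    i = bisect_right(stamps, t)
    return series[i - 1][1] if i else None


def discrete(series, t):
    """Only values recorded exactly at t are valid; nothing persists."""
    return [v for s, v in series if s == t]
```

A total usage counter would be read with `stair_step`, while a power outage alert would be read with `discrete`.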
20. Key Concepts: Calendar Datatype
Every Timeseries has an associated calendar
A calendar is made up of several parts:
– A name
– A pattern of intervals
– A start date
For instance, to create a calendar called “weekday” that starts on
Jan 1 2010 and defines regular work days you would issue this statement:
INSERT INTO Calendartable (c_name, c_calendar) VALUES
('weekday', 'start(2010-01-01 00:00:00), pattern({5 on, 2 off},
day)');
The system catalog called “calendartable” holds all the calendars that have
been defined
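To make the "pattern of intervals" idea concrete, here is a small Python sketch that expands a start date and an on/off pattern into the valid interval start times. The function `expand_calendar` and its argument layout are assumptions made for this illustration; the real calendar type has more options than are modelled here.

```python
from datetime import datetime, timedelta
from itertools import cycle


def expand_calendar(start, pattern, unit, count):
    """Return the first `count` valid ("on") interval start times.

    pattern is a list of (length, is_on) runs, e.g. [(5, True), (2, False)]
    for "5 on, 2 off"; unit is the length of one interval. The pattern must
    contain at least one "on" run, otherwise the loop never finishes.
    """
    flags = cycle(flag for length, flag in pattern for _ in range(length))
    valid = []
    stamp = start
    for is_on in flags:
        if len(valid) == count:
            break
        if is_on:
            valid.append(stamp)
        stamp += unit
    return valid


# "5 on, 2 off" in days, repeated from the start date.
weekday = expand_calendar(datetime(2010, 1, 1), [(5, True), (2, False)],
                          timedelta(days=1), 7)

# "1 on, 29 off" in minutes gives one valid interval every half hour.
half_hourly = expand_calendar(datetime(2010, 1, 1), [(1, True), (29, False)],
                              timedelta(minutes=1), 3)
```

Note that the pattern simply repeats from the start date, so which days end up "on" depends on the start date chosen.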
21. Key Concepts: Row Types
A Timeseries is made up of a series of timestamped rows
The granularity of the timestamp is 10 microseconds (.00001 seconds)
The SQL syntax that defines a row type is:
CREATE ROW TYPE reading (tstamp DATETIME, phase1 DECIMAL,…)
– NOTE: Timeseries requires the type of the first column (the type of tstamp) to be
“datetime year to fraction(5)”
Data in the row can be missing (NULL)
– Missing data takes no space in a time series
Rows can be marked as hidden
– Useful for holidays and other times where data is not available
22. Key Concepts: Containers
A “container” is the name given to the data structure that holds data for one or more time
series.
It guarantees that time series data is stored clustered and sorted on disk
A container is explicitly created using this SQL syntax:
EXECUTE PROCEDURE TsContainerCreate('cont_name', 'dbspace_name',
'rowtype_name', first_extent, next_extent);
– rowtype_name is the name of an existing row type
– dbspace_name is the name of an existing dbspace (predefined area of disk)
– first_extent is the size of the first extent of storage
– next_extent is the size of the subsequent extents of storage
TimeSeries 5.00 has an automatic container allocation mechanism
– With no container definition the dbspace of the table is used
– Otherwise user defined pools can be used
– Policy can be Round Robin or user defined
23. Putting it all Together
Create a calendar for 30 minute intervals:
INSERT INTO Calendartable (c_name, c_calendar) VALUES
('interval', 'start(2010-01-01 00:00:00), pattern({1 on, 29 off}, minute)');
Create a row type:
CREATE ROW TYPE reading (tstamp datetime year to fraction(5),
phase1 DECIMAL, phase2 DECIMAL, phase3 DECIMAL, temp DECIMAL);
Create a container:
EXECUTE PROCEDURE TsContainerCreate ('int_cont1', 'tsdbs', 'reading', 1024, 1024);
Create a table and insert a “blank” row for 1 meter:
CREATE TABLE meters (Meter_ID char(64), Actual timeseries(reading));
INSERT INTO meters VALUES ('9908898',
'origin (2010-01-01 00:00:00), calendar(interval), container(int_cont1), regular');
24. Relational Storage – Traditional Index Method
Data pages have mixed Meter_IDs
Multiple page access required
Key stored in both index and data
[Diagram: b-tree root (2010) with branch entries]
Meter_ID Start End
MX001 00:00 23:30
MX002 00:00 23:30
MX1209980 00:00 23:30
Index Page
Meter_ID TStamp Pointer
MX001 2010-06-01 00:00
MX001 2010-06-01 00:30
MX001 2010-06-01 01:00
MX001 2010-06-01 01:30
MX001 2010-06-01 02:00
Data Page
Meter_ID TStamp usage
MX001 2010-06-01 00:00 1.6
MX001 2010-06-01 00:30 1.8
MX002 2010-06-01 12:30 3.6
MX003 2010-06-01 06:00 8.2
MX001 2010-06-01 01:00 4.7
25. Relational Storage – “High Performance” Index Method
Only index page access required
But all data is stored in both index and data pages
[Diagram: b-tree root (2010) with branch entries]
Meter_ID Start End
MX001 00:00 23:30
MX002 00:00 23:30
MX1209980 00:00 23:30
Index Page
Meter_ID TStamp Usage Pointer
MX001 2010-06-01 00:00 1.6
MX001 2010-06-01 00:30 1.8
MX001 2010-06-01 01:00 4.7
MX001 2010-06-01 01:30 2.5
MX001 2010-06-01 02:00 2.1
Data Page
Meter_ID TStamp Usage
MX001 2010-06-01 00:00 1.6
MX001 2010-06-01 00:30 1.8
MX002 2010-06-01 12:30 3.6
MX003 2010-06-01 06:00 8.2
MX001 2010-06-01 01:00 4.7
26. An Informix Table Containing a Timeseries Column
The timeseries in the table is a physical reference to a mini-btree in a container
Meter_ID Timeseries(reading)
MX001 [container_A, 1]
MX002 [container_B, 2]
MX003 [container_A, 3]
MX004 [container_C, 4]
MX234 [container_C, 5]
MX239 [container_B, 6]
MX675 [container_C, 7]
MX521 [container_C, 8]
[Diagram: each reference points into Container “A”, Container “B” or Container “C”]
27. Timeseries Container Layout
The btree index key is the time series id plus either:
• An integer for regular time series
• A timestamp for irregular time series
Each low-level page holds sorted data for exactly one time series
[Diagram: Index Twig Pages pointing to low-level pages labelled 4, 5, 7, 8, 12, 16]
28. Irregular Timeseries Storage Compared to Relational
Data values only stored once
No data pointers or pages
Multiple, smaller btrees
[Diagram: b-tree root (2010-01-01) with branch entries]
TS_ID Start End
1 00:00 23:30
2 00:00 23:30
1000 00:00 23:30
Timeseries Page (irregular)
Meter_ID TStamp Usage Pointer
MX001 2010-06-01 01:03 1.6
MX001 2010-06-01 01:45 1.8
MX001 2010-06-01 02:06 1.9
MX001 2010-06-01 02:08 2.1
MX001 2010-06-01 02:25 1.8
Data Page
Meter_ID TStamp Usage
MX001 2010-06-01 01:03 1.6
MX001 2010-06-01 01:45 1.8
MX002 2010-06-01 02:06 3.6
MX003 2010-06-01 02:08 8.2
MX001 2010-06-01 02:25 1.9
29. Regular Timeseries Storage Compared to Relational
Data is only stored once
No timestamps or data pages
Multiple, smaller btrees
[Diagram: b-tree root (2010-01-01) with branch entries]
TS_ID Start End
1 00:00 23:30
2 00:00 23:30
1000 00:00 23:30
Timeseries Page (regular)
Meter_ID TStamp Usage Pointer
MX001 2010-06-01 00:00 1.6
MX001 2010-06-01 00:30 1.8
MX001 2010-06-01 01:00 1.9
MX001 2010-06-01 01:30 2.1
MX001 2010-06-01 02:00 1.8
Data Page
Meter_ID TStamp Usage
MX001 2010-06-01 00:00 1.6
MX001 2010-06-01 00:30 1.8
MX002 2010-06-01 12:30 3.6
MX003 2010-06-01 06:00 8.2
MX001 2010-06-01 01:00 1.9
30. Informix Timeseries Space Saving
There is a small overhead for the b-tree pages
– Meter_ID and Timestamp stored
– Also pointer to Timeseries page
Irregular Timeseries must store Timestamp for each element
– 8 Bytes Extra overhead per element
Regular Timeseries uses known offsets
– No Timestamp stored
– Even more efficient
NULL data is compressed
– NULL elements (missing regular elements) take zero space
– Sparse arrays are not stored at all if no elements in time range
– Unlike relational storage NULL values take NO SPACE
– A row type of (DECIMAL(12), INTEGER, INTEGER) is 7 + 4 + 4 = 15 bytes
– Storing (NULL, 1, NULL) would only require 4 bytes
31. Worked Example – Relational Method
Number of meters: 3,000,000
Interval: 15 minutes (96 readings per day)
Meter ID length: 8 bytes
Timestamp length: 12 bytes
Data length: 8 + 6 bytes + 2 bytes slot overhead
Data space: 3000000 * 96 * ( 8 + 12 + 8 + 6 + 2 ) = 10GB
Index space: 3000000 * 96 * ( 8 + 12 + 8 + 2 )
+ 10% b+tree overhead = 9GB
Total storage: = 19GB
19GB per day
32. Worked Example – Informix Timeseries
Number of meters: 3,000,000
Interval: 15 minutes (96 readings per day)
Meter ID length: 64 bytes
Timestamp length: 12 bytes
Timeseries metadata: 86 bytes
Data length: 8 + 6 bytes + 2 bytes slot overhead
Fixed data space: 3000000 * ( 64 + 86 ) = 429MB
Timeseries overhead: 3000000 * ( 12 + 4 + 2 ) + 10% = 66MB
Variable data space: 3000000 * 96 * ( 8 + 6 + 2 ) = 4.4GB
That is a huge saving of 76%
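The two worked examples can be re-run as plain arithmetic. The Python sketch below evaluates the formulas exactly as written on the slides, in decimal bytes; the variable names are chosen for this illustration. The totals land near the slides' rounded figures, and the exact saving percentage depends on rounding and unit conventions.

```python
METERS = 3_000_000
READINGS_PER_DAY = 96                      # 15-minute intervals
intervals = METERS * READINGS_PER_DAY

# Relational layout: every row repeats the meter ID and the timestamp.
rel_data = intervals * (8 + 12 + 8 + 6 + 2)
rel_index = intervals * (8 + 12 + 8 + 2) * 1.10    # + 10% b+tree overhead
relational = rel_data + rel_index

# TimeSeries layout: meter ID and metadata are stored once per meter.
ts_fixed = METERS * (64 + 86)
ts_overhead = METERS * (12 + 4 + 2) * 1.10
ts_variable = intervals * (8 + 6 + 2)
timeseries = ts_fixed + ts_overhead + ts_variable

saving = 1 - timeseries / relational
print(f"relational: {relational / 1e9:.1f} GB per day")
print(f"timeseries: {timeseries / 1e9:.1f} GB per day")
print(f"saving:     {saving:.0%}")
```

The saving comes almost entirely from the per-interval term: 16 bytes per reading instead of 36 bytes of data plus 30 bytes of index.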
33. Timeseries Simplicity – Example
• Much simpler SQL – Apply a tariff
Relational:
SELECT meter_id, sum (value * 1.76)
FROM meters where (tstamp BETWEEN '2010-06-02 00:00' AND '2010-06-02 06:59')
OR (tstamp between '2010-06-02 21:00' AND '2010-06-02 23:59')
GROUP BY 1;
Timeseries:
SELECT meter_id, apply_tariff (readings, tariff,
'2010-06-02 00:00', '2010-06-02 23:59')::Timeseries(applied_cost)
FROM meters;
But what if there is a missing value in the interval data?
What if you want to reference data outside the query range?
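The missing-value question can be illustrated with a short Python sketch that applies the same 1.76 rate over the same two time windows as the relational query, but reports missing intervals instead of silently skipping them. The function names and the sample data are hypothetical; this is not how `apply_tariff` is implemented.

```python
from datetime import datetime, time, timedelta

OFF_PEAK_RATE = 1.76    # rate taken from the relational query above
OFF_PEAK = [(time(0, 0), time(6, 59)), (time(21, 0), time(23, 59))]


def in_off_peak(stamp):
    """True when the timestamp falls inside one of the tariff windows."""
    return any(lo <= stamp.time() <= hi for lo, hi in OFF_PEAK)


def off_peak_cost(readings):
    """Sum value * rate over off-peak readings; collect missing intervals."""
    cost, missing = 0.0, []
    for stamp, value in readings:
        if not in_off_peak(stamp):
            continue
        if value is None:
            missing.append(stamp)      # the caller decides how to estimate
        else:
            cost += value * OFF_PEAK_RATE
    return cost, missing


day = datetime(2010, 6, 2)
readings = [(day + timedelta(minutes=30 * i), 1.0) for i in range(48)]
readings[2] = (readings[2][0], None)   # simulate a missing 01:00 reading
cost, missing = off_peak_cost(readings)
```

A plain SUM in SQL would simply ignore the missing row, so the bill would be understated without any warning.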
34. Building Applications with the TimeSeries Datablade
Standard client access to server
– ESQL/C
– ODBC, JDBC, .NET
– Perl DBD::Informix, PHP, Ruby
Several Timeseries specific interfaces are available:
– SQL
– VTI
– SPL
– Java (client & server)
– C-API (client & server)
It’s a toolkit approach!
– Allow people to build their analytics in the server
35. Informix Timeseries SQL Interface
Timeseries data is usually accessed through user defined routines (UDRs)
from SQL
Over 80 predefined functions come with Informix Timeseries:
– Clip() - clip a range of a time series and return it
– LastElem(), FirstElem() - return the last (first) element in the time series
– Apply() – run a query across a time series
• Apply filters, project only subset of columns, apply functions to elements,
etc…
– AggregateBy() – Roll up or down values
• Change the frequency of a Timeseries from hourly to daily for instance
– SetContainerName() - move a Timeseries from one container to another
– BulkLoad() - load data into a Timeseries from a file
36. TimeSeries SQL Examples
Get all meter data for meter 3 for the last day
SELECT Clip(reading, CURRENT - 1 UNITS DAY, CURRENT)
FROM meters WHERE Meter_ID = '3';
Get the last meter record for meter 3
SELECT GetLast (reading)
FROM meters WHERE Meter_ID = '3';
Find the maximum usage by week for meter 3 over the last 30 days
SELECT AggregateBy ('max($usage)', 'weeklycal', reading,
CURRENT - 30 UNITS DAY, CURRENT)
FROM meters WHERE Meter_ID = '3';
37. Informix Timeseries VTI Interface
Makes time series data look like standard relational data
– Useful for programs that can’t read our proprietary Timeseries format
– There is a small penalty for using VTI
Restrictions
– No secondary indices are allowed
– No triggers allowed
SQL to create a VTI table:
– If you have a table called “meters” with a time series column the
following query will create an equivalent VTI table:
EXECUTE PROCEDURE tscreatevirtualtab('readings', 'meters');
38. VTI Interface: Continued
Meters – The Timeseries data
Meter_ID Origin 00:00 01:00 02:00 03:00 ...
MX001 2010-06-01 1.3 1.6 1.4 1.5
MX002 2010-06-01 0.4 0.3 0.2 0.5
MX003 2010-06-01 3.5 4.3
Readings – A virtual view of the Timeseries data
Meter_ID TStamp usage
MX001 2010-06-01 00:00 1.3
MX001 2010-06-01 01:00 1.6
MX001 2010-06-01 02:00 1.4
MX001 2010-06-01 03:00 1.5
...
MX002 2010-06-01 00:00 0.4
MX002 2010-06-01 01:00 0.3
The VTI view is equivalent to the tall thin relational table and can be easily accessed by any SQL client
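The mapping from one wide row per meter to tall, thin relational rows can be sketched in Python. The dictionary layout and the `virtual_rows` generator are assumptions made for this illustration; they only mimic what the virtual table presents, not how VTI is implemented.

```python
from datetime import datetime, timedelta

# Wide form: one row per meter, values in time order from the origin.
meters = {
    "MX001": (datetime(2010, 6, 1), [1.3, 1.6, 1.4, 1.5]),
    "MX002": (datetime(2010, 6, 1), [0.4, 0.3, 0.2, 0.5]),
}
INTERVAL = timedelta(hours=1)    # matches the hourly columns in the example


def virtual_rows(table):
    """Yield (meter_id, tstamp, usage) rows, skipping missing values."""
    for meter_id, (origin, values) in table.items():
        for offset, usage in enumerate(values):
            if usage is not None:
                yield meter_id, origin + offset * INTERVAL, usage


rows = list(virtual_rows(meters))
```

Each timestamp is computed from the origin and the position, which is why the wide form does not need to store it.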
39. Informix Timeseries 5.00 VTI Interface
TimeSeries 5.00 VTI Enhancements
– Update regular VTI using primary key only
– Use of TimeSeries expressions (read only)
SQL to create a VTI table with an expression:
EXECUTE PROCEDURE TSCreateExpressionVirtualTab(
'day_agg_readings',
'devices',
'AggregateBy("sum($kwh),avg($phase_a),avg($phase_b),avg($phase_c)",
"cal1day", readings, 0)',
'reading',
1024,
'readings');
40. Comparison of VTI vs Native Time Series Queries
Select a range of data for a meter:
– Native:
SELECT Clip (reading, '2010-01-01', '2010-01-10')
FROM Meters WHERE Meter_ID = '2';
– VTI:
SELECT * FROM readings WHERE tstamp
BETWEEN '2010-01-01' AND '2010-01-10' AND Meter_ID = '2';
Find the max usage for a given meter in a given period of time
– Native:
SELECT Apply ('max($usage)', '2010-01-01', '2010-01-10', reading)
FROM Meters WHERE Meter_ID = '2';
– VTI:
SELECT max(usage) FROM readings
WHERE tstamp BETWEEN '2010-01-01' AND '2010-01-10' AND Meter_ID = '2';
Note:
– Native will normally be faster than VTI, probably in 5 to 10% range
– It is often much faster to write custom user defined functions
– VTI functions are very convenient for standard SQL clients
41. TimeSeries C-API Interface
Client and server versions of the API
Treats a time series like a table (sort of)
– Functions to open and close a time series
– Functions to scan a time series between 2 timestamps
– Functions to create a time series
– Functions to retrieve, insert, delete, update
Plus another 70 functions defined
43. Timeseries Data Loading
Timeseries is a specialist type and benefits from a specialist data loading mechanism
Traditionally the Real Time Loader has been used for high speed Timeseries data
insert
– Developed for stock market trade data
– Good for irregular Timeseries
– Small symbol universe – 10s of thousands of stocks
– Data arriving in timestamp order
– Small number of active stocks
– Needs to cope with very high peak loads at exchange open & close
Smart Meter Data is a new challenge
– Timeseries is regular
– Many millions of meters
– Data batched by Meter Identifier
– All meters equally active
44. Smart Meter Data Loader
Uses similar internal mechanism as RTL to directly access containers
Builds internal map of Meter ID and Timeseries ID
Can use fragmentation of base table for better parallelism
Parallel sessions can work on separate disks to reduce contention
Load rates can be in excess of 50,000 intervals per second per core
45. Smart Meter Data Loader – Architecture
[Diagram: Meter Data Loaders look up each Meter_ID (random distribution) in a hash table to find its TS ID, then write to Containers on Physical Disks]
Hash table
Meter_ID TS ID
7898765 1
2168768 2
9879821 3
1656578 4
8787987 5
4678768 6
7354658 7
2537591 8
8973547 9
1352857 10
3451759 11
7656472 12
6543897 13
3324516 14
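The hash-table lookup at the centre of the loader can be sketched in Python. The map values come from the example above; `NUM_LOADERS`, the `route` function and the modulo rule for assigning work to sessions are assumptions made for this illustration, since the slides do not say how the loader distributes work.

```python
from collections import defaultdict

# Meter ID -> time series ID map, built once (values from the example above).
ts_id_of = {"7898765": 1, "2168768": 2, "9879821": 3, "1656578": 4}

NUM_LOADERS = 2    # assumed number of parallel loader sessions


def route(batch):
    """Group incoming (meter_id, value) records by loader session."""
    queues = defaultdict(list)
    for meter_id, value in batch:
        ts_id = ts_id_of[meter_id]             # hash lookup, no index scan
        queues[ts_id % NUM_LOADERS].append((ts_id, value))
    return dict(queues)


queues = route([("2168768", 0.4), ("7898765", 1.3), ("1656578", 0.9)])
```

Keeping each time series on a single session means parallel sessions never contend for the same series.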
47. Oncor PoC Details
Simulation
– 90 days worth of meter data for 1 million meters
• 15 minute intervals
• One value stored per interval
– 200 locations
– 500 feeders
– 34 substations
Hardware
– Power7 with 2 sockets each with 8 cores
– 64 bit SUSE Linux 11
– 128 GB of memory
• Memory actually needed, 44GB, although could probably be less
– 6 disks dedicated to the database, 2 additional for OS and LSE staging
• Disk space actually used by the database, about 350GB (110 days)
– Additional disks for the operating system and staging area for files
Software
– Informix Ultimate Edition 11.7
– Informix Timeseries
48. Informix Time Series Schema
The Meter table looks like this:
CREATE TABLE meters (
  esi_id char(64) not null primary key,
  suffix char(32),
  location char(16),
  feeder char(16),
  sub_station char(16),
  dbspace varchar(128),
  container varchar(128),
  actual Timeseries(meter_data),
  estimated Timeseries(meter_data),
  valid Timeseries(update_day)
);
A Meter reading looks like this:
CREATE ROW TYPE meter_data (
  tstamp datetime year to fraction(5),
  value decimal (14,3)
);
An update (correction) record looks like:
CREATE ROW TYPE update_day (
  tstamp datetime year to fraction(5),
  last_update datetime year to fraction(5)
);
Hierarchy is sub_station->feeder->meter.
There are also tables for location,
sub_station and feeder not shown above.
49. Primary Use Cases
Load 90 days worth of data for 1 million meters from LSE files
– Original set of LSE files massaged to generate 1 million distinct meters
Oracle 6 hours Timeseries 18 minutes
6-day ERCOT Settlement Extract
– Show support for the ERCOT settlement processes by creating LSE file consisting of every record
(every meter) for operating day - 6 (calendar day that occurred 6 days prior to current day). Must be
able to extract and create the LSE files for 1M meters for a specific day.
Oracle 5 hours
Timeseries <7 minutes
22-Day Update ERCOT Settlement Extract
– Show support for the ERCOT settlement processes by creating LSE files consisting of every record
that has had a consumption interval record update since the prior extract / pull (6-Day). Only extract
the last or most current update for each meter, so if a meter has been updated four times, only the
last / current record is sent. The entire 96 15 minute intervals are sent each time as well.
Oracle 8 hours
Timeseries 4 minutes (90 day 11 minutes)
Missing Record ERCOT Settlement Extract
– Show support for the ERCOT settlement processes by creating an LSE file consisting of only the
meter IDs and date that is provided in a missing meter ID file from ERCOT. The dates will be as far
back as 90 days and no sooner than 28 days back in time.
4000 random reads on one day - 6 seconds
4000 random reads many days - 24 seconds
50. Other Use Cases
Determine the count and the list of meter ID's for all meters with missing intervals
and / or register reads on a given day
Oracle 3-4 hours Timeseries <7 minutes
Determine the 90 day history for a given meter (90 day aggregation)
Oracle > 1 second Timeseries 0.04 seconds
Determine the count and list of meter IDs that exceeded a given high interval
value for a given day or given time period (multiple days). For example, count and
list of meters that had interval value of 12 or higher for a given period of time.
Timeseries <6 minutes
Determine list of meters that have 5 consecutive or more days with estimated
values only (no actual interval reads during a 5 day or more period)
Oracle 6 hours
Timeseries 17 minutes
52. Internal Benchmark - Requirements
35 Million meters
– 10 minute intervals with 5 values
– 5 billion intervals per day
12 Months data storage
– Over 1.8 trillion intervals
– Regular TimeSeries 30TB
– Predicted Relational 84TB
OLTP concurrent users
– All running while data is loading
Complex aggregations
– Required new TSRollUp function
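The volume figures above follow from simple arithmetic, sketched here in Python. The sketch assumes a 365-day year and decimal terabytes; the variable names are chosen for this illustration.

```python
METERS = 35_000_000
INTERVAL_MINUTES = 10

per_meter_per_day = 24 * 60 // INTERVAL_MINUTES    # intervals per meter per day
per_day = METERS * per_meter_per_day               # about 5 billion
per_year = per_day * 365                           # over 1.8 trillion

# Average storage per interval implied by the 30TB regular TimeSeries figure.
bytes_per_interval = 30e12 / per_year
print(per_day, per_year, round(bytes_per_interval, 1))
```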
53. Internal Benchmark - Hardware
IBM P780 with AIX 7.1
Storage: IBM DS8000 - 576 HDD 146GB/15krpm
Space used
–TimeSeries intervals: 30TB
• Split over 64 logical devices, 768 containers
–Relational Tables: 112GB
• 1 main data dbspace, 70 fragmentation dbspaces
–System use: 148GB
• Root, log dbspaces + 6 temp dbspaces
64 cores, primary CPU thread affinitied to 64 Virtual Processors
1TB main memory, up to 950GB assigned to database server
–80GB relational data buffers
–680GB TimeSeries data buffers
–45GB system memory
54. Internal Benchmark - Results
Data loading
– Single day load: 20 minutes (64 Cores used)
– Historical load of 12 months: <6 days
– Daily load during queries: 160 minutes (8 Cores used)
– Data cleansing after load: 2 minutes
Query performance
– 3,000 concurrent sessions
– Single meter queries sub-second response time
– Larger summary queries executed in <5 seconds
– No performance degradation during data load
56. AMT Sybex Benchmark
Most ambitious Smart Meter Benchmark to date
100 Million Meters
– 30 minute intervals
– 1, 2 or 3 daily registers
Target was to confirm a 24hr operational window
– Load data
– Validate data
– Calculate estimated corrections
– Billing run for 6% of the meters
[Diagram: Validation, Load, VEE and Database Query all run on a single IBM Power 750 server]
57. AMT Sybex Benchmark
Hardware
– IBM Power 750 32 cores (3.5GHz) running AIX 7.1
– 1 x Gb LAN Fibre adapter (dual port, using 1 port)
– 2 x 8Gb FC adapters (dual port, using 4 ports)
– 512GB memory
– 1 x IBM XIV Storage System with 15 x 2TB data modules
Software
– IBM Informix Dynamic Server 11.70.FC3
– IBM Informix TimeSeries 5.00.FC1
– AMT-SYBEX SmartDTS v 6.0
Database Server
– 101,000,000 x 4KB buffers
– 16 cpu vps
– 30 x 2GB logical logs
– 40GB physical log
– The time series were stored over 16 logical disks
58. AMT Sybex Benchmark – Processing time
Daily Processing Time
Showing predictability of processing
as database size increases
[Chart: daily processing time in minutes (axis 0 to 540) for Validation, Loading and VEE, plotted with Space Used in TB (axis 0.0 to 4.0)]
60. AMT Sybex Benchmark – Performance Results
Individual operations
Operation Time in hrs CPU
Validate 2:18 100%
Load 3:15 80%
VEE 2:10 100%
Total 7:43
Billing Query 4:21 5%
Overall total 12:04
61. AMT Sybex Benchmark – Performance Results
Combined operations
The Billing Query and the load can be run concurrently
Operation Time in hrs CPU
Validate 2:18 100%
Load + Billing 4:41 85%
VEE 2:10 100%
Overall Total 9:09
This result confirmed that a 9hr processing window was sufficient for the daily processing
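The totals in the results tables can be checked by adding the h:mm durations. The Python helpers below (`to_minutes`, `to_hmm`) are written for this illustration only.

```python
def to_minutes(text):
    """Convert an 'h:mm' string to minutes."""
    hours, mins = text.split(":")
    return int(hours) * 60 + int(mins)


def to_hmm(total):
    """Format a number of minutes as 'h:mm'."""
    return f"{total // 60}:{total % 60:02d}"


# Validate, Load, VEE, then the Billing Query run one after another.
sequential = to_hmm(sum(to_minutes(t) for t in ("2:18", "3:15", "2:10", "4:21")))

# Validate, Load + Billing run concurrently, then VEE.
combined = to_hmm(sum(to_minutes(t) for t in ("2:18", "4:41", "2:10")))
```

Running the billing query alongside the load saves close to three hours compared with running the two steps back to back.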
62. How Does This Benchmark Compare?
Comparison of Published Benchmarks for Meter Data Management
                      Meters  Daily Reads  Total Cores  Total RAM  DB cores     App cores  DB RAM        App RAM
Informix TimeSeries   100M    4.9B         16           500        16 (shared)             500 (shared)
The Competition *     10M     970M         456          3668       48           <180       384           1.5TB
Daily Readings (meters * registers * intervals)
– Informix TimeSeries: 4,900,000,000
– The Competition: 970,000,000
Database Resources (CPU cores)
– Informix TimeSeries: 16 total cores
– The Competition: 48 db cores, 180 app server cores
5 times the performance, < 1/5 the resources
… with significantly simpler management using a single node system
* Based on latest published Oracle benchmark
http://www.oracle.com/us/industries/utilities/ultilities-exadata-exalogic-wp-1499854.pdf