3. Agenda
A Bunch of DB2 Tips, Techniques, Thoughts, and Ideas
3 Confidential Material of NEON Enterprise Software, Inc.
4. Begin at the Beginning
You can’t memorize it all
Better to know “where”
to find the answer than
to try to “memorize”
arcane details.
Download the manuals!
The current link is:
http://www.ibm.com/support/docview.wss?rs=64&uid=swg27011656#db2zos
5. Performance Management
Too many DBAs rely on the YBWJ methodology
Performance management is not simple performance
monitoring
— Requires proactive planning and SLA
— If you do not have an SLA, how do you know when you
have tuned the application sufficiently?
Performance management is:
— Monitoring – find the problem
— Analysis – figure out how to correct the problem
— Then Tuning – correct the problem; optimize
6. Database Performance Tuning
• Application
−SQL
−Host Language Code
• Database
−Indexes
−Database Design (normalization)
−TS/IX organization
• System (DB2 Subsystem)
−ZPARMs, Pools, Locks/IRLM, etc.
• Environment
−Network connectivity and configuration
−TP Monitor, Operating System
7. Managing the EDM Pool
EDM Pools as of V8
— DB2 Database Services Address Space (DBM1)
— What is in the EDM Pools?
DBDs
SKCTs
CTs
SKPTs
PTs
Auth Cache
Dyn SQL Prep
Free pages
8. First Time Plan Execution
[Diagram: plan header (HDR), directory (DIR), and section (SEC) pages shown across three areas labeled CT, SKCT, and Disk]
Next Time: SKCT copied to CT
9. The EDM Pool and Versions 8 and 9
V8: EDM Pool split into three specific pools:
— EDMPOOL: EDM Pool below 2GB Bar stores only CTs,
PTs, SKCTs, SKPTs
Should be able to reduce the size of this pool
Provide some VSCR for below the 2GB Bar storage
— EDM Pool above the 2GB Bar
EDMDBDC: DBDs
EDMSTMTC: Cached Dynamic Statements
V9: EDM Pool now five separate pools
— Above the 2GB Bar: EDM_SKELETON_POOL
All SKCTs and SKPTs
— A portion of the EDMPOOL (for CTs and PTs) is moved
above the bar, too
10. General EDM Pool Tuning ROTs
If EDM Pool is too small:
Fewer threads can run concurrently
Increased response time due to loading of SKCT, SKPT, and DBD
Increased I/O to SCT02, SPT01 and DBD01
Re-preparation of cached dynamic SQL
Watch out for DBD size:
EDM Pool Size > 5x maximum DBD size
Over allocation is probably better than under allocation
If the pool is too large you are wasting memory, though
General Rule of Thumb:
Shoot for 80% or better DBD/CT/PT read efficiency
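The two rules of thumb above can be turned into a quick check. The following is a minimal Python sketch, not a DB2 facility: the function names and sample numbers are hypothetical, and the read-efficiency formula (requests satisfied without a load from the directory, divided by total requests) is an assumption about how the ratio is computed from monitor counters.

```python
def dbd_rule_ok(edm_pool_bytes: int, max_dbd_bytes: int) -> bool:
    """Slide rule of thumb: EDM pool size should exceed 5x the largest DBD."""
    return edm_pool_bytes > 5 * max_dbd_bytes


def read_efficiency(requests: int, loads_from_disk: int) -> float:
    """Assumed formula: share of requests that did not need a directory load."""
    if requests == 0:
        return 1.0
    return (requests - loads_from_disk) / requests


# Hypothetical values, for illustration only.
pool_bytes = 40 * 1024 * 1024
largest_dbd_bytes = 6 * 1024 * 1024
pool_passes_dbd_rule = dbd_rule_ok(pool_bytes, largest_dbd_bytes)
meets_80_percent_target = read_efficiency(1000, 150) >= 0.80
```

Compute the ratio separately for DBDs, CTs, and PTs, since the slide states the 80% target for each of them.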
11. Some Application/SQL Tips
Simpler may be easier to understand, but
complex SQL is usually more efficient
In general, let SQL do the work, not the program
— Sign of trouble: IF or CASE logic after a cursor
Retrieve the minimum # of rows required (WHERE)
Retrieve only cols required; never more (avoid SELECT *)
Always provide join predicates (no Cartesian products)
Get the data type and lengths correct!
— Host variable type/length = column type/length
— Still worthwhile after V8, although a mismatch no longer causes Stage 2 processing
Favor Stage 1 predicates
Favor Indexable predicates
12. Do Not Ask for What You Already Know
13. Application Tuning: Stage 1 and 2
[Diagram: an SQL request passes down through Relational Data Services, the Data Manager, and the Buffer Manager to I/O; the result flows back up]
STAGE 2 - Evaluated after data retrieval (non-sargable) via the RDS (Relational Data Services), which is more expensive than the Data Manager.
STAGE 1 - Evaluated at the time the data rows are retrieved (sargable). There is a performance advantage to using Stage 1 predicates because fewer rows are passed to Stage 2 via the Data Manager.
14. Wrangling Unruly Predicates
Try to change Stage 2 predicates to Stage 1 and
non-indexable predicates to indexable predicates.
For example:
SELECT COLA, COLB, COL6
FROM T1
WHERE COL1 NOT BETWEEN 'A' AND 'G';
Can be rewritten as:
SELECT COLA, COLB, COL6
FROM T1
WHERE COL1 >= 'H';
-or-
WHERE COL1 BETWEEN 'H' AND 'Z';
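The rewrite above returns the same rows only when the column's domain allows it. The sketch below checks that, assuming COL1 holds a single uppercase letter. It uses Python's sqlite3 module as a stand-in for DB2, so it shows result equivalence only, not Stage 1 or indexability behavior. The sample data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (COL1 CHAR(1) NOT NULL, COLA INTEGER)")

# Assumed domain: exactly one uppercase letter per row, 'A' through 'Z'.
rows = [(chr(code), i) for i, code in enumerate(range(ord("A"), ord("Z") + 1))]
conn.executemany("INSERT INTO T1 (COL1, COLA) VALUES (?, ?)", rows)

original = conn.execute(
    "SELECT COL1 FROM T1 WHERE COL1 NOT BETWEEN 'A' AND 'G' ORDER BY COL1"
).fetchall()
rewritten = conn.execute(
    "SELECT COL1 FROM T1 WHERE COL1 >= 'H' ORDER BY COL1"
).fetchall()

same_results = original == rewritten
```

If COL1 could hold longer strings (for example 'GA') or values that sort before 'A', the two predicates would no longer match, so verify the domain before rewriting.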
15. More Basic SQL Guidelines
Avoid sorting when possible:
indexes for ORDER BY and GROUP BY
judicious use of DISTINCT
UNION ALL versus UNION (if possible)
And…
Avoid
Black Boxes
16. Limited FETCH
Can be used on cursors and singleton SELECT
Existence checking pre-V7:
SELECT 1 FROM SYSIBM.SYSDUMMY1
WHERE EXISTS
(SELECT 1
FROM TABLE
WHERE COL = :HV);
Existence checking V7+:
SELECT 1 FROM TABLE
WHERE COL = :HV
FETCH FIRST 1 ROW ONLY;
Not the same as: OPTIMIZE FOR n ROWS
17. Application Tuning: Commit
Avoid Bachelor Programming Syndrome
Plan and implement a COMMIT strategy
for every batch program
or experience TIMEOUTs and DEADLOCKs
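A COMMIT strategy usually means committing at a fixed interval of work rather than once at the end of the job. The sketch below shows that shape in Python with sqlite3 standing in for DB2. The table name TXN, the helper load_in_batches, and the interval of 100 rows are hypothetical choices for illustration.

```python
import sqlite3

COMMIT_EVERY = 100  # hypothetical commit interval; tune per program


def load_in_batches(conn, values, commit_every=COMMIT_EVERY):
    """Insert rows, committing after every `commit_every` rows.

    Returns the number of commits issued.
    """
    commits = 0
    pending = 0
    for value in values:
        conn.execute("INSERT INTO TXN (AMOUNT) VALUES (?)", (value,))
        pending += 1
        if pending == commit_every:
            conn.commit()  # releases locks held by this unit of work
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # commit the final partial batch
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TXN (AMOUNT INTEGER)")
commit_count = load_in_batches(conn, range(250))
row_count = conn.execute("SELECT COUNT(*) FROM TXN").fetchone()[0]
```

A real batch program would also need restart logic so that it can resume from the last committed point after a failure.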
18. General Binding Tips
Always BIND production plans with EXPLAIN YES
PLAN_TABLE data allows you to evaluate the access paths
that changed due to the BIND
Tools (like Optimization Service Center and NEON’s Bind
ImpactExpert) can simplify this process
Consider using OPT_HINT to use previous access paths
When access paths get worse after a BIND, you can use
OPT_HINT to use the more efficient previous access path
instead
New Plan Stability feature can help you react to
problems…
19. PLANMGMT BIND Options
PLANMGMT(OFF) - No change to existing behavior. A package continues to have one active copy.
PLANMGMT(BASIC) - A package has one active copy. One additional prior copy (PREVIOUS) is preserved.
[Diagram callout: previous and active copies of package.]
PLANMGMT(EXTENDED) - A package has one active copy, and two additional prior copies (PREVIOUS and ORIGINAL) are preserved.
[Diagram callout: original, previous and active copies of package.]
20. The Ol’ Switcheroo
SWITCH (PREVIOUS) - changes the current and previous
packages:
The existing current package takes the place of the previous package.
The existing previous package takes the place of the current package.
SWITCH (ORIGINAL) - clones the original copy to take the place of the current copy (only if you bound using PLANMGMT EXTENDED; refer to previous slide):
The existing current copy replaces the previous copy.
The existing previous copy is discarded.
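The bookkeeping behind the two SWITCH options is easier to follow as a small state model. The Python class below is a hypothetical illustration of the rules on this slide only. It is not DB2 code, and the copy names are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PackageCopies:
    """Hypothetical model of the copies kept under PLANMGMT(EXTENDED)."""

    current: str
    previous: Optional[str] = None
    original: Optional[str] = None

    def switch_previous(self) -> None:
        # The current and previous copies trade places.
        if self.previous is None:
            raise ValueError("no PREVIOUS copy to switch to")
        self.current, self.previous = self.previous, self.current

    def switch_original(self) -> None:
        # The original is cloned into current; the old current becomes
        # previous, and the old previous is discarded.
        if self.original is None:
            raise ValueError("no ORIGINAL copy (requires PLANMGMT(EXTENDED))")
        self.previous = self.current
        self.current = self.original


pkg = PackageCopies(current="paths_v3", previous="paths_v2", original="paths_v1")
pkg.switch_previous()
after_previous = (pkg.current, pkg.previous, pkg.original)
pkg.switch_original()
after_original = (pkg.current, pkg.previous, pkg.original)
```

Note that the original copy survives SWITCH (ORIGINAL) because it is cloned, not moved.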
21. Do You Have a REBIND Strategy?
BIND and REBIND are critical for application performance
It is a wise course of action to plan your REBIND strategy
There are several common approaches:
Daily maintenance: REBIND after RUNSTATS
— Perhaps not every day, but REBINDs are done after RUNSTATS
Global REBIND after migration to new DB2 version
Global REBIND after installing new PTFs
— Above two mean access paths only change
when DB2 changes
REBIND after x days / weeks / months …
Let it Ride! (“If it ain’t broke, don’t fix it.”)
22. Let It Ride?
Programs, once bound, are (almost) never rebound.
Reason:
— Fear of access path degradation
Result:
— No improvement to access paths
— No CPU savings from new DB2 efficiencies
— Sub-optimal performance
— Every DB2 program potentially suffers for fear
that one or two SQL statements will become
inefficient
23. Regular REBINDs or The Three R’s
A Better Approach: Regular, systematic REBINDing
— Sometimes referred to as the Three R’s
REORG, RUNSTATS, REBIND
— Reason:
Access paths will be up-to-date based on the
current state of the data.
Result:
— Generally, improved access paths
— CPU savings from new DB2 efficiencies
— Optimal performance
Of course, you can still get those “problem”
access paths.
24. Problems With the Three R’s
They pose a lot of questions…
— When should you REORGanize?
To properly determine requires RUNSTATS (or RTS).
So should it be RUNSTATS, REORG, RUNSTATS, REBIND?
— When should you run RUNSTATS?
To properly determine you need to know the make-up,
usage, and volatility of your data.
— When should you REBIND?
When statistics have changed significantly enough to change
access paths.
25. New & Improved: The Five R’s
RTS (or RUNSTATS if you must)
REORG
RUNSTATS
REBIND
Review
26. Take Time to Understand NULLs
Every nullable column requires a one byte null indicator
Nulls do NOT save space – ever!
Nulls are not variable columns but…
Can be used with variable columns.
Do NOT bury your head in the
sand & ignore nulls
You can code a query against a database w/o any nulls and receive null as the answer:
SELECT SUM(SALARY)
FROM EMP
WHERE DEPTNO > 999;
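The SUM example can be reproduced with any SQL engine that follows the standard aggregate rules. The sketch below uses Python's sqlite3 module as a stand-in for DB2. The two EMP rows are hypothetical, and both columns are declared NOT NULL to show that the null comes from the empty result set, not from the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (DEPTNO INTEGER NOT NULL, SALARY INTEGER NOT NULL)")
conn.executemany("INSERT INTO EMP VALUES (?, ?)", [(10, 50000), (20, 60000)])

# No row has DEPTNO > 999, so SUM has nothing to add up and yields NULL.
total = conn.execute(
    "SELECT SUM(SALARY) FROM EMP WHERE DEPTNO > 999"
).fetchone()[0]

# COALESCE substitutes a default when the aggregate is NULL.
safe_total = conn.execute(
    "SELECT COALESCE(SUM(SALARY), 0) FROM EMP WHERE DEPTNO > 999"
).fetchone()[0]
```

In a host-language program the equivalent safeguard is a null indicator variable on the host variable that receives the SUM.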
27. RI: System or User-Managed?
System-managed (declarative) RI:
• Standard declarative implementation.
• Less coding required.
• Easier to modify later. (DDL and CHECK)
• More efficient.
• Ad hoc and planned updates.
User-managed (program-coded) RI:
• Requires program code to be written.
• Hard to modify later.
• Sometimes there is the possibility for better insert performance.
• Works only for planned updates.
DB2 V8: Informational Referential Constraints
28. Favor the Use of DB2 Declarative Integrity
29. Data Type and Length
It might seem like a simplistic tip, but choosing the correct
data type and length for your data will greatly improve
data integrity.
DB2 Data Types
• CHAR / VARCHAR
• CLOB / DBCLOB
• GRAPHIC / VARGRAPHIC
• BLOB
• DATE / TIME / TIMESTAMP
• BINARY / VARBINARY
• XML
• INTEGER / SMALLINT / BIGINT
• DECIMAL / NUMERIC
• FLOAT / REAL / DOUBLE
• DECFLOAT
30. Numeric vs. Character
Need: numeric data with leading zeroes
Character:
• If input properly, leading zeroes always show.
• Requires rigorous edit checking for data entry.
• Not the “best” choice for the value domain.
Numeric (INT or DEC):
• Automatic edit checking for numeric data.
• Potential for more efficient access because filter factors are more accurate.
• Best choice for domain.
31. Handling Large Character Data
VARCHAR(x) – variable length string, of max size “x”
Actual data lengths should vary widely before VARCHAR should be considered.
LONG VARCHAR prevents column additions.
Consider CLOB for very large character columns.
Breaking up a VARCHAR into two tables can help performance under the proper conditions.
Consider using compression instead of VARCHAR!
32. Consider DB2 Compression
Consider compression instead of VARCHAR
— Compression = less overhead (no 2 byte prefix)
— Compression requires no programmatic handling
But it does add a compression dictionary to the table so a
compressed small table may be larger than a non-compressed
small table.
Also, be sure to weigh increased CPU to compress/de-compress
against the decreased I/O due to smaller row sizes
Same basic impact, to minimize row length…
— Compression will compress the entire row
— VARCHAR only shrinks the size of the column(s)
A General Compression “Rule of Thumb”
— Do NOT compress unless table is over 10 megabytes and
compression saves 20% or more
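The rule of thumb above is a two-part test, which can be written down directly. This is a minimal Python sketch of the slide's rule only. The function name is hypothetical, "10 megabytes" is taken as 10 x 1024 x 1024 bytes, and the savings percentage is assumed to come from whatever estimate you already have for the table.

```python
TEN_MB = 10 * 1024 * 1024  # assumed interpretation of "10 megabytes"


def worth_compressing(table_bytes: int, estimated_savings_pct: float) -> bool:
    """Slide rule of thumb: table over 10 MB and savings of 20% or more."""
    return table_bytes > TEN_MB and estimated_savings_pct >= 20.0


# Hypothetical tables, for illustration only.
big_table_good_savings = worth_compressing(50 * 1024 * 1024, 35.0)
small_table = worth_compressing(2 * 1024 * 1024, 60.0)
big_table_poor_savings = worth_compressing(50 * 1024 * 1024, 10.0)
```

The small-table case fails regardless of the savings estimate, which matches the earlier point that the compression dictionary can make a small table larger.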
33. DATE/TIME vs. TIMESTAMP
Need: date and time for each row of a table
DATE / TIME:
• Requires 2 columns.
• Saves storage: only 7 total bytes required.
• Less precise: seconds.
• DB2 provides more formatting options for DATE and TIME.
TIMESTAMP:
• Everything in 1 column.
• Requires 10 bytes of storage.
• More precise: microseconds.
• DATE arithmetic easier using 1 column.
34. DATE/TIME Arithmetic
DB2 can add and subtract DATE, TIME, and
TIMESTAMP values and columns
and DATE, TIME, and TIMESTAMP durations
Guidelines:
Let DB2 do the hard work for you.
Use the proper DB2 data types.
Understand durations (see next slide).
Know your DB2 functions.
35. Understanding Durations
Labeled Durations – YEAR(S), MONTH(S), DAY(S), HOUR(S),
MINUTE(S), SECOND(S), MICROSECOND(S)
Example(s): 10 DAYS 2 YEARS 33 MINUTES 1 SECOND
Date Durations –yyyymmdd DECIMAL(8,0)
EXAMPLE: 00201104 (20 years, 11 months, and 4 days)
Time Durations – hhmmss DECIMAL(6,0)
EXAMPLE: 081144 (8 hours, 11 minutes, and 44 seconds)
TIMESTAMP Durations – yyyyxxddhhmmss.zzzzzz (xx = months)
DECIMAL(20,6)
EXAMPLE: 00201104081144.004351
(20 years, 11 months, 4 days, 8 hours, 11 minutes, 44 seconds, & 4351
microseconds)
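Date and time durations are plain decimal numbers whose digit positions carry the meaning. The Python sketch below decodes the two examples from this slide. The function names are hypothetical and the code assumes non-negative durations.

```python
def decode_date_duration(duration: int):
    """Split a yyyymmdd date duration into (years, months, days)."""
    years, rest = divmod(duration, 10000)
    months, days = divmod(rest, 100)
    return years, months, days


def decode_time_duration(duration: int):
    """Split an hhmmss time duration into (hours, minutes, seconds)."""
    hours, rest = divmod(duration, 10000)
    minutes, seconds = divmod(rest, 100)
    return hours, minutes, seconds


# The slide's examples, written without leading zeroes because Python
# integer literals do not allow them.
date_parts = decode_date_duration(201104)   # 00201104
time_parts = decode_time_duration(81144)    # 081144
```

This is why a date duration cannot be read as a count of days: 00000300 means three months, not three hundred days.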
36. DB2 9: New Built-in Functions - Timestamp
TIMESTAMPADD - adds an interval to a timestamp.
TIMESTAMPDIFF - subtracts two timestamps and returns an
interval.
TIMESTAMP_FORMAT – returns a timestamp from a character string that is
interpreted using a format template. Valid formats that can be specified are:
‘YYYY-MM-DD’
‘YYYY-MM-DD-HH24-MI-SS’
‘YYYY-MM-DD-HH24-MI-SS-NNNNNN’
37. Know Your DATE/TIME Functions!
Q: How can I get DB2 to express a duration resulting
from DATE subtraction as a total number-of-days?
For example:
SELECT DATE('03/01/2004') - DATE('12/01/2003')
A: Use the DAYS function to return the exact number
of days between those two dates, as follows:
SELECT DAYS('03/01/2004') - DAYS('12/01/2003')
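The difference between the two queries can be checked outside DB2. In the Python sketch below, date.toordinal() plays the role of the DAYS function, on the assumption that both count days from January 1 of year 0001. The helper name days is hypothetical.

```python
from datetime import date


def days(d: date) -> int:
    """Stand-in for DAYS: a day count that can be subtracted directly."""
    return d.toordinal()


# December has 31 days, January 31, and February 2004 has 29 (leap year).
elapsed = days(date(2004, 3, 1)) - days(date(2003, 12, 1))
```

Plain DATE subtraction would instead return a date duration, which for these two dates should be 00000300 (three months).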
38. Sort by Day of Week Order?
First thought is to try this:
SELECT DAY_NAME, COL1, COL2 . . .
FROM TXN_TABLE
ORDER BY DAY_NAME;
But…the results from this query would be ordered
alphabetically; in other words:
FRI
MON
SAT
SUN
THU
TUE
WED
39. Sort by Day of Week Order
Instead, try this:
SELECT DAY_NAME, COL1, COL2 . . .
FROM TXN_TABLE
ORDER BY
LOCATE(DAY_NAME,'SUNMONTUEWEDTHUFRISAT');
LOCATE finds the position of the DAY_NAME value within the
specified string, and returns the integer value of that position.
So, if DAY_NAME is WED, the LOCATE function returns 10. Sunday
would return 1, Monday 4, Tuesday 7, Wednesday 10, Thursday
13, Friday 16, and Saturday 19. This means that our results would
be in the order we require.
(Note: Some other database systems have a function similar to LOCATE called INSTR.)
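The ordering trick can be tried with the INSTR function mentioned in the note. The sketch below runs it in Python's sqlite3 module as a stand-in for DB2. SQLite's INSTR takes the string to search first and the value to find second, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TXN_TABLE (DAY_NAME CHAR(3), COL1 INTEGER)")
conn.executemany(
    "INSERT INTO TXN_TABLE VALUES (?, ?)",
    [("WED", 1), ("FRI", 2), ("SUN", 3), ("TUE", 4),
     ("SAT", 5), ("MON", 6), ("THU", 7)],
)

# Order by the position of each day abbreviation inside the lookup string.
ordered = [
    row[0]
    for row in conn.execute(
        "SELECT DAY_NAME FROM TXN_TABLE "
        "ORDER BY INSTR('SUNMONTUEWEDTHUFRISAT', DAY_NAME)"
    )
]

wed_position = conn.execute(
    "SELECT INSTR('SUNMONTUEWEDTHUFRISAT', 'WED')"
).fetchone()[0]
```

The technique relies on every abbreviation having the same length and on none of them appearing across a boundary between two neighbors in the lookup string.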
40. Trigger Tips and Thoughts
SQL termination character
DSNTEP2: SET TERMINATOR
Command Center: 'Tools' 'Tools Settings' 'Use
statement termination character'
SPUFI Defaults: Option #1 SQL Termination
Consider trigger “testing” columns
TIMESTAMP
Last Trigger Name
REBIND TRIGGER PACKAGE
This is the only way to EXPLAIN the SQL in triggers
41. Consider INSTEAD OF Triggers on Views
INSTEAD OF triggers can only be defined on VIEWs.
INSTEAD OF triggers enable views that would not otherwise
be updatable to support updates.
Typically, a view that consists of multiple base tables
cannot be updated.
With an INSTEAD OF trigger you can code logic to direct
inserts, updates and deletes to the appropriate underlying
tables that comprise the view.
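To make the idea concrete, here is a minimal sketch of an INSTEAD OF INSERT trigger on a two-table join view. It runs in Python's sqlite3 module, which also supports INSTEAD OF triggers on views. The trigger syntax shown is SQLite's, not DB2's, and the CUST, ACCT, and CUST_ACCT names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CUST (CUST_ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE ACCT (CUST_ID INTEGER, BALANCE INTEGER);

    -- A join view: not insertable on its own.
    CREATE VIEW CUST_ACCT AS
        SELECT C.CUST_ID, C.NAME, A.BALANCE
        FROM CUST C JOIN ACCT A ON A.CUST_ID = C.CUST_ID;

    -- Route an insert against the view to both base tables.
    CREATE TRIGGER CUST_ACCT_INS INSTEAD OF INSERT ON CUST_ACCT
    BEGIN
        INSERT INTO CUST (CUST_ID, NAME) VALUES (NEW.CUST_ID, NEW.NAME);
        INSERT INTO ACCT (CUST_ID, BALANCE) VALUES (NEW.CUST_ID, NEW.BALANCE);
    END;
    """
)

conn.execute(
    "INSERT INTO CUST_ACCT (CUST_ID, NAME, BALANCE) VALUES (1, 'SUE', 250)"
)
view_rows = conn.execute(
    "SELECT CUST_ID, NAME, BALANCE FROM CUST_ACCT"
).fetchall()
```

Similar triggers would be needed for UPDATE and DELETE if the view must support those operations as well.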
42. Some Database Design Tips
As normalized as possible, but performance before aesthetics
Normalization optimizes “update” at the expense of “retrieval”
Don’t let data modelers dictate “physical” design
One table per tablespace (usually)
Partitioned or segmented over simple TS
Avoid the defaults - they are usually wrong
Appropriate free space (PCTFREE & FREEPAGE)
Based on volatility – don’t just let everything default to 10.
Keep in mind the buffering impact of free space (page density).
43. Cluster on Appropriate Columns
44. Version 8 – Major Partitioning Changes
Partitioning and clustering are separated
Table space need not be clustered on partitioning key
Partitioning and indexing are separated
Partitioned versus non-partitioned
— Partitioned = the index is physically partitioned into separate
data sets;
— Non-partitioned = the index is in one data set
— Whether partitioned or not, the index may still be
“partitioning”
Partitioning versus secondary
— Partitioning = the index aligns with the keys by which the data
is partitioned
— Secondary = index keys do not align with partitioning
45. Column Ordering
Sequence columns based on logging:
Static (infrequently updated) non-variable columns first
Then static (infrequently updated) variable columns
Frequently updated columns last
Frequently modified together, place next to each other
[Diagram: example row with columns CUST ID, FIRST NAME, LAST NAME, ADDRESS, ACCT BAL. CUST ID: static, infrequently updated. FIRST NAME, LAST NAME, ADDRESS: updated at the same time (marriage), but infrequently updated. ACCT BAL: frequently updated.]
46. Version 9 – Reordered Row Format
DB2 9 introduces a new row format that helps to
optimize data on the page w.r.t. logging:
This is reordered row format, or RRF; the row format
we are all familiar with today is now referred to as
basic row format (BRF)
VARCHAR columns stored at end of the row
Does not change the DDL or DCLGEN, only how the
data is stored on the page
Once in DB2 9 NFM, a REORG or a LOAD REPLACE will
cause a change from BRF to RRF.
— You can have a partitioned table space with some
partitions in BRF and some in RRF.
47. Sequences or Identity Columns?
Identity Columns:
• Internal objects generated and maintained by DB2
• Associated with a single table
• Use IDENTITY_VAL_LOCAL() function to get last value assigned
• N/A – DB2 handles assigning next value
• Add/change using ALTER TABLE (V8+ only)
• Available as of V6 refresh
Sequence Objects:
• Stand-alone objects created by the DBA
• Not associated with any table
• Use PREVIOUS VALUE expression to get last value assigned
• NEXT VALUE expression gets next value to be assigned
• Administer using ALTER SEQUENCE, DROP, COMMENT, GRANT, REVOKE
• Available as of V8 NFM
48. Use Real-Time Statistics (RTS)
With RTS, DB2 collects (some) statistics & periodically
writes the stats to two tables (in DB2 Catalog as of V9):
SYSIBM.TABLESPACESTATS
SYSIBM.INDEXSPACESTATS
You can use RTS in place of traditional catalog statistics
for determining when to run utilities
Doing so, you can eliminate RUNSTATS other than for
gathering statistics for optimization.
49. An Hour of DB2 Tips and Techniques
Craig S. Mullins
NEON Enterprise Software
craig.mullins@neonesoft.com
www.neonesoft.com
www.craigsmullins.com
www.DB2portal.com
Editor's Notes
Service level management (SLM) is the disciplined, proactive methodology and procedures used to ensure that adequate levels of service are delivered to all IT users in accordance with business priorities and at acceptable cost. So, in order to effectively manage service levels, the business needs to prioritize applications and identify the amount of time, effort, and capital that can be expended delivering service for those applications. A service level is a measure of operational behavior. SLM ensures applications behave accordingly by applying resources to those applications based on their importance to the organization. Depending on the needs of the organization, SLM can focus on availability, performance, or both. In terms of availability, the service level may be defined as "99.95% up time, during the hours of 9:00 AM to 10:00 PM on weekdays." Of course, a service level can be more specific, stating "average response time for transactions will be two seconds or less for workloads of 500 or fewer users." For a service level agreement (SLA) to be successful, all of the parties involved must agree upon stated objectives for availability and performance. The end users must be satisfied with the performance of their applications and the DBAs and technicians must be content with their ability to manage the system to the objectives. Compromise is essential to reach a useful SLA. A robust SLM discipline makes performance management predictable. SLM manages the expectations of all involved. Without a pre-defined and agreed upon SLA, how will the DBA and the end users know whether or not an application is performing adequately? Not every application can, or needs to, deliver sub-second response time. Without SLAs, business users and DBAs may have different expectations, resulting in unsatisfied business executives and frustrated DBAs. Not a good situation.
With SLM in place, DBAs can adjust resources by applying them to the most mission critical applications as defined in the SLA. Costs will be controlled and capital will be expended on the portions of the business that are most important to the business.
Knowing where to turn and what to look for can be a daunting problem for DBAs, whether they are novice or experienced. When confronted with a DB2 performance problem and a likely cause is not immediately evident, it makes sense to use a systematic approach to problem resolution. A good approach is to move through the specific areas where problems can arise starting with the APPLICATION then the DATABASE then the SYSTEM followed by the ENVIRONMENT. This sequence makes sense because the majority of DB2 performance problems are caused by the application code, whether in the host language or the SQL. The second biggest cause of performance problems is the database design, and so on.
The EDM pool is located in the DB2 Database Address Space. Its sizing is important to DB2 performance because of the control blocks located in this pool. DBD pages are control block structures that represent databases and their dependent objects. DBD pages must be located in contiguous pages. This control block structure must be allocated or the associated database will not be opened by DB2. SKCT/SKPT pages represent cursor table and package table pages. If the plan was bound with packages, then SKPT pages are allocated. If the plan was bound with DBRMs then SKCT pages are allocated. There is only one SKCT page allocated for the plan regardless of the number of threads executing the plan. The CT/PT pages are allocated for each thread instance of the plan. If the pages are found in the EDM pool, then DB2 avoids I/Os to the directory to load this data. The EDM pool caches these pages from the directory. The greater the hit ratio, the greater the effectiveness of the EDM pool. This pool can be increased to improve the hit ratio, but increasing this pool must not result in an increase in MVS paging activity. If that happens, it is better for system performance to reduce the size of the EDM pool and avoid the paging activity, or review other DB2 pools for reduction to reduce MVS paging.
If the EDM Pool is too small:
Increased I/O to SCT02, SPT01 and DBD01
Increased response time due to loading of SKCT, SKPT, and DBD
Re-preparation of cached dynamic SQL
The first time a plan is executed the CT (Cursor Table) is built first and the SKCT is built by copying the CT pages as they are loaded. As long as the SKCT remains in the EDM Pool subsequent executions of the program will cause a CT to be built from the SKCT. The SKCT can be flushed from the EDM Pool if space is needed for other EDM activity (SKCT, SKPT, CT, PT, DBD, etc.)
CRAIG – need the DETAILS on those 5 pools for DB2 9!!! Check out the IDUG CDs…
EDM Pool Statistics to Monitor:
Pages In, Pages Used by CT, DBD, SKCT, PT, SKPT: do not worry about these too much
Load Failures: always monitor and correct
Load xx Sections from DASD: always monitor and correct
Be sure to train your application development staff in the proper usage of SQL. Application developers who are used to processing data a record-at-a-time will make very poor DB2 programmers without some training in the set-at-a-time nature of accessing relational databases. Guidelines should be developed and published in a readily accessible place (corporate intranet/portal) that outline the basic elements of style for DB2 SQL programming. A good start is to use the bullets on this slide:
Simpler is better, but complex SQL can be efficient
Let SQL do the work, not the program
Retrieve the absolute minimum # of rows required
Retrieve only those columns required - never more
Always provide join predicates
Favor Stage 1 predicates
Favor Indexable predicates
Avoid tablespace scans for large tables
Avoid sorting when possible
Favoring Stage 1 predicates over Stage 2 predicates causes DB2 to evaluate the expression at the time the data is retrieved. Therefore, SQL statements using Stage 1 predicates instead of Stage 2 should be better performers. Whenever a SQL query is using a Stage 2 predicate you should examine the code to see if you can formulate an equivalent SQL statement using Stage 1 predicates instead. Stage 1 and Indexable predicates are documented in the IBM DB2 Admin Guide, Chapter 31.
Do not implement data access interfaces that are called by application programs instead of coding SQL requests as needed in each program. Doing so is usually referred to as a “black box” approach because the data access interface acts as a black box - shielding the developers from having to know SQL. When a black box is used, the tendency is that short cuts are taken. The black box inevitably deviates from the SQL development guidelines just presented.
Every DB2 program should implement a COMMIT strategy. That strategy should only rarely be to avoid issuing COMMITs and let the end of the program cause the modified data to be committed. Only very small programs that run for very short durations should consider this approach. Even then it is wise to clearly document in the program why there are no COMMITs being issued (because the program could change later, rendering the original assumptions inaccurate and thereby causing a performance problem). What about DB2 programs that do not issue INSERT, UPDATE, or DELETE statements? Well, it makes sense to implement a COMMIT strategy in these programs, too. Otherwise, you might run into problems. For example, one shop that was moving to online REORGs ran into problems because of long-running DB2 read-only batch jobs. The online REORGs could not find a time where they could do the data set switch over because the long-running programs were holding locks. Another example is when a program needs to auto REBIND. The first time through, the program is rebound, a bunch of work is done (perhaps read only), then the program fails. Everything is rolled back, including the update to the catalog for the rebound program. So, the next time through, the program is auto rebound again. If there were COMMITs in the program this would NOT happen.
BIND and REBIND are important components in assuring optimal application performance. It is the bind process that determines exactly how your DB2 data is accessed in your application programs. As such, it is critically important that you develop an appropriate strategy for when and how to REBIND your programs. There are several common approaches taken by DB2 customers. By far, the best approach is the first, as we will learn. This approach involves some form of regular maintenance that keeps DB2 statistics up to date and formulates new access paths as data volumes and patterns change. More on this in a moment. Other approaches include binding only when a new version of DB2 is installed, or perhaps more ambitious, whenever new PTFs are applied to DB2. Another approach is to rebind automatically after a regular period of time, whether it is days, weeks, months, what have you. This approach can work if the period of time is wisely chosen based on the application data, but it still can pose significant administrative issues. The final approach (if it ain’t broke don’t fix it) is taken by some organizations… and it is the worst of the several approaches here. Let’s go on to the next slide to discuss the issues with this tactic.
The biggest problem with this approach is that you are penalizing EVERY program in your subsystem for fear that a program or two will have a few degraded access paths. This results in potentially many programs having sub-optimal performance because the optimizer never gets a chance to create better access paths as the data changes. Of course, the possibility of degraded performance is real. The problem is being able to find which statements may be worse. The ideal situation would be to review the access path changes beforehand to determine if they are better or worse. But DB2 does not provide any systematic method of administering access paths that way.
As I mentioned previously, the best approach is to perform regular REBINDs as your data changes. This involves what has become known as the three Rs. We’ll discuss this approach on the next slide in just a moment. At any rate, your goal should be to keep your access paths up-to-date with the current state of your data. Failing to do this means that DB2 is accessing data based upon false assumptions. DB2 is unlikely to make the same access path choice as your data grows – and as patterns within the data change. By REBINDing you can generally improve the overall performance of your applications because the access paths will be better designed based on an accurate view of the data. Additionally, as DB2 changes are made (via new releases or PTFs) optimizer improvements and new access techniques can be incorporated into the access paths. Of course, this means you will have to develop a methodology for reviewing your access paths and taking care of any “potential” problem access paths. But tackling this as a manual process can be difficult.
Of course, the Three R’s can pose more questions than they answer. For example, when should you REORGanize? In order to properly determine when a REORG is needed you’ll have to look at statistics. This means looking at either RUNSTATS or Real-Time Statistics. So, perhaps it should be at least 4 R’s: in other words, RUNSTATS, REORG, RUNSTATS, REBIND. Some folks don’t rely on statistics to schedule a REORG but just build the JCL to REORG their database objects when they create the object. So they create a table space then build the REORG job and schedule it to run monthly, or quarterly, or on some regular basis. This is better than no REORG at all, but it is probably not the best approach because you are probably either reorging too soon (in which case you waste the CPU cycles to do the REORG) or you are reorging too late (in which case performance is suffering for a period of time before the REORG runs). Better to base your REORGs off of statistics, either RUNSTATS or RTS. OK, then when should you run RUNSTATS? My answer is “As frequently as possible based on how often your data changes.” Of course, this means you need to know a thing or two about your data growth patterns. To properly determine a schedule for statistics you need to know things about your data: what is its make-up, how is it used, how fast does it grow, and how often does it change? These patterns will differ for every table space in your system. Next we need to decide when to REBIND. The best answer for this is when statistics have changed significantly enough to change access paths. But knowing when this is so is the hard part.
DB2 handles nulls using a one-byte null indicator. Therefore, any column that can be null will require one additional byte of storage. Additionally, nullable columns are not (automatically) variable length columns. So your storage problem is actually bigger than you think. For example, if NAME is defined as CHAR(20) and nullable, each row will require 21 bytes for the NAME column: one for the null indicator, and 20 for the contents of the NAME column (all 20 of which are required even if the NAME is set to null). Of course, you can choose the VARCHAR data type for this column instead of the CHAR data type. This will enable you to store only the number of bytes required for each column value, that is, 5 bytes for CRAIG, 6 bytes for RONALD, 3 bytes for SUE, etc. Of course, this is not 100% accurate because a variable length column requires a 2-byte length indicator. So every row will always require the 2-byte length indicator, but if the column is set to null you can set the length indicator to 0 and save space. Keep in mind, though, that you will have to programmatically set the length of each column as it is inserted or updated into the table.
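To make the storage arithmetic concrete, here is a minimal sketch of the byte counts described above. The function name column_bytes is hypothetical, and the sketch assumes single-byte characters and that a nullable VARCHAR carries the one-byte null indicator in addition to its two-byte length prefix. Treat it as an illustration of the rules in these notes, not as a sizing tool.

```python
def column_bytes(value, declared_len, varying, nullable):
    """Bytes stored for one column value under the rules described above.

    Assumes single-byte characters (an assumption of this sketch).
    """
    null_indicator = 1 if nullable else 0
    if not varying:
        # CHAR(n): all n bytes are stored, even when the value is null.
        return declared_len + null_indicator
    # VARCHAR(n): 2-byte length prefix plus only the actual data bytes.
    data_len = 0 if value is None else len(value)
    return 2 + data_len + null_indicator


# Compare a nullable CHAR(20) with a nullable VARCHAR(20) for a few names.
for name in ("CRAIG", "RONALD", "SUE", None):
    fixed = column_bytes(name, 20, varying=False, nullable=True)
    variable = column_bytes(name, 20, varying=True, nullable=True)
    print(name, fixed, variable)
```

The fixed-length column costs 21 bytes for every row, while the variable-length column costs the data length plus three bytes of overhead in this model.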
Keys consist of the attributes that identify entity occurrences and define relationships between entities. A key will consist of one or more attributes, the values of which uniquely identify an entity occurrence. Well, more precisely, candidate keys and primary keys identify the entity. A combination of the primary key value of one entity and the foreign key value of another entity identify relationships. A key should contain no embedded meaning. The key’s purpose is to identify and not to describe. The other attributes in the entity serve a descriptive purpose. When keys contain embedded meaning, problems can arise if the meaning changes. Furthermore, the values for any embedded meaning are likely to be outside your immediate control, which can also cause data integrity and modification problems.
Candidate Keys
Each entity can have multiple candidate keys, but it must have at least one. Each candidate key is an attribute, or set of attributes, that can be used to uniquely identify an occurrence of the entity. If the value of the attributes cannot be used to identify a specific occurrence of the entity, then they do not represent a candidate key.
Semantic data integrity refers to the meaning of data and relationships that need to be maintained between different types of data. The DBMS provides options, controls and procedures to define and assure the semantic integrity of the data stored within its databases. DBAs must understand how the DBMS enables automatic semantic data integrity checking. And, as an on-going component of the job, the DBA has to implement semantic data integrity into the database design, as well as initiate processes to check and correct integrity problems that creep into the database over time.
• DOMAIN: values in columns, no nulls in PK (table check constraints)
• ASSOCIATION: pre-defined business rules for attribute association (VALIDPROC, check constraint)
• REFERENTIAL: PK to FK (referential constraints)
Of course, data integrity can be built into application programs for enforcement, too. But such data integrity is not enforced when ad hoc changes are made to the database.
A four-byte code is required to identify an entity; all of the codes are numeric and will stay that way. But, for reporting purposes, users wish the codes to print out with leading zeroes. Should the column be defined as CHAR(4) or SMALLINT?
Edit checks: Without proper edit checks, inserts and updates could place invalid alphabetic characters into the product code. This can be a very valid concern if ad hoc data modifications are permitted. This is rare in production databases, but data problems can still occur if the proper edit checks are not coded into every program that can modify the data. If proper edit checks are coded and will never be bypassed, this removes the data integrity question.
Filter factors: Consider the possible number of values that a CHAR(4) column and a SMALLINT column can assume. Even if edit checks are coded for each, DB2 is not aware of these and assumes that all combinations of characters are permitted. DB2 uses base 37 math when it determines access paths for character columns, under the assumption that 26 alphabetic letters, 10 numeric digits, and a space will be used. This adds up to 37 possible characters. For a four-byte character column there are 37^4, or 1,874,161, possible values. A SMALLINT column can range from -32,768 to 32,767, producing 65,536 possible small integer values. The drawback here is that negative or 5 digit product codes could be entered. However, if we adhere to our proper edit check assumption, the data integrity problems will be avoided here, as well. DB2 will use the HIGH2KEY and LOW2KEY values to calculate filter factors. For character columns, the range between HIGH2KEY and LOW2KEY is larger than for numeric columns because there are more total values. The filter factor will be larger for the numeric data type than for the character data type, which may influence DB2 to choose a different access path. For this reason, favor the SMALLINT over the CHAR(4) definition.
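The value counts quoted above are easy to check. The following is only a sketch of the arithmetic, with variable names of my own choosing; it does not reproduce how DB2 actually computes filter factors. The last line shows one way a report program might display a numeric code with leading zeroes, using a made-up code value of 42.

```python
# Possible values DB2 would assume for each definition, per the notes above.
CHAR_ALPHABET = 26 + 10 + 1             # letters, digits, and the space
char4_values = CHAR_ALPHABET ** 4       # four-byte character column
smallint_values = 32767 - (-32768) + 1  # full SMALLINT range

print(char4_values)                     # 1874161
print(smallint_values)                  # 65536

# How many times larger the assumed CHAR(4) domain is than the SMALLINT one.
print(char4_values / smallint_values)

# Displaying a numeric code with leading zeroes (42 is a made-up example).
padded = f"{42:04d}"
print(padded)                           # 0042
```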
The leading zeroes problem can usually be solved using other methods. When using QMF, you can ensure that leading zeroes are shown by using the "J" edit code. Report programs can be coded to display leading zeroes easily enough by moving the host variables to appropriate display fields. Other reporting tools used for ad hoc access typically provide a parameter that enables leading zeroes to be displayed.
Variable length columns can save space but are worth considering only when the length of the actual data items varies considerably. Even so, with data compression now built into DB2, and operating very efficiently, compression is frequently a better choice for conserving disk space than variable length columns. This is so because DB2 compresses automatically, behind the scenes. With variable length columns a two-byte prefix is required for each column to indicate the actual length of the data in the variable length column. This prefix must be set and maintained by application code; this is not necessary with compression. In some cases, though, the overhead of compression may be problematic: DB2 must incur CPU in order to compress and de-compress the data as it is modified and read. On the other hand, the I/O benefits can outweigh the additional CPU because more data will be stored per page due to the smaller size of the rows.
Problem: I need to store both date and time information on a single row in DB2. Is it better to use a single TIMESTAMP column or two columns, one DATE and the other TIME?
Solution: The answer to this question depends on several factors specific to your situation. Consider the following points before making your decision:
• With DATE and TIME you must use two columns. TIMESTAMP uses one column, thereby simplifying data access and modification.
• The combination of DATE and TIME columns requires 7 bytes of storage, while a TIMESTAMP column always requires 10 bytes of storage. Using the combination of DATE and TIME columns will save space.
• TIMESTAMP provides greater time accuracy, down to the microsecond level. TIME provides accuracy only to the second level. If precision is important, use TIMESTAMP. Use TIME if you want to ensure that the actual time is NOT stored down to the microsecond level.
• Date and time arithmetic is easier to implement using TIMESTAMP data instead of a combination of DATE and TIME. Subtracting one TIMESTAMP from another results in a TIMESTAMP duration. To calculate a duration using DATE and TIME columns, two subtraction operations must occur: one for the DATE column and one for the TIME column.
• DB2 provides for the formatting of DATE and TIME columns via local DATE and TIME exits, the CHAR function, and the DATE and TIME precompiler options. These facilities are not available for TIMESTAMP columns. If the date and time information is to be extracted and displayed on a report or by an online application, the availability of these DB2-provided facilities for DATE and TIME columns should be considered when making your decision.
DB2 enables you to add and subtract DATE, TIME, and TIMESTAMP columns. In addition, you can add date and time durations to or subtract them from these columns. But if you do not understand the capabilities and features of date and time arithmetic, you will likely encounter some problems implementing it. Keep the following rules in mind:
• When you issue date arithmetic statements using durations, do not try to establish a common conversion factor between durations of different types. For example, the following two date arithmetic statements are not equivalent:
1997/04/03 - 1 MONTH
1997/04/03 - 30 DAYS
April has 30 days, so the normal response would be to subtract 30 days to subtract one month. The result of the first statement is 1997/03/03, but the result of the second statement is 1997/03/04. In general, use like durations (for example, use months or use days, but not both) when you issue date arithmetic.
• If one operand is a date, the other operand must be a date or a date duration.
• If one operand is a time, the other operand must be a time or a time duration. You cannot mix durations and data types with date and time arithmetic.
• If one operand is a timestamp, the other operand can be a time, a date, a time duration, or a date duration. The second operand cannot be a timestamp. You can mix date and time durations with timestamp data types.
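The month-versus-days difference can be reproduced outside DB2. This is a rough sketch in Python, not DB2 code: subtract_months is a hypothetical helper of mine that moves back whole calendar months and clamps the day to the end of the month when needed (my assumption about how such cases should behave), while timedelta subtracts an exact number of days.

```python
import calendar
from datetime import date, timedelta


def subtract_months(d, months):
    """Subtract whole calendar months, clamping the day to the month's end."""
    index = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


start = date(1997, 4, 3)
one_month_back = subtract_months(start, 1)
thirty_days_back = start - timedelta(days=30)

print(one_month_back)     # 1997-03-03
print(thirty_days_back)   # 1997-03-04
```

The two results differ by a day because the subtraction crosses March, which has 31 days.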
Now, what exactly is in that field returned as the result of a date or time calculation? Simply stated, it is a duration. There are three types of durations: date durations, time durations, and labeled durations. Date durations are expressed as a DECIMAL(8,0) number. To be properly interpreted, the number must have the format yyyymmdd, where yyyy represents the number of years, mm the number of months, and dd the number of days. The result of subtracting one DATE value from another is a date duration. Time durations are expressed as a DECIMAL(6,0) number. To be properly interpreted, the number must have the format hhmmss, where hh represents the number of hours, mm the number of minutes, and ss the number of seconds. The result of subtracting one TIME value from another is a time duration. Labeled durations represent a specific unit of time as expressed by a number followed by one of the seven duration keywords: YEARS, MONTHS, DAYS, HOURS, MINUTES, SECONDS, or MICROSECONDS. A labeled duration can only be used as an operand of an arithmetic operator, and the other operand must have a data type of DATE, TIME, or TIMESTAMP. For example:
CURRENT DATE + 3 YEARS + 6 MONTHS
This will add three and a half years to the current date.
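Because a duration is just a decimal number with a positional meaning, a program that receives one has to pick it apart. Here is a small sketch of that decoding, assuming the yyyymmdd and hhmmss layouts described above; the function names are hypothetical and the sample values are made up.

```python
def decode_date_duration(duration):
    """Split a yyyymmdd date duration into (years, months, days)."""
    sign = -1 if duration < 0 else 1
    years, rest = divmod(abs(duration), 10000)
    months, days = divmod(rest, 100)
    return sign * years, sign * months, sign * days


def decode_time_duration(duration):
    """Split an hhmmss time duration into (hours, minutes, seconds)."""
    sign = -1 if duration < 0 else 1
    hours, rest = divmod(abs(duration), 10000)
    minutes, seconds = divmod(rest, 100)
    return sign * hours, sign * minutes, sign * seconds


print(decode_date_duration(300))     # (0, 3, 0)   -> 3 months
print(decode_date_duration(10215))   # (1, 2, 15)  -> 1 year, 2 months, 15 days
print(decode_time_duration(13045))   # (1, 30, 45) -> 1 hour, 30 minutes, 45 seconds
```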
SELECT SUBSTR(NAME,1,8) AS TSNAME,
       VARCHAR_FORMAT(CREATEDTS,'YYYY-MM-DD-HH24:MI:SS') AS TSCR
FROM   SYSIBM.SYSTABLESPACE
WHERE  CREATEDTS >= TIMESTAMP_FORMAT('2007-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS');
The first SELECT returns a duration of 00000300 (3 months). But those months encompass a 29-day FEB plus a 31-day JAN and a 31-day DEC (total 91 days). What if you want a query that returns the number 91? The second SELECT uses the DAYS function to convert each date into a day count (one more than the number of days since January 1, 0001) and then subtracts the two numbers. This results in the exact number of days between the two dates.
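The same contrast can be sketched outside DB2. In the Python below, the two dates are hypothetical stand-ins chosen to span a 31-day December, a 31-day January, and a 29-day February; the actual dates used on the slide are not shown in these notes. Python's date.toordinal() counts days with January 1 of year 1 as day 1, which makes it a convenient analogue for the day-count approach.

```python
from datetime import date

# Hypothetical dates spanning DEC (31), JAN (31), and a leap-year FEB (29).
start = date(2003, 12, 1)
end = date(2004, 3, 1)

# Calendar-style difference, analogous to a date duration of 00000300.
months_apart = (end.year - start.year) * 12 + (end.month - start.month)

# Day-count difference, analogous to subtracting two day counts.
days_apart = end.toordinal() - start.toordinal()

print(months_apart)  # 3
print(days_apart)    # 91
```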
A frequent problem new trigger users encounter is a SQL0104N error message. Messages & Codes describes this error as an unexpected token, which is not very helpful. This is one of the more frustrating aspects of trigger implementation. The problem is usually not in the syntax of the CREATE TRIGGER statement. The problem is usually with the SQL terminator character which, as every good DB2 person knows, is the semi-colon character, or ";". A trigger can be comprised of multiple SQL statements, each of which needs to be terminated using the ";" character. But DB2 reads a ";" as "end of statement", so the CREATE TRIGGER statement is prematurely halted, and of course the SQL is not correct if you stop at the first semi-colon.
The trick is to specify a different SQL termination character. Using DSNTEP2 you can specify a SET TERMINATOR control statement in a comment (--#SET TERMINATOR followed by the new character) to use a different character to terminate the SQL. Alternately, you can create the trigger using the DB2 Command Center. There you can select the 'Tools Settings' option from the 'Tools' pull-down menu. Once there, check the box next to the line labeled 'Use statement termination character' and supply the character you wish to use in the little prompt box to the right of the line. In SPUFI, just change the SQL terminator on the SPUFI Defaults panel.
To perform trigger testing and debugging it can be useful to have a record of the date/time and last trigger to modify each table upon which a trigger can act. Consider adding these columns to tables upon which triggered actions can occur.
Finally, the only way to find out the access paths on SQL in a trigger is to REBIND the trigger package. And you cannot tell DB2 to do this when the trigger is first created, so you should consider REBINDing right after creating each trigger.
Clustering defines the way data will be stored physically. The term refers to keeping rows in a specific order on the disk. Through clustering, data that is commonly accessed together can be stored together on the same or contiguous pages. By storing the data together this way, performance is optimized when it is retrieved sequentially because fewer I/Os are required. Actually, more accurately, clustering indicates that the DBMS should attempt to maintain rows in the sequence of specific column values. If insufficient space is available to maintain clustering as data is inserted or modified, the DBMS typically stores the data without forcing clustering. So a clustered table may not actually be 100% clustered by the key value at all times. Consider clustering tables when: (1) a large number of queries retrieve ranges of data based on specific column values; (2) a foreign key exists in the table, because a foreign key is a good candidate for clustering; or (3) data is frequently sorted together (ORDER BY, GROUP BY, UNION, SELECT DISTINCT, joins). The primary key is almost always a bad choice for clustering because primary key access tends to be random and clustering optimizes sequential access. When clustering a table be sure to consider the frequency of modification. Inserts and updates can cause data to become unclustered. Favor clustering infrequently modified data over very frequently modified data.
Partitioned and non-partitioned – a partitioned index means that the index itself is physically partitioned into separate data sets; a non-partitioned index, though, may still be a partitioning index. Partitioning and secondary index – a partitioning index means that the index keys correlate to the columns that are used to partition the data. The index may or may not also be partitioned.
Before creating a table, review the order of the columns. Column order is irrelevant from an operational perspective; that is, DB2 gives the same results regardless of column sequence. But column sequencing can impact performance, so you may need to change the sequence recorded in the logical data model for physical implementation. For example, column ordering can impact performance based on the way DB2 logs changes. Data modifications are logged from the first byte changed to the last byte changed, except for variable length rows, where DB2 logs the change from the first byte changed to the end of the row. A DB2 table will have variable length rows if compression is turned on or any column is VARCHAR or VARGRAPHIC. To take advantage of this knowledge, re-sequence columns based on logging. Infrequently updated non-variable columns should be grouped together at the beginning of the table, followed by static (infrequently updated) variable columns, then frequently updated columns last. This structure will ensure that the least amount of data required is logged, thereby speeding up any data modification processes. Another good idea would be to group any columns that are frequently modified together. This can reduce the amount of data logged, too. Of course, each DBMS logs data differently. You will need to understand how your DBMS logs and how column ordering can impact performance.
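To see why column position matters, consider a deliberately simplified model of the logging rule described above. The function logged_bytes and the 200-byte row are hypothetical, offsets are zero-based byte positions within the row, and real log records carry additional overhead that this sketch ignores.

```python
def logged_bytes(row_len, first_changed, last_changed, variable_row):
    """Bytes logged for an update under the simplified rule in these notes.

    Fixed-length rows: first changed byte through last changed byte.
    Variable-length rows: first changed byte through the end of the row.
    """
    if variable_row:
        return row_len - first_changed
    return last_changed - first_changed + 1


# A 4-byte column updated in a 200-byte row.
near_front_variable = logged_bytes(200, 10, 13, variable_row=True)
at_end_variable = logged_bytes(200, 196, 199, variable_row=True)
near_front_fixed = logged_bytes(200, 10, 13, variable_row=False)

print(near_front_variable)  # 190
print(at_end_variable)      # 4
print(near_front_fixed)     # 4
```

In this model, moving a frequently updated column from the front of a variable length row to the end cuts the logged data for that update from 190 bytes to 4.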
The DSNRTSDB database must be started for DB2 to be able to externalize the statistics. The stats are held in virtual storage until they are externalized. The tables will have one row per table space or index space, unless the object is partitioned, in which case they will have one row per TS or IX partition. For shared tables in a data-sharing environment, each member writes its own statistics to the RTS tables. The real-time stats will change as operations occur to the data. Changes caused by inserts, updates and deletes will be reflected in the real-time statistics. The RTS tables use counters to keep track of the number of inserts, updates and deletes. If an object is dropped then the statistics in the RTS tables are removed (provided that the RTS tables are available at the time; otherwise the statistics remain and must be manually removed). Rollbacks will also affect the counters. Utilities can cause real-time stats to change. LOAD increases the counters based on the number of records loaded; other changes are recorded too when they impact stats that are captured (e.g. active pages). REORG and REBUILD can impact the RTS stats, too. The impact will be determined based on the type of REORG and what is actually changed. And, of course, RUNSTATS will cause the counters for the number of inserts, updates and deletes since the last RUNSTATS to be reset, and COPY will cause the counters for the number of updated and changed pages since the last COPY to be reset (as well as other pertinent stats). Basically, with RTS you get the ability to more immediately gauge the impact of database changes on your objects to determine whether a REORG is required or whether a COPY is needed, and what type of COPY.