28. Directory Structure
• One directory per database per segment
• <base_dir>/<seg_dir>/base/<database oid>
e.g. /d/d2/primary/gpseg_37/base/19002
• SELECT oid, datname FROM pg_database;
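As a rough illustration of the layout above, the following Python sketch assembles a database directory path from its three parts. The function name `database_dir` and its parameter names are hypothetical; the values are taken from the example path on this slide.

```python
import posixpath


def database_dir(base_dir: str, seg_dir: str, database_oid: int) -> str:
    """Return <base_dir>/<seg_dir>/base/<database oid> for one segment."""
    # posixpath is used so the result has forward slashes on any platform.
    return posixpath.join(base_dir, seg_dir, "base", str(database_oid))


print(database_dir("/d/d2/primary", "gpseg_37", 19002))
# prints /d/d2/primary/gpseg_37/base/19002
```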
29. Data Files
• Each file is named after the pg_class.relfilenode
column of its relation
SELECT relfilenode FROM pg_class WHERE oid =
'test.mytable'::regclass;
• Initially relfilenode is equal to the OID of the relation,
but numerous database operations (e.g. TRUNCATE) can
change this value
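To map a file seen on disk back to its relation, the relfilenode first has to be recovered from the file name. The Python sketch below does this under the assumption that data files are named either `<relfilenode>` or `<relfilenode>.<n>`, as in the file listings elsewhere in this deck; the helper name `relfilenode_of` is hypothetical.

```python
import re
from typing import Optional

# Assumed naming: digits only, optionally followed by ".<digits>".
_DATA_FILE = re.compile(r"^(\d+)(?:\.\d+)?$")


def relfilenode_of(file_name: str) -> Optional[int]:
    """Return the relfilenode encoded in a data file name, or None."""
    match = _DATA_FILE.match(file_name)
    return int(match.group(1)) if match else None


print(relfilenode_of("3000010"))      # prints 3000010
print(relfilenode_of("3000010.129"))  # prints 3000010
print(relfilenode_of("PG_VERSION"))   # prints None
```

The returned value can then be compared with pg_class.relfilenode rather than with the relation's OID, since the two may differ.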
30. Diagnostics
• CREATE EXTERNAL WEB TABLE database_files (
host TEXT
, segment INT
, file TEXT
, mtime TIMESTAMP
, sz BIGINT
) EXECUTE E'ls -l --time-style=+%Y%m%d_%H:%M:%S $GP_SEG_DATADIR/base/<database_oid> | awk {\'print ENVIRON["HOSTNAME"]"|"ENVIRON["GP_SEGMENT_ID"]"|"$7"|"$6"|"$5\'}'
ON ALL
FORMAT 'text' (DELIMITER E'|' NULL '');
31. Diagnostics
• Querying this table can produce substantial
load, since it stats every file of the database on every segment
• Views can easily be built on top of this table to join
back to pg_class
32. Heap Tables
• One data file per heap table for tuple storage
• Once a heap file holds data, its minimum size is one block (the default blocksize defined for the database)
CREATE TABLE test1 (a INT, b VARCHAR, c DATE);
INSERT INTO test1 VALUES(1, 'a', current_date);
SELECT segment, sz FROM database_files WHERE file LIKE '<relfilenode>%';
segment | sz
0 | 0
1 | 0
2 | 32768
3 | 0
33. AO Tables
• Either row or columnar orientation
• Variable file size
• Columnar tables have one file per column (files with format
<relfilenode>.*)
• Concurrent loads also create a set of new files related to each table
• AO tables initially consist of a single empty file in each data
directory until data is inserted
• Data files are not limited to a minimum size corresponding to the
database blocksize.
34. AO Tables
CREATE TABLE test1 (a INT, b VARCHAR, c DATE) WITH (appendonly=true, orientation = row);
SELECT segment, file, sz FROM database_files WHERE file LIKE '<relfilenode>%';
segment | file | sz
0 | 3000010 | 0
1 | 3000010 | 0
2 | 3000010 | 0
3 | 3000010 | 0
INSERT INTO test1 VALUES(1, 'a', current_date);
SELECT segment, file, sz FROM database_files WHERE file LIKE '<relfilenode>%';
segment | file | sz
0 | 3000010 | 0
1 | 3000010 | 0
2 | 3000010 | 0
2 | 3000010.1 | 40
3 | 3000010 | 0
35. AO Tables
CREATE TABLE test1 (a INT, b VARCHAR, c DATE) WITH (appendonly=true, orientation = column);
SELECT segment, file, sz FROM database_files WHERE file LIKE '<relfilenode>%';
segment | file | sz
0 | 3000010 | 0
1 | 3000010 | 0
2 | 3000010 | 0
3 | 3000010 | 0
INSERT INTO test1 VALUES(1, 'a', current_date);
SELECT segment, file, sz FROM database_files WHERE file LIKE '<relfilenode>%';
segment | file | sz
0 | 3000010 | 0
1 | 3000010 | 0
2 | 3000010 | 0
2 | 3000010.1 | 40
2 | 3000010.129 | 40
2 | 3000010.257 | 40
3 | 3000010 | 0
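In the listing above, the three columns of the table produce the suffixes .1, .129 and .257, which are 128 apart. The Python sketch below splits a suffix on that basis. The stride of 128 and the interpretation of the quotient as a column index and the remainder as a per-load slot are assumptions read off this sample output, not a statement of the on-disk format; `decode_suffix` is a hypothetical helper.

```python
from typing import Tuple

# Assumption: spacing between column files, inferred from .1 / .129 / .257.
COLUMN_STRIDE = 128


def decode_suffix(file_name: str) -> Tuple[int, int]:
    """Split '<relfilenode>.<n>' into (column_index, slot) using divmod.

    Raises ValueError for names without a numeric suffix.
    """
    _, dot, suffix = file_name.partition(".")
    if not dot or not suffix.isdigit():
        raise ValueError(f"no numeric suffix in {file_name!r}")
    return divmod(int(suffix), COLUMN_STRIDE)


for name in ("3000010.1", "3000010.129", "3000010.257"):
    print(name, decode_suffix(name))
# prints:
# 3000010.1 (0, 1)
# 3000010.129 (1, 1)
# 3000010.257 (2, 1)
```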
36. AO Tables
• For large fact tables, ADD/DROP COLUMN
operations are much faster when carried out against
AO columnar tables, as no rewrite of the data files
is required.
37. AO Tables
• Beware of large numbers of concurrent loads running
against AO tables
• For example, 50 concurrent loads running against an
AO columnar table with 500 columns will produce
200,000 primary segment files on a single segment host
(500 column files x 50 loads x 8 primary segments)
• File system efficiency can decline drastically as the
number of files increases
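The multiplication in the example can be checked with a short Python sketch. The function name `segment_files_per_host` is hypothetical, and the model is deliberately simple: one file per column, per concurrent load, per primary segment on the host.

```python
def segment_files_per_host(columns: int, concurrent_loads: int,
                           primaries_per_host: int) -> int:
    """Estimate AO columnar data files created on one segment host."""
    return columns * concurrent_loads * primaries_per_host


print(segment_files_per_host(500, 50, 8))  # prints 200000
```

Halving either the column count or the load concurrency halves the estimate, which is why both workarounds on the next slide target the number of concurrent writers.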
38. AO Tables
• Workarounds:
1. Rebuild the partition via batch processing
every night (CTAS followed by a partition
swap)
2. Load into a heap organized staging table
39. Skew
• Typically skew is discovered due to unbalanced
storage in one or more segments in the cluster
• Skew in the gp_toolkit view is calculated by
querying the hidden gp_segment_id column
SELECT gp_segment_id, count(*) FROM mytable
GROUP BY 1;
• This operation is prohibitively expensive when
querying all tables in a cluster
40. Skew
• Querying file metadata with the diagnostic table is much faster
• Coefficient of variation, interquartile range
SELECT
substring(file, '([0-9]+)')
, stddev(sz)/avg(sz)
FROM database_files
GROUP BY 1
HAVING SUM(sz) != 0;
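The two statistics named on this slide can be sketched in Python for a single relation, given its file size on each segment. This is an illustration only: the function names are hypothetical, and the sample input reuses the heap table sizes from the earlier listing (0, 0, 32768, 0). `statistics.stdev` is the sample standard deviation, which is also what `stddev` computes in PostgreSQL-derived databases.

```python
import statistics
from typing import Optional, Sequence


def coefficient_of_variation(sizes: Sequence[int]) -> Optional[float]:
    """stddev / mean of per-segment sizes; None when all files are empty."""
    mean = statistics.mean(sizes)
    if mean == 0:
        return None  # mirrors the HAVING SUM(sz) != 0 filter
    return statistics.stdev(sizes) / mean


def interquartile_range(sizes: Sequence[int]) -> float:
    """Distance between the first and third quartile of per-segment sizes."""
    q1, _median, q3 = statistics.quantiles(sizes, n=4)
    return q3 - q1


heap_sizes = [0, 0, 32768, 0]  # one row landed on segment 2
print(coefficient_of_variation(heap_sizes))  # prints 2.0
```

A perfectly balanced relation gives a coefficient of variation of 0, so larger values indicate more skew.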
41. Bloat
• Checking for skew via the gp_segment_id
column will miss physical skew due to bloat
(dead space from deleted/updated tuples).