1. Getting Started with PL/Proxy
Peter Eisentraut
peter@eisentraut.org
F-Secure Corporation
PostgreSQL Conference East 2011
CC-BY
2. Concept
• a database partitioning system implemented as a procedural language
• “sharding”/horizontal partitioning
• PostgreSQL’s No(t-only)SQL solution
4. Areas of Application
• high write load
• (high read load)
• allow for some “eventual consistency”
• have reasonable partitioning keys
• use/plan to use server-side functions
5. Example
Have (from the dellstore2 example database):
CREATE TABLE products (
    prod_id        serial PRIMARY KEY,
    category       integer NOT NULL,
    title          varchar(50) NOT NULL,
    actor          varchar(50) NOT NULL,
    price          numeric(12,2) NOT NULL,
    special        smallint,
    common_prod_id integer NOT NULL
);
INSERT INTO products VALUES (...);
UPDATE products SET ... WHERE ...;
DELETE FROM products WHERE ...;
plus various queries
7. Backend Functions I
CREATE FUNCTION insert_product(p_category int,
    p_title varchar, p_actor varchar, p_price numeric,
    p_special smallint, p_common_prod_id int) RETURNS int
LANGUAGE plpgsql
AS $$
DECLARE
    cnt int;
BEGIN
    INSERT INTO products (category, title, actor, price,
                          special, common_prod_id)
        VALUES (p_category, p_title, p_actor, p_price,
                p_special, p_common_prod_id);
    -- report the number of rows inserted
    GET DIAGNOSTICS cnt = ROW_COUNT;
    RETURN cnt;
END;
$$;
8. Backend Functions II
CREATE FUNCTION update_product_price(p_prod_id int,
    p_price numeric) RETURNS int
LANGUAGE plpgsql
AS $$
DECLARE
    cnt int;
BEGIN
    UPDATE products SET price = p_price
        WHERE prod_id = p_prod_id;
    -- report the number of rows updated
    GET DIAGNOSTICS cnt = ROW_COUNT;
    RETURN cnt;
END;
$$;
9. Backend Functions III
CREATE FUNCTION delete_product_by_title(p_title varchar)
    RETURNS int
LANGUAGE plpgsql
AS $$
DECLARE
    cnt int;
BEGIN
    DELETE FROM products WHERE title = p_title;
    -- report the number of rows deleted
    GET DIAGNOSTICS cnt = ROW_COUNT;
    RETURN cnt;
END;
$$;
10. Frontend Functions I
CREATE FUNCTION insert_product(p_category int,
    p_title varchar, p_actor varchar, p_price numeric,
    p_special smallint, p_common_prod_id int) RETURNS SETOF int
LANGUAGE plproxy
AS $$
    CLUSTER 'dellstore_cluster';
    RUN ON hashtext(p_title);
$$;
CREATE FUNCTION update_product_price(p_prod_id int,
    p_price numeric) RETURNS SETOF int
LANGUAGE plproxy
AS $$
    CLUSTER 'dellstore_cluster';
    RUN ON ALL;
$$;
11. Frontend Functions II
CREATE FUNCTION delete_product_by_title(p_title varchar)
    RETURNS int
LANGUAGE plproxy
AS $$
    CLUSTER 'dellstore_cluster';
    RUN ON hashtext(p_title);
$$;
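Calling the frontend functions, as a sketch: the application connects to the frontend database only and calls the functions there. The argument values below are made-up sample data, not rows from dellstore2.

```sql
-- Run on the frontend (proxy) database.
-- The smallint argument needs an explicit cast; a bare 0 is an integer.
SELECT * FROM insert_product(14, 'SAMPLE TITLE', 'SAMPLE ACTOR',
                             19.99, 0::smallint, 1234);

-- RUN ON ALL returns one row per partition, so sum the per-partition counts.
SELECT sum(n) AS rows_updated
FROM update_product_price(42, 17.99) AS t(n);

-- Hashed on the title, so only one partition is contacted.
SELECT delete_product_by_title('SAMPLE TITLE');
```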
12. Frontend Query Functions I
CREATE FUNCTION get_product_price(p_prod_id int)
    RETURNS SETOF numeric
LANGUAGE plproxy
AS $$
    CLUSTER 'dellstore_cluster';
    RUN ON ALL;
    SELECT price FROM products WHERE prod_id = p_prod_id;
$$;
13. Frontend Query Functions II
CREATE FUNCTION get_products_by_category(p_category int)
    RETURNS SETOF products
LANGUAGE plproxy
AS $$
    CLUSTER 'dellstore_cluster';
    RUN ON ALL;
    SELECT * FROM products WHERE category = p_category;
$$;
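Using the query functions, as a sketch: results from RUN ON ALL are the combined rows of all partitions, so it seems safest to do any sorting or limiting on the frontend rather than rely on arrival order. The category and product numbers are arbitrary sample values.

```sql
-- Run on the frontend database.
-- Sort and limit after the rows from all partitions have been collected.
SELECT prod_id, title, price
FROM get_products_by_category(3)
ORDER BY price DESC
LIMIT 10;

-- prod_id is not the hash key, so every partition is asked;
-- only the partition holding the row returns anything.
SELECT * FROM get_product_price(42);
```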
14. Unpartitioned Small Tables
CREATE FUNCTION insert_category(p_categoryname varchar)
    RETURNS SETOF int
LANGUAGE plproxy
AS $$
    CLUSTER 'dellstore_cluster';
    RUN ON 0;
$$;
15. Which Hash Key?
• natural keys (names, descriptions, UUIDs)
• not serials (Consider using fewer “ID” fields.)
• single columns
• group sensibly to allow joins on backend
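One way to evaluate a candidate key, as a sketch: run this on the original, unpartitioned database and check how evenly the rows would spread. It assumes 8 partitions, so the partition number is the low 3 bits of the hash (mask 8 - 1 = 7).

```sql
-- Rows per partition if "title" were the hash key (8 partitions assumed).
SELECT hashtext(title) & 7 AS partition_no,
       count(*)            AS n_rows
FROM products
GROUP BY 1
ORDER BY 1;

-- Same check for another candidate, e.g. "actor".
SELECT hashtext(actor) & 7 AS partition_no,
       count(*)            AS n_rows
FROM products
GROUP BY 1
ORDER BY 1;
```

Roughly equal counts suggest a usable key; a few very large groups suggest looking for another column.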
17. Set Basic Parameters
• number of partitions (2^n), e.g. 8
• host names, e.g.
  • frontend: dbfe
  • backends: dbbe1, ..., dbbe8 (or start at 0?)
• database names, e.g.
  • frontend: dellstore2
  • backends: store01, ..., store08 (or start at 0?)
• user names, e.g. storeapp
• hardware:
  • frontend: lots of memory, normal disk
  • backends: full-sized database server
18. Configuration
CREATE FUNCTION plproxy.get_cluster_partitions(cluster_name text)
    RETURNS SETOF text LANGUAGE plpgsql AS $$ ... $$;
CREATE FUNCTION plproxy.get_cluster_version(cluster_name text)
    RETURNS int LANGUAGE plpgsql AS $$ ... $$;
CREATE FUNCTION plproxy.get_cluster_config(IN cluster_name text,
    OUT key text, OUT val text)
    RETURNS SETOF record LANGUAGE plpgsql AS $$ ... $$;
19. get_cluster_partitions
Simplistic approach:
CREATE FUNCTION plproxy.get_cluster_partitions(cluster_name text)
    RETURNS SETOF text
LANGUAGE plpgsql
AS $$
BEGIN
    IF cluster_name = 'dellstore_cluster' THEN
        RETURN NEXT 'dbname=store01 host=dbbe1';
        RETURN NEXT 'dbname=store02 host=dbbe2';
        ...
        RETURN NEXT 'dbname=store08 host=dbbe8';
        RETURN;
    END IF;
    RAISE EXCEPTION 'Unknown cluster';
END;
$$;
20. get_cluster_version
Simplistic approach:
CREATE FUNCTION plproxy.get_cluster_version(cluster_name text)
    RETURNS int
LANGUAGE plpgsql
AS $$
BEGIN
    IF cluster_name = 'dellstore_cluster' THEN
        RETURN 1;
    END IF;
    RAISE EXCEPTION 'Unknown cluster';
END;
$$;
21. get_cluster_config
CREATE OR REPLACE FUNCTION plproxy.get_cluster_config(
    IN cluster_name text, OUT key text, OUT val text)
    RETURNS SETOF record
LANGUAGE plpgsql
AS $$
BEGIN
    -- same config for all clusters
    key := 'connection_lifetime';
    val := 30*60;  -- 30 minutes, in seconds
    RETURN NEXT;
    RETURN;
END;
$$;
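A quick smoke test of the three configuration functions, as a sketch: call them by hand on the frontend before creating any plproxy functions. The cluster name no_such_cluster is a made-up value used only to provoke the error path.

```sql
-- Run on the frontend database.
SELECT * FROM plproxy.get_cluster_version('dellstore_cluster');
SELECT * FROM plproxy.get_cluster_partitions('dellstore_cluster');
SELECT * FROM plproxy.get_cluster_config('dellstore_cluster');

-- With the simplistic versions above, this should raise 'Unknown cluster'.
SELECT plproxy.get_cluster_version('no_such_cluster');
```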
22. Table-Driven Configuration I
CREATE TABLE plproxy.partitions (
    cluster_name text NOT NULL,
    host         text NOT NULL,
    port         text NOT NULL,
    dbname       text NOT NULL,
    PRIMARY KEY (cluster_name, dbname)
);
INSERT INTO plproxy.partitions VALUES
    ('dellstore_cluster', 'dbbe1', '5432', 'store01'),
    ('dellstore_cluster', 'dbbe2', '5432', 'store02'),
    ...
    ('dellstore_cluster', 'dbbe8', '5432', 'store08');
23. Table-Driven Configuration II
CREATE TABLE plproxy.cluster_users (
    cluster_name text NOT NULL,
    remote_user  text NOT NULL,
    local_user   text NOT NULL,
    PRIMARY KEY (cluster_name, remote_user, local_user)
);
INSERT INTO plproxy.cluster_users VALUES
    ('dellstore_cluster', 'storeapp', 'storeapp');
24. Table-Driven Configuration III
CREATE TABLE plproxy.remote_passwords (
    host        text NOT NULL,
    port        text NOT NULL,
    dbname      text NOT NULL,
    remote_user text NOT NULL,
    password    text,
    PRIMARY KEY (host, port, dbname, remote_user)
);
INSERT INTO plproxy.remote_passwords VALUES
    ('dbbe1', '5432', 'store01', 'storeapp', 'Thu1Ued0'),
    ...
-- or use .pgpass?
25. Table-Driven Configuration IV
CREATE TABLE plproxy.cluster_version (
    id int PRIMARY KEY
);
INSERT INTO plproxy.cluster_version VALUES (1);
GRANT SELECT ON plproxy.cluster_version TO PUBLIC;
/* extra credit: write a trigger that changes the version
   when one of the other tables changes */
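One possible answer to the extra-credit exercise, as a sketch only: a statement-level trigger that increments the single version row whenever a configuration table changes. The function name plproxy.bump_cluster_version and the trigger names are hypothetical, not part of PL/Proxy.

```sql
-- Hypothetical helper: bump the version on any configuration change.
CREATE FUNCTION plproxy.bump_cluster_version() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    UPDATE plproxy.cluster_version SET id = id + 1;
    RETURN NULL;  -- return value is ignored for AFTER statement triggers
END;
$$;

CREATE TRIGGER partitions_bump_version
    AFTER INSERT OR UPDATE OR DELETE ON plproxy.partitions
    FOR EACH STATEMENT EXECUTE PROCEDURE plproxy.bump_cluster_version();

CREATE TRIGGER cluster_users_bump_version
    AFTER INSERT OR UPDATE OR DELETE ON plproxy.cluster_users
    FOR EACH STATEMENT EXECUTE PROCEDURE plproxy.bump_cluster_version();

CREATE TRIGGER remote_passwords_bump_version
    AFTER INSERT OR UPDATE OR DELETE ON plproxy.remote_passwords
    FOR EACH STATEMENT EXECUTE PROCEDURE plproxy.bump_cluster_version();
```

A statement-level trigger bumps the version once per statement, even when a multi-row INSERT touches many partitions at once.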
26. Table-Driven Configuration V
CREATE OR REPLACE FUNCTION plproxy.get_cluster_partitions(p_cluster_name text)
    RETURNS SETOF text
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
    r record;
BEGIN
    FOR r IN
        SELECT 'host=' || host || ' port=' || port || ' dbname=' || dbname
               || ' user=' || remote_user || ' password=' || password AS dsn
        FROM plproxy.partitions
             NATURAL JOIN plproxy.cluster_users
             NATURAL JOIN plproxy.remote_passwords
        WHERE cluster_name = p_cluster_name
          AND local_user = session_user
        ORDER BY dbname  -- important
    LOOP
        RETURN NEXT r.dsn;
    END LOOP;
    IF NOT found THEN
        RAISE EXCEPTION 'no such cluster: %', p_cluster_name;
    END IF;
    RETURN;
END;
$$;
27. Table-Driven Configuration VI
CREATE FUNCTION plproxy.get_cluster_version(p_cluster_name text)
    RETURNS int
LANGUAGE plpgsql
AS $$
DECLARE
    ret int;
BEGIN
    SELECT INTO ret id FROM plproxy.cluster_version;
    RETURN ret;
END;
$$;
28. SQL/MED Configuration
CREATE SERVER dellstore_cluster FOREIGN DATA WRAPPER plproxy
    OPTIONS (
        connection_lifetime '1800',
        p0 'dbname=store01 host=dbbe1',
        p1 'dbname=store02 host=dbbe2',
        ...
        p7 'dbname=store08 host=dbbe8'
    );
CREATE USER MAPPING FOR storeapp SERVER dellstore_cluster
    OPTIONS (user 'storeapp', password 'sekret');
GRANT USAGE ON SERVER dellstore_cluster TO storeapp;
29. Hash Functions
RUN ON hashtext(somecolumn);
• want a fast, uniform hash function
• typically use hashtext
• problem: implementation might change
• possible solution: https://github.com/petere/pgvihash
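Another way to limit the damage if the hash implementation changes, as a sketch: route every RUN ON through one wrapper function on the frontend, so the hash can be swapped in a single place. The name partition_hash is hypothetical; the body simply delegates to hashtext for now.

```sql
-- Hypothetical wrapper on the frontend; replace the body to change the hash.
CREATE FUNCTION partition_hash(p_key text) RETURNS integer
LANGUAGE sql IMMUTABLE
AS $$ SELECT hashtext($1) $$;

-- Frontend function using the wrapper instead of hashtext directly.
CREATE OR REPLACE FUNCTION delete_product_by_title(p_title varchar)
    RETURNS int
LANGUAGE plproxy
AS $$
    CLUSTER 'dellstore_cluster';
    RUN ON partition_hash(p_title);
$$;
```

This does not make the hash stable by itself; it only means that pinning it to a fixed implementation later touches one function.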
30. Sequences
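shard 1:
ALTER SEQUENCE products_prod_id_seq MINVALUE 1
    MAXVALUE 100000000 START 1;
shard 2:
-- RESTART moves the current value into the new range;
-- START alone only records the value for later restarts
ALTER SEQUENCE products_prod_id_seq MINVALUE 100000001
    MAXVALUE 200000000 START 100000001 RESTART;
etc.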
31. Aggregates
Example: count all products
Backend:
CREATE FUNCTION count_products() RETURNS bigint
LANGUAGE SQL STABLE
AS $$ SELECT count(*) FROM products $$;
Frontend:
CREATE FUNCTION count_products() RETURNS SETOF bigint
LANGUAGE plproxy
AS $$
    CLUSTER 'dellstore_cluster';
    RUN ON ALL;
$$;
SELECT sum(x) AS count FROM count_products() AS t(x);
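Not every aggregate combines as simply as count. As a sketch: for an average, each partition has to return its sum and its row count, and the frontend divides the totals, since averaging the per-partition averages would weight partitions incorrectly. The function name price_stats and its output columns are hypothetical.

```sql
-- Backend: per-partition sum and row count.
CREATE FUNCTION price_stats(OUT total numeric, OUT n bigint)
    RETURNS record
LANGUAGE sql STABLE
AS $$ SELECT sum(price), count(*) FROM products $$;

-- Frontend: one (total, n) row per partition.
CREATE FUNCTION price_stats(OUT total numeric, OUT n bigint)
    RETURNS SETOF record
LANGUAGE plproxy
AS $$
    CLUSTER 'dellstore_cluster';
    RUN ON ALL;
$$;

-- Combine on the frontend; nullif guards against an empty table.
SELECT sum(total) / nullif(sum(n), 0) AS avg_price
FROM price_stats();
```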
32. Dynamic Queries I
a.k.a. “cheating” ;-)
Frontend:
CREATE FUNCTION execute_query(sql text)
    RETURNS SETOF RECORD
LANGUAGE plproxy
AS $$
    CLUSTER 'dellstore_cluster';
    RUN ON ALL;
$$;
Backend:
CREATE FUNCTION execute_query(sql text)
    RETURNS SETOF RECORD
LANGUAGE plpgsql
AS $$
BEGIN
    RETURN QUERY EXECUTE sql;
END;
$$;
33. Dynamic Queries II
SELECT * FROM execute_query('SELECT title, price FROM products')
    AS (title varchar, price numeric);
SELECT category, sum(sum_price)
FROM execute_query('SELECT category, sum(price)
                    FROM products GROUP BY category')
    AS (category int, sum_price numeric)
GROUP BY category;
34. Repartitioning
• changing partitioning key is extremely cumbersome
• adding partitions is somewhat cumbersome, e.g., to split shard 0:
COPY (SELECT * FROM products
      WHERE hashtext(title::text) & 15 <> 0) TO 'somewhere';
DELETE FROM products
    WHERE hashtext(title::text) & 15 <> 0;
Better start out with enough partitions!
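Why the mask is 15, as a sketch: with 8 partitions the partition number is hash & 7, and after doubling to 16 it becomes hash & 15. Rows on old shard 0 therefore end up with a new partition number of either 0 (stay) or 8 (move). The query below can be run on shard 0 before the split to preview this; the final COPY reuses the placeholder path from the slide and assumes the new partition is number 8.

```sql
-- On old shard 0: old_part is always 0; new_part is 0 or 8.
SELECT hashtext(title::text) & 7  AS old_part,
       hashtext(title::text) & 15 AS new_part,
       count(*)                   AS n_rows
FROM products
GROUP BY 1, 2
ORDER BY 1, 2;

-- On the new partition (assumed to be number 8): load the exported rows.
COPY products FROM 'somewhere';
```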