PostgreSQL
Reuven M. Lerner (reuven@lerner.co.il)
IL-Techtalks
November 14th, 2012

Who am I?
• Web developer since 1993
• Linux Journal columnist since 1996
• Software architect, developer, consultant
• Mostly Ruby on Rails + PostgreSQL, but
also Python, PHP, Perl, JavaScript, MySQL,
MongoDB, and lots more...
• PostgreSQL user since (at least) 1997

What do I do?

• Web development, especially in Rails
• Teaching/training
• Coaching/consulting

What is a database?

• Store data confidently
• Retrieve data flexibly

Relational databases

• Define tables, store data in them
• Retrieve data from related tables

Lots of options!

• Oracle
• Microsoft SQL Server
• IBM DB2
• MySQL
• PostgreSQL

How do you choose?
• Integrity (ACID compliance)
• Data types
• Functionality
• Tools
• Extensibility
• Documentation
• Community

PostgreSQL
• Very fast, very scalable. (Just ask Skype.)
• Amazingly flexible, easily extensible.
• Rock-solid — no crashes, corruption,
security issues for years
• Ridiculously easy administration
• It also happens to be free (PostgreSQL License, similar to MIT/BSD)

What about MySQL?
• PostgreSQL has many more features
• Not nearly as popular as MySQL
• No single company behind it
• (A good thing, I think!)
• After using both, I prefer PostgreSQL
• I’ll be happy to answer questions later

Brief history
• Ingres (Stonebraker, Berkeley)
• Postgres (Stonebraker, Berkeley)
• PostgreSQL project = Postgres + SQL
• About one major release per year
• Version 8.x: Windows port, point-in-time recovery
• Version 9.0: hot standby, streaming replication, in-place upgrades

ACID
• ACID — basic standard for databases
• Atomicity
• Consistency
• Isolation
• Durability
• Pg has always been ACID compliant

Data types
• Boolean
• Numeric (integer, float, decimal)
• (var)char, text (no declared length limit, up to 1 GB per value), binary
• sequences (guaranteed to be unique)
• Date/time and time intervals
• IP addresses, XML, enums, arrays

Or create your own!

CREATE TYPE Person AS
    (first_name TEXT, last_name TEXT);

CREATE TABLE Members
    (group_id INTEGER, member Person);
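
A minimal sketch of how the composite type might be used, assuming the Person type and Members table exist as defined on this slide; the sample values are made up for illustration.

```sql
-- Build the composite value with ROW(...)
INSERT INTO Members (group_id, member)
VALUES (1, ROW('Reuven', 'Lerner'));

-- Parentheses around the column name are required when
-- pulling a single field out of a composite column
SELECT group_id, (member).first_name, (member).last_name
FROM Members;
```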

Strong typing
• PostgreSQL won’t silently convert between
unrelated types (e.g., text and integer) for you.
• This can be annoying at first, but it is
meant to protect your data!
• You can cast from one type to another with
CAST(value AS type) or the :: operator
• You can also define your own casts
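
A short sketch of the two cast syntaxes and of what strong typing looks like in practice. The literals are arbitrary examples, not from the talk.

```sql
-- Standard SQL syntax
SELECT CAST('42' AS INTEGER) + 1;    -- 43

-- PostgreSQL shorthand for the same cast
SELECT '42'::INTEGER + 1;            -- 43

-- No silent conversion between text and integer:
-- SELECT 42 = '42'::TEXT;
--   ERROR: operator does not exist: integer = text

-- An explicit cast makes the comparison legal
SELECT 42 = '42'::TEXT::INTEGER;     -- true
```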

PostGIS
• Some people took this all the way
• Want to include geographical information?
• No problem — we’ve got PostGIS!
• Complete GIS solution, with data types and
functions
• Keeps pace with main PostgreSQL revisions

Object oriented tables

• Employee table inherits from People table:
CREATE TABLE Employee
(employee_id INTEGER,
department_id INTEGER)
INHERITS (People);
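
A sketch of how inherited tables behave when queried. The People definition and the sample rows are assumptions, since the slide does not show the parent table.

```sql
-- Hypothetical parent table (not shown in the slides)
CREATE TABLE People (name TEXT, email TEXT);

CREATE TABLE Employee
    (employee_id INTEGER,
     department_id INTEGER)
    INHERITS (People);

INSERT INTO People (name, email)
VALUES ('Alice', 'alice@example.com');

INSERT INTO Employee (name, email, employee_id, department_id)
VALUES ('Bob', 'bob@example.com', 1, 10);

-- Querying the parent also returns rows stored in child tables
SELECT name FROM People;         -- Alice and Bob

-- ONLY restricts the query to the parent table itself
SELECT name FROM ONLY People;    -- Alice
```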

Foreign keys that work
CREATE TABLE DVDs (id SERIAL, title TEXT,
    store_id INTEGER REFERENCES Stores);

INSERT INTO DVDs (title, store_id)
    VALUES ('Attack of the Killer Tomatoes', 500);

ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey"
DETAIL: Key (store_id)=(500) is not present in table "stores".

Custom validity checks
CREATE TABLE DVDs (id SERIAL,
    title TEXT CHECK (length(title) > 3),
    store_id INTEGER REFERENCES Stores);
INSERT INTO DVDs (title, store_id) VALUES ('AB', 500);
ERROR: new row for relation "dvds" violates check constraint "dvds_title_check"

No more bad dates!
INSERT INTO updates (created_at) VALUES ('32-feb-2008');
ERROR: date/time field value out of range: "32-feb-2008"
LINE 1: INSERT INTO updates (created_at) VALUES ('32-feb-2008');

Timestamp vs. Interval
testdb=# select now();   -- timestamp: a point in time
              now
-------------------------------
 2010-10-31 08:58:23.365792+02
(1 row)

testdb=# select now() - interval '3 days';   -- interval: difference between points in time
           ?column?
-------------------------------
 2010-10-28 08:58:28.870011+02
(1 row)

Built-in functions
• Math
• Text processing (including regexps)
• Date/time calculations
• Conditionals (CASE, COALESCE, NULLIF)
for use in queries
• Extensive library of geometrical functions
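
A few of these built-ins in action, as a rough sketch. The input values are arbitrary examples.

```sql
-- Regular-expression match (~) and replacement
SELECT 'PostgreSQL 9.2' ~ '^Postgre';                        -- true
SELECT regexp_replace('PostgreSQL 9.2', '[0-9.]+', 'x.y');   -- PostgreSQL x.y

-- Date/time arithmetic
SELECT date '2012-11-14' + interval '1 month';   -- 2012-12-14 00:00:00

-- Conditionals usable inside any query
SELECT COALESCE(NULL, 'fallback');                          -- fallback
SELECT NULLIF(0, 0);                                        -- NULL
SELECT CASE WHEN 5 > 3 THEN 'bigger' ELSE 'smaller' END;    -- bigger
```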

Or write your own!
• PL/pgSQL
• PL/Perl
• PL/Python
• PL/Ruby
• PL/R
• PL/Tcl

Or write your own!
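CREATE OR REPLACE FUNCTION remove_cache_tables() RETURNS VOID AS $$
DECLARE
    r pg_catalog.pg_tables%rowtype;
BEGIN
    -- Find every table in "public" whose name starts with "cache_";
    -- the backslash keeps "_" from acting as a one-character wildcard
    FOR r IN SELECT * FROM pg_catalog.pg_tables
             WHERE schemaname = 'public'
             AND tablename ILIKE 'cache\_%'
    LOOP
        RAISE NOTICE 'Now dropping table %', r.tablename;
        -- quote_ident protects table names that need quoting
        EXECUTE 'DROP TABLE ' || quote_ident(r.tablename);
    END LOOP;
END;
$$ LANGUAGE plpgsql;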

Another example
CREATE OR REPLACE FUNCTION store_hostname() RETURNS TRIGGER AS $store_hostname$
BEGIN
    NEW.hostname := 'http://' ||
        substring(NEW.url, '(?:http://)?([^/]+)');
    RETURN NEW;
END;
$store_hostname$ LANGUAGE plpgsql;

Triggers

• Yes, that last function was a trigger
• Automatically execute functions upon
INSERT, UPDATE, and/or DELETE
• Can execute before or after
• Very powerful, very fast
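
A sketch of how the store_hostname() function from the previous slide could be attached to a table. The links table and the trigger name are assumptions; the slides do not show the table the function was written for.

```sql
-- Hypothetical table with the two columns the function touches
CREATE TABLE links (url TEXT, hostname TEXT);

-- Run store_hostname() before each row is inserted or updated
CREATE TRIGGER store_hostname_trigger
    BEFORE INSERT OR UPDATE ON links
    FOR EACH ROW EXECUTE PROCEDURE store_hostname();

INSERT INTO links (url) VALUES ('http://www.postgresql.org/docs/');

-- The trigger filled in the hostname column
SELECT hostname FROM links;    -- http://www.postgresql.org
```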

Function possibilities
• Computing values, strings
• Returning table-like sets of values
• Encapsulating queries
• Dynamically generating queries via strings
• Triggers: Modifying data before it is inserted
or updated
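
One of these possibilities, returning a table-like set of values, might look like the sketch below. The function name squares and its parameter are invented for illustration.

```sql
-- Hypothetical set-returning function
CREATE OR REPLACE FUNCTION squares(up_to INTEGER)
RETURNS TABLE (n INTEGER, n_squared INTEGER) AS $$
BEGIN
    RETURN QUERY
        SELECT i, i * i
        FROM generate_series(1, up_to) AS i;
END;
$$ LANGUAGE plpgsql;

-- Use it in FROM, just like a table
SELECT * FROM squares(3);    -- rows (1,1), (2,4), (3,9)
```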

Why use a PL/lang?

• Other libraries (e.g., CPAN for Perl)
• Faster, optimized functions (e.g., R)
• Programmer familiarity
• Cached query plans

Views and rules
• Views are stored SELECT statements
• Pretend that something is a read-only table
• Rules let you turn it into a read/write table
• Intercept and rewrite incoming query
• Check or change data
• Change where data is stored
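
A rough sketch of a view plus a rule that makes it writable. The table, view, and rule names are all hypothetical.

```sql
-- Hypothetical underlying table
CREATE TABLE people_raw (
    id      SERIAL PRIMARY KEY,
    name    TEXT,
    deleted BOOLEAN DEFAULT false
);

-- The view: a stored SELECT that looks like a table
CREATE VIEW active_people AS
    SELECT id, name FROM people_raw WHERE NOT deleted;

-- The rule: rewrite INSERTs on the view into INSERTs on the table
CREATE RULE active_people_insert AS
    ON INSERT TO active_people
    DO INSTEAD
    INSERT INTO people_raw (name) VALUES (NEW.name);

INSERT INTO active_people (name) VALUES ('Reuven');
SELECT * FROM active_people;
```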

Full-text indexing

• Built into PostgreSQL
• Handles stop words, different languages,
synonyms, and even (often) stemming
• Very powerful, but it can take some time to
get configured correctly
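
A small sketch of the built-in full-text search, using the 'english' configuration. The articles table and index name are assumptions for illustration.

```sql
-- Stemming and stop words at work: "foxes" matches "fox",
-- "jumping" matches "jump", and "The"/"were" are ignored
SELECT to_tsvector('english', 'The quick brown foxes were jumping')
       @@ to_tsquery('english', 'fox & jump');    -- true

-- Hypothetical table with a GIN index over the parsed text
CREATE TABLE articles (id SERIAL PRIMARY KEY, body TEXT);

CREATE INDEX articles_body_fts ON articles
    USING gin (to_tsvector('english', body));

-- The WHERE expression matches the indexed expression, so the index can be used
SELECT id FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'database & index');
```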

Transactions
• In PostgreSQL from the beginning
• Use transactions for just about anything:
BEGIN;
DROP TABLE DVDs;
ROLLBACK;
SELECT * FROM DVDs; -- Works!

Savepoints
(or, sub-transactions)
BEGIN;
INSERT INTO table1 VALUES (1);
SAVEPOINT my_savepoint;
INSERT INTO table1 VALUES (2);       -- undone by the rollback below
ROLLBACK TO SAVEPOINT my_savepoint;
COMMIT;                              -- only the row with 1 is kept

MVCC
• Readers and writers don’t block each other
• “Multi-version concurrency control”
• xmin, xmax on each tuple: the IDs of the
transactions that created and deleted it
• A transaction sees the tuple versions whose
xmin/xmax range includes its txid_current()
• Old versions stick around until vacuumed
• Autovacuum cleans them up automatically

MVCC
• Look at a row’s xmin and xmax
• Look at txid_current()
• Start transaction; look at row’s xmin/xmax
• Look at xmin/xmax on that row from
another session
• Commit, and look again at both!
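
The demo steps above could be run roughly as follows. The mvcc_demo table is hypothetical, and the actual transaction IDs will differ on every system.

```sql
-- Hypothetical demo table
CREATE TABLE mvcc_demo (id INTEGER, note TEXT);
INSERT INTO mvcc_demo VALUES (1, 'first version');

-- xmin and xmax are hidden system columns; name them explicitly
SELECT xmin, xmax, * FROM mvcc_demo;
SELECT txid_current();

BEGIN;
UPDATE mvcc_demo SET note = 'second version' WHERE id = 1;

-- This session sees the new tuple, whose xmin is this transaction's ID
SELECT xmin, xmax, * FROM mvcc_demo;

-- In a second session, run the same SELECT now: it still sees the old
-- tuple, with xmax set to the ID of the updating transaction

COMMIT;

-- After the commit, both sessions see the new tuple
SELECT xmin, xmax, * FROM mvcc_demo;
```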

Downsides of MVCC
• MVCC is usually fantastic
• But if you insert or update many rows, and
then do a COUNT(*), things will be slow
• There are solutions — including more
aggressive auto-vacuuming
• 9.2 introduced index-only scans, which improved this

Indexing
• Regular, unique indexes
• Functional indexes
• Index calling a function on a column
• Partial indexes
• Index only rows matching criteria
• Cluster table on an index
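
Each index type from this slide, sketched against a hypothetical users table. All table and index names are invented for illustration.

```sql
-- Hypothetical table
CREATE TABLE users (id SERIAL PRIMARY KEY, email TEXT, active BOOLEAN);

-- Unique index
CREATE UNIQUE INDEX users_email_idx ON users (email);

-- Functional index: indexes the result of a function on a column,
-- which helps queries such as WHERE lower(email) = '...'
CREATE INDEX users_lower_email_idx ON users (lower(email));

-- Partial index: only rows matching the WHERE clause are indexed
CREATE INDEX users_active_idx ON users (id) WHERE active;

-- Physically reorder the table to follow an index (a one-time operation)
CLUSTER users USING users_email_idx;
```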

CTEs
• The “WITH” clause defines a
sorta-kinda temp table
• You can then query that same temp table
• Makes many queries easier to read, write,
without a real temp table
• Better yet: CTEs can be recursive, for
everything from Fibonacci to org charts
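
A sketch of both flavors of CTE, using only generated numbers so that no tables are needed. The names big and fib are arbitrary.

```sql
-- Plain CTE: define "big" once, then query it like a table
WITH big AS (
    SELECT n FROM generate_series(1, 10) AS n WHERE n > 7
)
SELECT sum(n) FROM big;    -- 27

-- Recursive CTE: Fibonacci numbers below 100
WITH RECURSIVE fib(a, b) AS (
    SELECT 0, 1
  UNION ALL
    SELECT b, a + b FROM fib WHERE b < 100
)
SELECT a FROM fib;    -- 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
```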

Speed and scalability
• MVCC + a smart query optimizer makes
PostgreSQL pretty fast and smart
• Statistics gathered from the tables' contents
(by ANALYZE and autovacuum) inform the query planner
• Several scan types, join types are weighed
• Benchmarks consistently show excellent
performance with high mixes of read/write
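
To see what the planner decides, EXPLAIN is the usual starting point. This is a sketch against a hypothetical measurements table; the plan and timings printed will vary by version, data, and hardware.

```sql
-- Hypothetical table filled with generated data
CREATE TABLE measurements (id INTEGER, reading NUMERIC);
INSERT INTO measurements
    SELECT n, random() FROM generate_series(1, 100000) AS n;

-- Refresh the statistics the planner relies on
ANALYZE measurements;

-- Show the chosen plan and its estimated cost, without running the query
EXPLAIN SELECT * FROM measurements WHERE id = 42;

-- Run the query and report actual row counts and timings as well
EXPLAIN ANALYZE SELECT * FROM measurements WHERE id = 42;
```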

WAL
• All activity in the database is put in “write-
ahead logs” before it happens
• If the database server fails, it replays the
WALs, then continues
• You can change how often WALs are
written, to improve performance
• PITR (point-in-time recovery): restore the database from a base backup plus WALs

Log shipping
• Copy WALs to a second, identical server —
known as “log shipping” — and you have a
backup
• If the primary server goes down, you can
bring the secondary up in its place
• This was known as “warm standby,” and
worked in 8.4

Hot standby,
streaming replication
• As of 9.0, you don’t have to do this
• You can have the primary stream the
information to the secondary
• Almost-instant updates
• The secondary machine can answer read-
only queries (“hot standby”), not just
handle failover

Extensions
• Provides a standardized mechanism for
downloading, installing, and versioning
extensions
• New data types, functions, languages are
possible
• Download, search via pgxn.org
• Similar to CPAN, PyPI, or RubyGems
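
Installing an extension might look like the sketch below. It assumes PostgreSQL 9.1 or later and that the hstore contrib module is available on the server; hstore is used here only as a convenient example.

```sql
-- List the extensions this server can install
SELECT name, default_version, installed_version
FROM pg_available_extensions
ORDER BY name;

-- Install one into the current database
CREATE EXTENSION IF NOT EXISTS hstore;

-- The new data type and its operators are now available
SELECT 'a=>1, b=>2'::hstore -> 'a';    -- 1
```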

SQL/MED

• SQL/MED was introduced in 9.1
• Query information from other databases
(and database-like interfaces)
• So if you have data in MySQL, Oracle,
CSV ... just install a wrapper, and you can
query it like a PostgreSQL table
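
A sketch of a foreign table over a CSV file, using the file_fdw wrapper that ships with PostgreSQL's contrib modules. The server name, table name, columns, and file path are all hypothetical.

```sql
CREATE EXTENSION file_fdw;

-- A "server" object is required, even for local files
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

-- Hypothetical CSV file with a header row and two columns
CREATE FOREIGN TABLE dvd_import (title TEXT, store_id INTEGER)
    SERVER csv_files
    OPTIONS (filename '/tmp/dvds.csv', format 'csv', header 'true');

-- Query the file as if it were an ordinary table
SELECT title FROM dvd_import WHERE store_id = 500;
```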

Unlogged tables

• All actions are logged in WALs
• That adds some overhead, which isn’t
required by throwaway data
• Unlogged tables (different from temp
tables!) offer a speedup, in exchange for
less reliability
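
Creating an unlogged table only takes one extra keyword. The page_cache table below is a made-up example of throwaway data.

```sql
-- Hypothetical cache table: contents are not written to the WAL,
-- so the table is emptied after a crash and is not replicated
CREATE UNLOGGED TABLE page_cache (
    cache_key TEXT PRIMARY KEY,
    body      TEXT,
    cached_at TIMESTAMP DEFAULT now()
);

INSERT INTO page_cache (cache_key, body)
VALUES ('/home', '<html>...</html>');
```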

New in 9.2
• JSON support
• Range types, for handling ranges of values (e.g., dates or numbers)
• Much more scalable — from 24 cores and
75k queries/sec to 64 cores and 350k
queries/sec
• Index-only scans (“covering indexes”)
• Cascading replication
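
A quick taste of the range and JSON types, restricted to functions and operators that exist in 9.2. The values are arbitrary examples.

```sql
-- Range types: containment (@>) and overlap (&&)
SELECT int4range(10, 20) @> 15;    -- true

SELECT daterange('2012-11-01', '2012-11-15')
    && daterange('2012-11-14', '2012-11-20');    -- true

-- JSON: the type validates its input...
SELECT '{"speaker": "Reuven", "year": 2012}'::json;

-- ...and rows can be converted to JSON
SELECT row_to_json(t)
FROM (SELECT 1 AS id, 'PostgreSQL' AS name) AS t;    -- {"id":1,"name":"PostgreSQL"}
```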

Web problems
• PostgreSQL is great as a Web backend
• But if you use an ORM (e.g., ActiveRecord),
you are probably losing much of the power
• e.g., foreign keys, CTE, triggers, and views
• No good way to bridge this gap — for now
• There are always methods, but this is an
area that definitely needs some work

Tablespaces
• You can create any number of
“tablespaces,” separate storage areas
• Put tables, indexes on different tablespaces
• Most useful with multiple disks
• Separate tables (or parts of a partitioned
table)... or separate tables from indexes
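
A sketch of putting a table and its index on different storage. The tablespace name, directory, and orders table are hypothetical; the directory must already exist, be empty, and be owned by the PostgreSQL system user, and CREATE TABLESPACE requires superuser rights.

```sql
-- Hypothetical location on a second disk
CREATE TABLESPACE fast_disk LOCATION '/mnt/fast_disk/pgdata';

-- Table on the new tablespace
CREATE TABLE orders (id SERIAL PRIMARY KEY, total NUMERIC)
    TABLESPACE fast_disk;

-- Index kept on the default tablespace
CREATE INDEX orders_total_idx ON orders (total)
    TABLESPACE pg_default;
```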

Partitioning
• Combine object-oriented tables, CHECK
clauses, and tablespaces for partitioning
• Example: Invoices from Jan-June go in table
“q12”, and July-December go in table “q34”
• Now PostgreSQL knows where to look
when you SELECT from the parent table
• Note that INSERT requires a trigger
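
The invoice example could be sketched as follows, using inheritance, CHECK constraints, and a routing trigger. The column names, function name, and the year 2012 are assumptions; the slide names only the q12 and q34 tables. Tablespaces are left out to keep the sketch short.

```sql
-- Parent table; it holds no rows itself
CREATE TABLE invoices (id SERIAL, issued_on DATE NOT NULL, total NUMERIC);

-- One child per half-year, each with a CHECK describing its contents
CREATE TABLE q12 (
    CHECK (issued_on >= DATE '2012-01-01' AND issued_on < DATE '2012-07-01')
) INHERITS (invoices);

CREATE TABLE q34 (
    CHECK (issued_on >= DATE '2012-07-01' AND issued_on < DATE '2013-01-01')
) INHERITS (invoices);

-- Route each new row to the right child.
-- Dates outside 2012 fail the child's CHECK constraint.
CREATE OR REPLACE FUNCTION invoices_insert_router() RETURNS TRIGGER AS $$
BEGIN
    IF NEW.issued_on < DATE '2012-07-01' THEN
        INSERT INTO q12 VALUES (NEW.*);
    ELSE
        INSERT INTO q34 VALUES (NEW.*);
    END IF;
    RETURN NULL;    -- row is already stored in a child; skip the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER invoices_insert_trigger
    BEFORE INSERT ON invoices
    FOR EACH ROW EXECUTE PROCEDURE invoices_insert_router();

INSERT INTO invoices (issued_on, total) VALUES ('2012-03-15', 100);

-- With constraint exclusion, the plan should skip q34 for this query
EXPLAIN SELECT * FROM invoices WHERE issued_on = DATE '2012-03-15';
```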

Reflection

• pg_catalog schema contains everything
about your database
• Tables, functions, views, etc.
• You can learn a great deal about
PostgreSQL by looking through the
pg_catalog schema
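
A few starting points for exploring the catalog, as a sketch. The filters are arbitrary examples.

```sql
-- User tables (everything outside the system schemas)
SELECT schemaname, tablename
FROM pg_catalog.pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');

-- Views in the public schema, with their defining queries
SELECT viewname, definition
FROM pg_catalog.pg_views
WHERE schemaname = 'public';

-- Functions whose names start with "regexp"
SELECT proname
FROM pg_catalog.pg_proc
WHERE proname LIKE 'regexp%';
```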

Advanced uses

• GridSQL: Split a query across multiple
PostgreSQL servers
• Very large-scale data warehousing:
Greenplum

Client libraries
• libpq (in C)
• Others by 3rd parties:
• Python
• Ruby
• Perl
• Java (JDBC)
• .NET (npgsql)
• ODBC
• JavaScript (!)
• Just about any language you can imagine

Tools
• Yeah, tools are more primitive
• If you love GUIs, and hate the command
line, then PostgreSQL will be hard for you
• PgAdmin and other tools are OK, but not
really up to the task for “real” work
• PgAdmin does provide some graphical
query building and “explain” output

Windows compatibility
• It works on Windows
• .NET drivers work, as well
• Logging is far from perfect (can go to the
Windows log tool, but not filtered well)
• Configuration is still in a text file, foreign to
most Windows people
• Windows is still a second-class citizen

Who uses it?
• Afilias
• IMDB
• Apple
• Skype
• BASF
• Sourceforge
• Cisco
• Heroku
• CD Baby
• Checkpoint
• Etsy

Who supports it?

• EnterpriseDB — products and services
• 2ndQuadrant
• Many freelancers (like me!)

PostgreSQL problems
• Tuning is still hard (but getting easier)
• Double quotes
• Lack of good GUI-based tools
• Some features (e.g., materialized views) that
people want without having to resort to
hacks and triggers/rules
• Multi-master (of course!)

Bottom line
• PostgreSQL: BSD licensed, easy to install,
easy to use, easy to administer
• Still not quite up to commercial databases
regarding features — but not far behind
• More than good enough for places like
Skype and Afilias; probably good enough
for you!

Want to learn more?
• Mailing lists, wikis, and blogs
• All at http://postgresql.org/
• http://planetpostgresql.org
• PostgreSQL training, consulting,
development, hand-holding, and general
encouragement

Thanks!
(Any questions?)

reuven@lerner.co.il
http://www.lerner.co.il/
054-496-8405
“reuvenlerner” on Skype/AIM
