The latest version of my PostgreSQL introduction for IL-TechTalks, a free service to introduce the Israeli hi-tech community to new and interesting technologies. In this talk, I describe the history and licensing of PostgreSQL, its built-in capabilities, and some of the new things that were added in the 9.1 and 9.2 releases which make it an attractive option for many applications.
2. Who am I?
• Web developer since 1993
• Linux Journal columnist since 1996
• Software architect, developer, consultant
• Mostly Ruby on Rails + PostgreSQL, but
also Python, PHP, Perl, JavaScript, MySQL,
MongoDB, and lots more...
• PostgreSQL user since (at least) 1997
3. What do I do?
• Web development, especially in Rails
• Teaching/training
• Coaching/consulting
4. What is a database?
Store data
confidently
Database
Retrieve data
flexibly
5. Relational databases
Define tables,
store data in them
Database
Retrieve data from
related tables
6. Lots of options!
• Oracle
• Microsoft SQL Server
• IBM DB2
• MySQL
• PostgreSQL
7. How do you choose?
• Integrity (ACID compliance)
• Data types
• Functionality
• Tools
• Extensibility
• Documentation
• Community
8. PostgreSQL
• Very fast, very scalable. (Just ask Skype.)
• Amazingly flexible, easily extensible.
• Rock-solid — no crashes, corruption,
security issues for years
• Ridiculously easy administration
• It also happens to be free (PostgreSQL License, similar to MIT/BSD)
15. What about MySQL?
• PostgreSQL has many more features
• Not nearly as popular as MySQL
• No single company behind it
• (A good thing, I think!)
• After using both, I prefer PostgreSQL
• I’ll be happy to answer questions later
16. Brief history
• Ingres (Stonebraker, Berkeley)
• Postgres (Stonebraker, Berkeley)
• PostgreSQL project = Postgres + SQL
• About one major release per year
• Version 8.x: Windows port, point-in-time recovery
• Version 9.0: hot standby, streaming replication, in-place upgrades
17. ACID
• ACID — basic standard for databases
• Atomicity
• Consistency
• Isolation
• Durability
• Pg has always been ACID compliant
18. Data types
• Boolean
• Numeric (integer, float, decimal)
• (var)char, text (effectively unlimited, up to 1 GB), binary
• sequences (guaranteed to be unique)
• Date/time and time intervals
• IP addresses, XML, enums, arrays
22. Or create your own!
CREATE TYPE Person AS
(first_name TEXT, last_name TEXT);
CREATE TABLE Members
(group_id INTEGER, member Person);
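A short sketch of how such a composite column might be used, assuming the Person type and Members table from this slide already exist. The sample name is only illustrative.

```sql
-- Build a Person value with ROW(...) when inserting
INSERT INTO Members (group_id, member)
VALUES (1, ROW('Reuven', 'Lerner'));

-- Parentheses around the column name are required when reading a field
SELECT (member).first_name, (member).last_name
FROM Members
WHERE group_id = 1;
```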
23. Strong typing
• PostgreSQL won’t automatically change
types for you.
• This can be annoying at first — but it is
meant to protect your data!
• You can cast from one type to another with
CAST(value AS type) or the :: operator
• You can also define your own casts
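A minimal illustration of strong typing and explicit casts. The literal values are made up for the example.

```sql
-- Two spellings of the same explicit cast; each returns the integer 43
SELECT CAST('42' AS integer) + 1;
SELECT '42'::integer + 1;

-- No implicit conversion here: this fails, because there is no
-- operator that compares an integer with a text value
SELECT length('abc') = '3'::text;

-- Saying what you mean makes it work
SELECT length('abc') = '3'::text::integer;
```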
24. PostGIS
• Some people took this all the way
• Want to include geographical information?
• No problem — we’ve got PostGIS!
• Complete GIS solution, with data types and
functions
• Keeps pace with main PostgreSQL revisions
25. Object oriented tables
• Employee table inherits from People table:
CREATE TABLE Employee
(employee_id INTEGER,
department_id INTEGER)
INHERITS (People);
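A sketch of how inheritance behaves in queries. The People definition below is an assumption, since the slide does not show it, and the sample rows are invented.

```sql
-- Hypothetical parent table (not shown on the slide)
CREATE TABLE People (
    id         SERIAL,
    first_name TEXT,
    last_name  TEXT
);

CREATE TABLE Employee (
    employee_id   INTEGER,
    department_id INTEGER
) INHERITS (People);

INSERT INTO People (first_name, last_name)
VALUES ('Ada', 'Lovelace');
INSERT INTO Employee (first_name, last_name, employee_id, department_id)
VALUES ('Grace', 'Hopper', 1, 10);

-- Querying the parent also returns rows stored in child tables
SELECT first_name FROM People;

-- ONLY restricts the query to rows stored in People itself
SELECT first_name FROM ONLY People;
```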
30. Foreign keys that work
CREATE TABLE DVDs (id SERIAL, title TEXT,
store_id INTEGER REFERENCES Stores);
INSERT INTO DVDs (title, store_id)
VALUES ('Attack of the Killer Tomatoes', 500);
ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey"
DETAIL: Key (store_id)=(500) is not present in table "stores".
32. Custom validity checks
CREATE TABLE DVDs (id SERIAL,
title TEXT CHECK (length(title) > 3),
store_id INTEGER REFERENCES Stores);
INSERT INTO DVDs (title, store_id)
VALUES ('AB', 500);
ERROR: new row for relation "dvds" violates check constraint "dvds_title_check"
35. No more bad dates!
INSERT INTO UPDATES (created_at)
VALUES ('32-feb-2008');
ERROR: date/time field value out of range: "32-feb-2008"
LINE 1: insert into updates (created_at) values ('32-feb-2008');
36. Timestamp vs. Interval
testdb=# select now(); -- timestamp: a point in time
now
-------------------------------
2010-10-31 08:58:23.365792+02
(1 row)
testdb=# select now() - interval '3 days'; -- interval: a difference between points in time
?column?
-------------------------------
2010-10-28 08:58:28.870011+02
(1 row)
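A few more combinations of the two types, as a rough sketch. The dates are arbitrary.

```sql
-- Subtracting two timestamps yields an interval (here, 3 days)
SELECT TIMESTAMP '2010-10-31 09:00' - TIMESTAMP '2010-10-28 09:00';

-- Adding an interval to a date yields a timestamp
SELECT DATE '2010-10-31' + INTERVAL '1 month';

-- age() returns an interval expressed in years, months, and days
SELECT age(TIMESTAMP '2010-10-31', TIMESTAMP '1997-10-31');
```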
37. Built-in functions
• Math
• Text processing (including regexps)
• Date/time calculations
• Conditionals (CASE, COALESCE, NULLIF)
for use in queries
• Extensive library of geometrical functions
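A small sampler of the built-in functions and conditionals listed above. The literal values are invented for illustration.

```sql
-- Conditionals
SELECT COALESCE(NULL, 'fallback');      -- first non-NULL argument
SELECT NULLIF(0, 0);                    -- NULL when both arguments are equal
SELECT CASE WHEN 5 > 3 THEN 'bigger' ELSE 'smaller' END;

-- Text processing with a regular expression match
SELECT 'PostgreSQL 9.2' ~ '^Postgre[A-Z]+ [0-9.]+$';

-- Date/time calculation: truncate a timestamp to the start of its month
SELECT date_trunc('month', TIMESTAMP '2012-10-31 08:58:23');

-- Math
SELECT round(sqrt(2)::numeric, 3);
```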
38. Or write your own!
• PL/pgSQL
• PL/Perl
• PL/Python
• PL/Ruby
• PL/R
• PL/Tcl
39. Or write your own!
CREATE OR REPLACE FUNCTION remove_cache_tables() RETURNS VOID AS $$
DECLARE
    r pg_catalog.pg_tables%rowtype;
BEGIN
    -- Visit every table in the public schema whose name matches the pattern
    FOR r IN SELECT * FROM pg_catalog.pg_tables
             WHERE schemaname = 'public'
             AND tablename ILIKE 'cache_%'
    LOOP
        RAISE NOTICE 'Now dropping table %', r.tablename;
        -- quote_ident() keeps mixed-case or unusual names safe in dynamic SQL
        EXECUTE 'DROP TABLE ' || quote_ident(r.tablename);
    END LOOP;
END;
$$ LANGUAGE plpgsql;
40. Another example
CREATE OR REPLACE FUNCTION store_hostname() RETURNS TRIGGER AS $store_hostname$
BEGIN
    NEW.hostname := 'http://' ||
        substring(NEW.url, '(?:http://)?([^/]+)');
    RETURN NEW;
END;
$store_hostname$ LANGUAGE plpgsql;
41. Triggers
• Yes, that last function was a trigger
• Automatically execute functions upon
INSERT, UPDATE, and/or DELETE
• Can execute before or after
• Very powerful, very fast
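A sketch of how the store_hostname() function from the previous slide might be attached to a table. The Links table and the trigger name are hypothetical. The function is assumed to exist already.

```sql
-- Hypothetical table; only the url and hostname columns matter here
CREATE TABLE Links (
    id       SERIAL,
    url      TEXT NOT NULL,
    hostname TEXT
);

-- Run the function before every inserted or updated row is stored
CREATE TRIGGER store_hostname_trigger
    BEFORE INSERT OR UPDATE ON Links
    FOR EACH ROW
    EXECUTE PROCEDURE store_hostname();

-- hostname is filled in by the trigger, not by the INSERT itself
INSERT INTO Links (url) VALUES ('http://www.postgresql.org/docs/');
SELECT url, hostname FROM Links;
```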
42. Function possibilities
• Computing values, strings
• Returning table-like sets of values
• Encapsulating queries
• Dynamically generating queries via strings
• Triggers: Modifying data before it is inserted
or updated
43. Why use a PL/lang?
• Other libraries (e.g., CPAN for Perl)
• Faster, optimized functions (e.g., R)
• Programmer familiarity
• Cached query plans
44. Views and rules
• Views are stored SELECT statements
• Pretend that something is a read-only table
• Rules let you turn it into a read/write table
• Intercept and rewrite incoming query
• Check or change data
• Change where data is stored
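A rough sketch of a view plus a rule that makes it writable. The Stores definition, the view, and the rule name are all assumptions made for the example.

```sql
-- Hypothetical base table
CREATE TABLE Stores (
    id   SERIAL PRIMARY KEY,
    name TEXT,
    city TEXT
);

-- A view is a stored SELECT that can be queried like a table
CREATE VIEW JerusalemStores AS
    SELECT id, name FROM Stores WHERE city = 'Jerusalem';

-- The rule rewrites an INSERT on the view into an INSERT on the base table
CREATE RULE jerusalem_stores_insert AS
    ON INSERT TO JerusalemStores
    DO INSTEAD
    INSERT INTO Stores (name, city) VALUES (NEW.name, 'Jerusalem');

INSERT INTO JerusalemStores (name) VALUES ('Downtown');
SELECT * FROM JerusalemStores;
```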
45. Full-text indexing
• Built into PostgreSQL
• Handles stop words, different languages,
synonyms, and even (often) stemming
• Very powerful, but it can take some time to
get configured correctly
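A minimal full-text search sketch using the built-in 'english' configuration. The Articles table and index name are hypothetical.

```sql
-- Stemming and stop words at work: "foxes" matches fox, "jumped" matches jump
SELECT to_tsvector('english', 'The quick brown foxes jumped over the lazy dogs')
       @@ to_tsquery('english', 'fox & jump');

-- Hypothetical table with a GIN index over the parsed text
CREATE TABLE Articles (id SERIAL PRIMARY KEY, body TEXT);
CREATE INDEX articles_body_fts_idx
    ON Articles USING gin (to_tsvector('english', body));

-- The query repeats the indexed expression so the index can be used
SELECT id
FROM Articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'replication');
```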
46. Transactions
• In PostgreSQL from the beginning
• Use transactions for just about anything:
BEGIN;
DROP TABLE DVDs;
ROLLBACK;
SELECT * FROM DVDs; -- Works!
48. MVCC
• Readers and writers don’t block each other
• “Multi-version concurrency control”
• xmin, xmax on each tuple; a tuple is visible
to transactions whose ID falls between them
• Old versions stick around until vacuumed
• Autovacuum removes even this issue
49. MVCC
• Look at a row’s xmin and xmax
• Look at txid_current()
• Start transaction; look at row’s xmin/xmax
• Look at xmin/xmax on that row from
another session
• Commit, and look again at both!
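The steps above could be tried in psql roughly as follows. The mvcc_demo table is a throwaway name invented for the experiment.

```sql
CREATE TABLE mvcc_demo (id INTEGER, note TEXT);
INSERT INTO mvcc_demo VALUES (1, 'first version');

-- xmin and xmax are hidden system columns; ask for them by name
SELECT xmin, xmax, * FROM mvcc_demo;
SELECT txid_current();

BEGIN;
UPDATE mvcc_demo SET note = 'second version' WHERE id = 1;

-- This session sees the new tuple, whose xmin is our transaction ID
SELECT xmin, xmax, * FROM mvcc_demo;

-- Run the same SELECT from a second session now: it still sees the
-- old tuple, with xmax set to the ID of the updating transaction
COMMIT;

-- After the commit, both sessions see only the new tuple
SELECT xmin, xmax, * FROM mvcc_demo;
```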
50. Downsides of MVCC
• MVCC is usually fantastic
• But if you insert or update many rows, and
then do a COUNT(*), things will be slow
• There are solutions — including more
aggressive auto-vacuuming
• 9.2 introduced index-only scans, which help here
51. Indexing
• Regular, unique indexes
• Functional indexes
• Index calling a function on a column
• Partial indexes
• Index only rows matching criteria
• Cluster table on an index
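A sketch of each index type on a hypothetical Customers table. All table, column, and index names are invented.

```sql
CREATE TABLE Customers (
    id         SERIAL PRIMARY KEY,
    email      TEXT NOT NULL,
    is_active  BOOLEAN NOT NULL DEFAULT true,
    created_at TIMESTAMP NOT NULL DEFAULT now()
);

-- Functional (and unique) index: enforces case-insensitive uniqueness
CREATE UNIQUE INDEX customers_email_lower_idx
    ON Customers (lower(email));

-- Partial index: only rows matching the WHERE clause are indexed
CREATE INDEX customers_active_idx
    ON Customers (created_at) WHERE is_active;

-- Regular index, then physically reorder the table to match it
CREATE INDEX customers_created_at_idx ON Customers (created_at);
CLUSTER Customers USING customers_created_at_idx;
```

CLUSTER is a one-time reordering, so rows added later are not kept in index order until it is run again.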
52. CTEs
• Adds a “WITH” statement, which defines a
sorta-kinda temp table
• You can then query that same temp table
• Makes many queries easier to read, write,
without a real temp table
• Better yet: CTEs can be recursive, for
everything from Fibonacci to org charts
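A minimal recursive CTE, using the Fibonacci example mentioned above. The column names are arbitrary.

```sql
-- Each step carries the current pair (a, b) forward to (b, a + b)
WITH RECURSIVE fib (n, a, b) AS (
    SELECT 1, 0, 1
    UNION ALL
    SELECT n + 1, b, a + b
    FROM fib
    WHERE n < 10
)
SELECT n, a AS fibonacci
FROM fib;
-- Returns ten rows: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
```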
53. Speed and scalability
• MVCC + a smart query optimizer makes
PostgreSQL pretty fast and smart
• Statistics gathered by ANALYZE about table
contents inform the query planner
• Several scan types, join types are weighed
• Benchmarks consistently show excellent
performance with high mixes of read/write
54. WAL
• All activity in the database is put in “write-
ahead logs” before it happens
• If the database server fails, it replays the
WALs, then continues
• You can change how often WALs are
written, to improve performance
• PITR — restore database from WALs
55. Log shipping
• Copy WALs to a second, identical server —
known as “log shipping” — and you have a
backup
• If the primary server goes down, you can
bring the secondary up in its place
• This was known as “warm standby,” and
worked in 8.4
56. Hot standby,
streaming replication
• As of 9.0, you don’t have to do this
• You can have the primary stream the
information to the secondary
• Almost-instant updates
• The secondary machine can answer read-
only queries (“hot standby”), not just
handle failover
57. Extensions
• Provides a standardized mechanism for
downloading, installing, and versioning
extensions
• New data types, functions, languages are
possible
• Download, search via pgxn.org
• Similar to CPAN, PyPi, or Ruby gems
58. SQL/MED
• SQL/MED was introduced in 9.1
• Query information from other databases
(and database-like interfaces)
• So if you have data in MySQL, Oracle,
CSV ... just install a wrapper, and you can
query it like a PostgreSQL table
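As a sketch, here is how a CSV file could be exposed as a table with the file_fdw wrapper that ships with PostgreSQL. The server name, table name, columns, and file path are all hypothetical, and creating the extension requires superuser rights.

```sql
CREATE EXTENSION file_fdw;

-- A "server" is just a named handle for the wrapper
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

-- Hypothetical file and column layout
CREATE FOREIGN TABLE imported_stores (
    id   INTEGER,
    name TEXT
) SERVER csv_files
  OPTIONS (filename '/tmp/stores.csv', format 'csv', header 'true');

-- The file is read at query time, like any other table
SELECT * FROM imported_stores WHERE id < 100;
```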
59. Unlogged tables
• All actions are logged in WALs
• That adds some overhead, which isn’t
required by throwaway data
• Unlogged tables (different from temp
tables!) offer a speedup, in exchange for
less reliability
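A one-statement sketch with a hypothetical cache table. The contents of an unlogged table are emptied after a crash, and they are not replicated to standby servers.

```sql
-- Same syntax as a regular table, plus the UNLOGGED keyword
CREATE UNLOGGED TABLE page_cache (
    url        TEXT PRIMARY KEY,
    body       TEXT,
    fetched_at TIMESTAMP DEFAULT now()
);
```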
60. New in 9.2
• JSON support
• Range types, for handling ranges of values (e.g., date ranges)
• Much more scalable — from 24 cores and
75k queries/sec to 64 cores and 350k
queries/sec
• Index-only scans (“covering indexes”)
• Cascading replication
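A quick taste of two of the 9.2 additions, range types and JSON. The values are invented.

```sql
-- Range types: containment and overlap
SELECT int4range(1, 10) @> 5;
SELECT daterange('2012-01-01', '2012-07-01')
       && daterange('2012-06-15', '2012-08-01');

-- JSON: a validated json type, plus conversion of rows to JSON
SELECT '{"name": "PostgreSQL", "version": 9.2}'::json;
SELECT row_to_json(t)
FROM (SELECT 1 AS id, 'PostgreSQL' AS name) AS t;
```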
61. Web problems
• PostgreSQL is great as a Web backend
• But if you use an ORM (e.g., ActiveRecord),
you are probably losing much of the power
• e.g., foreign keys, CTE, triggers, and views
• No good way to bridge this gap — for now
• There are always methods, but this is an
area that definitely needs some work
62. Tablespaces
• You can create any number of
“tablespaces,” separate storage areas
• Put tables, indexes on different tablespaces
• Most useful with multiple disks
• Separate tables (or parts of a partitioned
table)... or separate tables from indexes
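A sketch of putting a table and its index on different storage. The tablespace name, directory, and table are hypothetical. The directory must already exist, be empty, and be owned by the PostgreSQL system user.

```sql
-- Requires superuser rights
CREATE TABLESPACE fast_disk LOCATION '/mnt/fast_disk/pgdata';

-- Table data goes to the new tablespace
CREATE TABLE Orders (
    id        SERIAL,
    placed_at TIMESTAMP NOT NULL DEFAULT now()
) TABLESPACE fast_disk;

-- The index stays in the default tablespace
CREATE INDEX orders_placed_at_idx
    ON Orders (placed_at) TABLESPACE pg_default;
```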
63. Partitioning
• Combine object-oriented tables, CHECK
clauses, and tablespaces for partitioning
• Example: Invoices from Jan-June go in table
“q12”, and July-December go in table “q34”
• Now PostgreSQL knows where to look
when you SELECT from the parent table
• Note that INSERT requires a trigger
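The invoice example might be sketched like this. The column names, the year 2012, and the function and trigger names are assumptions, since the slide names only the q12 and q34 tables.

```sql
CREATE TABLE Invoices (
    id        SERIAL,
    issued_on DATE NOT NULL,
    amount    NUMERIC
);

-- Each child holds half a year; the CHECK tells the planner what is inside
CREATE TABLE q12 (
    CHECK (issued_on >= DATE '2012-01-01' AND issued_on < DATE '2012-07-01')
) INHERITS (Invoices);

CREATE TABLE q34 (
    CHECK (issued_on >= DATE '2012-07-01' AND issued_on < DATE '2013-01-01')
) INHERITS (Invoices);

-- Route rows inserted into the parent to the matching child
CREATE OR REPLACE FUNCTION invoices_insert_router() RETURNS TRIGGER AS $$
BEGIN
    IF NEW.issued_on >= DATE '2012-01-01' AND NEW.issued_on < DATE '2012-07-01' THEN
        INSERT INTO q12 VALUES (NEW.*);
    ELSIF NEW.issued_on >= DATE '2012-07-01' AND NEW.issued_on < DATE '2013-01-01' THEN
        INSERT INTO q34 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'No partition for date %', NEW.issued_on;
    END IF;
    RETURN NULL;  -- the row was stored in a child, so skip the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER invoices_insert
    BEFORE INSERT ON Invoices
    FOR EACH ROW
    EXECUTE PROCEDURE invoices_insert_router();

-- With constraint exclusion, only the parent and q12 need to be scanned
SELECT * FROM Invoices WHERE issued_on = DATE '2012-03-15';
```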
64. Reflection
• pg_catalog schema contains everything
about your database
• Tables, functions, views, etc.
• You can learn a great deal about
PostgreSQL by looking through the
pg_catalog schema
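A few starting points for exploring the catalog, as a sketch. The 'public' schema filter is just an example.

```sql
-- Tables in the public schema
SELECT schemaname, tablename
FROM pg_catalog.pg_tables
WHERE schemaname = 'public'
ORDER BY tablename;

-- Views, along with the SELECT statement that defines each one
SELECT viewname, definition
FROM pg_catalog.pg_views
WHERE schemaname = 'public';

-- Functions defined in the public schema
SELECT p.proname
FROM pg_catalog.pg_proc AS p
JOIN pg_catalog.pg_namespace AS n ON n.oid = p.pronamespace
WHERE n.nspname = 'public';
```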
65. Advanced uses
• GridSQL: Split a query across multiple
PostgreSQL servers
• Very large-scale data warehousing:
Greenplum
66. Client libraries
• libpq (in C)
• Others by 3rd parties:
• Python
• Ruby
• Perl
• Java (JDBC)
• .NET (npgsql)
• ODBC
• JavaScript (!)
• Just about any language you can imagine
67. Tools
• Yeah, tools are more primitive
• If you love GUIs, and hate the command
line, then PostgreSQL will be hard for you
• PgAdmin and other tools are OK, but not
really up to the task for “real” work
• PgAdmin does provide some graphical
query building and “explain” output
68. Windows compatibility
• It works on Windows
• .NET drivers work, as well
• Logging is far from perfect (can go to the
Windows log tool, but not filtered well)
• Configuration is still in a text file, foreign to
most Windows people
• Windows is still a second-class citizen
69. Who uses it?
• Afilias
• IMDB
• Apple
• Skype
• BASF
• Sourceforge
• Cisco
• Heroku
• CD Baby
• Checkpoint
• Etsy
70. Who supports it?
• EnterpriseDB — products and services
• 2ndQuadrant
• Many freelancers (like me!)
71. PostgreSQL problems
• Tuning is still hard (but getting easier)
• Double quotes
• Lack of good GUI-based tools
• Some features (e.g., materialized views) that
people want without having to resort to
hacks and triggers/rules
• Multi-master (of course!)
72. Bottom line
• PostgreSQL: BSD licensed, easy to install,
easy to use, easy to administer
• Still not quite up to commercial databases
regarding features — but not far behind
• More than good enough for places like
Skype and Afilias; probably good enough
for you!
73. Want to learn more?
• Mailing lists, wikis, and blogs
• All at http://postgresql.org/
• http://planetpostgresql.org
• PostgreSQL training, consulting,
development, hand-holding, and general
encouragement
74. Thanks!
(Any questions?)
reuven@lerner.co.il
http://www.lerner.co.il/
054-496-8405
“reuvenlerner” on Skype/AIM