🐘 PostgreSQL - Data Types
Basic data types you'll need: You can create all most all application just using these data types. But it's good to use things like enumerables or coordinate types when you need them.
bool # TRUE or FALSE
integer # signed four-byte integer, -2147mil to +2147mil
smallint # signed two-byte integer, -32k to +32k
bigint # signed eight-byte integer, shitloads
# `integer` is the right choice most of the time.
# `decimal` and `numeric` are the same and should be used for exact numbers
# they are more costly but perform all calculations exactly
numeric(7, 2) # 7 numbers before , and 2 after
numeric # allows all precisions and scales
# not defining precision and scale might cause compatibility problems
# with other systems in the future
text # all strings
jsonb # binary decomposed JSON data
uuid # universally unique identifier
cidr # IPv4 or IPv6 address
timestamp [without time zone] # always use timestamps w/o timezone, implicit
date # year, month and day in a single column
interval # time span, not related to any specific time
# there are multiple others like `box` and `macaddr` but are rarely needed.
Prefer text
over char
or varchar
. As of 2015, text has exactly the same performance and space usage while removing any limitations and all limitations should be enforced by domains, not by the data type.
CREATE DOMAIN username_text text CHECK (
length(VALUE) > 3
AND length(VALUE) < 200
AND VALUE ~ '^[A-Za-z][A-Za-z0-9]+$'
);
CREATE TABLE user (
name username_text,
email text
);
citext
is also really nice data type. Case-insensitive text, all comparisons are automatically case-insensitive, especially useful for unique columns like username or email.
CREATE EXTENSION IF NOT EXISTS citext WITH SCHEMA ext;
Range Types
Range types are good with time spans like reservation times. tsrange
and daterange
are the most useful but you can create your own. Avoid tstzrange
as you should handle time zones in your applicatin.
CREATE TABLE reservation (room int, during tsrange);
INSERT INTO reservation VALUES (1108, '[2010-01-01 14:30, 2010-01-01 15:30)');
-- Containment
SELECT int4range(10, 20) @> 3; -- false
SELECT int4range(10, 20) @> 15; -- true
-- Overlap
SELECT numrange(11.1, 22.2) && numrange(25.0, 30.0); -- false
SELECT numrange(11.1, 22.2) && numrange(20.0, 30.0); -- true
-- Bounds
SELECT upper(int8range(15, 25)); -- 25
SELECT lower(int8range(15, 25)); -- 15
-- Intersection
SELECT int4range(10, 20) * int4range(15, 25); -- [15,20)
SELECT int4range(15, 25) * int4range(10, 16); -- [15,16)
SELECT int4range(5, 10) * int4range(10, 15); -- []
-- Is the range empty?
SELECT isempty(numrange(1, 5)); -- false
SELECT isempty(int4range(5, 10) * int4range(10, 15)); -- true
[
means that the element is included. (
means that the element is excluded.
-- includes 3, does not include 7, and does include all points in between
SELECT '[3,7)'::int4range;
-- does not include either 3 or 7, but includes all points in between
SELECT '(3,7)'::int4range;
-- includes only the single point 4
SELECT '[4,4]'::int4range;
-- includes no points (and will be normalized to 'empty')
SELECT '[4,4)'::int4range;
Range can be unbound.
SELECT numrange(NULL, 2.2); -- (,2.2)
User-Defined Data Types
CREATE TYPE customer_type AS (
id integer,
firstname text,
lastname text,
email text
);
SELECT (1, 'Testy5', 'Tester5', 'test5@project-a.com')::customer_type;
unnest()
can be used to convert array of typed values back to rows.
SELECT unnest(ARRAY[
(1, 'Testy5', 'Tester5', 'test5@project-a.com')::customer_type,
(100, 'Testy6', 'Tester6', 'test6@project-a.com')::customer_type
]);
-- (1,Testy5,Tester5,test5@project-a.com)
-- (100,Testy6,Tester6,test6@project-a.com)
Enumerated Types
Enumerated type is a custom data type that has static set of values.
CREATE TYPE mood AS enum ('sad', 'ok', 'happy');
You can set enumerated type to columns like any data type .
-- Note that enumerated type identifiers are case-sensitive.
CREATE TABLE person (
name text,
current_mood mood
);
INSERT INTO person VALUES ('Moe', 'happy');
INSERT INTO person VALUES ('Larry', 'sad');
INSERT INTO person VALUES ('Curly', 'ok');
Enumerated types have an order.
SELECT p.* FROM person p WHERE p.current_mood > 'sad';
-- Returns Moe and Curly.
SELECT p.name
FROM person p
WHERE p.current_mood = (SELECT min(p.current_mood) FROM person);
-- Returns Larry.
Domains
Domains are extended data types that you can attach constraints to e.g. regex. Enums are just group of data types, without constraints. Naming is usually done <USAGE_CONTEXT>_<ORIGINAL_TYPE>
CREATE DOMAIN color_text AS text
CONSTRAINT red_or_blue
CHECK (VALUE = 'red' OR VALUE = 'blue');
CREATE DOMAIN url_text AS text
CONSTRAINT starts_right
CHECK (VALUE ~ '^https?://');
CREATE DOMAIN url_text AS text
CONSTRAINT starts_right
CHECK (VALUE ~* '^https?://')
CONSTRAINT min_length
CHECK (length(VALUE) > 10);
CREATE DOMAIN us_postal_code_text AS text
CHECK(VALUE ~ '^\\d{5}$' OR VALUE ~ '^\\d{5}-\\d{4}$');
This differs from columns constraints so that you can use same domain in multiple tables and changes are centralized.
ALTER DOMAIN url_text
DROP CONSTRAINT min_length;
ALTER DOMAIN url_text
ADD CONSTRAINT min_length
CHECK (length(VALUE) > 15);
To use domains, use its name as the type of the column.
CREATE TABLE rss_feed (
url url_text NOT NULL,
content text
);
Sequences
Sequence is a database object designed for generating unique number identifiers.
You should consider using UUIDs for identifiers though.
The most common usage is with the serial
pseudotype. Serials always produce NOT NULL
values and adds NOT NULL
constraint to the column.
CREATE TABLE person (
person_id serial
);
-- is the same as
CREATE SEQUENCE person_person_id_seq;
CREATE TABLE person (
person_id integer NOT NULL DEFAULT nextval('person_person_id_seq')
);
ALTER SEQUENCE person_person_id_seq OWNED BY person.person_id;
-- Make the "user_id" column a primary key; this also creates
-- a UNIQUE constraint and a b+-tree index on the column.
CREATE TABLE user (
user_id serial PRIMARY KEY,
name text,
age integer
);
-- Usage:
INSERT INTO user (name, age) VALUES ('Mozart', 20);
INSERT INTO user (name, age, user_id) VALUES ('Mozart', 20, DEFAULT);
-- Get most recent value generated, error if nothing generated:
SELECT currval( pg_get_serial_sequence('user', 'user_id') );
-- Usage example that returns generated identifier:
INSERT INTO user (name, age) VALUES ('Liszt', 10) RETURNING user_id;
Note that nextval()
increments the value of the sequence and isn't rolled back if its transaction is later aborted.
Creating a custom sequences.
CREATE SEQUENCE my_schema.type_id_sequence
INCREMENT 1
MINVALUE 1
MAXVALUE 9999999999999999
START 4
CACHE 1;
ALTER TABLE my_schema.type_id_sequence OWNER TO postgres;
GRANT ALL ON TABLE my_schema.type_id_sequence TO postgres;
Using custom sequence for multiple tables.
CREATE SEQUENCE common_id_seq;
CREATE TABLE apple (
id int DEFAULT nextval('common_id_seq') NOT NULL,
price numeric(8,4)
);
CREATE TABLE orange (
id int DEFAULT nextval('common_id_seq') NOT NULL,
weight numeric(8,4)
);