🐘 PostgreSQL -
Data Types

Updated at 2015-11-03 22:22

Basic data types you'll need: You can create all most all application just using these data types. But it's good to use things like enumerables or coordinate types when you need them.

bool                            # TRUE or FALSE

integer                         # signed four-byte integer, -2147mil to +2147mil
smallint                        # signed two-byte integer, -32k to +32k
bigint                          # signed eight-byte integer, shitloads
# `integer` is the right choice most of the time.

# `decimal` and `numeric` are the same and should be used for exact numbers
# they are more costly but perform all calculations exactly
numeric(7, 2)                   # 7 numbers before , and 2 after
numeric                         # allows all precisions and scales
# not defining precision and scale might cause compatibility problems
# with other systems in the future

text                            # all strings
jsonb                           # binary decomposed JSON data
uuid                            # universally unique identifier
cidr                            # IPv4 or IPv6 address

timestamp [without time zone]   # always use timestamps w/o timezone, implicit
date                            # year, month and day in a single column
interval                        # time span, not related to any specific time

# there are multiple others like `box` and `macaddr` but are rarely needed.

Prefer text over char or varchar. As of 2015, text has exactly the same performance and space usage while removing any limitations and all limitations should be enforced by domains, not by the data type.

CREATE DOMAIN username_text text CHECK (
        length(VALUE) > 3
    AND length(VALUE) < 200
    AND VALUE ~ '^[A-Za-z][A-Za-z0-9]+$'
);

CREATE TABLE user (
  name  username_text,
  email text
);

citext is also really nice data type. Case-insensitive text, all comparisons are automatically case-insensitive, especially useful for unique columns like username or email.

CREATE EXTENSION IF NOT EXISTS citext WITH SCHEMA ext;

Range Types

Range types are good with time spans like reservation times. tsrange and daterange are the most useful but you can create your own. Avoid tstzrange as you should handle time zones in your applicatin.

CREATE TABLE reservation (room int, during tsrange);
INSERT INTO reservation VALUES (1108, '[2010-01-01 14:30, 2010-01-01 15:30)');

-- Containment
SELECT int4range(10, 20) @> 3;  -- false
SELECT int4range(10, 20) @> 15; -- true

-- Overlap
SELECT numrange(11.1, 22.2) && numrange(25.0, 30.0); -- false
SELECT numrange(11.1, 22.2) && numrange(20.0, 30.0); -- true

-- Bounds
SELECT upper(int8range(15, 25)); -- 25
SELECT lower(int8range(15, 25)); -- 15

-- Intersection
SELECT int4range(10, 20) * int4range(15, 25); -- [15,20)
SELECT int4range(15, 25) * int4range(10, 16); -- [15,16)
SELECT int4range(5, 10)  * int4range(10, 15); -- []

-- Is the range empty?
SELECT isempty(numrange(1, 5));                       -- false
SELECT isempty(int4range(5, 10) * int4range(10, 15)); -- true

[ means that the element is included. ( means that the element is excluded.

-- includes 3, does not include 7, and does include all points in between
SELECT '[3,7)'::int4range;

-- does not include either 3 or 7, but includes all points in between
SELECT '(3,7)'::int4range;

-- includes only the single point 4
SELECT '[4,4]'::int4range;

-- includes no points (and will be normalized to 'empty')
SELECT '[4,4)'::int4range;

Range can be unbound.

SELECT numrange(NULL, 2.2); -- (,2.2)

User-Defined Data Types

CREATE TYPE customer_type AS (
    id          integer,
    firstname   text,
    lastname    text,
    email       text
);

SELECT (1, 'Testy5', 'Tester5', '[email protected]')::customer_type;

unnest() can be used to convert array of typed values back to rows.

SELECT unnest(ARRAY[
    (1,   'Testy5', 'Tester5', '[email protected]')::customer_type,
    (100, 'Testy6', 'Tester6', '[email protected]')::customer_type
]);
-- (1,Testy5,Tester5,[email protected])
-- (100,Testy6,Tester6,[email protected])

Enumerated Types

Enumerated type is a custom data type that has static set of values.

CREATE TYPE mood AS enum ('sad', 'ok', 'happy');

You can set enumerated type to columns like any data type .

-- Note that enumerated type identifiers are case-sensitive.
CREATE TABLE person (
    name         text,
    current_mood mood
);
INSERT INTO person VALUES ('Moe', 'happy');
INSERT INTO person VALUES ('Larry', 'sad');
INSERT INTO person VALUES ('Curly', 'ok');

Enumerated types have an order.

SELECT p.* FROM person p WHERE p.current_mood > 'sad';
-- Returns Moe and Curly.

SELECT p.name
FROM person p
WHERE p.current_mood = (SELECT min(p.current_mood) FROM person);
-- Returns Larry.

Domains

Domains are extended data types that you can attach constraints to e.g. regex. Enums are just group of data types, without constraints. Naming is usually done <USAGE_CONTEXT>_<ORIGINAL_TYPE>

CREATE DOMAIN color_text AS text
   CONSTRAINT red_or_blue
        CHECK (VALUE = 'red' OR VALUE = 'blue');

CREATE DOMAIN url_text AS text
   CONSTRAINT starts_right
        CHECK (VALUE ~ '^https?://');

CREATE DOMAIN url_text AS text
   CONSTRAINT starts_right
        CHECK (VALUE ~* '^https?://')
   CONSTRAINT min_length
        CHECK (length(VALUE) > 10);

CREATE DOMAIN us_postal_code_text AS text
    CHECK(VALUE ~ '^\\d{5}$' OR VALUE ~ '^\\d{5}-\\d{4}$');

This differs from columns constraints so that you can use same domain in multiple tables and changes are centralized.

ALTER DOMAIN url_text
    DROP CONSTRAINT min_length;

ALTER DOMAIN url_text
    ADD CONSTRAINT min_length
        CHECK (length(VALUE) > 15);

To use domains, use its name as the type of the column.

CREATE TABLE rss_feed (
    url     url_text NOT NULL,
    content text
);

Sequences

Sequence is a database object designed for generating unique number identifiers.

You should consider using UUIDs for identifiers though.

The most common usage is with the serial pseudotype. Serials always produce NOT NULL values and adds NOT NULL constraint to the column.

CREATE TABLE person (
    person_id serial
);
-- is the same as
CREATE SEQUENCE person_person_id_seq;
CREATE TABLE person (
    person_id integer NOT NULL DEFAULT nextval('person_person_id_seq')
);
ALTER SEQUENCE person_person_id_seq OWNED BY person.person_id;

-- Make the "user_id" column a primary key; this also creates
-- a UNIQUE constraint and a b+-tree index on the column.
CREATE TABLE user (
    user_id   serial PRIMARY KEY,
    name      text,
    age       integer
);

-- Usage:
INSERT INTO user (name, age) VALUES ('Mozart', 20);
INSERT INTO user (name, age, user_id) VALUES ('Mozart', 20, DEFAULT);

-- Get most recent value generated, error if nothing generated:
SELECT currval( pg_get_serial_sequence('user', 'user_id') );

-- Usage example that returns generated identifier:
INSERT INTO user (name, age) VALUES ('Liszt', 10) RETURNING user_id;

Note that nextval() increments the value of the sequence and isn't rolled back if its transaction is later aborted.

Creating a custom sequences.

CREATE SEQUENCE my_schema.type_id_sequence
    INCREMENT 1
    MINVALUE 1
    MAXVALUE 9999999999999999
    START 4
    CACHE 1;
ALTER TABLE my_schema.type_id_sequence OWNER TO postgres;
GRANT ALL ON TABLE my_schema.type_id_sequence TO postgres;

Using custom sequence for multiple tables.

CREATE SEQUENCE common_id_seq;
CREATE TABLE apple (
    id    int DEFAULT nextval('common_id_seq') NOT NULL,
    price numeric(8,4)
);
CREATE TABLE orange (
    id     int DEFAULT nextval('common_id_seq') NOT NULL,
    weight numeric(8,4)
);