*Query Optimization*

*Query Optimization*

Most databases will optimize queries the best they can and in most cases, better than you can.

Never optimize queries if you are not hitting performance limits. But you can try if performance becomes an issue.

It is also "good to know" stuff so you do not make anything stupid.

Make sure you measure the results when attempting to optimize. Most database systems generate some kind of query plan for queries and those are usually good place to start the optimization.

```
-- PostgreSQL, see query plan for a query.
EXPLAIN
SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 100
AND t1.unique2 = t2.unique2;
-- PostgreSQL, run the actual query and time it.
EXPLAIN ANALYZE
SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 100
AND t1.unique2 = t2.unique2;
```

# Query Optimization

- SELECT == projection, restrict attributes.
- WHERE == selection, select only with these attributes.

Main target is to do SELECTs before JOINs. Here is the normal optimization workflow.

```
Cascade WHERE.
(Use rule 1)
Move SELECTs as down as possible
(Use rules 2, 4, 6 and 10).
Rearrange the leaves so the most restrictive SELECTs are done first.
(Use rule 5 and 9)
Combine CROSS JOIN and WHERE into JOIN
(Use rule 12)
Break down and move SELECT lists as down as possible
(Use rule 3, 4, 7 and 11)
Finally see if any subtrees has a single evaluation algorithm.
```

Cascade of WHERE:

`SELECT * FROM r WHERE C1 AND C2 AND C3 AND C4 SELECT * FROM r WHERE C1 ( WHERE C2 ( WHERE C3 ( WHERE C4 )))`

Commutativity of WHERE:

`SELECT * FROM r WHERE C1 ( WHERE C2 ) SELECT * FROM r WHERE C2 ( WHERE C1 )`

Cascade of SELECT:

`SELECT L1 FROM r ( SELECT L2 ( SELECT L3 )) SELECT L1 FROM r WHERE L1 is_subset_of L2 is_subset_of L3`

Commuting WHERE with SELECT:

`SELECT L ( WHERE C (R) ) WHERE C ( SELECT L (R) ) -- if condition C refers only to attributes in L.`

Commutativity of JOIN:

`SELECT * FROM s NATURAL JOIN r SELECT * FROM r NATURAL JOIN s -- include only one colums for all matched columns.`

Commuting WHERE with JOIN:

`SELECT * FROM r NATURAL JOIN s WHERE c (SELECT * FROM r WHERE c) NATURAL JOIN (SELECT * FROM s WHERE c)`

Commuting SELECT and JOIN:

`SELECT L FROM ( r JOIN s WHERE c) SELECT L FROM ( (SELECT L1 FROM r) JOIN (SELECT L2 FROM s) WHERE c ) -- L = {A1,A2,An,B1,B2,Bn} -- where A are attributes of R and B are attributes of B -- L1 = {A1,A2,An,An+1,An+k} -- where A are attributes of R occuring in L or C -- L2 = {B1,B2,Bn,Bn+1,Bn+k} -- where B are attributes of S occuring in L or C`

Commutativity of UNION and INTERSECTION:

`(SELECT l FROM r) UNION (SELECT l FROM s) (SELECT l FROM s) UNION (SELECT l FROM r) (SELECT l FROM r) INTERSECT (SELECT l FROM s) (SELECT l FROM s) INTERSECT (SELECT l FROM r)`

Associativity of NATURAL JOIN, CROSS JOIN, UNION and INTERSECTION:

`( r <SET_OPERATION> s ) <SET_OPERATION> t r <SET_OPERATION> ( s <SET_OPERATION> t)`

Commuting WHERE with set operations:

`SELECT * FROM r <SET_OPERATION> s WHERE C (SELECT * FROM r WHERE c) <SET_OPERATION> (SELECT * FROM s WHERE c)`

Commuting SELECT with UNION:

`SELECT l FROM (r UNION s) (SELECT l FROM r) UNION (SELECT l FROM s)`

WHERE and CROSS JOIN to JOIN:

`WHERE c FROM (r CROSS JOIN s) FROM r UNION_c s -- if C is join condition.`

DeMorgan:

```
NOT (C1 AND C2) === (NOT C1) OR (NOT C2)
NOT (C1 OR C2) === (NOT C1) AND (NOT C2)
```

# Source

- Databases 1, University of Turku
- Databases 2, University of Turku