Left Self Exclusion Joins
From Zanecorpwiki
This uses the SQL Example Data Setup.
In general, the technique is useful when you want to select a particular member of a group. Particularly useful to collapse/create summaries crosses resulting from joined tables.
Say we want the lowest priced supplier of a widget and we use:
SELECT is1.item, is1.supplier, is1.price
FROM itemsupplier is1
LEFT JOIN itemsupplier is2
ON is1.item=is2.item AND is1.price > is2.price
WHERE is2.item IS NULL;
We get the right answer, but what is going on?
- recall that a left (outer) join includes every row from the left hand side of the join--subject to the WHERE-clause--whether it matches the ON-clause or not
- so, the WHERE-clause specifying 'is2 IS NULL' means we want those rows where the join is to a non-self
To see this run:
SELECT *
FROM itemsupplier is1
LEFT JOIN itemsupplier is2
ON is1.item=is2.item AND is1.price > is2.price;
Which yields:
supplier | item | price | supplier | item | price ----------------------------------------------------------- Acme | widget | 3.00 | NULL | NULL | NULL Acme | dongle | 8.00 | Corp | dongle | 6.00 Acme | thingy | 10.00 | Ma and Pop | thingy | 9.75 Corp | widget | 4.50 | Acme | widget | 3.00 Corp | dongle | 6.00 | NULL | NULL | NULL Ma and Pop | thingy | 9.75 | NULL | NULL | NULL
The join matches up every supplier with every other supplier selling the same item (is1.item=is2.item) and where the first supplier has a higher price. In the case of the lowest price supplier, there is no match, so the LEFT JOIN means the is2 values will be null because the JOIN-clause cannot be satisfied.
So to select the lowest priced supplier, we look for those who have no lower priced supplier (is2.supplier IS NULL).
The 'self exclusion' part comes from the fact that the non-equal join clause (is1.price > is2.price in this case) excludes a row from joining with itself. (TODO: is that right?)
The idea of "self" here isn't limited to a single row. Consider a query of the form:
SELECT * FROM registration r1 JOIN transaction tr ON r1.transactionId=tr.id LEFT JOIN registration r2 ON r2.transactionId=r1.transacationId AND r.id > r2.id WHERE r2.id IS NULL;
This gets you a row for each transaction crossed with the first entry. Useful if there's something special about the first entry, or if you need to select on some data in the registration table in the where clause.


