UNION removes duplicates
When you need to combine two sets of data together, we use the UNION operator. That comes in two flavors: UNION and UNION ALL. The default is to remove duplicates between the two sets whereas UNION ALL does no filtering.
Pop quiz! Given the following sets A and B
What's the result of
SELECT * FROM A UNION SELECT * FROM B;
Piece of cake, we start with everything in A and get the values in B that aren't in A.
So we're looking at 1, 5, 9 7, 3, 3, 2, 3
Except of course that's not what is actually happening. UNION is actually going to smash both sets of data together and then take the distinct results. Or it does a distinct within each result set, smashes them together and takes one last pass to remove duplicates. I don't know or care about the actual mechanics, what I care about is the final outcome.
We actually end up with a result of 1, 5, 9, 7, 3, 2. In the fifteen years I've been writing SQL statements, I don't think I ever realized that behavior of the final result set being distinct. I thought it was purely an intra set dedupe process.
I thought wrong