Unique
UniqueHash
Duplicates can also be eliminated by hashing.
If the input does not fit in memory
I.e., N > B
open(): Partition the table, so that duplicated rows
are always in the same partition as one another:
- Create B partitions on disk.
- Use each disk buffer page to write out to one of the partitions.
- For each input row, hash to one of the partitions.
|