|< < 22 > >|

Unique

UniqueHash

Duplicates can also be eliminated by hashing.

If the input does not fit in memory

I.e., N > B

open(): Partition the table, so that duplicated rows are always in the same partition as one another:

  • Create B partitions on disk.
  • Use each disk buffer page to write out to one of the partitions.
  • For each input row, hash to one of the partitions.

|< < 22 > >|