2) Records can be grouped on one or more keys.
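To make "one or more keys" concrete, here is a small Python sketch (not DataStage code) of what the Aggregator does conceptually. It assumes hypothetical sales records with illustrative `region`, `product` and `amount` columns. With several keys, a group is identified by the combination of all key values, not by each key separately.

```python
from collections import defaultdict

# Hypothetical sales records; column names are illustrative only.
rows = [
    {"region": "east", "product": "pen", "amount": 10},
    {"region": "east", "product": "pen", "amount": 5},
    {"region": "east", "product": "ink", "amount": 7},
    {"region": "west", "product": "pen", "amount": 3},
]

def aggregate(rows, keys):
    """Sum 'amount' per group; a group is identified by the values of all key columns together."""
    totals = defaultdict(int)
    for row in rows:
        group = tuple(row[k] for k in keys)  # composite grouping key
        totals[group] += row["amount"]
    return dict(totals)

print(aggregate(rows, ["region"]))             # {('east',): 22, ('west',): 3}
print(aggregate(rows, ["region", "product"]))  # {('east', 'pen'): 15, ('east', 'ink'): 7, ('west', 'pen'): 3}
```

Adding a second key splits the `east` group into two finer groups, which is exactly what happens when you add another grouping key in the stage.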
3) In a parallel environment, we need to be careful with partitioning, because it can affect the result of the Aggregator. Each partition aggregates only the records it receives. If records that belong to the same group end up in different partitions, each partition produces its own partial row for that group, so the output contains several rows for one group instead of a single correct row. Therefore, it is better to hash partition on the grouping keys before the Aggregator stage, so that records with the same key values go to the same partition.
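The partitioning problem can be simulated in a few lines of Python. This is only a model, assuming two partitions and a simple sum. Round-robin stands in for any partitioner that ignores the key, and `zlib.crc32` is used as a stand-in hash function (DataStage's own hash function is not shown here). Each partition aggregates independently and the results are then collected, as in a parallel job.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 2
rows = [("A", 1), ("A", 2), ("B", 3), ("B", 4)]  # (grouping key, value)

def round_robin(rows, n):
    """Distribute rows in turn, ignoring the key (stand-in for a key-unaware partitioner)."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n):
    """Send each row to the partition chosen by hashing its grouping key."""
    parts = [[] for _ in range(n)]
    for key, value in rows:
        parts[zlib.crc32(key.encode()) % n].append((key, value))
    return parts

def run_aggregator(parts):
    """Aggregate every partition on its own, then collect all partition outputs."""
    output = []
    for part in parts:
        totals = defaultdict(int)
        for key, value in part:
            totals[key] += value
        output.extend(totals.items())
    return sorted(output)

print(run_aggregator(round_robin(rows, NUM_PARTITIONS)))     # [('A', 1), ('A', 2), ('B', 3), ('B', 4)]  wrong
print(run_aggregator(hash_partition(rows, NUM_PARTITIONS)))  # [('A', 3), ('B', 7)]  correct
```

With round-robin, key `A` is split across both partitions and comes out as two partial rows. With hash partitioning, every row for a key lands in the same partition, so the result is correct no matter which partition each key hashes to.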
4) The Aggregator stage offers two grouping methods: Hash and Sort. Please refer to the following link, Grouping Methods, for more information about the grouping methods in the Aggregator stage.
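As a rough illustration, the two methods are usually described like this: the hash method accepts input in any order but keeps every group in an in-memory table until the input ends, while the sort method expects input already sorted on the grouping keys and only needs to hold the current group, emitting it when the key changes. The Python sketch below models that behaviour under those assumptions; it is not the stage's actual implementation.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

def hash_group(rows):
    """Hash-style grouping: any input order; all groups are held in memory until input ends."""
    table = defaultdict(int)
    for key, value in rows:
        table[key] += value
    return sorted(table.items())  # sorted only to make the output easy to compare

def sort_group(rows):
    """Sort-style grouping: input must be sorted on the key; one group is held at a time
    and yielded as soon as the key changes."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield (key, sum(value for _, value in group))

rows = [("B", 3), ("A", 1), ("B", 4), ("A", 2)]
print(hash_group(rows))                # [('A', 3), ('B', 7)]
print(list(sort_group(sorted(rows))))  # [('A', 3), ('B', 7)]
print(list(sort_group(rows)))          # unsorted input: [('B', 3), ('A', 1), ('B', 4), ('A', 2)]
```

The last line shows why the sort method depends on correctly sorted input: without it, the same key shows up in separate runs and is emitted more than once.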