Stuff: Aggregator stage

Friday, April 12, 2013

Aggregator stage

1)Aggregator stage is a processing stage that can have one input and one output link. It classifies records from the input link into groups and computes the totals or performs specified aggregator functions for each group

2)Records can be grouped on one or more keys

3)In parallel environment, we need to be careful when partitioning. It can affect the result of the aggregator. If the records that fall in the same group are in different partitions, then the generated output will be wrong.Therefore, it is better to do Hash partition on grouping keys before the aggregator stage so that records with same keys will go to same partition.

4)In Aggregator two grouping methods(Hash and sort) are present.Please find the following link-Grouping Methods for more information about grouping methods in aggregator stage

Pages

Friday, April 12, 2013

Aggregator stage

No comments:

Post a Comment