Friday, April 12, 2013

Aggregator stage

1) The Aggregator stage is a processing stage that can have one input link and one output link. It classifies records from the input link into groups and computes totals, or performs the specified aggregate functions, for each group.

2) Records can be grouped on one or more keys.

3) In a parallel environment, partitioning needs care because it can affect the result of the Aggregator. If records that belong to the same group sit in different partitions, each partition produces its own partial row for that group and the generated output is wrong. It is therefore better to hash partition on the grouping keys before the Aggregator stage, so that records with the same keys go to the same partition.
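The partitioning problem in point 3 can be shown with a small Python sketch. This is only a simulation under stated assumptions, not DataStage code: the helper names (`aggregate`, `round_robin_partition`, `hash_partition`, `run_parallel`), the sample records, and the use of `crc32` as the hash function are all made up for illustration.

```python
from collections import defaultdict
from zlib import crc32


def aggregate(partition):
    """Sum amounts per key inside ONE partition (one aggregator instance)."""
    totals = defaultdict(int)
    for key, amount in partition:
        totals[key] += amount
    return dict(totals)


def round_robin_partition(records, n):
    """Deal records out in turn, ignoring the grouping key."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def hash_partition(records, n):
    """Send every record with the same key to the same partition."""
    parts = [[] for _ in range(n)]
    for key, amount in records:
        parts[crc32(key.encode()) % n].append((key, amount))
    return parts


def run_parallel(parts):
    """Collect the output rows of every partition's aggregator."""
    rows = []
    for part in parts:
        rows.extend(aggregate(part).items())
    return sorted(rows)


# Hypothetical input: (grouping key, amount)
records = [("A", 10), ("A", 20), ("B", 5), ("B", 7), ("C", 3), ("A", 1)]

split_rows = run_parallel(round_robin_partition(records, 2))
hashed_rows = run_parallel(hash_partition(records, 2))

print(split_rows)   # keys A and B appear twice, each with a partial total
print(hashed_rows)  # one row per key with the full total
```

With round-robin partitioning, keys A and B are split across both partitions, so each appears twice in the output with a partial sum. With hash partitioning on the key, every group lands in a single partition and gets exactly one row.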

4) The Aggregator stage offers two grouping methods: Hash and Sort. See the linked post, Grouping Methods, for more information about grouping methods in the Aggregator stage.
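As a rough illustration of the two grouping ideas, the Python sketch below contrasts the general techniques: hash grouping keeps a running total for every group in a lookup table, while sort grouping expects input already sorted on the key and finishes one group at a time. It shows the generic approach only and is not how the stage is implemented internally; the function names and sample data are assumptions.

```python
from itertools import groupby
from operator import itemgetter


def hash_group_sum(records):
    """Hash-style grouping: input order does not matter, but a running
    total for every distinct key is held in memory at once."""
    totals = {}
    for key, amount in records:
        totals[key] = totals.get(key, 0) + amount
    return totals


def sort_group_sum(sorted_records):
    """Sort-style grouping: input must already be sorted on the key, so
    only the current group is held before its total is emitted."""
    for key, group in groupby(sorted_records, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)


# Hypothetical input: (grouping key, amount)
records = [("B", 5), ("A", 10), ("B", 7), ("A", 20)]

hash_result = hash_group_sum(records)
sort_result = dict(sort_group_sum(sorted(records, key=itemgetter(0))))

print(hash_result)
print(sort_result)
```

Both approaches give the same totals. The trade-off in this sketch is memory for all groups (hash) versus the need for sorted input (sort).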
