Sunday, April 14, 2013

Merge stage


1) The Merge stage is a processing stage. It can have any number of input links (one master link plus one or more update links), a single output link, and as many reject links as there are update links.

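2) The input datasets to the Merge stage must be key-partitioned and sorted. Key partitioning ensures that rows with the same key column values are located in the same partition and are processed by the same node. Sorting on the key lets the stage pair up matching rows as it reads through each partition in order.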

3) The Merge stage combines the master data with the data from one or more update links, matching rows whose key column values are equal.

4) The master and update links must be free of duplicate key values to ensure correct results. If the input data is not duplicate-free, the output will be incorrect.

5) Check the link ordering to make sure the master and update links are assigned correctly; otherwise the output will be incorrect.
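To illustrate point 2, here is a minimal Python sketch of key partitioning followed by a sort within each partition. It is not DataStage code: the function name `hash_partition` is hypothetical, and CRC32 (from Python's `zlib`) is only a stand-in for whatever hash function DataStage uses internally. The point it shows is that equal key values always hash to the same partition, so a master row and its update rows end up together.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Split rows into partitions by hashing the key column, then sort each partition.

    Illustrative only: CRC32 stands in for the hash DataStage actually uses.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Equal key values always give the same CRC32, so they land in the same partition.
        index = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    for part in partitions:
        # Sort each partition on the key so matching rows can be paired in one ordered pass.
        part.sort(key=lambda r: r[key])
    return partitions


master = [{"id": 3, "name": "c"}, {"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
update = [{"id": 2, "city": "Pune"}, {"id": 3, "city": "Oslo"}]

master_parts = hash_partition(master, "id", 2)
update_parts = hash_partition(update, "id", 2)
for i, (m_part, u_part) in enumerate(zip(master_parts, update_parts)):
    print(i, [r["id"] for r in m_part], [r["id"] for r in u_part])
```

Because both links are partitioned with the same function on the same key, each update row's matching master row sits in the partition with the same index, which is what lets every node do its share of the merge independently.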
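Points 1, 3, 4 and 5 can be pulled together in a small Python sketch of what the stage does within one partition. This is an approximation under stated assumptions, not DataStage's implementation: `merge_stage` is a hypothetical name, unmatched master rows are assumed to be kept (not dropped), update columns with the same name as a master column simply overwrite it, and unsorted or duplicate-key input raises `ValueError` here to make points 2 and 4 explicit.

```python
def merge_stage(master, updates, key):
    """Sketch of Merge-stage behaviour for one partition of sorted, duplicate-free input.

    master:  rows of the master link, sorted on `key`
    updates: one list of rows per update link, each sorted on `key`, in link order
    Returns (output, rejects); rejects[i] holds the unmatched rows of updates[i].
    """
    for link in [master, *updates]:
        keys = [row[key] for row in link]
        if keys != sorted(keys):
            raise ValueError("input link is not sorted on the key")
        if len(keys) != len(set(keys)):
            raise ValueError("input link contains duplicate key values")

    output = []
    rejects = [[] for _ in updates]  # one reject link per update link
    cursors = [0] * len(updates)     # read position in each update link

    for m_row in master:
        merged = dict(m_row)
        for i, link in enumerate(updates):
            # Update rows whose key sorts before this master key have no master: reject them.
            while cursors[i] < len(link) and link[cursors[i]][key] < m_row[key]:
                rejects[i].append(link[cursors[i]])
                cursors[i] += 1
            # Matching key: add the update row's columns to the master row.
            if cursors[i] < len(link) and link[cursors[i]][key] == m_row[key]:
                merged.update(link[cursors[i]])
                cursors[i] += 1
        # Assumption: master rows with no match are kept rather than dropped.
        output.append(merged)

    # Update rows left after the last master row also had no matching master.
    for i, link in enumerate(updates):
        rejects[i].extend(link[cursors[i]:])
    return output, rejects


master = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
updates = [
    [{"id": 2, "city": "Pune"}, {"id": 4, "city": "Oslo"}],  # update link 1
    [{"id": 1, "zip": "100"}],                                # update link 2
]
output, rejects = merge_stage(master, updates, "id")
print(output)   # [{'id': 1, 'name': 'a', 'zip': '100'}, {'id': 2, 'name': 'b', 'city': 'Pune'}, {'id': 3, 'name': 'c'}]
print(rejects)  # [[{'id': 4, 'city': 'Oslo'}], []]
```

Passing an update link in the `master` position (the link-ordering mistake from point 5) would change which rows reach the output and which are rejected, even though every individual key comparison is still correct.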
