Tuesday, April 16, 2013

Sort stage

1) The Sort stage is a processing stage used to perform sorting operations on input data.

2) Need for sorting:
Some stages require sorted input, for example the Join and Merge stages.
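To illustrate why such stages need sorted input, here is a minimal Python sketch of a merge-style join. It is not DataStage code; the function `merge_join` and the sample records are hypothetical and only show the general technique of walking two key-sorted inputs in step.

```python
def merge_join(left, right, key):
    """Inner-join two lists of dicts that are assumed to be sorted by `key`."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # left key is behind, advance left
        elif lk > rk:
            j += 1          # right key is behind, advance right
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


# Hypothetical sample data, both inputs sorted by cust_id.
orders = [
    {"cust_id": 1, "order": "A"},
    {"cust_id": 2, "order": "B"},
    {"cust_id": 2, "order": "C"},
]
customers = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bob"},
    {"cust_id": 3, "name": "Cy"},
]

joined = merge_join(orders, customers, "cust_id")

# The same orders in unsorted order: the single forward pass misses matches.
unsorted_orders = [orders[1], orders[0], orders[2]]
joined_unsorted = merge_join(unsorted_orders, customers, "cust_id")

print(len(joined), len(joined_unsorted))
```

With sorted input all three orders find their customer; with the unsorted input the single forward pass skips rows without raising any error, which is why the sort has to happen upstream.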

3) The Sort stage requires a ‘key’ to be specified, by which the sort is performed. Multiple sort keys can be specified.

4) The sort operation is performed partition-wise: each partition is sorted independently of the others. To sort a complete set of data, change the Sort stage execution mode to sequential.
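The difference between a partition-wise sort and a sequential sort can be sketched in plain Python. This is only an illustration, not DataStage code: the round-robin split and the simple concatenation used to collect the partitions are assumptions chosen to keep the example small.

```python
values = [7, 2, 9, 4, 1, 8, 3, 6]
num_partitions = 2

# Assumed round-robin partitioning: every other value goes to the same partition.
partitions = [values[p::num_partitions] for p in range(num_partitions)]

# Partition-wise sort: each partition is sorted on its own.
sorted_partitions = [sorted(part) for part in partitions]

# Collecting by simple concatenation keeps order inside each partition only.
collected = [v for part in sorted_partitions for v in part]

# Sequential sort: one sort over the complete data set.
sequential = sorted(values)

print(collected)   # ordered within each partition, not overall
print(sequential)  # ordered overall
```

Each partition is in order, but the collected result is not, which is the reason for switching the execution mode to sequential when a total order is needed.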

5) There are two ways a sort can be performed in DataStage:

a) Within a stage: on the input link's Partitioning tab, set partitioning to anything other than Auto, which makes the sort options available.

b) In a separate Sort stage, which has more options such as Allow Duplicates, case sensitivity, and sort order (ascending/descending).

By default, both methods use the tsort operator, which can be identified in the job score.

6) Partitioning keys are often different from sorting keys.

Keyed partitioning (e.g. Hash) is used to group related records into the same partition, whereas sort keys are used to establish order within each partition.
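As a rough illustration of the two roles, the Python sketch below hash-partitions records on one key and then sorts each partition on a longer key. It is not DataStage code: the sample rows, the `hash_partition` helper, and the use of `crc32` as the hash function are all assumptions made for the example.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical records: (customer, date, amount)
rows = [
    ("cust_b", "2013-04-03", 40),
    ("cust_a", "2013-04-02", 10),
    ("cust_c", "2013-04-01", 25),
    ("cust_a", "2013-04-01", 30),
    ("cust_b", "2013-04-01", 15),
]


def hash_partition(rows, num_partitions):
    """Group rows by a hash of the partitioning key (customer only)."""
    parts = defaultdict(list)
    for row in rows:
        parts[crc32(row[0].encode()) % num_partitions].append(row)
    return parts


# Partitioning key: customer. All rows for a customer land in one partition.
partitions = hash_partition(rows, 2)

# Sort keys: customer, then date. Order is established inside each partition.
sorted_partitions = {
    p: sorted(part, key=lambda r: (r[0], r[1]))
    for p, part in partitions.items()
}

for p, part in sorted(sorted_partitions.items()):
    print(p, part)
```

Here the partitioning key is just the customer, while the sort keys are customer plus date, so the two key lists differ even though they overlap.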
