Sunday, April 14, 2013

Filter stage

1)Filter stage  is used to transform (not modify) records from the input link based on a specific condition

2)Different criteria/requirements can be  specified  to direct data  to different output  links  from  the Filter stage.

3)This processing stage can have a single input link and any number of output links. It can also, optionally, have a reject link.

4)Switch stage supports only 128 output links whereas Filter stage can have any number of output links

Friday, April 12, 2013

When to use which stage in Datastage?

Copy STAGE-To drop a Particular column

Sort STAGE-sorting,generating Key change and similar to order by clause in oracle

Filter STAGE-Similar to where clause in oracle but we can not perform Join operation

Lookup,Join,Merge-To perform Join operation

Pivot Enterprise STAGE-Rows to columns and columns to Rows

External Filter STAGE-Filter the records by using Unix filter commands like Grep etc

MODIFY STAGE-Metadata conversion,Null Handling and similar to conversion functions in oracle

FUNNEL STAGE -Combining the multiple input data into a single output.Metadata should be same for all the inputs

REMOVE DUPLICATES STAGE-To  remove  duplicate  values  from  a single sorted  input.

ENCODE / DECODE STAGES:To encode/compress a data set using UNIX encoding  commands like gzip etc

TRANSFORMER STAGE:
a)Filtering the Data(constraints)
b)Metadata conversion(Using Functions)
c)Rows to columns and columns to Rows(Using Stage variables)
d)Looping
e)Creating a counter(Using macros)-Counter using Transformer

SURROGATE KEY GENERATOR STAGE-To generate SURROGATE KEYs similar to oracle Database sequence

Aggregator Stage:To perform Group by Operations like max,min etc similar to Group by clause in oracle

ROW GENERATOR STAGE:To generate a set of mock data fitting the specified metadata when no real data is available

XML OUTPUT STAGE -To convert tabular data such as tables and sequential files to XML hierarchical structures.

SWITCH STAGE-  It performs an operation similar to  the  switch  statement  in  C and to filter the data

CHANGE CAPTURE STAGE-To identify Delta changes(inserts,updates,deletes etc) between two sources

oracle connector-To connect to the oracle Database.

Aggregator stage

1)Aggregator  stage  is  a  processing  stage  that  can  have  one  input  and  one  output  link.  It classifies  records  from  the  input  link  into   groups  and  computes  the  totals  or  performs specified aggregator  functions  for each group

2)Records can be grouped on one or more keys

3)In parallel environment, we need to be careful when partitioning. It can  affect the result of the aggregator.  If  the  records  that  fall  in  the same  group  are  in  different  partitions,  then  the generated output will be wrong.Therefore,  it  is better  to do Hash partition on grouping keys  before  the  aggregator  stage so that records with same keys will go to same partition.

4)In Aggregator two grouping methods(Hash and sort) are present.Please find the following link-Grouping Methods for more information about grouping methods in aggregator stage

Copy Stage


1)Copy stage  is  a  processing  stage  that  can  have  a  single  input and  any  number  of  output links

2)Copy stage  is  used  to  copy  a  single  input  data  set  to  a  number  of  output  data  sets

3)This stage is generally used for following things

a)Columns can be dropped. 
b)The order of the columns can be altered.