1) The Filter stage is used to transfer (not modify) records from the input link, based on a specified condition.
2) Different criteria/requirements can be specified to direct data to different output links from the Filter stage.
3) This processing stage can have a single input link and any number of output links. It can also, optionally, have a reject link.
4) The Switch stage supports at most 128 output links, whereas the Filter stage can have any number of output links.
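To make point 2 concrete, here is a minimal Python sketch of the routing behaviour described above. It is not DataStage syntax; the column names and conditions are made up. Each record goes, unmodified, to every output link whose condition it satisfies, and records matching no condition go to the reject link.

records = [
    {"region": "EU", "amount": 50},
    {"region": "US", "amount": 500},
    {"region": "APAC", "amount": -1},
]

# One "where clause" per output link; a row may satisfy several
# (the real stage has an "Output rows only once" option to change that).
output_links = {
    "high_value": lambda r: r["amount"] > 100,
    "eu_sales": lambda r: r["region"] == "EU",
}

outputs = {name: [] for name in output_links}
reject = []

for rec in records:
    matched = False
    for name, condition in output_links.items():
        if condition(rec):
            outputs[name].append(rec)  # transferred as-is, never transformed
            matched = True
    if not matched:
        reject.append(rec)

print(outputs)  # EU row on eu_sales, US row on high_value
print(reject)   # the APAC row matched no condition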
When to use which stage in DataStage?
Copy STAGE - To drop a particular column
Sort STAGE - Sorting and generating a key-change column; similar to the ORDER BY clause in Oracle
Filter STAGE - Similar to the WHERE clause in Oracle, but it cannot perform join operations
Lookup, Join, Merge - To perform join operations
Pivot Enterprise STAGE - Rows to columns and columns to rows
External Filter STAGE - Filter the records using UNIX filter commands such as grep
MODIFY STAGE - Metadata conversion and null handling; similar to conversion functions in Oracle
FUNNEL STAGE - Combining multiple inputs into a single output. Metadata must be the same for all the inputs
REMOVE DUPLICATES STAGE - To remove duplicate values from a single sorted input
ENCODE / DECODE STAGES - To encode/compress a data set using UNIX commands such as gzip
TRANSFORMER STAGE:
a) Filtering the data (constraints)
b) Metadata conversion (using functions)
c) Rows to columns and columns to rows (using stage variables)
d) Looping
e) Creating a counter (using macros); see the post "Counter using Transformer"
SURROGATE KEY GENERATOR STAGE - To generate surrogate keys, similar to an Oracle database sequence
AGGREGATOR STAGE - To perform group-by operations such as MAX and MIN; similar to the GROUP BY clause in Oracle
ROW GENERATOR STAGE - To generate a set of mock data fitting the specified metadata when no real data is available
XML OUTPUT STAGE - To convert tabular data such as tables and sequential files to XML hierarchical structures
SWITCH STAGE - Performs an operation similar to the switch statement in C; used to filter data
CHANGE CAPTURE STAGE - To identify delta changes (inserts, updates, deletes) between two sources; see the sketch after this list
ORACLE CONNECTOR - To connect to the Oracle database
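Since the Change Capture stage is the least SQL-like entry above, here is a minimal Python sketch of the delta logic it performs: compare a "before" and an "after" data set on a key column and tag each record as an insert, delete, edit, or copy. The data and column names are hypothetical; the real stage emits a change-code column whose values are configurable in the stage properties.

before = {r["id"]: r for r in [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
]}
after = {r["id"]: r for r in [
    {"id": 2, "name": "Robert"},  # changed -> edit
    {"id": 3, "name": "Carol"},   # new     -> insert
]}

changes = []
for key, rec in after.items():
    if key not in before:
        changes.append((rec, "insert"))
    elif rec != before[key]:
        changes.append((rec, "edit"))
    else:
        changes.append((rec, "copy"))
for key, rec in before.items():
    if key not in after:
        changes.append((rec, "delete"))  # id 1 disappeared

for rec, code in changes:
    print(rec, code)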
Aggregator stage
1) The Aggregator stage is a processing stage that can have one input and one output link. It classifies records from the input link into groups and computes totals or other specified aggregate functions for each group.
2) Records can be grouped on one or more keys.
3) In a parallel environment, we need to be careful with partitioning, because it can affect the result of the Aggregator. If records that fall in the same group end up in different partitions, the generated output will be wrong. Therefore, it is better to hash-partition on the grouping keys before the Aggregator stage, so that records with the same keys go to the same partition.
4) Two grouping methods (hash and sort) are available in the Aggregator. See the post "Grouping Methods" for more information about grouping methods in the Aggregator stage.
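Point 3 is easy to demonstrate outside DataStage. The sketch below (plain Python, hypothetical data) aggregates each partition independently, the way the parallel Aggregator does: with round-robin partitioning a group's rows are split and its totals come out fragmented, while hashing on the grouping key keeps each group whole.

from collections import defaultdict

rows = [("A", 10), ("A", 20), ("B", 5), ("B", 15)]

def aggregate(partition):
    # Per-partition group-by with a SUM, as each Aggregator node would do.
    totals = defaultdict(int)
    for key, value in partition:
        totals[key] += value
    return dict(totals)

# Round-robin partitioning splits groups "A" and "B" across partitions,
# so each partition reports only a fragment of the true total:
round_robin = [rows[0::2], rows[1::2]]
print([aggregate(p) for p in round_robin])
# -> [{'A': 10, 'B': 5}, {'A': 20, 'B': 15}]   (wrong: A is 30, B is 20)

# Hash partitioning on the grouping key sends all rows of a group to the
# same partition, so each group's total is computed exactly once:
hashed = [[], []]
for key, value in rows:
    hashed[hash(key) % 2].append((key, value))
print([aggregate(p) for p in hashed])
# -> A=30 and B=20 each appear whole, in exactly one partition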
Copy Stage
1) The Copy stage is a processing stage that can have a single input link and any number of output links.
2) The Copy stage is used to copy a single input data set to a number of output data sets.
3) This stage is generally used for the following things (see the sketch after this list):
a) Columns can be dropped.
b) The order of the columns can be altered.
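A minimal Python sketch of those two uses, with hypothetical column names: the output mapping of a Copy stage simply selects and reorders columns.

row = {"id": 1, "name": "Ann", "ssn": "123-45-6789", "dept": "IT"}

output_columns = ["name", "id", "dept"]  # "ssn" is dropped; order is changed
out = {col: row[col] for col in output_columns}
print(out)  # {'name': 'Ann', 'id': 1, 'dept': 'IT'}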