Thursday, April 18, 2013

DataStage scenario with a small example


Consider the following input file (file.txt):

telephoneno 
09700020075 
919889110102 
918571233668

The expected output (Output.txt) is:

telephoneno 
09700020075 
09889110102 
08571233668

Requirement: the length of each number should be 11. If the first two characters are "91", they need to be replaced with "0".

Ans:
In a Transformer stage, use the following derivation:

If (Len(InLink.Phone) = 11) Then InLink.Phone Else "0" : Right(InLink.Phone,10)
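To check the derivation against the sample data outside DataStage, here is a minimal Python sketch that mirrors it line for line. The function names are hypothetical; Python's phone[-10:] plays the role of Right(InLink.Phone, 10), and string concatenation plays the role of the ":" operator.

```python
def transform_phone(phone: str) -> str:
    """Mirror of the Transformer derivation:
    If Len(phone) = 11 Then phone Else "0" : Right(phone, 10)
    """
    if len(phone) == 11:
        # Already 11 characters: pass the input through unchanged.
        return phone
    # Keep the rightmost 10 characters and prefix them with "0".
    return "0" + phone[-10:]


def transform_file(lines):
    """Pass the header row through and apply the derivation to each data row."""
    header, *rows = lines
    return [header] + [transform_phone(row.strip()) for row in rows]


if __name__ == "__main__":
    sample = ["telephoneno", "09700020075", "919889110102", "918571233668"]
    for line in transform_file(sample):
        print(line)
```

Note that, like the derivation, this sketch decides on length alone: any 12-character number is trimmed, whether or not it starts with "91". If the input may contain other 12-character values, adding an explicit prefix test (phone.startswith("91") in the sketch) would match the stated requirement more strictly.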

Common steps for DataStage job development:

1) Understand the requirement clearly, including its exceptions.

2) Write the algorithm in simple English first; do not jump directly into job development.

Algorithm:
a) Check whether the length of the string is 11.
b) If it is 11, pass the input through; otherwise extract the rightmost 10 characters and prefix them with "0".

3) Now convert the algorithm into DataStage stages and functions.

a) To implement the first point of the algorithm, use the Len function together with an If statement.
b) To implement the second point, use the Right function and the ":" concatenation operator.

4) In most cases the requirement is split across several jobs. After forming the algorithm, identify the stages used in each job; this makes debugging easier than implementing the entire requirement in one job.

The following post will help you decide when to use which stage: when-to-use-which-stage-in-datastage

5) Tune performance if you run into problems; tuning is an iterative process.

6) Finally, connect all the jobs through a job sequence according to their dependencies, because the input of one DataStage job may depend on the output of another.
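Working out the order for step 6 is a dependency (topological) sort. The Python sketch below models only that ordering; the job names are hypothetical, and inside DataStage the order is expressed by linking Job Activity stages in the job sequence rather than by code like this.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical jobs: each key lists the jobs whose output it consumes.
JOB_DEPENDENCIES = {
    "extract_phones": set(),
    "clean_phones": {"extract_phones"},
    "extract_customers": set(),
    "load_warehouse": {"clean_phones", "extract_customers"},
}


def run_order(dependencies):
    """Return one valid order in which every job runs after the jobs it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


def parallel_batches(dependencies):
    """Group jobs into batches; jobs inside one batch do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    print(run_order(JOB_DEPENDENCIES))
    print(parallel_batches(JOB_DEPENDENCIES))
```

The batch view shows which jobs a sequence could start side by side (here the two extract jobs) and which must wait for an upstream job to finish.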

Wednesday, April 17, 2013

Difference between Sequential File stage and Data Set stage?


1) When a sequential file is used as a source, its data has to be converted from ASCII text into DataStage's native internal format, whereas with data sets no conversion is required because the data is already stored in that format. Also, by default, sequential files are processed sequentially. Sequential files can hold only up to 2 GB, and they have no native representation of NULL values. All of the above can be overcome by using the Data Set stage, but the choice depends on the requirement. For example, if you want to capture rejected data, you need to use a Sequential File or File Set stage.

2) The Sequential File stage is used to extract data from flat files and to load data into flat files, with a 2 GB limit. The Data Set stage is typically used for intermediate storage; data is loaded into a data set in parallel, which improves performance.

3) A data set mainly consists of two kinds of files:

a) A descriptor file, which contains the metadata and the data location but not the actual data itself.
b) Data files, which contain the data itself spread over multiple files, one file per partition.

4) The orchadmin command is used to delete data sets, whereas the Unix rm command is used to remove flat files. Using rm on a data set would delete only the descriptor file and leave the data files behind.

Complete information about orchadmin can be found at the following link: orchadmin
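To make points 3 and 4 concrete, here is a toy Python model of a data set: rows are written round-robin into one data file per partition, and a small descriptor records the columns and the data file locations. The file names, the JSON descriptor, and the round-robin split are all hypothetical; the real DataStage descriptor and data file formats are different. The model only illustrates why deleting the descriptor alone (as rm would) orphans the data files.

```python
import json
import tempfile
from pathlib import Path


def write_toy_dataset(root: Path, name: str, rows, partitions: int) -> Path:
    """Toy model only: one data file per partition plus a JSON descriptor."""
    data_dir = root / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    buckets = [[] for _ in range(partitions)]
    for i, row in enumerate(rows):
        buckets[i % partitions].append(row)  # round-robin partitioning
    data_files = []
    for p, bucket in enumerate(buckets):
        path = data_dir / f"{name}.part{p}"
        path.write_text("\n".join(bucket))
        data_files.append(str(path))
    descriptor = root / f"{name}.ds"
    # The descriptor holds metadata and data locations, not the data itself.
    descriptor.write_text(json.dumps({"columns": ["telephoneno"], "data_files": data_files}))
    return descriptor


def delete_descriptor_only(descriptor: Path) -> None:
    """Like running rm on the descriptor: the data files are left behind."""
    descriptor.unlink()


def delete_whole_dataset(descriptor: Path) -> None:
    """A data-set-aware delete: remove every data file, then the descriptor."""
    meta = json.loads(descriptor.read_text())
    for path in meta["data_files"]:
        Path(path).unlink()
    descriptor.unlink()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        ds = write_toy_dataset(root, "phones", ["09700020075", "09889110102", "08571233668"], 2)
        delete_descriptor_only(ds)
        print(sorted(p.name for p in (root / "data").iterdir()))  # orphaned data files
```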

Difference between server jobs and parallel jobs?


Server jobs:-
1) Server jobs handle smaller volumes of data.
2) They have fewer components (stages).
3) Data processing is slower.
4) They are executed by the DataStage server engine.
5) They are compiled into BASIC.
6) Lack of parallel capability is one of the main drawbacks of server jobs.

Parallel jobs:-
1) Parallel jobs handle high volumes of data.
2) They are executed by the DataStage parallel engine.
3) They have more components than server jobs.
4) They support pipeline and partition parallelism.
5) They are compiled into OSH.
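Partition parallelism means splitting the rows into partitions and running the same logic on every partition at once. The Python sketch below shows the idea with hash partitioning on a key column followed by a per-partition sum; the column names and the partition count are hypothetical. Threads are used only to keep the example short: under CPython they do not give true CPU parallelism, unlike the separate processes a parallel engine runs.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key, partitions):
    """Assign each row to a partition by hashing its key, so equal keys land together."""
    buckets = [[] for _ in range(partitions)]
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's built-in str hash.
        index = zlib.crc32(str(row[key]).encode("utf-8")) % partitions
        buckets[index].append(row)
    return buckets


def sum_by_key(bucket, key, value):
    """Per-partition logic: total the value column for each key in this partition."""
    totals = {}
    for row in bucket:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals


def parallel_sum(rows, key, value, partitions=4):
    buckets = hash_partition(rows, key, partitions)
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(lambda b: sum_by_key(b, key, value), buckets))
    # Each key lives in exactly one partition, so partial results merge without re-aggregation.
    result = {}
    for partial in partials:
        result.update(partial)
    return result


if __name__ == "__main__":
    sales = [{"cust": "A", "amt": 10}, {"cust": "B", "amt": 5}, {"cust": "A", "amt": 7}]
    print(parallel_sum(sales, "cust", "amt", partitions=3))
```

The choice of a key-based (hash) split matters here: with a round-robin split, rows for the same customer could land in different partitions and the partial sums would need a second aggregation step.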

Similarity between server jobs and parallel jobs:

Both can be monitored at runtime in DataStage Director.

Tuesday, April 16, 2013

COLUMN GENERATOR STAGE



1) The Column Generator stage is a development/debug stage.

2) It can have a single input link and a single output link.

3) The Column Generator adds columns to incoming data and generates mock data for these columns for each data row processed. This is useful for testing a job when no real test data is available.
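The behaviour described in point 3 can be pictured with a small Python generator: every incoming row passes through with extra mock columns attached. The column names and generation rules below (a counter and a repeating list of values) are hypothetical and are not the stage's actual generation algorithms.

```python
from itertools import count, cycle


def column_generator(rows, generators):
    """Add mock columns to each incoming row; each generator supplies one value per row."""
    for row in rows:
        out = dict(row)  # copy, so the input rows are not modified
        for name, gen in generators.items():
            out[name] = next(gen)
        yield out


if __name__ == "__main__":
    incoming = [{"telephoneno": "09700020075"}, {"telephoneno": "09889110102"}]
    mock = {"row_id": count(1), "region": cycle(["NORTH", "SOUTH"])}
    for row in column_generator(incoming, mock):
        print(row)
```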