Wednesday, February 13, 2013

How to compile InfoSphere DataStage jobs by using the dscc command

1)DataStage users can compile DataStage jobs from the command line on the InfoSphere DataStage client by using the following command:
dscc
2)Users can obtain a complete list of options by using the /? option, as shown in the following example:
C:\IBM\InformationServer\Clients\Classic>dscc /?
Example:
dscc /h r101 /u fellp /p plaintextpassword dstageprj /J mybigjob 
The above command connects to the machine r101 with the username fellp and the password plaintextpassword, attaches to the project dstageprj, and compiles the job mybigjob.
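If you compile many jobs, it can help to script the call. Below is a minimal Python sketch, assuming dscc is on the PATH of the Windows client machine and that only the /h, /u, /p and /J options from the example above are needed. The helper names build_dscc_command and compile_job are hypothetical, and treating the process exit code as the success signal is also an assumption; check dscc /? on your client for the exact behaviour.

```python
import subprocess
from typing import List


def build_dscc_command(host: str, user: str, password: str,
                       project: str, job: str) -> List[str]:
    """Build the dscc argument list in the same order as the example above."""
    return ["dscc", "/h", host, "/u", user, "/p", password, project, "/J", job]


def compile_job(host: str, user: str, password: str,
                project: str, job: str) -> int:
    """Run dscc (assumed to be on PATH) and return its exit code."""
    cmd = build_dscc_command(host, user, password, project, job)
    # Passing a list avoids shell quoting problems with the arguments.
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)
    return result.returncode
```

For instance, build_dscc_command("r101", "fellp", "plaintextpassword", "dstageprj", "mybigjob") reproduces the command line of the example, and compile_job with the same arguments would run it.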

Saturday, February 9, 2013

Difference between Normal lookup and Sparse lookup?

1)The first input link to the Lookup stage is called the ‘Primary’ link. The other links are called ‘Lookup’ links. When the lookup links come from a stage other than a database stage, all data from the lookup links is read into memory, and then the lookup is performed for each row from the primary link. If the source of the lookup is a database, there are two types of lookup:

Normal lookup: All the data from the database is read into memory, and then the lookup is performed.

Sparse lookup: For each incoming row from the primary link, an SQL query is fired on the database at run time.

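2)A sparse lookup is worth using when the input (primary) data is much smaller than the reference data: querying the database once per input row is then cheaper than reading the whole reference table into memory.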
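The difference between the two lookup types can be illustrated outside DataStage with a small Python sketch. This is not how the Lookup stage is implemented: an in-memory SQLite database and a hypothetical customer table stand in for the reference database. normal_lookup reads the whole table into memory once, while sparse_lookup fires one query per primary row.

```python
import sqlite3
from typing import List, Optional, Tuple

Result = List[Tuple[int, Optional[str]]]


def make_reference_db() -> sqlite3.Connection:
    # Hypothetical reference table standing in for the database source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)",
                     [(1, "Alice"), (2, "Bob"), (3, "Carol")])
    return conn


def normal_lookup(conn: sqlite3.Connection, primary_ids: List[int]) -> Result:
    # Normal lookup: read all reference rows into memory once...
    reference = dict(conn.execute("SELECT id, name FROM customer"))
    # ...then look up every primary row in the in-memory copy.
    return [(pid, reference.get(pid)) for pid in primary_ids]


def sparse_lookup(conn: sqlite3.Connection, primary_ids: List[int]) -> Result:
    # Sparse lookup: fire one SQL query per incoming primary row.
    results: Result = []
    for pid in primary_ids:
        row = conn.execute("SELECT name FROM customer WHERE id = ?",
                           (pid,)).fetchone()
        results.append((pid, row[0] if row else None))
    return results
```

Both functions return the same matches (None where no reference row exists). With a few primary rows and a large reference table, sparse_lookup issues a few small queries instead of loading everything; with many primary rows, the per-row round trips usually dominate and the normal lookup tends to be cheaper.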

Tuesday, February 5, 2013

Difference between Hash and Sort grouping methods in the Aggregator stage


Grouping Methods
Hash (default)
1)Calculations are made for all groups and stored in memory
2)Results are written out only after all input has been processed, so a large amount of memory is required when the input volume is high and contains many distinct groups
3)Input does not need to be sorted
4)Useful when the number of unique groups is small 
Sort
1)Requires the input data to be sorted by grouping keys
2)Only a single aggregation group is kept in memory so less memory is required
3)When a new group is seen, the current group is written out
4)Can handle an unlimited number of groups
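The two grouping methods can be sketched in Python to show where the memory goes. This is a simplified illustration that sums one value per key, not the Aggregator stage's implementation; the (key, value) row layout and the function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Row = Tuple[str, float]  # hypothetical row layout: (grouping key, value)


def hash_aggregate(rows: Iterable[Row]) -> Dict[str, float]:
    # Hash method: a running total for every group is kept in memory,
    # and nothing can be emitted until all input has been read.
    totals: Dict[str, float] = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0.0) + value
    return totals


def sort_aggregate(sorted_rows: Iterable[Row]) -> Iterator[Tuple[str, float]]:
    # Sort method: input must already be sorted by key. Only the current
    # group is held in memory; it is emitted as soon as the key changes.
    current_key: Optional[str] = None
    total = 0.0
    for key, value in sorted_rows:
        if current_key is not None and key != current_key:
            yield current_key, total
            total = 0.0
        current_key = key
        total += value
    if current_key is not None:
        yield current_key, total  # flush the last group
```

The dictionary in hash_aggregate grows with the number of distinct keys, while sort_aggregate holds one key and one running total at a time. The price is that its input must be sorted (here with sorted(); in a job, typically by a sort upstream): unsorted input would split one key into several output groups.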

Conclusion: When the volume of input is high and not predictable, it is better to use the Sort method.