Stuff: February 2013

Wednesday, February 13, 2013

How to compile InfoSphere DataStage jobs by using the dscc command?

1) A DataStage user can compile DataStage jobs from the command line on the InfoSphere DataStage client by using the following command:

dscc

2) To obtain the complete list of options, run the command with /? as shown in the following example:

C:\IBM\InformationServer\Clients\Classic>dscc /?

Example:

dscc /h r101 /u fellp /p plaintextpassword dstageprj /J mybigjob

The above command connects to the machine r101 with the username fellp and the password plaintextpassword, attaches to the project dstageprj, and compiles the job mybigjob.
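When several jobs need compiling, it can help to assemble the dscc command line in a script. The following is only a minimal Python sketch: the helper name build_dscc_command is hypothetical, and it uses only the options that appear in the example above (/h, /u, /p, the project name and /J). It builds the argument list and prints it; it does not run dscc.

```python
def build_dscc_command(host, user, password, project, job):
    """Return the dscc argument list for compiling a single job.

    Hypothetical helper: it only mirrors the option order of the
    example command and does not validate anything.
    """
    return ["dscc", "/h", host, "/u", user, "/p", password, project, "/J", job]


cmd = build_dscc_command("r101", "fellp", "plaintextpassword", "dstageprj", "mybigjob")
command_line = " ".join(cmd)
print(command_line)
```

On a machine where the DataStage client is installed and dscc is on the PATH, the list could presumably be passed to subprocess.run(cmd, check=True), but that has not been tested here.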

Saturday, February 9, 2013

Difference between Normal lookup and Sparse lookup?

1) The first input link to the Lookup stage is called the ‘Primary’ link; the other input links are called ‘Lookup’ links. When a lookup link comes from a stage other than a database stage, all data from that link is read into memory, and the lookup is then performed for each row from the primary link. If the lookup source is a database, two types of lookup are available:

Normal lookup: All the data from the database is read into memory, and then the lookup is performed.

Sparse lookup: For each incoming row from the primary link, an SQL query is run against the database at run time.

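2) A sparse lookup is a good fit when the number of input rows is much smaller than the number of rows in the reference table, because only the rows that are actually needed are fetched from the database.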
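The difference between the two lookup types can be illustrated outside DataStage. The following Python sketch uses an in-memory SQLite table as a stand-in for the reference database; the table name ref, its columns and the sample keys are all made up for illustration, and this is not how the Lookup stage is implemented internally.

```python
import sqlite3

# Stand-in reference table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ref (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO ref VALUES (?, ?)",
    [(1, "alpha"), (2, "beta"), (3, "gamma")],
)

primary_keys = [2, 3, 5]  # keys arriving on the primary link


def normal_lookup(conn, keys):
    # Read the whole reference table into memory once, then probe it per row.
    table = dict(conn.execute("SELECT id, name FROM ref"))
    return [(k, table.get(k)) for k in keys]


def sparse_lookup(conn, keys):
    # Run one query against the database for each incoming row.
    results = []
    for k in keys:
        row = conn.execute("SELECT name FROM ref WHERE id = ?", (k,)).fetchone()
        results.append((k, row[0] if row else None))
    return results


normal_result = normal_lookup(conn, primary_keys)
sparse_result = sparse_lookup(conn, primary_keys)
print(normal_result == sparse_result)
```

Both functions return the same rows. The normal version pays for one full table read up front, while the sparse version pays for one database round trip per input row, which is why the sparse form suits a small input against a large reference table.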

Tuesday, February 5, 2013

Difference between Hash and Sort grouping methods in the Aggregator stage

Grouping Methods

Hash (default)

1) Running results for all groups are calculated and kept in memory

2) Results are written out only after all input has been processed, so a large amount of memory is required when the input contains many distinct groups

3) Input does not need to be sorted

4) Useful when the number of unique groups is small

Sort

1) Requires the input data to be sorted by the grouping keys

2) Only a single aggregation group is kept in memory, so less memory is required

3) When a new group is seen, the current group is written out

4) Can handle an unlimited number of groups

Conclusion: When the volume of input is high and not predictable, it is better to use the Sort method.
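The two grouping methods can be sketched in a few lines of Python. This only illustrates the general technique, not the Aggregator stage itself; the sample rows and function names are made up, and a sum stands in for whatever aggregation is configured.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (grouping key, value) rows in arrival order.
rows = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]


def hash_aggregate(rows):
    # One running total per group is held in memory until the input ends.
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals


def sort_aggregate(sorted_rows):
    # Input must already be sorted by key. Only the current group is held,
    # and its result is emitted as soon as the next key appears.
    for key, group in groupby(sorted_rows, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


hash_result = hash_aggregate(rows)
sort_result = list(sort_aggregate(sorted(rows, key=itemgetter(0))))
print(hash_result)
print(sort_result)
```

In this sketch the sort is done in memory with sorted() for brevity. In a real job the sorting would happen upstream of the aggregation, so the sort method's memory advantage applies only to the aggregation step itself.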
