Stuff: December 2012

Sunday, December 23, 2012

Orchadmin

DataStage stores data in persistent internal (specific to DataStage) format in the form of Data sets. Orchestrate Data set aid in the parallel processing of data and are much faster performance wise. They help in achieving end-to-end parallelism by writing data in partitioned form and maintaining the sort order. Orchestrate Data set consists of one or more data files stored on multiple processing nodes. A parallel data set is represented on disk by:

• A single descriptor file - defines the record schema of the data set and the location of all data files in the set. It does not contain the actual data.

• Data files (which contain the actual data) located on one or more processing nodes.

Orchadmin Utility

This is an Orchestrate Administrator Utility. It can perform operations on Data sets which cannot be performed by normal UNIX file commands. The basic syntax is:

orchadmin [command] [options] [descriptor_files]

Commands

The various commands that are available with orchadmin are dump, delete, truncate, copy and describe.

Dump Command

This command can be used to write records from a given data set onto standard output or can be redirected to a sequential file. The syntax is:

Syntax-orchadmin dump [options] descriptor_file

If no option is specified, all the records will be returned to the standard output.

ex 1)orchadmin dump test.ds

2)orchadmin dump test.ds>temp.txt

In the second example temp.txt file will contain data present in test.ds

Delete Command

rm deletes only descriptor file and the actual data is not deleted as it is present in the data files which reside on the processing nodes.Toremove the persistent data from the data sets the conventional approach is the use of Data set management in data stage.

Orchadmin utility simplifies the whole process by providing the
delete command. The syntax is:

Syntax-orchadmin delete | del | rm [-option] ds_1 ... ds_N

ex-orchadmin delete test.ds

Describe Command

This command outputs a report about the datasets specified. The syntax is:

orchadmin describe [-options] descriptor_file

ex- orchadmin describe test.ds

Copy Command

This command can be used to create an identical dataset with the same column definition and number of records. Orchadmin copy command can be used to take backups of existing datasets.

Syntax-orchadmin copy | cp source-ds target-ds

ex-orchadmin copy temp.ds temp_target.ds

Note:1)If one uses the UNIX cp command then only the descriptor file is copied, and these descriptor files point to the same data files residing in the processing nodes.

2)Type orchadmin on command prompt to get help information about this command

Thursday, December 13, 2012

What is External Filter Stage?

1)External filter stage is a processing stage.
2)In Datastage User can filter the Data using Unix Commands(sed,cut, cat, grep, head etc) with the help of External filter stage.
External filter stage allows us to run these commands during processing the data in the job
Job:
Now I want to use grep command to filter input file(employee file-emp.txt) with DEPTNO=10

Command : grep "10$"

Click the links below for input file(employee file-emp.txt) and dsx file for the Job-J_ExternalFilter

The Source File-emp.txt and dsx file for the Job-J_ExternalFilter

Job:

What is dsjob in Datastage?

1)Datastage User can start and stop jobs, and retrieve information about job runs by using the dsjob command

2)Need for dsjob Command
a)In every Data warehousing project requires that your job runs in batches at specified time slots. In many cases they batches may be consisting of jobs from different systems in addition to the ones in Datastage. In such cases the Datastage jobs are usually scheduled by using an external scheduling tool like ESP Scheduler, Control M, Autosys, etc.This is made possible by writing scripts that will run your jobs through the command line

b)Developer can use other unix command with dsjob command

ex:dsjob command[dsjob -logsum <project> <job>] with unix mail command to send log information of a Datastage Jobto a User

3)Prerequisites to run the dsjob Command

a)IN UNIX, the Datastage home directory location will always be specified in the “.dshome” file which will be present in the root directory.Before you can run your Datastage commands you will have to run the following commands

cd `cat /.dshome`
This will change the location to the home directory. By default this will be /opt/IBM/InformationServer/Server/DSEngine

b). ./dsenv
This will run the dsenv file which contains all the environment variables. Without doing this, your UNIX commands won’t run on the command prompt.

c)cd bin

Go to bin Directory by using above Unix command

After you have done this then you can use any Datastage command for interacting with the server

4)Developer can use the ‘dsjob’ command not only to run jobs but for a wide variety of following reasons

The logon clause
Starting a job
Stopping a job
Listing projects, jobs, stages, links, and parameters
Setting an alias for a job
Retrieving information
Accessing log files
Generating a report

The Usage of dsjob command is found in the following Document with examples is found in the following link-dsjob Command

Thursday, December 6, 2012

Server Routine to count the no of lines in a file?

In many cases User should trigger the Jobactivity for each record in the file. In that case Developer can use following server Routine to count the no of lines in a file which can be used in final counter value for Start Loop Activity.Please find the atachment

RecordCount Routine

Wednesday, December 5, 2012

How to replace character by string in Datastage?

1)convert function replaces a single character with another single character .It is not possible to replace single character by string(group of characters).

2)Datastage does not have any inbuilt functionto implement this functionality.User has to write parallel routine which can called from transformer.The code for this routine can be found in the below link

PXEReplace

How to find the nth highest salary in file?

Job:
sequential file-->sortstage-->transformer stage--->targetsequentialfile
sort stage settings
Sal-desc

Transformer stage settings
constraint-@INROWNUM=n

Both the sort stage and transformer must run in sequential mode.

Tuesday, December 4, 2012

How To find list of jobs accessing a table in Datastage?

1)Create New Server Job

Universe Stage-->Sequential file

a)Universe Stage:
Datasource name-localuv

SQL Query:
SELECT
DS_JOBS.NAME AS JOB_NAME,
DS_JOBS.CATEGORY,
DS_JOBOBJECTS.NAME AS OBJECT_NAME,
DS_JOBOBJECTS.OLETYPE,
EVAL DS_JOBOBJECTS."if index(upcase(@RECORD),'PIPPO',1) > 0 then 'FOUND' else ''" AS FOUND FMT '5L'
FROM
DS_JOBS,
DS_JOBOBJECTS
WHERE
DS_JOBS.JOBNO = DS_JOBOBJECTS.OBJIDNO
and FOUND = 'FOUND'
GROUP BY
JOB_NAME,
DS_JOBS.CATEGORY,
OBJECT_NAME,
DS_JOBOBJECTS.OLETYPE,
FOUND

Mention the column names in Columns tab from the above sql Query

b)Mention target filename in Sequential file stage

Server Routine to find Job type In Datastage

Code:
$INCLUDE DSINCLUDE JOBCONTROL.H
handle=DSAttachJob (Arg1,DSJ.ERRFATAL)
if handle <> 0 Then
jobName=DSGetJobInfo(handle,DSJ.JOBNAME)
jobStatus=DSGetJobInfo(handle,DSJ.JOBSTATUS)
JOBTYPE = TRANS("DS_JOBS", jobName, 6, "X")
JobTypeDesc = FIELD('Server|Sequence|Parallel|Mainframe','|',JOBTYPE,1)
Ans="Job Name is : " : jobName : " and job type is :" : JobTypeDesc
End
Else
Ans="Invalid.. Please enter a valid Job/Sequence name."
End

User has to Pass Proper Jobname to Arg1 to get the Proper Output and test the Routine using Test option

How to abort the Job based on Condition?

I/p

col
100
200
300
100
100

If any records come other than 100 in col column, I need to abort the job.

We can use Transfomer to abort the Job based upon certain condition

If Inlink.col<>"100" Then 'Y' Else 'N'=StgVar

Constraint:

StageVar='Y'

Abort After rows=1

Sunday, December 2, 2012

How to find the list of the job names and folder where a particular stage is used?

1)Go to Advanced Find (The magnifier-glass-icon on the tool-bar)

2)Select the stage[Ex:- Pivot Stage] from the Where Used Option

How to store the output of a Repository query in a file?

1)Open the Administrator

Administrator Client Component-> Projects Tab-> Command Button

2)COMO ON filename

3)Then execute your query[ex:-SELECT NAME FROM DS_JOBS]and output will be written to a file in Projects/ProjectName/'&COMO&'/ path

4)Do not forget to turn off COMO.

COMO OFF

Saturday, December 1, 2012

How to fetch the every nth record in sequential file?

Job design:
Sequentialfile--->samplestage--->Sequentialfile

In sample stage select the following settings

SampleMode-Period
Period(Per partition)-n
[To select the every 3 record in sequential file mention 3].
Execution Mode-sequential Mode

samplestage extract every third row from input.

List of the stages that used in a jobs in a project

1)Get the list of Jobs in a project Using following command

dsjob [ options ] -ljobs projectname

2)For each Job get the list of stages in a Job

dsjob [ options ] -lstages projectname

3)Use above commands and loop concepts in unix shell scripting to achieve the required output

How to find job modified date in 8.5

Method-1
Export the job as a DSX and look at the DateModified immediately lines below the BEGIN DSJOB and Identifier lines

ex-
BEGIN DSJOB
Identifier "JobNameGoesHere"
DateModified "2010-10-24"
TimeModified "10.22.00"

Method-2
2)a)Go to Advanced Find Option
b)change view-->List
c)Type Jobname in Name to find Option
d)Click Find Option

Datastage will show Job Type,Job modified date and who modified the Job etc.Method 2 is possible if the Datastage version is 8x,in such cases user can Use method 2

Pages