Sunday, December 23, 2012

Orchadmin


DataStage stores data in persistent internal (specific to DataStage) format in the form of Data sets. Orchestrate Data set aid in the parallel processing of data and are much faster performance wise. They help in achieving end-to-end parallelism by writing data in partitioned  form and maintaining the sort order. Orchestrate Data set consists of one or more data  files stored on multiple processing nodes. A parallel data set is represented on disk by: 

• A  single descriptor  file  - defines  the  record  schema of  the data  set and  the  location  of  all  data  files  in  the  set.  It  does  not  contain  the actual data. 

• Data  files  (which  contain  the  actual  data)  located  on  one  or more processing nodes.

Orchadmin Utility 

This  is an Orchestrate Administrator Utility.  It  can perform  operations on Data sets which cannot be performed by normal UNIX file commands. The basic syntax is: 

orchadmin [command] [options] [descriptor_files] 

Commands 
The  various  commands  that  are  available  with  orchadmin  are  dump, delete, truncate, copy and describe. 

Dump Command 

This  command  can  be  used  to write  records  from  a  given  data  set  onto standard output or can be redirected to a sequential file. The syntax is: 

Syntax-orchadmin dump [options] descriptor_file 

If  no  option  is  specified,  all  the  records will  be  returned  to  the  standard output.

ex 1)orchadmin dump test.ds
    2)orchadmin dump test.ds>temp.txt

In the second example temp.txt file will contain data present in test.ds


Delete Command  

rm deletes only  descriptor file and the actual data  is  not  deleted  as  it  is  present  in  the  data  files  which  reside  on  the processing  nodes.Toremove  the  persistent  data  from  the  data  sets  the conventional approach is the use of Data set management in data stage.
  
Orchadmin  utility  simplifies  the  whole  process  by  providing  the  
delete command. The syntax is: 

Syntax-orchadmin delete | del | rm [-option] ds_1 ... ds_N 

ex-orchadmin delete test.ds

Describe Command  

This command outputs a report about the datasets specified. The syntax is: 

orchadmin describe [-options] descriptor_file 

ex- orchadmin describe test.ds


Copy Command

This  command  can  be  used  to  create  an  identical  dataset  with  the  same column definition and number of records. Orchadmin  copy  command  can  be  used  to  take  backups  of  existing datasets.  
  
Syntax-orchadmin copy | cp  source-ds  target-ds 

ex-orchadmin copy temp.ds temp_target.ds

Note:1)If  one  uses  the  UNIX  cp  command  then  only  the  descriptor  file  is copied, and these descriptor  files point to the same data  files residing in the processing nodes.

2)Type orchadmin on command prompt to get help information about this command




Thursday, December 13, 2012

What is External Filter Stage?

1)External filter stage is a processing stage.
2)In Datastage User can filter the Data using Unix Commands(sed,cut, cat, grep, head etc) with the help of External filter stage.
External filter stage allows us to run these commands during processing the data in the job
Job:
Now I want to use grep command to filter input file(employee file-emp.txt)  with DEPTNO=10 

Command : grep  "10$"

Click the links below for input file(employee file-emp.txt) and dsx file for the Job-J_ExternalFilter

The Source File-emp.txt and dsx file for the Job-J_ExternalFilter

Job:

What is dsjob in Datastage?

1)Datastage User can start and stop jobs, and retrieve information about job runs by using the dsjob command

2)Need for dsjob Command
a)In every Data warehousing project requires that your job runs in batches at specified time slots. In many cases they batches may be consisting of jobs from different systems in addition to the ones in Datastage. In such cases the Datastage jobs are usually scheduled by using an external scheduling tool like ESP Scheduler, Control M, Autosys, etc.This is made possible by writing scripts that will run your jobs through the command line

b)Developer can use other unix command with dsjob command

ex:dsjob command[dsjob -logsum <project> <job>] with unix  mail command to send log information of a Datastage Jobto a User

3)Prerequisites to run the dsjob Command

a)IN UNIX, the Datastage home directory location will always be specified in the “.dshome” file which will be present in the root directory.Before you can run your Datastage commands you will have to run the following commands

cd  `cat /.dshome` 
This will change the location to the home directory. By default this will be /opt/IBM/InformationServer/Server/DSEngine

b). ./dsenv 
This will run the dsenv file which contains all the environment variables. Without doing this, your UNIX commands won’t run on the command prompt.

c)cd bin

Go to bin Directory by using above Unix command

After you have done this then you can use any Datastage command for interacting with the server

4)Developer can use the ‘dsjob’ command  not only to run jobs but for a wide variety of following reasons

The logon clause 
Starting a job 
Stopping a job 
Listing projects, jobs, stages, links, and parameters 
Setting an alias for a job 
Retrieving information 
Accessing log files 
Generating a report 

The Usage of dsjob command is found in the following Document with examples is found in the following link-dsjob Command



Thursday, December 6, 2012

Server Routine to count the no of lines in a file?

In many cases User should trigger the Jobactivity for each record in the file. In that case Developer can use following server Routine to count the no of lines in a file which can be used in final counter value for Start Loop Activity.Please find the atachment


Wednesday, December 5, 2012

How to replace character by string in Datastage?


1)convert function replaces a single character with another single character .It is not possible to replace single  character by string(group of characters).

2)Datastage does not have any inbuilt  functionto  implement this functionality.User has to write parallel routine which can called from transformer.The code for this routine can be found in the below link


How to find the nth highest salary in file?


Job:
sequential file-->sortstage-->transformer stage--->targetsequentialfile
sort stage settings
Sal-desc 

Transformer stage settings
constraint-@INROWNUM=n

Both the sort stage and transformer must run in sequential mode.

Tuesday, December 4, 2012

How To find list of jobs accessing a table in Datastage?


1)Create New Server Job

Universe Stage-->Sequential file

a)Universe Stage:
Datasource name-localuv

SQL Query:
SELECT 
   DS_JOBS.NAME AS JOB_NAME, 
   DS_JOBS.CATEGORY, 
   DS_JOBOBJECTS.NAME AS OBJECT_NAME, 
   DS_JOBOBJECTS.OLETYPE, 
   EVAL DS_JOBOBJECTS."if index(upcase(@RECORD),'PIPPO',1) > 0 then 'FOUND' else ''" AS FOUND FMT '5L' 
FROM 
   DS_JOBS, 
   DS_JOBOBJECTS 
WHERE 
   DS_JOBS.JOBNO = DS_JOBOBJECTS.OBJIDNO 
   and FOUND = 'FOUND' 
GROUP BY 
   JOB_NAME, 
   DS_JOBS.CATEGORY, 
   OBJECT_NAME, 
   DS_JOBOBJECTS.OLETYPE, 
   FOUND 

Mention the column names in Columns tab from the above sql Query

b)Mention target filename in Sequential file stage

Server Routine to find Job type In Datastage


Code:
$INCLUDE DSINCLUDE JOBCONTROL.H 
handle=DSAttachJob (Arg1,DSJ.ERRFATAL) 
if handle <> 0 Then 
jobName=DSGetJobInfo(handle,DSJ.JOBNAME) 
jobStatus=DSGetJobInfo(handle,DSJ.JOBSTATUS) 
JOBTYPE = TRANS("DS_JOBS", jobName, 6, "X") 
JobTypeDesc = FIELD('Server|Sequence|Parallel|Mainframe','|',JOBTYPE,1) 
Ans="Job Name is : " : jobName : " and job type is :" : JobTypeDesc 
End 
Else 
Ans="Invalid.. Please enter a valid Job/Sequence name." 
End 

User has to Pass Proper Jobname to Arg1 to get the Proper Output and test the Routine using Test option

How to abort the Job based on Condition?

I/p

col 
100 
200 
300 
100 
100 

If any records come other than 100 in col column, I need to abort the job. 

We can use Transfomer to abort the Job based upon certain condition

If Inlink.col<>"100" Then 'Y' Else 'N'=StgVar

Constraint: 

StageVar='Y'

Abort After rows=1

Sunday, December 2, 2012

How to find the list of the job names and folder where a particular stage is used?

1)Go to Advanced Find (The magnifier-glass-icon on the tool-bar) 

2)Select the stage[Ex:- Pivot Stage] from the Where Used Option

How to store the output of a Repository query in a file?

1)Open  the Administrator
Administrator Client Component-> Projects Tab-> Command Button
2)COMO ON filename 
3)Then execute your query[ex:-SELECT NAME FROM DS_JOBS]and output will be written to a file in Projects/ProjectName/'&COMO&'/ path
4)Do not forget to turn off COMO. 
COMO OFF

Saturday, December 1, 2012

How to fetch the every nth record in sequential file?

Job design:
Sequentialfile--->samplestage--->Sequentialfile

In sample stage select the following settings

SampleMode-Period
Period(Per partition)-n
[To select the  every 3 record in sequential file mention 3].
Execution Mode-sequential Mode

samplestage extract every third row from input.

List of the stages that used in a jobs in a project

1)Get the list of Jobs in a project Using following command

dsjob [ options ] -ljobs projectname

2)For each Job get the list of stages in a Job

dsjob [ options ] -lstages projectname

3)Use  above commands and loop concepts in unix shell scripting to achieve the required output

How to find job modified date in 8.5

Method-1
Export the job as a DSX and look at the DateModified immediately lines below the BEGIN DSJOB and Identifier lines

ex-
BEGIN DSJOB 
   Identifier "JobNameGoesHere" 
   DateModified "2010-10-24" 
   TimeModified "10.22.00"
   
 Method-2  
2)a)Go to Advanced Find Option
b)change view-->List
c)Type Jobname in Name to find Option
d)Click Find Option

Datastage will show Job Type,Job modified date and who modified the Job etc.Method 2 is possible if the Datastage version is 8x,in such cases user can Use method 2