DataStage stores data in a persistent internal format (specific to DataStage) in the form of data sets. Orchestrate data sets aid the parallel processing of data and offer much better performance. They help achieve end-to-end parallelism by writing data in partitioned form and preserving its sort order. An Orchestrate data set consists of one or more data files stored on one or more processing nodes. A parallel data set is represented on disk by:
• A single descriptor file - defines the record schema of the data set and the location of all data files in the set. It does not contain the actual data.
• Data files (which contain the actual data) located on one or more processing nodes.
Orchadmin Utility
This is an Orchestrate Administrator Utility. It can perform operations on Data sets which cannot be performed by normal UNIX file commands. The basic syntax is:
orchadmin [command] [options] [descriptor_files]
Commands
The various commands that are available with orchadmin are dump, delete, truncate, copy and describe.
Dump Command
This command writes the records of a given data set to standard output, which can be redirected to a sequential file. The syntax is:
orchadmin dump [options] descriptor_file
If no option is specified, all the records are written to standard output.
ex 1) orchadmin dump test.ds
2) orchadmin dump test.ds > temp.txt
In the second example, temp.txt will contain the records present in test.ds.
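The redirection in the second example can be wrapped in a small shell function. This is only a sketch: dump_dataset is a hypothetical helper name, not part of orchadmin, and it assumes orchadmin is on the PATH with the DataStage environment already set up.

```shell
# dump_dataset DESCRIPTOR OUTPUT_FILE
# Hypothetical helper (not part of orchadmin): writes the records of a
# data set to a text file, as in "orchadmin dump test.ds > temp.txt".
dump_dataset() {
    # orchadmin needs the descriptor file to locate the data files,
    # so stop early if it is missing.
    if [ ! -f "$1" ]; then
        echo "descriptor not found: $1" >&2
        return 1
    fi
    # No options are passed, so all records are dumped.
    orchadmin dump "$1" > "$2"
}
```

It would be called as dump_dataset test.ds temp.txt.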
Delete Command
The UNIX rm command deletes only the descriptor file; the actual data is not deleted, because it is held in the data files that reside on the processing nodes. To remove the persistent data of a data set, the conventional approach is to use Data Set Management in DataStage.
The orchadmin utility simplifies the whole process by providing the delete command. The syntax is:
orchadmin delete | del | rm [-option] ds_1 ... ds_N
ex- orchadmin delete test.ds
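Because the delete command accepts several descriptor files, a cleanup script can guard against the plain rm mistake described above. A minimal sketch follows; remove_dataset is a hypothetical helper name, and orchadmin is assumed to be on the PATH.

```shell
# remove_dataset DESCRIPTOR...
# Hypothetical helper (not part of orchadmin): deletes each data set
# through orchadmin instead of the UNIX rm command.
remove_dataset() {
    for descriptor in "$@"; do
        if [ ! -f "$descriptor" ]; then
            echo "skipping, descriptor not found: $descriptor" >&2
            continue
        fi
        # orchadmin delete removes the data files on the processing
        # nodes; plain rm would leave them behind.
        orchadmin delete "$descriptor" || return 1
    done
}
```

It would be called as remove_dataset test.ds, or with several descriptor files at once.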
Describe Command
This command outputs a report about the datasets specified. The syntax is:
orchadmin describe [-options] descriptor_file
ex- orchadmin describe test.ds
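To get a report for every data set in a directory, the describe command can be run in a loop. The sketch below assumes descriptor files use the .ds extension, as in the examples here; describe_all is a hypothetical helper name.

```shell
# describe_all [DIRECTORY]
# Hypothetical helper (not part of orchadmin): runs "orchadmin describe"
# on every *.ds descriptor file in a directory (default: current one).
describe_all() {
    for descriptor in "${1:-.}"/*.ds; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -f "$descriptor" ] || continue
        echo "== $descriptor"
        orchadmin describe "$descriptor"
    done
}
```

It would be called as describe_all followed by the directory that holds the descriptor files.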
Copy Command
This command creates an identical data set with the same column definitions and number of records. It can be used to take backups of existing data sets. The syntax is:
orchadmin copy | cp source-ds target-ds
ex- orchadmin copy temp.ds temp_target.ds
Note: 1) If the UNIX cp command is used, only the descriptor file is copied, and the copied descriptor file points to the same data files residing on the processing nodes.
2) Type orchadmin at the command prompt to get help information about this command.
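For the backup use of the copy command, a small wrapper can make sure the copy goes through orchadmin rather than UNIX cp. This is a sketch under assumptions: backup_dataset is a hypothetical helper name, and the refusal to reuse an existing target is a precaution added here, not documented orchadmin behaviour.

```shell
# backup_dataset SOURCE_DS TARGET_DS
# Hypothetical helper (not part of orchadmin): backs up a data set with
# "orchadmin copy" so the data files are copied too, unlike UNIX cp.
backup_dataset() {
    if [ ! -f "$1" ]; then
        echo "source descriptor not found: $1" >&2
        return 1
    fi
    # Precaution chosen for this sketch: never reuse an existing target.
    if [ -e "$2" ]; then
        echo "target already exists: $2" >&2
        return 1
    fi
    orchadmin copy "$1" "$2"
}
```

It would be called as backup_dataset temp.ds temp_target.ds.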