This entry describes various ways of creating a unique
counter in DataStage jobs.
A parallel job has a surrogate key stage that creates unique
IDs; however, it is limited in that it does not support conditional logic, and it
may be more efficient to add a counter to an existing transformer rather than
add a new stage.
In a server job there is a set of key-increment routines
installed with the SDK sample routines that offer a more sophisticated counter, one that
remembers its value between job executions.
The following section outlines a transformer-only technique.
In a DataStage job the easiest way to create a counter is
within the Transformer stage with a stage variable:
svMyCounter = svMyCounter + 1
This simple counter adds 1 each time a row is processed.
The counter can be given a seed value by passing a value in
as a job parameter and setting the initial value of svMyCounter to that job
parameter.
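To make the behaviour concrete, here is the same logic modeled in ordinary Python (an illustration only, not DataStage code; the function name and seed parameter are invented for this sketch):

```python
# Sketch of the stage-variable counter. seed plays the role of the
# job parameter used as the Initial Value of svMyCounter.
def row_counter(rows, seed=0):
    sv_my_counter = seed              # initial value of svMyCounter
    for row in rows:
        sv_my_counter += 1            # svMyCounter = svMyCounter + 1
        yield sv_my_counter, row

# list(row_counter(["a", "b", "c"], seed=100))
# -> [(101, "a"), (102, "b"), (103, "c")]
```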
In a parallel job this simple counter will create duplicate
values on each node as the transformer is split into parallel instances. It can
be turned into a unique counter by using special parallel macros.
1. Create a stage variable for the counter, e.g. svCounter.
2. On the Stage Properties form, set the Initial Value of the stage variable to "@PARTITIONNUM - @NUMPARTITIONS + 1".
3. Set the derivation of the stage variable to "svCounter + @NUMPARTITIONS". You can embed this in an IF statement if it is a conditional counter.
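The steps above can be sanity-checked by modeling the arithmetic in Python (an illustration of the behaviour, not DataStage code; the function name is invented for this sketch):

```python
# Model of the parallel counter on each node.
NUM_PARTITIONS = 4  # @NUMPARTITIONS

def partition_counter(partition_num, row_count):
    # Initial Value: @PARTITIONNUM - @NUMPARTITIONS + 1
    sv_counter = partition_num - NUM_PARTITIONS + 1
    values = []
    for _ in range(row_count):
        # Derivation: svCounter + @NUMPARTITIONS
        sv_counter += NUM_PARTITIONS
        values.append(sv_counter)
    return values

# partition 0 -> [1, 5, 9, 13]; partition 1 -> [2, 6, 10, 14];
# no two partitions ever produce the same value.
```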
Each instance will start at a different seed; with four
partitions the initial values are -3, -2, -1 and 0. Each time the counter is
incremented, it increases by the number of partitions, e.g. 4. This gives a
sequence in instance 1 of 1, 5, 9, 13... Instance 2 is 2, 6, 10, 14... etc.
Remember this method only produces a gap-free sequence if your data is evenly
balanced, i.e. an equal number of rows goes through each partition; the values
remain unique either way. Alternative syntax is:
@INROWNUM * @NUMPARTITIONS + @PARTITIONNUM
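This alternative can also be modeled in Python to show that every (partition, row) pair maps to a distinct value (an illustrative sketch; it assumes @INROWNUM is 1 for the first row of each partition):

```python
# Model of the alternative derivation.
NUM_PARTITIONS = 4  # @NUMPARTITIONS

def alt_counter(partition_num, in_row_num):
    # @INROWNUM * @NUMPARTITIONS + @PARTITIONNUM
    return in_row_num * NUM_PARTITIONS + partition_num

# Even with uneven row counts per partition, no value repeats.
values = [alt_counter(p, r)
          for p in range(NUM_PARTITIONS)
          for r in range(1, 6)]
assert len(values) == len(set(values))
```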