This entry describes various ways of creating a unique
counter in DataStage jobs.
A parallel job has a surrogate key stage that creates unique
IDs; however, it is limited in that it does not support conditional logic, and it
may be more efficient to add a counter to an existing transformer rather than
add a new stage.
In a server job there is a set of key-increment routines
installed with the SDK sample routines that offer a more sophisticated counter, one that
remembers its value between job executions.
The following section outlines a transformer-only technique.
In a DataStage job the easiest way to create a counter is
within the Transformer stage with a stage variable:
svMyCounter = svMyCounter + 1
This simple counter adds 1 each time a row is processed.
The counter can be given a seed value by passing a value in
as a job parameter and setting the initial value of svMyCounter to that job
parameter.
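To make the behaviour concrete, here is the same logic modeled in ordinary Python (an illustration only, not DataStage code; the function name and seed parameter are invented for this sketch):

```python
# Sketch of the stage-variable counter. seed plays the role of the
# job parameter used as the Initial Value of svMyCounter.
def row_counter(rows, seed=0):
    sv_my_counter = seed              # initial value of svMyCounter
    for row in rows:
        sv_my_counter += 1            # svMyCounter = svMyCounter + 1
        yield sv_my_counter, row

# list(row_counter(["a", "b", "c"], seed=100))
# -> [(101, "a"), (102, "b"), (103, "c")]
```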
In a parallel job this simple counter will create duplicate
values on each node as the transformer is split into parallel instances. It can
be turned into a unique counter by using special parallel macros.
1. Create a stage variable for the counter, e.g. svCounter.
2. On the Stage Properties form, set the Initial Value of the stage variable to "@PARTITIONNUM - @NUMPARTITIONS + 1".
3. Set the derivation of the stage variable to "svCounter + @NUMPARTITIONS". You can embed this in an IF statement if it is a conditional counter.
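The steps above can be sanity-checked by modeling the arithmetic in Python (an illustration of the behaviour, not DataStage code; the function name is invented for this sketch):

```python
# Model of the parallel counter on each node.
NUM_PARTITIONS = 4  # @NUMPARTITIONS

def partition_counter(partition_num, row_count):
    # Initial Value: @PARTITIONNUM - @NUMPARTITIONS + 1
    sv_counter = partition_num - NUM_PARTITIONS + 1
    values = []
    for _ in range(row_count):
        # Derivation: svCounter + @NUMPARTITIONS
        sv_counter += NUM_PARTITIONS
        values.append(sv_counter)
    return values

# partition 0 -> [1, 5, 9, 13]; partition 1 -> [2, 6, 10, 14];
# no two partitions ever produce the same value.
```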
Each instance will start at a different seed; with four
partitions the initial values are -3, -2, -1 and 0. Each time the counter is
incremented, it increases by the number of partitions, e.g. 4. This gives a
sequence in instance 1 of 1, 5, 9, 13... Instance 2 is 2, 6, 10, 14... etc.
Remember this method only produces a gap-free sequence if your data is evenly
balanced, i.e. an equal number of rows goes through each partition; the values
remain unique either way. Alternative syntax is:
@INROWNUM * @NUMPARTITIONS + @PARTITIONNUM
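This alternative can also be modeled in Python to show that every (partition, row) pair maps to a distinct value (an illustrative sketch; it assumes @INROWNUM is 1 for the first row of each partition):

```python
# Model of the alternative derivation.
NUM_PARTITIONS = 4  # @NUMPARTITIONS

def alt_counter(partition_num, in_row_num):
    # @INROWNUM * @NUMPARTITIONS + @PARTITIONNUM
    return in_row_num * NUM_PARTITIONS + partition_num

# Even with uneven row counts per partition, no value repeats.
values = [alt_counter(p, r)
          for p in range(NUM_PARTITIONS)
          for r in range(1, 6)]
assert len(values) == len(set(values))
```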