Companies are investing heavily to understand the data they have accumulated
over the years and the value it can potentially provide.
Hadoop plays a major role in processing and handling Big Data.
The Hadoop Distributed File System (HDFS) is, simply put, a file
system in which data files are
distributed across multiple computer systems (nodes).
A Hadoop cluster is a set of computer systems that together function
as the file system.
A single file in Hadoop can be spread over any number
of nodes in the Hadoop cluster.
In theory, there is no limit to the amount of data the
file system can store, since it is always possible to add more nodes.
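To make this concrete, here is a minimal Python sketch (not actual Hadoop code, just an illustration) of how a large file is split into fixed-size blocks whose replicas are spread across nodes. The 128 MB block size and replication factor of 3 mirror common HDFS defaults; the round-robin placement is a simplification of HDFS's real placement policy:

```python
# Illustrative sketch of HDFS-style block placement (not actual Hadoop code).
BLOCK_SIZE = 128 * 1024 * 1024  # common HDFS default block size (bytes)
REPLICATION = 3                 # common HDFS default replication factor

def place_blocks(file_size, nodes):
    """Split a file into blocks and assign each block's replicas to nodes round-robin."""
    num_blocks = -(-file_size // BLOCK_SIZE)  # ceiling division
    placement = []
    for b in range(num_blocks):
        replicas = [nodes[(b + r) % len(nodes)] for r in range(REPLICATION)]
        placement.append((b, replicas))
    return placement

# A 1 GB file over a 5-node cluster: 8 blocks, each replicated on 3 nodes.
layout = place_blocks(1024 * 1024 * 1024, ["node1", "node2", "node3", "node4", "node5"])
print(len(layout))  # 8 blocks
print(layout[0])    # (0, ['node1', 'node2', 'node3'])
```

Adding more nodes to the list simply gives the placement loop more targets, which is why capacity scales by adding machines.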
DataStage:-
DataStage has a stage called the Big Data File stage (BDFS),
which allows DataStage to read from and write to Hadoop.
Before we can use this stage in a DataStage job, we have
to configure the environment correctly. The following prerequisites have to
be met:
Verify that the Hadoop (BigInsights) cluster is up and
running correctly. The status of BigInsights can be checked either from the
BigInsights console or from the command line.
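For example, on a typical BigInsights installation the status can be checked from the command line as below. The install path and script name reflect common BigInsights setups but are assumptions; verify them against your environment:

```shell
# Check the overall status of the BigInsights cluster from the command line.
# $BIGINSIGHTS_HOME is commonly /opt/ibm/biginsights -- adjust for your install.
$BIGINSIGHTS_HOME/bin/status.sh

# Or verify that HDFS itself responds by listing the root directory:
hadoop fs -ls /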
Add the BigInsights library path to the dsenv file.
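As a hedged example, the dsenv addition typically looks like the following. The exact library path depends on your BigInsights version and install location, so treat the path below as an assumption to be verified:

```shell
# Example dsenv addition (path is an assumption -- verify against your install).
# Appends the BigInsights client libraries to the DataStage engine environment.
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/ibm/biginsights/IHC/c++/Linux-amd64-64/lib
export LD_LIBRARY_PATH
```

Restart the DataStage engine after editing dsenv so the change takes effect.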
Find out the required connection details to the BigInsights
cluster.
BDFS Cluster Host
BDFS Cluster Port Number
BDFS User: User name to access files
BDFS Group: Group name for permissions – Multiple groups can
be listed.
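Put together, the connection details for the stage might look like the following. All values here are hypothetical examples; substitute your own cluster's details:

```
BDFS Cluster Host  = bihost.example.com   # NameNode host (hypothetical)
BDFS Cluster Port  = 9000                 # commonly 9000 on BigInsights; verify yours
BDFS User          = dsadm                # user that accesses the files (hypothetical)
BDFS Group         = dstage,hadoop        # multiple groups can be listed
```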
The Big Data File stage functions similarly to the
Sequential File stage. It can be used as either a source or a target in a job.
Other than the required connection properties for HDFS, the stage has
exactly the same properties as the Sequential File stage (e.g. First line is column
names, Reject mode, Write mode, etc.).
Informatica:-
Informatica offers a handful of Big Data products that allow its
customers to process and access data in a Hadoop environment.
PowerExchange Connector:-
PowerExchange ships with a built-in Hadoop connector that allows you to
connect to Hadoop directly.
Informatica Big Data Edition:-
This edition provides an extensive library of prebuilt transformation capabilities
on Hadoop, including
data type conversions and string manipulations, high-performance
cache-enabled lookups, joiners, sorters,
routers, aggregations, and many more.
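These transformation names are easier to grasp with a small analogue. The sketch below is plain Python, not Informatica code; it mimics what a cache-enabled lookup, a sorter, and an aggregator do to a set of rows (all data here is made up for illustration):

```python
# Plain-Python analogue of a few Informatica BDE transformations (illustrative only).
orders = [
    {"order_id": 1, "cust_id": "C1", "amount": 120.0},
    {"order_id": 2, "cust_id": "C2", "amount": 75.5},
    {"order_id": 3, "cust_id": "C1", "amount": 30.0},
]
customers = {"C1": "Alice", "C2": "Bob"}  # lookup cache: reference data held in memory

# Lookup: enrich each row from the cached reference data.
enriched = [{**o, "cust_name": customers[o["cust_id"]]} for o in orders]

# Sorter: order rows by amount, descending.
by_amount = sorted(enriched, key=lambda r: r["amount"], reverse=True)

# Aggregator: total amount per customer.
totals = {}
for row in enriched:
    totals[row["cust_name"]] = totals.get(row["cust_name"], 0.0) + row["amount"]

print(by_amount[0]["order_id"])  # 1 (largest amount)
print(totals)                    # {'Alice': 150.0, 'Bob': 75.5}
```

The point of BDE is that transformations like these run on the Hadoop cluster itself, rather than pulling the data out to an ETL server.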
Other functionality provided:-
· Data profiling on Hadoop
· Data parsing
· Entity extraction and data classification
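As a rough illustration of what data profiling involves (again in plain Python rather than Informatica's engine), a profile typically reports per-column counts, null counts, and distinct values:

```python
# Minimal data-profiling sketch: per-column stats over a list of rows (illustrative only).
rows = [
    {"name": "Alice", "city": "Austin"},
    {"name": "Bob",   "city": None},
    {"name": "Alice", "city": "Boston"},
]

def profile(rows):
    """Return count, null count, and distinct non-null value count for each column."""
    stats = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        non_null = [v for v in values if v is not None]
        stats[col] = {
            "count": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        }
    return stats

print(profile(rows))
# {'name': {'count': 3, 'nulls': 0, 'distinct': 2},
#  'city': {'count': 3, 'nulls': 1, 'distinct': 2}}
```

Running such a profile on Hadoop means these per-column scans are distributed across the cluster's nodes instead of a single machine.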