
Friday, July 10, 2015

Accessing Big Data using DI tools

by Unknown  |  in DI at  3:04 PM

Companies are investing heavily to understand the data they have accumulated over many years and the value it can potentially provide.
Hadoop plays a major role in processing and handling big data. Its storage layer, HDFS (the Hadoop Distributed File System), is a file system in which the data files are distributed across multiple computer systems (nodes).
A Hadoop cluster is a set of computer systems that together function as the file system.
A single file in Hadoop can be spread over any number of nodes in the Hadoop cluster.
In theory, there is no limit to the amount of data that the file system can store, since it is always possible to add more nodes.
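
To make the idea of one file spread over several nodes concrete, here is a minimal Python sketch. It is a toy model, not how HDFS actually places data: real HDFS lets the NameNode decide placement and also replicates each block, while this sketch simply cuts a file into fixed-size blocks and hands them out round-robin. The function name `place_blocks`, the node names and the 128 MB block size are assumptions chosen for illustration.

```python
from collections import defaultdict

MB = 1024 * 1024


def place_blocks(file_size, block_size, nodes):
    """Toy round-robin placement: returns {node: [block_index, ...]}.

    Assumption: real HDFS placement is decided by the NameNode and
    includes replication; neither is modelled here.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not nodes:
        raise ValueError("at least one node is required")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = defaultdict(list)
    for block in range(num_blocks):
        # Hand block i to node i mod (number of nodes).
        placement[nodes[block % len(nodes)]].append(block)
    return dict(placement)


# A 1000 MB file cut into 128 MB blocks needs 8 blocks.
layout = place_blocks(1000 * MB, 128 * MB, ["node1", "node2", "node3"])

# Adding a fourth (hypothetical) node spreads the same blocks more thinly.
wider = place_blocks(1000 * MB, 128 * MB, ["node1", "node2", "node3", "node4"])
```

With three nodes, no node holds more than three of the eight blocks; with four nodes, each holds two. That is the sense in which capacity grows by adding nodes.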

DataStage:-
DataStage has a stage called the Big Data File stage (BDFS), which allows DataStage to read from and write to Hadoop.
Before we can use this stage in a DataStage job, the environment has to be configured correctly. The following prerequisites have to be met:

·         Verify that the Hadoop (BigInsights) cluster is up and running correctly. The status of BigInsights can be checked either from the BigInsights console or from the command line.
·         Add the BigInsights library path to the dsenv file.
·         Find out the required connection details to the BigInsights cluster:
                    BDFS Cluster Host
                    BDFS Cluster Port Number
                    BDFS User: User name to access files
                    BDFS Group: Group name for permissions; multiple groups can be listed.

The Big Data File stage functions similarly to the Sequential File stage. It can be used as either a source or a target in a job. Apart from the connection properties required for HDFS, the stage has the same properties as the Sequential File stage (e.g. First line is column names, Reject mode, Write mode).
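
Before entering the connection details in the stage, it can help to sanity-check them outside DataStage. The Python sketch below keeps the four values together and builds a URL for Hadoop's WebHDFS REST interface (`/webhdfs/v1/<path>?op=...&user.name=...`). This is only an illustration: whether your cluster exposes WebHDFS, and on which port, depends on its configuration, and the class `BdfsConnection`, the host, port, user, group and file path are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from urllib.parse import quote, urlencode


@dataclass
class BdfsConnection:
    """The four connection details listed above (hypothetical container)."""
    host: str
    port: int
    user: str
    groups: list = field(default_factory=list)  # not part of the WebHDFS URL


def webhdfs_url(conn, path, op="OPEN"):
    """Build a WebHDFS REST URL for an absolute HDFS path."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": op, "user.name": conn.user})
    return f"http://{conn.host}:{conn.port}/webhdfs/v1{quote(path)}?{query}"


# Placeholder values; replace them with the details of your own cluster.
conn = BdfsConnection(host="bigdata.example.com", port=50070,
                      user="dsadm", groups=["hadoop"])

read_url = webhdfs_url(conn, "/user/dsadm/input.csv")
status_url = webhdfs_url(conn, "/user/dsadm/input.csv", op="GETFILESTATUS")
```

Requesting `status_url` with any HTTP client is a cheap way to confirm that the host, port and user are right before building the job.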


Informatica:-
Informatica has a handful of big data products that allow Informatica customers to process and access data in a Hadoop environment.
PowerExchange Connector:-
PowerExchange has a built-in Hadoop connector that allows you to connect to Hadoop directly.
Informatica Big Data Edition:-
This edition provides an extensive library of prebuilt transformation capabilities on Hadoop, including data type conversions and string manipulations, high-performance cache-enabled lookups, joiners, sorters, routers, aggregations, and many more.

Other functionality provided:-
·         Data profiling on Hadoop
·         Data Parsing
·         Entity Extraction and Data Classification

