Ton of Data Warehousing and DataStage Interview Questions

· Types of Stages in DS? Explain with Examples

· What are active stages and passive stages?

· Can you filter data in hashed file? (No)

· Difference between sequential and hashed file?

· How do you populate time dimension?

· Can we use target hashed file as lookup? (Yes)

· What is Merge Stage?

· What is Job Sequencer?

· What are stages in sequences?

· How do you pass parameters?

· What parameters did you use in your project?

· What are log tables?

· What is job controlling?

· Facts and dimension tables?

· Conformed dimensions?

· Difference between OLTP and OLAP?

· Difference between star schema and snowflake schema?

· What are hierarchies? Examples?

· What are materialized views?

· What is aggregation?

· What is surrogate key? Is it used for both fact and dimension tables?

· Why do you go for oracle sequence generator rather than datastage routine?

· Flow of data in datastage?

· Initial loading and incremental loading?

· What is SCD? Types?

· How do you develop SCD type2 in your project?

· How do you load dimension data and fact data? Which is first?

· Difference between oracle function and procedure?

· Difference between unique and primary key?

· Difference between union and union all?

· What is minus operator?

· What is audit table?

· If there is a large hashed file and a smaller Oracle table, and you are looking up from a Transformer in different jobs, which will be faster?

· Tell me about SCDs.

· How did you implement SCD in your project?

· What are derivations in transformer?

· How do you use surrogate key in reporting?

· Log view in DataStage versus logs in Informatica: which is clearer?

· How does pivot stage work?

· What is a surrogate key? What is its importance? How did you implement it in your project?

· How many jobs did you develop in total, and how many lookups did you use?

· How do constraints in a Transformer work?

· How will you declare a constraint in datastage?

· How will you handle rejected data?

· Give me some performance tips in datastage?

· Can we use sequential file as a lookup?

· How does hash file stage lookup?

· Why can’t we use sequential file as a lookup?

· What is data warehouse?

· What is ‘Star-Schema’?

· What is ‘Snowflake-Schema’?

· What is the difference between Star-Schema and Snowflake-Schema?

· What is meant by a surrogate key?

· What is ‘Conformed Dimension’?

· What is Factless Fact Table?

· When will we use connected and unconnected lookup?

· Which cache supports connected and unconnected lookup?

· What is the difference between SCD Type2 and SCD Type3?

· What is the difference between a data mart and a data warehouse?

· What is a composite key?

· What is a surrogate key? When will you go for it?

· What is dimensional modeling?

· What are SCD and SGT? Difference between them? Example of SGT from your project.

· How do you import your source and targets? What are the types of sources and targets?

· What do Active Stages and Passive Stages mean in DataStage?

· What is the difference between Informatica and DataStage? Which do you think is best?

· What are the stages you used in your project?

· What do you mean by parallel processing?

· What is the difference between the Merge Stage and the Join Stage?

· What is the difference between the Copy Stage and the Transformer Stage?

· What is the difference between the ODBC Stage and the OCI Stage?

· What is the difference between the Lookup Stage and the Join Stage?

· What is the difference between the Change Capture Stage and the Difference Stage?

· What is the difference between a hashed file and a sequential file?

· What are the different joins used in the Join Stage?

· How do you decide when to go for the Join Stage and when for the Lookup Stage?

· What is partition key? Which key is used in round robin partition?

· How do you handle SCD in datastage?

· What are Change Capture Stage and Change Apply Stages?

· How many streams can you give to the Transformer?

· What is primary link and reference link?

· What is a routine? What are before and after subroutines? Are these run before/after the job or the stage?

· What is a config file? Does each job have its own config file, or is only one needed?

· What is Node?

· What is the IPC Stage? How does it increase performance?

· What is Sequential buffer?

· What are Link Partitioner and Link Collector?

· What performance tuning have you done in your project?

· Have you done scheduling? How? Can you schedule a job on the last day of every month? How?

· What is a job sequence? Have you run any jobs?

· What is status view? Why do you clear it? If you clear the status view, what is done internally?

· What is a hashed file? What are the types of hashed file? Which do you use? What is the default?

· What is the main advantage of a hashed file? What is the difference between the types (static and dynamic)?

· What are containers? Give example from your project.

· What are parameters and parameter file?

· How do you convert columns to rows and rows to columns in DataStage? (Using Pivot Stage).

· What is Pivot Stage?

· What is the execution flow of constraints, derivations and variables in the Transformer stage? What are these?

· How do you eliminate duplicates in datastage? Can you use hash file for it?

· If the 1st and 8th records are duplicates, which will be skipped? Can you configure it?

· How do you import and export DataStage jobs? What is the file extension? (See each component while importing and exporting).

· How do you rate yourself in DataStage?

· Explain DataStage Architecture?

· What is repository? What are the repository items?

· What is the difference between a routine and a transform?

· When do you write routines?

· What is the most complex situation you faced in DataStage?

· What are system variables? Which system variables did you use in your project?

· What are the different datastage functions used in your project?

· Difference between star schema and snowflake schema?

· What are conformed, degenerate and junk dimensions?

· What are conformed facts?

· Different type of facts and their examples?

· What are approaches in developing data warehouse?

· Different types of hashed files?

· What are routines and transforms? How did you use them in your project?

· Difference between Data Mart and Data Warehouse?

· What is surrogate key? How do you generate it?

· What are environment variables and global variables?

· How do you improve the performance of the job?

· What is SCD? How did you develop SCD Type 1 and SCD Type 2?

· How do you generate surrogate key in datastage?

· What is job sequence?

· What are plug-ins?

· How much data do you get every day?

· What is the biggest table and size in your schema or in your project?

· What is the size of data warehouse (by loading data)?

· How do you improve the performance of the hashed file?

· What is IPC Stage?

· What are the different types of stages, and which did you use in your project?

· What are the operations you can do in IPC Stage and transformer stage?

· What is merge stage? How do you merge two flat files?

· What is the difference between the ODBC and Oracle OCI stages?

· What is the difference between a sequential file and a hashed file?

· Can you use a sequential file as the source to a hashed file? Have you done it? What error will it give?

· Why does a hashed file improve performance?

· Can the Aggregator and Transformer stages be used for sorting data? How?

· How many input links can you give to a Transformer?

· Definition of Slowly Changing Dimensions? Types?

· What are the Iconv and Oconv functions?

· What is the advantage of using the OCI stage compared to the ODBC stage?

· What is the difference between inter-process and in-process? Which one is best?

Big Data - DW & BI


Tuesday, January 8, 2013
