Translate to your Language

Thursday, December 13, 2012

Materialized View

by Unknown  |  in DB at  7:52 PM


In data warehouses, materialized views can be used to pre compute and store aggregated data such as the sum of sales. Materialized views in these environments are typically referred to as summaries, because they store summarized data. They can also be used to pre compute joins with or without aggregations. A materialized view eliminates the overhead associated with expensive joins or aggregations for a large or important class of queries.

Need for Materialized View

  • It increases the speed of queries on very large databases. Queries to large databases often involve joins between tables or aggregations such as SUM, or both. These operations are very expensive in terms of time and processing power. The type of materialized view that is created determines how the materialized view can be refreshed and used by query rewrite.

  • It can be used in a number of ways and almost identical syntax can be used to perform a number of roles.

  • It improves query performance by pre calculating expensive join and aggregation operations on the database prior to execution time and storing the results in the database. In general, rewriting queries to use materialized views rather than detail table’s results in a significant performance gain.

Where We Can Use Them

In essence, there are three situations in which we might want to make the same set of information available in multiple places.

The first of these is in summary tables. We provide summary tables in a data warehouse because not every query is going to need the very detailed information that our fact tables may provide, and because allowing high‑level queries to scan a much smaller pre‑aggregated data set reduces our I/O burden.

The second situation is in dimension tables. The dimension tables contain lists of all the values in the key columns of the fact tables, and maybe some more information on those values such as descriptive text for codes.

The third situation is in the fact tables themselves. I will admit this is a bit of a stretch for many people, but the fact tables are generally the cleansed and transformed versions of some kind of source data set. That data set may be a set of OLTP database tables, or a set of flat files, but the fact tables are still in some ways a duplication of the original data set.

Types of Materialized Views

  • Materialized Views with Joins and Aggregates
  • Single-Table Aggregate Materialized Views
  • Materialized Views Containing Only Joins

Materialized Views with Joins and Aggregates
In data warehouses, materialized views would normally contain one of the aggregates. For fast refresh to be possible, the SELECT list must contain the entire GROUP BY columns (if present), and may contain one or more aggregate functions. The aggregate function must be one of: SUM, COUNT(x), COUNT (*), COUNT (DISTINCT x), AVG, VARIANCE, STDDEV, MIN, and MAX, and the expression to be aggregated can be any SQL value expression.
If a materialized view contains joins and aggregates, then it cannot be fast refreshed using a materialized view log. Therefore, for a fast refresh to be possible, only new data can be added to the detail tables and it must be loaded using the direct path method
Example:
CREATE MATERIALIZED VIEW store_sales_mv
  PCTFREE 0 TABLESPACE mviews
  STORAGE (initial 16k next 16k pctincrease 0)
  BUILD DEFERRED
  REFRESH COMPLETE ON DEMAND
  ENABLE QUERY REWRITE
  AS
  SELECT
   s.store_name,
     SUM(dollar_sales) AS sum_dollar_sales
      FROM store s, fact f
      WHERE f.store_key = s.store_key
      GROUP BY s.store_name;
 
This example creates a materialized view store_sales_mv that computes the sum of sales by store. It is derived by joining the tables store and fact on the column store_key. The materialized view does not initially contain any data because the build method is DEFERRED. A complete refresh is required for the first refresh of a build deferred materialized view. When it is refreshed, a complete refresh is performed and, once populated; this materialized view can be used by query rewrite. 
Single-Table Aggregate Materialized Views
A materialized view that contains one or more aggregates (SUM, AVG, VARIANCE, STDDEV, and COUNT) and a GROUP BY clause may be based on a single table. The aggregate function can involve an expression on the columns such as SUM(a*b). If this materialized view is to be incrementally refreshed, then a materialized view log must be created on the detail table with the INCLUDING NEW VALUES option, and the log must contain all columns referenced in the materialized view query definition.
Example:
 
CREATE MATERIALIZED VIEW log on fact   
  with rowid (store_key, time_key, dollar_sales, unit_sales)  
  including new values;  
 
CREATE MATERIALIZED VIEW sum_sales  
  PARALLEL
  BUILD IMMEDIATE  
  REFRESH FAST ON COMMIT  
  AS  
  SELECT f.store_key, f.time_key,  
         COUNT(*) AS count_grp,  
  SUM(f.dollar_sales) AS sum_dollar_sales,  
        COUNT(f.dollar_sales) AS count_dollar_sales,  
  SUM(f.unit_sales) AS sum_unit_sales,  
        COUNT(f.unit_sales) AS count_unit_sales 
  FROM fact f  
  GROUP BY f.store_key, f.time_key;
  
In this example, a materialized view has been created which contains aggregates on a single table. Because the materialized view log has been created, the materialized view is fast refreshable. If DML is applied against the fact table, then, when the commit is issued, the changes will be reflected in the materialized view. 

Materialized Views Containing Only Joins

Materialized views may contain only joins and no aggregates, such as in the next example where a materialized view is created which joins the fact table to the store table. The advantage of creating this type of materialized view is that expensive joins will be pre calculated.

Incremental refresh for a materialized view containing only joins is possible after any type of DML to the base tables (direct load or conventional INSERT, UPDATE, or DELETE). It can be defined to be refreshed ON COMMIT or ON DEMAND. If it is ON COMMIT, the refresh is performed at commit time of the transaction that does DML on the materialized view's detail table.

If you specify REFRESH FAST, Oracle performs further verification of the query definition to ensure that fast refresh can be performed if any of the detail tables change. These additional checks include:

  • A materialized view log must be present for each detail table.
  • The rowids of all the detail tables must appear in the SELECT list of the materialized view query definition.
  • If there are outer joins, unique constraints must exist on the join columns of the inner table.

For example, if you are joining the fact and a dimension table and the join is an outer join with the fact table being the outer table, there must exist unique constraints on the join columns of the dimension table.

If some of the above restrictions are not met, then the materialized view should be created as REFRESH FORCE to take advantage of incremental refresh when it is possible. If the materialized view is created as ON COMMIT, Oracle performs all of the fast refresh checks. If one of the tables did not meet all of the criteria, but the other tables did, the materialized view would still be incrementally refreshable with respect to the other tables for which all the criteria are met.

In a data warehouse star schema, if space is at a premium, you can include the rowid of the fact table only because this is the table that will be most frequently updated and the user can specify the FORCE option when the materialized view is created.

A materialized view log should contain the rowid of the master table. It is not necessary to add other columns.

To speed up refresh, it is recommended that the user create indexes on the columns of the materialized view that stores the rowids of the fact table.
 
Example:
 
CREATE MATERIALIZED VIEW LOG ON fact
  WITH ROWID;
 
CREATE MATERIALIZED VIEW LOG ON time
  WITH ROWID;
 
CREATE MATERIALIZED VIEW LOG ON store
  WITH ROWID;
 
CREATE MATERIALIZED VIEW detail_fact_mv 
       PARALLEL
       BUILD IMMEDIATE
       REFRESH FAST
       AS
       SELECT
    f.rowid "fact_rid", t.rowid "time_rid", s.rowid "store_rid",
      s.store_key, s.store_name, f.dollar_sales,
      f.unit_sales, f.time_key
       FROM fact f, time t, store s 
       WHERE f.store_key = s.store_key(+) AND 
       f.time_key = t.time_key(+);
In this example, in order to perform a REFRESH FAST, unique constraints should exist on s.store_key and t.time_key. It is also recommended that indexes be created on the columns fact_rid, time_rid, and store_rid, as illustrated below, which will improve the performance of refresh. 

0 comments:

© Copyright © 2015Big Data - DW & BI. by