Materialized View

In data warehouses, materialized views can be used to pre compute and store aggregated data such as the sum of sales. Materialized views in these environments are typically referred to as summaries, because they store summarized data. They can also be used to pre compute joins with or without aggregations. A materialized view eliminates the overhead associated with expensive joins or aggregations for a large or important class of queries.

Need for Materialized View

It increases the speed of queries on very large databases. Queries to large databases often involve joins between tables or aggregations such as SUM, or both. These operations are very expensive in terms of time and processing power. The type of materialized view that is created determines how the materialized view can be refreshed and used by query rewrite.

It can be used in a number of ways and almost identical syntax can be used to perform a number of roles.

It improves query performance by pre calculating expensive join and aggregation operations on the database prior to execution time and storing the results in the database. In general, rewriting queries to use materialized views rather than detail table’s results in a significant performance gain.

Where We Can Use Them

In essence, there are three situations in which we might want to make the same set of information available in multiple places.

The first of these is in summary tables. We provide summary tables in a data warehouse because not every query is going to need the very detailed information that our fact tables may provide, and because allowing high‑level queries to scan a much smaller pre‑aggregated data set reduces our I/O burden.

The second situation is in dimension tables. The dimension tables contain lists of all the values in the key columns of the fact tables, and maybe some more information on those values such as descriptive text for codes.

The third situation is in the fact tables themselves. I will admit this is a bit of a stretch for many people, but the fact tables are generally the cleansed and transformed versions of some kind of source data set. That data set may be a set of OLTP database tables, or a set of flat files, but the fact tables are still in some ways a duplication of the original data set.

Types of Materialized Views

Materialized Views with Joins and Aggregates
Single-Table Aggregate Materialized Views
Materialized Views Containing Only Joins

Materialized Views with Joins and Aggregates

In data warehouses, materialized views would normally contain one of the aggregates. For fast refresh to be possible, the SELECT list must contain the entire GROUP BY columns (if present), and may contain one or more aggregate functions. The aggregate function must be one of: SUM, COUNT(x), COUNT (*), COUNT (DISTINCT x), AVG, VARIANCE, STDDEV, MIN, and MAX, and the expression to be aggregated can be any SQL value expression.

If a materialized view contains joins and aggregates, then it cannot be fast refreshed using a materialized view log. Therefore, for a fast refresh to be possible, only new data can be added to the detail tables and it must be loaded using the direct path method

Example:

CREATE MATERIALIZED VIEW store_sales_mv

  PCTFREE 0 TABLESPACE mviews

  STORAGE (initial 16k next 16k pctincrease 0)

  BUILD DEFERRED

  REFRESH COMPLETE ON DEMAND

  ENABLE QUERY REWRITE

AS

  SELECT

   s.store_name,

     SUM(dollar_sales) AS sum_dollar_sales

      FROM store s, fact f

      WHERE f.store_key = s.store_key

      GROUP BY s.store_name;

This example creates a materialized view store_sales_mv that computes the sum of sales by store. It is derived by joining the tables store and fact on the column store_key. The materialized view does not initially contain any data because the build method is DEFERRED. A complete refresh is required for the first refresh of a build deferred materialized view. When it is refreshed, a complete refresh is performed and, once populated; this materialized view can be used by query rewrite.

Single-Table Aggregate Materialized Views

A materialized view that contains one or more aggregates (SUM, AVG, VARIANCE, STDDEV, and COUNT) and a GROUP BY clause may be based on a single table. The aggregate function can involve an expression on the columns such as SUM(a*b). If this materialized view is to be incrementally refreshed, then a materialized view log must be created on the detail table with the INCLUDING NEW VALUES option, and the log must contain all columns referenced in the materialized view query definition.

Example:

CREATE MATERIALIZED VIEW log on fact

  with rowid (store_key, time_key, dollar_sales, unit_sales)

  including new values;

CREATE MATERIALIZED VIEW sum_sales

  PARALLEL

  BUILD IMMEDIATE

  REFRESH FAST ON COMMIT

AS

  SELECT f.store_key, f.time_key,

         COUNT(*) AS count_grp,

  SUM(f.dollar_sales) AS sum_dollar_sales,

        COUNT(f.dollar_sales) AS count_dollar_sales,

  SUM(f.unit_sales) AS sum_unit_sales,

        COUNT(f.unit_sales) AS count_unit_sales

  FROM fact f

  GROUP BY f.store_key, f.time_key;

In this example, a materialized view has been created which contains aggregates on a single table. Because the materialized view log has been created, the materialized view is fast refreshable. If DML is applied against the fact table, then, when the commit is issued, the changes will be reflected in the materialized view.

Materialized Views Containing Only Joins

Materialized views may contain only joins and no aggregates, such as in the next example where a materialized view is created which joins the fact table to the store table. The advantage of creating this type of materialized view is that expensive joins will be pre calculated.

Incremental refresh for a materialized view containing only joins is possible after any type of DML to the base tables (direct load or conventional INSERT, UPDATE, or DELETE). It can be defined to be refreshed ON COMMIT or ON DEMAND. If it is ON COMMIT, the refresh is performed at commit time of the transaction that does DML on the materialized view's detail table.

If you specify REFRESH FAST, Oracle performs further verification of the query definition to ensure that fast refresh can be performed if any of the detail tables change. These additional checks include:

A materialized view log must be present for each detail table.
The rowids of all the detail tables must appear in the SELECT list of the materialized view query definition.
If there are outer joins, unique constraints must exist on the join columns of the inner table.

For example, if you are joining the fact and a dimension table and the join is an outer join with the fact table being the outer table, there must exist unique constraints on the join columns of the dimension table.

If some of the above restrictions are not met, then the materialized view should be created as REFRESH FORCE to take advantage of incremental refresh when it is possible. If the materialized view is created as ON COMMIT, Oracle performs all of the fast refresh checks. If one of the tables did not meet all of the criteria, but the other tables did, the materialized view would still be incrementally refreshable with respect to the other tables for which all the criteria are met.

In a data warehouse star schema, if space is at a premium, you can include the rowid of the fact table only because this is the table that will be most frequently updated and the user can specify the FORCE option when the materialized view is created.

A materialized view log should contain the rowid of the master table. It is not necessary to add other columns.

To speed up refresh, it is recommended that the user create indexes on the columns of the materialized view that stores the rowids of the fact table.

Example:

CREATE MATERIALIZED VIEW LOG ON fact

  WITH ROWID;

CREATE MATERIALIZED VIEW LOG ON time

  WITH ROWID;

CREATE MATERIALIZED VIEW LOG ON store

  WITH ROWID;

CREATE MATERIALIZED VIEW detail_fact_mv

       PARALLEL

       BUILD IMMEDIATE

       REFRESH FAST

AS

       SELECT

    f.rowid "fact_rid", t.rowid "time_rid", s.rowid "store_rid",

      s.store_key, s.store_name, f.dollar_sales,

      f.unit_sales, f.time_key

       FROM fact f, time t, store s

       WHERE f.store_key = s.store_key(+) AND

       f.time_key = t.time_key(+);

In this example, in order to perform a REFRESH FAST, unique constraints should exist on s.store_key and t.time_key. It is also recommended that indexes be created on the columns fact_rid, time_rid, and store_rid, as illustrated below, which will improve the performance of refresh.

Big Data - DW & BI

Translate to your Language

Labels

Thursday, December 13, 2012

Materialized View

0 comments:

Popular Posts

Pin It