In
data warehouses, materialized views can be used to pre compute and store
aggregated data such as the sum of sales. Materialized views in these
environments are typically referred to as summaries, because they store
summarized data. They can also be used to pre compute joins with or without
aggregations. A materialized view eliminates the overhead associated with
expensive joins or aggregations for a large or important class of queries.
- It increases the speed of
queries on very large databases. Queries to large databases often involve
joins between tables or aggregations such as SUM, or both. These
operations are very expensive in terms of time and processing power. The
type of materialized view that is created determines how the materialized
view can be refreshed and used by query rewrite.
- It can be used in a
number of ways and almost identical syntax can be used to perform a number
of roles.
- It improves
query performance by pre calculating expensive join and aggregation
operations on the database prior to execution time and storing the results
in the database. In general, rewriting queries to use materialized views
rather than detail table’s results in a significant performance gain.
Where We Can Use Them
In
essence, there are three situations in which we might want to make the same set
of information available in multiple places.
The
first of these is in summary
tables. We provide summary tables in a data warehouse because not
every query is going to need the very detailed information that our fact tables
may provide, and because allowing high‑level queries to scan a much smaller pre‑aggregated
data set reduces our I/O burden.
The
second situation is in dimension
tables. The dimension tables contain lists of all the values in the
key columns of the fact tables, and maybe some more information on those values
such as descriptive text for codes.
The
third situation is in the fact
tables themselves. I will admit this is a bit of a stretch for many
people, but the fact tables are generally the cleansed and transformed versions
of some kind of source data set. That data set may be a set of OLTP database
tables, or a set of flat files, but the fact tables are still in some ways a
duplication of the original data set.
Types of Materialized Views
- Materialized Views with
Joins and Aggregates
- Single-Table Aggregate
Materialized Views
- Materialized Views
Containing Only Joins
Materialized Views
with Joins and Aggregates
In data
warehouses, materialized views would normally contain one of the aggregates.
For fast refresh to be possible, the SELECT list must contain the entire GROUP
BY columns (if present), and may contain one or more aggregate functions. The
aggregate function must be one of: SUM, COUNT(x), COUNT (*), COUNT (DISTINCT
x), AVG, VARIANCE, STDDEV, MIN, and MAX, and the expression to be aggregated
can be any SQL value expression.
If a
materialized view contains joins and aggregates, then it cannot be fast
refreshed using a materialized view log. Therefore, for a fast refresh to be
possible, only new data can be added to the detail tables and it must be loaded
using the direct path method
Example:
CREATE MATERIALIZED VIEW store_sales_mv
PCTFREE 0 TABLESPACE mviews
STORAGE (initial 16k next 16k pctincrease 0)
BUILD DEFERRED
REFRESH COMPLETE ON DEMAND
ENABLE QUERY REWRITE
AS
SELECT
s.store_name,
SUM(dollar_sales) AS sum_dollar_sales
FROM store s, fact f
WHERE f.store_key = s.store_key
GROUP BY s.store_name;
This example creates a materialized view store_sales_mv that computes the sum of sales by store. It is derived by joining the tables store and fact on the column store_key. The materialized view does not initially contain any data because the build method is DEFERRED. A complete refresh is required for the first refresh of a build deferred materialized view. When it is refreshed, a complete refresh is performed and, once populated; this materialized view can be used by query rewrite.
Single-Table Aggregate Materialized
Views
A
materialized view that contains one or more aggregates (SUM, AVG, VARIANCE,
STDDEV, and COUNT) and a GROUP BY clause may be based on a single table. The
aggregate function can involve an expression on the columns such as SUM(a*b).
If this materialized view is to be incrementally refreshed, then a materialized
view log must be created on the detail table with the INCLUDING NEW VALUES
option, and the log must contain all columns referenced in the materialized
view query definition.
Example:
CREATE MATERIALIZED VIEW log on fact
with rowid (store_key, time_key, dollar_sales, unit_sales)
including new values;
CREATE MATERIALIZED VIEW sum_sales
PARALLEL
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT f.store_key, f.time_key,
COUNT(*) AS count_grp,
SUM(f.dollar_sales) AS sum_dollar_sales,
COUNT(f.dollar_sales) AS count_dollar_sales,
SUM(f.unit_sales) AS sum_unit_sales,
COUNT(f.unit_sales) AS count_unit_sales
FROM fact f
GROUP BY f.store_key, f.time_key;
In this example, a materialized view has been created which contains aggregates on a single table. Because the materialized view log has been created, the materialized view is fast refreshable. If DML is applied against the fact table, then, when the commit is issued, the changes will be reflected in the materialized view.
Materialized Views
Containing Only Joins
Materialized
views may contain only joins and no aggregates, such as in the next example
where a materialized view is created which joins the fact table to the store table. The
advantage of creating this type of materialized view is that expensive joins
will be pre calculated.
Incremental
refresh for a materialized view containing only joins is possible after any
type of DML to the base tables (direct load or conventional INSERT, UPDATE, or
DELETE). It can be defined to be refreshed ON COMMIT or ON
DEMAND. If it is ON COMMIT, the refresh is performed at commit time of the
transaction that does DML on the materialized view's detail table.
If
you specify REFRESH FAST, Oracle performs further verification of the query
definition to ensure that fast refresh can be performed if any of the
detail tables change. These additional checks include:
- A
materialized view log must be present for each detail table.
- The
rowids of all the detail tables must appear in the SELECT list of the
materialized view query definition.
- If there
are outer joins, unique constraints must exist on the join columns of the
inner table.
For
example, if you are joining the fact and a dimension table and the join is an
outer join with the fact table being the outer table, there must exist unique
constraints on the join columns of the dimension table.
If
some of the above restrictions are not met, then the materialized view should
be created as REFRESH FORCE to take advantage of incremental refresh when it is
possible. If the materialized view is created as ON COMMIT, Oracle performs all
of the fast refresh checks. If one of the tables did not meet all of the
criteria, but the other tables did, the materialized view would still be
incrementally refreshable with respect to the other tables for which all the
criteria are met.
In a
data warehouse star schema, if space is at a premium, you can include the rowid
of the fact table only because this is the table that will be most frequently updated
and the user can specify the FORCE option when the materialized view is
created.
A
materialized view log should contain the rowid of the master table. It is not
necessary to add other columns.
To
speed up refresh, it is recommended that the user create indexes on the columns
of the materialized view that stores the rowids of the fact table.
Example:
CREATE MATERIALIZED VIEW LOG ON fact
WITH ROWID;
CREATE MATERIALIZED VIEW LOG ON time
WITH ROWID;
CREATE MATERIALIZED VIEW LOG ON store
WITH ROWID;
CREATE MATERIALIZED VIEW detail_fact_mv
PARALLEL
BUILD IMMEDIATE
REFRESH FAST
AS
SELECT
f.rowid "fact_rid", t.rowid "time_rid", s.rowid "store_rid",
s.store_key, s.store_name, f.dollar_sales,
f.unit_sales, f.time_key
FROM fact f, time t, store s
WHERE f.store_key = s.store_key(+) AND
f.time_key = t.time_key(+);
0 comments: