A fact
table is the primary table in a dimensional model where the numerical
performance measurements of the business are stored. We strive to store the
measurement data resulting from a business process in a single data mart. Since
measurement data is overwhelmingly the largest part of any data mart, we avoid
duplicating it in multiple places around the enterprise.
The
term fact represents a business measure. A measurement is taken at the
intersection of all the dimensions (day, product, and store). This list of
dimensions defines the grain of the fact table and tells us what the scope of
the measurement is.
All the measurements in a fact
table must be at the same grain.
The
most useful facts are numeric and additive. Additivity is crucial because data
warehouse applications almost never retrieve a single fact table row. Semi
additive facts can be added only along some of the dimensions, and non-
additive facts simply can’t be added at all. With non additive facts we are
forced to use counts or averages if we wish to summarize the rows or are
reduced to printing out the fact rows one at a time. Also, there are factless fact
tables which do not contain any measurement metrics; but merely capture the
relationship between the involved keys.
Can a measured fact
be textual?
It is
theoretically possible for a measured fact to be textual; however, the
condition arises rarely. In most cases, a textual measurement is a description
of something and is drawn from a discrete list of values. A true text fact is
rare in a data warehouse because the unpredictable content of a text fact, like
a free text comment, makes it nearly impossible to analyze.
Transaction and
Snapshot grained fact tables
Transaction-grained
fact tables are often richly dimensional and usually have only a single fact
(the amount of the transaction). The context of a transaction is modeled as a
set of generally independent dimensions. The measured transaction amount is in
a fact table that refers to all the dimensions by foreign keys pointing outward
to their respective dimension tables.
A
snapshot is a measurement of status at a specific point in time. In the data
warehouse world, there are two different snapshots of status, both equally
important. The periodic snapshot
is a regular, predictable statement of status. The monthly snapshot, the most
common type, may contain a whole set of measurements. Some of these snapshots
are instantaneous measurements at the end of the period, and some are
measurements accumulated throughout the period. The design of this snapshot
focuses on the activity that occurred during the time interval between
snapshots.
The
other important kind of snapshot is the accumulating
snapshot, which shows us the status at any given moment. If we ask for
the current status an hour from now, it may be different. The accumulating
snapshot often combines the most recent volatile status with measures that
accumulate from the beginning of history.
The
transaction-grained fact table represents an atomic action that occurs at an
instantaneous point in time. There is often no guarantee that a record for a
given policy or customer exists in the transaction fact table; a record exists
only if a transaction occurs. Conversely, there is no upper limit to the number
of records for a given policy or customer. The date/timestamp on the
transaction record may be accurate to the individual day, or it may be accurate
to the hour and minute. Normally there is only a single fact generically called
“amount.” Its meaning depends on the value of the transaction key. After
posting it, we don’t usually revisit a transaction-grained fact record for any
type of update.
The
periodic-snapshot-grained fact table represents a predefined time span.
Usually, exactly one record exists for each combination of significant keys.
The timestamp on the periodic snapshot is only the name of the period and
usually denotes the end of the period. Periodic snapshots can have any number
of facts, depending on what measures are possible or useful to calculate. Some
of these facts may be extraordinarily difficult to calculate directly from the
transactions. Finally, once it’s posted, we do not normally revisit a
periodic-snapshot fact record for any type of update unless we decide to add a
completely new periodic-snapshot fact to the entire table.
The
accumulating-snapshot-grained fact table represents an indeterminate time span,
covering the entire history starting when the transaction started and ending
with the present moment. Accumulating-snapshot fact tables almost always have
multiple time stamps. One time stamp represents the last time the snapshot was
updated, and the others represent generic or predictable events that may have
taken place in the life of the coverage. Note that some of these time stamps
may have to handle null values because, for example, the first_claim_date will
not have a real value unless a claim has been filed. The correct treatment of
null date keys requires using integer-valued surrogate keys.
Accumulating
snapshots, such as the periodic ones, can have any number of facts depending on
what measures are possible or useful to calculate. In dramatic contrast to the
other fact-table types, we frequently revisit accumulating-snapshot fact
records to update the facts. As history unfolds, we must revisit the same
record several times to revise the accumulating status.
When to choose
which kind of fact table?
The
transaction-grained fact table is the only table that can answer detailed
questions about timing and questions about unpredictable, repetitive behavior.
The
periodic-snapshot fact table is often the only table that can easily generate a
regular, predictable view of the important measures of a business. Calculating
basic measures, such as revenue and cost, may be impractical if starting from
any sort of transactional view of the business. So we publish such measures
only once each month at a low level of granularity and let all the company’s
managers and analysts roll these measures upward into different combinations
from our periodic-snapshot table.
The
accumulating-snapshot fact table is appealing when we are tracking items that
have a finite lifetime, such as insurance policy coverages or line items on
orders. If the coverage or the order line item exists for a finite period of
time, there is a flurry of updating activity on those records when they are
“open,” and then the record quiets down. It remains in the database as a
summary of the activity surrounding that coverage or line item. The advantage
of the accumulating snapshot is that we can do a great deal of useful reporting
without complex constraints on the time dimension and without double counting
across the multiple records of a periodic snapshot.
0 comments: