Translate to your Language

Tuesday, March 12, 2013

What is Fact Table

by Unknown  |  in DW at  9:25 PM


A fact table is the primary table in a dimensional model where the numerical performance measurements of the business are stored. We strive to store the measurement data resulting from a business process in a single data mart. Since measurement data is overwhelmingly the largest part of any data mart, we avoid duplicating it in multiple places around the enterprise.

The term fact represents a business measure. A measurement is taken at the intersection of all the dimensions (day, product, and store). This list of dimensions defines the grain of the fact table and tells us what the scope of the measurement is.

All the measurements in a fact table must be at the same grain.

The most useful facts are numeric and additive. Additivity is crucial because data warehouse applications almost never retrieve a single fact table row. Semi additive facts can be added only along some of the dimensions, and non- additive facts simply can’t be added at all. With non additive facts we are forced to use counts or averages if we wish to summarize the rows or are reduced to printing out the fact rows one at a time. Also, there are factless fact tables which do not contain any measurement metrics; but merely capture the relationship between the involved keys.

Can a measured fact be textual?

It is theoretically possible for a measured fact to be textual; however, the condition arises rarely. In most cases, a textual measurement is a description of something and is drawn from a discrete list of values. A true text fact is rare in a data warehouse because the unpredictable content of a text fact, like a free text comment, makes it nearly impossible to analyze.


Transaction and Snapshot grained fact tables

Transaction-grained fact tables are often richly dimensional and usually have only a single fact (the amount of the transaction). The context of a transaction is modeled as a set of generally independent dimensions. The measured transaction amount is in a fact table that refers to all the dimensions by foreign keys pointing outward to their respective dimension tables.

A snapshot is a measurement of status at a specific point in time. In the data warehouse world, there are two different snapshots of status, both equally important. The periodic snapshot is a regular, predictable statement of status. The monthly snapshot, the most common type, may contain a whole set of measurements. Some of these snapshots are instantaneous measurements at the end of the period, and some are measurements accumulated throughout the period. The design of this snapshot focuses on the activity that occurred during the time interval between snapshots.

The other important kind of snapshot is the accumulating snapshot, which shows us the status at any given moment. If we ask for the current status an hour from now, it may be different. The accumulating snapshot often combines the most recent volatile status with measures that accumulate from the beginning of history.

The transaction-grained fact table represents an atomic action that occurs at an instantaneous point in time. There is often no guarantee that a record for a given policy or customer exists in the transaction fact table; a record exists only if a transaction occurs. Conversely, there is no upper limit to the number of records for a given policy or customer. The date/timestamp on the transaction record may be accurate to the individual day, or it may be accurate to the hour and minute. Normally there is only a single fact generically called “amount.” Its meaning depends on the value of the transaction key. After posting it, we don’t usually revisit a transaction-grained fact record for any type of update.

The periodic-snapshot-grained fact table represents a predefined time span. Usually, exactly one record exists for each combination of significant keys. The timestamp on the periodic snapshot is only the name of the period and usually denotes the end of the period. Periodic snapshots can have any number of facts, depending on what measures are possible or useful to calculate. Some of these facts may be extraordinarily difficult to calculate directly from the transactions. Finally, once it’s posted, we do not normally revisit a periodic-snapshot fact record for any type of update unless we decide to add a completely new periodic-snapshot fact to the entire table.
The accumulating-snapshot-grained fact table represents an indeterminate time span, covering the entire history starting when the transaction started and ending with the present moment. Accumulating-snapshot fact tables almost always have multiple time stamps. One time stamp represents the last time the snapshot was updated, and the others represent generic or predictable events that may have taken place in the life of the coverage. Note that some of these time stamps may have to handle null values because, for example, the first_claim_date will not have a real value unless a claim has been filed. The correct treatment of null date keys requires using integer-valued surrogate keys.

Accumulating snapshots, such as the periodic ones, can have any number of facts depending on what measures are possible or useful to calculate. In dramatic contrast to the other fact-table types, we frequently revisit accumulating-snapshot fact records to update the facts. As history unfolds, we must revisit the same record several times to revise the accumulating status.


When to choose which kind of fact table?

The transaction-grained fact table is the only table that can answer detailed questions about timing and questions about unpredictable, repetitive behavior.

The periodic-snapshot fact table is often the only table that can easily generate a regular, predictable view of the important measures of a business. Calculating basic measures, such as revenue and cost, may be impractical if starting from any sort of transactional view of the business. So we publish such measures only once each month at a low level of granularity and let all the company’s managers and analysts roll these measures upward into different combinations from our periodic-snapshot table.
The accumulating-snapshot fact table is appealing when we are tracking items that have a finite lifetime, such as insurance policy coverages or line items on orders. If the coverage or the order line item exists for a finite period of time, there is a flurry of updating activity on those records when they are “open,” and then the record quiets down. It remains in the database as a summary of the activity surrounding that coverage or line item. The advantage of the accumulating snapshot is that we can do a great deal of useful reporting without complex constraints on the time dimension and without double counting across the multiple records of a periodic snapshot.




0 comments:

© Copyright © 2015Big Data - DW & BI. by