Slowly Changing Dimensions
Type 2 Slowly Changing Dimension (SCD) is a strategy used for loading data warehouses. It is often used for loading dimension tables, in order to keep track of changes on specific columns. A typical slowly changing dimension table would contain the flowing columns:-
A surrogate key. This is usually a
numeric column containing an automatically-generated number (using an
identity column, a rank function or a sequence).
-
A natural key. This is the list of columns that represent the primary key of the operational system.
-
Columns that one must overwritten on change.
-
Columns that require to add row on change.
-
A starting timestamp column indicating when the record was created in the data warehouse
-
An ending timestamp column indicating when the record became obsolete (closing date)
-
A current record flag indicating whether the record is the actual one (1) or an old one (0)
In the operational system, a product is defined by its ID that acts as a primary key. Every product has a name, a size, a supplier and a family. In the Data Warehouse a new version of this product is stored whenever the supplier or the family is updated in the operational system.
Figure 6-1 Type 2 Slow Changing Dimensions Example
Description of "Figure 6-1 Type 2 Slow Changing Dimensions Example"
-
The supplier is updated for product P1
-
The family is updated for product P2
-
The name is updated for product P3
-
Product P5 is added
-
The update of the supplier of P1 is translated into the creation of a
new current record (Surrogate Key 5) and the closing of the previous
record (Surrogate Key 1)
-
The update of the family of P2 is translated into the creation of a
new current record (Surrogate Key 6) and the closing of the previous
record (Surrogate Key 2)
-
The update of the name of P3 simply updates the target record with Surrogate Key 3
-
The new product P5 is translated into the creation of a new current record (Surrogate Key 7).
When populating such a datastore in an interface, the IKM has access to this metadata using the SCD_xx selectors on the getColList() substitution method.
The way Oracle Data Integrator implements Type 2 Slowly Changing Dimensions is described below:
-
Drop (if it exists) and create the integration table in the staging area.
-
Insert the flow data in the integration table using only mappings that apply to the natural key, overwrite on change and add row on change columns. Set the starting timestamp to the current date and the ending timestamp to a constant.
-
Recycle previous rejected records
-
Call the CKM to perform a data quality check on the flow
-
Flag the records in the integration table to 'U' when the natural key and the add row on change columns have not changed compared to the current records of the target.
-
Update the target with the columns flagged overwrite on change by using the integration table content filtered on the 'U' flag.
-
Close old records - those for which the natural key exists in the integration table, and set their current record flag to 0 and their ending timestamp to the current date
-
Insert the new changing records with their current record flag set to 1
-
Drop the integration table.
No comments:
Post a Comment