In today’s world, data is everything. It’s a vital asset across all aspects of life. Historical data especially, is incredibly important. Managing and preserving this historical data is fundamental in effective data warehousing.
Data warehouse is the goldmine of information, keeping it relevant is a big challenge. There will always be some entities whose data will change slowly but often. This is where Slowly Changing Dimensions (SCD) comes in.
Understanding Slowly Changing Dimensions
SCD is a set of techniques to manage historical data over time. It keeps track of the new data and historical data. This enables us to observe and stay informed about changes to an entity’s history. For instance, consider Product entities. The price of a product can fluctuate frequently. If we simply overwrite the product table with each new price, we lose sight of its historical pricing data.That will be invalid not to see the history of it. This is where SCD becomes invaluable. Through SCD strategies, we maintain the relevance of product histories, enabling us to discern when price changes occurred.
Types of SCD
There are multiple types of SCD. We can follow any type depending on our requirements over entities.
- Type 1 SCD
- Type 2 SCD
- Type 3 SCD
- Type 4 SCD
Type 1 SCD
The simplest approach among Slowly Changing Dimensions (SCD) is SCD Type 1, where data is overwritten. When a change occurs, the existing record in the dimension is replaced with the new data, erasing any trace of historical information.
This method is suitable when there’s no requirement for historical data. For an example, consider customer addresses.
While delivering goods, the system prioritizes the current address for customer convenience, without the need to retain past addresses in the database.
Type 2 SCD
SCD Type 2 is widely recognized as the most popular and effective among all types. It’s the optimal way to maintain historical data. When a change occurs, a new row is inserted into the target table, while the old row is marked as inactive. If new data is introduced, a new row is added accordingly. This approach ensures comprehensive tracking of entity changes.
Consider products as an example: product prices can fluctuate frequently. In this scenario, whenever a new product is introduced, a new row is appended to the product table. Likewise, when an existing product undergoes a change, the previous row is deactivated, and a new row is added with the updated information.
Type 3 SCD
Type 3 SCD is more common when historical data retention is not that much mandatory. In this type, a new column is introduced for keeping the immediate previous data. This approach is suitable for scenarios where tracking just the most recent change is sufficient. For instance, consider a situation where the product price is updated occasionally, and the system only needs to retain the previous price for comparison purposes.
Type 4 SCD
We will explore the implementation of different types of SCD here. We’ll start with Type 1, Type 2, and Type 3 SCD implementations.
Implementation of Slowly Changing Dimensions
We will explore the implementation of different types of SCD here. We’ll start with Type 1, Type 2, and Type 3 SCD implementations.
Implementing Type 1 SCD
Type 1 Slowly Changing Dimension (SCD) is often preferred when historical data retention isn’t necessary. It revolves around updating the primary table directly, altering its fields accordingly. However, utilizing a stage table can enhance this process.
Let us create an Employee stage table and Employee table.
-- Create the Employee table
CREATE TABLE Employee (
EmployeeID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Department VARCHAR(50),
Salary DECIMAL(10, 2)
);
-- Create the staging table for Employee updates
CREATE TABLE Employee_Staging (
EmployeeID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Department VARCHAR(50),
Salary DECIMAL(10, 2)
);
-- Adding some data to employee table
INSERT INTO Employee (EmployeeID, FirstName, LastName, Department, Salary)
VALUES
(47121, 'Nurul', 'Islam', 'IT', 62000.00),
(47122, 'Mohammad', 'Nabi', 'HR', 55000.00),
(47123, 'Jewel', 'Islam', 'Finance', 65000.00);
-- Adding data that we will change in stage table
INSERT INTO Employee_Staging (EmployeeID, FirstName, LastName, Department, Salary)
VALUES
(47121, 'Nurul', 'Islam', 'IT', 72000.00)
After populating our Employee table with dummy data, we’re set to update the salary of the employee with ID 47121.
Now let’s look at our SCD type 1 Query
MERGE INTO Employee AS dst
USING Employee_Staging AS src
ON (dst.EmployeeID = src.EmployeeID)
WHEN MATCHED THEN
UPDATE SET
dst.FirstName = src.FirstName,
dst.LastName = src.LastName,
dst.Department = src.Department,
dst.Salary = src.Salary
WHEN NOT MATCHED BY target THEN
INSERT (EmployeeID, FirstName, LastName, Department, Salary)
VALUES (src.EmployeeID, src.FirstName, src.LastName, src.Department, src.Salary);
Now if we run a select query on our Employee table we will see that we have a change in salary for an employee whose ID is 47121.
Fig: Type 1 SCD example
Implementing Type 2 SCD
Type 2 SCD is one of the most used approaches for keeping historical data.It relies on two essential tables: a stage table and a primary table. Initially, data resides in the stage table as the primary repository. Then, records from the stage table are transferred to the main table for long-term storage.
For this we will use two table
- Product_Stage – this will be our stage table.
- Product – this will be our primary table.
To implement Type 2 Slowly Changing Dimensions (SCD), we will set up two tables; create a product table and a product stage table. We will also add some dummy data in both of them to test.
-- create product table
CREATE TABLE Product (
ProductCode VARCHAR(50),
Name VARCHAR(255),
Price DECIMAL(10, 2),
StartedFrom DATE,
IsActive BIT DEFAULT 1
);
-- create product stage table
CREATE TABLE Product_Stage (
ProductCode VARCHAR(50),
Name VARCHAR(255),
Price DECIMAL(10, 2)
);
-- adding some dummy data
INSERT INTO Product (ProductCode, Name, Price, StartedFrom) VALUES
('MD101', 'iPhone 12', 999.99, '2024-01-01'),
('MD102', 'Samsung Galaxy S21', 899.99, '2024-01-01'),
('MD103', 'Google Pixel 6', 799.99, '2024-01-01'),
('MD104', 'OnePlus 10', 699.99, '2024-01-01'),
('MD105', 'Xiaomi Redmi Note 11', 599.99, '2024-01-01');
-- adding data that will be changed or added in product table
INSERT INTO Product_Stage (ProductCode, Name, Price) VALUES
('MD104', 'OnePlus 10', 599.99),
('MD105', 'Xiaomi Redmi Note 11', 399.99),
('MD108', 'OnePlus 10 PRO', 799.99),
('MD109', 'iPhone 14 Pro Max', 1699.99);
Select * From Product
This will give us below data in our product table. Currently we have 5 products here.
Fig : Type 2 SCD example
Now for adding new products and making changes on any existing products we will strictly follow the stage table. We will always add in the stage table first and then we will run SCD to update that in our main table.
In the stage table, we have added two new products;product code MD108 and MD109. Also we have changed the price of existing products;product code MD104 and MD105.
Now lets look at our below Type 2 SCD query
INSERT INTO Product(ProductCode,Name,Price,StartedFrom,IsActive
)
SELECT ProductCode,Name,Price,StartedFrom,IsActive
FROM(
MERGE INTO Product AS Dst
USING
(SELECT DISTINCT ProductCode, Name, Price FROM Product_Stage) AS Src
ON Dst.ProductCode = Src.ProductCode
And Dst.IsActive = 1
WHEN MATCHED and dst.Name != Src.Name OR Dst.Price != Src.Price THEN
UPDATE SET Dst.IsActive = 0
WHEN NOT MATCHED BY TARGET THEN
INSERT
(
ProductCode,
Name,
Price,
StartedFrom,
IsActive
)
VALUES
(Src.ProductCode, Src.Name, Src.Price, GETDATE(), 1)
OUTPUT $ACTION,
Src.ProductCode,
Src.Name,
Src.Price,
GETDATE(),
1) AS CHANGES([ACTION],ProductCode, Name, Price, StartedFrom, IsActive)
WHERE [ACTION] = 'UPDATE';
Now if we make a select statement we can see the below output. We can see we have added two new products here; MD108 and MD109. Also we have set inactive to the products; MD104 and MD105 first and then we have added them as new.
Fig:Type 2 SCD result
Implementing Type 3 SCD
For Type 3 Slowly Changing Dimension (SCD), we’ll focus on product prices, assuming they don’t fluctuate frequently and there’s no necessity to track every price change. We’ll set up a main table along with a stage table to facilitate this process.
-- create product table
CREATE TABLE Product (
ProductID INT PRIMARY KEY,
ProductName NVARCHAR(100),
Price DECIMAL(10,2),
PriceStartedDate DATETIME,
PreviousPrice DECIMAL(10,2),
PreviousPriceDate DATETIME
);
-- create product stage table
CREATE TABLE Product_Stage (
ProductID INT PRIMARY KEY,
ProductName NVARCHAR(100),
Price DECIMAL(10,2)
);
-- Add dummy data to the Product table
INSERT INTO Product (ProductID, ProductName, Price, PriceStartedDate, PreviousPrice, PreviousPriceDate)
VALUES
(2311, 'Laptop', 1500, '2024-05-01', 0, NULL),
(2312, 'Smartphone', 1000, '2024-05-01', 0, NULL),
(2313, 'Tablet', 500, '2024-05-01', 0, NULL);
-- Add dummy data to the Product_Stage table
INSERT INTO Product_Stage (ProductID, ProductName, Price)
VALUES
(2312, 'Smartphone', 899)
We have added some dummy data in both of the tables. We have kept two separated columns. Price of the product and Previous Price of the product.
Let’s take a look on our Type 3 SCD query
MERGE INTO Product AS dst
USING Product_Stage AS src
ON dst.ProductID = src.ProductID
WHEN MATCHED THEN
UPDATE SET
dst.ProductName = src.ProductName,
dst.PreviousPrice = dst.Price,
dst.PreviousPriceDate = dst.PriceStartedDate,
dst.Price = src.Price,
dst.PriceStartedDate = GETDATE()
WHEN NOT MATCHED BY TARGET THEN
INSERT (ProductID, ProductName, Price, PriceStartedDate, PreviousPrice, PreviousPriceDate)
VALUES (src.ProductID, src.ProductName, src.Price, GETDATE(), 0, NULL);
Now if we run the select statement on the product table, we will see we have update the smartphone price and we have updated the previous price of that product.
Fig: Type 3 SCD example
Implementing Type 4 SCD
For Type 4 SCD we will add an extra table to keep history of the changes.
-- create product table
CREATE TABLE Product (
ProductID INT PRIMARY KEY,
ProductName NVARCHAR(100),
ProductPrice DECIMAL(10, 2),
EffectiveDate DATETIME DEFAULT GETDATE(),
ExpiryDate DATETIME NULL
);
-- create product history table
CREATE TABLE Product_History (
ProductID INT,
ProductName NVARCHAR(100),
ProductPrice DECIMAL(10, 2),
EffectiveDate DATETIME,
ExpiryDate DATETIME
);
-- create product stage table
CREATE TABLE Product_Stage (
ProductID INT PRIMARY KEY,
ProductName NVARCHAR(100),
ProductPrice DECIMAL(10, 2)
);
-- adding dummy data to product table
INSERT INTO Product (ProductID, ProductName, ProductPrice, EffectiveDate)
VALUES
(1, 'iPhone 13', 999.99, '2024-01-01'),
(2, 'Samsung Galaxy S21', 799.99, '2024-01-01'),
(3, 'Google Pixel 6', 599.99, '2024-01-01');
-- adding dummy data to product stage table
INSERT INTO Product_Stage (ProductID, ProductName, ProductPrice)
VALUES
(1, 'iPhone 13', 949.99),
(4, 'OnePlus 9', 729.99);
So as we have created the tables and also we have added dummy data in the tables, we will now run the SCD query. Currently we have no data in the history table. But as soon as we run our SCD query there will be data in our history table.
Now let’s look at our SCD query
INSERT INTO Product_History([ProductID], [ProductName], [ProductPrice], [EffectiveDate])
SELECT [ProductID], [ProductName], [ProductPrice], [EffectiveDate]
FROM(
MERGE INTO Product AS Dst
USING
(SELECT DISTINCT [ProductID], [ProductName], [ProductPrice] FROM Product_Stage) AS Src
ON Dst.[ProductID] = Src.[ProductID]
WHEN MATCHED and dst.[ProductName] != Src.[ProductName] OR Dst.[ProductPrice] != Src.[ProductPrice] THEN
UPDATE SET
Dst.ProductName = Src.ProductName,
Dst.ProductPrice = Src.ProductPrice
WHEN NOT MATCHED BY TARGET THEN
INSERT
(
[ProductID], [ProductName], [ProductPrice], [EffectiveDate]
)
VALUES
(Src.[ProductID], Src.[ProductName], Src.[ProductPrice], GETDATE())
OUTPUT $ACTION,
Src.[ProductID],
Src.[ProductName],
Src.[ProductPrice],
GETDATE()
) AS CHANGES([ACTION],[ProductID], [ProductName], [ProductPrice], [EffectiveDate])
WHERE [ACTION] = 'UPDATE' OR [ACTION] = 'INSERT';
This query will first update or insert in the product table. If there is a change on the existing product table then we will update the product otherwise we will insert into the product table. After that we will insert into the history table according to the action type.
Query Insights
Merge Into, this specifies the target table where the merge operation will be performed. It allows us to combine the update and insert operations into a single statement, enhancing readability and performance.
When Matched, by this we compare the source values with destinations. If there are any changes, then we update the changes.
When Not Matched, we are simply adding new values to our destinations.
End Note
Slowly Changing dimensions (SCD) offers a great way to manage evolving data over time. By implementing needed SCD techniques organizations can ensure data accuracy and relevance, empowering informed decision-making for sustained success in a dynamic world.
The choice of which type of SCD to implement depends on the specific requirements of the organization.