Data Replication Using SSIS and MS "Change Track" - TechNet Articles - United States (English) - TechNet Wiki
Introduction
Our customer required near real-time data from our OLTP database. There were many approaches we could have taken to replicate the data. We decided to use this approach, which uses Microsoft's SSIS application and&nbdiv>
There are three components of this process:
The Source (OLTP) database
An SSIS package
The Target DB
Approach
Source DB
On the source OLTP database, Microsoft "Change track" was enabled on 70 tables. We then created a read-only access account. The two requirements for Change Track was that the candidate tables needed to have a primary key, and the database had to have Read Commit
Snapshot Isolation set. That was all that was needed on source DB side.
Target DB
One the Target DB, we created exact duplicates of the source tables, called target tables. We also create 2 additional supporting tables for each target table. One of the supporting tables was used as a temporary holding area for changed data. The other was
a history table, an optional component that tracks the history of changes. Both of these tables contain the same columns as the target table, and, in addition, contain the following column(s):
target_table_tmp:
sys_change_opration nchar(2) - contains the type of change for the record (I - insert, U- update, D - Delete)
target_table_history:
sys_change_opration nchar(2) - contains the type of change for the record (I - insert, U- update, D - Delete)
sync_version - Change Track version number at the time of capture
extraction_date - date/time of the extraction
Two other tables were created:
A table (T_REP) that contains a record for each table was being replicated.
Table_name
last_sync_version bigint - last version that was changed for that table
last_extraction_date datetime - last date/time that the table was updated.
last_change_count - number of rows changed from the last capture for that table.
A table (T_REP_SUMMARY) that contains one row of summary information
last_change_tracked bigint - last value of Change Track that was captured for all tables
last_check_date datetime - last checked date/time
last_extraction_date datetime - last extraction datetime
history_flag - allows the abilty to set history off/on.
SSIS Package
We then created an SSIS Package than contains the ETL code.
The package is very simple:
Step 1: Connect to the Source DB:
1.1 Determine if changes occurred to any of the 70 tables since the last check. This is performed using a Change Track Function Call using the last_change_tracked number value from T_REP_SUMMARY. (use 0 if this is the first time checking)
1.2 If changes occured, proceed to Step 2. Otherwise, update the last_check_date field in the T_REP_SUMMARY, and wait until the next checktime
Step 2: ETL
2.1 Using the T_REP table and a loop, use the MIN_VALID_VERSION function to find which table(s) were changed
2.1 For each changed table:
2.1.1 Truncate the target_tmp table associated with selected table
2.1.2. By Joining the internal Change Track table with the actual source table, populate the rows in the the target_tmp table with the source table along with the sys_change_operation number.
2.1.3 Using the MERGE statement, merged the changes with the target table to update the replicated table
2.1.4 Update the T_REP table with last_sync_version ,last_extraction_date , and last_change_count
2.1.5 Finally, copy all the changes from the target_tmp table to the target_history table along with the sync_version and extraction_date.
2.2 Update the T_REP_SUMMARY table with the last_updated timestamp and the last_sync_version.
For testing, the application was stress tested using a tool that performed 120 DML changes per minute on the source. The ETL SSIS package was implemented using the SQL Server Job Scheduler with a 30 second interval. The average process time was 6 seconds. The
interval was changed to 1 minute. The average process time remained the same.
Conclusion
By using the power of Microsoft's Change Track and SSIS, our customer is able to view near real time OLTP data without having to access the production servers.