Find Changed Data by computing Checksum using MD5 function in Informatica

Introduction:
Capturing and preserving the state of data across time is one of the core functions of a data warehouse, but CDC can be utilized in any database or data integration tool. There are many methodologies such as Timestamp, Versioning, Status indicators, Triggers and Transaction logs exists but MD5 function outlines on working with Checksum.

Overview:
MD5 stands for Message Digest 5 algorithm.It calculates the checksum of the input value using a cryptographic Message-Digest algorithm 5 and returns a128-bit 32 character string of hexadecimal digits (0 - F).
Advantage of using MD5 function is that, it will reduce overall ETL run time and also reduces cache memory usage by caching only required fields which are utmost necessary.

Implementation Steps :

Identify the ports from the source which are subjected to change.
Concatenate all the ports and pass them as parameter to MD5 function in expression transformation
Map the MD5 function output to a checksum output port in the expression transformation
Lookup on target primary keys, checksum port
Detect changes by checking for nullability of primary keys and compare source checksum with the target checksum field
Update checksum field with other fields if primary key exist in the target else insert records to the target

Advantages:

Minimize Lookup Cache creation time
Checksum comparisons are faster
Overall improvement in ETL run time

Comments

sivaNovember 21, 2018 at 5:53 PM
nice work sir informatica Online Course
ReplyDelete
Replies

Add comment

Datawarehousing Concepts and Tools

Search This Blog

Find Changed Data by computing Checksum using MD5 function in Informatica

Comments

Post a Comment

Popular posts from this blog

Data virtualization

Informatica Powercenter Partitioning

Difference between server jobs and parallel jobs in Datastage