Various methods for extracting transactional data from operational sources have been
used to populate data warehouses. These techniques vary mostly in the latency of data integration, from daily batches to continuous real-time integration.
Data is captured from the sources either through incremental queries that filter on a timestamp or flag, or through a change data capture (CDC) mechanism that detects changes as they happen.
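A timestamp-based incremental query can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, assuming a hypothetical `orders` table whose `updated_at` column holds ISO-8601 text (which sorts correctly as a string); the function name and the "watermark" bookkeeping are illustrative, not taken from any particular tool.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, float, str]


def extract_incremental(conn: sqlite3.Connection, last_watermark: str) -> Tuple[List[Row], str]:
    """Return rows changed after last_watermark, plus the new watermark.

    Assumes a hypothetical table orders(id, amount, updated_at).
    """
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance to the newest timestamp seen; keep the old one if nothing changed.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10.0, "2024-01-01T09:00:00"), (2, 25.5, "2024-01-01T12:30:00")],
    )
    rows, wm = extract_incremental(conn, "1970-01-01T00:00:00")
    print(len(rows), wm)  # 2 2024-01-01T12:30:00
    conn.execute("UPDATE orders SET amount = 30.0, updated_at = '2024-01-02T08:00:00' WHERE id = 1")
    rows, wm = extract_incremental(conn, wm)
    print(rows)  # [(1, 30.0, '2024-01-02T08:00:00')]
```

Because such a query only sees rows that still exist, it cannot detect deletes, and rows committed late with an older timestamp can slip past the watermark; these gaps are one reason log-based CDC is used instead.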
Architectures are further distinguished by pull and push operation:
a pull operation polls for new data at fixed intervals,
while a push operation loads data into the target as soon as a change appears.
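The difference between the two can be sketched in a few lines of Python. The `Source` class, the `pull_loop` function and the `(sequence, payload)` change record are hypothetical stand-ins for an operational system, not any product's API. Push hands each change to subscribers the moment it is written; pull only sees changes when the next poll runs, so its latency is bounded by the polling interval.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical change record: (sequence number, payload).
Change = Tuple[int, str]


class Source:
    """Hypothetical operational source that logs changes and can notify subscribers."""

    def __init__(self) -> None:
        self.changes: List[Change] = []
        self.subscribers: List[Callable[[Change], None]] = []

    def write(self, payload: str) -> None:
        change = (len(self.changes) + 1, payload)
        self.changes.append(change)
        # Push: every subscriber is notified as soon as the change appears.
        for notify in self.subscribers:
            notify(change)

    def changes_since(self, seq: int) -> List[Change]:
        return [c for c in self.changes if c[0] > seq]


def pull_loop(source: Source, target: List[Change], interval_s: float, polls: int) -> None:
    """Pull: poll the source at a fixed interval and copy anything new into target."""
    last_seq = 0
    for _ in range(polls):
        new = source.changes_since(last_seq)
        target.extend(new)
        if new:
            last_seq = new[-1][0]
        time.sleep(interval_s)


if __name__ == "__main__":
    src = Source()
    pushed: List[Change] = []
    src.subscribers.append(pushed.append)  # push target receives changes immediately
    src.write("order 1")
    src.write("order 2")
    pulled: List[Change] = []
    pull_loop(src, pulled, interval_s=0.0, polls=2)  # pull target catches up on poll
    print(pushed == pulled)  # True
```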
A daily batch mechanism is most suitable if intra-day freshness is not required for the data,
such as longer-term trends or data that is only calculated once daily,
for example financial close information.
Batch loads can be performed in a downtime window if the business model doesn’t require 24-hour availability of the data warehouse.
Techniques such as real-time partitioning or trickle-and-flip exist to minimize the impact of loading into a live data warehouse, so that no downtime window is needed.
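One way to picture trickle-and-flip is the following minimal sketch using Python's built-in `sqlite3` module. New rows trickle into a staging copy (`sales_staging`, a hypothetical name) while queries keep reading the live `sales` table; the flip renames the copy into place inside a single transaction, so a reader sees either the old or the new table. The table names, schema and helper functions are assumptions for illustration; real-time partitioning is not shown, and a production warehouse would use its own table-swap mechanism rather than SQLite.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical fact-table layout shared by the live and staging tables.
SCHEMA = "(id INTEGER PRIMARY KEY, amount REAL)"


def reseed_staging(conn: sqlite3.Connection) -> None:
    """Create a fresh staging copy of the live `sales` table."""
    conn.execute(f"CREATE TABLE sales_staging {SCHEMA}")
    conn.execute("INSERT INTO sales_staging SELECT * FROM sales")


def trickle(conn: sqlite3.Connection, rows: Iterable[Tuple[int, float]]) -> None:
    """Trickle new rows into staging; queries on `sales` are unaffected."""
    conn.execute("BEGIN")
    try:
        conn.executemany("INSERT INTO sales_staging (id, amount) VALUES (?, ?)", rows)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def flip(conn: sqlite3.Connection) -> None:
    """Swap staging into place in one transaction, then start a new staging copy."""
    conn.execute("BEGIN")
    try:
        conn.execute("ALTER TABLE sales RENAME TO sales_old")
        conn.execute("ALTER TABLE sales_staging RENAME TO sales")
        conn.execute("DROP TABLE sales_old")
        reseed_staging(conn)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    # isolation_level=None: the module issues no implicit BEGIN, so the
    # explicit BEGIN/COMMIT statements above control the transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(f"CREATE TABLE sales {SCHEMA}")
    conn.execute("INSERT INTO sales VALUES (1, 10.0)")
    reseed_staging(conn)
    trickle(conn, [(2, 5.0), (3, 7.5)])
    print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # 1
    flip(conn)
    print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # 3
```

Between flips, readers of `sales` see data that is at most one flip interval old; the interval trades freshness against the cost of rebuilding the staging copy.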