Datawarehousing Concepts and Tools

Posts

Dimension types

Rapidly Changing Dimensions A dimension is considered to be a rapidly changing if one or more of its attributes changes frequently in many rows. For a rapidly changing dimension, the dimension table can grow very large from the application of numerous type 2 changes. Slowly Changing Dimensions Attributes of a dimension that would undergo changes over time. It depends on the business requirement whether particular attribute history of changes should be preserved in the data warehouse. This is called a Slowly Changing Attribute and a dimension containing such an attribute is called a Slowly Changing Dimension. Conforming Dimension A dimension table may be used in multiple places if the data warehouse contains multiple fact tables or contributes data to data marts. For example,A dimension such as customer, time, or product that is used in multiple schemas is called a conforming dimension if all copies of the dimension are the same Use of Conforming Dimensions in Multiple Fac...

When is snowflaking of dimensions required

There are few key factors need to be considered to decide on snowflake schema as a data model in DW initiative Frequency of dimension attribute change No of dimension attributes - Large dimensions Hierarchies within the dimension attributes History tracking of dimension attributes Consider type 2 SCD implementation is used for history tracking. If a dimensional table has 10 dimension attributes and only one among the dimension attributes is frequently changing in many rows, then table becomes huge causing performance issues and unnecesary space consumption by rarely changed dimension attributes. Move the frequently changed dimension attribute to a seperate table and maintain RI with main dimension table, Thus snowflaking the dimension to avoid performance issues and unneccesary space consumption If there are hierarchies within the dimension attributes,then keying between Aggregate facts and dimension table(with hierarchies) is not appropriate A record i...

Business Requirement Specifications/Findings for a Data warehouse intiative

The scope of data warehouse initiative must be driven by business requirements.Requirements determine what data must be available in the data warehouse, how it is organized, and how often it is updated. High-level requirements analysis with business management should comprise of following areas: Understand their key strategic business initiatives. Identify their key performance indicators or success metrics for each of the strategic business initiatives. Determine the core business processes they monitor and want to impact. Determine the potential impact on their performance metrics with improved access to improved business process information. You will also need to conduct preliminary data discovery sessions to identify any gaping data feasibility issues. Approach to Requirements Definition Talk to the business users and get the overall understanding of an organization: What are the objectives of your Organization/department? What are you trying to...

How teradata makes sure that there are no duplicate rows being inserted when its a SET table?

Teradata redirects the new inserted row as per its Primary Index to the target AMP (Access Module Processor) on the basis of its row hash value, and if it find same row hash value in that AMP then it start comparing the whole row, and find out if duplicate. When the target table has UPI(Unique Primary Index) then the duplicate row is rejected with an error. In case of a NUPI(Non-Unique Primary Index) then it is rejected without throwing any error

Difference between Rank and Row_number() functions in teradata

RANK function in teradata resembles very much to rank we have in real life. for example,Students with equal number of marks are assigned with same rank. ROW_NUMBER function in Teradata is used to generate SEQUENCE number Using Rank function : SELECT Student_name ,SUM(English + Maths) AS MARKS_Total ,RANK() OVER (ORDER BY MARKS_Total DESC) AS STUDENT_RANK FROM STUDENT GROUP BY 1; Output of above sql: Student_name MARKS_Total Student_ROWNUM John 198 1 David 165 3 Alice ...

Sql for INSERT else UPDATE

Select Case When E.Empid Is Not Null Then ‘UPDATE’ Else ‘INSERT’ End Flag, E1.Empid, E1.Location From Empsource E1 Left Outer Join Emptarget E On E.Empid = E1.Empid