Skip to main content

Posts

Types of facts

Measures in the Fact Tables The values that quantify facts are usually numeric, and are often referred to as measures. Measures are typically additive along all dimensions, such as Quantity in a sales fact table. A sum of Quantity by customer, product, time, or any combination of these dimensions results in a meaningful value. Additive - Additive measures are facts that can be added up through all of the dimensions in the fact table. A sales fact is a good example for additive fact. Semi-Additive - Measures that can be summed up for some of the dimensions in the fact table, but not the others. Eg: quantity-on-hand can be added along the Warehouse dimension to achieve the total-quantity-on-hand Non-Additive - Measures that cannot be added along any dimension are truly non-additive. Non-additive measures can often be combined with additive measures to create new additive measures Eg: Sale Price =Quantity*Price        Facts which have percentages, ratios calculated. Calculated
Recent posts

Dimension types

Rapidly Changing Dimensions A dimension is considered to be a rapidly changing if one or more of its attributes changes frequently in many rows. For a rapidly changing dimension, the dimension table can grow very large from the application of numerous type 2 changes. Slowly Changing Dimensions  Attributes of a dimension that would undergo changes over time. It depends on the business requirement whether particular attribute history of changes should be preserved in the data warehouse. This is called a Slowly Changing Attribute and a dimension containing such an attribute is called a Slowly Changing Dimension. Conforming Dimension A dimension table may be used in multiple places if the data warehouse contains multiple fact tables or contributes data to data marts. For example,A dimension such as customer, time, or product that is used in multiple schemas is called a conforming dimension if all copies of the dimension are the same Use of Conforming Dimensions in Multiple Facts:

When is snowflaking of dimensions required

There are few key factors need to be considered to decide on snowflake schema  as a data model in DW initiative Frequency of dimension attribute change No of dimension attributes - Large dimensions Hierarchies within the dimension attributes History tracking of dimension attributes Consider type 2 SCD implementation is used for history tracking. If a dimensional table has 10 dimension attributes and only one among the dimension attributes is frequently changing in many rows, then table becomes huge causing performance issues and unnecesary space consumption by rarely changed dimension attributes. Move the frequently changed dimension attribute to a seperate table and maintain RI with main dimension table, Thus snowflaking the dimension to avoid performance issues and unneccesary space consumption If there are hierarchies within the dimension attributes,then keying between Aggregate facts and dimension table(with hierarchies) is not appropriate A record in sales f

Business Requirement Specifications/Findings for a Data warehouse intiative

The scope of data warehouse initiative must be driven by business requirements.Requirements determine what data must be available in the data warehouse, how it is organized, and how often it is updated. High-level requirements analysis with business management should comprise of following areas: Understand their key strategic business initiatives. Identify their key performance indicators or success metrics for each of the strategic business  initiatives. Determine the core business processes they monitor and want to impact. Determine the potential impact on their performance metrics with improved access to improved business process information. You will also need to conduct preliminary data discovery sessions to identify any gaping data feasibility issues. Approach to Requirements Definition  Talk to the business users and get the overall understanding of an organization: What are the objectives of your Organization/department? What are you trying to accompli

How teradata makes sure that there are no duplicate rows being inserted when its a SET table?

Teradata redirects the new inserted row as per its Primary Index to the target AMP (Access Module Processor) on the basis of its row hash value, and if it find same row hash value in that AMP then it start comparing the whole row, and find out if duplicate. When the target table has UPI(Unique Primary Index) then the duplicate row is rejected with an error. In case of a NUPI(Non-Unique Primary Index) then it is rejected without throwing any error

Difference between Rank and Row_number() functions in teradata

  RANK function in teradata resembles very much to rank we have in real life.  for example,Students with equal number of marks are assigned with same rank. ROW_NUMBER function in Teradata is used to generate SEQUENCE number Using Rank function : SELECT Student_name ,SUM(English + Maths) AS MARKS_Total ,RANK() OVER (ORDER BY MARKS_Total DESC) AS STUDENT_RANK FROM STUDENT GROUP BY 1; Output of above sql: Student_name MARKS_Total Student_ROWNUM John                       198                     1 David                     165                     3 Alice                      165                      3 Using ROW_NUMBER function : SELECT Student_name ,SUM(English + Maths) AS MARKS_Total ,ROW_NUMBER() OVER (ORDER BY MARKS_Total DESC) AS STUDENT_ROWNUM FROM STUDENT GROUP BY 1; Output of above sql: Student_name MARKS_Total Student_ROWNUM John                       198                     1 David                     165            

Sql for INSERT else UPDATE

               Select                    Case When E.Empid Is Not Null Then ‘UPDATE’ Else ‘INSERT’ End Flag,                     E1.Empid,   E1.Location          From                     Empsource E1          Left Outer Join                    Emptarget E                 On  E.Empid = E1.Empid