Lalit Goyal: Dimensional Modeling

Dimensional Modeling is the designing technique used to support End- User quires from a data warehouse. Entity-Relationship design technique is very useful while designing a transactional system for day to day entry.

What is ER?

ER is a logical design technique that removes the redundancy in data. The ER modeling technique is used to illuminate the microscopic relationships among data elements. The ideal form of ER modeling is to remove all redundancy in the data. This is very useful because transactions are made very simple. The transaction of updating a customer's address may involve to a single record update in a customer address master table.

The main complexities with ER modeling are

· End users cannot understand or remember an ER model. End users cannot navigate an ER model easily as compared to DM.

· Software faces performance issues while querying a general ER model.

What is DM?

Dimensional Model is a logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access. Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table. This characteristic "star-like" structure is often called a star join. The term star join dates back to the earliest days of relational databases.

A fact table, because it has a multipart primary key made up of two or more foreign keys, always expresses a many-to-many relationship. The most useful fact tables also contain one or more numerical measures, or "facts," that occur for the combination of keys that define each record.

Dimension tables, by contrast, most often contain descriptive textual information. Dimension attributes are used as the source of most of the interesting constraints in data warehouse queries, and they are virtually always the source of the row headers in the SQL answer set.

DM vs. ER

The key to understanding the relationship between DM and ER is that a single ER diagram breaks down into multiple DM diagrams. The first step in converting an ER diagram to a set of DM diagrams is to separate the ER diagram into its discrete business processes and to model each one separately.

The second step is to select those many-to-many relationships in the ER model containing numeric and additive non-key facts and to designate them as fact tables.

The third step is to de-normalize all of the remaining tables into flat tables with single-part keys that connect directly to the fact tables. These tables become the dimension tables. In cases where a dimension table connects to more than one fact table, we represent this same dimension table in both schemas, and we refer to the dimension tables as "conformed" between the two dimensional models.

If the design has been done correctly, many of these dimension tables will be shared from fact table to fact table. Applications that drill down will simply be adding more dimension attributes to the SQL answer set from within a single star join. Applications that drill across will simply be linking separate fact tables together through the conformed (shared) dimensions.

The Strengths of DM

The dimensional model has a number of important data warehouse advantages that the ER model lacks.

First, the dimensional model is a predictable, standard framework.

A second strength of the dimensional model is that it withstands unexpected changes in user behavior. Every dimension is equivalent. All dimensions can be thought of as symmetrically equal entry points into the fact table. The logical design can be done independent of expected query patterns. The user interfaces, Query strategies and SQL generated against the dimensional model are symmetrical.

A third strength of the dimensional model is that it is gracefully extensible to accommodate unexpected new data elements and new design decisions. The following graceful changes to the design after the data warehouse is up and running by:

Adding new unanticipated facts (that is, new additive numeric fields in the fact table), as long as they are consistent with the fundamental grain of the existing fact table
Adding completely new dimensions, as long as there is a single value of that dimension defined for each existing fact record
Adding new, unanticipated dimensional attributes
Breaking existing dimension records down to a lower level of granularity from a certain point in time forward.

A fourth strength of the dimensional model is that there is a body of standard approaches for handling common modeling situations in the business world. These modeling situations include:

Slowly changing dimensions, where a "constant" dimension such as Product or Customer actually evolves slowly and asynchronously.
Heterogeneous products, where a business such as a bank needs to track a number of different lines of business together within a single common set of attributes and facts, but at the same time it needs to describe and measure the individual lines of business in highly idiosyncratic ways using incompatible measures.
Pay-in-advance databases, where the transactions of a business are not little pieces of revenue, but the business needs to look at the individual transactions as well as report on revenue on a regular basis.
Event-handling databases, where the fact table usually turns out to be "factless."

A final strength of the dimensional model is the growing body of administrative utilities and software processes that manage and use aggregates.

Myths About DM

Myth number four is "Snowflaking is an alternative to dimensional modeling." Snowflaking is the removal of low-cardinality textual attributes from dimension tables and the placement of these attributes in "secondary" dimension tables. For instance, a product category can be treated this way and physically removed from the low-level product dimension table. I believe that this method compromises cross-attribute browsing performance and may interfere with the legibility of the database, but I know that some designers are convinced that this is a good approach. Snowflaking is certainly not at odds with dimensional modeling. I regard snowflaking as an embellishment to the cleanliness of the basic dimensional model. I think that a designer can snowflake with a clear conscience if this technique improves user understandability and improves overall performance. The argument that snowflaking helps the maintainability of the dimension table is specious.

Reference: http://www.dbmsmag.com/9708d15.html

2 comments:

Anonymous said...: Nice brief and this post helped me alot in my college assignement. Thank you on your information.; Monday, February 01, 2010 1:50:00 PM
Anonymous said...: Howdy im new on here. I hit upon this chat board I have found It amply accessible & its helped me out alot. I should be able to give something back and aid other users like its helped me.

Thank's, See You Around.; Wednesday, June 09, 2010 7:10:00 PM

Lalit Goyal

Thursday, April 12, 2007

Dimensional Modeling

What is ER?

What is DM?

DM vs. ER

The Strengths of DM

Myths About DM

2 comments:

Translate

Posts

Useful Links

Visitors

Quotes