Principles of Dimensional Modeling
Dimensional modeling (DM) is a logical design technique used by data warehouse designers and in commercial OLAP products. DM is considered the only practical technique for databases intended to support end-user queries in a data warehouse. It differs markedly from entity-relationship (ER) modeling.
Although ER modeling is very useful for the transaction capture and data administration phases of building a data warehouse, it should be avoided for end-user delivery.
This paper explains dimensional modeling and how it contrasts with ER modeling. Dimensional modeling is the preferred technique in data warehousing: it is a logical design technique that presents data in a standard, intuitive framework that allows high-performance access. It is inherently dimensional, and it adheres to a discipline that uses the relational model with some important restrictions.
Every dimensional model consists of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table. This characteristic star-like structure is often called a star join. Because the fact table's primary key is made up of two or more foreign keys, the fact table always expresses a many-to-many relationship.
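The structure just described can be sketched with Python's standard `sqlite3` module. This is a minimal illustration under assumptions: the retail subject, the table names (`date_dim`, `product_dim`, `location_dim`, `sales_fact`) and the column names are hypothetical, not taken from any particular warehouse. Each dimension has a single-part primary key, and the fact table's primary key is the combination of the foreign keys that point to those dimensions.

```python
import sqlite3

# Hypothetical retail star schema; all table and column names are illustrative.
DDL = """
CREATE TABLE date_dim (
    date_key    INTEGER PRIMARY KEY,   -- single-part primary key
    full_date   TEXT,
    month       TEXT,
    year        INTEGER
);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,   -- single-part primary key
    name        TEXT,
    category    TEXT
);
CREATE TABLE location_dim (
    location_key  INTEGER PRIMARY KEY, -- single-part primary key
    location_code TEXT,
    zip_code      TEXT,
    state         TEXT,
    country       TEXT
);
-- Fact table: its multipart primary key is made up of the dimension keys,
-- each of which is a foreign key to exactly one dimension table.
CREATE TABLE sales_fact (
    date_key     INTEGER NOT NULL REFERENCES date_dim(date_key),
    product_key  INTEGER NOT NULL REFERENCES product_dim(product_key),
    location_key INTEGER NOT NULL REFERENCES location_dim(location_key),
    units_sold   INTEGER,              -- numeric measure (fact)
    sales_amount REAL,                 -- numeric measure (fact)
    PRIMARY KEY (date_key, product_key, location_key)
);
"""

def build_star_schema() -> sqlite3.Connection:
    """Create the hypothetical star schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn

if __name__ == "__main__":
    conn = build_star_schema()
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk);
    # pk > 0 marks the columns that belong to the primary key.
    pk_cols = [r[1] for r in conn.execute("PRAGMA table_info(sales_fact)") if r[5] > 0]
    # PRAGMA foreign_key_list rows carry the referencing column in position 3.
    fk_cols = [r[3] for r in conn.execute("PRAGMA foreign_key_list(sales_fact)")]
    print("fact primary key:", pk_cols)
    print("foreign keys:", sorted(fk_cols))
```

Every column of the fact table's primary key is also a foreign key, which is why each fact row sits at the intersection of one member from each dimension.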
The most useful fact tables contain one or more numerical measures, or facts, that occur for the combination of keys that defines each record. Dimension tables, by contrast, contain descriptive textual information. Dimension attributes are the source of most of the interesting constraints in data warehouse queries, and they are almost always the source of the row headers in the SQL answer set. Dimension attributes are simply the columns of a dimension table; in a Location dimension, for example, the attributes might be Location Code, State, Country, and Zip Code.
Dimension attributes are typically used as report labels and as query constraints such as WHERE Country = 'US'. Dimension attributes also contain one or more hierarchical relationships; in the Location dimension, for example, Zip Code rolls up to State and State rolls up to Country. The subject areas must be decided before a data warehouse is designed. In DM, a model of tables and relationships is built to optimize decision-support query performance in relational databases, relative to a measurement or set of measurements of the outcomes of the business process being modeled.
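A short sketch shows how dimension attributes play both roles in a star-join query: `country` appears in the WHERE clause as a constraint, while the grouped attribute becomes the row header of the answer set. Moving the GROUP BY attribute along the Zip Code → State → Country hierarchy rolls the same facts up to coarser levels. The schema, the `sales_by` helper and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical star schema with illustrative sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location_dim (
    location_key  INTEGER PRIMARY KEY,
    location_code TEXT,
    zip_code      TEXT,
    state         TEXT,
    country       TEXT
);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_fact (
    location_key INTEGER REFERENCES location_dim(location_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    sales_amount REAL,
    PRIMARY KEY (location_key, product_key)
);
INSERT INTO location_dim VALUES
    (1, 'L01', '10001', 'NY',  'US'),
    (2, 'L02', '94105', 'CA',  'US'),
    (3, 'L03', '90001', 'CA',  'US'),
    (4, 'L04', '44100', 'JAL', 'MX');
INSERT INTO product_dim VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO sales_fact VALUES
    (1, 1, 100.0), (1, 2, 50.0), (2, 1, 200.0), (3, 2, 25.0), (4, 1, 300.0);
""")

# Levels of the hypothetical Location hierarchy, from finest to coarsest.
HIERARCHY = ("zip_code", "state", "country")

def sales_by(level: str, country: str) -> list:
    """Sum sales at one hierarchy level, constrained to a single country."""
    if level not in HIERARCHY:  # whitelist, because the column name is interpolated
        raise ValueError(f"unknown level: {level}")
    sql = f"""
        SELECT l.{level}, SUM(f.sales_amount)      -- attribute as row header
        FROM sales_fact AS f
        JOIN location_dim AS l ON f.location_key = l.location_key
        WHERE l.country = ?                        -- attribute as constraint
        GROUP BY l.{level}
        ORDER BY l.{level}
    """
    return conn.execute(sql, (country,)).fetchall()

for level in HIERARCHY:
    print(level, sales_by(level, "US"))
```

With this sample data, the US total at the `state` level is split into CA (225.0) and NY (150.0), and at the `country` level the same facts roll up to a single row of 375.0. Only the grouped attribute changes; the shape of the query does not.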
Conventional ER models, in contrast, are designed to eliminate redundancy in the data model, to facilitate retrieval of individual records with certain critical identifiers, and therefore to optimize online transaction processing (OLTP) performance. In a DM, the fact table usually records, at its declared grain (level of detail), a quantitative measurement of the outcome of the business process being analyzed. The dimension tables are generally composed of attributes measured on some discrete category scale that describe, qualify, locate, or constrain the fact table's quantitative measurements.
Ralph Kimball holds that a data warehouse should always be modeled using a dimensional model (star schema). Kimball has argued that although star schemas perform better than ER models, using them involves no loss of information, because any ER model can be represented as a set of dimensional models without loss of information. Normalizing a model by adding attributive and subtype entities, as ER modeling does, destroys the clean dimensional structure of a star schema and creates snowflakes, which generally slow down browsing performance.
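The difference between a star and a snowflake can be sketched by modeling the same Location hierarchy both ways and running the same browsing question ("which zip codes are in the US?") against each. This is a hypothetical example: the tables `location_dim`, `location_sf`, `state_sf` and `country_sf` and their rows are invented for illustration. The sketch does not measure speed; it shows only that both designs return the same answer while the snowflake needs additional joins, which is the usual reason its browsing is slower.

```python
import sqlite3

# Hypothetical example: the same Location hierarchy modeled two ways.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one denormalized dimension; state and country repeat on every row.
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY, zip_code TEXT, state TEXT, country TEXT
);
INSERT INTO location_dim VALUES
    (1, '10001', 'NY', 'US'), (2, '94105', 'CA', 'US'), (3, '44100', 'JAL', 'MX');

-- Snowflake: the hierarchy normalized into one table per level.
CREATE TABLE country_sf (country_key INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE state_sf (
    state_key INTEGER PRIMARY KEY, state TEXT,
    country_key INTEGER REFERENCES country_sf(country_key)
);
CREATE TABLE location_sf (
    location_key INTEGER PRIMARY KEY, zip_code TEXT,
    state_key INTEGER REFERENCES state_sf(state_key)
);
INSERT INTO country_sf VALUES (1, 'US'), (2, 'MX');
INSERT INTO state_sf VALUES (1, 'NY', 1), (2, 'CA', 1), (3, 'JAL', 2);
INSERT INTO location_sf VALUES (1, '10001', 1), (2, '94105', 2), (3, '44100', 3);
""")

# Browsing the star: a single-table query.
STAR_BROWSE = """
    SELECT zip_code FROM location_dim
    WHERE country = 'US' ORDER BY zip_code
"""
# Browsing the snowflake: the same question needs a join per hierarchy level.
SNOWFLAKE_BROWSE = """
    SELECT l.zip_code FROM location_sf AS l
    JOIN state_sf   AS s ON l.state_key   = s.state_key
    JOIN country_sf AS c ON s.country_key = c.country_key
    WHERE c.country = 'US' ORDER BY l.zip_code
"""

star = conn.execute(STAR_BROWSE).fetchall()
snowflake = conn.execute(SNOWFLAKE_BROWSE).fetchall()
print("same answer:", star == snowflake)
print("joins:", STAR_BROWSE.count("JOIN"), "vs", SNOWFLAKE_BROWSE.count("JOIN"))
```

The star version accepts redundancy (state and country repeated per row) in exchange for simpler, join-free browsing of the dimension.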
In star schemas, by contrast, browsing performance is protected by restricting the formal model to associative and fundamental entities unless certain special conditions exist. The dimensional model has a number of important data warehouse advantages that the ER model lacks. The dimensional model is a predictable, standard framework. The wild variability of ER model structures means that each data warehouse needs custom, handwritten, and tuned SQL. It also means that each schema, once tuned, is very vulnerable to changes in users' querying habits, because such schemas are asymmetrical.
By contrast, in a dimensional model all dimensions serve as equal entry points to the fact table, so changes in users' querying habits do not change the structure of the SQL or the standard ways of measuring and controlling performance (Ramon Barquin and Herb Edelstein, 1996). It can be concluded that dimensional modeling is the only feasible technique for designing end-user delivery databases. ER modeling defeats end-user delivery and should not be used for this purpose. ER modeling captures the micro-relationships among data elements, so it is not a suitable business model (Ramon Barquin and Herb Edelstein, 1996).
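The claim that all dimensions are equal entry points can be sketched as a single query template. Because every dimension joins to the fact table the same way (dimension key equals fact foreign key), one hypothetical helper, `star_query`, can generate the SQL for grouping or constraining by any dimension attribute. The schema, the `DIMENSIONS` mapping and the data are assumptions made for illustration; identifiers are interpolated only from trusted code, not from user input.

```python
import sqlite3

# Hypothetical star schema: two dimensions around one sales fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE sales_fact (
    product_key  INTEGER REFERENCES product_dim(product_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    sales_amount REAL,
    PRIMARY KEY (product_key, location_key)
);
INSERT INTO product_dim VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO location_dim VALUES (1, 'US'), (2, 'MX');
INSERT INTO sales_fact VALUES (1, 1, 100.0), (2, 1, 40.0), (1, 2, 60.0);
""")

# Each dimension joins to the fact table on its own key: dim.key = fact.key.
DIMENSIONS = {"product_dim": "product_key", "location_dim": "location_key"}

def star_query(group_by, where=None):
    """Build one star-join query. group_by is a (dimension, attribute) pair;
    where maps (dimension, attribute) pairs to required values."""
    where = where or {}
    dims = {group_by[0]} | {d for d, _ in where}
    joins = " ".join(
        f"JOIN {d} ON {d}.{DIMENSIONS[d]} = f.{DIMENSIONS[d]}" for d in sorted(dims)
    )
    conds = " AND ".join(f"{d}.{a} = ?" for d, a in where) or "1 = 1"
    col = f"{group_by[0]}.{group_by[1]}"
    sql = (f"SELECT {col}, SUM(f.sales_amount) FROM sales_fact AS f {joins} "
           f"WHERE {conds} GROUP BY {col} ORDER BY {col}")
    return sql, list(where.values())

def run(group_by, where=None):
    sql, params = star_query(group_by, where)
    return conn.execute(sql, params).fetchall()

# Entering through either dimension uses exactly the same template.
print(run(("product_dim", "category")))
print(run(("location_dim", "country")))
print(run(("product_dim", "category"), {("location_dim", "country"): "US"}))
```

Switching from "sales by category" to "sales by country" or adding a constraint on another dimension changes only the arguments, not the structure of the generated SQL, which is the symmetry the paragraph above attributes to dimensional models.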