Tuesday, April 30, 2013

Ragged Hierarchy


 Ragged Hierarchy

What is a ragged hierarchy?

A hierarchy in which a gap exist in one or more levels is called a ragged hierarchy. There could be many reasons why even a properly designed schema results in a Ragged Hierarchy. Lets  look at an example. In this example company every Vice President has a Director reporting to him. Each Director is in charge of a Department. Every Department has Architects and the sales profit for which the architect was responsible is captured in a the fact table. 

The hierarchy looks like below.


Vice President
Director
Department
Architect


However, later few Directors moved on and left a gap between the VP and the Department. Remember the Director was in charge of the Departments. Now there are few departments who do not have a director. This is called a Ragged Hierarchy.

The table for Department now looks like below. Both Manlog and lifescience do not have a asscoiated Director.

LU_DEPARTMENT
DE_ID
DE_DESC
DI_ID
1
EIM
1
2
MANLOG
2
3
BANK
3
4
MANLOG
5
LIFESCIENCE


I used the below warehouse tables for this post.

LU_VP
VP_ID
VP_DESC
1
SHYAM
2
RAM
3
BABU
LU_DIRECTOR
DI_ID
DI_DESC
VP_ID
1
D1
1
2
D2
2
3
D3
3
LU_DEPARTMENT
DE_ID
DE_DESC
DI_ID
1
EIM
1
2
MANLOG
2
3
BANK
3
4
MANLOG
5
LIFESCIENCE
LU_ARCHITECT
A_ID
A_DESC
DE_ID
1
ANAND
1
2
SRIRAM
2
3
PARTHA
3
4
AKASH
4
5
YAMUNA
5
FACT_ARCHITECT_SALES
ID
SALES
DATE_ID
A_ID
1
100
4/1/2013
1
2
200
4/1/2013
2
3
120
4/1/2013
3
4
150
4/1/2013
4
5
160
4/5/2013
5
6
170
4/5/2013
1
7
200
4/17/2013
2
8
120
4/9/2013
3
9
150
4/25/2013
4
10
130
4/22/2013
5





What happens when you execute a report with Vice President and Sales?

The report gets executed. However data for manlog and lifescience is not included. You can verify this by looking at the grand total whcih should be 1500.




Why data is missing in the report ?

You can find the reason if you look at the SQL.
 select a14.[VP_ID] AS VP_ID,
max(a15.[VP_DESC]) AS VP_DESC,
a12.[DE_ID] AS DE_ID,
max(a13.[DE_DESC]) AS DE_DESC,
a11.[A_ID] AS A_ID,
max(a12.[A_DESC]) AS A_DESC,
sum(a11.[SALES]) AS WJXBFS1
from (((([FACT_ARCHITECT_SALES] a11
inner join [LU_ARCHITECT] a12
on (a11.[A_ID] = a12.[A_ID]))
inner join [LU_DEPARTMENT] a13
on (a12.[DE_ID] = a13.[DE_ID]))
inner join [LU_DIRECTOR] a14
on (a13.[DI_ID] = a14.[DI_ID]))  à Since data is missing for some Department, those rows are skipped. Why is this table added in the join,
this is because, there is no other SQL path to the fact table.
inner join [LU_VP] a15
on (a14.[VP_ID] = a15.[VP_ID]))
group by a14.[VP_ID],a12.[DE_ID],a11.[A_ID]

How do we fix this problem ?

Either you wait for the company to hire a new director and fill this gap. To add to your problem company decided that the VP will be in charge of these departments and no new Director will be hired.

The problem is in a ragged hierarchy is that  there is no ther sql path for data retrival from  the fact table. What if you can provide an alternate  path? From VP to Depatment, or from VP to Architect.  You can do this either by adding VP in the Department table / Architect table. Then assign Department / Architect as child of VP. This way you can skip the Director table in SQL join

You will be able to do analytics at VP level. However in the reports where you need to include the Department, you will see the same problem.




So this is really not a solution if you want to do analytics at all levels. The only solution is to fill the gap in the Director table.


LU_DIRECTOR
DI_IDDI_DESCVP_ID
1D11
2D22
3D33
4VP Direct4
5VP Direct5



LU_DEPARTMENT
DE_IDDE_DESCDI_ID
1EIM1
2TRANSPORT2
3BANK3
4MANLOG4
5LIFESCIENCE5



Now you will be able to do the analytics at all levels.