Ragged Hierarchy
What is a ragged hierarchy?
A hierarchy in which a gap exist in one or more levels is called a ragged hierarchy. There could be many reasons why even a properly designed schema results in a Ragged Hierarchy. Lets look at an example. In this example company every Vice President has a Director reporting to him. Each Director is in charge of a Department. Every Department has Architects and the sales profit for which the architect was responsible is captured in a the fact table.
The hierarchy looks like below.
Vice President
|
Director
|
Department
|
Architect
|
However, later few Directors moved on and left a gap between the VP and the Department. Remember the Director was in charge of the Departments. Now there are few departments who do not have a director. This is called a Ragged Hierarchy.
The table for Department now looks like below. Both Manlog and lifescience do not have a asscoiated Director.
LU_DEPARTMENT
| ||
DE_ID
|
DE_DESC
|
DI_ID
|
1
|
EIM
|
1
|
2
|
MANLOG
|
2
|
3
|
BANK
|
3
|
4
|
MANLOG
| |
5
|
LIFESCIENCE
|
I used the below warehouse tables for this post.
|
|
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
What happens when you execute a report with Vice President and Sales?
The report gets executed. However data for manlog and lifescience is not included. You can verify this by looking at the grand total whcih should be 1500.
Why data is missing in the report ?
You can find the reason if you look at the SQL.select a14.[VP_ID] AS VP_ID,
max(a15.[VP_DESC]) AS VP_DESC,
a12.[DE_ID] AS DE_ID,
max(a13.[DE_DESC]) AS DE_DESC,
a11.[A_ID] AS A_ID,
max(a12.[A_DESC]) AS A_DESC,
sum(a11.[SALES]) AS WJXBFS1
from (((([FACT_ARCHITECT_SALES] a11
inner join [LU_ARCHITECT] a12
on (a11.[A_ID] = a12.[A_ID]))
inner join [LU_DEPARTMENT] a13
on (a12.[DE_ID] = a13.[DE_ID]))
inner join [LU_DIRECTOR] a14
on (a13.[DI_ID] = a14.[DI_ID])) à Since data is missing for some Department, those rows are skipped. Why is this table added in the join,
this is because, there is no other SQL path to the fact table.
inner join [LU_VP] a15
on (a14.[VP_ID] = a15.[VP_ID]))
group by a14.[VP_ID],a12.[DE_ID],a11.[A_ID]How do we fix this problem ?
Either you wait for the company to hire a new director and fill this gap. To add to your problem company decided that the VP will be in charge of these departments and no new Director will be hired.
The problem is in a ragged hierarchy is that there is no ther sql path for data retrival from the fact table. What if you can provide an alternate path? From VP to Depatment, or from VP to Architect. You can do this either by adding VP in the Department table / Architect table. Then assign Department / Architect as child of VP. This way you can skip the Director table in SQL join
You will be able to do analytics at VP level. However in the reports where you need to include the Department, you will see the same problem.
So this is really not a solution if you want to do analytics at all levels. The only solution is to fill the gap in the Director table.
DI_ID | DI_DESC | VP_ID |
---|---|---|
1 | D1 | 1 |
2 | D2 | 2 |
3 | D3 | 3 |
4 | VP Direct | 4 |
5 | VP Direct | 5 |
DE_ID | DE_DESC | DI_ID | |
---|---|---|---|
1 | EIM | 1 | |
2 | TRANSPORT | 2 | |
3 | BANK | 3 | |
4 | MANLOG | 4 | |
5 | LIFESCIENCE | 5 |
Now you will be able to do the analytics at all levels.
No comments:
Post a Comment