Friday, May 17, 2013

MicroStrategy SQL Generation


How does MicroStrategy generates an SQL?

You must first understand that no BI systems are build on artificial intelligence, and that they will start auto learning. These systems works on a predefined set of rules. Objects are built on that, and sql are generated in a predefined fashion. 

MicroStrategy identifies tables as base table and lookup table. Base tables support the calculation and aggregation of metrics, and the lookup tables are for resolving the attribute forms.

When you run a report, MSTR looks for the best base table that can support the calculation of the metric in the report. It also checks whether the dimensionality attributes for the report are included in the base table.  In most cases you are not likely to keep all the dimensionality in the base table. In this case engine will still choose the base table provided, the base table has the child attribute of the report dimensionality.  If there is no child attribute of the dimensionality included in the report, engine will still support the base table if there is a fact extension defined.  Sometimes, it so happens that in your data warehouse, you have multiple tables that can support the attributes and metric. In that, MSTR looks for the logical table size, and it chooses the table with the smallest logical table size.

Let's look at how the lookup tables are joined the SQL. You will see that in most cases, only the ID form is included in the intermediate pass. If you have included parent and child, only the child ID form will be used in the SQL. MSTR will not join an extra table for the sake of resolving the extra forms, it includes the extra forms only if they are present in the tables included in the SQL pass.  All the other forms that are not yet resolved will be obtained from the final pass by using the lookup tables.

Lets look at the below example. I have three tables in my WH.

Table T1

Table T2

Table T3




The relationship between the attributes is shown below.
I am trying to generate a result at level B2.


The SQL that I need to use for generating this out is
#mysql> SELECT T2.B2, COUNT(T1.POP), SUM(T3.POP2) FROM T1, T2, T3 WHERE T3.A1=T1.
A1 AND T1.A2=T2.A2 GROUP BY T2.B2;

Will you be able to generate the same results in MSTR ? By the rule, it needs to have the dimensionality included in the base table, in this case it is B2, and it does not exist in the Fact table T3,  Not it checks whether any child of B2 is available in the Fact table T3, and it does not find anything. Hence this will result in a cross join between the Fact Table T3 and  table T2 (which has the required dimensionality). 

The SQL generated by MSTR will have a cross join.

select pa11.B2 B2,
pa11.WJXBFS1 WJXBFS1,
pa12.WJXBFS1 WJXBFS2
from (select a12.B2 B2,
        count(a11.POP) WJXBFS1
        from t1 a11
        join t2 a12
        on (a11.A2 = a12.A2)
        group by a12.B2
) pa11
join (select a12.B2 B2,
        sum(a11.POP2) WJXBFS1
        from t3 a11
        cross join t2 a12
        group by a12.B2
) pa12
on (pa11.B2 = pa12.B2)

The report data is abviously wrong in this case due to the cross join.



 


What can you do to fix this report ?

We are talking about fixing this report, so that we get correct results. So I will not look at adding the child attributes into the report, so that the dimensionality gets changed. What changes do we need to make to the base table, so that the calculation of this report is supported.  The base table in this case is T3. and it has an attribute A1. If you want to do the calculation at level B1, you need to make A1 as a child of B1. So you will need to either modity T2 and add A1, or create a seperate relationship table for A1,B2.
 
What other options do you see..

The relation ship was designed like this.



Correct the relationship as below.


 

3 comments:

  1. Creating a separate relationship between a1 and b2 or adding a1 in t2 may violate warehouse design concepts (snowflake). Please suggest?

    ReplyDelete
  2. I see these systems as a guide, if you could achieve what these design concepts are trying to do, I would not mind deviating from the rules. If you look at this scehma, it is actually a wrong design for the requirement.

    A1 -> A2 -> B2 . This is the relationship. B2 is at the smallest level, so you cannot aggreate to a child level using a parent, unless there is fact extension available.

    To correct this I am finding a relationship between B2 and A1, and the reason for adding this to T2 is that in most cases, T2 (relate tables) are smaller than the fact tables. (and snowflake is about saving space) . The correct design would be setting an relationship like this. (B2 -> A2 -> A1)

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete