Monday, December 2, 2013

SQL for various solution to M-M relation


The below warehouse schema represents a M-M relationship.  REL, SKU , and COMPOUND tables, provide various solutions for a M-M relationship. We will look at the sql generated by usaing these three different approaches. 


LU_COLOR
COLOR_ID
COLOR
1
Red
2
Blue
3
Green
4
White
LU_PRODUCT
PRODUCT_ID
PRODUCT
1
Car
2
Bike
3
Fridge
REL_PRODUCT_COLOR
ID
COLOR_ID
PRODUCT_ID
1
1
1
2
1
2
3
2
2
4
2
3
5
3
1
6
3
3
7
4
1
8
4
3
SKU_PRODUCT_COLOR
SKU_ID
PRODUCT_ID
COLOR_ID
1
1
1
2
1
3
3
1
4
4
2
1
5
2
2
6
3
2
7
3
3
8
3
4
LU_PRODUCT_COMPOUND
ID
PRODUCT_ID
COLOR_ID
PRODUCT_DESC
1
1
1
CAR
2
1
3
CAR
3
1
4
CAR
4
2
1
BIKE
5
2
2
BIKE
6
3
2
FRIDGE
7
3
3
FRIDGE
8
3
4
FRIDGE
FACT_PRODUCT_SALES
ID
DATE_ID
PRODUCT_ID
COLOR_ID
SALES
SKU_ID
1
4/1/2013
1
1
100
1
2
4/1/2013
2
2
50
5
3
4/1/2013
3
3
50
7
4
4/1/2013
3
4
100
8
5
4/8/2013
1
1
100
1
6
4/8/2013
3
3
50
7
7
4/8/2013
1
4
20
3
8
4/8/2013
3
4
100
8




1.       Method 1 :- using relationship table REL_PRODUCT_COLOR
Without adding metric

(since there is no metric the sql gets the results from the relate table)
select    distinct a12.[PRODUCT_ID] AS PRODUCT_ID,
                a13.[PRODUCT] AS PRODUCT,
                a11.[COLOR_ID] AS COLOR_ID,
                a11.[COLOR] AS COLOR
from      [LU_COLOR]       a11,
                [REL_PRODUCT_COLOR]              a12,
                [LU_PRODUCT] a13
where   a11.[COLOR_ID] = a12.[COLOR_ID] and
                a12.[PRODUCT_ID] = a13.[PRODUCT_ID]

Adding a metric

(since a metric is added the sql will query the results directly from the fact table, and will not use the relate table because both color and product are available in the fact table)
select      a11.[PRODUCT_ID] AS PRODUCT_ID,
                max(a13.[PRODUCT]) AS PRODUCT,
                a11.[COLOR_ID] AS COLOR_ID,
                max(a12.[COLOR]) AS COLOR,
                sum(a11.[SALES]) AS WJXBFS1
from         [FACT_PRODUCT_SALES]             a11,
                [LU_COLOR]          a12,
                [LU_PRODUCT]     a13
where      a11.[COLOR_ID] = a12.[COLOR_ID] and
                a11.[PRODUCT_ID] = a13.[PRODUCT_ID]
group by  a11.[PRODUCT_ID],
                a11.[COLOR_ID]


2.       Method 2 :- Compound attribute. This is done by making the product a compound attribute with color. For this you need to edit the warehouse schema.

Without adding metric.
The results look like below. They are not grouped by product since each record is treated as unique record.  Notice in the SQL, color_id is called twice , once to retrieve the product and once to retrieve the color.

select    distinct a11.[PRODUCT_ID] AS PRODUCT_ID,
                a11.[COLOR_ID] AS COLOR_ID,
                a11.[PRODUCT_DESC] AS PRODUCT_DESC,
                a11.[COLOR_ID] AS COLOR_ID0,
                a12.[COLOR] AS COLOR
from      ([LU_PRODUCT_COMPOUND]  a11
                inner join             [LU_COLOR]       a12
                  on         (a11.[COLOR_ID] = a12.[COLOR_ID]))

With a metric in the report. Since product is a compound attribute, you cannot subtotal at product level. Also notice that the join on Product table happens using both product_id and color_id .


select    a11.[PRODUCT_ID] AS PRODUCT_ID,
                a11.[COLOR_ID] AS COLOR_ID,
                max(a12.[PRODUCT_DESC]) AS PRODUCT_DESC,
                a11.[COLOR_ID] AS COLOR_ID0,
                max(a13.[COLOR]) AS COLOR,
                sum(a11.[SALES]) AS WJXBFS1
from      (([FACT_PRODUCT_SALES]          a11
                inner join                [LU_PRODUCT_COMPOUND]    a12
                  on         (a11.[COLOR_ID] = a12.[COLOR_ID] and
                a11.[PRODUCT_ID] = a12.[PRODUCT_ID]))
                inner join             [LU_COLOR]       a13
                  on         (a11.[COLOR_ID] = a13.[COLOR_ID]))
group by              a11.[PRODUCT_ID],
                a11.[COLOR_ID],
                a11.[COLOR_ID]


3.       Method 3 :- using a common child attribute. 
Without adding metric. The sql will use the SKU table to retrieve the data
select    distinct a12.[PRODUCT_ID] AS PRODUCT_ID,
                a13.[PRODUCT] AS PRODUCT,
                a11.[COLOR_ID] AS COLOR_ID,
                a11.[COLOR] AS COLOR
from      [LU_COLOR]       a11,
                [SKU_PRODUCT_COLOR]             a12,
                [LU_PRODUCT] a13
where   a11.[COLOR_ID] = a12.[COLOR_ID] and
                a12.[PRODUCT_ID] = a13.[PRODUCT_ID]


With a metric . results are same as using the relate table, however sql cannot retrieve the item and color combination from the fact table, it uses the SKU table for this.


select    a12.[PRODUCT_ID] AS PRODUCT_ID,
                max(a14.[PRODUCT]) AS PRODUCT,
                a12.[COLOR_ID] AS COLOR_ID,
                max(a13.[COLOR]) AS COLOR,
                sum(a11.[SALES]) AS WJXBFS1
from      [FACT_PRODUCT_SALES]             a11,
                [SKU_PRODUCT_COLOR]             a12,
                [LU_COLOR]       a13,
                [LU_PRODUCT] a14
where   a11.[PRODUCT_ID] = a12.[PRODUCT_ID] and
                a11.[SKU_ID] = a12.[SKU_ID] and
                a12.[COLOR_ID] = a13.[COLOR_ID] and
                a11.[PRODUCT_ID] = a14.[PRODUCT_ID] and
                a12.[PRODUCT_ID] = a14.[PRODUCT_ID]
group by              a12.[PRODUCT_ID],
                a12.[COLOR_ID]







Wednesday, June 26, 2013

Subtotal From Base


Lets look at this report. This report has a metric defined as distinct count of subcategory. Do you find anything strange in this report ?



Yes, the grand total is 24 and not the sum 24 for three years. (72) . Here the count is a distinct records and is defined as Count(Subcategory){~+}

The subcategory for each of the Category have not changed in 3 years. Hence the count of distinct subcategory is 24. Finding a total for distinct count was not possible untill MicroStrategy version 9.3. With version 9.3 this is possible by a new feature called subtotal from base. You have to create this from the web as a derived metric by right click on an attribute. 

The distinct values cannot be reaggregated to higher levels, you have to aggregate it from the base level to get the correct count of distinct values. When used in a view report, the Cube Subsetting Instruction Engine generates the appropriate sql to calculate this distint count from the base level.

However you will not see the distinct count when you use a derived element. This is becase the derived element is calculated by analytical engine and it runs after CSI engine. Hence the data will be wrong, since analytical engine is not re aggregating using the base values. So instead of displaying wrong number, mstr would show "--" .  For example your report will look like below with derived element.





You will see similar results when you apply a metric limit, this is also because the metric limit is applied by the analutical engine after the CSI engine execution. 

VLDB Settings For Temporary Tables



Temporary tables are needed by MicroStrategy to store intermediate data.  You can choose the table type and kind using the VLDB properties.

Derived Tables

This is a nested select statement, and it does not physically exist. It is created during the execution of the sql and it utilize the database memory for creating this virtual table.  This makes use of the database memory, and you will need to check the performance by testing both types.

The syntax is like select a,b from (select c,d from table)

You might have noticed that certain reports cannot be resolved by the use of subquery in derived tables. These reports include the following.
  1. Creation of datamarts
  2. Query partitioned tables
  3. Reports with Custom Groups
  4. Reports with Percentage Ranking
  5. Reoprts with analytical functions supported by MicroStrategy Analytical Engine and not the Database

Common Table Expression

example SQL .

with TEMP1 as
(select a21.STORE_NBR STORE_NBR,
(sum(a21.REG_SLS_DLR) + sum(a21.PML_SLS_DLR)) TOT_SLS_STR,
sum(a21.PML_SLS_DLR) PRO_STR
from STORE_DIVISION a21
where a21.STORE_NBR = 1
group by a21.STORE_NBR),
TEMP2 as
(select a22.REGION_NBR REGION_NBR, ............

The advantage here is that you can create the temp table once and refer it multiple times.

You can change the table type using the VLDB Property ->  Tables -> Intermediate Table Type .

Implicit Table creation Vs. Explicit Table creation.

Example of Implicit table creation
create table TEMP1 as
select a21.STORE_NBR STORE_NBR,
(sum(a21.REG_SLS_DLR) + sum(a21.PML_SLS_DLR)) TOT_SLS,
sum(a21.PML_SLS_DLR) PROMO_SLS
from STORE_DIVISION a21
where a21.STORE_NBR = 1
group by a21.STORE_NBR

Example of Explicit table creation
create table TEMP1 (
STORE_NBR INTEGER,
TOT_SLS DOUBLE,
PROMO_SLS DOUBLE)
insert into TEMP1
select a21.STORE_NBR STORE_NBR,
(sum(a21.REG_SLS_DLR) + sum(a21.PML_SLS_DLR)) TOT_SLS,
sum(a21.PML_SLS_DLR) PROMO_SLS
from STORE_DIVISION a21
where a21.STORE_NBR = 1
group by a21.STORE_NBR


You need to test and find out which of these methods provides an optimal performance. Generally there is not much difference in the excution plan between these two types, (I have not lab tested this), however  according to Dr. Tim Hall   implicit cursor is kind of faster.

You can access this setting at VLDB ->  Tables -> Table Creation Type

Another setting you might be interested to check would be "Drop Temp Table Method" . Certain cases when you use global temporary tables on Oracle, the tables need to be empty before you can drop it. Otherwise oracle will throw error.