## Metric calculation based on selection in MicroStrategy

Hello guys, I am back after a long time. I have been learning MicroStrategy over the last year, so here I am, writing something on MSTR.

Today, I am writing about a problem that almost every MSTR developer faces. Let's say you have two base metrics: Dollars and Units. Now, you want to calculate Market Share (MS) based on these two base metrics. Everything stays fine up to this point, but the problem starts on the dashboard or document side. It is very likely that your client would like to have one selector in the web document to choose between Dollars and Units, and MS should be calculated based on that selection (either Dollars or Units). Given below is a snapshot which will make everything clear.

Usually, this problem is solved by using panels with a panel selector. For two base metrics, you have to create two identical panels. Both panels will be exactly the same from a visualization point of view, but they will use two different metrics: one panel will use MS_Dollars and the other will use MS_Units. That means whenever you need to change anything, you have to do it in two places. Imagine what would happen if you had three or four base metrics: you would end up copying the same panel three or four times. In one of my projects I had six identical panels (yuck!).

Following is my workaround to avoid creating multiple identical panels; although with a small glitch…

1. Add a table called DimFlag to the warehouse catalog. It will contain two records: 'Units' and 'Dollars'.

2. Create an attribute called Flag. This attribute doesn't have any relationship with any other table in the schema.

3. Create a metric called Flag using the attribute created in the previous step.

4. Create a metric called MyMeasure. This metric holds the key to our solution: it returns the Dollars metric if Dollars is selected from the Flag attribute, and the Units metric if Units is selected. Note that the Flag ID has been used in the if condition instead of the Flag description.

5. By the way, we have another attribute called Country. We want to show each country's sales (Dollars or Units) and its corresponding market share (%) in our web document. We first need to use level dimensionality to derive 'market sales in units' and 'market sales in dollars'; by 'market' I mean the sum over all available countries. So, our level dimensionality will make sure the Country attribute is ignored both as 'Filter' and as 'Group By'. This level dimensionality is not related to our main solution; the main trick is the 'if' condition demonstrated in the previous step. Finally, we will create another wrapper metric called 'Market Total', which contains an 'if' condition just like the one in the 'MyMeasure' metric.

6. Now, we are ready to create a dataset. Given below is a snapshot of our dataset. If you look at the corresponding generated SQL, you will notice a cross join. That's because the Flag attribute is not connected in any way to our facts. This is the small glitch I was talking about. But we can live with a cross join in small-scale projects where the data volume is not that big and the generated SQL is simple.

7. We are almost done. All you have to do is import this dataset into your document. Create a selector from the Flag attribute, create a grid using the MyMeasure and Market Total metrics and the Country attribute, and finally make this grid a target of your Flag selector. Metric calculation will now follow your selection, as per the 'if' condition defined inside the metric definitions.
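To make steps 4 and 6 concrete, here is a hypothetical sketch of the kind of SQL such a dataset ends up generating. Every table and column name below is an assumption for illustration, not actual MicroStrategy output:

```sql
SELECT f.FlagID,
       c.Country,
       -- step 4: the 'if' condition on the Flag ID picks the base fact
       SUM(CASE WHEN f.FlagID = 1 THEN s.Dollars ELSE s.Units END) AS MyMeasure
FROM   FactSales  s
JOIN   DimCountry c ON c.CountryID = s.CountryID
CROSS JOIN DimFlag f   -- step 6: DimFlag has no join path to the facts
GROUP BY f.FlagID, c.Country;
```

Because DimFlag holds only two rows, the cross join is cheap; the real cost consideration is the size of the fact table.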

Hope my idea is clear. Get back to me with any questions.

## What I have learned in the name of unit testing of ETL, DWH, OLAP Cube and MDX

In one of our projects, we 'somehow' decided that every part of the system should be unit-tested. It was a classical ETL -> DWH -> Cube project where a web dashboard was built to show BI reports from the cube (using MDX as the reporting language). Of course, unit testing was nothing new to our web developers, but our data warehouse developers had to figure it out.

Let's take a simple example: we have a function in our ETL process that joins two tables and returns the result set. The only way to test this function is to verify its entire output data. But unlike in a programming language, where the expected output of a unit test remains fixed, this simple function's expected output cannot remain fixed, because it depends on the input data. The only way to keep it fixed is to keep the input data fixed. Hence came the idea of a 'frozen environment': all our data stays frozen as long as we are developing, and this way we can pass the unit tests. Now, this is where it gets difficult, because we can never go live while keeping all data in a frozen state. It has to change every now and then, and every time the input data changes, we need to change the expected output data as well.
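In practice, each of these 'unit tests' boils down to a data diff against a frozen snapshot. A minimal T-SQL sketch (both table names here are assumptions):

```sql
-- Rows produced by the ETL function but missing from the frozen expectation:
SELECT * FROM ActualJoinOutput
EXCEPT
SELECT * FROM ExpectedJoinOutput;

-- And the reverse direction, to catch rows the function failed to produce:
SELECT * FROM ExpectedJoinOutput
EXCEPT
SELECT * FROM ActualJoinOutput;

-- The test passes only when both queries return zero rows.
```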

Another thing: it rarely happens that we change something in our ETL flow without expecting any data change in the system. In ETL we mostly do data manipulation, don't we? What kind of change can we make to a plain 'join' where we do not expect any data change in the end? Yes, performance tweaking, CLRs, and string formatters may be some of those, but not much beyond that. So, we end up changing the unit test almost every time we change anything in the flow; otherwise the unit tests will fail. More and more, this looks like data testing rather than unit testing.

All of the above also applies to the data warehouse, OLAP cubes, and MDX. Unit testing is really something meant for programming languages. Cubes and sophisticated ETL tools are more like design tools doing mostly data manipulation, and therefore will not behave like a programming language. Personally, I will always call it 'good old data testing', not 'unit testing', from now on.


## Common Latest Period in SSAS Cube

Recently I came across a situation where data was being delivered at different times for different countries, and the client wanted to see data for the latest common month in Region/Continent-level reports. Data arrives on a monthly basis, but at different dates for different countries. So, at one point in our data warehouse, we have Dec-2014 data for the UK and Qatar, while the rest of the countries have fallen behind (having data only up to Nov-2014, Oct-2014, etc.). Let's analyze in more detail.

First, let's go into the cube. We have a simple Geo hierarchy: countries are at the lowest level and their parent is Region (Europe, Mid East).

We have a simple month dimension as well. Its primary key is the number produced by concatenating year and month, so Dec-2014's key will be 201412.
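If the key is stored as an integer, the same value can be derived arithmetically instead of by string concatenation (the date table and column names are illustrative):

```sql
-- 2014 * 100 + 12 = 201412
SELECT YEAR(SomeDate) * 100 + MONTH(SomeDate) AS MonthKey
FROM   DimDate;
```

An integer key like this sorts chronologically, which is exactly what the "latest month" logic below relies on.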

Let's browse the cube and see what the data looks like…

Look at the rows inside the red-colored zone: different countries have different latest months. The user requirement is to see the latest month's sales at all geo levels. This can be achieved in many ways; one way is to create a calculated measure like this:

```mdx
CREATE MEMBER CURRENTCUBE.[Measures].[Last Month Sales]
AS Sum(
       Tail(NonEmpty([Dim Month].[Id].[Id],
                     ([Measures].[Sales], [Dim Geography].[Hierarchy].CurrentMember)), 1),
       [Measures].[Sales])
, VISIBLE = 1 ;
```

If we drag this measure into the cube browser, country-level and region-level sales will look like this:

It's perfectly alright for country-level sales, but it is interesting to analyze the region-level numbers. For regions, most business users would like to see the common latest month's total sales. In our data, Europe's last-month sales of 4830 are probably giving the business user a wrong picture (of course it depends…), because only UK data is available for Dec-2014, so showing a total for Dec-2014 may not make much sense. The latest common month for Europe is Nov-2014, and for the Mid East region it is Oct-2014, so it would make much more sense to show the Nov-2014 total for Europe and the Oct-2014 total for Mid East. To support this requirement of showing the common latest month's sales, I have introduced another fact table, which is nothing but storage for each country's latest month.
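Such a fact can be populated with a simple aggregate. A sketch, assuming a FactSales table keyed by country and month:

```sql
-- One row per country, holding the latest month key for which sales exist
INSERT INTO FactCountryMaxPeriod (CountryID, MonthID)
SELECT CountryID, MAX(MonthID)
FROM   FactSales
GROUP BY CountryID;
```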

This fact table is named FactCountryMaxPeriod, and I declared a simple count measure called [Measures].[Fact Country Max Period Count] from it. We are almost done: just declare a new calculated measure called [Measures].[Common Latest Month Sales] and view it in the cube browser.

```mdx
CREATE MEMBER CURRENTCUBE.[Measures].[Common Latest Month Sales]
AS Sum(
       NonEmpty([Dim Month].[Id].[Id],
                [Measures].[Fact Country Max Period Count]).Item(0),
       [Measures].[Sales])
, VISIBLE = 1 ;
```

This time, Europe and Mid East are showing the total sales of Nov-2014 and Oct-2014 respectively.

## Slowly Changing Dimension in SSAS Cube

This is a simple example of an SCD (Type 2) in an OLAP cube. We have a dimension table for employees and their departments. This dimension is an SCD: one employee worked in different departments over the course of time. EmpID is the business key and SID is the surrogate key.

DimEmployee table
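To make the shape concrete, here is a minimal sketch of what such a Type-2 dimension might contain (column sizes and sample values are illustrative):

```sql
-- Type-2 SCD: one row per version of the employee. The business key (EmpID)
-- repeats, while each version gets its own surrogate key (SID).
CREATE TABLE DimEmployee (
    SID        INT PRIMARY KEY,   -- surrogate key, referenced by the fact table
    EmpID      INT,               -- business key
    Employee   VARCHAR(50),
    Department VARCHAR(50)
);

INSERT INTO DimEmployee VALUES
    (1, 101, 'David', 'Product Development'),
    (2, 101, 'David', 'Android Apps');   -- same employee, new department version
```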

We have another dimension called DimTime. It’s a very simple Month Dimension table.

Fact table looks like this:

Fact

Both the business key (EmpID) and the surrogate key (EmpSID) are present in the fact table, but we will use only the EmpSID key here.

Let’s have a look at the cube now. First, look at the Dimension called [Dim Employee]

And its attribute relationships…

Note carefully that I did not use SID in the hierarchy. I used the employee name (an attribute called 'Employee') for reporting purposes. Its key column and name column are both set to the Employee column of the physical dimension table.

We are almost done. Let’s process the cube and write an MDX to see result:
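The query I used looked roughly like the following; the cube, hierarchy, and measure names here are assumptions based on this sample:

```mdx
SELECT
    [Measures].[Sales] ON COLUMNS,
    NON EMPTY
    ([Dim Employee].[Department].[Department] *
     [Dim Employee].[Employee].[Employee] *
     [Dim Time].[Month].[Month]) ON ROWS
FROM [Sales Cube]
```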

You can clearly see that an employee named David sold different amounts in different months. David sold under the Product Development department up to Aug-2013, and then started selling under the Android Apps department from Sep-2013.

Hope you will like it.

## 5 Tips to avoid running out of space in SQL Server Database

Try to follow these tips. Each of these topics has been explained in depth on various websites, so I am not going to repeat that here.

1. Design

Design your database efficiently (normalization, use of foreign keys, getting rid of unnecessary tables, selecting appropriate data types, etc.). Clean up your database before you go live!

2. Estimate Data Volume

Estimate the total data volume you will need in your database. Don't get surprised later: know your data and know the space required.

3. Estimate Log file size and Temp DB Size

Set up a backup plan for your database and then choose a recovery model (simple, full, or bulk-logged). Try to find out the log file space required during the first few runs and then size the log file accordingly. If your process uses '#' tables (temp tables), try to find out the space required there as well: closely monitor the tempdb size in the first few runs and then set its size accordingly.
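For the monitoring itself, two standard T-SQL commands help:

```sql
-- Log file size and percentage used, per database
DBCC SQLPERF(LOGSPACE);

-- Current size of tempdb files in MB (sys.database_files reports size in 8 KB pages)
SELECT name, size * 8 / 1024 AS SizeMB
FROM   tempdb.sys.database_files;
```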

4. Compression

After a few months (or a few years, depending on the rate of change in your data), you can enable table compression on your tables. It's good for reporting systems like a DWH but might reduce performance in a transactional system. Read up on the pros and cons first.
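For example, in SQL Server you can estimate the savings first and then rebuild the table compressed (the table name below is illustrative):

```sql
-- Estimate how much space PAGE compression would save on this table
EXEC sp_estimate_data_compression_savings 'dbo', 'FactSales', NULL, NULL, 'PAGE';

-- Apply page compression
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);
```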

5. Shrink (use carefully)

If any big ad-hoc operations happen in your database, you can shrink your log files afterwards. Otherwise, shrinking log files will not benefit you much, because the log will simply grow back to its allotted size. The same goes for database shrink; read the articles found on the web first. A database shrink will leave you with fragmented indexes; in the worst case, you can drop and recreate the indexes rather than rebuild them.
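For a one-off cleanup after a large ad-hoc operation, the shrink looks like this (the logical log file name is an assumption):

```sql
-- Shrink the log file down to roughly 1 GB (target size is in MB)
DBCC SHRINKFILE (MyDatabase_Log, 1024);
```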

## Ceiling in MDX

It's just a simple trick to find the ceiling of a measure with the help of some string operations. We have a measure called [Measures].[Marks]. Now, we define a calculated measure like the one below:

```mdx
CREATE MEMBER CURRENTCUBE.[Measures].[CeilingMarks]
AS CInt(Mid(CStr([Measures].[Marks]), 1, InStr(CStr([Measures].[Marks]), ".") - 1)) + 1,
VISIBLE = 1 ;
```

It works like this: let's say our number is 315.5. First, the expression finds the decimal point's position; in our case, it is 4. Then it uses the Mid function to take the digits before the decimal point, which gives 315. Finally, it adds one, so the result becomes 316. Note that this assumes the value actually has a fractional part: for a whole number such as 315, CStr produces no decimal point, InStr returns 0, and the expression breaks.
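As a side note, the same ceiling can be computed without any string handling, using IIF and the Int function (from the VBA assembly that SSAS exposes); this variant also handles whole numbers. The member name is just illustrative:

```mdx
CREATE MEMBER CURRENTCUBE.[Measures].[CeilingMarksAlt]
AS Int([Measures].[Marks]) +
   IIF([Measures].[Marks] = Int([Measures].[Marks]), 0, 1),
VISIBLE = 1 ;
```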


## Pre order, Post order, Level order (BFS) traversing with MDX

The tree (hierarchy) of my sample cube is shown below:

BFS or Level Order Traversing:

```mdx
WITH
MEMBER [Measures].[x]
AS [Dim Category].[Parent].CurrentMember.Level.Ordinal
SELECT
    Order([Dim Category].[Parent].Members, [Measures].[x], BASC) ON 1,
    [Measures].[x] ON 0
FROM SalesCube
```
And, here goes the output:

Pre-Order traversing:
By default, traversing in MDX is always in pre-order.
```mdx
SELECT
    [Dim Category].[Parent].Members ON 1,
    [Measures].[Sales] ON 0
FROM SalesCube
```

Post-Order Traversing:
Using the Hierarchize function with the POST flag:
```mdx
SELECT
    Hierarchize([Dim Category].[Parent].Members, POST) ON 1,
    [Measures].[Sales] ON 0
FROM SalesCube
```