Leaf members do not sum up to parent – Part 1 (user defined hierarchies)

We have a simple cube with two dimensions called DimCompany and DimCategory, and only one measure called Sales. DimCategory table looks like this:

Id    SubCategory                 Category

1    Phone                              Hardware

2    Tab                                  Hardware

3    Web Applications         Software

The key attribute is Id and attribute relationship is     Id->Sub Category->Category

There is one user-defined hierarchy called “Hierarchy” in DimCategory.

In BIDS 2008 cube browser, if we drag the user defined hierarchy in one axis and company names in another, we can see this report:


In Hardware category, total Apple sales is 8000, total Google sales is 8000, total Microsoft sales is 8000 and total Samsung sales is 6000.

Now, imagine that we received an ad-hoc request from our source data provider that total Hardware Sales for Apple should be 9000 and that of Google should be 12000. There have been some unreported sales for both these companies which do not fit into the available sub categories (phone and tab). So, the requirement becomes likes this:

For Apple, Phone sales is 1000 and Tab Sales is 7000, but total Hardware Sales is 9000 (1000 greater than children sum)

For Google, Phone sales is 8000 and Tab Sales is 0, but total Hardware Sales is 12000 (4000 greater than children sum)

One possible solution could be to introduce another Sub Category called “Dummy”. So, DimCategory table looks like this:

Id    SubCategory                 Category

1    Phone                              Hardware

2    Tab                                  Hardware

3    Web Applications         Software

4    Dummy                          Hardware

And, we have to insert the unreported extra sales into Fact table (as “Dummy” sub category sales). Our Fact table has only three columns; CompanyId, CategoryId and Sales. So, the insert statement will look like this:

Insert into Fact (CompanyId, CategoryId, Sales) values (2, 4, 1000) –For Apple

Insert into Fact (CompanyId, CategoryId, Sales) values (4, 4, 4000) –For Google

Now, we process the cube and browse it:


It would be great if we could hide “Dummy”. For that, all we have to do is replace the value “dummy” with “Hardware” in DimCategory table, then go to the properties of sub-category level of user defined Hierarchy of DimCategory dimension and set “Hide Member If” to “Parent Name”.


Final result looks like this:


We can also insert total values instead of incremental values; 9000 for Apple and 12000 for Google and use MDX Scope statement to handle Parent sum.

Posted in MDX, SSAS (Analysis Service), Cube | Tagged , , , , | Leave a comment

Maximum value of a measure from an MDX result set

Below is an MDX Query which returns the sales of different product categories:

select {[Dim Category].[Category].[Category]} on 1,
{[Measures].[Measure Value]} on 0
from [MY CUBE]

result set returned:

We can see that the maximum value is 42092.83. Using max and axis function, we can show the maximum value as a seperate measure. So, we add a measure and the query becomes :

with
member [Measures].[MAXSales]
as
max(axis(1),[Measures].[Measure Value])

select {[Dim Category].[Category].[Category]} on 1,
{[Measures].[Measure Value],[Measures].[MAXSales]} on 0
from [MY CUBE]

the result set looks like this:

Posted in MDX, SSAS (Analysis Service), Cube | Tagged , , , | 2 Comments

Understanding crossjoin, set, tuples with help of SetToStr function

MDX cross join function returns the cross product of two sets. For example, we have two sets A = {1,2} and B = {x,y}. After crossjoin, we get, A X B = {(1,x),(1,y),(2,x),(2,y)}.

(1,x), (1,y) are individual tuples. The resultant set consists of these tuples.

Now, let’s see the below mdx query which uses crossjoin to form a named set called ASet. Crossjoin is done between a particular member of category hierarchy (“Hair Care”) and sub-categories hierarchy. A measure called [SetToString] is decalred. It calls SetToStr function on the named set. So, all the members of the set are converted into string.

with
set ASet
as
{crossjoin([Dim Category].[Category].[Category].&[Hair Care],[Dim Category].[Sub Category].[Sub Category])}
member [Measures].[SetToString]
as
SetToStr(ASet)
select ASet
on 1,
{[Measures].[SetToString]}
on 0
from [MY_CUBE]

returns result

A close look at the returned measure
{
( [Dim Category].[Category].&[Hair Care], [Dim Category].[Sub Category].&[Conditioner] ), <————-tuple
( [Dim Category].[Category].&[Hair Care], [Dim Category].[Sub Category].&[Shampoo] )      <————–tuple
}

Posted in MDX, SSAS (Analysis Service), Cube | Tagged , , , | Leave a comment

Composite Domain Rules (CD Rules) in Data Correction

Hello all

I am just learning DQS and would like to share a few things that I learned so far. One of them is the use of composite domains (CD). I will go straight into an example rather than wasting time on definitions. You can find lots of them in MSDN.

Let’s say, you have a table as shown below:

race_venue       year          won_by

malaysia            2012         UNKNOWN
australia            2012         XYZ

Now, you like to correct the values of the ‘won_by’ column using DQS. As, u can see, the value of the field ‘won_by’ depends upon the other two columns. That’s why we need a composite domain(CD) in DQS. Because, we are dealing with cross domain relationships here.

Now, in the domain management section of your knowledge base (KB), create 3 domains called  venue(string), year (int) and winner(string). Now create a CD called ‘set_winner’ combining all the 3 domains.

 Now, open the “CD Rules” tab and create a rule there. The rule looks like below:

CD Rules

In case, u r travelling the same troubled path as I did and wondering how to add another domain name/clause in the “IF” section of the rule, the solution is ‘clicking the right button’. If u click on right button, u will see an option called “Add Clause”.

So, u r done with the setting up of domains. It’s time to create a new data quality project(cleansing project). After u create a new project, map the 3 columns with the 3 domains. Don’t forget to include the composite domain before start processing your data.

composite domain project

Now, u r ready to start processing your data. Here is the result of applying our CD Rule on the data:

Posted in Data Quality Service (DQS) | Tagged , , , , , | Leave a comment

Set Synonyms in DQS

This is my first post on the newly added Data Quality Service of MS SQL Server. It’s about setting up synonyms in Domain Values. The steps are :

1. select “domain management” activity for your knowledge base (KB)
2. select a domain from your domain list
3. select the “domain values” tab
4. select all the values that you like to make synonyms of each other
5. right click (all the selected values are highlighted bright yellow)
6. select ‘set as synonyms’ from the right click menu
…and the job done

Posted in Data Quality Service (DQS) | Tagged , , , | Leave a comment

Load Excel File Dynamically Into Database Using SQLBulkCopy and GetOleDbSchemaTable in C#

Previously, I have posted some SSIS solutions which would handle dynamic column mapping in data loading. This time, I was wondering how to handle dynamic column mapping when the source is an excel file. I was quick to realize that it would be entirely a C# solution and the program should be able to handle any change in the number or name of columns. In simple words, you give any excel file and the program will load it into database. I used GetOleDbSchemaTable in order to fetch the schema information from the excel file, then built a similar table in the database. Excel data types and SQL data types do not match. Sometimes, they make me panic while loading files using SSIS. In my program also, I had to do some change in the data types; like WChar becomes NVARCHAR, Double becomes INT. Yes, I could use an XML file/database table to save a detailed mapping of this conversion of excel data types to SQL data types to make my program more capable and versatile. But, the conversion is hard-coded at the moment. Another challenge was to get all the data loaded to SQL table without using any column names or hard coded column mapping. SQLBulkCopy came to my rescue. By the way, I had to use a dataview over the table returned by GetOleDbSchemaTable because it was returning the column names in alphabetical order. So, I had to sort the rows using “ORDINAL_POSITION”

using System;

using System.Data.OleDb;

using System.Data.SqlClient;

using System.Data;

using System.Windows;

namespace ConsoleApplication2

{

class Program

{

           public static void Main()

{

try

{

///*****************Establish Excel Connection*****************///

String strExcelConn = “Provider=Microsoft.ACE.OLEDB.12.0;”

+ “Data Source=C:\\files\\Book1.xlsx;”

+ “Extended Properties=’Excel 8.0;HDR=Yes'”;

OleDbConnection connExcel = new OleDbConnection(strExcelConn);

OleDbCommand cmdExcel = new OleDbCommand();

cmdExcel.Connection = connExcel;

///*****Get schema info and Read Sheet Name and save it as TableName *******//

connExcel.Open();

DataTable dtExcelSchema;

dtExcelSchema = connExcel.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);

string TableName = dtExcelSchema.Rows[0][“TABLE_NAME”].ToString();

string Query = “CREATE TABLE “ + TableName + “(“;

/*****Get schema info and read the column names and build Query to Create similar SQL table*****/

//

dtExcelSchema = connExcel.GetOleDbSchemaTable(OleDbSchemaGuid.Columns, new object[] { null, null, TableName, null });

connExcel.Close();

string FieldNames = “”;

DataView dv = new DataView(dtExcelSchema);

dv.Sort = “ORDINAL_POSITION”;

foreach (DataRowView rowView in dv)

{

DataRow row = rowView.Row;

Query += row[“COLUMN_NAME”].ToString() + ” “ + ((OleDbType)row[“DATA_TYPE”]).ToString() + “,”;

FieldNames += row[“COLUMN_NAME”].ToString() + “,”;

}

/////******************Replace data types************/////

Query = Query.TrimEnd(“,”.ToCharArray()) + “)”;

Query = Query.Replace(“WChar”, “NVARCHAR(500)”);

Query = Query.Replace(“Double”, “INT”);

FieldNames = FieldNames.TrimEnd(“,”.ToCharArray()) + “”;

/******************read records from excel file************/

cmdExcel.CommandText = “Select “ + FieldNames + ” from [“ + TableName + “]”;

DataSet ds = new DataSet();

OleDbDataAdapter da = new OleDbDataAdapter();

connExcel.Open();

da.SelectCommand = cmdExcel;

da.Fill(ds);

connExcel.Close();

//******** Establish SQL Server connection and create table*****//

SqlConnection conn = new SqlConnection();

string sqlConn = “Data Source=YourServerName;Initial Catalog = YourDBName; Integrated Security = SSPI;Persist Security Info=False;”;

conn.ConnectionString = sqlConn;

SqlCommand cmd = new SqlCommand(Query);

cmd.Connection = conn;

conn.Open();

cmd.ExecuteNonQuery();

conn.Close();

//*****************Do bulk copy to table*****************//

DataTable dt = new DataTable();

dt = ds.Tables[0];

SqlBulkCopy sbc = new SqlBulkCopy(conn);

sbc.DestinationTableName = TableName;

conn.Open();

sbc.WriteToServer(dt);

conn.Close();

}

catch (Exception ex)

{

Console.WriteLine(ex.Message);

}

}

}

}

 

Posted in MS Visual Studio, C#.NET, SQL, Database, SSIS (SQL Server Integration Service) | Tagged , , , , , | 13 Comments

Access SSIS variables in script task and script component

You have to write codes to access the variables in these two components in a slight different way.

For Script Task:

DTS.Variables[“VariableName”].Value

For Script Component:

Variables.VariableName

Don’t forget to add these variables in task editor first. 🙂

Posted in SSIS (SQL Server Integration Service) | Tagged , , | Leave a comment