Understanding crossjoin, set, tuples with help of SetToStr function

MDX cross join function returns the cross product of two sets. For example, we have two sets A = {1,2} and B = {x,y}. After crossjoin, we get, A X B = {(1,x),(1,y),(2,x),(2,y)}.

(1,x), (1,y) are individual tuples. The resultant set consists of these tuples.

Now, let’s see the below mdx query which uses crossjoin to form a named set called ASet. Crossjoin is done between a particular member of category hierarchy (“Hair Care”) and sub-categories hierarchy. A measure called [SetToString] is decalred. It calls SetToStr function on the named set. So, all the members of the set are converted into string.

with
set ASet
as
{crossjoin([Dim Category].[Category].[Category].&[Hair Care],[Dim Category].[Sub Category].[Sub Category])}
member [Measures].[SetToString]
as
SetToStr(ASet)
select ASet
on 1,
{[Measures].[SetToString]}
on 0
from [MY_CUBE]

returns result

A close look at the returned measure
{
( [Dim Category].[Category].&[Hair Care], [Dim Category].[Sub Category].&[Conditioner] ), <————-tuple
( [Dim Category].[Category].&[Hair Care], [Dim Category].[Sub Category].&[Shampoo] )      <————–tuple
}

Advertisements
Posted in MDX, SSAS (Analysis Service), Cube | Tagged , , , | Leave a comment

Composite Domain Rules (CD Rules) in Data Correction

Hello all

I am just learning DQS and would like to share a few things that I learned so far. One of them is the use of composite domains (CD). I will go straight into an example rather than wasting time on definitions. You can find lots of them in MSDN.

Let’s say, you have a table as shown below:

race_venue       year          won_by

malaysia            2012         UNKNOWN
australia            2012         XYZ

Now, you like to correct the values of the ‘won_by’ column using DQS. As, u can see, the value of the field ‘won_by’ depends upon the other two columns. That’s why we need a composite domain(CD) in DQS. Because, we are dealing with cross domain relationships here.

Now, in the domain management section of your knowledge base (KB), create 3 domains called  venue(string), year (int) and winner(string). Now create a CD called ‘set_winner’ combining all the 3 domains.

 Now, open the “CD Rules” tab and create a rule there. The rule looks like below:

CD Rules

In case, u r travelling the same troubled path as I did and wondering how to add another domain name/clause in the “IF” section of the rule, the solution is ‘clicking the right button’. If u click on right button, u will see an option called “Add Clause”.

So, u r done with the setting up of domains. It’s time to create a new data quality project(cleansing project). After u create a new project, map the 3 columns with the 3 domains. Don’t forget to include the composite domain before start processing your data.

composite domain project

Now, u r ready to start processing your data. Here is the result of applying our CD Rule on the data:

Posted in Data Quality Service (DQS) | Tagged , , , , , | Leave a comment

Set Synonyms in DQS

This is my first post on the newly added Data Quality Service of MS SQL Server. It’s about setting up synonyms in Domain Values. The steps are :

1. select “domain management” activity for your knowledge base (KB)
2. select a domain from your domain list
3. select the “domain values” tab
4. select all the values that you like to make synonyms of each other
5. right click (all the selected values are highlighted bright yellow)
6. select ‘set as synonyms’ from the right click menu
…and the job done

Posted in Data Quality Service (DQS) | Tagged , , , | Leave a comment

Load Excel File Dynamically Into Database Using SQLBulkCopy and GetOleDbSchemaTable in C#

Previously, I have posted some SSIS solutions which would handle dynamic column mapping in data loading. This time, I was wondering how to handle dynamic column mapping when the source is an excel file. I was quick to realize that it would be entirely a C# solution and the program should be able to handle any change in the number or name of columns. In simple words, you give any excel file and the program will load it into database. I used GetOleDbSchemaTable in order to fetch the schema information from the excel file, then built a similar table in the database. Excel data types and SQL data types do not match. Sometimes, they make me panic while loading files using SSIS. In my program also, I had to do some change in the data types; like WChar becomes NVARCHAR, Double becomes INT. Yes, I could use an XML file/database table to save a detailed mapping of this conversion of excel data types to SQL data types to make my program more capable and versatile. But, the conversion is hard-coded at the moment. Another challenge was to get all the data loaded to SQL table without using any column names or hard coded column mapping. SQLBulkCopy came to my rescue. By the way, I had to use a dataview over the table returned by GetOleDbSchemaTable because it was returning the column names in alphabetical order. So, I had to sort the rows using “ORDINAL_POSITION”

using System;

using System.Data.OleDb;

using System.Data.SqlClient;

using System.Data;

using System.Windows;

namespace ConsoleApplication2

{

class Program

{

           public static void Main()

{

try

{

///*****************Establish Excel Connection*****************///

String strExcelConn = “Provider=Microsoft.ACE.OLEDB.12.0;”

+ “Data Source=C:\\files\\Book1.xlsx;”

+ “Extended Properties=’Excel 8.0;HDR=Yes'”;

OleDbConnection connExcel = new OleDbConnection(strExcelConn);

OleDbCommand cmdExcel = new OleDbCommand();

cmdExcel.Connection = connExcel;

///*****Get schema info and Read Sheet Name and save it as TableName *******//

connExcel.Open();

DataTable dtExcelSchema;

dtExcelSchema = connExcel.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);

string TableName = dtExcelSchema.Rows[0][“TABLE_NAME”].ToString();

string Query = “CREATE TABLE “ + TableName + “(“;

/*****Get schema info and read the column names and build Query to Create similar SQL table*****/

//

dtExcelSchema = connExcel.GetOleDbSchemaTable(OleDbSchemaGuid.Columns, new object[] { null, null, TableName, null });

connExcel.Close();

string FieldNames = “”;

DataView dv = new DataView(dtExcelSchema);

dv.Sort = “ORDINAL_POSITION”;

foreach (DataRowView rowView in dv)

{

DataRow row = rowView.Row;

Query += row[“COLUMN_NAME”].ToString() + ” “ + ((OleDbType)row[“DATA_TYPE”]).ToString() + “,”;

FieldNames += row[“COLUMN_NAME”].ToString() + “,”;

}

/////******************Replace data types************/////

Query = Query.TrimEnd(“,”.ToCharArray()) + “)”;

Query = Query.Replace(“WChar”, “NVARCHAR(500)”);

Query = Query.Replace(“Double”, “INT”);

FieldNames = FieldNames.TrimEnd(“,”.ToCharArray()) + “”;

/******************read records from excel file************/

cmdExcel.CommandText = “Select “ + FieldNames + ” from [“ + TableName + “]”;

DataSet ds = new DataSet();

OleDbDataAdapter da = new OleDbDataAdapter();

connExcel.Open();

da.SelectCommand = cmdExcel;

da.Fill(ds);

connExcel.Close();

//******** Establish SQL Server connection and create table*****//

SqlConnection conn = new SqlConnection();

string sqlConn = “Data Source=YourServerName;Initial Catalog = YourDBName; Integrated Security = SSPI;Persist Security Info=False;”;

conn.ConnectionString = sqlConn;

SqlCommand cmd = new SqlCommand(Query);

cmd.Connection = conn;

conn.Open();

cmd.ExecuteNonQuery();

conn.Close();

//*****************Do bulk copy to table*****************//

DataTable dt = new DataTable();

dt = ds.Tables[0];

SqlBulkCopy sbc = new SqlBulkCopy(conn);

sbc.DestinationTableName = TableName;

conn.Open();

sbc.WriteToServer(dt);

conn.Close();

}

catch (Exception ex)

{

Console.WriteLine(ex.Message);

}

}

}

}

 

Posted in MS Visual Studio, C#.NET, SQL, Database, SSIS (SQL Server Integration Service) | Tagged , , , , , | 13 Comments

Access SSIS variables in script task and script component

You have to write codes to access the variables in these two components in a slight different way.

For Script Task:

DTS.Variables[“VariableName”].Value

For Script Component:

Variables.VariableName

Don’t forget to add these variables in task editor first. 🙂

Posted in SSIS (SQL Server Integration Service) | Tagged , , | Leave a comment

Find out anagram using SQL

Any word or phrase that exactly reproduces the letters in another order is an anagram. For example: “Hamlet” is an anagram of the word “Amleth”. We will show u a query today which can determine if two given words are anagram or not. We have used a Tally table for parsing the given strings. It has only one column called ‘N’ and the values of this column are like 1,2,3,4…….100

DROP TABLE #a
DROP TABLE #b

DECLARE @str1 VARCHAR(30)
DECLARE @str2 VARCHAR(30)

SET @str1 = ‘ramy’
SET @str2 = ‘mary’

SELECT N, SUBSTRING(@str1,N,1)letters
INTO #a
FROM dbo.Tally  WHERE N <= LEN(@str1)
ORDER BY N

SELECT N, SUBSTRING(@str2,N,1)letters
INTO #b
FROM dbo.Tally  WHERE N <= LEN(@str2)
ORDER BY N

IF EXISTS(
SELECT A.*,B.* FROM
(
SELECT letters,COUNT(*)C FROM #a GROUP BY letters
)A
FULL OUTER JOIN
(
SELECT letters,COUNT(*)C FROM #b GROUP BY letters
)B
ON A.letters = B.letters AND A.C = B.C
WHERE A.letters IS NULL OR B.letters IS NULL
)
PRINT ‘NOT ANAGRAM’
ELSE
PRINT ‘ANAGRAM’

Posted in SQL, Database | Tagged , | Leave a comment

Avoid Concurrency problem while running SQL Server Jobs

Hello everyone,

This time my problem was to delay the start of a SQL server job if another job is already running. Obviously, the reason for this is that the jobs use the same database resources. The sysjobhistory table in MSDB database has the job ids and their status stored. First, I selected the job id which have the run_status=4 (4 means “Job in progress”) from sysjobhistory table. Then I selected the job_id and name of my suspected jobs from sysjobs table. Then, I joined these two result sets and put the joined result set inside IF NOT EXISTS command. In short, I check if any of my suspected jobs is running before I start my job. If the “IF NOT EXISTS” command returns true, I break the while loop and start my job. If it returns false, the process goes to sleep mode for 10 minutes. After 10 minutes the process wakes up and continue from the beginning of the while loop. Below is the code I have used to solve my problem:

WHILE(1=1)
BEGIN
IF NOT EXISTS
(

SELECT A.job_id FROM

(

SELECT job_id FROM msdb.dbo.sysjobhistory WHERE  run_status = 4

)A

INNER JOIN

(

SELECT job_id,name FROM msdb.dbo.sysjobs WHERE name LIKE ‘%Mirror%’
OR name LIKE ‘%BR%’
OR name like ‘%Fact%’

)B

ON A.job_id = B.job_id

)

BEGIN
BREAK;
END

WAITFOR DELAY ’10:00′

END

—-start my job——
EXEC msdb..sp_start_job @job_name = ‘My Job’

Posted in SQL, Database, SSIS (SQL Server Integration Service) | Tagged , , , , , | Leave a comment