Hello all
I am just learning DQS (SQL Server Data Quality Services) and would like to share a few things I have learned so far. One of them is the use of composite domains (CDs). I will go straight into an example rather than spend time on definitions; you can find plenty of those on MSDN.
Let’s say you have a table like the one below:
race_venue   year   won_by
malaysia     2012   UNKNOWN
australia    2012   XYZ
Now, suppose you want to correct the values of the ‘won_by’ column using DQS. As you can see, the value of ‘won_by’ depends on the other two columns: the venue and the year together determine the winner. This is a cross-domain relationship, and that is why we need a composite domain (CD) in DQS.
In the domain management section of your knowledge base (KB), create three domains called venue (string), year (int) and winner (string). Then create a CD called ‘set_winner’ that combines all three domains.
Next, open the “CD Rules” tab and create a rule there. The rule looks like the one below:
In case you are travelling the same troubled path as I did and wondering how to add another domain or clause to the “IF” section of the rule, the solution is to right-click. When you right-click, you will see an option called “Add Clause”.
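To make the idea of a multi-clause rule concrete, here is a small Python sketch of what such a CD rule does conceptually. It is not DQS code: the `Clause` class, the `apply_rule` function and the `PLACEHOLDER_WINNER` value are all made up for illustration, and I am assuming the rule targets the row whose winner is UNKNOWN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """One clause of a rule: a domain name and the value it is compared to or set to."""
    domain: str
    value: str


def apply_rule(row, if_clauses, then_clause):
    """Return a copy of row; set the THEN value only if every IF clause matches."""
    fixed = dict(row)
    matches = all(
        str(row[c.domain]).lower() == c.value.lower() for c in if_clauses
    )
    if matches:
        fixed[then_clause.domain] = then_clause.value
    return fixed


# IF venue = malaysia AND year = 2012 THEN winner = <corrected value>.
# PLACEHOLDER_WINNER is hypothetical; use the real value in your own rule.
if_clauses = [Clause("venue", "malaysia"), Clause("year", "2012")]
then_clause = Clause("winner", "PLACEHOLDER_WINNER")

rows = [
    {"venue": "malaysia", "year": 2012, "winner": "UNKNOWN"},
    {"venue": "australia", "year": 2012, "winner": "XYZ"},
]
cleansed = [apply_rule(r, if_clauses, then_clause) for r in rows]
```

Each call to “Add Clause” corresponds to one more entry in `if_clauses`: the rule fires only when all of them hold for the same row.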
So, you are done with setting up the domains. It’s time to create a new data quality project (a cleansing project). After you create the project, map the three columns to the three domains. Don’t forget to include the composite domain before you start processing your data.
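The mapping step can be pictured as a simple lookup from source columns to domains. The Python sketch below is only an illustration with made-up names (`column_to_domain`, `usable_composite_domains`). It assumes, as far as I understand DQS, that a CD can take part in cleansing only when all of its member domains are mapped.

```python
# Composite domain name -> the single domains it combines (from the KB above).
COMPOSITE_DOMAINS = {"set_winner": ["venue", "year", "winner"]}

# Source column -> domain, as chosen in the project's mapping step.
column_to_domain = {
    "race_venue": "venue",
    "year": "year",
    "won_by": "winner",
}


def usable_composite_domains(mapping, composites):
    """List the composite domains whose member domains are all mapped."""
    mapped = set(mapping.values())
    return [
        name for name, members in composites.items()
        if set(members) <= mapped
    ]


ready = usable_composite_domains(column_to_domain, COMPOSITE_DOMAINS)
```

If one of the three columns is left unmapped, `ready` comes back empty, which is the situation the reminder above is meant to prevent.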
Now you are ready to start processing your data. Here is the result of applying our CD rule to the data: