Monitor Data Exchange between DQS and Reference Data Service Providers - TechNet Articles - United States (English) - TechNet Wiki

One of the key features in Data Quality Services (DQS)

Monitor Data Exchange between DQS and Reference Data Service Providers

One of the key features in is to use knowledge from a cloud-based reference data service (Windows Azure Marketplace) to cleanse your data. However, in Data Quality Client, you cannot monitor information about the data exchanged between DQS and the reference data service providers during the cleansing process. This article provides information about why and how you can view/monitor this information.

Why to monitor data exchange?

Well, primarily for the following two reasons:

  • Financial auditing: You are charged/billed for data exchange between DQS and reference data service providers in the cloud. It is good to have the information handy in case you find any discrepancy in your billing, and want to verify actual usage of the reference data service.
  • Troubleshooting: Provides various information about the data exchange such as total time taken for the cleansing of a batch of records, how many records were cleansed, response from the reference data service provider (HTTP status), and so on. This information enables you to efficiently track issues with reference data-based cleansing, and quickly answer support calls from customer.

How to monitor data exchange?

The V_A_REFERENCE_DATA_AUDIT view in the DQS_MAIN database contains information about the data exchange between DQS and reference data service. Each batch data request is stored as a row in the view. This V_A_REFERENCE_DATA_AUDIT view contains the following columns:

  • ID: Unique ID of the data exchange record in this view.
  • Provider Name: Reference data service provider name used for cleansing.
  • Project: Data Quality Project name used for cleansing.
  • Knowledge Base: DQS Knowledge base used for cleansing.
  • Batch ID: Unique ID of the data batch sent to the reference data provider.

    IMPORTANT: When a domain value is successfully queried once from the reference data service provider, it is cached locally to prevent the same domain values from being queried multiple times from the reference data service provider. You are not billed for queries against the cached data. In this case, ##Cache Lookup## is displayed in the Batch ID column instead of a GUID to signify that the data has been queried against the local cache.
  • Number of records sent: Number of records sent to the reference data provider.
  • Number of records received: Number of records received from the reference data provider.
  • Time sent: Time when the batch request was sent to the reference data provider for data cleansing. (Note: this time stays constant across multiple batches within the same activity, because there is a bug where the time is not refreshed for each individual batch in SQL Server 2012 RTM)
  • Time received: Time when the reference date service responded with the cleansed data.
  • Duration: Time taken by the reference data provider to respond with the cleansed data. This is calculated as the difference between the request and response time. (Note: this duration is cumulative is since the beginning of the activity, since the Time sent column is not incremented per individual batch in SQL Server 2012 RTM)
  • Response Code: The HTTP status code for the operation. For example, 200 OK, 504 - Gateway Timeout, and so on.

To view data in the V_A_REFERENCE_DATA_AUDIT view in the DQS_MAIN database:

  1. Start Microsoft SQL Server Management Studio
  2. In Microsoft SQL Server Management Studio, right-click the SQL Server instance where DQS is installed, and then click New Query in the shortcut menu.
  3. In the Query Editor window, copy the following SQL statement:
    select * from DQS_MAIN.dbo.V_A_REFERENCE_DATA_AUDIT
  4. Press F5 to run the statement. The Results pane will display the data in the V_A_REFERENCE_DATA_AUDIT view.

Example Output:

ID PROVIDER NAME PROJECT KNOWLEDGEBASE BATCH ID NUMBER OF RECORDS SENT NUMBER OF RECORDS RECEIVED TIME SENT TIME RECEIVED DURATION RESPONSE CODENUMBER OF RECORDS SENT NUMBER OF RECORDS RECEIVED TIME SENT TIME RECEIVED 1000000 Provider 1 Project1 KB1 ##Cache Lookup## 38 0 2011-12-12 14:49:40.973 2011-12-12 14:49:41.063 00:00:00.0900000 OK
1000001 Provider 1 Project1 KB1 354aae07-bb1e-47ea-a9f8-43afdcbacefb 38 0 2011-12-12 14:49:41.127 2011-12-12 14:50:27.067 00:00:45.9400000 504 - Gateway Timeout
1000002 Provider 1 Project1 KB1 191c6a4e-af18-4117-a273-53d9e80988c8 20 20 2011-12-12 14:50:27.097 2011-12-12 14:50:35.153 00:00:08.0570000 OK
1000003 Provider 2 Project2 KB2 ##Cache Lookup## 288 0 2012-02-07 11:38:40.163 2012-02-07 11:38:40.560 00:00:00.3970000 OK
1000004 Provider 2 Project2 KB2 OK
1000004 Provider 2 Project2 82db1c29-b77b-4452-a837-d4e1744b655e 100 100 2012-02-07 11:38:40.630 2012-02-07 11:38:46.057 00:00:05.4270000 OK
1000005 Provider 2 Project2 KB2 5d0e526d-56db-47bc-bebe-caaa43123665 100 100 2012-02-07 11:38:40.630 2012-02-07 11:38:50.447 00:00:09.8170000 OK
1000006 Provider 2 Project2 KB2 847ba394-92aa-4a55-b882-940834838d69 88 88 2012-02-07 11:38:40.630 2012-02-07 11:38:54.453 00:00:13.8230000 OK

If you have enabled logging for reference data service (RDS) in Data Quality Client (see Configure Severity Levels for DQS Log Files), each batch request is also logged in the DQS Server log file (DQServerLog.DQS_MAIN.log). A sample entry in the DQS Server log file looks like the following:

12/14/2011 9:44:55 AM|[]|15|INFO|f592c205-3a81-40d3-8fe8-08a511df6762|Microsoft.Ssdqs.ReferenceData.Auditing.ReferenceDataAuditor| Batch Job: PROVIDER [1002], PROJECT [1000078], PROCESS [1079], BATCH ID [f6388357-a209-413f-b2df-499c3ea8b12e], NUMBER OF RECORDS SENT [10], NUMBER OF RECORDS RECEIVED [10], TIME SENT = [12/14/2011 09:44:03], TIME RECEIVED [12/14/2011 09:44:55], DURATION [00:00:52.2082203], RESPONSE CODE [OK]


See Also