Saturday, February 28, 2009

SQL Server 2005 Table Row Counts... One Select Statement, No Table Scan

A big change that occurred with SQL Server 2005 (from 2000) is that the system table information is always in sync with the actual tables.

When the system tables get out of sync in SQL Server 2000 the following SQL statement must be run to synchronize them.
dbcc updateusage(0)
GO 
According to the MSDN site on DBCC UPDATEUSAGE (Transact-SQL) contains this little bit of information:
Reports and corrects pages and row count inaccuracies in the catalog views. These inaccuracies may cause incorrect space usage reports returned by the sp_spaceused system stored procedure. In SQL Server 2005, these values are always maintained correctly. Databases created on SQL Server 2005 should never experience incorrect counts, however, databases upgraded to SQL Server 2005 may contain invalid counts. We recommend running DBCC UPDATEUSAGE after upgrading to SQL Server 2005 to correct any invalid counts.
Googling for information on getting table counts without a row scan yields several posts on the issue, but they involve creating functions and needlessly complicating a simple SQL statement.

There are 2 possible solutions. 1) Common Table Expressions and 2) Use a join. The first option looks like this:
WITH CTETableRowCount
AS
(
 SELECT 
   rows rowCnt
 , ID ObjectID 
 --, si.*
        FROM sysindexes si
        WHERE indid < 2
)
SELECT 
  so.[name]
, cte.[rowCnt] as [Row Count]
--, so.*
FROM sysobjects so LEFT JOIN CTETableRowCount cte ON so.id=cte.ObjectID
WHERE type='U' and so.name != 'dtproperties'
ORDER BY so.[name]
This works and is reliable, but a bit overkill. There is nothing in the Common Table Expression that really requires it to be like that and in fact just adds lines to the query with no benefit The above query can be simplified to the following:
SELECT 
   so.[name]
 , si.[rows] as [Row Count]
 --, so.*
FROM sysobjects so LEFT JOIN sysindexes si ON so.id=si.ID
WHERE (type='U') and (so.name != 'dtproperties') and (si.indid < 2)
ORDER BY so.[name]
This is probably the best solution to the issue.

Tuesday, February 24, 2009

Duplicate Entries in a Mapping Table with All Guid/Uniqueidentifier Columns

I recently ran into a case where a custom .net membership provider was incorrectly adding a record to a map table every time a user was updated. This wouldn't normally be a problem, except SharePoint's user security has a couple hang-ups when the number of groups doesn't match the distinct group list.

Needless to say, there was a LOT of duplicate records in the table; more than I was willing to weed through by hand. After performing the obligated web search, I couldn't find anything that fit my situation. Common Table Expressions looked promising, but had several limitations that I wasn't able to work around.

The problem could be prevented from the beginning with the correct keys/constraints, but that was not the case and I needed to clean up the table before making the correction.

The first bit of information that tells me how I am going to solve the issue is the DBMS. In this case it is an SQL Server 2005 instance.

Lets setup an example, you have a table with 3 "uniqueidentifier" columns: "ID", "FromMap", "ToMap".
CREATE TABLE [dbo].[DuplicateGuidMap](
 [ID] [uniqueidentifier] NOT NULL,
 [FromMap] [uniqueidentifier] NOT NULL,
 [ToMap] [uniqueidentifier] NOT NULL
) ON [PRIMARY]
Now lets populate the table. You can skip this part, nothing unusual here.
INSERT INTO [Test].[dbo].[DuplicateGuidMap]
           ([ID]
           ,[FromMap]
           ,[ToMap])
SELECT '0f6f2339-4597-4933-b8b0-de04804f4c04','e0bb5dad-f30e-4bcb-a4ec-12363af0e6cd','8a15fde9-f9f3-4ecb-acd4-9c78274f9d95' UNION ALL SELECT
'1a0d0991-6a12-4621-901d-0ca58a4fbf5a','88c3d3aa-663c-41e9-bdcf-f9022604b5fc','d4757f79-3452-4a3e-a4d8-1f2ea3a26c2a' UNION ALL SELECT
'14bc9408-9537-4d2a-b917-6e1b44099799','778cffba-a14b-447f-8648-96d640ae1b93','dd20127d-33ab-4d91-9202-8196a2ca90d6' UNION ALL SELECT
'3801a162-cdef-4ffc-b4b5-928f345fe68f','77121133-8c4f-4e28-b9fa-048707c41a7a','faae6e93-e91d-4729-8b69-367401130a42' UNION ALL SELECT
'e8386333-0e89-4ec0-8b2a-e16ae9544290','d385ea64-68f9-421c-bda0-3c0d9fd785e8','442fd8b3-a5d3-4074-9886-e9104d0f2421' UNION ALL SELECT
'a4b5dacb-3f21-43ea-b516-e7055c3831d1','10b7ec11-0cb8-4128-a146-2ca2ef0a9594','c09aa0e7-96fd-41fa-8fce-105a4ce908ee' UNION ALL SELECT
'2c340b29-cd5d-4036-92af-7ae24bdc47d9','dde6379c-6626-46c0-ac70-b7ddd0377e62','5ac17542-fc1d-4581-a616-5dd602e5f34f' UNION ALL SELECT
'd6d56501-98e0-4662-a72c-116b6e2a2e4c','08d0200d-375a-4022-af9e-bdc5ff4a57fe','4e001ae6-d8a0-45b5-a30a-1ab95b66dae4' UNION ALL SELECT
'ee392830-d077-48c0-a817-5a2169dc8bb6','abd1ed2d-3f4d-4f8d-9240-7aebc46ff021','3fdc2214-e11d-411d-b524-7cdd57037f6c' UNION ALL SELECT
'a37b664d-1735-444b-a326-2d23f8bf0b55','c0a1a8a0-efdd-4740-807a-f5f0aaba90ed','f95a288d-3e6c-43a4-9b59-54d382825e99' UNION ALL SELECT
'1d8a2558-0f50-4765-98a9-dd2cf2127bcf','e0bb5dad-f30e-4bcb-a4ec-12363af0e6cd','8a15fde9-f9f3-4ecb-acd4-9c78274f9d95' UNION ALL SELECT
'577a05de-4bbc-4b8d-93dc-22f9dd838e43','e446232a-02bb-47a0-b9a1-e5fab5c0aba3','faae6e93-e91d-4729-8b69-367401130a42'
We will assume that the relationship of "FromMap" to "ToMap" is 1 to Many, meaning that each "ToMap" should be unique. A good place to start would be to find out how many "ToMap" duplicates there are.
SELECT DISTINCT ToMap
  FROM DuplicateGuidMap
GROUP BY ToMap
HAVING Count(ToMap) > 1
There are 2 potential sets of duplicate records in the table now. ToMap:
  • FAAE6E93-E91D-4729-8B69-367401130A42
  • 8A15FDE9-F9F3-4ECB-ACD4-9C78274F9D95
Getting the list of all of the duplicated records is a trivial join, but I want a query that would return only the duplicate rows (leaving one of the duplicated records). Many DBA's who have worked for years would use cursors and possibly create functions. This is all well and good, but they are rather slow and consume a lot of CPU time. The idea of one SQL statement to rule them all doesn't come to mind. I tend to think in sets. I haven't found too many things I can't do in one Select statement. Knowing this, I banked heavily on my understanding of how SQL works and created the following query.
SELECT 
        dgm.ToMap
   , dgm.FromMap
   , dgm.ID
  FROM DuplicateGuidMap dgm
WHERE ToMap IN (
 SELECT DISTINCT ToMap
   FROM DuplicateGuidMap
 GROUP BY ToMap
 HAVING Count(ToMap) > 1
) AND ID NOT IN (
 SELECT TOP 1 ID
   FROM DuplicateGuidMap
 WHERE ToMap = dgm.ToMap
)
ORDER BY ToMap
The meat of the query lies in the Where clause.
ID NOT IN (
 SELECT TOP 1 ID
   FROM DuplicateGuidMap
 WHERE ToMap = dgm.ToMap
)
What's happening here? Since we are currently in the Where clause, we are banking on the fact that it is evaluated for every row that passes the first condition. So in the subquery, we are selecting the record we want to keep. The Where clause "ToMap = dgm.ToMap" grabs the "ToMap" in the main query for use in the subquery. This functionality works almost the same way as a SQL function without having to load and unload the function from memory every time. Having this included in the statement makes the query run faster. In a table with ~10,000 records, ~79 unique duplicates, and ~196 duplicates to remove the query took "00:00" (min:sec).

There is a couple points to make before we proceed. The Guid 'FAAE6E93-E91D-4729-8B69-367401130A42' case is not a valid dupicate and never occurred in the scenario. I added this case to dive farther into controlling which duplicates are actually valid. The only valid duplicate is '8A15FDE9-F9F3-4ECB-ACD4-9C78274F9D95' because the "FromMap"'s are the same. This is the solution I developed for originally.

We need to look at the "ID" Where clause. We can refine it to be: "FromMap = dgm.FromMap AND ToMap = dgm.ToMap".

The final query looks like the following. It returns a list of the duplicate records ready to delete leaving one of the duplicate records.
SELECT 
        dgm.ToMap
   , dgm.FromMap
   , dgm.ID
  FROM DuplicateGuidMap dgm
WHERE ToMap IN (
 SELECT DISTINCT ToMap
   FROM DuplicateGuidMap
 GROUP BY ToMap
 HAVING Count(ToMap) > 1
) AND ID NOT IN (
 SELECT TOP 1 ID
   FROM DuplicateGuidMap
 WHERE FromMap = dgm.FromMap AND ToMap = dgm.ToMap
)
ORDER BY ToMap
Never stop thinking outside the box. Most of the time you can find a simple answer with a little extra thought.

Monday, February 23, 2009

So That's What I've Been Doing All Along

I think a fascinating part about life is the things that I come up with in my mind, only to discover later on (sometimes years) a different vocabulary in which to use which describes the same ideas and principles I had come up with. I don't care that I wasn't the first to come up with them because I am a student of knowledge and am always excited to learn new things. Even if it is adjusting my vocabulary so I can dive into a newly "discovered" subject to gain new perspectives and further understand.

I tend to find human interaction and communication very interesting and sometimes a bit foreign. I tend to look at situations in the third person; putting emotions into a situation tends to exaserbate it.

My interest in the interactions, controlling them, and analyze them started in high school when I started reading a lot of philosophy. Being a quiet geek in school, I had a lot of time to observe and ponder. I later "experimented" with seeing if there were ways to control situations or at least influence them. I realized very quickly how powerful of a tool this is/was and that it could be used for good and bad. I resolved myself to thought experiments.

Logic, philosophy, and critical thinking have been quiet personal hobby of mine ever since. I took several excellent courses in college on them. I attribute them to my distaste for bad reporting on many new stations.

Several good web resources: I used them to enhance and refine my thought experiments.

A little while ago I ran across a blog post (sorry, I can't remember which one) from Mind Your Decisions by Presh Talwalkar and discovered that much of what I was doing in my mind was a major subject. Presh blogs on "Game Theory". The subject applies to many aspects of hard and soft sciences and mathematics. I find it quite amazing how the same ideas and principles I use to analyze situations in the sciences, mathematics and life are used by other people in a different vocabulary.

Discovering a new vocabulary that explains your own ideas is a lot like having a library of books in an unknown language and then suddenly being able to understand it. With a side of Christmas morning. Since that time, I have taken to any knowledge of game theory like a starving person to a sandwich.

Oh what I will learn tomorrow... I can hardly wait.

Happy Learning.

Sunday, February 22, 2009

Finding the Root Node Name with XQuery and SQL Server 2005

I recently had a case where i needed to test for the existence of a node in a query involving an XML. After researching I found a lot of posts indicating this as the solution:
DECLARE @x xml
SET @x = '
123 Main St.
' DECLARE @query varchar(30) SET @query = '/address/street' SELECT @x.exist('sql:variable("@query")')
A key point to make is that the ".exist" is case sensitive; using ".Exist" causes an error. The result is "1", however when i would change @query to an invalid xpath, I would still get "1". Never the less, it was still a good start toward figuring out how to find the root node tag name.

Many posts I found used the 'sql:variable("@variable")' as a parameter in a xpath query. I am quite experienced w/ XPath, especially with Xsl, so I applied it to my situation and came up with this (simplified for example)...
DECLARE @tag varchar(30)
SET @tag = 'address'

SELECT ID, XmlField
FROM sometable
WHERE XmlField.exist('/*[1][local-name() = sql:variable("@tag")]')
The result of this one was 1. If a changed @tag to an invalid input, the result was 0. This is exactly what I was expecting and thus my solution.

Taking this further, the tag name can be returned by extending the previous example.
DECLARE @tag varchar(30)
SET @tag = 'address'

SELECT ID, XmlField.query('local-name(/*[1])'), XmlField
FROM sometable
WHERE XmlField.exist('/*[1][local-name() = sql:variable("@tag")]')
There you go, hope it helps.