Advertisement:

Skystone Software

http://www.SkystoneSoftware.com

Scott Waletzko's Blog
SQL Server: Uniqueidentifier Primary Keys
Published: 1/21/2009
XMl / RSS

There is a lot of debate in the database architecture world about whether or not Uniqueidentifier types should be used as primary keys. Generally you will find that people fit into one of two camps: those who never use Uniqueidentifier primary keys, and those who always do. I generally fall into the latter camp, I prefer them to auto increment numeric fields for a variety of reasons, but the bottom line is that either can be more appropriate for any given situation. What's important is to properly evaluate the requirements of your system and / or table and use the correct tool for the job.

The arguments against using Uniqueidentifiers as primary keys in SQL Server tables generally have to do with their drawbacks. Here they are, itemized:

  • They are larger on disk than integers, taking up 16 bytes each.
  • They prevent the use of a clustered index on the primary key - forcing a resort of the records in the table in INSERT to order them properly (the defintion of a clustered index) is incredibly inefficient and can bring a system to a halt once the number of records grows to over 1 million or so.
  • The are impossible to remember, and can therefore make debugging more difficult.

Most people see these arguments and quickly jump to the conclusion that auto incrementing integer values are better for primary key ID fields. However, I think those people are missing quite a bit. Specifically, the benefits of using GUIDs:

  • They are nearly impossible to reverse engineer or anticipate. Consider a Web application that passes ID values in the querystring (not a great practice, but one that is used nonetheless). If your account ID is "55", then it's not a leap to imagine that there is an account with an ID of "54". You cannot make this same leap when uniqueidentifier types are used.
  • They are globally unique. This means that your data can be easily merged with data from other systems, especially using SQL Server replication (the uniqueidentifier serves as a replica ID). You can also insert similar data into multiple copies of a distributed set of databases and be assured that each record will have a unique ID.
  • They can be generated outside of the database. You can create a new System.GUID in .NET that will insert nicely into a uniqueidentifier field in your database, without having to return trip an insert to retrieve the new value. Lazy programming? Possibly, but eminently practical.
  • They mask inferred information. If you create a new account and your account number is "50", then you know you're working with a business that doesn't have all that many clients.

Honestly the arguments against using them are simply not all that powerful as the ones for using them. Disk space? Is that really much of a problem anymore? Debugging problems? That's what copy / paste is for. Slow performance on INSERT? Buy faster hardware, and create a clustered index on a different field - like DateInserted. That's actually better in a lot of ways.

I have designed and built a ton of systems in my time, ranging from small desktop applications with one user to huge enterprise level systems that processed tens of thousands of records an hour. I always use Uniqueidentifier primary key fields, and I have had virtually no problems. There was one instance where I would have been better off using a BIGINT auto increment value, but archiving old data solved that problem (INSERTS dragged when we got to 25 million records or so), and that's something we should have been doing anyway.

My point is this: while I prefer using Uniqueidentifier fields whenever possible, the best approach is to be flexible and open minded, and evaluate each situation to determine the best data type to use in that particular instance.



Questions or Comments? .

VB to C# and C# to VB translation provided by Instant C# and Instant VB.