Ids, The Great Debate

Using ‘Id’ or NameId

I had a conversation with Rob Toyias about Ids. The impetus was an existing product we were working on that did not identify primary and foreign keys consistently. We spent a lot of time chasing down primary and foreign keys.

I was of the mindset that a table should have a primary key called “Id” and that all foreign keys should be named after their respective tables. For example, a foreign key from the User table would be called UserId.

Rob disagreed. He thought each Id should be named consistently, even in the table where it originated. For example, in the User table the primary key would be UserId.

My argument was: if you have a table with three Ids, how do you know which one is the primary key? He countered that the Ids would be consistent across the database.

The conversation ended in a stalemate.

Fast-forward six months. I am on a different project, working with Ids in JavaScript. I applied the same Id naming scheme I use in the database to my JavaScript: on the employee page the variable id refers to the employee id, and all other ids are named after their respective pages (i.e. userId). This worked great. Until today. I won’t go too deep into the details, but what it came down to was sharing code between pages. The shared code assumes id is the id of the page, which it is, but an Employee Id and a User Id are not interchangeable.
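The collision looks roughly like this. A minimal sketch (in Python rather than the original JavaScript, and `build_detail_url` and the field names are mine, not from the actual project): shared code that assumes a generic `id` does the wrong thing the moment two kinds of id are in play.

```python
# Hypothetical sketch of the bug: a shared helper assumes `id`
# always means "the id of the current page's entity".

def build_detail_url(page, record):
    # Shared code: blindly trusts record["id"].
    return f"/{page}/{record['id']}"

# On the employee page, `id` is the employee id, and `userId` is the
# foreign key to the user -- exactly the naming scheme described above.
employee = {"id": 42, "userId": 7}

# The helper works fine for its original page...
employee_url = build_detail_url("employee", employee)  # "/employee/42"

# ...but reused for a user link, `id` still means *employee* id,
# so the shared code silently points at the wrong record.
user_url = build_detail_url("user", employee)  # "/user/42", should be "/user/7"
```

With Rob’s scheme (`employeeId` everywhere, `userId` everywhere), the shared helper would have to name the key it wants, and the mismatch becomes a visible error instead of a silently wrong link.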

After spending a number of hours fixing the Id debacle, I conceded to Rob’s point.



SQL Server Tidbit: NULL Values and Bits

Another SQL Server tidbit. If you have a nullable bit column, there are three possible values: 0, 1, and NULL. Applying the filter “dbo.tablename.column <> 1” will only return the rows with the value of 0; the NULL values will be omitted. To retrieve both the NULL values and the 0 values you have to explicitly filter for NULL: “dbo.tablename.column <> 1 OR dbo.tablename.column IS NULL”.
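You can see this three-valued logic with a quick script. This sketch uses SQLite’s in-memory database (an INTEGER column standing in for the bit, and the table/column names are made up), but `<>` against NULL behaves the same way in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (flag INTEGER)")  # stands in for a nullable bit
conn.executemany("INSERT INTO t VALUES (?)", [(0,), (1,), (None,)])

# flag <> 1 silently drops the NULL row: NULL <> 1 evaluates to
# UNKNOWN, not TRUE, so the row fails the filter.
rows_without_null = conn.execute(
    "SELECT flag FROM t WHERE flag <> 1"
).fetchall()
print(rows_without_null)  # [(0,)]

# Explicitly OR in the IS NULL check to get both 0 and NULL back.
rows_with_null = conn.execute(
    "SELECT flag FROM t WHERE flag <> 1 OR flag IS NULL"
).fetchall()
print(rows_with_null)  # [(0,), (None,)]
```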

Clustered Index and Non-Clustered Index

Clustered Index
A clustered index uniquely identifies the data, and in most schemas it is the primary key. It is called a clustered index because the data is physically clustered together on disk in index order. Clustering the data allows for quick retrieval, and a table can only have one clustered index.

Non-Clustered Index
A non-clustered index is an index that points back to the clustered index. Ideally, the indexed data has limited duplication. For example, take a create date: the rows created on a specific day are limited in scope to that day. Creating a non-clustered index on CreateDate allows quick retrieval of rows grouped by CreateDate. Think of it as a grouping of all the same data with a reference (the clustered index) back to the physical row.

Nothing is free. The benefit of creating indexes is speed: reads get much faster. The cost is that inserts, updates, and deletes slow down, because every index has to be maintained on each write.

Naked WHERE Clause

In a conversation today, my coworker mentioned a ‘naked WHERE clause’. I’d never heard of the term. I posted a question on StackOverflow and an answer was posted shortly thereafter.

A naked WHERE clause would refer to something that always returns true.

WHERE 1 = 1

This is commonly used on something like a search form where the user can enter one or many values. Rather than checking a counter to see if you need to add a WHERE clause, you add a generic WHERE clause and build everything up with AND clauses. You can run into issues if you aren’t checking to make sure at least one AND clause was added.
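The search-form pattern looks something like this sketch (the `users` table and column names are hypothetical; in real code the column names would come from a fixed whitelist, with the values parameterized):

```python
def build_search_query(filters):
    """Build a search query with the naked-WHERE trick: start from
    WHERE 1 = 1 and append every filter as an AND clause, so there is
    no counter deciding whether to emit the WHERE keyword."""
    sql = "SELECT * FROM users WHERE 1 = 1"
    params = []
    for column, value in filters.items():
        sql += f" AND {column} = ?"
        params.append(value)
    return sql, params

# No filters: the query degenerates to the naked WHERE clause,
# which behaves exactly like having no WHERE clause at all.
empty_sql, empty_params = build_search_query({})
print(empty_sql)  # SELECT * FROM users WHERE 1 = 1

# Two filters: both get appended as ANDs, no special-casing the first.
sql, params = build_search_query({"last_name": "Smith", "city": "Boston"})
print(sql)  # SELECT * FROM users WHERE 1 = 1 AND last_name = ? AND city = ?
```

The empty-filters case is the danger the StackOverflow answer warns about: if you never check that at least one AND was added, the “search” quietly returns the whole table.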

My first reaction was: why would someone do this, and what’s the big deal?

Then it hit me. A naked WHERE clause is the same as a SELECT without a WHERE clause.

Hypersonic is Open

Hypersonic started as an attempt to ease the pain of writing CRUD with ADO.NET. My goal was to simplify data persistence and retrieval.

In the early days of .NET, ORMs did not exist; they were a foreign idea. Most companies had their own data access strategy, and as a contractor I’d seen many of these strategies. .Text was released by Scott Watermasysk in 2003. It was the first popular open source .NET blogging engine, if not THE first. I studied the code. I liked Scott’s approach to data access: he had elegantly decomposed the operations needed to persist and retrieve data. I leveraged his approach in Hypersonic.

In early 2006 I finished the first build of Hypersonic. It was one of many tools in my toolbox, and it was not called Hypersonic then. I soon realized the value of what I had created and made it a standalone library. Friends began to use it, and before I knew it, it had developed a small following.

Hypersonic is now at version 3. I have successfully used it for many projects over the years — it is still my tool of choice when developing against stored procedures.

I have decided to open source it. The data access problem has since been solved by frameworks like NHibernate and Entity Framework.

My hope is others will find success using it.

The source is at GitHub.

The binaries are available.

All UTC times are not necessarily the same

A friend pointed out that all UTC time is not the same. When he told me, I responded with “What!?! What are you talking about? It’s the same.” “No it’s not,” he said. He explained that yes, using UTC gives you an agreed-upon time format, but that does not guarantee that both servers’ clocks are synchronized.

For example, Server A calls Server B for updates. Both servers use UTC time. Server A sends over a timestamp. How do we know that the two servers’ clocks are synchronized and that the two times match? We don’t. The odds are they are not. How can they be? Absolute time does not exist; it’s all relative. By using a timestamp to retrieve data from another server you are assuming that both servers have the same time.
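Here is the failure mode as a hypothetical sketch (the 30-second skew and the sync-watermark setup are mine, invented for illustration): a row stamped by the slow clock can fall behind the other server’s watermark and get skipped.

```python
from datetime import datetime, timedelta

# Hypothetical: Server B's clock runs 30 seconds behind Server A's.
skew = timedelta(seconds=30)

a_now = datetime(2013, 5, 1, 12, 0, 0)  # Server A's wall clock
b_now = a_now - skew                    # Server B's wall clock, behind

# A row is written on B "right now", stamped with B's clock...
row_stamp = b_now

# ...while at the same real moment, A records its sync watermark
# ("give me everything after this") using its own clock.
watermark = a_now

# On the next sync, A asks B for rows stamped after the watermark.
# The new row predates the watermark, so it is silently skipped.
missed = row_stamp <= watermark
print(missed)  # True
```

Both machines are “in UTC”, yet data is lost, which is exactly the assumption the post warns about.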


Quick and Dirty on Migrating Data

I had a great talk with my friend Dave today. He’s a Data Scientist. He knows his stuff, for sure.

We talked about a number of things, but one that really stuck out was data migration. He says never to migrate via code; use a tool. Writing your own migration code means reinventing the wheel, being locked into your solution, carrying all the risk yourself, and ending up with something inflexible. With that said, he went on to describe the most efficient way to move data: with a primary key and a hash.

The destination side requests all the primary keys and row hashes from the source. For each primary key, it checks whether the row exists locally. If it does exist, it compares the hash of the source row to the hash of the destination row; if they match, the process repeats for the next row. If they don’t match, the primary key is added to a list of rows to request from the source. If the primary key does not exist locally, it is likewise added to the list. When the comparison is complete, all the rows that are stale or missing are requested from the source and persisted to the destination.
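The comparison step can be sketched in a few lines. This is a minimal illustration, not Dave’s actual tooling: both “servers” are just in-memory dicts keyed by primary key, and SHA-256 over the joined column values stands in for whatever row hash a real tool would use.

```python
import hashlib

def row_hash(row):
    # Hash all non-key columns; any change to the row changes the hash.
    return hashlib.sha256("|".join(map(str, row)).encode()).hexdigest()

def rows_to_pull(source, destination):
    """Return the primary keys the destination must request from the
    source: rows that are missing locally, plus rows whose hashes no
    longer match (i.e. stale rows)."""
    dest_hashes = {pk: row_hash(row) for pk, row in destination.items()}
    stale_or_missing = []
    for pk, row in source.items():
        if pk not in dest_hashes:
            stale_or_missing.append(pk)          # row missing locally
        elif dest_hashes[pk] != row_hash(row):
            stale_or_missing.append(pk)          # row exists but is stale
    return stale_or_missing

source = {1: ("alice", "a@x.com"), 2: ("bob", "b@x.com"), 3: ("carol", "c@x.com")}
destination = {1: ("alice", "a@x.com"), 2: ("bob", "old@x.com")}

pull_list = rows_to_pull(source, destination)
print(pull_list)  # [2, 3]: row 2 is stale, row 3 is missing
```

Only the rows in the pull list cross the wire, which is the whole point of the key-plus-hash approach.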


If you enjoy grunt work, you’ll build the above by hand. If you are a developer who enjoys building robust applications, you’ll leave the grunt work to the tools.
