Implementing Transparent Encryption with NHibernate Listeners (Interceptors)

Have you ever had to encrypt data in the database? In this post, I’ll explore how using nHibernate Listeners to encrypt and decrypt data coming from and going into your database. The cryptography will be transparent to your application.

Why would you want to do this? SQL Server has encryption baked into the product. That is true, but if you are moving to the cloud and want to use SQL Azure you’ll need some sort of cryptography strategy. SQL Azure does not support database encryption.

What is an nHibernate Listener? I think of a Listener as a piece of code that I can inject into specific extensibility points in the nHibernate persistence and data hydration lifecycle.

As of this writing the following extensibility points are available in nHibernate.

  • IAutoFlushEventListener
  • IDeleteEventListener
  • IDirtyCheckEventListener
  • IEvictEventListener
  • IFlushEntityEventListener
  • IFlushEventListener
  • IInitializeCollectionEventListener
  • ILoadEventListener
  • ILockEventListener
  • IMergeEventListener
  • IPersistEventListener
  • IPostCollectionRecreateEventListener
  • IPostCollectionRemoveEventListener
  • IPostCollectionUpdateEventListener
  • IPostDeleteEventListener
  • IPostInsertEventListener
  • IPostLoadEventListener
  • IPostUpdateEventListener
  • IPreCollectionRecreateEventListener
  • IPreCollectionRemoveEventListener
  • IPreCollectionUpdateEventListener
  • IPreDeleteEventListener
  • IPreInsertEventListener
  • IPreLoadEventListener
  • IPreUpdateEventListener
  • IRefreshEventListener
  • IReplicateEventListener
  • ISaveOrUpdateEventListener

The list is extensive.

To implement transparent cryptography, we need to find the right place to encrypt and decrypt the data. For encrypting the data we’ll use IPostInsertEventListener and IPostUpdateEventListener. With these events we’ll catch the new data and the updated data going into the database. For decrypting, we’ll use the IPreLoadEventListener.

For this demonstration we’ll be using DatabaseCryptography class for encrypting and decrypting. The cryptography implementation is not important for this article.

IPreLoadEventListener

public class PreLoadEventListener : IPreLoadEventListener
{
readonly DatabaseCryptography _crypto = new DatabaseCryptography();

///
/// Called when [pre load].
///

///The event. public void OnPreLoad(PreLoadEvent @event)
{
_crypto.DecryptProperty(@event.Entity, @event.Persister.PropertyNames, @event.State);
}
}

IPreInsertEventListener

public class PreInsertEventListener : IPreInsertEventListener
{
readonly DatabaseCryptography _crypto = new DatabaseCryptography();

///
/// Return true if the operation should be vetoed
///

///The event. /// true if XXXX, false otherwise.
public bool OnPreInsert(PreInsertEvent @event)
{
_crypto.EncryptProperties(@event.Entity, @event.State, @event.Persister.PropertyNames);

return false;
}
}

IPreUpdateEventListener

public class PreUpdateEventListener : IPreUpdateEventListener
{
readonly DatabaseCryptography _crypto = new DatabaseCryptography();

///
/// Return true if the operation should be vetoed
///

///The event. /// true if XXXX, false otherwise.
public bool OnPreUpdate(PreUpdateEvent @event)
{
_crypto.EncryptProperties(@event.Entity, @event.State, @event.Persister.PropertyNames);

return false;
}
}

It’s important to note that on both IPreUpdateEventListener and IPreInsertEventListener must return false, otherwise the insert/update event will be aborted.

Now that we have the Listeners implemented we need to register them with nHibernate. I am using FluentNHibernate so this will be different if you are using raw nHibernate.

SessionFactory

public class SessionFactory
{
///
/// Creates the session factory.
///

/// ISessionFactory.
public static ISessionFactory CreateSessionFactory()
{
return Fluently.Configure()

.Database(MsSqlConfiguration.MsSql2012
.ConnectionString(c => c
.FromConnectionStringWithKey("DefaultConnection")))

.Mappings(m => m.FluentMappings.AddFromAssemblyOf())
.ExposeConfiguration(s =>
{
s.SetListener(ListenerType.PreUpdate, new PreUpdateEventListener());
s.SetListener(ListenerType.PreInsert, new PreInsertEventListener());
s.SetListener(ListenerType.PreLoad, new PreLoadEventListener());
})
.BuildConfiguration()
.BuildSessionFactory();
}

When decrypting and encrypting data at the application level it makes the data useless in the database. You’ll need to bring the data back into the application to read the values of the encrypted fields. We want to limit the fields that are encrypted and we only want to encrypt string values. Encrypting anything other that string values complicates things. There is nothing saying we can’t encrypt dates, but doing so will require the date field in the database to become a string(nvarchar or varchar) field, to hold the encrypted data, once we do this we lose the ability to operate on the date field from the database.

To identify which fields we want encrypted and decrypted I’ll use marker attributes.

Encrypt Attribute

public class EncryptAttribute : Attribute
{
}

Decrypted Attribute

public class DecryptAttribute : Attribute
{
}

To see the EncryptAttribute and the DecryptedAttribute in action we’ll take a peek into the DatabaseCryptography class.

DatabaseCryptography

public class DatabaseCryptography
{
readonly Crypto _crypto = ObjectFactory.GetInstance();

///
/// Encrypts the properties.
///

///The entity. ///The state. ///The property names. public void EncryptProperties(object entity, object[] state, string[] propertyNames)
{
Crypt(entity, propertyNames, s=>_crypto.Encrypt(s),state);
}

///
/// Crypts the specified entity.
///

///
///The entity. ///The state. ///The property names. ///The crypt. private void Crypt(object entity, string[] propertyNames, Func<string, string> crypt, object[] state) where T : Attribute
{
if (entity != null)
{
var properties = entity.GetType().GetProperties();

foreach (var info in properties)
{
var attributes = info.GetCustomAttributes(typeof (T), true);

if (attributes.Any())
{
var name = info.Name;
var count = 0;

foreach (var s in propertyNames)
{
if (string.Equals(s, name, StringComparison.InvariantCultureIgnoreCase))
{
var val = Convert.ToString(state[count]);
if (!string.IsNullOrEmpty(val))
{

val = crypt(val);
state[count] = val;
}

break;
}

count++;
}
}
}
}
}

///
/// Decrypts the property.
///

///The entity. ///The state. ///The property names. public void DecryptProperies(object entity, string[] propertyNames, object[] state)
{
Crypt(entity, propertyNames, s => _crypto.Decrypt(s), state);
}

}

That’s it. Now the encryption and decryption of data will be transparent to the application and you can go on your merry way building the next Facebook.

Id’s, The Great Debate

Using ‘Id’ or NameId.3549285383_11de3317a6_z

I had a conversation with Rob Toyias on Id’s. The impetus was an existing product we were working on that did not identify primary and foreign keys consistantly. We spent a lot of time chasing down primary and foreign keys.

I was of the mindset that a table should have a primary key called “Id” and all foreign keys should be their respective tablenames. For example a foreign key from the User Table should be called UserId.

Rob disagreed. He thought that each id should be named consistently, even in the table where it originated. For example, in the user table, the primary key would be UserId.

My argument was if you have a table with 3 Id’s how do you know which one is the primary? He countered saying that the Id’s would be consistent across the database.

The conversation ended in a stalemate.

Fast-forward 6 months. I am on a different project. I’m working with Id’s in Javascript. I applied my same Id naming scheme I use in the database to my javascript. On the employee page the variable id refers to the employee id. All other id’s are their respective names of their pages (i.e. userId). This worked great. Until today. I won’t go too deep into the details, but what it came down to was sharing code between pages. The shared code assumes id is the id of the page, which it is, but Employee Id and User Id are not interchangeable.

After spending a number of hours fixing the Id debacle, I conceded to Rob’s point.

 

* photo reference

SQL Server Tidbit: NULL Values and Bits

Another Sql server tidbit. If you have a column (bit) that is nullable and there are three possible values (0,1 and NULL). Applying the “dbo.tablename.column <> 1” filter will only return the rows with the value of 0. The null values will be omitted. To retrieve the null values and the 0 values you’ll have to explicitly filter for null values: “dbo.tablename.column <> 1 OR dbo.tablename.column is null

Clustered Index and Non-Clustered Index

Clustered Index
Clustered Indexes are primary keys and uniquely identify the data, they are called clustered index because the data is clustered together on the file-system. Clustering the data allows for quick retrieval and a table can only have one clustered index.

Non-Clustered Index
A non clustered index is an index that points to the clustered index. Ideally, the data has limited duplication. For example, take a create date. Rows created on a specific data has a limited scope of that day. Creating a non clustered index on CreateDate will allow quick retrieval of rows grouped by CreateDate. Thing of it as a grouping of all the same data with a reference(clustered index) to the physical row.

Nothing is free, the benefit of creating indexes is speed. Read speed greatly increases. The cost is insert, update and deletes slow down.

Naked WHERE Clause

In a conversation today, my coworker mentioned a ‘naked WHERE clause’. I’d never heard of the term. I posted a question on StackOverflow and an answer was posted shortly thereafter.

A naked WHERE clause would refer to something that always returns true.

WHERE 1 = 1

This is commonly used on something like a search form where the user can enter one or many values and rather than checking a counter to see if you need to add a WHERE clause you add a generic WHERE clause and build everything up in an AND clause. You can run into issues if you aren’t checking to make sure there is at least on AND clause added.

My first reaction was, why would someone do this and what is the big deal.

Then it hit me. A naked where clause is the same as a select without a where clause.

Hypersonic is Open

Hypersonic started as an attempt to ease the pain of writing CRUD with ADO.NET. My goal was to simplify data persistence and retrieval.

In the early days of .NET, ORMs did not exist. ORMs were a foreign idea. Most companies had their own data access strategy. As a contractor, I’d seen many of these strategies. .TEXT was released by Scott Watermasysk in 2003. It was the first popular open source, if not THE first .Net blogging engine. I studied the code. I liked Scott’s approach to data access. He had elegantly decomposed the operations needed to persist and retrieve data. I leveraged his approach in Hypersonic.

In early 2006 I finished the first build of Hypersonic. It was one of many tools in my toolbox. It was not called Hypersonic then. I soon realized the value of what I had created and made it a stand alone library. Friends began to use it and before I knew it, it developed a small following.

Hypersonic is now at version 3. I have successfully used it for many projects over the years — it is still my tool of choice when developing against stored procedures.

I have decided to open source it. The data access problem has been solved with frameworks like nHibernate and Entity Framework.

My hope is others will find success using it.

The source is at GitHub.

The binaries are available.

All UTC times are not necessarily the same

A friend pointed out that all UTC Time is not the same. When he told me, I responded with “What!?! What are you talking about? It’s the same.” “No it’s not” he said. He explained, that yes using UTC will allot you an agreed upon time format but that does not guarantee that both server’s clocks are synchronized.

For example, Sever A calls server B for updates. Both Servers use UTC Time. Server A sends over a timestamp, how do we know that the two servers clocks are synchronized and that the two times match, we don’t. The odds are they are not. How can they be? Absolute time does not exist. It’s all relative. By using a timestamp to retrieve data from another server you are making an assumption that both servers have the same time.

*photo reference

Quick and Dirty on Migrating Data

I had a great talk with my friend Dave today. He’s a Data Scientist. He knows his stuff, for sure.

We talked about a number of things, but one that really stuck out was data migration. He says never to migrate via code, use a tool. You are reinventing the wheel. You are locked into your solution. All the risk is in your court. And the solution is not flexible. With that said. He went on to say the most efficient way to move data is with a primary key and a hash.

The destination side will request all the primary key and row hash. Taking the primary key it will check if the row exists. If it does exist it will compare the hash of the source to the hash of the destination row. If they match then the process is repeated for the next row. If they don’t match, then the primary key is added to a list of rows to request from the source. If the primary key does not exist then the primary key is added to the list of rows to be retrieved from the source. When the row comparison is completed all the rows that are stale or do not exist are requested from the source and persisted to the destination.

Grunt

If you enjoy grunt work you’ll do the above. If you are a developer who enjoys building robust applications you’ll leave the grunt work to the tools.

*Image reference