Chuck Conway · Programming · 2 min read

NVarchar Vs Varchar

Each engineer defining a new string column decides: Do I use nvarchar or do I use varchar?

Since I discovered nvarchar, I've always used nvarchar. My thinking is: why use a datatype that may not support a text value, when you likely won't discover the incompatible value until it's in production?
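
To see what that failure looks like, here is a minimal T-SQL sketch; the table variable and column names are made up for illustration. A non-ASCII value survives in the nvarchar column but degrades in the varchar column.

-- Illustrative only: the table variable and column names are hypothetical.
DECLARE @Demo TABLE (
    AsVarchar  varchar(50),
    AsNvarchar nvarchar(50)
);

-- Insert a value containing non-ASCII characters (note the N prefix on the literal).
INSERT INTO @Demo (AsVarchar, AsNvarchar)
VALUES (N'こんにちは', N'こんにちは');

-- On a typical Latin code-page collation, the varchar column comes back as '?????',
-- while the nvarchar column round-trips the original text.
SELECT AsVarchar, AsNvarchar FROM @Demo;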

I hear the argument about space, but space is cheap and not worth worrying about. I know what you're thinking: cheap doesn't matter when the hard drive is full, and I agree.

Starting with SQL Server 2008 R2, Unicode compression is applied to nchar and nvarchar fields (nvarchar(max) is excluded). The effectiveness varies with the data, but for English text the compression is roughly 50%, which puts nvarchar on par with varchar's space needs (1).
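
Unicode compression applies automatically to eligible nchar and nvarchar columns once ROW or PAGE compression is enabled on the table. A hedged sketch, assuming a hypothetical dbo.Customers table:

-- dbo.Customers is a hypothetical table name used only for illustration.
-- Estimate the savings first, then rebuild the table with row compression.
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'Customers',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = 'ROW';

ALTER TABLE dbo.Customers REBUILD WITH (DATA_COMPRESSION = ROW);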

Something else to consider: most programming languages use UTF-16 as their string type, so each time a varchar is loaded from the database, it's converted to UTF-16 (nvarchar-ish).
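
That mismatch also shows up inside the database engine when a UTF-16 application string meets a varchar column. A hedged sketch, using a hypothetical dbo.Customers table with a varchar Email column:

-- Hypothetical table: dbo.Customers(CustomerId int, Email varchar(100)) with an index on Email.
-- Application strings are UTF-16, so the parameter typically arrives as nvarchar.
DECLARE @Email nvarchar(100) = N'user@example.com';

SELECT CustomerId
FROM dbo.Customers
WHERE Email = @Email; -- the varchar column is implicitly converted to nvarchar here;
                      -- depending on collation, that can turn an index seek into a scan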

This StackOverflow answer sums up nvarchar vs. varchar:

An nvarchar column can store any Unicode data. A varchar column is restricted to an 8-bit codepage. Some people think that varchar should be used because it takes up less space. I believe this is not the correct answer. Codepage incompatibilities are a pain, and Unicode is the cure for codepage problems. With cheap disk and memory nowadays, there is really no reason to waste time mucking around with code pages anymore.

All modern operating systems and development platforms use Unicode internally. By using nvarchar rather than varchar, you can avoid doing encoding conversions every time you read from or write to the database. Conversions take time, and are prone to errors. And recovery from conversion errors is a non-trivial problem.

If you are interfacing with an application that uses only ASCII, I would still recommend using Unicode in the database. The OS and database collation algorithms will work better with Unicode. Unicode avoids conversion problems when interfacing with other systems. And you will be preparing for the future. And you can always validate that your data is restricted to 7-bit ASCII for whatever legacy system you’re having to maintain, even while enjoying some of the benefits of full Unicode storage. (2)
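
That last point, confirming that stored values stay within 7-bit ASCII for a legacy consumer, is straightforward to check. A hedged sketch, again using a hypothetical dbo.Customers table:

-- Hypothetical table/column names. Finds rows containing characters outside
-- printable 7-bit ASCII (code points 0x20-0x7E); the binary collation makes
-- the [^ -~] range match by code point.
SELECT CustomerId, DisplayName
FROM dbo.Customers
WHERE DisplayName COLLATE Latin1_General_BIN LIKE N'%[^ -~]%';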

My conclusion is that the only time the data is a varchar is when it's at rest.

References:

1. Unicode Compression implementation
2. What is the difference between varchar and nvarchar?
