Chuck Conway · Programming · 2 min read
NVarchar Vs Varchar
Each engineer defining a new string column decides: Do I use nvarchar or do I use varchar?
Since I discovered nvarchar, I’ve always used nvarchar. My thought is: why use a datatype that may not support a text value, when you likely won’t discover the incompatible value until it’s in production?
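To make the risk concrete, here’s a minimal T-SQL sketch. The table and column names are hypothetical, and the exact behavior depends on your column collation; with a typical Latin collation, non-Latin text inserted into a varchar column is silently replaced with question marks, while the nvarchar column keeps it.

```sql
-- Hypothetical table comparing the two column types side by side.
CREATE TABLE dbo.GreetingDemo
(
    GreetingVarchar  varchar(50),   -- limited to the column's code page
    GreetingNvarchar nvarchar(50)   -- full Unicode
);

-- N'...' marks a Unicode literal; the value is Japanese for "hello".
INSERT INTO dbo.GreetingDemo (GreetingVarchar, GreetingNvarchar)
VALUES (N'こんにちは', N'こんにちは');

-- With a typical Latin1_General collation, GreetingVarchar comes back as
-- '?????' while GreetingNvarchar returns the original text.
SELECT GreetingVarchar, GreetingNvarchar
FROM dbo.GreetingDemo;
```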
I hear the argument about space, but space is cheap and not worth worrying about. I know what you’re thinking: the cost doesn’t matter when the hard drive is full, and I agree.
Starting with SQL Server 2008 R2, Unicode compression is applied to nchar and nvarchar columns (nvarchar(max) is excluded) when row or page data compression is enabled. How effective the compression is depends on the data, but with English text it’s roughly 50%, which puts nvarchar on par with varchar’s space needs (1).
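As a rough sketch of how that looks in practice (the table name is hypothetical, and this assumes an edition of SQL Server that supports data compression): Unicode compression kicks in once row or page compression is enabled on the table, and sp_estimate_data_compression_savings can estimate the savings beforehand.

```sql
-- Estimate the savings first (dbo.GreetingDemo is a hypothetical table).
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'GreetingDemo',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = 'ROW';

-- Enable row compression; eligible nchar/nvarchar values are then stored
-- using Unicode compression automatically.
ALTER TABLE dbo.GreetingDemo
REBUILD WITH (DATA_COMPRESSION = ROW);
```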
Something else to consider: most programming languages use UTF-16 as their native string type. So each time a varchar is loaded from the database, it’s converted to UTF-16 (nvarchar-ish) anyway.
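The conversion also cuts the other way inside SQL Server. As a hedged sketch (the Customer table, its LastName varchar column, and its index are all hypothetical): when a client library sends an nvarchar parameter, which is the default for .NET strings, against a varchar column, data type precedence forces the column to be converted, and the implicit conversion can turn an index seek into a scan.

```sql
-- Hypothetical example: dbo.Customer.LastName is assumed to be varchar(50)
-- with an index on it.
DECLARE @LastName nvarchar(50) = N'Conway';

SELECT CustomerId, LastName
FROM dbo.Customer
WHERE LastName = @LastName;   -- LastName is implicitly converted to nvarchar;
                              -- the index on it may be scanned rather than sought.
```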
This StackOverflow answer sums up nvarchar vs. varchar:
An nvarchar column can store any Unicode data. A varchar column is restricted to an 8-bit codepage. Some people think that varchar should be used because it takes up less space. I believe this is not the correct answer. Codepage incompatibilities are a pain, and Unicode is the cure for codepage problems. With cheap disk and memory nowadays, there is really no reason to waste time mucking around with code pages anymore.
All modern operating systems and development platforms use Unicode internally. By using nvarchar rather than varchar, you can avoid doing encoding conversions every time you read from or write to the database. Conversions take time, and are prone to errors. And recovery from conversion errors is a non-trivial problem.
If you are interfacing with an application that uses only ASCII, I would still recommend using Unicode in the database. The OS and database collation algorithms will work better with Unicode. Unicode avoids conversion problems when interfacing with other systems. And you will be preparing for the future. And you can always validate that your data is restricted to 7-bit ASCII for whatever legacy system you’re having to maintain, even while enjoying some of the benefits of full Unicode storage. (2)
My conclusion is that the only time the data is a varchar is when it’s at rest.
References:
1. Unicode Compression implementation
2. What is the difference between varchar and nvarchar?