Systems (i.e., databases) managing their data integrity sounds like common sense, and in simple scenarios, it is common sense. However, when the business rules get complex, it’s harder to validate the data in a central location.

When a system (i.e., a database) can no longer enforce the shape of the data, something else must pick up the slack. When might this happen? 

The phone number format in the US is (area code) (prefix) – (number), here’s an example: (734) 555-3212. We’ll talk about the database in this article for simplicity’s sake, but the datastore doesn’t have to be a database. 

Phone numbers in the US always have ten digits (we are ignoring the international digit). Phone numbers can come in a variety of formats:  

  • xxx.xxx.xxxx 
  • xxx-xxx-xxxx
  • (xxx) xxx-xxxx
  • (xxx) xxx.xxxx

Most databases are limited to data-types (i.e., numbers, strings, dates, etc.) and don’t support formating. Many applications opt to use the string data-type to store the phone number. However, the string data-type accepts ANY string. To ensure the phone number is valid, we need an additional layer of validation. 

In a single application connecting to a single database, data validation is typically enforced in the application.

When you’re architecture grows to two or more application sharing a database, two things can happen:

1. Each application has its own data validation:

2. There is a central service the applications call to validate the data and persist the data:

The risk of data validation in multiple places is the validations might be out of sync. A valid format for one application might not be valid in another application. In the worse case, a bad format will throw an error or, in extreme cases, crash the application.

The best case is to centralize the data validation so the format stored in the database is consistent for the entire organization. There are exceptions, of course, and I’m assuming multiple applications read and write to a shared database base.