Category: Architecture

Centralize Your Data Integrity

Systems (i.e., databases) managing their data integrity sounds like common sense, and in simple scenarios, it is common sense. However, when the business rules get complex, it’s harder to validate the data in a central location.

When a system (i.e., a database) can no longer enforce the shape of the data, something else must pick up the slack. When might this happen? 

The phone number format in the US is (area code) (prefix) – (number), here’s an example: (734) 555-3212. We’ll talk about the database in this article for simplicity’s sake, but the datastore doesn’t have to be a database. 

Phone numbers in the US always have ten digits (we are ignoring the international digit). Phone numbers can come in a variety of formats:  

  • xxx.xxx.xxxx 
  • xxx-xxx-xxxx
  • (xxx) xxx-xxxx
  • (xxx) xxx.xxxx

Most databases are limited to data-types (i.e., numbers, strings, dates, etc.) and don’t support formating. Many applications opt to use the string data-type to store the phone number. However, the string data-type accepts ANY string. To ensure the phone number is valid, we need an additional layer of validation. 

In a single application connecting to a single database, data validation is typically enforced in the application.

When you’re architecture grows to two or more application sharing a database, two things can happen:

1. Each application has its own data validation:

2. There is a central service the applications call to validate the data and persist the data:

The risk of data validation in multiple places is the validations might be out of sync. A valid format for one application might not be valid in another application. In the worse case, a bad format will throw an error or, in extreme cases, crash the application.

The best case is to centralize the data validation so the format stored in the database is consistent for the entire organization. There are exceptions, of course, and I’m assuming multiple applications read and write to a shared database base.

The Benefits of Using a Build Framework

Continuous Integration (CI) and/or Continuous Delivery (CD) is the norm on software projects these days. There are many build servers such as Azure DevOps, TeamCity, Jenkins, and Cruise Control.Net. Most of these servers use proprietary languages to define build steps. But is codifying your build steps in a proprietary language a good thing?

Some applications are simple, with a few build steps, others are more complex with many build steps. When you define build steps in a proprietary language, the more complex the build steps (in sophistication or in number) the more coupled to a build platform you become. This becomes an issue when you want to switch build platforms. For example, you’re using JetBrain’s TeamCity in your on-premise datacenter, but the company decides to move to the cloud. Now you must re-write your build scripts because TeamCity isn’t supported in the new cloud platform.

Instead of writing your build scripts in a proprietary language, consider using a build framework.

Build frameworks have two benefits:

  1. Allowing transportability between build platforms.
  2. Allowing you to version your build scripts alongside your application code.

Transportability between platforms gives you the flexibility of moving between build platforms with minimal effort. There will always be some configuration on a new build platform, but build frameworks keep the effort low.

In my opinion, the biggest benefit to build frameworks is the ability to check-in and version your build scripts alongside your application code. Having the option to pull code from any point in your source control’s history and having that code build is well worth any downsides of a build framework.

There are two popular frameworks in the .Net space: Cake and Nuke Build. Both frameworks have been around for a while. I’ve used Nuke Build and enjoy it. I’ve heard great things about Cake and encourage you to look at it before deciding which is the best framework for your project.

So the next time you’re creating a new build definition for your application, consider using a build framework and checking it in source control with your application.

Grady Booch on Architecture

A Series of Tweets from Grady Booch on software architecture:

https://twitter.com/Grady_Booch/status/1301810358819069952

A thread regarding the architecture of software-intensive systems.

There is more to the world of software-intensive systems than web-centric platforms at scale.

A good architecture is characterized by crisp abstractions, a good separation of concerns, a clear distribution of responsibilities, and simplicity. All else is details.

You cannot reduce the complexity of a software-intensive systems; the best you can do is manage it.

In the fullness of time, all vibrant architectures must evolve.

Old software never dies; you must kill it.

Some architectures are intentional, some are accidental, most are emergent.

Meaningful architecture is a living, vibrant process of deliberation, design, and decision.

The relentless accretion of code over days, months, years and even decades quickly turns every successful new project into a legacy one.

Show me the organization of your team and I will show you the architecture of your system.

All well-structured software-intensive systems are full of patterns.

A software architect who does not code is like a cook who does not eat.

Focusing on patterns and cross-cutting concerns can yield an architecture that is smaller, simpler, and more understandable.

Design decisions encourage what a particular stakeholder can do as well as what constrains what a stakeholder cannot.

In the beginning, the architecture of a software-intensive system is a statement of vision. In the end, the architecture of every such system is a reflection of the billions upon billions of small and large, intentional and accidental design decisions made along the way.

All architecture is design, but not all design is architecture.

Architecture represents the set of significant design decisions that shape the form and the function of a system, where significant is measured by cost of change.

9 Guidelines for Creating Expressive Names

Naming is subjective and situational, it’s an art, and with most art, we discover patterns.  I’ve learned a lot through the reading of other’s code.  In this article, I’ve compiled 9 guidelines I wished others had followed when I read their code. 

When a software engineer opens a class, she should know, based on the names, the class’s responsibilities. Yes, I know naming is only one spoke of the wheel, physical and logical structures also play a significant role in understanding the code as does complexity. In this article, I’m only focusing on naming because I feel it’s the most significant impact on understanding the code.

Don’t Include Type Unless it Clarifies the Intent

A type can be anything from a programming type (string, int, decimal) to a grouping of responsibilities (Util, Helper, Validator, Event, etc.). Often it’s a classification which doesn’t express intent.

Let’s look at an example: The name StringHelper doesn’t express much. A string is a system type, and Helper is vague, StringHelper speaks more to the “how” than the intent. If instead, we change the name to DisplayNameFormatter we are given a clearer picture of intent. DisplayName is very specific, and Formatter expresses outcome. Formatter may or may not be a type, but it doesn’t matter, because it expresses intent. 

There are always exceptions; for example, in ASP.Net MVC, controllers must end in “Controller” or the application doesn’t function. Using paradigms such as Domain Driven Design (DDD), names like “Services,” “Repository,” “ValueType” and “Model” have meaning in DDD and express responsibility.

 For example, UserRespository implies that user data is retrieved and save to a data store.

Avoid Metaphors

Metaphors are cultural, and engineers from other cultures might not understand the intent.

Common metaphors in the United States:

  • Beating a dead horse
  • Chicken or the egg
  • Elephant in the room

Common metaphors in New Zealand:

  • Spit the dummy
  • Knackered
  • Hard yakka

Use Verbs

Steve Yegge wrote a (very long) blog post about using verbs over nouns.

His point is to use verbs, applications are composed of nouns, but nouns don’t do things. Systems are useless with only nouns, instead express action in names of methods.

For example, UserAuthentication(noun).AuthenticateUser(action/verb) expresses the action of verifying a user’s credentials.

Be Descriptive

Be descriptive, the more detail, the better — express the responsibility in the name. 

Ask yourself, what is the one thing this class or function does well?

If you have difficulty finding a name, the class or function might have more than one responsibility and thus violating the Single Responsibility Principle.

Don’t Lean on Comments for Intent

Comments are a great way to provide additional context to the code but don’t lean on comments. The names of classes and methods should stand on their own.

In Refactoring: Improving the Design of Existing Code by Martin Fowler, Kent Beck, John Brant, William Opdyke, and Don Roberts:

… comments are often used as a deodorant. It’s surprising how often you look at thickly commented code and notice that the comments are there because the code is bad.

Another wonderful quote from the “In Refactoring” authors:

When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous. Page 88

Many times when the code is refactored and encapsulated into a method, you’ll find other locations where it’s possible to leverage the new method, places you never anticipated in using the new method. 

Sometimes when calling a method the consumer needs to know something particular about the method, if that particularness is a part of the name, then the consumer doesn’t need to review the source code.

Here’s an example of incorporating a comment into a name.

With comments:

// without tracking
var user = GetUserByUserId(userId);

Refactored to include the comment in the method name:

var userWithOutTracking = GetUserByUserIdWithoutTracking(userId);

Other engineers now know this method doesn’t have tracking before they’d need to either read the source code or find the comment.

Comments should be your last line of defense when possible lean other ways to express intent including using physical and logic structure and names to convey intent.

Refrain From Using Names with Ambiguous Meaning

Avoid names with ambiguous meanings. The meaning of ambiguous names changes from project to project which makes understanding intent harder for a new engineer. 

Here’s a list of common ambiguous names:

  • Helper
  • Input
  • Item
  • Logic
  • Manager
  • Minder
  • Moniker
  • Nanny
  • Overseer
  • Processor
  • Shepherd
  • Supervisor
  • Thingy
  • Utility
  • Widget

Use the Same Language as the Business Domain

Use the same terminology in the code as in the business domain. This allows engineers and subject matter experts (SME) to easily communicate ideas because they share the same vocabulary. When there isn’t a shared vocabulary translation happen which invariably leads to misunderstandings.

In one project I worked on, the business started with “Pupil” and then switched to “Student.” The software engineers never updated the software to reflect the change in terminology. When new engineers joined the project most believed Pupil and Student were different concepts.

Use Industry Speak

When possible, use terminology that has meaning across the software industry.

Most software engineers, when they see something name “factory,” they immediately think of the factory pattern.

Using existing application paradigms such as “Clean Architecture” and “Domain Driven Design” facilitates idea sharing and creates a common language for engineers to communicate ideas among themselves.

The worst possible naming is co-opting industry-wide terminology and giving it a different meaning.

When Naming Booleans…

Boolean names should always be an answer to a question with its value of either true or false.

For example, isUserAutheticated, the answer is either yes (true) or no (false)

Use words like:

  • Has
  • Does
  • Is
  • Can

Avoid negated names, for example:

A negated variable name:

var IsNotDeleted = true; // this is confusing

if(!IsNotDeleted) { // it gets even more confusing when the value is negated
  //Do magic
}

Without negated variable name:

var IsDeleted = true; // this is confusing

if(!IsDeleted) { 
  //Do magic
}

In Closing

Choosing expressive names is crucial in communicating intent, design, and domain knowledge to the next engineer. Often we work on code to fix defects or incorporate new features, and we’re continually compiling code in our head trying to understand how it works. Naming gives us hints as to what the previous engineer was thinking, without this communication between the past and the future engineers we handicap ourselves in growing of the application. Potentially dooming the project to failure.

Garbage Collection Types in .Net Core

Memory management in modern languages is often an afterthought. For all intents and purposes, we write software without nary a thought about memory. This serves us well but there are always exceptions…

In California, there are extensive financial reporting requirements for Local Education Agencies (LEA), an LEA can be a county, a district, a charter or a single school. Most LEAs create their own financial reports which are usually centered around Excel, it’s no surprise when each report is different. To solve this problem the California Board of Education commissioned software to generate financial reports. 

I was a part of the development team. 

My first stop was the testing logs, Ed-Pro’s logs pointed to high memory usage, perhaps there was a memory leak? An engineer observed that Ed-Pro’s calculations used a large amount of short-lived memory. If the memory wasn’t cleaned up quickly, it could appear like a memory leak.

Ed-Pro is built on top of .Net Core, Microsoft’s multi-platform framework. In .Net Core, memory is divided into three categories: Short-lived (Gen0), medium lived (Gen1), and long-lived (Gen2). Gen0 is for short-lived data that quickly goes out of scope, Gen1 is for medium lived memory that hangs around for a bit longer, it too also eventually goes out of scope and Gen2 is long-lived memory that may live for the life of the application. Gen0 memory is constantly reclaimed, Gen1 is reclaimed less frequently than Gen0, and Gen2 is reclaimed even less frequently than Gen1.

The only sure way to understand the memory usage of Ed-Pro was to profile it, below is a screenshot using dotMemory by JetBrains.

As suspected, we found large amounts of Gen0 memory (the blue), so much so, it appeared that Garbage Collection couldn’t keep up. A strategy to compensate for a large amount of memory, caused Garbage Collection to oscillated between increasing memory space (adding more memory for the application’s use) and cleaning it up. During the cleanup cycles, the application is unresponsive.

At first, we were stumped, isn’t the purpose of the GC to keep memory tidy? Two articles were instrumental in our understanding of how Garbage Collection works in .Net: Mark Vincze’s article Troubleshooting high memory usage with ASP.Net Core on Kubernetes and Fundamentals of Garbage Collection by Microsoft. Both are great reads and brought clarity to the memory usage in Ed-Pro. 

Here’s a summary of what we learned, there are two types of Garbage Collection in .Net: Server Garbage Collection and Workstation Garbage Collection.

Server Garbage Collection makes a couple of assumptions: First, there is ample memory available and second, the processors are multi-core and are fast. Both can be true, but we live in a world of virtual machines and Docker where it’s more likely that both assumptions are false.

Server Garbage Collection allows memory to build, at some point, it does one of two things: it either increases the memory space allowing memory to grow or it frees up orphaned memory. When it chooses to free memory, the Garbage Collection starts the process on a high priority thread. The high priority thread is a higher priority than the application; if the machine is fast, the clean up shouldn’t be noticed. However, if it’s not, it’ll cause the application to halt until the clean up is completed.

Workstation Garbage Collection operations differently. It continuously runs reclaiming memory on a thread with the same priority as the application. This means it’s also competing for resources with the application which can cause application slowness. The upside is the application’s memory usage can stay quite low, primarily when it uses large amounts of Gen0.

As a default, if .Net Core detects a server, it runs the Server Garbage Collection type, which was the case with our application. To run the Workstation Garbage Collection type add the following snippet to your project file:

  <PropertyGroup> 
    <ServerGarbageCollection>false</ServerGarbageCollection>
  </PropertyGroup>

We made this configuration change to Ed-Pro, using dotMemory, we profiled Ed-Pro’s memory with Workstation Garbage Collection enabled and loaded the same screens as in the previous test. Here are the results:

The memory usage is significantly decreased. The Gen0 allocations are virtually non-existent. Beyond the differences in the graph, the Server Garbage Collection memory usage topped 1 gig while the Workstation Garbage Collection topped at roughly 200 megs.

Every application is different. Our application used a ton of temporary data and thus uses a ton of Gen0 memory. Your application may leverage longer lived memory such as Gen1 or Gen2 in which Server Garbage Collection makes a whole lot of sense. My advice is to profile your memory under different conditions for an idea of how memory is used and then decide which mode is best for you application.

Business Layer Value

I had a great discussion with my supervisor about application architecture.

The question at hand was, what’s the value of the Business Layer? Most applications I’ve worked on are CRUD applications. Is there any value in a thin veneer over the data layer?

In my experience, most business layers consist of pass through methods.

If there isn’t any value, call the data layer directly. Handle the business logic on a case-by-case basis. In most cases this will entail creating a service class to encapsulate the business logic.

In the end, having a business layer that provides nothing but pass through methods is pre-optimization. It’s the “it will save me down the road” mentality. 95% of the time, it’s a waste, it creates multiple of points of change and it increases maintainability.