Tag: C#

C# 8 - Nullable Reference Types

Microsoft is adding a new feature to C# 8 called Nullable Reference Types. At first, this is confusing because all reference types are nullable... so how is this different? Going forward, if the feature is enabled, reference types are non-nullable unless you explicitly annotate them as nullable.

Let me explain.

Nullable Reference Types

When Nullable Reference Types are enabled and the compiler believes a reference type has the potential of being null, it warns you. You'll see warning messages in Visual Studio:

And build warnings:

To remove this warning, append a question mark to the reference type. For example:

public string? StringTest()
{
    string? notNull = null;
    return notNull;
}

Now the reference type behaves as it did before C# 8. 

This feature is enabled by adding #nullable enable to the top of any C# file or by adding <Nullable>enable</Nullable> to the .csproj file. Out of the box it's not enabled, which is a good thing: if it were enabled, any existing code-base would likely light up like a Christmas tree.
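For example, here's what opting in a single file looks like, while leaving the rest of the code-base alone (a minimal sketch; the type and property names are made up for illustration):

#nullable enable

public class Widget
{
    public string? Nickname { get; set; }   // explicitly nullable
    public string Name { get; set; } = "";  // non-nullable, so it must be initialized
}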

The Null Debate

Why is Microsoft adding this feature now? Nulls have been part of the language since, well, the beginning. Honestly, I don't know why. I've always used nulls; they're a fact of life in C#. I didn't realize not having nulls was an option... Maybe life will be better without them. We'll find out.

Should you or should you not use nulls? I've summarized the ongoing debate as I understand it.

For

The argument for nulls is generally that an object can have an unknown state, and this unknown state is represented with null. You see this with the bit data type in SQL Server, which has three values: null (not set), 0, and 1. You also see this in UIs, where sometimes it's important to know whether a user touched a field. Someone might counter with, "Instead of null, why not create an unknown state type or a 'not set' state?" How is that different from null? You'd still have to check for this additional state, and now you're creating an unknown state for each instance. Why not just use null and have a global unknown state?
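C# already models this three-state idea with nullable value types; a quick sketch:

// Mirrors SQL Server's bit column: null (not set), false (0), true (1).
bool? acceptedTerms = null;

if (acceptedTerms == null)
{
    Console.WriteLine("Not set");
}
else
{
    Console.WriteLine(acceptedTerms.Value ? "Accepted" : "Declined");
}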

Against

The argument against nulls is that null is effectively a different data type, one that must be checked for each time you use a reference type. The net result is code like this:

var user = GetUser(username, password);

if (user != null)
{
    DoSomethingWithUser(user);
}
else
{
    SetUserNotFoundErrorMessage();
}

Suppose the GetUser method returned a user in all cases, including when the user is not found. If the code never returns null, guarding against it is wasted effort, and removing the guard simplifies the code. However, at some point you'll still need to check for an empty user and display an error message; not using null doesn't remove the need to fulfill the business case of a user not found.
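One way to do this is the Null Object pattern. Here's a minimal sketch, assuming a hypothetical User type with an IsFound flag (these names aren't from the example above):

public class User
{
    public string Username { get; set; } = "";
    public bool IsFound { get; set; } = true;

    // A single well-known instance that stands in for "no user found".
    public static readonly User NotFound = new User { IsFound = false };
}

// GetUser never returns null; callers check IsFound only where the
// business case (showing an error message) actually requires it.
public User GetUser(string username, string password)
{
    var user = FindUserInDatabase(username, password); // hypothetical lookup, may return null
    return user ?? User.NotFound;
}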

Is this Feature a Good Idea?

The purpose of this feature is NOT to eliminate the use of nulls, but instead to ask the question: "Is there a better way?" And sometimes the answer is "No". If we can eliminate the constant checking for nulls with a little forethought, which in turn simplifies our code, I'm in. The good news is C# has made working with nulls trivial.
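For example, the null-conditional (?.) and null-coalescing (??) operators shrink a null check to a single expression:

// Returns user.Name when user is non-null, otherwise "Guest".
// A sketch assuming a User type with a Name property.
var displayName = user?.Name ?? "Guest";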

I do fear some will take a dogmatic stance and insist on eliminating nulls to the detriment of a system. That is a fool's errand, because nulls are integral to C#.

Is Nullable Reference Types a good idea? It is, if the end result is simpler and less error-prone code.

With or Without Curly Braces?

There's a heated debate around single statements and whether they should have curly braces or not.

In C++, C#, Java, and JavaScript, a single-line statement without curly braces is valid; some take advantage of this feature, while others don't.

For example:

if (ifTrue)
    MowTheLawn();

for (var index = 0; index < 10; index++)
    ChopWood();

foreach (var dollar in money)
    BuyLollipop();

while (untilTheEnd)
    Read();

Arguments Against Single Line Curly-Braces

The arguments against curly braces are that the syntax is terser, it's fewer characters to type, and it's still valid syntax. Why not take advantage of that?

Arguments For Single Line Curly-Braces

The arguments for curly braces are consistency, fewer bugs, and code that's more natural to mentally parse.

In an article titled Single-line 'if' statements, Jon Abrams explains how a defect in Apple's TLS implementation was introduced as a result of a single-line if statement without curly braces. Jon goes on to say that while omitting curly braces in single-line statements is terser, preventing defects is more important than terseness.
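Apple's bug was in C, but the failure mode translates directly to C#. A hedged illustration (not the actual TLS code):

static bool Validate(bool signatureOk)
{
    if (!signatureOk)
        Console.WriteLine("signature check failed");
        return false; // Indented as if guarded by the if, but it isn't:
                      // Validate always returns false, valid or not.

    return true; // Unreachable; the compiler even warns about it.
}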

Jon proposes a compromise: allow single-line statements if they are truly on a single line:

if (ifTrue) MowTheLawn();

I echo Jon's thoughts: omitting the curly braces in single lines isn't worth the benefit it offers. It forces the software engineer to consider two variations of valid syntax. That may not seem so bad, but it's taxing to make this determination each time you happen upon an if statement. The net effect is that the engineer saves a few keystrokes and passes the burden on to future readers of their code.

For C# software engineers, Microsoft has taken a side in its coding conventions, which call for curly braces.

When we use curly braces in all cases, regardless of the number of lines, what's in scope and what's out of scope is very clear. This makes the code less error-prone and more consistent, and although some might argue this point, I find it easier to read.

Codifying the Secret Sauce

Each application has its secret sauce, its reason for existing. Codifying the secret sauce is instrumental in writing maintainable and successful applications.

Wait. What is codifying? Patience, my friend, we'll get there.

First, let's hypothesize:

You've just been promoted to Lead Software Engineer (congratulations!). Your CEO's first task for you is creating a new product for the company: a ground-up accounting application. The executives feel having a custom accounting solution will give them an edge on the competition.

A few months have passed, and most of the cross-cutting concerns are developed (yay you!). The team is now focused on the yummy goodness of the application: the business domain (the secret sauce). This is where codifying the secret sauce begins.

Codifying is putting structure around an essential concept in the business domain.

In accounting, the price (P) to earnings (E) ratio (P/E Ratio) is a measurement of a company's share price relative to its earnings. A high P/E ratio suggests the market expects high earnings growth in the future. The P/E ratio is calculated by taking the market value per share (share price) divided by the earnings per share ((profit - dividends) / number of outstanding shares). For example, a share price of 100 with earnings per share of 5 yields a P/E ratio of 20.

A simple and, I argue, naive implementation:

public class Metric
{
    public string Name { get; set; }
    public decimal Value { get; set; }
    public int Order { get; set; }
}

public class AccountingSummary
{
    public Metric[] GetMetrics(decimal price, decimal earnings)
    {
        var priceEarningsRatio = price / earnings;

        var priceEarningsRatioMetric = new Metric
        {
            Name = "P/E Ratio",
            Value = priceEarningsRatio,
            Order = 0
        };

        return new[] { priceEarningsRatioMetric };
    }
}

If this is only used in one place, it's fine. But what if you use the P/E ratio in other areas?

Like here in PriceEarnings.cs:

var priceEarningsRatio = price / earnings;

And here in AccountingSummary.cs:

var priceEarningsRatio = price / earnings;

And over here in StockSummary.cs:

var priceEarningsRatio = price / earnings;

The P/E Ratio is core to this application, but the way it's implemented, hardcoded in various places, leaves the importance of the P/E Ratio lost in a sea of code. It's just another tree in the forest.

You're also open to the risk of changing the ratio in one place but not the others, which may throw off downstream calculations. These types of errors are notoriously difficult to find.

Often testers will assume that if it works in one area, it's correct in all areas. Why wouldn't the application use the same code to generate the P/E Ratio everywhere? Isn't that the point of Object Oriented Programming?

I can imagine an error like this making it into production and not being discovered until a visit from an executive who demands to know why the SEC's P/E Ratio calculations are different from what the company filed. That's not a great place to be.

Let's revisit our P/E ratio implementation and see how we can improve upon our first attempt.

In accounting systems, formulas are a thing, so let's put structure around formulas by adding an interface:

public interface IFormula<T>
{
    decimal Calculate(T model);
}

Each formula is now implemented against this interface, giving us consistency and predictability.

Here's our improved P/E Ratio after implementing our interface.

First, we've added a PriceEarningsModel to pass the needed data into our Calculate method.

public class PriceEarningsModel
{
    public decimal Price { get; set; }
    public decimal Earnings { get; set; }
}

Using our PriceEarningsModel, we've created an implementation of the IFormula<PriceEarningsModel> interface for the P/E Ratio.

public class PriceEarningsRatioFormula : IFormula<PriceEarningsModel>
{
    public decimal Calculate(PriceEarningsModel model)
    {
        return model.Price / model.Earnings;
    }
}

We've now codified the P/E Ratio. It's a first-class concept in our application. We can use it anywhere, it's testable, and a change impacts the entire application.
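For example, a unit test might look like this (a sketch assuming xUnit; any test framework works):

using Xunit;

public class PriceEarningsRatioFormulaTests
{
    [Fact]
    public void Calculate_DividesPriceByEarnings()
    {
        var formula = new PriceEarningsRatioFormula();

        var ratio = formula.Calculate(new PriceEarningsModel { Price = 100m, Earnings = 5m });

        Assert.Equal(20m, ratio);
    }
}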

As a reminder, here's the implementation we began with:

public class Metric
{
    public string Name { get; set; }
    public decimal Value { get; set; }
    public int Order { get; set; }
}

public class AccountingSummary
{
    public Metric[] GetMetrics(decimal price, decimal earnings)
    {
        var priceEarningsRatio = price / earnings;

        var priceEarningsRatioMetric = new Metric
        {
            Name = "P/E Ratio",
            Value = priceEarningsRatio,
            Order = 0
        };

        return new[] { priceEarningsRatioMetric };
    }
}

It's simple and gets the job done. The problem is that the P/E Ratio, a core concept in our accounting application, doesn't stand out. Engineers not familiar with the application or the business domain won't understand its importance.

Our improved implementation uses our new P/E Ratio class. We inject the PriceEarningsRatioFormula class into our AccountingSummary class.

We replace our hardcoded P/E Ratio with our new `PriceEarningsRatioFormula` class.

public class AccountingSummary
{
    private readonly PriceEarningsRatioFormula _peRatio;

    public AccountingSummary(PriceEarningsRatioFormula peRatio)
    {
        _peRatio = peRatio;
    }

    public Metric[] GetMetrics(decimal price, decimal earnings)
    {
        var priceEarningsRatio = _peRatio.Calculate(new PriceEarningsModel
        {
            Price = price,
            Earnings = earnings
        });

        var priceEarningsRatioMetric = new Metric
        {
            Name = "P/E Ratio",
            Value = priceEarningsRatio,
            Order = 0
        };

        return new[] { priceEarningsRatioMetric };
    }
}

One could argue there's a bit more lifting with the PriceEarningsRatioFormula than with the previous implementation, and I'd agree. There is a bit more ceremony, but the benefits are well worth the small increase in code and ceremony.
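Much of that ceremony is one-time wiring. For example, with Microsoft.Extensions.DependencyInjection (an assumed container; use whatever your application already has), the registration looks like:

using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
services.AddSingleton<PriceEarningsRatioFormula>();
services.AddTransient<AccountingSummary>();

var provider = services.BuildServiceProvider();
var summary = provider.GetRequiredService<AccountingSummary>();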

First, we gain the ability to change the P/E ratio for the entire application. We also have a single implementation to debug if defects arise.

Lastly, we've codified the concept of the PriceEarningsRatioFormula in the application. When a new engineer joins the team, they'll know formulas are essential to the application and the business domain.

There are other methods of codifying (encapsulating) the domain, such as microservices and assemblies. Each approach has its pros and cons; you and your team will have to decide what's best for your application.

Ensconcing key domain concepts in classes and interfaces creates reusable components and conceptual boundaries, making an application easier to reason about, reducing defects, and lowering the barriers to onboarding new engineers.

Garbage Collection Types in .Net Core

Memory management in modern languages is often an afterthought. For all intents and purposes, we write software with nary a thought about memory. This serves us well, but there are always exceptions...

In California, there are extensive financial reporting requirements for Local Education Agencies (LEAs); an LEA can be a county, a district, a charter, or a single school. Most LEAs create their own financial reports, usually centered around Excel, so it's no surprise that each report is different. To solve this problem, the California Board of Education commissioned software, Ed-Pro, to generate financial reports.

I was a part of the development team. 

My first stop was the testing logs. Ed-Pro's logs pointed to high memory usage; perhaps there was a memory leak? An engineer observed that Ed-Pro's calculations used a large amount of short-lived memory. If that memory wasn't cleaned up quickly, it could look like a memory leak.

Ed-Pro is built on top of .Net Core, Microsoft's multi-platform framework. In .Net Core, memory is divided into three categories: short-lived (Gen0), medium-lived (Gen1), and long-lived (Gen2). Gen0 holds short-lived data that quickly goes out of scope; Gen1 holds medium-lived memory that hangs around a bit longer before it, too, goes out of scope; and Gen2 holds long-lived memory that may live for the life of the application. Gen0 memory is constantly reclaimed, Gen1 is reclaimed less frequently than Gen0, and Gen2 even less frequently than Gen1.
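You can watch an object move through the generations with the GC class; a small sketch:

var data = new byte[1_000];
Console.WriteLine(GC.GetGeneration(data)); // 0: freshly allocated

GC.Collect(); // survive one collection...
Console.WriteLine(GC.GetGeneration(data)); // 1

GC.Collect(); // ...and another
Console.WriteLine(GC.GetGeneration(data)); // 2: long-lived from here on

Console.WriteLine(GC.CollectionCount(0)); // how many Gen0 collections have run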

The only sure way to understand Ed-Pro's memory usage was to profile it. Below is a screenshot using dotMemory by JetBrains.

As suspected, we found large amounts of Gen0 memory (the blue), so much so that it appeared Garbage Collection couldn't keep up. To compensate for the large amount of memory, Garbage Collection oscillated between increasing the memory space (adding more memory for the application's use) and cleaning it up. During the cleanup cycles, the application was unresponsive.

At first, we were stumped; isn't the purpose of the GC to keep memory tidy? Two articles were instrumental in our understanding of how Garbage Collection works in .Net: Mark Vincze's article Troubleshooting high memory usage with ASP.Net Core on Kubernetes and Fundamentals of Garbage Collection by Microsoft. Both are great reads and brought clarity to the memory usage in Ed-Pro.

Here's a summary of what we learned. There are two types of Garbage Collection in .Net: Server Garbage Collection and Workstation Garbage Collection.

Server Garbage Collection makes a couple of assumptions: first, that there is ample memory available, and second, that the processors are multi-core and fast. Both can be true, but we live in a world of virtual machines and Docker where it's more likely that both assumptions are false.

Server Garbage Collection allows memory to build up; at some point, it does one of two things: it either increases the memory space, allowing memory to grow, or it frees up orphaned memory. When it chooses to free memory, Garbage Collection starts the process on a high-priority thread, at a higher priority than the application. If the machine is fast, the cleanup shouldn't be noticed; if it's not, it'll cause the application to halt until the cleanup is completed.

Workstation Garbage Collection operates differently. It continuously reclaims memory on a thread with the same priority as the application. This means it competes for resources with the application, which can cause application slowness. The upside is the application's memory usage can stay quite low, particularly when it uses large amounts of Gen0.

By default, if .Net Core detects a server, it runs the Server Garbage Collection type, which was the case with our application. To run the Workstation Garbage Collection type, add the following snippet to your project file:

  <PropertyGroup> 
    <ServerGarbageCollection>false</ServerGarbageCollection>
  </PropertyGroup>
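You can confirm which mode the runtime picked at startup; a small sketch:

using System.Runtime;

Console.WriteLine(GCSettings.IsServerGC
    ? "Server Garbage Collection"
    : "Workstation Garbage Collection");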

We made this configuration change to Ed-Pro. Using dotMemory, we profiled Ed-Pro's memory with Workstation Garbage Collection enabled and loaded the same screens as in the previous test. Here are the results:

The memory usage is significantly decreased, and the Gen0 allocations are virtually non-existent. Beyond the differences in the graph, memory usage with Server Garbage Collection topped 1 gigabyte, while with Workstation Garbage Collection it topped out at roughly 200 megabytes.

Every application is different. Ours used a ton of temporary data and thus a ton of Gen0 memory. Your application may lean on longer-lived memory in Gen1 or Gen2, in which case Server Garbage Collection makes a whole lot of sense. My advice is to profile your memory under different conditions to get an idea of how memory is used, and then decide which mode is best for your application.

The Collection Comparer, Finding the Differences Between Two Collections

Have you ever had to compare two collections and execute some logic based on whether an item is in the source collection, in the comparing collection, or in both? Yeah, me too. I needed to merge data from the UI and the database. I couldn't find a good solution, so I wrote a collection comparer.

To illustrate how this works, let's look at an example.

In the source collection we have the values 1, 3, 4, and 6, and in the comparing collection we have the values 1, 2, 3, 4, and 5.

The source data is missing the 2 and the 5 when compared to the comparing collection, and the comparing collection is missing the 6 when compared to the source collection.

Let's walk through this merge:

  1. in both (update)
  2. only in the comparing collection (add to source)
  3. in both (update)
  4. in both (update)
  5. only in the comparing collection (add to source)
  6. only in the source collection (remove from source)

Here's what the code looks like:

var source = new[] { 1, 3, 4, 6 };
var collection = new[] { 1, 2, 3, 4, 5 };

source.CompareTo(collection, (s, d) => s == d)
    .OnlyInSourceCollection(s => { /* do something */ })
    .OnlyInComparingCollection(s => { /* do something */ })
    .InBoth(s => { /* do something */ })
    .Process();

Why not use LINQ?

You can use LINQ; however, LINQ will iterate the collections at least three times, which doesn't include operating on the data (adding, updating, and deleting). Using the CollectionComparer, the data is only iterated twice.
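For comparison, a LINQ version of the example above needs a pass per bucket; roughly:

var onlyInSource    = source.Except(collection);    // { 6 }
var onlyInComparing = collection.Except(source);    // { 2, 5 }
var inBoth          = source.Intersect(collection); // { 1, 3, 4 }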

There are faster ways to find the differences, such as a binary search, but a binary search requires sorted data. The collection comparer supports any type of comparison; the comparison is defined with this code: (s, d) => s == d.

The source code is found on GitHub.