Chuck Conway is a software craftsman with nearly 30 years of experience. His expertise spans Python, Generative AI, .NET, React JS, and CI/CD practices, with an emphasis on craftsmanship, architecture, and process. Passionate about continuous improvement and automation, Chuck has built his career on a commitment to beautiful software and efficiency.
Chuck lives in Folsom, CA, with his wife Erin, daughter, and a cat.
"Software is getting slower more rapidly than hardware becomes faster." – Niklaus Wirth, A Plea for Lean Software
Niklaus Wirth – Algorithms + Data Structures = Programs
There are two sides to the performance coin:
Efficiency through Algorithms – how much work a task requires.
Improving efficiency involves doing less work.
An efficient program does the minimum amount of work (that we're aware of) to accomplish a given task.
Performance through Data Structures – how quickly a program does its work.
Improving performance involves doing that work faster.
But there is no such thing as "performant" – it's not even a word in the English language.
There is essentially no point at which a program cannot do work any faster… until you hit Bremermann’s limit…
What does it mean to improve the performance of software?
The software is going to run on a specific, real machine.
There is some theoretical limit on how quickly it can do work.
Try to light up all the transistors – get the most work done as fast as possible.
It All Comes Back to Watts
Every circuit not used on a processor is wasting power
Don’t reduce this to the absurd — it clearly doesn’t make sense to use more parts of the CPU without improving performance!
Algorithms
Complexity theory and analysis
Common across higher-level languages, etc.
Very well understood by most (I hope)
Improving algorithmic efficiency requires finding a different way of solving the problem.
Algorithmic complexity is a common mathematical concept that spans languages and processors. Don't get lazy – you have to pay attention to your algorithms.
Example – "it's about doing less work": analyze the current approach and find ways to do it more efficiently. Consider substring search:
Initially, you might have a basic O(n^2) algorithm.
Next, Knuth-Morris-Pratt (a table lets you skip ahead on a mismatch).
Finally, Boyer-Moore (use the end of the needle to skip even further).
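The progression can be sketched in C# (the helper names are mine, not from any particular library): a naive scan next to a Boyer-Moore-Horspool-style search, which compares from the end of the needle and uses a bad-character table to skip whole windows.

```csharp
using System;
using System.Collections.Generic;

static class Search
{
    // Naive O(n*m): try every alignment, compare left to right.
    public static int NaiveIndexOf(string haystack, string needle)
    {
        for (int i = 0; i + needle.Length <= haystack.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }

    // Horspool variant of Boyer-Moore: compare from the end of the needle
    // and use a bad-character table to skip whole windows on a mismatch.
    public static int HorspoolIndexOf(string haystack, string needle)
    {
        if (needle.Length == 0) return 0;
        var skip = new Dictionary<char, int>();
        for (int k = 0; k < needle.Length - 1; k++)
            skip[needle[k]] = needle.Length - 1 - k;

        int i = 0;
        while (i + needle.Length <= haystack.Length)
        {
            int j = needle.Length - 1;
            while (j >= 0 && haystack[i + j] == needle[j]) j--;
            if (j < 0) return i;
            char last = haystack[i + needle.Length - 1];
            i += skip.TryGetValue(last, out int s) ? s : needle.Length;
        }
        return -1;
    }
}
```

Both return the same index; the difference is how much of the haystack each one actually has to touch on typical inputs.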
Do Less Work by Not Wasting Effort, Example 1
std::vector<X> f(int n) {
    std::vector<X> result;
    for (int i = 0; i < n; ++i)
        result.push_back(X(...));
    return result;
}
Reserve the collection's capacity up front to avoid repeated reallocation and copying as the vector grows:
std::vector<X> f(int n) {
    std::vector<X> result;
    result.reserve(n);
    for (int i = 0; i < n; ++i)
        result.push_back(X(...));
    return result;
}
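The same principle carries over to .NET: List&lt;T&gt; also grows by allocating a larger backing array and copying, so when the final size is known you can pass it to the constructor. A minimal sketch (the Squares method is purely illustrative):

```csharp
using System.Collections.Generic;

static List<int> Squares(int n)
{
    // Without the capacity argument, List<int> starts small and repeatedly
    // reallocates and copies its backing array as it grows.
    var result = new List<int>(n); // one allocation of the right size
    for (int i = 0; i < n; i++)
        result.Add(i * i);
    return result;
}
```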
Always be measuring – don’t do it because you think it’s the right thing to do. It might not be.
"Managed memory is cheap." That's not quite true: allocation is cheap, but garbage collection is expensive. Garbage collection stops the world while memory is cleaned up and compacted, so memory cleanup can noticeably impact the performance of the application.
Performance Low-Hanging Fruit
Object reuse
Object pooling: pass in an existing array rather than allocating a new one, and preallocate the array length.
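As a sketch of array reuse, .NET's built-in ArrayPool&lt;T&gt; rents and returns buffers instead of allocating a fresh array on every call (the method shown here is illustrative, not from the original text):

```csharp
using System;
using System.Buffers;

static void ProcessChunks(System.IO.Stream stream)
{
    // Rent a reusable buffer from the shared pool instead of allocating.
    byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
    try
    {
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // ... work on buffer[0..read] ...
        }
    }
    finally
    {
        // Always return the buffer so other callers can reuse it.
        ArrayPool<byte>.Shared.Return(buffer);
    }
}
```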
String concatenation.
Use StringBuilder, and preallocate its capacity if possible.
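For example, a rough sketch (the per-line size estimate is a guess you would tune for your data):

```csharp
using System.Text;

static string JoinLines(string[] lines)
{
    // Each "s += line" would allocate a brand-new string and copy both
    // halves. StringBuilder appends into one growable buffer instead;
    // pre-sizing it (here, a rough 16 chars per line) avoids even that growth.
    var sb = new StringBuilder(lines.Length * 16);
    foreach (var line in lines)
        sb.AppendLine(line);
    return sb.ToString();
}
```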
In code: MyParamsMethod("Hello", "World"); MyParamsMethod();
Compiler: MyParamsMethod(new[] { "Hello", "World" }); MyParamsMethod(new string[0]); // even the empty call allocates a new array object on the heap
Instead, write: MyParamsMethod(new[] { "Hello", "World" }); MyParamsMethod(Array.Empty<string>()); // reuses a single cached empty array – the same idea behind string.Empty
Suggestion: Introduce overloads for the common numbers of arguments.
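A sketch of what such overloads might look like, reusing the hypothetical MyParamsMethod from above (this mirrors the pattern the BCL uses for methods like string.Format):

```csharp
// Fixed-arity overloads: the compiler binds these for zero, one, or two
// arguments, so no array is allocated at the call site.
static void MyParamsMethod() { /* no array allocated */ }
static void MyParamsMethod(string a) { /* no array allocated */ }
static void MyParamsMethod(string a, string b) { /* no array allocated */ }

// The params overload is the catch-all: only calls with three or more
// arguments pay for an array allocation.
static void MyParamsMethod(params string[] args) { }
```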
Boxing
Boxing copies a value type into a new object on the heap. You now have two memory locations holding the same value; changing either one does not affect the other.
Suggestion: Introduce generic overloads
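For instance, a minimal sketch (the AreEqual methods are illustrative):

```csharp
using System;

// Non-generic: every value-type argument is boxed onto the heap
// just to make this call.
static bool AreEqualBoxed(object a, object b) => a.Equals(b);

// Generic: the runtime specializes the method per value type,
// so no boxing occurs.
static bool AreEqual<T>(T a, T b) where T : IEquatable<T> => a.Equals(b);
```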
Closures
The compiler converts a closure into a class: captured variables become fields passed in as constructor parameters, and the class instance is allocated on the heap.
Suggestion: Avoid closures on critical paths. Pass state as an argument to the lambda. Investigate local functions: because a local function's lifetime is known, its captured state can be kept on the stack, where allocation and cleanup are cheap.
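A sketch of both shapes (names are illustrative):

```csharp
using System;

static class ClosureDemo
{
    // Capturing lambda: 'threshold' is hoisted into a compiler-generated
    // class, and an instance of that class is heap-allocated per call.
    static Predicate<int> MakeCapturing(int threshold)
        => v => v > threshold;

    // Local function alternative: when the compiler can prove the function
    // never escapes the enclosing method, the captured state can live in a
    // struct on the stack instead of a heap-allocated class.
    static int CountAbove(int[] values, int threshold)
    {
        bool Above(int v) => v > threshold; // local function, does not escape
        int count = 0;
        foreach (var v in values)
            if (Above(v)) count++;
        return count;
    }
}
```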
LINQ
Lambda expressions are treated the same as closures: they are allocated as classes on the heap. And because much of LINQ is built on static extension methods, additional allocations of iterators and IEnumerable objects happen as well.
Suggestion: Avoid LINQ on critical paths. Use good old foreach and if statements.
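A sketch of the trade (both methods compute the same result; the names are mine):

```csharp
using System.Linq;

// LINQ version: allocates a closure for the lambdas plus iterator objects
// for the Where and Select stages.
static int SumOfEvenSquaresLinq(int[] xs, int min)
    => xs.Where(x => x >= min && x % 2 == 0).Select(x => x * x).Sum();

// Hand-rolled version: same result, no intermediate allocations.
static int SumOfEvenSquares(int[] xs, int min)
{
    int sum = 0;
    foreach (var x in xs)
        if (x >= min && x % 2 == 0)
            sum += x * x;
    return sum;
}
```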
Iterators
Iterators (yield return) are rewritten as a state machine, which means more allocations on the heap.
Suggestion: Return a collection instead. Be aware of the cost.
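For example (illustrative names):

```csharp
using System.Collections.Generic;

// 'yield return' is rewritten by the compiler into a heap-allocated
// state-machine class that is created each time the sequence is enumerated.
static IEnumerable<int> EvensIterator(int[] xs)
{
    foreach (var x in xs)
        if (x % 2 == 0)
            yield return x;
}

// On a hot path, filling a pre-sized collection trades laziness for fewer
// allocations; measure to see which wins for your data.
static List<int> EvensList(int[] xs)
{
    var result = new List<int>(xs.Length);
    foreach (var x in xs)
        if (x % 2 == 0)
            result.Add(x);
    return result;
}
```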
async/await
Async/await also generates a state machine, and Task and Task<T> trigger additional allocations, which then can't be reused.
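A sketch of where the allocations come from, plus one common mitigation – creating a completed task once and reusing it so a synchronous fast path does not allocate (the member names are illustrative):

```csharp
using System.Threading.Tasks;

static class AsyncDemo
{
    // 'async' makes the compiler generate a state machine for this method;
    // when an await actually suspends, the returned Task is a fresh heap
    // object as well.
    static async Task<int> ReadLengthAsync(System.IO.Stream s, byte[] buffer)
        => await s.ReadAsync(buffer, 0, buffer.Length);

    // A completed task allocated once and reused: callers that hit the
    // synchronous fast path all receive this same object.
    static readonly Task<int> ZeroLength = Task.FromResult(0);
}
```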