Chuck Conway

Removing Large Files From Your Git Repository

I've resisted moving my projects onto GitHub. When GitHub first opened its doors, it surprised me. Why would anyone build a UI on top of version control? It seemed like such a simple idea, one that had already been done many times over. So what made GitHub different?

GitHub Logo

As it turns out, GitHub is different. They have a wonderful product.

I expected the switch to be uneventful, but things don't always go as we expect. My previous git provider didn't have file-size restrictions. During the push to GitHub, I received a warning at 50 megs. At 100 megs, it turned into a roadblock.

Luckily, GitHub has detailed instructions on how to remove large files.

First, if it’s a pending check-in, you can simply remove the file from the cache.

git rm --cached giant_file
# Stage our giant file for removal, but leave it on disk

Commit the change.

git commit --amend -CHEAD
# Amend the previous commit with your change
# Simply making a new commit won't work, as you need
# to remove the file from the unpushed history as well

Push your changes to GitHub.

git push
# Push our rewritten, smaller commit

If it's not in a pending check-in, but is part of your repo's history, things get interesting. There is a utility, BFG Repo-Cleaner, that makes this process a breeze.

The command (from the GitHub documentation):

bfg --strip-blobs-bigger-than 50M
# Git history will be cleaned - files in your latest commit will *not* be touched

The GitHub documentation must assume you have BFG installed, because the command didn't work for me.

I downloaded the jar file and ran it directly. Don't forget to run it from the root of your git repository.

java -jar bfg.jar --strip-blobs-bigger-than 50M

Here is my output:

Scanning packfile for large blobs: 35170
Scanning packfile for large blobs completed in 251 ms.
Found 10 blob ids for large blobs - biggest=125276291 smallest=53640151
Total size (unpacked)=958626718
Found 1691 objects to protect
Found 1 tag-pointing refs : refs/tags/v0.1
Found 7 commit-pointing refs : HEAD, refs/heads/dev, refs/heads/master, ...

Protected commits

These are your protected commits, and so their contents will NOT be altered:

 * commit a99dbf81 (protected by 'HEAD')


Found 1093 commits
Cleaning commits:   100% (1093/1093)
Cleaning commits completed in 8,427 ms.

Updating 6 Refs

Ref                        | Before   | After
---------------------------+----------+---------
refs/heads/dev             | 02eeab40 | 8ad272d3
refs/heads/master          | a99dbf81 | 8008478b
refs/heads/prod            | 15f1558b | dc52efeb
refs/heads/qa              | 15f1558b | dc52efeb
refs/remotes/origin/master | 0c71d31f | d992278d
refs/tags/v0.1             | fc78e278 | ba078ff6

Updating references: 100% (6/6)
...Ref update completed in 45 ms.

Commit Tree-Dirt History

Earliest  Latest
|  |

D = dirty commits (file tree fixed)
m = modified commits (commit message or parents changed)
. = clean commits (no changes to file tree)

                      | Before   | After
First modified commit | 71ab4035 | 5963444b
Last dirty commit     | 48c18598 | d7000b5a

Deleted files

Filename       | Git id
---------------+-------------------------------------------------------------
               | 2ace978f (117.2 MB), 3fb67bc6 (117.8 MB), edc34fe0 (118.3 MB), cf8b9f19 (118.5 MB), a41ce08a (119.5 MB)
Grover_be2.mdb | 129a7cc8 (61.4 MB), d730b329 (62.6 MB)
Unify          | 5fca437c (53.2 MB), 728b06a4 (51.2 MB), ebe5f6cf (94.6 MB)

In total, 3191 object ids were changed. Full details are logged here:


BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

Has the BFG saved you time?  Support the BFG on BountySource:



This is a response of sorts to this thoughtful post:

Here is a short summary of the post:

The author spent many years as a programmer. It was difficult to balance life and work; he was always running to the next engagement, stuck in the learning rat race of software engineering. His job consumed him, leaving no time for anything except the job. To find balance, he pivoted his career into a less demanding field and achieved balance between his job and his life.

I understand his pain. Much of my energy is devoted to learning. Technologies are growing more diverse and breeding innovation, which means even more learning.

One can easily get consumed by programming. In many ways it's crack for the brain.

You are in a perpetual state of sharpening your sword. The programmer who stops is relegated to obsolescence in a few short years. In extreme situations, they will find themselves unemployable.

So why do I program? Because I love doing it, and I get paid to do what I love. Like the author of the post, I've had to learn to balance my career and my personal time.

Some will disagree, but for me programming is an art. There is no limit to how skilled I can become. Applications are my canvas; programming is the medium I use to express myself. It's how I create.


The Mind State of a Software Engineer

Have patience.

I'll wait

Coding is discovery. Coding is failing. Be OK with this.



Don't blame the framework. It's more probable that it's your code. Accept this fallibility.

Lady Bug


Know when to walk away. Your mind is a wonderful tool; even at rest it's working on unsolved problems. Rest, and let your mind do its work.



Be comfortable not knowing. Software engineering is a vast ocean of knowledge. Someone will always know more than you. The sooner you are OK with this, the sooner you will recognize opportunities to learn something new.

Ocean Sailing


Anger and frustration don't fix code. Take a break; nothing gets accomplished in this state.




Simplifying Null Checks

I have null checks littered throughout my codebase. We see this in all C# codebases; it just comes with the territory. So much so that we don't see it as an issue. We are numb to the pain.

An example of a null check:

if (source != null)
    return source.Select(s => base.Convert(s));

return null;

It's trivial code, no doubt about that, but my issue is that it doesn't convey much intent. Can we make it better? I think so. Let's make it more compact with a ternary operator. Here is the result.

return (source != null ? source.Select(s => base.Convert(s)) : null);

This is more succinct, but it lacks readability. We've squeezed the if-statement into one line. It feels harder to read than the previous form.

How can we make this more succinct and maintain readability? What if we used an extension method for the null check, instead of comparing the object to null?

if (source.IsNotNull())
    return source.Select(s => base.Convert(s));

return null;

Ok, I like this. There is meaning in the if-statement evaluation: if it's not null, we enter the if-statement.

Here is the code for the IsNotNull() extension method.

public static bool IsNotNull(this object val)
{
    return val != null;
}

This is good, but I am bothered by the if-statement. I wonder if we can get rid of the if-statement altogether. Maybe with a little of C#'s functional magic we can eliminate the if-block.

return source.IsNotNull(() => source.Select(s => base.Convert(s)));

Ahh, that's better.

When the object is not null, it executes the passed-in lambda expression. If you aren't familiar with lambda expressions, this might have you scratching your head. The code is below; take a look at it, and hopefully it will clear things up.

public static T IsNotNull<T>(this object val, Func<T> result) where T : class
{
    if (val != null)
        return result();

    return null;
}

It's a constant challenge keeping code readable. Most of us write code like this all day without a thought to making it better. We've been walking on glass for so long that we don't feel the pain. Each little nugget helps.

On a side note, the next version of C# will have null-conditional operators, making the extension method I created irrelevant. Here is an example.

return source?.Select(s => base.Convert(s));
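As a sketch of how the null-conditional operator composes with the null-coalescing operator (the example data and names are mine, not from the original post):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    public static void Main()
    {
        List<int> source = null;

        // The null-conditional operator short-circuits to null
        // when source is null, instead of throwing.
        IEnumerable<string> converted = source?.Select(s => s.ToString());
        Console.WriteLine(converted == null); // True

        source = new List<int> { 1, 2, 3 };

        // Pairing ?. with ?? lets callers avoid null checks entirely
        // by falling back to an empty sequence.
        var safe = source?.Select(s => s.ToString()) ?? Enumerable.Empty<string>();
        Console.WriteLine(string.Join(",", safe)); // 1,2,3
    }
}
```

Returning an empty sequence instead of null is often the kinder choice, since callers can iterate the result without their own null check.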

Index Fragmentation in SQL Azure, Who Knew!

I've been on my project for over a year, and during that time it has grown significantly, both as an application and in data. It's been nonstop new features; I've rarely gone back and refactored code. Last week I noticed some of the data-heavy pages were loading slowly. In the worst case, one view could take up to 30 seconds to load: 10 times my maximum acceptable load time.

Call me naive, but I didn't consider index fragmentation in SQL Azure. It's the cloud! It's supposed to be immune to on-premises issues... Apparently index fragmentation is also an issue in the cloud.

I found a couple of queries on an MSDN blog that identify the fragmented indexes and then rebuild them.

After running the first query to show index fragmentation, I found some indexes with over 50 percent fragmentation. According to the article, anything over 10% needs attention.

First Query: Display Index Fragmentation

--Get the fragmentation percentage
SELECT ips.avg_fragmentation_in_percent
    ,OBJECT_NAME(ps.object_id) AS TableName
    ,i.name AS IndexName
FROM sys.dm_db_partition_stats ps
INNER JOIN sys.indexes i
    ON ps.object_id = i.object_id
    AND ps.index_id = i.index_id
CROSS APPLY sys.dm_db_index_physical_stats(DB_ID(), ps.object_id, ps.index_id, null, 'LIMITED') ips
ORDER BY ps.object_id, ps.index_id

Second Query: Rebuild the Indexes

--Rebuild the indexes
DECLARE @TableName varchar(255)

DECLARE TableCursor CURSOR FOR
SELECT '[' + IST.TABLE_SCHEMA + '].[' + IST.TABLE_NAME + ']' AS [TableName]
FROM INFORMATION_SCHEMA.TABLES IST
WHERE IST.TABLE_TYPE = 'BASE TABLE'

OPEN TableCursor
FETCH NEXT FROM TableCursor INTO @TableName
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT('Rebuilding Indexes on ' + @TableName)
    BEGIN TRY
        EXEC('ALTER INDEX ALL ON ' + @TableName + ' REBUILD WITH (ONLINE = ON)')
    END TRY
    BEGIN CATCH
        PRINT('Cannot do rebuild with Online=On option, taking table ' + @TableName + ' offline to do the rebuild')
        EXEC('ALTER INDEX ALL ON ' + @TableName + ' REBUILD')
    END CATCH
    FETCH NEXT FROM TableCursor INTO @TableName
END

CLOSE TableCursor
DEALLOCATE TableCursor



Moving from Wordpress to Hexo

I love Wordpress - it just works. Its community is huge, and it's drop-dead simple to get running.

I started blogging in 2002, when the blogging landscape was barren. Blogging platforms were few and far between. Heck, "blogging" wasn't even a term.

My first blogging engine was b2, the precursor to Wordpress. In 2003, Wordpress forked b2 and started on the journey to the Wordpress we all now love. At the time, I felt conflicted. Why create a second blogging platform? Why not lend support to b2? Wasn't b2 good enough? Ultimately, it was a good decision. Not too long after the fork, development on b2 stalled.

Wordpress has enjoyed a huge amount of popularity. It is, by far, the most popular CMS (content management system).

So it's with sadness that, after writing over 500 posts on b2 and Wordpress, I am moving on. I simply don't need the functionality and versatility of Wordpress. I am moving to Hexo, a Node-based blog/site generator.

Assets and posts are stored on the file system. Posts are written in Markdown. Hexo takes the Markdown and generates HTML pages, linking the pages as it moves through the content. Depending on which theme you choose and how you customize it, you can generate just about anything.

I hope you enjoy the change. The site is much faster. The comments are now powered by Disqus. These changes will allow me to deliver a better and a faster experience for you.


A General Ledger: A Simple C# Implementation

If you don’t have a basic understanding of general ledgers and double-entry bookkeeping read my post explaining the basics of these concepts.

Over the years I've worked on systems with financial transactions. To have integrity in financial transactions, a general ledger is a must. Without one, you can't account for revenue and accounts payable. Believe you me, when your client wants detailed reports on their cash flow, you had better be able to generate them. Not to mention any legal issues you might encounter.

Early in my career, I had a discussion with a C-level executive in which I explained the importance of a general ledger. I was getting pushback because implementing the general ledger pushed out the timeline a bit. Eventually we won out and implemented a ledger, and thankfully so. Just as we predicted, the requests for reports started rolling in.

A basic schema for a general ledger.

CREATE TABLE [Accounting].[GeneralLedger] (
    [Id]             INT             IDENTITY (1, 1) NOT NULL,
    [Account_Id]     INT             NOT NULL,
    [Debit]          DECIMAL (19, 4) NULL,
    [Credit]         DECIMAL (19, 4) NULL,
    [Transaction_Id] INT             NOT NULL,
    [EntryDateTime]  DATETIME        NOT NULL
);

The C# class.

public class GeneralLedger
{
    public int Id { get; set; }

    public Account Account { get; set; }

    public decimal Debit { get; set; }

    public decimal Credit { get; set; }

    public Transaction Transaction { get; set; }

    public DateTime EntryDateTime { get; set; }
}

In my system I track all the transactions in and out. For example, if a customer pays an invoice, I track the total payment in the general ledger. The credit account is called "Revenue," and the debit account is my company's. Remember, for each financial transaction two records are entered into the general ledger: a credit and a debit.

In my system I wanted higher fidelity, so I added a Transaction to the ledger. The transaction tracks the details of the entry. Only the transaction total is recorded in the general ledger; the transaction details (taxes, per-item costs, etc.) tell the story of how we arrived at the total.
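A minimal sketch of what recording a payment could look like with the GeneralLedger class above. The RecordPayment helper and the Account/Transaction stubs are my own illustration, not part of the original system; the one rule it encodes is from the post: every payment produces a debit row and a matching credit row.

```csharp
using System;
using System.Collections.Generic;

public class Account { public string Name { get; set; } }
public class Transaction { public int Id { get; set; } }

// Repeated from above so the sketch compiles on its own.
public class GeneralLedger
{
    public int Id { get; set; }
    public Account Account { get; set; }
    public decimal Debit { get; set; }
    public decimal Credit { get; set; }
    public Transaction Transaction { get; set; }
    public DateTime EntryDateTime { get; set; }
}

public static class Ledger
{
    // Hypothetical helper: a payment is two ledger rows,
    // a debit against the company account and a credit against Revenue.
    public static IEnumerable<GeneralLedger> RecordPayment(
        Account company, Account revenue, Transaction tx, decimal amount)
    {
        var now = DateTime.UtcNow;
        yield return new GeneralLedger { Account = company, Debit = amount, Transaction = tx, EntryDateTime = now };
        yield return new GeneralLedger { Account = revenue, Credit = amount, Transaction = tx, EntryDateTime = now };
    }
}
```

Because both rows share the same Transaction, the detail records can always be traced back from either side of the entry.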

Let's look at some data. Find an account with some credits and debits. Sum all the debit rows and sum all the credit rows, then subtract the sum of the credits from the sum of the debits. If the number is positive, the account finished in the black (has a profit); if it's negative, the account finished in the red (has a loss).

Your CEO wants to know how much money a client spent with your company. No problem. Again, just sum the debits and credits for the client's account and subtract one from the other.
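The balance calculation described above can be sketched with LINQ. This is a minimal illustration with made-up tuples standing in for ledger rows; the account id and amounts are assumptions, not data from the post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    // Sum the account's debit column, sum its credit column,
    // and subtract one from the other: positive = black, negative = red.
    public static decimal Balance(
        IEnumerable<(int AccountId, decimal Debit, decimal Credit)> rows, int accountId)
    {
        var entries = rows.Where(r => r.AccountId == accountId).ToList();
        return entries.Sum(r => r.Debit) - entries.Sum(r => r.Credit);
    }

    public static void Main()
    {
        var ledger = new[]
        {
            (AccountId: 1, Debit: 1000m, Credit: 0m), // invoice paid into the account
            (AccountId: 1, Debit: 0m, Credit: 50m),   // a bill paid out
            (AccountId: 1, Debit: 0m, Credit: 50m),   // another bill paid out
        };

        Console.WriteLine(Balance(ledger, 1)); // 900
    }
}
```

The same query, filtered to a client's account, answers the CEO's "how much did this client spend" question.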

I hope this has helped you understand the power of the ledger and why it’s important when dealing with financial transactions.


A General Ledger : Understanding the Ledger

What is a general ledger, and why is it important? To find out, read on!

What is a general ledger? A general ledger is a log of all the transactions relating to assets, liabilities, owners' equity, revenue, and expenses. It's how a company can tell whether it's profitable or taking a loss. In the US, this is the most common way to track financials.

To understand how a general ledger works, you must understand double-entry bookkeeping. So, what is double-entry bookkeeping? I'm glad you asked. Imagine you have a company and your first customer pays you $1000. To record this, you add the transaction to the general ledger. Two entries are made: a debit, increasing the value of the assets in your cash account, and a credit against the revenue account (the money given to you by your customer's payment). Think of the cash account as an internal account, meaning an account in which you track both the debits (increases in value) and the credits (decreases in value). The revenue account is an external account, meaning you only track the credit entries. External accounts don't impact your business; they merely tell you where the money is coming from and where it's going.

Here is a visual of our first customer's payment.


If the sum of the debit column and the sum of the credit column don't equal each other, then there is an error in the general ledger. When both sides are equal, the books are said to be balanced. You want balanced books.

Let’s look at a slightly more complex example.

You receive two bills: water and electric, both for $50. You pay them using part of the cash in your cash account. The current balance is $1000. What entries are needed? Take your time. I’ll wait.


Four entries are added to the general ledger: two credit entries for cash and one debit entry for each of the water and electric accounts. Notice the cash entries are credits.

For a bonus: how would we calculate the remaining balance of the cash account? Take your time. Again, I'll wait for you.

To get the remaining balance we need to identify each cash entry.


To get the balance of the Cash account, we do the same thing we did to balance the books, but this time we only look at the cash account. We take the sum of the debit column for the cash account and the sum of the credit column for the cash account, and subtract one from the other. The remaining value is the balance of the cash account.
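Using the numbers from this example, the arithmetic works out as:

```
Cash debits  = $1000
Cash credits = $50 + $50 = $100
Balance      = $1000 - $100 = $900
```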


And that, folks, is the basics of a general ledger and double-entry bookkeeping. I hope you see the importance of this approach, as it gives you the ability to quickly see whether there are errors in your books. You get high fidelity in tracking payments and revenues.

This is just the tip of the iceberg in accounting. If you’d like to dive deeper into accounting, have a look at the accounting equation: Assets = Liabilities + Owner’s Equity.

Hopefully this post has given you a basic understanding of what a general ledger is and how double-entry bookkeeping works. In the next post I’ll go into how to implement a general ledger in C#.


A Tour of Bidwell Mansion

This week I was out of town and didn't find the time to write a technical post. I did, however, tour the Bidwell Mansion in Chico, California, found it fascinating, and wanted to share a few photos from my trip. The technical content will return next week.

John Bidwell was an influential political figure in the 1800s. He helped California present its case for statehood, and he brought agriculture to California. John ran for governor of California three times and lost all three. In many ways, John was ahead of his time.

In the 1800s and into the 1900s, American Indians were hunted and persecuted. In a time when this was a common occurrence, John Bidwell took exception to it. John invited a local Mechoopda Maidu Indian tribe to move their village onto his land; in essence, he provided protection.

The Bidwell Mansion is a three-story Italianate-style mansion. There are 26 rooms, including a ballroom on the third floor.

The Mansion


John Bidwell's Desk


Sitting Room

After dinner, people would migrate into this room to talk. Notice the fireplace. Most rooms in the mansion have a fireplace; wood was the only source of heat.


High-chair with Wheels

The wheels fold down, and it turns into a stroller.





Your own library. I'd love to have one of those.



You don't see such grand staircases anymore.


This is a beautiful mansion. If you are ever in the Chico area, I recommend stopping to see it.


9 JavaScript Libraries that Every Serious Application Needs

Over the past year I have been developing a line-of-business application with AngularJs. AngularJs has many out-of-the-box features that just work. That's the beauty of AngularJs. It's also its downside: you can't be good at everything, and some of the APIs are lacking in performance and features.

Through trial and error, I've discovered which JavaScript libraries fill the gaps. I've compiled a list of them.


jQuery This is the swiss army knife of the JavaScript world. It is immensely helpful when I want to get close to the DOM and manipulate it.

Dropzone I have yet to find a better library for queuing and uploading documents.

Lodash Filtering, sorting, and mapping data with this library is a must. If you're still using for and forEach statements, you're doing it wrong.

Toastr Displaying messages to the client is always a challenge. This library makes it simple. The messages are configurable and look professional.

Radio AngularJs has known issues with pub/sub. Whether you choose $emit or $broadcast, each has its failings. Radio is an alternative pub/sub library that just works and avoids the issues found in the Angular options.

Pikaday Angular UI Bootstrap has a fully featured datepicker. However, it was discovered to have a design flaw that, under certain conditions, slows the page to a 3- or 4-second load time. Pikaday is a lightweight, zippy datepicker alternative that just works.

Accounting.js Formatting money is a pain. It might seem straightforward, but it’s not. Accounting.js takes the pain out of it. It’s another library that just works.

filesize.js Filesize takes a number of bytes and converts it to MB, KB, or GB notation. It's that simple, and it works.

moment.js Anyone dealing with time needs this library. Dates and times are a headache in JavaScript. The browsers try to do too much for you. When it works, it's great; when it doesn't, it's like fighting with a Chinese finger trap.