I had a great talk with my friend Dave today. He’s a data scientist, and he definitely knows his stuff.
We talked about a number of things, but the one that really stuck with me was data migration. His advice: never migrate data with hand-written code; use a tool. If you write it yourself, you are reinventing the wheel, you are locked into your own solution, all the risk is on you, and the result is inflexible. That said, he went on to explain that the most efficient way to move data is with a primary key and a row hash.
The destination side requests every primary key and row hash from the source. For each primary key, it checks whether that row exists at the destination. If it does, it compares the source hash with the hash of the destination row. If the hashes match, the row is up to date and the process moves on to the next row. If they don’t match, the primary key is added to a list of rows to request from the source. If the row does not exist at the destination, its primary key is added to that same list. Once the comparison is complete, all the stale or missing rows are requested from the source and persisted to the destination.
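To make the comparison concrete, here is a minimal Python sketch of the process described above. It assumes both tables fit in memory as dicts keyed by primary key, and that a row hash is SHA-256 over the row serialized as JSON with sorted keys. The names `row_hash`, `keys_to_fetch`, and `sync` are hypothetical and don't come from any migration tool; in a real migration each "request" would be a query against the source database.

```python
import hashlib
import json
from typing import Any, Dict, List

Row = Dict[str, Any]


def row_hash(row: Row) -> str:
    """Hash a row's values in a stable order.

    Assumption: JSON with sorted keys is a good enough canonical form;
    default=str lets values like dates be serialized.
    """
    canonical = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def keys_to_fetch(source_hashes: Dict[Any, str], dest_rows: Dict[Any, Row]) -> List[Any]:
    """Return the primary keys whose rows are missing or stale at the destination."""
    to_fetch = []
    for pk, src_hash in source_hashes.items():
        dest_row = dest_rows.get(pk)
        if dest_row is None:
            to_fetch.append(pk)  # row does not exist at the destination
        elif row_hash(dest_row) != src_hash:
            to_fetch.append(pk)  # row exists but its hash differs: stale
        # hashes match: nothing to do, move on to the next key
    return to_fetch


def sync(source_rows: Dict[Any, Row], dest_rows: Dict[Any, Row]) -> List[Any]:
    # Step 1: the source supplies only (primary key, row hash) pairs.
    source_hashes = {pk: row_hash(row) for pk, row in source_rows.items()}
    # Step 2: the destination works out which rows it needs.
    needed = keys_to_fetch(source_hashes, dest_rows)
    # Step 3: request just those full rows from the source and persist them.
    for pk in needed:
        dest_rows[pk] = dict(source_rows[pk])
    return needed


# Usage: row 1 matches, row 2 is stale, row 3 is missing.
source = {
    1: {"name": "Ada", "age": 36},
    2: {"name": "Alan", "age": 41},
    3: {"name": "Grace", "age": 85},
}
dest = {
    1: {"name": "Ada", "age": 36},
    2: {"name": "Alan", "age": 40},
}
changed = sync(source, dest)
print(changed)  # [2, 3]
```

The appeal of the design is that rows which already match cost only a key and a hash; only stale or missing rows are fetched in full. Like the process described, this sketch does not remove rows that were deleted at the source.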
If you enjoy grunt work, you’ll build the above yourself. If you’re a developer who enjoys building robust applications, you’ll leave the grunt work to the tools.