Easy Way to Track Changes to your CSV data
I’m a big fan of the python library csv-diff. I’ve used it in some custom projects that required me to compare raw datasets over time. With a few python commands, you can output JSON that contains all changes between an old and a new dataset:
{
"added": [
{
"id": "3",
"name": "Bailey",
"age": "1"
}
],
"removed": [
{
"id": "2",
"name": "Pancakes",
"age": "2"
}
],
"changed": [
{
"key": "1",
"changes": {
"age": [
"4",
"5"
]
}
}
],
"columns_added": [],
"columns_removed": []
}
The keys added
, removed
, changed
are pretty self-explanatory. Having such data can be useful to:
- spot trends over time
- timestamp changes in a dedicated table using a microservice
- setting up custom alerts that notify us when a specific value is introduced, modified, or removed
I’m a fan of this library and keep returning to it whenever I have a problem that requires a comparison of CSV datasets.