I read many posts talking about importance of having multiple copies. but the problem is, even if you have multiple copies, how do you make sure that EVERY FILE in each copy is good. For instance, imagine you want to view a photo taken a few years ago, when you checkout copy 1 of your backup, you find it already corrupted. Then you turn to copy 2/3, find this photo is good. OK you happily discard copy 1 of backup and keep 2/3. Next day you want to view another photo 2, and find that photo 2 in backup copy 2 is dead but good in copy 3, so you keep copy 3, discard copy 3. Now some day you find something is wrong in copy 3, and you no longer have any copies with everything intact.

Someone may say, when we find that some files for copy 1 are dead, we make a new copy 4 from copy 2 (or 3), but problem is, there are already dead files in this copy 2, so this new copy would not solve the issue above.

Just wonder how do you guys deal with this issue? Any idea would be appreciated.

  • @FizzicalLayerB
    link
    fedilink
    English
    17 months ago

    I realize there are solutions, but I wanted my own for various reasons (better fit to the peculiar way I store and backup).

    It was straightforward to write a python script to crawl a directory tree, adding files to an sqlite database. The script has a few commands:

    - “check” computes checksums on files whose modification times have changed since last check, or on any file whose checksum is older than X days (find bitrot this way).

    - “parity” Use par2 to compute parity files for all files in database. Store these in a “.par2” directory in the directory tree root so it doesn’t clutter the directory tree.

    I like this because I can compute checksums and parity files per directory tree (movies, music, photos, etc), and by disk (no raid here, just JBOD + mergerfs). Each disk corresponds exactly to a backup set kept in a pelican case.

    The sqlite database has the nice side effect that checksum / parity computation can run in the background and be interrupted at any time (it takes a loooooooong time). The commits are atomic, so machines crashes or have to shut down, it’s easy to resume from previous point.

    Surely… SURELY… someone has already written this. But it took me a couple of afternoons to roll my own. Now I have parity and the ability detect bitrot on all live disks and backup sets.