How do you know if a backup is dead (even if you have multiple copies)?

Illustrious-Pay-7516 · 1 year ago

How do you know if a backup is dead (even if you have multiple copies)?

hobbyhacker · 1 year ago

If you use real backups, and not just simple copies, then your backup software has a verify function. For simple copies, you should use hash files or a tool that can build a hash database and verify against it. Btw, you should already be using hash checking for live data anyway. For archiving, you can create WinRAR archives with a 10% recovery record, so the archive can self-verify and self-repair.
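For the "hash database" idea on simple copies, here is a minimal Python sketch, not any particular tool's implementation. The function names (`sha256_of`, `build_manifest`, `verify`) and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never sit in memory whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare a tree (the original or any copy) against a saved manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(
            k for k in manifest if k in current and current[k] != manifest[k]
        ),
        "extra": sorted(set(current) - set(manifest)),
    }
```

Build the manifest once from the source, save it somewhere outside the tree (for example as JSON), and run `verify` against each copy. Anything listed under "missing" or "changed" means that copy needs attention.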

Silencer306 · 1 year ago

What do you mean by “real backups”?

hobbyhacker · 1 year ago

Dedicated software that can create verifiable historical backup files, like Veeam or Macrium, or the newer generation like Duplicacy, Arq, Borg, etc. All of them have integrity verification built in.

Far_Marsupial6303 · 1 year ago

Ideally you would have generated and saved a hash before you copied your files, as a control. Otherwise, it’s just a probability game. If the hashes of copies 1 and 2 match each other but not copy 3, then 1 and 2 are probably correct. If none of the three match, you toss a coin.
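The "probability game" without a control hash amounts to a majority vote. A small Python sketch of that idea follows; the function name `pick_majority` and the copy labels are hypothetical. If you do have a saved control hash, compare each copy against that instead and skip the vote.

```python
from collections import Counter
from typing import Optional


def pick_majority(hashes: dict[str, str]) -> tuple[Optional[str], list[str]]:
    """Given {copy_name: hash}, return (majority_hash, suspect_copies).

    majority_hash is None when no hash is shared by more than half of the
    copies, which is the "toss a coin" case.
    """
    if not hashes:
        raise ValueError("need at least one copy")
    winner, votes = Counter(hashes.values()).most_common(1)[0]
    if votes * 2 <= len(hashes):
        # No strict majority: every copy is suspect.
        return None, sorted(hashes)
    suspects = sorted(name for name, h in hashes.items() if h != winner)
    return winner, suspects
```

For example, `pick_majority({"copy1": "aa", "copy2": "aa", "copy3": "bb"})` flags `copy3` as the odd one out. A majority only makes the matching copies probably correct, not certainly.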

If you’re on Windows, I recommend using TeraCopy for all your file copying (always copy, never move!) with verify turned on, which will perform a CRC check and generate a hash that you can then save. You can also use it to test your files after the fact and generate a hash.
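If you are not on Windows, the same copy-then-verify habit can be approximated in a few lines of Python. This is only a sketch of the general idea, not how TeraCopy works internally; `file_sha256` and `copy_verified` are made-up names, and SHA-256 is an assumed choice of hash.

```python
import hashlib
import shutil
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src: Path, dst: Path) -> str:
    """Copy src to dst, re-read the copy, and return the hash if it matches."""
    source_hash = file_sha256(src)
    shutil.copy2(src, dst)  # copy, never move: the source stays untouched
    if file_sha256(dst) != source_hash:
        raise OSError(f"verification failed for {dst}")
    return source_hash  # save this alongside the copy as your control
```

One caveat: reading the destination back right after writing may be served from the operating system's cache rather than the disk, so a later re-check against the saved hash is still worthwhile.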

FizzicalLayer · 1 year ago

I realize there are solutions, but I wanted my own for various reasons (a better fit for the peculiar way I store and back up).

It was straightforward to write a Python script to crawl a directory tree, adding files to an SQLite database. The script has a few commands:

- “check” computes checksums on files whose modification times have changed since the last check, or on any file whose checksum is older than X days (this is how bitrot gets found).

- “parity” uses par2 to compute parity files for all files in the database, and stores them in a “.par2” directory at the root of the directory tree so they don’t clutter the tree.
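The script itself was not posted, so the following is only a guess at how the “check” rule could look in Python. The table layout (`files` with `path`, `mtime`, `sha256`, `checked_at`) and the function names are assumptions, not the author's code.

```python
import sqlite3
import time

# Hypothetical schema: one row per file, with the time of the last checksum.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path       TEXT PRIMARY KEY,
    mtime      REAL NOT NULL,
    sha256     TEXT NOT NULL,
    checked_at REAL NOT NULL
)
"""


def needs_check(conn, path, mtime, max_age_days, now=None):
    """True if the file is new, was modified, or its checksum is stale."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT mtime, checked_at FROM files WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        return True  # never seen before
    stored_mtime, checked_at = row
    if stored_mtime != mtime:
        return True  # modification time changed since the last check
    return now - checked_at > max_age_days * 86400


def classify(stored_mtime, stored_hash, mtime, new_hash):
    """Interpret a fresh checksum against the stored record."""
    if new_hash == stored_hash:
        return "ok"
    if mtime != stored_mtime:
        return "modified"  # content and mtime both changed: a normal edit
    return "possible bitrot"  # content changed but mtime did not
```

The interesting case is the last one: a checksum that changed while the modification time stayed the same is what suggests silent corruption rather than an ordinary edit.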

I like this because I can compute checksums and parity files per directory tree (movies, music, photos, etc.), and per disk (no RAID here, just JBOD + mergerfs). Each disk corresponds exactly to a backup set kept in a Pelican case.

The SQLite database has the nice side effect that checksum / parity computation can run in the background and be interrupted at any time (it takes a loooooooong time). The commits are atomic, so if the machine crashes or has to shut down, it’s easy to resume from the previous point.
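The resume-after-interruption property can be sketched in Python by committing one file at a time. Again, this is an illustration under assumed names (`init_db`, `add_paths`, `process_pending`, and a `files` table with a nullable `sha256` column), not the script described above.

```python
import sqlite3
from typing import Callable, Iterable


def init_db(conn: sqlite3.Connection) -> None:
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, sha256 TEXT)"
        )


def add_paths(conn: sqlite3.Connection, paths: Iterable[str]) -> None:
    """Queue files; a NULL sha256 marks work that is still pending."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO files (path) VALUES (?)", [(p,) for p in paths]
        )


def process_pending(conn: sqlite3.Connection, hasher: Callable[[str], str]) -> int:
    """Hash every pending file, committing after each one. Returns the count."""
    pending = [
        row[0]
        for row in conn.execute(
            "SELECT path FROM files WHERE sha256 IS NULL ORDER BY path"
        )
    ]
    done = 0
    for path in pending:
        digest = hasher(path)  # the slow part; may be interrupted at any time
        with conn:  # one transaction per file: commit on success, roll back on error
            conn.execute(
                "UPDATE files SET sha256 = ? WHERE path = ?", (digest, path)
            )
        done += 1
    return done
```

Because each result is committed on its own, a crash or Ctrl-C loses at most the file in progress. Running `process_pending` again picks up only the rows that are still NULL.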

Surely… SURELY… someone has already written this. But it took me a couple of afternoons to roll my own. Now I have parity and the ability to detect bitrot on all live disks and backup sets.

Silencer306 · 1 year ago

Mind sharing it on GitHub or something?

fediverser · 1 year ago

This post is an automated archive from a submission made on /r/DataHoarder, powered by Fediverser software running on alien.top. Responses to this submission will not be seen by the original author until they claim ownership of their alien.top account. Please consider reaching out to them to let them know about this post and help them migrate to Lemmy.