RAID + backup
2020-02-11 Filed in: (soft|hard)ware
I've solved the problem of having robust storage both on- and off-line: RAID 1 over 3 disks with a write-intent bitmap.
Let me explain. I've been moving more and more information to disk lately, scanning in books and de-duplicating all CD-ROM/DVD backups (we had all our vinyl and negatives digitized a while back). Takes only 250 GB so far, not counting MP3s/M4Vs that is (which have less uniqueness value to me). See this NYT article, where Brewster Kahle summarizes it well: "Paper is no longer the master copy; the digital version is".
But with bits, particularly if some of 'em are in compressed files, data integrity is a huge issue. Which is where RAID makes sense. So I got two big SATA disks and hooked them up to trusty old "teevie", a 6-year-old beige box stashed away well out of sight and hearing. With RAID 1 mirroring, everything gets written to both disks, so if either one fails: no sweat.
But RAID does not guard against fire or theft or "rm -r". It's a redundancy solution, not a backup mechanism. I want to keep an extra copy around somewhere else, just in case. Remote storage is still more expensive than yet another disk, and then you need encryption to prevent unauthorized access. Hmm, I prefer simple, as in: avoid adding complexity.
So now I'm setting up RAID to mirror over three disks. The idea being that a three-disk RAID 1 array runs just fine in "degraded" mode with only two of its disks present, and those two still mirror each other. Once in a while I will plug in the third disk, let the system automatically bring it in sync with the rest, and then take it out and move it off-site again.
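To make the three-disk idea concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not how the Linux RAID driver works internally: the class name ThreeWayMirror, the disk names "a", "b", "c" and the block contents are all made up for illustration.

```python
class ThreeWayMirror:
    """Toy model of a RAID 1 array: every write goes to all attached disks."""

    def __init__(self, names):
        # disk name -> {block number: data}; purely in-memory stand-ins for disks
        self.disks = {name: {} for name in names}
        self.attached = set(names)

    def write(self, block, data):
        # A mirror writes the same data to every disk that is currently attached.
        for name in self.attached:
            self.disks[name][block] = data

    def read(self, block):
        # Any attached member can serve a read; they all hold the same data.
        name = sorted(self.attached)[0]
        return self.disks[name][block]

    def detach(self, name):
        self.attached.discard(name)

    def state(self):
        return "clean" if len(self.attached) == len(self.disks) else "degraded"


array = ThreeWayMirror(["a", "b", "c"])
array.write(0, "scan-001")
array.detach("c")                    # third disk goes off-site
array.write(0, "scan-001-edited")    # lands on a and b only
array.detach("a")                    # one of the two remaining disks dies
print(array.state())                 # degraded
print(array.read(0))                 # scan-001-edited (still served by b)
print(array.disks["c"][0])           # scan-001 (the older off-site copy)
```

The point of the sketch: with the third disk away, the array is "degraded" yet still redundant, and the off-site disk quietly keeps the state from the last time it was plugged in.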
But the story does not end here. Normally, bringing a disk up to date is the same as adding a new disk: RAID will do a full copy, taking many hours when the disks are half a terabyte each. Which is where the "write-intent bitmap" comes in: it tracks the blocks which have not been fully synced to all disks yet. Whenever a block is known to be in sync everywhere, its bit is cleared. What this means is that I can now put all three disks in, let them do their thing, and after a while the bitmap will be all clear. Once I pull out the third disk, bits will start accumulating as changes are written to the other two. Later, when putting the third disk back on-line, the system will automatically copy only the changed blocks to it. No need to remember any commands or start anything, just put it on-line. Quick and easy!
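The bitmap bookkeeping is easy to picture with a small simulation. What follows is a minimal sketch in Python under simplifying assumptions: one bit per chunk kept in a plain set, an in-memory list per disk, and a made-up class BitmapMirror with made-up methods pull and re_add. The real thing lives in the kernel and is more careful about ordering and crashes.

```python
class BitmapMirror:
    """Toy RAID 1 array with a write-intent bitmap (one bit per chunk)."""

    def __init__(self, n_disks, n_chunks):
        self.disks = [[None] * n_chunks for _ in range(n_disks)]
        self.online = [True] * n_disks
        self.dirty = set()   # chunk numbers whose bit is currently set

    def write(self, chunk, data):
        self.dirty.add(chunk)                 # set the bit before writing
        for d, up in enumerate(self.online):
            if up:
                self.disks[d][chunk] = data
        if all(self.online):
            self.dirty.discard(chunk)         # in sync everywhere: clear the bit

    def pull(self, d):
        self.online[d] = False                # disk d goes off-site

    def re_add(self, d):
        # Copy only the chunks whose bit is set, from any disk that stayed online.
        source = next(i for i, up in enumerate(self.online) if up)
        copied = 0
        for chunk in sorted(self.dirty):
            self.disks[d][chunk] = self.disks[source][chunk]
            copied += 1
        self.online[d] = True
        if all(self.online):
            self.dirty.clear()
        return copied


array = BitmapMirror(n_disks=3, n_chunks=1000)
for c in range(1000):
    array.write(c, "v1")          # all three disks in sync, bitmap stays clear
array.pull(2)                     # third disk leaves the house
array.write(7, "v2")
array.write(42, "v2")
array.write(7, "v3")              # same chunk again: still just one bit
copied = array.re_add(2)
print(copied)                             # 2
print(array.disks[2] == array.disks[0])   # True
```

In this run only 2 of the 1000 chunks are copied when the third disk comes back, instead of all of them, which is exactly the saving the bitmap buys.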
If a disk fails: buy a new one, replace it, done. Every few months, I'll briefly insert the third disk and then safely store it off-site again. If I were to ever mess up really badly (e.g. "rm -r"), I can revert via the third one: mark the two main disks as failed, and put the third one in for recovery to its older version.
Methinks it's the perfect setup for my needs. Total cost under €300 for the disks plus cheapo drive bays. Welcome to the digitalization decade.