nfirvine.com wiki

TorrentBasedDiskImaging

Filed in: Ideas.TorrentBasedDiskImaging · Modified on: Thu, 19 Jan 12

Symantec Ghost is kind of the industry standard for imaging a whole pile of computers at once. Ghost has a lot of shortcomings, but one thing that set it apart was the ability to distribute an image over a network to many computers by leveraging the mighty multicast packet. Multicast is, simply, a low-level networking function whereby the sender sends a single packet to the switch, and the switch, knowing the packet is addressed to a multicast group, forwards copies of it to every member of that group. This means drastically lower network overhead: with multicast, the sender can transmit the image to many machines in about as many packets as it would take to transmit it to one, instead of the count growing with every machine added (in Big-O terms, O(1) in the number of recipients instead of O(n)).
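To put rough numbers on the O(1)-versus-O(n) point, here is a small Python sketch that counts the data packets the sender has to emit. It assumes a fixed 1400 bytes of image data per packet (a hypothetical figure chosen to fit under a typical 1500-byte Ethernet MTU) and ignores headers, acknowledgements and retransmissions; `packets_sent` is a name made up for illustration, not anything from Ghost.

```python
def packets_sent(image_bytes: int, payload_bytes: int, n_targets: int, multicast: bool) -> int:
    """Data packets the sender emits to deliver one image (idealized: no loss, no ACKs)."""
    # Packets needed for a single copy of the image (integer ceiling division).
    per_copy = -(-image_bytes // payload_bytes)
    # Multicast: one copy goes on the wire and the switch replicates it to the group.
    # Unicast: the sender has to transmit a separate copy to every target.
    return per_copy if multicast else per_copy * n_targets


if __name__ == "__main__":
    image = 4 * 1024**3   # hypothetical 4 GiB image
    payload = 1400        # assumed bytes of image data per packet
    for n in (1, 10, 50):
        print(n, packets_sent(image, payload, n, True), packets_sent(image, payload, n, False))
```

For 50 machines that works out to 3,067,834 packets with multicast versus 153,391,700 with one unicast stream per machine.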

However, in practice, this has some problems. The Ghost multicast protocol seems to require each node to receive the image sequentially and more or less in sync with the other nodes: it streams the image straight to the disk, rather than streaming it somewhere temporary (where?) and then writing it to the disk from there. This is a problem if you've got different NICs in your machines or, hell, machines on different models of switches, since that can throw the ghostcast out of sync and cause it to fail. And forget about jumping VLANs or NATs.

Of course, when Ghost works, it works wonders, even if it has become a gross bundle of hacks by now. Many a FOSS project has aped Ghost, perhaps the most popular being Clonezilla, which uses udpcast to multicast. But multicast is multicast, with all the same limitations.

But hey, you know what does jump VLANs and NATs and work perfectly well out of sequence? BitTorrent!
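Part of why BitTorrent copes with out-of-sequence delivery is that, in the original protocol, the torrent's metadata carries a SHA-1 hash for every fixed-size piece, so each piece can be checked the moment it arrives, in whatever order it comes. The Python sketch below mimics only that idea; it is not a BitTorrent client, and the piece size, function names and toy image are all hypothetical.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # hypothetical piece size; real torrents pick their own


def piece_hashes(image: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """SHA-1 digest of each fixed-size piece, as a seeder would publish them."""
    return [hashlib.sha1(image[i:i + piece_size]).digest()
            for i in range(0, len(image), piece_size)]


def accept_pieces(arrivals, hashes: list[bytes]) -> dict[int, bytes]:
    """Keep every (index, payload) pair whose hash matches, in any arrival order."""
    received = {}
    for index, payload in arrivals:
        if hashlib.sha1(payload).digest() == hashes[index]:
            received[index] = payload  # verified without looking at any other piece
    return received


def reassemble(received: dict[int, bytes], n_pieces: int) -> bytes:
    """Join the pieces back into the image once all of them are present."""
    return b"".join(received[i] for i in range(n_pieces))


if __name__ == "__main__":
    image = bytes(range(256)) * 4096  # toy 1 MiB "disk image", i.e. 4 pieces
    hashes = piece_hashes(image)
    pieces = [(i, image[i * PIECE_SIZE:(i + 1) * PIECE_SIZE]) for i in range(len(hashes))]
    shuffled = [pieces[3], pieces[0], pieces[2], pieces[1]]  # out-of-order delivery
    shuffled.append((1, b"corrupt"))  # a bad copy fails its hash and is ignored
    got = accept_pieces(shuffled, hashes)
    print(len(got), reassemble(got, len(hashes)) == image)
```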

Pros

  • The performance will be lower than a perfect multicast session, but better than many separate unicast streams or a directed broadcast, since instead of a single sender, every block can be served by the original sender plus every machine that already has that block.
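A toy model of why more senders helps: assume every machine has the same upload speed, and that in each "round" any machine holding a block can pass one full copy of it to one machine that lacks it. A lone unicast sender then needs one round per target, while in the swarm the number of holders doubles every round, so the round count grows only logarithmically; a perfect multicast would need a single round. This is an idealized sketch that ignores piece selection, choking and uneven links, and the function names are hypothetical.

```python
def unicast_rounds(n_targets: int) -> int:
    """A single sender uploads one full copy per round."""
    return n_targets


def swarm_rounds(n_targets: int) -> int:
    """Rounds until the original sender plus n_targets peers all hold the block."""
    holders, rounds = 1, 0  # initially only the original sender has the block
    while holders < n_targets + 1:
        # every current holder uploads the block to one peer that still lacks it
        holders = min(2 * holders, n_targets + 1)
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (1, 7, 50, 500):
        print(n, unicast_rounds(n), swarm_rounds(n))
```

With 50 targets the model gives 50 rounds for unicast and 6 for the swarm, against 1 for an ideal multicast.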

Cons ("Challenges")

  • You need to store the image somewhere on the target machine in a temporary area. This is unavoidable whenever you're not streaming straight from network to memory to disk. The disk would need to be, at a minimum, roughly twice the size of the image (depending on compression and on how much of the partition is empty space). There are several ways around this, such as using the space set aside for a future swap partition, using free space, or deleting the image as it's used (hard!), but which one works depends on the situation.
  • You can't write the image to disk in parallel with receiving it, since the pieces arrive out of order.
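Both challenges come from the same place: pieces arrive in whatever order peers can supply them, but an imager that writes the disk front to back can only write the contiguous run of pieces starting at the beginning, and everything else has to wait in the temporary area. The Python sketch below models that, assuming a strictly sequential writer (the function name and piece numbering are made up for illustration), and reports the most pieces that ever had to sit in temporary storage.

```python
def sequential_write(arrival_order: list[int]) -> tuple[list[int], int]:
    """Simulate a front-to-back disk writer fed by out-of-order piece arrivals.

    Returns how many pieces have been written to disk after each arrival, and
    the peak number of pieces parked in temporary storage waiting for a gap.
    """
    parked = set()   # pieces received but not yet writable
    next_piece = 0   # index of the next piece the disk writer needs
    written_log = []
    peak_parked = 0
    for piece in arrival_order:
        parked.add(piece)
        # Flush the contiguous run that now starts at next_piece.
        while next_piece in parked:
            parked.remove(next_piece)
            next_piece += 1
        written_log.append(next_piece)
        peak_parked = max(peak_parked, len(parked))
    return written_log, peak_parked


if __name__ == "__main__":
    print(sequential_write([0, 1, 2, 3]))  # in order: nothing waits
    print(sequential_write([3, 1, 2, 0]))  # first piece last: everything waits
```

In the worst case, where piece 0 arrives last, nearly the whole image is parked before the first byte can be written, which is why the temporary area has to be sized for the full image.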
