2015-07-04

Ceph and BitRot

BitRot is the tendency for data to degrade over time on storage devices. CERN reports error rates are at the 10-7 level, so BitRot is a significant problem. This short article talks about how to deal with BitRot on Ceph clusters.

Ceph's main weapon against BitRot is the deep scrubbing process. Deep scrubbing verifies data in a placement group against its checksum. If an object fails this test then the placement group is marked inconsistent and the administrator should repair it. Note that deep-scrub only detects and inconsistency and does not attempt an automatic repair. By contrast, a normal scrub only checks object sizes and attributes.

Deep scrubbing is resource intensive and can cause a noticeable performance drop. You can temporarily disable scrubbing and deep-scrubbing:

ceph osd set noscrub
ceph osd set nodeep-scrub
And the re-enable scrubbing with:
ceph osd unset noscrub
ceph osd unset nodeep-scrub

The configuration options for scrubbing allow the administrator to suggest how quiet the cluster should be before initiating a scrub, how long the cluster is allowed to go before it must scrub, and how many scrubs can run in parallel.

It's also technically possible to manually trigger scrubs via the command line. This means that an administrator that doesn't mind writing code could scrub the placement groups in different pools according to different policies. This article scrubs on a seven day cycle at night-time.

Another source of data degradation can occur in RAM before the data is written into the primary placement group. The best way to guard against this happening is to use ECC RAM. This particular problem is not unique to Ceph but is exacerbated because the nature of clusters is that they increase the number of potential corruption point in the supply-chain between application and storage device.

Ceph uses an underlying filesystem as a backing store and this in turn sits on a block device. There are choices an administrator might make in those layers to also help guard against BitRot - but there are also performance trade offs. For example ext4 and XFS do not protect against BitRot but ZFS and btrfs can if they are configured correctly. Ars Technica has an excellent article on the topic called BitRot and Atomic COWs: Inside next-gen filesystems. Also, don't expect that RAID will detect or repair BitRot.

For more technical details you can read this Q&A post by Sage Weil of InkTank from the Ceph mailing list.

No comments:

Post a Comment