This paper describes version 2.0 of jerasure, a C library that supports erasure coding in storage applications. We describe both the techniques and algorithms and the interface to the code. Thus, this paper serves as a quasi-tutorial and a programmer's guide.
Version 2.0 does not change the interface of jerasure 1.2. Instead, it changes the software that implements the Galois Field back-end. It now uses GF-Complete, which is much more flexible and powerful than the previous Galois Field arithmetic library. In particular, GF-Complete leverages Intel SIMD instructions, so Reed-Solomon coding can be blazingly fast.
To use jerasure, you must first download and install GF-Complete. Both libraries are posted and maintained at bitbucket.org.
The library itself is protected by the New BSD License. It is free to use and modify within the bounds of that License. None of the techniques implemented in this library have been patented.
Jerasure supports a horizontal mode of erasure codes. We assume that we have k devices that hold data. To that, we will add m devices whose contents will be calculated from the original k devices. If the erasure code is a Maximum Distance Separable (MDS) code, then the entire system will be able to tolerate the loss of any m devices.
As depicted in Figure 1, the act of encoding takes the original k data devices, and from them calculates m coding devices. The act of decoding takes the collection of (k + m) devices with erasures, and from the surviving devices recalculates the contents of the original k data devices.
Most codes have a third parameter w, which is the word size. The description of a code views each device as holding w bits worth of data. The data devices are denoted D_0 through D_{k-1} and the coding devices are denoted C_0 through C_{m-1}. Each device D_i or C_j holds w bits, denoted d_{i,0}, ..., d_{i,w-1} and c_{i,0}, ..., c_{i,w-1}. In reality, of course, devices hold megabytes of data. To map the description of a code to its realization in a real system, we do one of two things:
Suppose we have k data words and m coding words, each composed of w bits. We can describe the state of a matrix-based coding system by a matrix-vector product, as depicted in Figure 3. The matrix is called a distribution matrix and is a (k + m) × k matrix. The elements of the matrix are numbers in GF(2^w) for some value of w. This means that they are integers between 0 and 2^w − 1, and arithmetic is performed using Galois Field arithmetic: addition is equal to XOR, and multiplication is implemented in a variety of ways. The Galois Field arithmetic library in [Pla07a] has procedures that implement Galois Field arithmetic.
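To make the arithmetic concrete, here is a minimal sketch of GF(2^8) multiplication by shift-and-XOR. It is illustrative only, not the library's optimized implementation, and it assumes the primitive polynomial 0x11d, a common choice for w = 8:

```c
#include <stdint.h>

/* Shift-and-XOR multiplication in GF(2^8), reducing with the primitive
 * polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d).  Addition in GF(2^w) is
 * plain XOR, which is why erasure-coding dot products reduce to XORs and
 * multiplications. */
uint8_t gf8_mult(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b != 0) {
        if (b & 1)
            p ^= a;                      /* "add" a into the product */
        b >>= 1;
        /* Double a, folding the overflow bit back in via the polynomial. */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
    }
    return p;
}
```

For example, gf8_mult(3, 7) is 9, since (x + 1)(x^2 + x + 1) = x^3 + 1 when coefficients are taken mod 2.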
The top k rows of the distribution matrix compose a k × k identity matrix. The remaining m rows are called the coding matrix, and are defined in a variety of ways [Rab89, Pre89, BKK+95, PD05]. The distribution matrix is multiplied by a vector that contains the data words and yields a product vector containing both the data and the coding words. Therefore, to encode, we need to perform m dot products of the coding matrix with the data.
To decode, we note that each word in the system has a corresponding row of the distribution matrix. When devices fail, we create a decoding matrix from the k rows of the distribution matrix that correspond to non-failed devices. Note that this matrix multiplied by the original data equals the k survivors whose rows we selected. If we invert this matrix and multiply both sides of the equation by it, we obtain a decoding equation: the inverted matrix multiplied by the survivors equals the original data.
As with the matrix-vector product in GF(2^w), each row of the product corresponds to a row of the BDM, and is computed as the dot product of that row and the data bits. Since all elements are bits, we may perform the dot product by taking the XOR of each data bit whose element in the matrix's row is one. In other words, rather than performing the dot product with additions and multiplications, we perform it only with XORs. Moreover, the performance of this dot product is directly related to the number of ones in the row. Therefore, it behooves us to find matrices with few ones.
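The XOR-only dot product can be sketched as follows. This is a simplified, bit-level illustration with a hypothetical function name; jerasure's real routines apply the same pattern to whole packets of bytes.

```c
#include <stddef.h>

/* Multiply a bit-matrix (rows x cols, stored row-major as 0/1 ints) by a
 * vector of data bits, using only XOR: each product bit is the XOR of the
 * data bits whose matrix entries are one.  The cost of each row is exactly
 * its number of ones. */
void bitmatrix_dotprod(const int *matrix, int rows, int cols,
                       const int *data, int *product)
{
    for (int r = 0; r < rows; r++) {
        int bit = 0;
        for (int c = 0; c < cols; c++)
            if (matrix[r * cols + c])
                bit ^= data[c];          /* one XOR per one in the row */
        product[r] = bit;
    }
}
```

With the 2 × 3 bit-matrix {1,0,1; 1,1,0} and data bits {1,1,0}, the product bits are {1, 0}; the work performed is exactly the number of ones in the matrix, which is why sparse matrices are desirable.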
Decoding with bit-matrices is the same as with matrices over GF(2^w), except now each device corresponds to w rows of the matrix, rather than one. Also keep in mind that a bit in this description corresponds to a packet in the implementation.
While the classic construction of bit-matrices starts with a standard distribution matrix in GF(2w), it is possible to construct bit-matrices that have no relation to Galois Field arithmetic yet still have desired coding and decoding properties. The minimal density RAID-6 codes work in this fashion.
Since the matrix is sparse, it is more efficient to precompute the coding operations, rather than traversing the matrix each time one encodes. The data structure that we use to represent encoding is a schedule, which is a list of 5-tuples:
As noted in [HDRT05, Pla08], one can derive schedules for bit-matrix encoding and decoding that make use of common expressions in the dot products, and therefore can perform the bit-matrix-vector product with fewer XOR operations than simply traversing the bit-matrix. This is how RDP encoding works with optimal performance [CEG+04], even though there are more than kw ones in the last w rows of its BDM. We term such scheduling smart scheduling, and scheduling by simply traversing the matrix dumb scheduling.
There are additional techniques for scheduling that work better than what we have implemented here [HLC07, Pla10, PSR12]. Embedding these within jerasure is the topic of future work.
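A dumb schedule can be applied with a loop like the following sketch. The 5-tuple layout used here, {op, source device, source packet, destination device, destination packet}, with op 0 meaning copy and op 1 meaning XOR, is an assumption of this illustration, as are the function and constant names.

```c
#include <string.h>

#define PACKETSIZE 4  /* bytes per packet; a hypothetical, tiny value */

/* Apply a schedule to an array of device buffers, each holding w packets of
 * PACKETSIZE bytes.  Assumed tuple layout (illustrative only):
 * {op, source device, source packet, destination device, destination packet},
 * where op 0 copies the source packet to the destination, op 1 XORs it in,
 * and a negative op terminates the schedule. */
void apply_schedule(int (*sched)[5], char **ptrs)
{
    for (int i = 0; sched[i][0] >= 0; i++) {
        char *src = ptrs[sched[i][1]] + sched[i][2] * PACKETSIZE;
        char *dst = ptrs[sched[i][3]] + sched[i][4] * PACKETSIZE;
        if (sched[i][0] == 0) {
            memcpy(dst, src, PACKETSIZE);          /* copy */
        } else {
            for (int b = 0; b < PACKETSIZE; b++)   /* XOR-accumulate */
                dst[b] ^= src[b];
        }
    }
}
```

For instance, the schedule {{0,0,0,2,0}, {1,1,0,2,0}, {-1,0,0,0,0}} computes device 2's first packet as the XOR of the first packets of devices 0 and 1: a copy followed by one XOR.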
7     PART 2 OF THE LIBRARY: KERNEL ROUTINES
When we use an argument from the list above, we omit its type for brevity.
Each of these returns an integer which is zero on success or -1 if unsuccessful. Decoding can be unsuccessful if there are too many erasures.
Finally, jerasure.c keeps track of three quantities:
There is one procedure that allows access to those values:
The procedure galois_w08_region_multiply() and its kin have a parameter that causes them to XOR the product with another region, with the same overhead as simply performing the multiplication. For that reason, when these procedures are called with this functionality enabled, the resulting XORs are not counted with the XORs performed by galois_region_xor().
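The multiply-and-optionally-XOR behavior can be sketched in a few lines. This is a toy stand-in for the library routine, assuming the 0x11d polynomial for w = 8; the function names are illustrative, not jerasure's implementation.

```c
#include <stdint.h>

/* Shift-and-XOR GF(2^8) multiply over the primitive polynomial 0x11d. */
uint8_t gf8_mult(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b != 0) {
        if (b & 1) p ^= a;
        b >>= 1;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
    }
    return p;
}

/* Toy region multiply for w = 8: multiply every byte of src by the constant
 * c.  When add is nonzero, XOR the product into dst instead of overwriting
 * it; the loop body costs the same either way, which is why the XOR variant
 * carries no extra overhead. */
void region_mult_w08(const uint8_t *src, uint8_t c, int bytes,
                     uint8_t *dst, int add)
{
    for (int i = 0; i < bytes; i++) {
        uint8_t p = gf8_mult(src[i], c);
        dst[i] = add ? (uint8_t)(dst[i] ^ p) : p;
    }
}
```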
In the Examples directory, there are eight programs that demonstrate nearly every procedure call in jerasure.c. They are named jerasure_0x for 0 < x ≤ 8. There are also programs to demonstrate Reed-Solomon coding, Cauchy Reed-Solomon coding and Liberation coding. Finally, there are programs that encode and decode files.
All of the example programs, with the exception of the encoder and decoder, emit HTML as output. Many may be read easily as text, but some of them format better in a web browser.
This demonstrates usage of jerasure_print_bitmatrix() and jerasure_matrix_to_bitmatrix().
The parameter k must be less than 2^w. The element in row i, column j is set to:

    1 / (i ⊕ (2^w − j − 1))

where division is in GF(2^w), ⊕ is XOR, and subtraction is regular integer subtraction. When k > 2^(w−1), there will be i and j such that i ⊕ (2^w − j − 1) = 0. When that happens, we set that matrix element to zero.
After creating the matrix and printing it, we test whether it is invertible. If k ≤ 2^(w−1), then it will be invertible. Otherwise it will not. If it is invertible, the program prints the inverse, then multiplies the inverse by the original matrix and prints the product, which is the identity matrix. Examples:
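The element construction can be sketched as follows for w = 8. The brute-force inverse and all function names here are illustrative assumptions, not jerasure's API; the library computes inverses far more efficiently.

```c
#include <stdint.h>

/* Shift-and-XOR GF(2^8) multiply over the primitive polynomial 0x11d. */
uint8_t gf8_mult(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b != 0) {
        if (b & 1) p ^= a;
        b >>= 1;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
    }
    return p;
}

/* Brute-force multiplicative inverse; adequate for a demonstration. */
uint8_t gf8_inv(uint8_t v)
{
    for (int x = 1; x < 256; x++)
        if (gf8_mult(v, (uint8_t)x) == 1)
            return (uint8_t)x;
    return 0;   /* v == 0 has no inverse */
}

/* Element in row i, column j for w = 8: 1 / (i XOR (2^w - j - 1)) in
 * GF(2^w), or zero when the XOR itself is zero. */
uint8_t matrix_element_w08(int i, int j)
{
    int v = i ^ (256 - j - 1);
    return v ? gf8_inv((uint8_t)v) : 0;
}
```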
This demonstrates usage of jerasure_print_matrix(), jerasure_invertible_matrix(), jerasure_invert_matrix() and jerasure_matrix_multiply().
This demonstrates usage of jerasure_print_bitmatrix(), jerasure_matrix_to_bitmatrix(), jerasure_invertible_bitmatrix(), jerasure_invert_bitmatrix() and jerasure_matrix_multiply().
where division is in GF(2^w), ⊕ is XOR, and addition is standard integer addition. It prints out these m rows. The program then creates k data devices, each with size bytes of random data, and encodes them into m coding devices using jerasure_matrix_encode(). It prints out the data and coding in hexadecimal (one byte is represented by two hex digits). Next, it erases m random devices from the collection of data and coding devices and prints the resulting state. Then it decodes the erased devices using jerasure_matrix_decode() and prints the restored state. Next, it shows what the decoding matrix looks like when the first m devices are erased. This matrix is the inverse of the last k rows of the distribution matrix. Finally, it uses jerasure_matrix_dotprod() to show how to explicitly calculate the first data device from the others when the first m devices have been erased. Here is an example for w = 8 with three data devices and four coding devices, each with a size of 8 bytes:
Referring back to the conceptual model in Figure 3, it should be clear in this encoding how the first w bits of C_0 are calculated from the first w bits of each data device:

where multiplication is in GF(2^8).
However, keep in mind that the implementation actually performs dot products on groups of bytes at a time. So in this example, where each device holds 8 bytes, the dot product is actually:
This is accomplished using galois_w08_region_multiply().
Here is a similar example, this time with w = 16 and each device holding 16 bytes:
With each of these codes, m must be equal to two and k must be less than or equal to w. The value of w has restrictions based on the code [PBV11]: