Jerasure: A Library in C Facilitating Erasure Coding for Storage
Applications

Version 2.0

James S. Plank*        Kevin M. Greenan


Technical Report UT-EECS-14-721
Department of Electrical Engineering and Computer Science
University of Tennessee
Knoxville, TN 37996


http://www.cs.utk.edu/~plank/plank/papers/UT-EECS-14-721.html
Source code: https://bitbucket.org/jimplank/jerasure

This describes revision 2.0 of the code.

Abstract

This paper describes version 2.0 of jerasure, a library in C that supports erasure coding in storage applications. We describe both the techniques and algorithms and the interface to the code. Thus, this serves as a quasi-tutorial and a programmer's guide.

Version 2.0 does not change the interface of jerasure 1.2. Instead, it changes the software for the Galois Field back-end. It now uses GF-Complete, which is much more flexible and powerful than the previous Galois Field arithmetic library. In particular, it leverages Intel SIMD instructions so that Reed-Solomon coding may be blazingly fast.

In order to use jerasure, you must first download and install GF-Complete. Both libraries are posted and maintained at bitbucket.org.

If You Use This Library or Document

Please send me an email to let me know how it goes. One of the ways in which I am evaluated both internally and externally is by the impact of my work, and if you have found this library and/or this document useful, I would like to be able to document it. Please send mail to plank@cs.utk.edu.

The library itself is protected by the New BSD License. It is free to use and modify within the bounds of that License. None of the techniques implemented in this library have been patented.



Finding the code

Please download the code from:

https://bitbucket.org/jimplank/jerasure.

Before you compile jerasure, you must download, compile and install GF-Complete. That is available from

https://bitbucket.org/jimplank/gf-complete.

Both libraries use autoconf, which means that you go through the following steps from the main directory:

UNIX> ./configure
UNIX> make
UNIX> sudo make install

The example programs are in the directory Examples. The source code is in the directory src.

History of Jerasure

This is the third major revision of jerasure. Jerasure's revision history is as follows:

Contents

1 Introduction
2 The Modules of the Library
3 Matrix-Based Coding In General
4 Bit-Matrix Coding In General
4.1 Using a schedule rather than a bit-matrix
5 MDS Codes
6 Part 1 of the Library: Galois Field Arithmetic
7 Part 2 of the Library: Kernel Routines
7.1 Matrix/Bitmatrix/Schedule Creation Routines
7.2 Encoding Routines
7.3 Decoding Routines
7.4 Dot Product Routines
7.5 Basic Matrix Operations
7.6 Statistics
7.7 Example Programs to Demonstrate Use
8 Part 3 of the Library: Classic Reed-Solomon Coding Routines
8.1 Vandermonde Distribution Matrices
8.2 Procedures Related to Reed-Solomon Coding Optimized for RAID-6
8.3 Example Programs to Demonstrate Use
9 Part 4 of the Library: Cauchy Reed-Solomon Coding Routines
9.1 The Procedures in cauchy.c
9.2 Example Programs to Demonstrate Use
9.3 Extending the Parameter Space for Optimal Cauchy RAID-6 Matrices
10 Part 5 of the Library: Minimal Density RAID-6 Coding
10.1 Example Program to Demonstrate Use
11 Example Encoder and Decoder
11.1 Judicious Selection of Buffer and Packet Sizes
12 Changing the Underlying Galois Field



1     Introduction

Erasure coding for storage applications is growing in importance as storage systems grow in size and complexity. This paper describes jerasure, a library in C that supports erasure coding applications. Jerasure has been designed to be modular, fast and flexible. It is our hope that storage designers and programmers will find jerasure to be a convenient tool to add fault tolerance to their storage systems.

Jerasure supports a horizontal mode of erasure codes. We assume that we have k devices that hold data. To that, we will add m devices whose contents will be calculated from the original k devices. If the erasure code is a Maximum Distance Separable (MDS) code, then the entire system will be able to tolerate the loss of any m devices.

Figure 1: The act of encoding takes the contents of k data devices and encodes them on m coding devices. The act of decoding takes some subset of the collection of (k + m) total devices and from them recalculates the original k devices of data. (a) Encoding. (b) Decoding.

As depicted in Figure 1, the act of encoding takes the original k data devices, and from them calculates m coding devices. The act of decoding takes the collection of (k + m) devices with erasures, and from the surviving devices recalculates the contents of the original k data devices.

Most codes have a third parameter w, which is the word size. The description of a code views each device as holding w bits worth of data. The data devices are denoted D0 through Dk-1 and the coding devices are denoted C0 through Cm-1. Each device Di or Cj holds w bits, denoted di,0, . . ., di,w-1 and cj,0, . . ., cj,w-1. In reality, of course, devices hold megabytes of data. To map the description of a code to its realization in a real system, we do one of two things:

  1. When w ∈ {8, 16, 32}, we can consider each collection of w bits to be a byte, short word or word respectively. Consider the case when w = 8. We may view each device to hold B bytes. The first byte of each coding device will be encoded with the first byte of each data device. The second byte of each coding device will be encoded with the second byte of each data device. And so on. This is how Standard Reed-Solomon coding works, and it should be clear how it works when w = 16 or w = 32.

  2. Most other codes work by defining each coding bit ci,j to be the bitwise exclusive-or (XOR) of some subset of the other bits. To implement these codes in a real system, we assume that the device is composed of w packets of equal size. Now each packet is calculated to be the bitwise exclusive-or of some subset of the other packets. In this way, we can take advantage of the fact that we can perform XOR operations on whole computer words rather than on bits.

The process is illustrated in Figure 2. In this figure, we assume that k = 4, m = 2 and w = 4. Suppose that a code is defined such that coding bit c1,0 is governed by the equation:

c1,0 = d0,0 ⊕ d1,1 ⊕ d2,2 ⊕ d3,3,



3     Matrix-Based Coding In General

The mechanics of matrix-based coding are explained in great detail in [Pla97]. We give a high-level overview here.

Authors' Caveat: We are using old nomenclature of "distribution matrices." In standard coding theory, the "distribution matrix" is the transpose of the Generator matrix. In the next revision of jerasure, we will update the nomenclature to be more consistent with classic coding theory.

Suppose we have k data words and m coding words, each composed of w bits. We can describe the state of a matrix-based coding system by a matrix-vector product, as depicted in Figure 3. The matrix is called a distribution matrix and is a (k + m) × k matrix. The elements of the matrix are numbers in GF(2^w) for some value of w. This means that they are integers between 0 and 2^w - 1, and that arithmetic is performed using Galois Field arithmetic: addition is equal to XOR, and multiplication is implemented in a variety of ways. The Galois Field arithmetic library in [Pla07a] has procedures which implement Galois Field arithmetic.


Figure 3: Using a matrix-vector product to describe a coding system.


The top k rows of the distribution matrix compose a k × k identity matrix. The remaining m rows are called the coding matrix, and are defined in a variety of ways [Rab89, Pre89, BKK+95, PD05]. The distribution matrix is multiplied by a vector that contains the data words and yields a product vector containing both the data and the coding words. Therefore, to encode, we need to perform m dot products of the coding matrix with the data.

To decode, we note that each word in the system has a corresponding row of the distribution matrix. When devices fail, we create a decoding matrix from the k rows of the distribution matrix that correspond to non-failed devices. Note that this matrix multiplied by the original data equals the k survivors whose rows we selected. If we invert this matrix and multiply it by both sides of the equation, then we obtain a decoding equation: the inverted matrix multiplied by the survivors equals the original data.

4     Bit-Matrix Coding In General

Bit-matrix coding was first described in the original Cauchy Reed-Solomon coding paper [BKK+95]. To encode and decode with a bit-matrix, we expand a distribution matrix in GF(2^w) by a factor of w in each direction to yield


a w(k + m) × wk matrix, which we call a binary distribution matrix (BDM). We multiply that by a wk-element vector, which is composed of w bits from each data device. The product is a w(k + m)-element vector composed of w bits from each data and coding device. This is depicted in Figure 4. It is useful to visualize the matrix as being composed of w × w sub-matrices.

Figure 4: Describing a coding system with a bit-matrix-vector product.


As with the matrix-vector product in GF(2^w), each row of the product corresponds to a row of the BDM, and is computed as the dot product of that row and the data bits. Since all elements are bits, we may perform the dot product by taking the XOR of each data bit whose element in the matrix's row is one. In other words, rather than performing the dot product with additions and multiplications, we perform it only with XORs. Moreover, the performance of this dot product is directly related to the number of ones in the row. Therefore, it behooves us to find matrices with few ones.

Decoding with bit-matrices is the same as with matrices over GF(2^w), except that now each device corresponds to w rows of the matrix rather than one. Also keep in mind that a bit in this description corresponds to a packet in the implementation.

While the classic construction of bit-matrices starts with a standard distribution matrix in GF(2^w), it is possible to construct bit-matrices that have no relation to Galois Field arithmetic yet still have desired coding and decoding properties. The minimal density RAID-6 codes work in this fashion.

4.1    Using a schedule rather than a bit-matrix

Consider the act of encoding with a bit-matrix. We give an example in Figure 5, where k = 3, w = 5, and we are calculating the contents of one coding device. The straightforward way to encode is to calculate the five dot products for each of the five bits of the coding device, and we can do that by traversing each of the five rows, performing XORs where there are ones in the matrix.



Figure 5: An example super-row of a bit-matrix for k = 3, w = 5.


Since the matrix is sparse, it is more efficient to precompute the coding operations, rather than traversing the matrix each time one encodes. The data structure that we use to represent encoding is a schedule, which is a list of 5-tuples:

< op, sd, sb, dd, db >,

where op is an operation code: 0 for copy and 1 for XOR; sd is the id of the source device and sb is the bit of the source device. The last two elements, dd and db, are the destination device and bit. By convention, we identify devices using integers from zero to k + m - 1. An id i < k identifies data device Di, and an id i ≥ k identifies coding device Ci-k.
A schedule for encoding using the bit-matrix in Figure 5 is shown in Figure 6.

< 0, 0, 0, 3, 0 >,< 1, 1, 1, 3, 0 >,< 1, 2, 2, 3, 0 >,
< 0, 0, 1, 3, 1 >,< 1, 1, 2, 3, 1 >,< 1, 2, 3, 3, 1 >,
< 0, 0, 2, 3, 2 >,< 1, 1, 2, 3, 2 >,< 1, 1, 3, 3, 2 >,< 1, 2, 4, 3, 2 >,
< 0, 0, 3, 3, 3 >,< 1, 1, 4, 3, 3 >,< 1, 2, 0, 3, 3 >,
< 0, 0, 4, 3, 4 >,< 1, 1, 0, 3, 4 >,< 1, 2, 0, 3, 4 >,< 1, 2, 1, 3, 4 > .
c0,0 = d0,0 ⊕ d1,1 ⊕ d2,2
c0,1 = d0,1 ⊕ d1,2 ⊕ d2,3
c0,2 = d0,2 ⊕ d1,2 ⊕ d1,3 ⊕ d2,4
c0,3 = d0,3 ⊕ d1,4 ⊕ d2,0
c0,4 = d0,4 ⊕ d1,0 ⊕ d2,0 ⊕ d2,1

Figure 6: A schedule of bit-matrix operations for the bit-matrix in Figure 5. (a) shows the schedule, and (b) shows the dot-product equations corresponding to each line of the schedule.

As noted in [HDRT05, Pla08], one can derive schedules for bit-matrix encoding and decoding that make use of common expressions in the dot products, and therefore can perform the bit-matrix-vector product with fewer XOR operations than simply traversing the bit-matrix. This is how RDP encoding works with optimal performance [CEG+04], even though there are more than kw ones in the last w rows of its BDM. We term such scheduling smart scheduling, and scheduling by simply traversing the matrix dumb scheduling.

There are additional techniques for scheduling that work better than what we have implemented here [HLC07, Pla10, PSR12]. Embedding these within jerasure is the topic of future work.

5     MDS Codes

A code is MDS if it can recover the data following the failure of any m devices. If a matrix-vector product is used to define the code, then it is MDS if every combination of k rows composes an invertible matrix. If a bit-matrix is used, then we define a super-row to be a row's worth of w × w submatrices. The code is MDS if every combination of k super-rows composes an invertible matrix. Again, one may generate an MDS code using standard techniques such as employing a Vandermonde matrix [PD05] or Cauchy matrix [Rab89, BKK+95]. However, there are other constructions that also yield MDS matrices, such as EVENODD coding [BBBM95, BBV96], RDP coding [CEG+04, Bla06], the STAR code [HX05], and the minimal density RAID-6 codes [PBV11].




6     Part 1 of the Library: Galois Field Arithmetic

The files galois.h and galois.c contain procedures for Galois Field arithmetic in GF(2^w) for 1 ≤ w ≤ 32. They contain procedures for single arithmetic operations, for XOR-ing a region of bytes, and for performing multiplication of a region of bytes by a constant in GF(2^8), GF(2^16) and GF(2^32). They are wrappers around GF-Complete, and inherit all of the functionality and flexibility of GF-Complete.
For the purposes of jerasure, the following procedures from galois.h and galois.c are used:

In Section 12, we go over some example programs that change the Galois Field. We do not do it here, because we feel it would clutter the description at this point.

7     Part 2 of the Library: Kernel Routines

The files jerasure.h and jerasure.c implement procedures that are common to many aspects of coding. We give example programs that make use of them in Section 7.7 below.
   Before describing the procedures that compose jerasure.c, we detail the arguments that are common to multiple procedures:





7.1    Matrix/Bitmatrix/Schedule Creation Routines

When we use an argument from the list above, we omit its type for brevity.

7.2 Encoding Routines




7.3 Decoding Routines

Each of these returns an integer which is zero on success or -1 if unsuccessful. Decoding can be unsuccessful if there are too many erasures.


7.4     Dot Product Routines

7.5 Basic Matrix Operations

7.6     Statistics

Finally, jerasure.c keeps track of three quantities:


There is one procedure that allows access to those values:

The procedure galois_w08_region_multiply() and its kin have a parameter that causes them to XOR the product with another region with the same overhead as simply performing the multiplication. For that reason, when these procedures are called with this functionality enabled, the resulting XORs are not counted with the XORs performed by galois_region_xor().

7.7 Example Programs to Demonstrate Use

In the Examples directory, there are eight programs that demonstrate nearly every procedure call in jerasure.c. They are named jerasure_0x for 0 < x ≤ 8. There are also programs to demonstrate Reed-Solomon coding, Cauchy Reed-Solomon coding and Liberation coding. Finally, there are programs that encode and decode files.

All of the example programs, with the exception of the encoder and decoder, emit HTML as output. Many may be read easily as text, but some of them format better in a web browser.




0001 0010 0100 1001 0011 0110 1101 1010 0101 1011
1011 0111 1111 1110 1100 1000 0001 0010 0100 1001
1110 1100 1000 0001 0010 0100 1001 0011 0110 1101
1111 1110 1100 1000 0001 0010 0100 1001 0011 0110
0111 1111 1110 1100 1000 0001 0010 0100 1001 0011
0011 0110 1101 1010 0101 1011 0111 1111 1110 1100
1010 0101 1011 0111 1111 1110 1100 1000 0001 0010
1101 1010 0101 1011 0111 1111 1110 1100 1000 0001
0110 1101 1010 0101 1011 0111 1111 1110 1100 1000
UNIX>

This demonstrates usage of jerasure_print_bitmatrix() and jerasure_matrix_to_bitmatrix(). The parameter k must be less than 2^w. The element in row i, column j is set to:

1 / (i ⊕ (2^w - j - 1))

where division is in GF(2^w), ⊕ is XOR, and subtraction is ordinary integer subtraction. When k > 2^(w-1), there will be i and j such that i ⊕ (2^w - j - 1) = 0. When that happens, we set that matrix element to zero.

After creating the matrix and printing it, we test whether it is invertible. If k ≤ 2^(w-1), then it will be invertible; otherwise it will not be. If it is invertible, the program prints the inverse, then multiplies the inverse by the original matrix and prints the product, which is the identity matrix. Examples:

UNIX> jerasure_03 4 3
<HTML><TITLE>jerasure_03 4 3 </TITLE>
<h3>jerasure_03 4 3</h3>
<pre>
The Cauchy Matrix:
4 3 2 7
3 4 7 2
2 7 4 3
7 2 3 4

Invertible: Yes

Inverse:
1 2 5 3
2 1 3 5
5 3 1 2
3 5 2 1

Inverse times matrix (should be identity):

1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
UNIX> jerasure_03 5 3
<HTML><TITLE>jerasure_03 5 3</TITLE>
<h3>jerasure_03 5 3</h3>



<pre>
The Cauchy Matrix:
4 3 2 7 6
3 4 7 2 5
2 7 4 3 1
7 2 3 4 0
6 5 1 0 4
Invertible: No
UNIX>

This demonstrates usage of jerasure_print_matrix(), jerasure_invertible_matrix(), jerasure_invert_matrix() and jerasure_matrix_multiply().
UNIX> jerasure_04 4 3
<HTML><TITLE>jerasure_04 4 3</TITLE>
<h3>jerasure_04 4 3</h3>
<pre>
The Cauchy Bit-Matrix:
010 101 001 111
011 111 101 100
101 011 010 110

101 010 111 001
111 011 100 101
011 101 110 010

001 111 010 101
101 100 011 111
010 110 101 011

111 001 101 010
100 101 111 011
110 010 011 101

Invertible: Yes

Inverse:
100 001 110 101
010 101 001 111
001 010 100 011

001 100 101 110
101 010 111 001
010 001 011 100

110 101 100 001
001 111 010 101
100 011 001 010

101 110 001 100
111 001 101 010



011 100 010 001

Inverse times matrix (should be identity):
100 000 000 000
010 000 000 000
001 000 000 000

000 100 000 000
000 010 000 000
000 001 000 000

000 000 100 000
000 000 010 000
000 000 001 000

000 000 000 100
000 000 000 010
000 000 000 001
UNIX> jerasure_04 5 3
<HTML><TITLE>jerasure_04 5 3</TITLE>
<h3>jerasure_04 5 3 </h3>
<pre>
The Cauchy Bit-Matrix:
010 101 001 111 011
011 111 101 100 110
101 011 010 110 111

101 010 111 001 110
111 011 100 101 001
011 101 110 010 100

001 111 010 101 100
101 100 011 111 010
010 110 101 011 001

111 001 101 010 000
100 101 111 011 000
110 010 011 101 000

011 110 100 000 010
110 001 010 000 011
111 100 001 000 101
Invertible: No
UNIX>
This demonstrates usage of jerasure_print_bitmatrix(), jerasure_matrix_to_bitmatrix(), jerasure_invertible_bitmatrix(), jerasure_invert_bitmatrix() and jerasure_matrix_multiply().
1 / (i ⊕ (m + j))



where division is in GF(2^w), ⊕ is XOR, and addition is standard integer addition. It prints out these m rows. The program then creates k data devices, each with size bytes of random data, and encodes them into m coding devices using jerasure_matrix_encode(). It prints out the data and coding in hexadecimal; one byte is represented by two hex digits. Next, it erases m random devices from the collection of data and coding devices, and prints the resulting state. Then it decodes the erased devices using jerasure_matrix_decode() and prints the restored state. Next, it shows what the decoding matrix looks like when the first m devices are erased. This matrix is the inverse of the last k rows of the distribution matrix. And finally, it uses jerasure_matrix_dotprod() to show how to explicitly calculate the first data device from the others when the first m devices have been erased. Here is an example for w = 8 with 3 data devices and 4 coding devices, each with a size of 8 bytes:

UNIX> jerasure_05 3 4 8 8 100
<HTML><TITLE>jerasure_05 3 4 8 8 100</TITLE>
<h3>jerasure_05 3 4 8 8 100</h3>
<pre>
The Coding Matrix (the last m rows of the Generator Matrix G^T):

71 167 122
167 71 186
122 186 71
186 122 167

Encoding Complete:

Data                               Coding
D0 : 8b e3 eb 02 03 5f c5 99       C0 : ab 09 6d 49 24 e2 6e ae
D1 : 14 2f f4 2b e7 72 85 b3       C1 : ee ee bb 70 26 c2 b3 9c
D2 : 85 eb 30 9a ee d4 5d b1       C2 : 69 c0 33 e8 1a d8 c8 e3
                                   C3 : 4b b3 6c 32 45 ae 92 5b

Erased 4 random devices:

Data                               Coding
D0 : 8b e3 eb 02 03 5f c5 99       C0 : 00 00 00 00 00 00 00 00
D1 : 00 00 00 00 00 00 00 00       C1 : 00 00 00 00 00 00 00 00
D2 : 85 eb 30 9a ee d4 5d b1       C2 : 69 c0 33 e8 1a d8 c8 e3
                                   C3 : 00 00 00 00 00 00 00 00

State of the system after decoding:

Data                               Coding
D0 : 8b e3 eb 02 03 5f c5 99       C0 : ab 09 6d 49 24 e2 6e ae
D1 : 14 2f f4 2b e7 72 85 b3       C1 : ee ee bb 70 26 c2 b3 9c
D2 : 85 eb 30 9a ee d4 5d b1       C2 : 69 c0 33 e8 1a d8 c8 e3
                                   C3 : 4b b3 6c 32 45 ae 92 5b

Suppose we erase the first 4 devices. Here is the decoding matrix:

130 25 182
252 221 25
108 252 130

And dm_ids:

4 5 6



After calling jerasure_matrix_dotprod, we calculate the value of device #0 to be:

D0 : 8b e3 eb 02 03 5f c5 99

UNIX>

Referring back to the conceptual model in Figure 3, it should be clear in this encoding how the first w bits of C0 are calculated from the first w bits of each data device:

byte 0 of C0 = (71 × byte 0 of D0) ⊕ (167 × byte 0 of D1) ⊕ (122 × byte 0 of D2)

where multiplication is in GF(28).
However, keep in mind that the implementation actually performs dot products on groups of bytes at a time. So in this example, where each device holds 8 bytes, the dot product is actually:

8 bytes of C0 = (71 × 8 bytes of D0) ⊕ (167 × 8 bytes of D1) ⊕ (122 × 8 bytes of D2)

This is accomplished using galois_w08_region_multiply().
Here is a similar example, this time with w = 16 and each device holding 16 bytes:

UNIX> jerasure_05 3 4 16 16 102
<HTML><TITLE>jerasure_05 3 4 16 16 102</TITLE>
<h3>jerasure_05 3 4 16 16 102</h3>
<pre>
The Coding Matrix (the last m rows of the Generator Matrix G^T):

52231 20482 30723
20482 52231 27502
30723 27502 52231
27502 30723 20482

Encoding Complete:

Data                                           Coding
D0 : 5596 1e69 b292 a935 f01a 77b8 b22e 9a70   C0 : 122e 518d c2c7 315c 9c76 2591 1a5a 397c
D1 : f5ad 3ee2 fa7a 2ef7 5aa6 ad44 f41f cfad   C1 : 7741 f8c4 765c a408 7f07 b937 b493 2730
D2 : 4988 470e 24c8 182a a7f4 45b2 e4e0 3969   C2 : 9b0d c474 e654 387a e4b7 d5fb 2d8c cdb5
                                               C3 : eb25 24d4 6e49 e736 4c9e 7ab6 0cd2 d2fa

Erased 4 random devices:

Data                                           Coding
D0 : 0000 0000 0000 0000 0000 0000 0000 0000   C0 : 0000 0000 0000 0000 0000 0000 0000 0000
D1 : f5ad 3ee2 fa7a 2ef7 5aa6 ad44 f41f cfad   C1 : 7741 f8c4 765c a408 7f07 b937 b493 2730
D2 : 4988 470e 24c8 182a a7f4 45b2 e4e0 3969   C2 : 0000 0000 0000 0000 0000 0000 0000 0000
                                               C3 : 0000 0000 0000 0000 0000 0000 0000 0000

State of the system after decoding:

Data                                           Coding
D0 : 5596 1e69 b292 a935 f01a 77b8 b22e 9a70   C0 : 122e 518d c2c7 315c 9c76 2591 1a5a 397c
D1 : f5ad 3ee2 fa7a 2ef7 5aa6 ad44 f41f cfad   C1 : 7741 f8c4 765c a408 7f07 b937 b493 2730
D2 : 4988 470e 24c8 182a a7f4 45b2 e4e0 3969   C2 : 9b0d c474 e654 387a e4b7 d5fb 2d8c cdb5
                                               C3 : eb25 24d4 6e49 e736 4c9e 7ab6 0cd2 d2fa



Suppose we erase the first 4 devices. Here is the decoding matrix:

130 260 427
252 448 260
108 252 130

And dm_ids:

4 5 6

After calling jerasure_matrix_dotprod, we calculate the value of device #0 to be:

D0 : 5596 1e69 b292 a935 f01a 77b8 b22e 9a70

UNIX>

In this encoding, the 8 16-bit half-words of C0 are calculated as:

(52231 × 8 half-words of D0) ⊕ (20482 × 8 half-words of D1) ⊕ (30723 × 8 half-words of D2)

using galois_w16_region_multiply().

This program demonstrates usage of jerasure_matrix_encode(), jerasure_matrix_decode(), jerasure_print_matrix(), jerasure_make_decoding_matrix() and jerasure_matrix_dotprod().

jerasure_06.c: This takes five parameters: k, m, w, packetsize and seed, and performs an example similar to jerasure_05, except that it uses Cauchy Reed-Solomon coding in GF(2^w), converting the coding matrix to a bit-matrix. The output this time is formatted HTML. k + m must be less than or equal to 2^w, and packetsize must be a multiple of sizeof(long). It sets up each device to hold a total of w * packetsize bytes. Here, packets are numbered p0 through p(w-1) for each device. It then performs the same encoding and decoding as the previous example, but with the corresponding bit-matrix procedures.
The HTML file at http://web.eecs.utk.edu/~plank/plank/jerasure/j06_3_4_3_8_100.html shows the output of

UNIX> jerasure_06 3 4 3 8 100

In this encoding, the first packet of C0 is computed according to the six ones in the first row of the coding matrix:

C0p0 = D0p0 ⊕ D0p1 ⊕ D0p2 ⊕ D1p2 ⊕ D2p0 ⊕ D2p2


These dot products are accomplished with galois_region_xor().

This program demonstrates usage of jerasure_bitmatrix_encode(), jerasure_bitmatrix_decode(), jerasure_print_bitmatrix(), jerasure_make_decoding_bitmatrix() and jerasure_bitmatrix_dotprod().

jerasure_07.c: This takes four parameters: k, m, w and seed. It performs the same coding/decoding as jerasure_06, except that it uses bit-matrix scheduling instead of bit-matrix operations. The packetsize is set at sizeof(long) bytes. It creates a "dumb" and a "smart" schedule for encoding, encodes with each, and prints out how many XORs each took. The smart schedule will outperform the dumb one.
Next, it erases m random devices and decodes using jerasure_schedule_decode_lazy(). Finally, it shows how to use jerasure_do_scheduled_operations() in case you need to do so explicitly.

The HTML file at http://web.eecs.utk.edu/~plank/plank/jerasure/j07_3_4_3_102.html shows the output of




UNIX> jerasure_07 3 4 3 102

This demonstrates usage of jerasure_dumb_bitmatrix_to_schedule(), jerasure_smart_bitmatrix_to_schedule(), jerasure_schedule_encode(), jerasure_schedule_decode_lazy(), jerasure_do_scheduled_operations() and jerasure_get_stats().

jerasure_08.c: This takes three parameters: k, w and a seed, and performs a simple RAID-6 example using a schedule cache. Again, packetsize is sizeof(long). It sets up a RAID-6 coding matrix whose first row is composed of ones, and where the element in column j of the second row is equal to 2^j in GF(2^w). It converts this to a bit-matrix and creates a smart encoding schedule and a schedule cache for decoding.

It then encodes twice: first with the smart schedule, and then with the schedule cache, by setting the two coding devices as the erased devices. Next, it deletes two random devices and uses the schedule cache to decode them. Then it deletes the first coding device and recalculates it using jerasure_do_parity() to demonstrate that procedure. Finally, it frees the smart schedule and the schedule cache.

Example: the output of the following command is at http://web.eecs.utk.edu/~plank/plank/jerasure/j08_7_7_100.html.

UNIX> jerasure_08 7 7 100

This demonstrates usage of jerasure_generate_schedule_cache(), jerasure_smart_bitmatrix_to_schedule(), jerasure_schedule_encode(), jerasure_schedule_decode_cache(), jerasure_free_schedule(), jerasure_free_schedule_cache(), jerasure_get_stats() and jerasure_do_parity().

8 Part 3 of the Library: Classic Reed-Solomon Coding Routines

The files reed_sol.h and reed_sol.c implement procedures that are specific to classic Vandermonde-matrix-based Reed-Solomon coding, and to Reed-Solomon coding optimized for RAID-6. Refer to [Pla97, PD05] for a description of classic Reed-Solomon coding and to [Anv07] for Reed-Solomon coding optimized for RAID-6. Where not specified, the parameters are as described in Section 7.

8.1 Vandermonde Distribution Matrices

There are three procedures for generating distribution matrices based on an extended Vandermonde matrix in GF(2^w). It is anticipated that only the first of these will be needed for coding applications, but we include the other two in case a user wants to look at or modify these matrices.

  • int *reed_sol_vandermonde_coding_matrix(k, m, w): This returns the last m rows of the distribution matrix in GF(2^w), based on an extended Vandermonde matrix. This is an m × k matrix that can be used with the matrix routines in jerasure.c. The first row of this matrix is guaranteed to be all ones. The first column is also guaranteed to be all ones.

  • int *reed_sol_extended_vandermonde_matrix(int rows, int cols, w): This creates an extended Vandermonde matrix with rows rows and cols columns in GF(2^w).

  • int *reed_sol_big_vandermonde_distribution_matrix(int rows, int cols, w): This converts the extended matrix above into a distribution matrix so that the top cols rows compose an identity matrix, and the remaining rows are in the format returned by reed_sol_vandermonde_coding_matrix().



8.2 Procedures Related to Reed-Solomon Coding Optimized for RAID-6

In RAID-6, m is equal to two. The first coding device, P, is calculated from the others using parity, and the second coding device, Q, is calculated from the data devices D_i using:

    Q = Σ_{i=0}^{k-1} 2^i D_i

where all arithmetic is in GF(2^w). The reason that this is an optimization is that one may implement multiplication by two in an optimized fashion. The following procedures facilitate this optimization.

  • int reed_sol_r6_encode(k, w, data_ptrs, coding_ptrs, size): This encodes using the optimization. w must be 8, 16 or 32. Note that m is not needed because it is assumed to equal two, and no matrix is needed because it is implicit.

  • int *reed_sol_r6_coding_matrix(k, w): Again, w must be 8, 16 or 32. There is no optimization for decoding. Therefore, this procedure returns the last two rows of the distribution matrix for RAID-6 for decoding purposes. The first of these rows will be all ones. The second of these rows will have 2^j in column j.

  • reed_sol_galois_w08_region_multby_2(char *region, int nbytes): This performs the fast multiplication by two in GF(2^8) using Anvin's optimization [Anv07]. region must be long-word aligned, and nbytes must be a multiple of the word size.

  • reed_sol_galois_w16_region_multby_2(char *region, int nbytes): This performs the fast multiplication by two in GF(2^16).

  • reed_sol_galois_w32_region_multby_2(char *region, int nbytes): This performs the fast multiplication by two in GF(2^32).

8.3 Example Programs to Demonstrate Use


There are four example programs to demonstrate the use of the procedures in reed_sol.
  • reed_sol_01.c: This takes four parameters: k, m, w and seed. It performs a classic Reed-Solomon coding of k devices onto m devices, using a Vandermonde-based distribution matrix in GF(2^w). w must be 8, 16 or 32. Each device is set up to hold sizeof(long) bytes. It uses reed_sol_vandermonde_coding_matrix() to generate the distribution matrix, and then procedures from jerasure.c to perform the coding and decoding.
Example:

UNIX> reed_sol_01 7 7 8 105
<HTML><TITLE>reed_sol_01 7 7 8 105</TITLE>
<h3>reed_sol_01 7 7 8 105</h3>
<pre>
The Coding Matrix (the last m rows of the Generator Matrix G^T):

1 1 1 1 1 1 1
1 199 210 240 105 121 248
1 70 91 245 56 142 167
1 170 114 42 87 78 231
1 38 236 53 233 175 65



1 64 174 232 52 237 39
1 187 104 210 211 105 186
Encoding Complete:

Data Coding
D0 : 6f c1 a7 58 a0 b4 17 74 C0 : 49 20 ea e8 18 d3 69 9a
D1 : 82 13 7f c0 9f 3f db a4 C1 : 31 d1 63 ef 0b 1d 6c 0e
D2 : b5 90 6d d0 92 ea ac 98 C2 : 0f 05 89 46 fb 75 5d c5
D3 : 44 6a 2b 39 ab da 31 6a C3 : 0d 37 03 f0 80 cd c7 69
D4 : 72 63 74 64 2b 84 a4 5a C4 : 63 43 e9 cc 2a ae 18 5c
D5 : 48 af 72 7d 98 55 86 63 C5 : 4f e9 37 1b 88 4f c0 d7
D6 : 6f c4 72 80 ad b9 1a 81 C6 : d2 af 66 51 82 ba e1 10

Erased 7 random devices:

Data Coding
D0 : 6f c1 a7 58 a0 b4 17 74 C0 : 00 00 00 00 00 00 00 00
D1 : 00 00 00 00 00 00 00 00 C1 : 00 00 00 00 00 00 00 00
D2 : 00 00 00 00 00 00 00 00 C2 : 0f 05 89 46 fb 75 5d c5
D3 : 00 00 00 00 00 00 00 00 C3 : 0d 37 03 f0 80 cd c7 69
D4 : 72 63 74 64 2b 84 a4 5a C4 : 63 43 e9 cc 2a ae 18 5c
D5 : 00 00 00 00 00 00 00 00 C5 : 4f e9 37 1b 88 4f c0 d7
D6 : 00 00 00 00 00 00 00 00 C6 : d2 af 66 51 82 ba e1 10

State of the system after decoding:
Data Coding
D0 : 6f c1 a7 58 a0 b4 17 74 C0 : 49 20 ea e8 18 d3 69 9a
D1 : 82 13 7f c0 9f 3f db a4 C1 : 31 d1 63 ef 0b 1d 6c 0e
D2 : b5 90 6d d0 92 ea ac 98 C2 : 0f 05 89 46 fb 75 5d c5
D3 : 44 6a 2b 39 ab da 31 6a C3 : 0d 37 03 f0 80 cd c7 69
D4 : 72 63 74 64 2b 84 a4 5a C4 : 63 43 e9 cc 2a ae 18 5c
D5 : 48 af 72 7d 98 55 86 63 C5 : 4f e9 37 1b 88 4f c0 d7
D6 : 6f c4 72 80 ad b9 1a 81 C6 : d2 af 66 51 82 ba e1 10

UNIX>

This demonstrates usage of jerasure_matrix_encode(), jerasure_matrix_decode(), jerasure_print_matrix() and reed_sol_vandermonde_coding_matrix().

  • reed_sol_02.c: This takes three parameters: k, m and w. It creates and prints three matrices in GF(2^w):

  1. A (k + m) × k extended Vandermonde matrix.

  2. The (k + m) × k distribution matrix created by converting the extended Vandermonde matrix into one where the first k rows are an identity matrix. Then row k is converted so that it is all ones, and the first column is also converted so that it is all ones.

  3. The m × k coding matrix, which is the last m rows of the above matrix. This is the matrix which is passed to the encoding/decoding procedures of jerasure.c. Note that since the first row of this matrix is all ones, you may set int row_k_ones of the decoding procedures to one.

    Note also that w may have any value from 1 to 32.
    Example:


    UNIX> reed_sol_02 6 4 11
    <HTML><TITLE>reed_sol_02 6 4 11</TITLE>
    <h3>reed_sol_02 6 4 11</h3>
    <pre>
    Extended Vandermonde Matrix:

    1    0    0    0    0    0
    1    1    1    1    1    1
    1    2    4    8   16   32
    1    3    5   15   17   51
    1    4   16   64  256 1024
    1    5   17   85  257 1285
    1    6   20  120  272 1632
    1    7   21  107  273 1911
    1    8   64  512   10   80
    0    0    0    0    0    1
    Vandermonde Generator Matrix (G^T):

    1    0    0    0    0    0
    0    1    0    0    0    0
    0    0    1    0    0    0
    0    0    0    1    0    0
    0    0    0    0    1    0
    0    0    0    0    0    1
    1    1    1    1    1    1
    1 1879 1231 1283  682 1538
    1 1366 1636 1480  683  934
    1 1023 2045 1027 2044 1026


    Vandermonde Coding Matrix:

    1    1    1    1    1    1
    1 1879 1231 1283  682 1538
    1 1366 1636 1480  683  934
    1 1023 2045 1027 2044 1026


    UNIX>

    This demonstrates usage of reed_sol_extended_vandermonde_matrix(), reed_sol_big_vandermonde_distribution_matrix(), reed_sol_vandermonde_coding_matrix() and jerasure_print_matrix().

  • reed_sol_03.c: This takes three parameters: k, w and seed. It performs RAID-6 coding using Anvin's optimization [Anv07] in GF(2^w), where w must be 8, 16 or 32. It then decodes using jerasure_matrix_decode().
    Example:

    UNIX> reed_sol_03 9 8 100
    <HTML><TITLE>reed_sol_03 9 8 100</TITLE>
    <h3>reed_sol_03 9 8 100</h3>
    <pre>
    Last 2 rows of the Generator Matrix:

    1   1   1   1   1   1   1   1   1
    1   2   4   8  16  32  64 128  29

    Encoding Complete:



    Data Coding
    D0 : 8b 03 14 e7 85 ee 42 c5 C0 : fb 97 87 2f 48 f5 68 8c
    D1 : 7d 58 3a 05 ea b1 a7 77 C1 : 6e 3e bf 62 de b6 9e 0c
    D2 : 44 24 26 69 c3 47 b9 49
    D3 : 16 5b 8e 56 5d b3 6d 0d
    D4 : b2 45 30 84 25 51 42 73
    D5 : 48 ff 19 2d ba 26 c1 37
    D6 : 3c 88 be 06 68 25 d9 71
    D7 : f5 dd 8d e7 fa b6 51 12
    D8 : 6c 5c 1b ba b4 ba 52 5d

    Erased 2 random devices:

    Data Coding
    D0 : 8b 03 14 e7 85 ee 42 c5 C0 : fb 97 87 2f 48 f5 68 8c
    D1 : 7d 58 3a 05 ea b1 a7 77 C1 : 6e 3e bf 62 de b6 9e 0c
    D2 : 44 24 26 69 c3 47 b9 49
    D3 : 16 5b 8e 56 5d b3 6d 0d
    D4 : b2 45 30 84 25 51 42 73
    D5 : 00 00 00 00 00 00 00 00
    D6 : 3c 88 be 06 68 25 d9 71
    D7 : 00 00 00 00 00 00 00 00
    D8 : 6c 5c 1b ba b4 ba 52 5d

    State of the system after decoding:

    Data Coding
    D0 : 8b 03 14 e7 85 ee 42 c5 C0 : fb 97 87 2f 48 f5 68 8c
    D1 : 7d 58 3a 05 ea b1 a7 77 C1 : 6e 3e bf 62 de b6 9e 0c
    D2 : 44 24 26 69 c3 47 b9 49
    D3 : 16 5b 8e 56 5d b3 6d 0d
    D4 : b2 45 30 84 25 51 42 73
    D5 : 48 ff 19 2d ba 26 c1 37
    D6 : 3c 88 be 06 68 25 d9 71
    D7 : f5 dd 8d e7 fa b6 51 12
    D8 : 6c 5c 1b ba b4 ba 52 5d

    UNIX>

    This demonstrates usage of reed_sol_r6_encode(), reed_sol_r6_coding_matrix(), jerasure_matrix_decode() and jerasure_print_matrix().

  • reed_sol_04.c: This simply demonstrates doing fast multiplication by two in GF(2^w) for w ∈ {8, 16, 32}. It has two parameters: w and seed.

    UNIX> reed_sol_04 16 100
    <HTML><TITLE>reed_sol_04 16 100</TITLE>
    <h3>reed_sol_04 16 100</h3>
    <pre>
    Short 0: 907 *2 = 1814
    Short 1: 59156 *2 = 56867
    Short 2: 61061 *2 = 52481
    Short 3: 50498 *2 = 39567
    Short 4: 22653 *2 = 45306
    Short 5: 1338 *2 = 2676
    Short 6: 45546 *2 = 29663



    Short 7: 30631 *2 = 61262
    UNIX>

    This demonstrates usage of reed_sol_galois_w08_region_multby_2(), reed_sol_galois_w16_region_multby_2() and reed_sol_galois_w32_region_multby_2().

    9 Part 4 of the Library: Cauchy Reed-Solomon Coding Routines

    The files cauchy.h and cauchy.c implement procedures that are specific to Cauchy Reed-Solomon coding. See [BKK+95, PX06] for detailed descriptions of this kind of coding. The procedures in jerasure.h/jerasure.c do the coding and decoding. The procedures here simply create coding matrices. We don't use the Cauchy matrices described in [PX06], because there is a simple heuristic that creates better matrices:

    • Construct the usual Cauchy matrix M such that M[i, j] = 1/(i ⊕ (m + j)), where division is over GF(2^w), ⊕ is XOR and the addition is regular integer addition.

    • For each column j, divide each element (in GF(2^w)) by M[0, j]. This has the effect of turning each element in row 0 into one.

    • Next, for each row i > 0 of the matrix, do the following:

      - Count the number of ones in the bit representation of the row.
      - Count the number of ones in the bit representation of the row divided by element M[i, j], for each j.
      - Whichever value of j gives the minimal number of ones, if it improves the number of ones in the original row, divide row i by M[i, j].

      While this does not guarantee an optimal number of ones, it typically generates a good matrix. For example, suppose k = m = w = 3. The matrix M is as follows:

      6 7 2
      5 2 7
      1 3 4

First, we divide column 0 by 6, column 1 by 7 and column 2 by 2, to yield:

1 1 1
4 3 6
3 7 2

Now, we concentrate on row 1. Its bitmatrix representation has 5+7+7 = 19 ones. If we divide it by 4, the bitmatrix has 3+4+5 = 12 ones. If we divide it by 3, the bitmatrix has 4+3+4 = 11 ones. If we divide it by 6, the bitmatrix has 6+4+3 = 13 ones. So, we replace row 1 with row 1 divided by 3.

We do the same with row 2 and find that it will have the minimal number of ones when it is divided by three. The final matrix is:
1 1 1
5 1 2
1 4 7

This matrix has 34 ones, a distinct improvement over the original matrix, which has 46 ones. The best matrix in [PX06] has 39 ones. This is because the authors simply find the best X and Y, and do not modify the matrix after creating it.


9.1 The Procedures in cauchy.c


The procedures are:

9.2 Example Programs to Demonstrate Use

There are four example programs to demonstrate the use of the procedures in cauchy.h/cauchy.c.