RMA and Resilvering

Written by Michael Cole - July 10th 2015


Western Digital's RMA went well. They accepted my drive and shipped a new one, which arrived today. I plugged it in and FreeBSD saw it right away. Unfortunately, since I didn't have a spare drive, I couldn't rebuild with the damaged drive still in the pool, as is recommended. So now on to the rebuilding:

# create a fresh GPT partition table on the replacement disk
gpart create -s gpt da5
# add a single 4k-aligned freebsd-zfs partition, labeled id5p1 to match the old drive
gpart add -a 4k -l id5p1 -t freebsd-zfs da5
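
I didn't capture it at the time, but a quick sanity check that the new label took would look something like this (standard gpart and devfs behaviour, nothing specific to my setup):

gpart show -l da5
ls /dev/gpt/

The new id5p1 label should appear under /dev/gpt/ alongside the others.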

The pool doesn't pick the drive up automatically because it is a brand-new disk, even though it sits on the same controller with the same device name. This is what the pool looks like at the moment:

zpool status
  pool: storage
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
	the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 3.91G in 0h41m with 0 errors on Sat Jul  4 00:01:38 2015
config:

	NAME                     STATE     READ WRITE CKSUM
	storage                  DEGRADED     0     0     0
	  raidz2-0               DEGRADED     0     0     0
	    gpt/id1p1            ONLINE       0     0     0
	    gpt/id2p1            ONLINE       0     0     0
	    gpt/id3p1            ONLINE       0     0     0
	    gpt/id4p1            ONLINE       0     0     0
	    9087965629824981845  UNAVAIL      0     0     0  was /dev/gpt/id5p1
	    gpt/id6p1            ONLINE       0     0     0

errors: No known data errors

Now that the drive is partitioned and labeled, it can go back into the pool. The long number in the status above is the GUID ZFS assigned to the missing device, and that's what I pass to zpool replace along with the new partition:

zpool replace storage 9087965629824981845 /dev/gpt/id5p1
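
As an aside, ZFS has an autoreplace pool property that can kick off this sort of replacement automatically when a new disk appears in the same physical slot. I haven't looked at it on this pool, but checking it would just be:

zpool get autoreplace storage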

Now the status shows:

zpool status
  pool: storage
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jul 11 12:17:03 2015
        4.91G scanned out of 1.97T at 179M/s, 3h11m to go
        834M resilvered, 0.24% done
config:

        NAME                       STATE     READ WRITE CKSUM
        storage                    DEGRADED     0     0     0
          raidz2-0                 DEGRADED     0     0     0
            gpt/id1p1              ONLINE       0     0     0
            gpt/id2p1              ONLINE       0     0     0
            gpt/id3p1              ONLINE       0     0     0
            gpt/id4p1              ONLINE       0     0     0
            replacing-4            UNAVAIL      0     0     0
              9087965629824981845  UNAVAIL      0     0     0  was /dev/gpt/id5p1/old
              gpt/id5p1            ONLINE       0     0     0  (resilvering)
            gpt/id6p1              ONLINE       0     0     0

errors: No known data errors
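
Progress can be checked at any point by running zpool status again. If I only wanted the moving numbers, a throwaway loop along these lines would do (just a sketch, and the interval is arbitrary):

# print the scan progress lines every ten minutes
while true; do
    zpool status storage | grep -E 'scanned|resilvered'
    sleep 600
done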

That's going to take a while (about three hours according to the status). I appreciate having this experience so early on in the process. It underscores the importance of labeling the disks, which let me identify the failed one quickly (a note on that below). I can also see that, in the future, buying one extra disk as a spare or doing an advanced RMA may be the better option. I must also say that watching ZFS identify drive errors and keep operating in a DEGRADED state for a week makes me more confident in it and its ability to keep functioning well. As they say, "a watched pot never boils", so I'm going to let the pool resilver in peace.
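
One last note on those labels: glabel is what makes them pay off, because it maps each gpt label back to the physical provider it lives on. Seeing the mapping is just (standard FreeBSD tooling, not something from my original notes):

glabel status | grep gpt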