I’m a fan of Microsoft’s Data Protection Manager (now a part of System Center 2012). Data Protection Manager (DPM) is a data protection and recovery application, designed to protect an entire network. The protection of network data is absolutely essential in any business, regardless of size. DPM is a great system, but really uses up disk space. What do you do when your DPM server runs out of space? This article will discuss one approach to solving the space issue: the use of a Dell MD1000 (connected to a Dell server) to expand available storage.
How Does DPM Work?
The DPM system is capable of protecting a large number of data formats, in a variety of different ways. Bare Metal Recovery, Exchange Server, SQL Server, SharePoint, individual workstations, and raw files are all supported. The first step in backing up your data is to define what you will be protecting, then decide how often that data should be backed up. The frequency of backups is fully configurable, with synchronization occurring as often as every 15 minutes. For example, you could choose to back up crucial data every 15 minutes, while leaving less crucial data with only a few checkpoints per day.
How long to retain the data on the backup server is equally configurable. You can retain the data for days or weeks, depending upon your needs. Before determining how long to retain disk-based data, one must take into account DPM’s ability to store data on tape. Tape remains the most cost-effective, long-term storage method available. Tape also allows one to transfer the data offsite, ensuring that a backup will be available in the case of a large-scale catastrophe.
Ideally, all backup data would be stored on disk. Disk-based storage allows you to access and recover the data quickly and easily. The problem is, physical disks are expensive. Storing more than a few weeks’ worth of data, depending upon the amount of data being backed up, requires a large amount of space. This is where tape-based storage comes in. Data requiring less frequent access can be stored indefinitely, without taking up precious disk space.
A typical backup routine may look like this:
- All high priority data is synchronized every 15 minutes. Less critical data is synced every 1, 2, or 3 hours.
- Data is retained on disk for 2 to 3 weeks. The longer data can be stored on disk, the easier it is to recover. The trade-off is an increase in the amount of disk space required.
- Once a week, a current snapshot of the backed-up data is stored on tape. These tapes are set to expire after 4 weeks, so that the tapes may be re-used.
- Once a month, a snapshot is stored on tape. These tapes are set to expire after 12 months.
This approach provides for a good amount of convenient, disk-based recovery, while also ensuring that older revisions of files will be available via tape for up to a year.
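To put some numbers on a routine like this, here's a quick bit of arithmetic in Python. It's only a sketch: the function name and the 14-day figure are mine for illustration, and nothing here talks to DPM itself.

```python
# Illustrative arithmetic for the sample routine above; not a DPM API.
MINUTES_PER_DAY = 24 * 60


def syncs_per_day(interval_minutes: int) -> int:
    """Number of synchronizations that fit in one day at a fixed interval."""
    return MINUTES_PER_DAY // interval_minutes


high_priority = syncs_per_day(15)       # synced every 15 minutes
hourly = syncs_per_day(60)              # synced every hour
three_hourly = syncs_per_day(3 * 60)    # synced every 3 hours

# Assumed disk retention of 14 days (the low end of "2 to 3 weeks").
retention_days = 14
high_priority_syncs_retained = high_priority * retention_days

print(high_priority, hourly, three_hourly, high_priority_syncs_retained)
```

Even before you think about file sizes, the 15-minute tier generates a lot of synchronization activity over a two-week retention window.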
How Did We Get Here?
I first rolled out DPM with the 2007 version. At the time, Dell sold a server pre-configured with DPM. After much research, I had settled upon DPM as the way to go, so a pre-built server was a quick method of deployment. Cut to a few years later, and the machine had run out of space. What happened? Well, our data needs simply grew faster than anticipated. Since DPM works off of a restore point system, you need far more space than that taken up by the data you are protecting. Each revision of a file takes up space. Set up your system for two weeks’ worth of restore points, and you can end up with a lot of used space on your backup server. Unfortunately, the machine that was selected at the time was incapable of being expanded with additional space. It was what it was, and there was no getting around it. Note that the purchase of the restricted system was not an oversight, but a limitation imposed by budgetary restraints outside of my control.
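To see why restore points eat space so quickly, here's a deliberately simplified model in Python. This is my own back-of-the-envelope assumption (one full copy of the data plus a fixed percentage of changed data per day), not DPM's actual sizing formula, and the figures are made up for illustration.

```python
def estimated_backup_gb(protected_gb: int, daily_change_percent: int,
                        retention_days: int) -> float:
    """Rough estimate: one full replica plus the changed data kept per day.

    Assumes a constant daily change rate, which real data rarely has.
    """
    replica = protected_gb
    changes = protected_gb * daily_change_percent * retention_days / 100
    return replica + changes


# Hypothetical example: 2000 GB protected, 5% daily change, 14 days retained.
estimate = estimated_backup_gb(2000, 5, 14)
print(estimate)
```

With those made-up inputs, the backup server needs noticeably more space than the data it protects, and the gap widens with every extra day of retention.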
With the old system of little use, it was now time to build a new backup server. This time, I made sure to push for far more flexible (and powerful) hardware. Something that actually had a few expansion slots. I got the hardware, but sadly, those pesky fiduciary restraints crept back in and limited the number of hard drives I ended up with. The rebuild turned out to be the perfect time to upgrade to DPM 2010. Everything hummed along nicely for a year or so, then…low on space…again.
I picked up enough drives to fill up the server, and set to work on expanding the storage array. I performed a Bare Metal Recovery (BMR) of the OS, popped in the new drives, re-created the array with the new drives, restored the OS from the BMR, and I was good to go. I didn’t worry about keeping the disk-based recovery points throughout this process, as I had a year’s worth of tape backups ready and waiting. The disks were back in action and full of fresh recovery points within a few days. Now, I knew that this latest rebuild would only act as a stop-gap. A few more drives weren’t going to make a huge difference, after all. I just wanted to make sure to maximize the space I could get out of the server, before moving on to my grand expansion plan.
Grand Expansion Plan?
When I first deployed DPM, I pushed for a server that would allow me to expand it with the use of a Dell MD1000 expansion enclosure. That didn’t happen, but over the years I was able to build my way up to a server with room for expansion. Three previous rebuilds created a compelling case for a proper space upgrade, so my grand expansion plan was finally approved.
There are newer expansion enclosures available from Dell, but a used MD1000 can be acquired for a very attractive price these days, with little difference in performance. Essentially, the MD1000 is just a box that holds a bunch of disks (15, to be exact). Unlike the MD3000 unit, the MD1000 doesn’t have a RAID card. Instead, the MD1000 is connected to a RAID card in the server, and that card handles the management of the array. There is the potential for degraded performance when connecting to an external enclosure, but that shouldn’t matter in this use case.
Take a look at the MD1000’s feature list, straight from Dell, and you’ll quickly be sold on it:
- Enclosure storage in an efficient rack-mount design
- Capacity for either 15 3.5-inch, hot-plug, 3.0-Gbps, serial-attached SCSI (SAS) hard drives or 15 3.5-inch, hot-plug, 3.0-Gbps, Serial ATA (SATA) hard drives
- Host-based RAID support via a PERC 5/E adapter
- Redundant hot-plug power supply and cooling fans that are integrated for improved serviceability
- Optional second enclosure management module (EMM) for redundant system management capability
- Support for either of the following direct-attach configurations:
– Unified mode for direct connectivity of up to 15 hard drives
– Split mode (with dual EMMs) providing direct connectivity to drives 0 through 6 on one EMM and a separate direct connectivity to drives 7 through 14 on the second EMM
- Front-panel, two-position switch for setting the enclosure mode (unified or split mode)
- Support for up to three daisy-chained storage enclosures in unified mode for a total of 45 hard drives
- In-band enclosure management provided through SCSI enclosure services (SES)
- RAID and system management using Dell OpenManage™ Server Administrator Storage Management Service
- Four sensors for monitoring ambient temperatures (with redundant EMMs)
- Over-temperature shutdown capability
- Audible warning for critical component failure
- Support for a wide range of servers (See your system’s readme file for supported systems. An updated readme can be viewed from the Dell website at support.dell.com.)
The goal in configuring the MD1000 unit was to maximize the amount of space gained. Backups require space and stability, not speed. Not needing the speed facilitated by SAS drives, I was able to stick with low-cost, high-capacity SATA drives. I ended up with 15 2TB 7.2k SATA drives. Since a backup system almost never requires quick access, the 7.2k drives are more than sufficient. Technically, it’d be possible to stop into your local brick-and-mortar store and load up on cheap consumer drives, but enterprise-class drives are a must since the drives will be getting constant, 24-hours-a-day use. So, 15 2TB drives…that’s 30TB of raw space! Finally, the backup storage I’ve been after all along.
eBay Isn’t for the Easily Disheartened
Ahh, eBay, the second home of the Server Admin on a budget. I picked up all of the parts I needed for the MD1000 off of eBay, but it wasn’t completely smooth sailing. The 15 drives arrived without a problem, so no worries there. The MD1000 unit, itself, was a different story. I was able to source a unit that had the MD1000 shell, dual power supplies and EMM controllers, rack rails, SAS cables, extra drive trays, a PERC 6/E RAID card, 15 500GB hard drives, and most importantly 15 SATA interposers. Why were the interposers the most important aspect? Well, they’re not the easiest item to find for sale, and the few you do find run $30 – $50 each. That really adds up when you need 15 of them. The PERC 6/E card was a nice bonus, as it was a necessary item, and would have cost $150 – $200 separately. Honestly, the 15 500GB drives were unnecessary. In fact, I’m still working on finding a use for them. The total cost of the unit was less than buying the items I needed separately, so having 15 extra drives lying around wasn’t an issue.
Most servers I’ve purchased off of eBay have come in great shape. They’ve been packaged and shipped by people who really knew what they were doing. Sadly, that was not the case with the MD1000. Here’s the first thing I saw when I opened the box.
See the green circuit board, underneath the hardware trays? Yeah, that’s the PERC card. Not only is it not in any sort of anti-static bag, it’s just tossed in a pile with the trays. Not surprisingly, the card was fried. Oh, and did I mention that it wasn’t a 6/E card, as promised? It was actually a 5/E card.
Okay, so the card was shot. No big deal, PERC cards are easy to find. We’ll just order another one. Let’s keep unpacking, shall we?
I’ve seen my share of bent faceplates. In fact, I’ve often wondered about the abuse that servers are faced with. Are people just ripping the units out of their racks and tossing them on the floor? It often seems like that is the case. Yet, broken handles and rack screws are something I hadn’t seen before. At the top of the picture you can see the handle from the left side of the MD1000. That one is completely broken off. The one on the right is still hanging on by one badly-bent screw.
Despite all of these issues, the only item that didn’t work was the PERC card. I ended up getting the seller to cover the cost of a replacement PERC card and a replacement faceplate. Much to my surprise, all 15 drives passed a full set of diagnostics, though I did have a bit of an issue with getting the unit to recognize some of the drives. It turned out that the bent faceplate was the cause of the non-recognized drives. Bending it back into shape took care of the problem, but I still ordered up a new faceplate as the broken handles and rack screws weren’t salvageable.
See the exposed pins in the picture above? That small gap was causing the MD1000 to have trouble finding some of the drives. When seating a hard drive in the MD1000, the drive’s tray hooks onto the bump at the front of the unit (in the lower right corner of the picture), then pulls itself the rest of the way into the drive slot. Since the faceplate was bent, and the bumps are part of the faceplate, some of the bumps were bent away from the MD1000’s chassis. This resulted in the drives not fully seating into their slot.
After bending the faceplate back into shape, the drives were able to make a much better connection (as seen below).
With all the hardware issues sorted, it was time to get everything connected. The first step was to get the PERC 6/E card installed in the DPM server. Nothing out of the ordinary there. A quick update to the card’s firmware, and I was good to go, as I already had the appropriate drivers installed on the server (hint: they’re natively installed on newer Windows operating systems). You can find all of the driver and firmware downloads you might need on Dell’s MD1000 product page.
The MD1000 is a pretty simple device, but there are a few different things to consider when configuring it. The unit has only one switch on the front of it. This is the enclosure mode switch. When the switch is in the uppermost position, the unit is in unified mode. When the switch is in its lowermost position, the unit is in split mode. You need to set the mode before you power on the device. What’s the difference? Here’s what Dell has to say:
In unified mode, a SAS host can communicate with up to 15 drives in the enclosure via a single EMM, or up to 45 drives in three enclosures daisy chained together. In split mode, the enclosure is split into two virtual groups, with up to eight consecutive drives (slots 7-14) controlled by the primary (left) EMM. The remaining drives (slots 0-6) are controlled by the secondary (right) EMM. You must select either mode using the enclosure mode switch on the front panel of the enclosure before powering on.
The MD1000 most often comes configured with two Enclosure Management Modules (EMM). The EMMs act as the brains of the unit. They monitor and control the fans, temperatures, power supplies, and LEDs; control access to the drives; and communicate with the servers connected to the MD1000. In order to run the device in split mode, you’ll need two EMMs. If you’ll be using it in unified mode, then you only need one. It’s worth having two, even if you’re only using one, as the unused EMM can act as a backup. So, assuming you have two EMMs installed, you can hook up the MD1000 to two separate servers by using split mode. This allows each server to have access to roughly half of the drives (one server would get seven drives, the other eight). You could also hook up each block of drives to two separate ports on a single server, allowing you to create separate storage arrays.
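If it helps to picture the split, here's a small Python sketch of the slot grouping, following the slot ranges in Dell's description quoted above. The function and dictionary keys are my own naming, purely for illustration.

```python
SLOTS = range(15)  # the MD1000 numbers its drive slots 0 through 14


def split_mode_groups() -> dict:
    """Group drive slots by the EMM that controls them in split mode."""
    return {
        "primary (left) EMM": [slot for slot in SLOTS if slot >= 7],
        "secondary (right) EMM": [slot for slot in SLOTS if slot <= 6],
    }


groups = split_mode_groups()
for emm, slots in groups.items():
    print(emm, slots, len(slots))
```

That's where the uneven seven/eight split comes from: 15 slots can't be divided evenly between two EMMs.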
One particularly good reason to have dual EMMs is that it lets you hook up the MD1000 in redundant mode. It should be noted that you have to have a PERC 6/E card (something newer, like an H700, might work, too) in order for this mode to work. Here’s a diagram from Dell, illustrating this cabling configuration:
As you can see, both EMMs are utilized, with a cable running to each port on the SAS card. A PERC 6/E card is able to detect this redundant configuration, and upon failure of one of the EMMs, switch over to the remaining EMM. Here’s a look at the redundant configuration carried over to a daisy-chain setup:
For more information on any of these configuration options, consult Dell’s documentation.
Configuring the RAID Array
Typically, I would configure RAID (redundant array of independent disks) via the controller card’s BIOS. This case was a bit different than the norm, though, in that the machine hosting the RAID array was a live production server. Not wanting to take the machine out of service for an extended period of time, I decided to try using Dell’s OpenManage software to perform the configuration. Although these instructions are meant for use with a Dell server, running Dell’s software, the general concepts should be applicable to other hardware/software combinations as well.
The first step in using OpenManage, of course, is to get it installed. You’ll need to download the latest version of the Dell OpenManage Server Administrator Managed Node from your server’s drivers download page. Once it’s installed, and you’ve logged in (whether from on the server or remotely), you should see something like that shown in the picture below. Make your way to the Virtual Disks page for the appropriate storage adapter.
Do you really want to go the “Express” route and leave things to chance? I didn’t think so. Select the Advanced Wizard, make your choice for RAID type, then click Continue. With a setup like this, there are two primary choices for the RAID level: RAID-5 or RAID-6. With RAID-5, the equivalent of one hard disk is devoted to parity data (the parity is actually spread across all of the disks). This allows the storage array to continue operating when one of the disks dies. RAID-6 is similar, but sets aside the equivalent of two disks for parity. Due to the large number of disks in my array (15), I decided to use RAID-6. The more disks you’re working with, the more likely it is that one will go bad. With RAID-5, you’re okay if one disk goes bad, but if a second goes bad before you get the first one replaced and rebuilt…then the entire array is lost. I didn’t want to worry about multiple disks going bad over a weekend, so I decided to play it safe and use RAID-6. That should allow the array to keep operating as long as no more than two disks go bad at a time.
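The capacity cost of that choice is easy to work out. Here's a quick Python sketch of the math for my 15-drive setup; the function names are my own, and the figures are nominal drive sizes, so expect the controller to report a bit less.

```python
# Number of drives' worth of capacity given up to parity at each RAID level.
PARITY_DRIVES = {"raid5": 1, "raid6": 2}


def usable_tb(drive_count: int, drive_tb: int, level: str) -> int:
    """Nominal usable capacity once parity overhead is subtracted."""
    return (drive_count - PARITY_DRIVES[level]) * drive_tb


def failures_tolerated(level: str) -> int:
    """How many simultaneous drive failures the array can survive."""
    return PARITY_DRIVES[level]


raw = 15 * 2
raid5 = usable_tb(15, 2, "raid5")
raid6 = usable_tb(15, 2, "raid6")
print(raw, raid5, raid6, failures_tolerated("raid6"))
```

So going from RAID-5 to RAID-6 costs one more drive's worth of space out of 15, which seemed like a fair price for surviving a second failure.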
On this screen, select all of the disks you want to include in the storage array you are building. I made one large array, spanning all available disks. Once you’ve made your selection, click Continue.
Name: You can define any name you like. This is the name that your controller card will display, so while not that important, you may want to define something appropriate to the disk’s use.
Size: This is the size of the virtual disk you are creating. Since the disks in the server hosting the MD1000 were configured in MBR mode, I decided to keep the MD1000 consistent with this, and go with MBR for the new disks, as well. MBR is limited to disks under 2TB in size. To get around this limitation, MBR is slowly being phased out in favor of GPT mode. Since I was sticking with MBR, I set up each disk as 1900GB. If I had planned to use GPT mode, then I could have created one large disk, spanning the entire array.
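For the curious, the 2TB ceiling falls out of MBR's 32-bit sector addressing. The Python sketch below shows the arithmetic, assuming 512-byte sectors and assuming the controller's "GB" are binary units. The 13-data-drive figure assumes a 15-drive RAID-6 array, and the virtual disk count is only an approximation from nominal drive sizes.

```python
SECTOR_BYTES = 512          # assumed logical sector size
MBR_MAX_SECTORS = 2 ** 32   # MBR stores sector addresses in 32 bits
MBR_LIMIT_BYTES = SECTOR_BYTES * MBR_MAX_SECTORS  # 2 TiB

GIB = 2 ** 30


def fits_mbr(size_bytes: int) -> bool:
    """True if a disk of this size is fully addressable under MBR."""
    return size_bytes < MBR_LIMIT_BYTES


virtual_disk_bytes = 1900 * GIB    # one 1900GB virtual disk
array_bytes = 13 * 2 * 10 ** 12    # 13 data drives' worth of nominal 2TB drives

print(MBR_LIMIT_BYTES)
print(fits_mbr(virtual_disk_bytes), fits_mbr(array_bytes))
# Approximate number of full 1900GB virtual disks the array can hold.
print(array_bytes // virtual_disk_bytes)
```

A single virtual disk spanning the whole array would blow straight past the MBR limit, which is why sticking with MBR means carving the array into several smaller virtual disks.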
Stripe Element Size: This is the amount of data the controller writes to one physical disk before moving on to the next disk in the array. It is often recommended that you set the stripe size to the same size that your OS or software program uses when writing to the disk. For example, if you define a stripe size of 64KB, and then write 15KB to disk, that write lands within a single 64KB element on one disk. If you write 120KB, then it spans two elements (128KB), and therefore two disks. The main advantage of larger stripes is that a single request is less likely to be split across multiple disks. If the majority of your writes can fit in the stripes you define, then each one can usually be serviced by one disk, leaving the others free for other requests. The downside to larger stripes is that large sequential transfers are spread across fewer disks, so they gain less from the disks working in parallel. Note that the stripe size does not control how much space a small file takes up on disk. That is determined by the file system’s allocation unit (cluster) size, so a 10KB file does not consume a full 64KB stripe element.
I found some conflicting information on the Web, but from what I can tell, DPM writes to the disk in 64KB chunks. Knowing that, a 64KB stripe size should be perfect. Generally speaking, 64KB is a pretty solid choice for most applications.
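Here's a simplified Python sketch of how a striped array maps data onto disks with a 64KB element size. It ignores how parity blocks are rotated among the disks, so treat it as an illustration of striping in general rather than the PERC's exact layout. The function names and the 13-data-disk figure are my own assumptions.

```python
ELEMENT_BYTES = 64 * 1024  # 64KB stripe element


def locate(offset: int, data_disks: int, element_bytes: int = ELEMENT_BYTES):
    """Return (stripe row, position within the row) for a logical byte offset."""
    element = offset // element_bytes
    return element // data_disks, element % data_disks


def elements_spanned(offset: int, length: int,
                     element_bytes: int = ELEMENT_BYTES) -> int:
    """Count the stripe elements touched by a request of `length` bytes."""
    if length <= 0:
        return 0
    first = offset // element_bytes
    last = (offset + length - 1) // element_bytes
    return last - first + 1


print(elements_spanned(0, 15 * 1024))    # a 15KB request
print(elements_spanned(0, 120 * 1024))   # a 120KB request
print(locate(13 * ELEMENT_BYTES, data_disks=13))
```

A request that matches the element size and starts on an element boundary touches exactly one element, which is the reasoning behind matching the stripe size to the application's write size.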
Read Policy: You have three choices here: read-ahead, no-read-ahead, or adaptive read-ahead. Per Dell:
- Read-Ahead — When using read-ahead policy, the controller reads sequential sectors of the logical drive when seeking data. Read-ahead policy may improve system performance if the data is actually written to sequential sectors of the logical drive.
- No-Read-Ahead — Selecting no-read-ahead policy indicates that the controller should not use read-ahead policy.
- Adaptive Read-Ahead — When using adaptive read-ahead policy, the controller initiates read-ahead only if the two most recent read requests accessed sequential sectors of the disk. If subsequent read requests access random sectors of the disk, the controller reverts to no-read-ahead policy. The controller continues to evaluate whether read requests are accessing sequential sectors of the disk, and can initiate read-ahead if necessary.
Read-ahead tells the controller to start pulling in additional disk sectors whenever a sector is accessed. The theory is that, whenever a sector is read, it is likely that additional nearby sectors will be needed, as most files will span multiple sectors, and usually, those sectors are in sequential order. If you are reading data that is likely to be badly fragmented, like database transactions, then read-ahead can decrease performance. The controller takes the time to grab extra sectors, thinking they might be of use, but when it turns out that they aren’t, that effort was nothing but a waste of time.
Adaptive read-ahead is similar to read-ahead mode. The difference is that read-ahead only kicks in after two sequential disk sectors have been accessed. This allows read-ahead mode to kick in less frequently, but still be utilized when it seems more likely to be of use.
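The adaptive behavior is easy to picture as a tiny state machine. Here's a toy Python model of it, based only on Dell's description above. It assumes one sector per read request, which real controllers don't, and the function name is mine.

```python
def adaptive_read_ahead(sectors):
    """For each read, report whether read-ahead would be active afterwards.

    Read-ahead is considered active only when the two most recent reads
    hit sequential sectors; any non-sequential read switches it back off.
    """
    decisions = []
    previous = None
    for sector in sectors:
        sequential = previous is not None and sector == previous + 1
        decisions.append(sequential)
        previous = sector
    return decisions


# Three sequential reads, a jump elsewhere, then sequential again.
pattern = adaptive_read_ahead([10, 11, 12, 50, 51])
print(pattern)
```

The jump to sector 50 turns read-ahead off, and it comes back as soon as the next read is sequential again.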
Write Policy: Dell has the following to say about these two options:
- Write-Back Caching — When using write-back caching, the controller sends a write-request completion signal as soon as the data is in the controller cache but has not yet been written to disk. Write-back caching may provide improved performance since subsequent read requests can more quickly retrieve data from the controller cache than they could from the disk. Write-back caching also entails a data security risk, however, since a system failure could prevent the data from being written to disk even though the controller has sent a write-request completion signal. In this case, data may be lost. Other applications may also experience problems when taking actions that assume the data is available on the disk.
- Write-Through Caching — When using write-through caching, the controller sends a write-request completion signal only after the data is written to the disk. Write-through caching provides better data security than write-back caching, since the system assumes the data is available only after it has been safely written to the disk.
Your choice here is going to be dependent upon your hardware configuration. With write-back caching, when data is sent off to be written to the disk, the controller reports the write as complete as soon as the data is in its cache, then moves on with the next request. With write-through caching, after the write command is sent, the controller waits for that command to finish before moving on to the next command. With write-back caching, if your system loses power during a write operation, then it is highly likely that data will be corrupted, as multiple incomplete commands could be queued up. Because of this, write-through caching is much safer for data security purposes. Unfortunately, it also results in much slower performance. This is why most modern RAID cards include a dedicated battery backup. This small battery preserves the contents of the card’s cache during a power failure, so that any incomplete writes can be finished once power is restored. Since my RAID card had a battery, I went with write-back caching.
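To make the trade-off concrete, here's a toy Python model of the two write policies. It's nothing like real controller firmware: the class, its methods, and the way the battery is modeled (cached writes simply survive and get flushed) are all my own simplifications.

```python
class ToyController:
    """Toy RAID controller that only tracks where acknowledged writes live."""

    def __init__(self, write_back: bool, battery: bool):
        self.write_back = write_back
        self.battery = battery
        self.cache = []  # acknowledged, but not yet on disk
        self.disk = []   # safely written

    def write(self, block: str) -> str:
        if self.write_back:
            self.cache.append(block)  # acknowledged before reaching the disk
        else:
            self.disk.append(block)   # acknowledged only once on the disk
        return "ack"

    def flush(self) -> None:
        """Move everything waiting in the cache onto the disk."""
        self.disk.extend(self.cache)
        self.cache.clear()

    def power_failure(self) -> list:
        """Return the acknowledged blocks that were lost."""
        if self.battery:
            self.flush()  # cache survived, so the writes complete afterwards
            return []
        lost = list(self.cache)
        self.cache.clear()
        return lost


risky = ToyController(write_back=True, battery=False)
risky.write("a")
risky.write("b")
print(risky.power_failure())

protected = ToyController(write_back=True, battery=True)
protected.write("a")
print(protected.power_failure(), protected.disk)
```

Write-back without a battery is the only combination in this model that loses acknowledged data, which is why the battery made me comfortable choosing it.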
Disk Cache Policy: This option defines whether the cache on the individual disks is enabled. This is separate from the previous cache options, as this cache is not controlled by the controller card. Since this cache isn’t tied to the controller card, the card’s battery backup will not help to prevent data errors in the case of a power failure. Disabling this cache does increase data security, but it can also cause a big decrease in performance. Since I’m confident in my UPS setup, I typically leave this cache enabled.
Make your choices, then create your virtual disk.
Repeat the above steps for each virtual disk you need to create. As you do, you will find that your options are much more limited. You won’t be able to change the RAID level, and once you select one physical disk to include, all of the disks you previously selected will automatically be included. Once you’ve set up all of your virtual disks, head back to the virtual disk page and you should see them listed in a manner similar to this:
Adding Disks to DPM
We’ve got our array set up, so now all we need to do is get the new virtual disks added to DPM. Head into the Disk Management tool (via Computer Management). You should see all of the new disks listed, with each showing as completely unallocated space. If you see the screen shown below upon entering Computer Management, go ahead and use it to initialize each disk. Choose MBR or GPT, based upon your own requirements. If you do not see this screen, then you will have to right-click on each drive, initializing each manually.
Head into DPM, then go to the Disks tab under the Management section. Click Add… and this window will pop up:
And that’s it! After a bit of configuration work from DPM, the disks should show up in DPM’s disk pool.
Just take a look at all of the free space that resulted from the completion of my grand expansion plan. It took a bit of work, but space on the backup server shouldn’t be a concern for at least a few years. Now just watch…2014 will be the year when we launch a video hosting service…