My Giant Hard Drive: Building a Storage Box with FreeNAS
UPDATE 2/20/2015: This build failed after about 15 months due to extensive drive failure. By extensive, I mean there were a total of 9 drive replacements before three drives gave out over a weekend. This correlates closely with data recently published by Backblaze, which suggested 3 TB Seagate drives are exceptionally prone to failure. I’ve replaced these with 6 HGST Deskstar NAS 4TB drives, which were rated highly and are better suited for NAS environments.
For many years, I’ve had a lot of hard drives being used for data storage. Movies, TV shows, music, apps, games, backups, documents, and other data have been moved between hard drives and stored in inconsistent places. This has always been the cheap and easy approach, but it has never been really satisfying. And with little to no redundancy, I’ve suffered a non-trivial amount of data loss as drives die and files get lost. Now, I’m not alone in having this problem, and others have figured out ways of solving it. One of the most interesting has been in the form of a computer dedicated to one thing: storing data, and lots of it. These computers are called network-attached storage, or NAS, computers. A NAS is a specialized computer that has lots of hard drives, a fast connection to the local network, and…that’s about it. It doesn’t need a high-end graphics card, or a 20-inch monitor, or other things we typically associate with computers. It just sits on the network and quietly serves and stores files. There are off-the-shelf boxes you can buy to do this, such as machines made by Synology or Drobo, or you can assemble one yourself for the job.
I’ve been considering making a NAS for myself for over a year, but kept putting it off due to expense and difficulty. But a short time ago, I finally pulled the trigger on a custom assembled machine for storing data. Lots of it; almost 11 terabytes of storage, in fact. This machine is made up of 6 hard drives, and is capable of withstanding a failure on two of them without losing a single file. If any drives do fail, I can replace them and keep on working. And these 11 terabytes act as one giant hard drive, not as 6 independent ones that have to be organized separately. It’s an investment in my storage needs that should grow as I need it to, and last several years.
Building a NAS took a lot of research, and other people have been equally interested in building their own NAS storage system, so I have condensed what I learned and built into this post. Doing this yourself is not for the faint of heart; it took at least 12 hours of work to assemble and set up the NAS to my needs, and required knowledge of how UNIX works in order to make what I wanted. This post walks through a lot of that, but still requires skill in system administration (and no, I probably won’t be able to help you figure out why your system is not working). If you’ve never run your own server before, you may find this to be too overwhelming, and would be better served by an off-the-shelf NAS solution. However, building the machine yourself is far more flexible and powerful, and offers some really useful automation and service-level tools that turn it from a dumb hard drive to an integral part of your data and media workflows.
Before we begin, I’d like to talk about the concepts and terminology to be discussed as part of the assembly. Feel free to skip this section if you already understand RAID, ZFS, and computer assembly.
Data Storage for Newbies
At its core, a NAS is just a computer with a number of hard drives in it. Its only purpose is to store and load data, and make all that stuff available over the network. Since all it’s ever doing is holding on to lots of data, you typically don’t need a lot of the things that you’d put into a normal computer; stuff like a graphics card, keyboard, mouse, and monitor aren’t needed very much. You instead buy parts that focus on a few key areas: number of hard drives you can connect, and how fast you can get data in and out. In this case, you need these parts:
- a motherboard
- a CPU
- some RAM
- a bunch of hard drives
- a power supply
- a case to put everything inside of
Your laptop has a hard drive in it. If you’ve ever plugged in an external drive or a Flash drive, you’d see that they’re two separate places for you to store stuff. If one of them fails, you lose all of the data on it, but it doesn’t affect the data on your other drives. And you have to organize everything yourself. Trying to scale up to 4 or 6 or 10 drives sounds like a disaster. What we really would like is to make all of those drives pretend like they’re one giant hard drive. And we’d like to be resilient to a hard drive dying without losing data.
There’s a tool for this, and it’s called RAID, or “redundant array of independent disks”. RAID is a set of technologies that takes multiple hard drives, called an array, and combines them under the hood to make them look and act like one giant hard drive. The way this works is complicated, but the basic idea is that RAID takes a file, chops it up into little pieces, and spreads them out across all your hard drives. Then, when you want the file, RAID will grab all those pieces from each hard drive and combine them back into the original file. (Please note: this is an overly simplified discussion of the technology, and is not technically accurate, but is adequate for our purposes of conceptualizing.) There are different strategies called “RAID levels” you can use that will change the specific behavior; some are more focused on redundancy, some are focused on speed.
The benefits you get with most RAID levels are: a bunch of hard drives that look like one storage place, improved speed when reading/writing data, the ability to survive a drive failing, and the ability to replace a dead drive with a new one. However, the downside is potentially a big one. Because the files are never stored as a whole on one drive, if you lose enough drives at once and don’t replace them in time, you lose all the data, even on drives that haven’t failed. Depending on your RAID level, you can survive zero, one, two, three, or more drives failing. But the more dead drives you want to be able to withstand, the more storage of those drives gets used for redundant data. So it’s a balance of how much storage you want vs. how much protection you want from dying drives. You can calculate how much storage you’ll have based on how many drives you buy using a RAID calculator. A healthy minimum is that for every 3 drives you buy, you want to be able to withstand one failing. So 2 or 3 drives should withstand 1 drive failing, 4-6 drives should withstand 2 failing, 7-9 should withstand 3, etc.
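That rule of thumb (one drive of failure tolerance for every three drives, rounded up) is easy to write down as a tiny shell function. This is just a sketch; the name `min_parity` is made up for illustration and isn't part of any RAID tool.

```shell
#!/bin/sh
# min_parity is a hypothetical helper name, not part of any RAID tool.
# Rule of thumb from the text: tolerate one failure per 3 drives, rounded up.
min_parity() {
    drives=$1
    # Integer ceiling of drives / 3
    echo $(( (drives + 2) / 3 ))
}

for n in 2 3 4 6 7 9; do
    echo "$n drives: plan to survive $(min_parity "$n") failure(s)"
done
```

For 6 drives this gives 2, which is the level of protection I ended up choosing for this build.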
For this build, I set up my array as a form of RAID called RAID-Z2. RAID-Z and RAID-Z2 are based on a technology called ZFS, which is a modern file system that supports “storage pools”. This gives us the “make a bunch of hard drives act like one giant hard drive” behavior, which RAID-Z builds on to give us the “survive a hard drive failure” behavior we want. RAID-Z lets you survive one drive failure, RAID-Z2 lets you survive two, RAID-Z3 lets you survive 3. The major downside to RAID-Z is that it requires all data to be processed by the CPU, so you’ll want something reasonably fast to process your data. The more drives you add, the bigger the CPU will need to be.
Building the Computer
The part that was the most daunting for me to overcome was actually purchasing the pieces necessary to build the computer. I’m a software guy who’s owned Macs all my life, so I’ve never actually assembled a computer before (I will take this opportunity to let all the nerds out there get a good laugh in before we move on). If the idea of building your own computer is scary, you may want to just go buy an off-the-shelf NAS, such as the Synology DS413j and stop reading. Keep in mind, though, that a preassembled NAS will be more expensive and far less flexible than building one yourself.
After waffling on this for months, I finally decided to go the custom build approach. I figured I could make it cheaper, quieter, and run whatever services I wanted directly on the machine by building it myself. After putting some pieces together, here are the parts I went with. Prices are what they cost as of September 30, 2012. All links to Amazon include affiliate links, so I get a tiny kickback. Feel free to search for the part names if you wish. You may be able to find these parts cheaper elsewhere on the Internet.
- Motherboard: Gigabyte GA-H61MA-D3V Micro ATX LGA1155 – $64.99
- CPU: Intel Celeron G540 2.5GHz Dual-Core – $49.98
- CPU Cooler: Arctic Cooling Freezer 7 Pro Rev.2 92mm Fluid Dynamic CPU Cooler – $29.99
- RAM: Pareema 16GB (2 x 8GB) 240-Pin DDR3 SDRAM DDR3 1333 (PC3 10600) Desktop Memory Model MD313D81609L2 – $64.99
- Power Supply: Antec EarthWatts EA-380D Green 380 Watt 80 PLUS BRONZE – $44.99
- Case: Fractal Design Define Mini – $95.98
- Hard Drives: Seagate Barracuda 7200RPM 3.5-Inch Internal Bare Drive ST3000DM001 – $149 each (six drives + two spares)
Cost without drives: $292.92
Cost with drives: $1492.20
A few notes about this hardware configuration:
- The case has 6 hard drive slots, so you can put up to 6 drives in it. You can, of course, put fewer in it.
- The motherboard has 6 SATA ports, but only two are 6 Gbps, while the others are 3 Gbps.
- The power supply has 5 SATA connections, so if you want to run 6 drives, you’ll need a Molex to SATA power adapter.
- Besides the Molex adapter, the parts mentioned include all the cables necessary for internal setup. But you will need your own power cable.
- The motherboard includes some onboard graphics, and you’ll want to have a DVI monitor available for making sure the machine is booting correctly. You won’t need to keep it plugged in beyond setup, however.
- RAM is cheap, and if you’re accessing the same files over and over, they can remain in RAM and be even faster than loading from disk. It’s better not to skimp on this. Just make sure your CPU is 64-bit.
- There’s no Wi-Fi here, so you’ll either need to get a wireless card or (ideally) plug an Ethernet cable into it connected to your network.
Installing the OS
For the operating system, I decided to use FreeNAS 8.2, a distro of FreeBSD that is designed to run ZFS-based RAID systems. It includes a web-based administration tool that lets you set up the array, monitor it, set up tests to detect failing drives, run services and plugins, and lots of other stuff. To run this, I copied it to a USB key (at least 2 GB necessary, you probably want 4 GB) and just leave that plugged in to the back of the machine all the time. Once you copy the image onto the key, you set the default boot drive to the USB key, and it will boot to it each time. You will also need a keyboard (and note, Apple’s keyboards will not work with this setup, so have a USB or even a PS/2 Windows keyboard) to get into the BIOS settings. After you have the BIOS auto-boot set up, when you turn the computer on, it’ll take a minute or two to set everything up, and then the web admin will be available on your local network. If you have a router that can tell you what’s connected, you can get the IP there; otherwise, plug a monitor into the motherboard and it’ll tell you the IP. If your router supports it, you should grab the MAC address and assign it to a static IP on your network so that your NAS is always available on the same IP address. Once this is all running automatically, you can disconnect the monitor and keyboard and just run the machine headless.
The web admin is divided into a few sections. Along the top are the sections/actions that are the most commonly used; System, Network, Storage, Sharing, Services, Account, Help, Alert Status, and Log Out. The absolute first thing you should do is click the Account button and change the username and password for the admin account (which you got logged into automatically). Once this is set, nobody will be able to log in to the web admin without these credentials, or without physical access to the machine (as you can disable the login from the console if you have a monitor/keyboard attached). You’ll also want to click the Users tab in that section and create a user for yourself for connecting to the array. Make sure it’s in the group “wheel”, at the very least.
Once you have that out of the way, you can set up your storage array and actually get those hard drives to do something. Click Storage at the top to view the Active Volumes, which is empty, as we haven’t set any up yet. Set one up by clicking the Volume Manager button; give the volume a name (I just called mine “Main”), select all the disks from your list, choose ZFS, then choose your RAID-Z level. Click Add, and after some processing, you’ll have a giant hard drive. The amount of storage will be considerably less than the sum capacity of the hard drives you put in, as it is reporting the capacity after setting aside the space that will eventually hold redundancy data. In my case, the 6x3TB drives have about 16.3 TB of raw capacity (these figures appear to be in binary units, which is why 18 TB of drives shows up as roughly 16.3 TB), but after the redundancy data in RAID-Z2 is accounted for, only 10.7 TB is available. Note: If you added 6 drives to the array, you should see 6 drives in the list when creating the volume; if you don’t, you probably didn’t connect something correctly inside the machine. Make sure you set the permissions on this new volume so your user can access it, and do this recursively.
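As a rough check on those numbers, here is a sketch of the arithmetic. It assumes drives are sold in decimal terabytes (10^12 bytes) while the reported capacity is in binary units (2^40 bytes), and that RAID-Z2 sets aside two drives’ worth of space for redundancy. The function name `raidz_capacity` is made up, and the estimate ignores filesystem overhead.

```shell
#!/bin/sh
# raidz_capacity is a hypothetical helper, not a FreeNAS or ZFS command.
# Usage: raidz_capacity DRIVES SIZE_TB PARITY_DRIVES
# Assumes SIZE_TB is decimal (10^12 bytes) and reports binary units (2^40 bytes).
raidz_capacity() {
    awk -v n="$1" -v size="$2" -v parity="$3" 'BEGIN {
        tib = 1099511627776                      # 2^40 bytes
        raw = n * size * 1e12 / tib              # every drive counted
        usable = (n - parity) * size * 1e12 / tib  # minus the redundancy drives
        printf "raw: %.2f usable: %.2f\n", raw, usable
    }'
}

raidz_capacity 6 3 2
# prints: raw: 16.37 usable: 10.91
```

The estimate lands a little above the 10.7 TB that was actually available, which is what you would expect once the filesystem takes its own share.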
ZFS has a cool feature called “datasets”. A dataset is just a folder with special types of rules around how big those folders can be. You can set a quota, which is the maximum size a folder can grow to, and a reserved space amount, which (as the name implies) reserves a certain amount of space for use in that folder. You can customize permissions on these separately from the whole array. You can set certain compression levels based on if you’re more concerned with speed vs space. All of these values can be changed later. You can also ignore all of this, and just use datasets for organization. So, for example, I have two primary datasets:
- Media, which has no quota or reserved space, permissions set so that anyone can read but only I can write, and no compression so it can stream fast, and
- Backups, for Time Machine, which has the maximum level of compression (as read/write speed doesn’t matter), no access to anyone except my user, and a quota of 500 GB
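If you are curious what those settings look like at the ZFS level, the sketch below prints roughly equivalent `zfs create` commands. It only prints them; FreeNAS expects you to manage datasets through the web admin, so treat this as illustration. The helper name `dataset_cmd` is made up, and using `gzip-9` to stand for “maximum compression” is my assumption.

```shell
#!/bin/sh
# dataset_cmd is a hypothetical helper that only PRINTS a zfs command; it changes nothing.
# Usage: dataset_cmd POOL/NAME COMPRESSION [QUOTA]
dataset_cmd() {
    opts="-o compression=$2"
    # Only add a quota when one was requested
    if [ -n "${3:-}" ]; then
        opts="$opts -o quota=$3"
    fi
    echo "zfs create $opts $1"
}

dataset_cmd Main/Media off             # no compression, no quota
dataset_cmd Main/Backups gzip-9 500G   # assumed "maximum" compression, 500 GB quota
```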
Actually Getting Data In/Out
So now I have a ZFS volume running RAID-Z2, `/mnt/Main`, which has two datasets, Media and Backups (the latter at `/mnt/Main/Backups`). Now we need to actually make them available for use by other computers. To do this, we set up Shares. FreeNAS has three different types of shares: AFP (for Macs), CIFS (for Windows, also known as SMB or Samba), and NFS (for Unix/Linux/FreeBSD). For our purposes, I will be setting up two AFP shares, one for each of the two datasets.
Shares are a type of Service, which is a program that FreeNAS will run automatically for you. Besides Shares, FreeNAS has services for things like FTP, LDAP, Rsync, SSH, UPS integration, and plugins. At the top of the admin UI, click Services, and click the On/Off switch next to the AFP service to start it up. Feel free to turn on whatever else you like (except Plugins, which will not quite work out of the box, but I’ll discuss Plugins at greater length below). You may be prompted for settings before a given service will start.
Now you can create your Shares. Click the Sharing tab at the top, and make sure “Apple (AFP)” is selected. Click the “Add Apple (AFP) Share” button, and you’ll be prompted with a daunting form. You can leave some of the more confusing fields as their default. The fields you really need to worry about are:
- Name, the displayed name of the share
- Path, where you want the share to point
- Share password, if you want to set a password
- Allow/Deny list and Read-Only/Read-Write Access, to control who can do what on the share
- Disk Discovery, which will allow the share to be seen if you just ask the server for a list of shares
- Disk Discovery Mode, which will let you toggle between a normal Finder share and a Time Machine backup share
- Permissions, which let you control who can read, write, and run programs on the share
Once you have this in place, click OK, and you’ll have created the Share. If you enabled Disk Discovery mode, your NAS should appear in the Finder’s sidebar. If you did not, you can connect to it by selecting “Connect To Server” from the Go menu in the Finder (⌘K), and typing `afp://NAS_IP/SHARE_NAME`, filling in the `NAS_IP` and `SHARE_NAME` as appropriate. Authenticate if you set it up, and you should be connected. Then you can drag stuff from your hard drive into the share and it will copy over. You can also use `cp` from the Terminal to copy data.
When I tried setting this up originally, I got permissions errors while doing this. My rules for setting the permissions up are:
- Make sure the user you want to have read/write access is in both the allow list and the read-write access list
- If you want read-only access available to everyone, add `@nobody` to the allow list and the read-only list
- Set all file/directory permissions to on, with the exception of “other/write”.
- Set the owner of the ZFS dataset to your user, and set all the permissions there to on, with the exception of “other/write”.
To test the permissions on the ZFS dataset, the easiest thing to do is enable the SSH service, SSH into the machine with your user account, `cd` into the dataset, and try to `touch` a file. If it fails, you can’t write. If it does work, `cat` the file; if it fails, you can’t read. If that succeeds, but trying to connect via AFP doesn’t let you read/write files, the error is on the AFP share permissions.
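The same touch-then-cat test can be wrapped in a small function so it is easy to repeat on each dataset. This is a minimal sketch; `check_rw` and the scratch file name are made up, and you would run it over SSH as the user whose access you want to check.

```shell
#!/bin/sh
# check_rw is a hypothetical helper name. Run it as the user you want to test.
# Usage: check_rw /path/to/dataset
check_rw() {
    dir=$1
    probe="$dir/.perm_test_$$"     # scratch file name, unique per shell process

    # Step 1: can we create a file here?
    if ! touch "$probe" 2>/dev/null; then
        echo "cannot write"
        return 1
    fi
    # Step 2: can we read it back?
    if ! cat "$probe" >/dev/null 2>&1; then
        rm -f "$probe"
        echo "cannot read"
        return 1
    fi
    rm -f "$probe"
    echo "read/write ok"
}

# Example: check_rw /mnt/Main/Backups
```

If this reports “read/write ok” but the AFP share still refuses you, the problem is in the share settings rather than the dataset.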
Keeping Your NAS Healthy
If you have a system dedicated to making sure your data is reliably accessible, you want to know sooner rather than later if you’re going to have hard drive problems. FreeNAS includes a drive testing system called S.M.A.R.T. which is a tool for testing your drives to determine if they are behaving abnormally (higher temperature, higher error rates when reading data, lower throughput, etc.). These can then be emailed to you on a schedule you decide for your analysis. These tests are not run on the array as a whole, but rather on individual disks within the array. These tests can be created and found on the sidebar, under System > S.M.A.R.T. Tests.
I rely primarily on the “short” S.M.A.R.T. test, which runs once a day, and occasionally a “long” test, which I run manually when I won’t need the array for a while. The short test scans electrical circuits and selected parts of the disk for errors, and takes only a couple of minutes. The long test scans every bit on the drive for failures; this takes a very long time, especially on high capacity disks, so it should be run infrequently. There’s also a “conveyance” test, which is useful to run before/after moving the drives, to determine if they were damaged during transport. Set these up at your preference.
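If you ever want to kick off a test by hand over SSH instead of waiting for the schedule, the `smartctl` tool from smartmontools can do it. The sketch below is hedged: the device names (`ada0`, `ada1`) are assumptions and will differ by system, the `start_short_tests` helper is made up, and it runs in dry-run mode so nothing happens until you remove the `RUN=echo` prefix.

```shell
#!/bin/sh
# Sketch only: device names (ada0, ada1, ...) are assumptions; yours may differ.
# smartctl comes from smartmontools and normally needs root.
start_short_tests() {
    for disk in "$@"; do
        # With RUN=echo the command is printed instead of executed (dry run)
        ${RUN:-} smartctl -t short "/dev/$disk"
    done
}

# Dry run: show what would be executed for two disks
RUN=echo start_short_tests ada0 ada1
```

Swapping `short` for `long` or `conveyance` selects the other test types, and `smartctl -a /dev/ada0` shows a drive’s attributes and past test results.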
The easiest way to see this data is to have it emailed to you. Test reports are sent to the email address associated with the root user. To change this, select Account > Users > View Users from the sidebar. In the list that appears, the root user will be at the top of the second list. The last button lets you change the email address, so set this to your email address. You then have to tell FreeNAS how to connect to an SMTP server with an account. You can use Gmail or iCloud for this. On the sidebar, select System > Settings and choose the Email tab. Fill out the fields as appropriate for your mail server. Once this is in place, you can send a test email. If you get it, you’re all set up, and your S.M.A.R.T. tests will send their results to you when they run.
Extending with Plugins
FreeNAS 8.2 introduced a plugin system based on FreeBSD jails, which are sandboxed environments for programs to run in. Plugins are like other services that run automatically in the background, but instead of being services for managing the array themselves, they are apps that you might want to run directly on your storage array. As they are sandboxed, they will only be able to write to specific folders in your array. A number of services have been ported to the FreeNAS plugin format, and you can use these to extend your array’s functionality and provide even more utility. I’ll demonstrate how to set up Transmission, the BitTorrent client, to run natively on your NAS. You can find other plugins on the FreeNAS Forums, or even make them yourself if the app has been ported to FreeBSD.
To begin, we need a place on the array to store plugins, and to store the Jail. Create two ZFS datasets for this (I call them “Jail” and “Plugins”). You’ll rarely need to go in here manually, but the plugin system needs a place for this stuff to live. All FreeNAS plugins are .pbi files, and in fact the service that runs the plugins is itself a pbi file, which is not installed by default. Once you have your datasets set up, go to the Services tab, and click the settings icon next to the Plugins service. There are three steps to the installation. First, it needs a temporary place to store the plugin while it installs (this will be the root of your ZFS volume). Next, it needs to know the path to your dataset for your jail and plugins folder, as well as the IP address you’re going to use as the jail’s IP (make this something unique, out of your DHCP range). Finally, it needs the plugin service PBI that is appropriate for the version of FreeNAS you’re using and the architecture of your CPU.
If it installed successfully, you can then install plugins. Near the top is a tab called “Plugins”. Here you can upload the pbi for whatever plugin you like. On the page where you downloaded the plugin service PBI, you can also download the pbi for Transmission. Download it from the site and upload it to your NAS. You’ll have to set up the parameters before you can turn it on. Make note of the Download directory you specify, as we’ll need it later (but you can leave it as the default). Then, you can turn it on and access it by going to `http://JAIL_IP:9091/` in your browser.
Now, before we go on a download spree, we need to understand where those files will end up. They go into the Download directory specified in the settings, which for me was `/usr/pbi/transmission-amd64/etc/transmission/home/Downloads`. But there’s a catch: since this is in a FreeBSD jail, that path is relative to the jail root, which is itself part of your array. Now, you can access that folder, but you probably will want to set up a nicer path for it, one that doesn’t go through your jail.
That’s where Mount Points come in. A Mount Point is a way of making a folder available from the outside of your jail to inside of it. So you can set up a Downloads dataset at `/mnt/Main/Downloads`, and establish a Mount Point from that to the Transmission download folder, and suddenly everything Transmission downloads will appear in `/mnt/Main/Downloads`, even though Transmission itself is jailed. In the Plugins tab of Services, there is a “View Mount Points” button. If you add a mount point, it asks you what the route is you want to set up. So for the case above, we need a mount point that goes from `/mnt/Main/Downloads` outside the jail to the Transmission download directory inside it.
Once this is set up, turn it on, and it will just start writing data from the Transmission downloads folder into your Downloads dataset. You may have to fiddle with permissions; I found I had to make the folder within the jail writable by the user that was running the Transmission process. To enter a jail, SSH in to the NAS box as a user in the wheel group, `su root`, and run `jexec 1 csh`. To exit, just leave that shell.
The case was larger than I expected, but not too large. It’s about as tall and deep as my media center, so it sits nicely next to it (which is handy as that’s where my Internet switch is). The case looks great, with off-black on all sides and no obnoxious branding on the front, and has some convenient USB ports on top. The only problem is that the case has a power button on the top with a REALLY BRIGHT BLUE LED noting that the machine is on; I would love to figure out a way to turn that off (or at least knock the brightness down). But the real win here is that the case is very quiet. It has noise insulating material on the walls, which knocks down the sound, and the hard drive trays have rubber grommets on the screw holes, which help quiet the spinning of the hard drives. The case emits so little sound that, even with 6 hard drives and fans, the entire thing is less noisy than a single Western Digital MyBook (and I had 5 of those to replace). It blew away my expectations of noise.
The machine is quite fast. It handles reading and writing stuff like a champ, downloading and streaming at the same time with no problems. It’s been running for weeks at a time with no uptime issues. Even with 7 plugin services running, it has all run very, very smoothly. I’ve run into one or two bugs in the FreeNAS web admin UI, mostly happening when you try to save an options form that includes a permissions field (when you aren’t actually changing permissions). When this happens, a manual reboot of the machine fixes the problems, and since it’s manual you can take down connections as you need to. But you really shouldn’t have to change them once they’re set up, so this is a problem of setup more than anything.
The permissions on the system remain the biggest single headache. I’ve definitely spent most of my time struggling to make sense of the permission model, which gets more complicated and difficult to track down when you introduce Shares and Mount Points into the mix. But once you have it figured out, you can build in the permissions you want to offer and it will stick. You can also SSH in to the system to see the permissions at the UNIX level, which is helpful if you’re familiar with the shell.
The second biggest headache has been learning FreeBSD, which is starkly different from Linux or Mac OS X. There have been several times where I’ll do some muscle-memory shell command, like `sudo su transmission`, and it will fail because FreeBSD does things a little differently (in this case, I’ve been doing `su root` followed by `su transmission`). These are probably just differently configured and there are ways to get it to do what I want, but it’s not a big deal.
However, nits aside, once this system is running, it’s providing a ton of value. As someone who has always cobbled together storage based on what I had and what was the easiest to get set up, this definitely took more discipline to configure and get working properly, but the value is paying off in a big way. Since everything is pooled together, I have more incentive to keep it organized and optimized for how I want to use it. The assumptions I set up for myself and through the plugins mean everything works as I want and everything ends up where I need it to be. The extra effort makes it a more useful system.
Building a NAS is not for the cheap or faint of heart. It requires money, time, and effort to build into a great storage system. It is also not a panacea of storage; you still want to back up critical stuff onto a different drive, ideally offsite or in the cloud, and you still need to worry about drives failing. But if you put that energy in, you’ll end up with an indispensable tool that will be more reliable and more powerful than a glued-together system of disparate components and wonky services. It’s an investment that I’m hoping will pay off for a number of years.