Is ZFS Storage Right for Your Digital Asset Management System?
Your datasets may be safer with ZFS than with traditional hardware RAID.
By Adam Betker
What is ZFS?
ZFS is a highly configurable file system that works well for many DAM uses.
Originally developed by Sun Microsystems (now part of Oracle), ZFS (Z File System) combines a file system with highly configurable software RAID, allowing you to turn nearly any piece of server hardware into a working ZFS storage server.
It is versatile for many storage uses, including digital asset management and media asset management (DAM/MAM) storage, retrieval, video playback, and replica storage. It can also serve as network-attached storage (NAS), sharing files over the SMB (Server Message Block) and NFS (Network File System) protocols, to name a few.
Sharing ZFS storage is simple, yet ZFS is designed to hold your data sets indefinitely while periodically “scrubbing” your data to detect and repair errors. This is one feature that sets it apart from other storage solutions.
Protect against bit rot and bad or broken disk writes
ZFS has an innovative way of enhancing data integrity and handling bit rot, the silent corruption that develops slowly over time in large datasets. It stores a checksum for every block of data on its file system (Fletcher-4 by default, with SHA-256 available as an option). During a scrub, it reads every block and verifies it against that checksum. If the checksum matches, the block passes, and ZFS moves on to the next one. If the block fails its checksum, a repair is initiated immediately. ZFS rebuilds the block from the redundant data in its RAID-Z parity or mirror copy, confirms the rebuilt block against the checksum, and writes the healed copy back to disk. If good redundant data exists for a corrupted block, ZFS can fix it automatically and save you much time and effort.
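The scrub-and-heal cycle described above can be sketched in a few lines of Python. This is a conceptual model only, not how ZFS is implemented: it assumes a single stripe of equal-sized blocks protected by simple XOR parity (similar in spirit to single-parity RAID-Z), and it uses SHA-256 as a stand-in checksum. The names checksum, xor_blocks, and scrub are hypothetical.

```python
import hashlib
from functools import reduce


def checksum(block: bytes) -> str:
    """Stand-in block checksum (SHA-256 chosen for illustration)."""
    return hashlib.sha256(block).hexdigest()


def xor_blocks(blocks):
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub(data_blocks, parity, checksums):
    """Verify every block; rebuild failing blocks from parity.

    Returns the indexes of the blocks that were repaired.
    """
    repaired = []
    for i, block in enumerate(data_blocks):
        if checksum(block) == checksums[i]:
            continue  # block is healthy, move on
        # Rebuild the bad block from the surviving blocks plus parity.
        others = [b for j, b in enumerate(data_blocks) if j != i]
        candidate = xor_blocks(others + [parity])
        if checksum(candidate) != checksums[i]:
            raise IOError(f"block {i} is unrecoverable")
        data_blocks[i] = candidate  # write the healed block back
        repaired.append(i)
    return repaired


# Build a small stripe, then simulate bit rot in one block.
stripe = [b"asset-01", b"asset-02", b"asset-03"]
sums = [checksum(b) for b in stripe]
parity = xor_blocks(stripe)
stripe[1] = b"asset-0X"

repaired = scrub(stripe, parity, sums)
print(repaired, stripe[1])
```

The checksum plays two roles here: it identifies which block is bad, and it confirms that the rebuilt block is correct before it is written back.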
Verified Data
ZFS uses “copy-on-write” technology, meaning existing data is not overwritten in place when a file is modified.
Instead of overwriting a file's blocks, ZFS writes the new data elsewhere in the storage pool and only then updates its pointers to reference it. This gives ZFS veto power over every transaction going to disk. If a file or group of files fails to write with integrity, the pointers are never updated and the previous, intact version remains in place. This veto power keeps everything in order and is the basis for ZFS snapshots, which preserve earlier versions of your data. That is especially vital when dealing with data sets of hundreds of thousands of files.
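A minimal sketch of the copy-on-write idea follows. It is a toy model under stated assumptions, not the ZFS on-disk format: blocks live in a dictionary keyed by address, and a transaction commits by swapping in a new root mapping. The class CowPool and its methods are hypothetical names used for illustration.

```python
class CowPool:
    """Toy copy-on-write pool: written blocks are never overwritten."""

    def __init__(self):
        self.blocks = {}    # address -> bytes
        self.next_addr = 0
        self.root = {}      # committed view: file name -> block address

    def _allocate(self, data: bytes) -> int:
        addr = self.next_addr
        self.blocks[addr] = data
        self.next_addr += 1
        return addr

    def write(self, updates: dict, fail: bool = False) -> None:
        """Write new copies, then commit by replacing the root."""
        new_root = dict(self.root)
        for name, data in updates.items():
            new_root[name] = self._allocate(data)  # new location each time
        if fail:
            # Simulated failure before commit: the old root is untouched.
            raise IOError("simulated write failure before commit")
        self.root = new_root  # the single commit step

    def read(self, name: str) -> bytes:
        return self.blocks[self.root[name]]


pool = CowPool()
pool.write({"logo.png": b"v1"})
snapshot = dict(pool.root)  # keeping the old root acts like a snapshot

try:
    pool.write({"logo.png": b"v2-partial"}, fail=True)
except IOError:
    pass
after_failure = pool.read("logo.png")

pool.write({"logo.png": b"v2"})
current = pool.read("logo.png")
old_version = pool.blocks[snapshot["logo.png"]]
print(after_failure, current, old_version)
```

Because the failed write never reaches the commit step, readers continue to see the earlier version, and the saved root still resolves to the original data after later writes succeed.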
Storage systems that checksum data at all typically keep each checksum together with the data it protects. This presents a problem when a block or sector of storage becomes corrupted, because the checksum can be damaged along with the data. ZFS, on the other hand, stores the checksum for each block in that block's parent. Each parent block is in turn checksummed by the block above it, all the way up to the top of the pool, so the entire path to a file's data is verified. This ensures each file not only has integrity but also a verifiable “home” in the data pool. Every file and file path is accounted for, giving great coherence to the entire data pool.
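The parent-held checksum scheme can be illustrated with a two-level sketch. This is a simplification under assumptions: a parent block is modeled as nothing more than the concatenated SHA-256 checksums of its children, and a single root checksum protects the parent. The functions digest, build_parent, and verify are hypothetical.

```python
import hashlib

DIGEST_SIZE = 32  # bytes in a SHA-256 digest


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_parent(children):
    """Model a parent block as the concatenated checksums of its children."""
    return b"".join(digest(child) for child in children)


def verify(root_sum: bytes, parent: bytes, children) -> str:
    """Walk from the root down, checking each level against the one above."""
    if digest(parent) != root_sum:
        return "parent block corrupt"
    for i, child in enumerate(children):
        stored = parent[i * DIGEST_SIZE:(i + 1) * DIGEST_SIZE]
        if stored != digest(child):
            return f"child {i} corrupt"
    return "ok"


children = [b"frame-0001", b"frame-0002"]
parent = build_parent(children)
root_sum = digest(parent)

healthy = verify(root_sum, parent, children)

children[1] = b"frame-00X2"  # corrupt one data block
damaged = verify(root_sum, parent, children)
print(healthy, "/", damaged)
```

Since the checksum for a data block lives in a different block, damage to the data cannot also silently damage the value it is checked against.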
Data Replication
What good is your data if you can’t back it up reliably?
Many hardware RAID storage providers leave backups to a separate, external process or server. ZFS is different, since the backup process is built right into the software. It uses a feature called replication, which produces a block-for-block, snapshot-for-snapshot replica of your server's data pools on a second (or third) ZFS server.
The replication technology creates an exact replica of your data, including permissions, snapshots, and datasets, all verified by checksums. This saved version of the file system allows you to restore individual files and datasets selectively, as well as entire servers of data when necessary. You can build a nearline ZFS storage solution and replicate that storage for a fraction of the cost of other third-party systems. Replicas can also be kept at a secondary location for disaster recovery.
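Snapshot-based replication can be sketched as sending only the differences between two snapshots and verifying them on arrival. The example below is a conceptual model, not the actual ZFS replication stream format: a snapshot is assumed to be a plain dictionary of file name to contents, and make_stream and receive are hypothetical helpers.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_stream(prev_snap: dict, curr_snap: dict) -> dict:
    """Describe what changed between two snapshots, with checksums."""
    changed = {k: v for k, v in curr_snap.items() if prev_snap.get(k) != v}
    deleted = [k for k in prev_snap if k not in curr_snap]
    sums = {k: checksum(v) for k, v in changed.items()}
    return {"changed": changed, "deleted": deleted, "sums": sums}


def receive(replica: dict, stream: dict) -> dict:
    """Verify the stream, then apply it to a copy of the replica."""
    for name, data in stream["changed"].items():
        if checksum(data) != stream["sums"][name]:
            raise IOError(f"checksum mismatch for {name}")
    updated = dict(replica)
    updated.update(stream["changed"])
    for name in stream["deleted"]:
        updated.pop(name, None)
    return updated


monday = {"a.mov": b"take 1", "b.psd": b"layers", "old.txt": b"notes"}
tuesday = {"a.mov": b"take 2", "b.psd": b"layers", "c.jpg": b"pixels"}

replica = dict(monday)  # the replica already holds Monday's snapshot
stream = make_stream(monday, tuesday)
replica = receive(replica, stream)
print(sorted(stream["changed"]), stream["deleted"])
```

Only the changed and deleted entries cross the wire, which is why incremental replication to a second server stays inexpensive even for very large pools.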
Durability
The durability of a ZFS data pool is perfectly suited for hosting a digital asset management solution.
The key to ZFS durability is that ZFS is a software RAID. It is hardware agnostic on nearly every front, and its drives are not tied to a particular controller or chassis. For instance, you can shut down your ZFS system and freely swap drives between drive bays. When you power on the ZFS system, it will recognize the drives independently of the drive bays, leaving your RAID-Z safe and intact. You can even replace a failed hardware chassis by moving your RAID-Z set of drives to a new server, installing ZFS, and importing the pools. Most hardware RAID sets will not tolerate this type of hardware disruption, which can leave your data unsalvageable.
Some ZFS-based systems come with a capable GUI or CLI for administering pools and tasks. If you want to try out an entry-level ZFS-based storage system, you can download TrueNAS Core for free or download the Linux (CLI) version from the ZFS on Linux website. ZFS is open source and free to use.
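The reason drives can move between bays is that pool membership is recorded on the drives themselves rather than in the controller. The sketch below models that idea under simplifying assumptions: each drive carries a small label naming its pool and its own identifier, and the import step ignores bay numbers entirely. The label fields and the import_pool function are hypothetical names for illustration.

```python
def import_pool(bays: dict, pool_guid: str) -> list:
    """Assemble a pool from on-disk labels, ignoring bay positions.

    bays maps a bay number to the label read from the drive in that bay,
    or to None if the bay is empty.
    """
    members = [
        label for label in bays.values()
        if label is not None and label["pool_guid"] == pool_guid
    ]
    # Order members by their own identifier, not by where they are plugged in.
    return sorted(label["vdev_guid"] for label in members)


original = {
    0: {"pool_guid": "media", "vdev_guid": "disk-a"},
    1: {"pool_guid": "media", "vdev_guid": "disk-b"},
    2: {"pool_guid": "media", "vdev_guid": "disk-c"},
    3: None,
}

# Same drives after being shuffled into different bays of a new chassis.
shuffled = {
    0: None,
    1: {"pool_guid": "media", "vdev_guid": "disk-c"},
    2: {"pool_guid": "media", "vdev_guid": "disk-a"},
    3: {"pool_guid": "media", "vdev_guid": "disk-b"},
}

same_pool = import_pool(original, "media") == import_pool(shuffled, "media")
print(same_pool)
```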
On every level, ZFS was built with robust durability and a configurable feature-set that makes a ZFS storage appliance a top choice for operating a Digital Asset Management system.