D-Link DNS-321 Network Attached Storage Enclosure

$119.99 $99.99

D-Link DNS-321 Network Attached Storage Enclosure

More Info Buy Now!

Seagate 4 TB BlackArmor NAS 220 Network Attached Storage Server (ST340005LSA10G-RK)

$549.99

The 4 TB BlackArmorA NAS 220 server from SeagateA is a small-business-specific network ...

More Info Buy Now!

3TB Snap Server 410 SATA II RAID Network Attached Storage

$1884.99

The Snap Server 410 is ideal for medium-sized businesses with less than 4TB of storage ...

More Info Buy Now!

BUFFALO LS-XH1.0TL LinkStation Pro Network Attached Storage

$199.99

Take your data with you wherever you go while keeping it safe at home with the LinkStat...

More Info Buy Now!

Seagate 2 TB BlackArmor NAS 220 Network Attached Storage Server (ST320005LSA10G-RK)

$449.99

The 2 TB BlackArmorA NAS 220 server from SeagateA is a small-business-specific network ...

More Info Buy Now!

Western Digital 2TB My Book World Edition II Network Attached Storage (WDH2NC20000N)

$279.99

The My BookA World Editiona II Dual-drive Network Storage from Western DigitalA provide...

More Info Buy Now!

750GB Snap Server 110 SATA II RAID Network Attached Storage

$868.19

The Snap Server 110 combines best-in-class storage performance with customer-centric ea...

More Info Buy Now!

500GB Snap Server 210 SATA II RAID Network Attached Storage

$500.15

The Snap Server 210 combines best-in-class storage performance, RAID data protection, a...

More Info Buy Now!

Fantom 2TB G-Force MegaDisk Network Attached Storage - 32MB, 7200RPM, RAID - NAS Ethernet Disk Storage

$329.95 $311.99

Fantom Drives G-Force MegaDisk NAS is the latest addition to the G-Force MegaDisk famil...

More Info Buy Now!

Data Robotics DroboShare Network Attached Storage Expansion for Drobo Storage Robot

$125.12

Data Robotics DroboShare Network Attached Storage Expansion for Drobo Storage Robot, Gi...

More Info Buy Now!

SiliconFS™


The BlueArc® Filesystem

A. Introduction
BlueArc’s history as a vendor of high-performance, highly-scalable network-attached storage (NAS) solutions stretches back more than ten years. Since its founding in 1998 BlueArc has consistently delivered on the promise of increasing the performance and scalability of its solutions with successive generations of products. Looking back over these many years of continuous development the core architecture continues to evolve, offering increasing levels of performance and scalability at lower and lower total costs to the customer. And yet, the heart of the BlueArc architecture – the filesystem – also continues to grow and develop, adding features and functionality which enable greater utility for the customer as well. The BlueArc Filesystem, SiliconFS, is the engine which drives the entire architectural platform forward. The filesystem is the foundation which enables greater performance and scalability for the entire platform. The filesystem is what ultimately directs and manages that performance and scalability, harnessing the power of the BlueArc family of products to enterprise storage management features, providing real value for our customers.
SiliconFS is built around the Object Store, a collection of object structures referring to data on disks,1 and a set of rules which govern the organizational layout and management of objects in the Object Store. The techniques behind creating, copying, moving, migrating, and deleting the objects in the Object Store make SiliconFS one of the most powerful, scalable, and extensible filesystems in use today. As the use of field-programmable gate arrays (FPGAs) to accelerate data operations is a key component in the differentiation of SiliconFS from every other filesystem in use today,2 deciding which operations to accelerate in hardware plays an important role in the performance, scalability, and robustness of the filesystem’s architecture.
SiliconFS also contains a number of enterprise storage features that distinguish it from competing products, with specific advantages in transparent Data Mobility, advanced Data Protection, and a rich Storage Virtualization engine to complement the high performance and scalability.
While SiliconFS is itself proprietary, BlueArc maintains an entirely open philosophy when it comes to host operating system, network access protocol, and back-end storage manufacturer choices. BlueArc is not in the business of creating proprietary protocols or software to enable greater performance, scalability, or features, preferring instead an open storage path using agreed-upon market standards while fundamentally redefining the file server itself. The ability to have multiple storage tiers, centrally manage them, and simultaneously migrate data among them and to third-party storage platforms is another of the patented advantages of SiliconFS3 and proof of BlueArc’s commitment to openness and standards. SiliconFS has evolved over time to introduce new functionality and other improvements.
B. Filesystem overview
SiliconFS is a highly differentiated technology that provides multiple benefits to its users. BlueArc products are best known for their ability to provide sustained, predictable, and consistent performance under various loads. SiliconFS is equally efficient with a variety of I/O sizes, loads, and access patterns. While superior single-server performance is important to many customers, SiliconFS also acts as an enabler for other important filesystem attributes:
• Scalability without impacting performance: SiliconFS can support millions of files in a single directory, while keeping directory search times to a minimum and sustaining overall system
1. At present both rotating disk and solid-state disk types are supported, as well as SSD/SDRAM hybrids; the dependence of the Object Store on rotating magnetic media is merely a functional definition.
2. Apparatus and Method for Hardware Implementation or Acceleration of Operating System Functions, United States patent 6,826,615 B2, granted 30 November 2004.
3. Network-Attached Storage System Device, and Method Supporting Multiple Storage Device Types, United States patent publication number WO04/95287, application number PCT/US04/01352, filed 4 April 2003 performance. Combined with Cluster Namespace™ SiliconFS can support many petabytes in a single unified namespace – presenting it all as a single filesystem accessible to many concurrent hosts, through a single mount point if desired.
• Consolidation: Extreme scalability enables consolidation, particularly of older hardware and “storage islands”. The ability to provide a unified, large-scale storage solution allows storage administrators to combine the functions of what were separately implemented file servers, reaping the cost-savings and ease-of-management benefits of a consolidated platform.
• Meaningful virtualization: virtualization is about making more efficient use of a single server. The more powerful the individual server is, the better suited it is to virtualize a larger number of less capable, under-utilized devices. BlueArc’s implementation of virtual servers allows groups to retain “ownership” of their virtual entity within a single physical server.
And thin provisioning makes it possible for multiple virtual servers to share a single pool of storage devices.
SiliconFS offers benefits beyond sustained, predictable, consistent file server performance.
Because of its unique architecture SiliconFS can adjust to the customer’s workflow and data sets. Not all data has the same “value” to the customer’s workflow. When data is first created it may be extremely valuable and must therefore reside on storage architected for very high performance. As the data ages application and eventually archival requirements tend to dominate, imposing further conditions on the storage where the data now resides. Yet, once an application knows where the data resides changes to the location are difficult. Applications and especially users do not like data migration. SiliconFS eliminates the difficulty of data relocation by providing a mechanism to migrate transparently across multiple tiers of storage, including data optimization devices (deduplication and compression, archival, etc.) Users do not necessarily need to be aware of the data actual location, and applications need not be rewritten either.
The following is a brief list of the key attributes of SiliconFS:
• Widest applicability to changing workloads, data sets, and access patterns: Fine-grained parallelism, off-loading of specific filesystem operations to FPGAs, and data pipelining all contribute to SiliconFS’ optimized handling of both throughput and metadata processing.
Both attributes have been principal design criteria from the very beginning with SiliconFS.
• Flexible performance scalability due to separation of function between servers and storage:
SiliconFS delivers great performance with relatively small storage systems. The filesystem also allows for performance to increase granularly as more disks are added. Typically this benefit will be felt immediately, even before “restriping” the data across both old and new spindles, as writes will automatically be spread immediately. As a result BlueArc customers may start small and scale performance by adding storage when needed. Additional file servers are not necessarily required for additional performance. As performance requirements grow even further, customers may also take advantage of clustering technology within SiliconFS to add more servers while maintaining a single namespace, providing easy management of very large pools of data.
But again, SiliconFS offers true separation of function between storage and servers. Each may be scaled independently to meet a customer’s needs; there is no requirement to purchase one to get the other as with many competing NAS products.
• Best-in-class namespace scalability: Scaling beyond a single NAS server is essential for high performance storage solutions. Many parallel filesystem implementations rely on clustering multiple servers together for greater aggregate performance. The difference with SiliconFS is the scale: individual servers are much more powerful than traditional CPU-based architectures, meaning fewer servers are needed in a given cluster to achieve some specified level of performance. SiliconFS also makes it possible to create a single, unified namespace across the entire cluster of BlueArc file servers – making it appear as a single filesystem to all network hosts. This functionality is known as Cluster Namespace,™ or CNS. CNS satisfies the most common scalability requirements, allowing network hosts to access data on any BlueArc server in the cluster, regardless of physical location. SiliconFS takes advantage of BlueArc’s unique architecture to move data seamlessly between multiple cluster nodes with minimal impact to performance.
• Advanced multi-tier storage mechanisms: Since data has an assigned value (by age, data type, owner, etc.) the ability to transparently relocate data to an applicable storage “tier” is a key feature of SiliconFS. Transparency requires that applications and users do not have to be pointed to new locations following data migration. SiliconFS provides policy-driven data migration mechanisms which allow data to be migrated transparently between many storage tiers. Individual storage tiers may also include 3rd-party, or foreign, filesystems accessible from the BlueArc servers via NFSv3 and HTTP. This ability to extend SiliconFS to external devices allows integration with many 3rd-party appliances for deduplication, compression, or archival for example. Such data migration mechanisms also allow for repurposing of existing storage devices as external storage tiers, lowering total costs and offering easier platform transitions.
• Robust data protection: SiliconFS provides various mechanisms for ensuring data protection. The storage used by the filesystem is protected by traditional hardware mechanisms, such as the use of redundant arrays of inexpensive disks (RAID) to provide fault-tolerance. SiliconFS also adds layers of functionality for further assurance of data preservation: enterprise features such as snapshots, replication, and high-availability cluster options are all part of the SiliconFS data resiliency framework.
• Advanced storage virtualization framework: A key advantage of NAS architectures over Storage Area Network (SAN) designs is the ability to more readily virtualize storage, simplifying data management and making storage provisioning much easier. SiliconFS provides an advanced virtualization framework that includes a global namespace (the CNS functionality), file server virtualization, storage pools, thin provisioning, and robust quota support.
C. SiliconFS defined
SiliconFS is implemented as an object-based design utilizing an object store, with root and leaf onode hierarchies in a tree structure, with a high degree of parallelization and manipulation of object pointers to accomplish data management duties. Most readers will be familiar with inodes, a data structure widely used in many UNIX or Linux-based filesystems. An inode stores information about a file, directory, or other filesystem object.4 The inode is thus not the data itself, but rather the metadata that describes the data. inodes store such metadata as user and group ownership, access mode (i.e., file permissions), file size, timestamps, file pointers (usually links to this inode from other parts of the filesystem) and file type, for example. When a traditional filesystem is created there is a finite upper limit on the total number of inodes – this limit defines the maximum number of files, directories, or other objects the filesystem can hold. This limit leads to what is called the finite inode problem, and is why most traditional filesystems cannot scale easily to multiple petabytes or billions of files.
In object-based filesystems objects are manipulated by the filesystem and correspond to blocks of raw data on the disks themselves. Information about these objects, or the object metadata, is called an onode, in much the same way inodes refer to file metadata in a traditional filesystem.
In BlueArc’s Object Store the underlying structure used to build up SiliconFS is an “object”, which is any organization of one or more of these raw blocks of data into a tree structure. The object is a container of storage that can be created, written to, read from, deleted, etc. Each element of the object is called an Onode, and while there are strong parallels to the normal use of the term onode in other object-based filesystems the concepts are not identical. In BlueArc’s Object Store objects are manipulated by logic residing in FPGAs located on the hardware modules. SiliconFS achieves great acceleration of many filesystem operations though the use FPGA hardware, and the design offers many performance, scalability, and robustness benefits to the end-user, such as:
4. inode data structure. http://en.wikipedia.org/wiki/Inode
• Performance: Writing Root Onodes to contiguous new space allows SiliconFS to take advantage of stripe set flushing, a technique designed to collate multiple writes into a single disk operation, thus obtaining maximum performance from the disks.
• Relocation: As Root Onodes are continuously written to new disk space, SiliconFS can move Root Onodes from their current positions if desired, allowing features such as volume shrinking and defragmentation.
1.Object types
Different types of objects serve different purposes. Some objects, like the indirection object and free space bitmap, are used to contain critical metadata. Objects are used for other types of metadata as well, such as access control lists (ACLs). All critical metadata objects are automatically duplicated, when possible using different storage devices. SiliconFS contains a number of mechanisms that make it resilient to storage trauma, making it possible to
recover from storage failures.
User data is contained in a file object. A directory name table object contains file and directory names in various formats (DOS short names, POSIX names, etc.), file handles, a CRC hash value, and the associated Object Identifier (OID) that points to the location of another object such as a subdirectory (another directory name table object type) or a file (the file object).
Directory and file manipulation, snapshots, and other filesystem features benefit from this object implementation versus the more traditional file-level inode structure. A good example of this benefit is delivered via a unique object called the directory tree object.
For each directory name table object there is a directory tree object, although there can be many of the former in the latter. The directory tree object is a sorted binary search tree (BST) of Onodes containing numeric values (hashes). Converting the directory/file name to lower case and then applying a CRC algorithm against it derives these hashes. The benefit of this extra bit of metadata comes when it is time to find a directory or a file. When a host requests a particular file/directory by name, that value is again converted to lower case and the CRC algorithm is applied. FPGAs execute a binary search of numeric values (as opposed to having to do computationally expensive string comparisons of names) to locate the position within the directory name table object at which to begin the search for the required name. The result of this structure is a dramatic improvement in object lookup speed, providing a performance benefit to the end-user or application.
2. Checkpoints and NVRAM
Like many other advanced NAS platforms, SiliconFS uses a write-back cache to increase I/O performance while maintaining data integrity. All operations that modify the filesystem are also preserved in non-volatile memory, referred to as NVRAM. Writes and other modifying operations are not acknowledged to network hosts until the changes have been written to both the write-back cache and to NVRAM. Data is periodically flushed from NVRAM to disk as part of a checkpointing process. Checkpoints are taken periodically as a routine part of SiliconFS’ normal operations, when the filesystem has consumed a certain amount of the NVRAM that has been allocated to it, and on certain other filesystem operations (e.g., when taking a snapshot of the filesystem).
At the end of each checkpoint, a consistent copy of the filesystem is located on disk. SiliconFS preserves the newest 128 checkpoints on disk. In the case of a system failure, such as loss of power, all transactions that have been acknowledged to network hosts are preserved either on disk or in NVRAM. Upon restart each filesystem is recovered before it is mounted and made available again to hosts. Filesystem recovery consists of selecting the most recent checkpoint and then replaying all filesystem operations that were logged since the last checkpoint, using the
5. Volume shrinking and defragmentation are not currently supported, but will be available soon in a future release information preserved in NVRAM. This restores the filesystem to the state it was in prior to the system failure, with no loss of data. In order to expedite recovery NVRAM replay is parallelized, and when possible multiple filesystem operations are replayed simultaneously.
In other failure scenarios, such as storage trauma or application-level failures, it may be beneficial to restore data from an older checkpoint or a snapshot. Filesystem rollback to any preserved checkpoint or snapshot is fast and easy, since no data relocation (copying) is actually required. Filesystem rollback is merely the rapid manipulation of object pointers in SiliconFS.
D. Architectural advantages and benefits
1. Fine-grained parallelism
The key to high performance in any filesystem is parallelism, and while SiliconFS can indeed be described as a parallel filesystem implementation there are striking differences when compared to other parallel filesystems on the market. The parallelism of SiliconFS is inherent in its design; it is much more than just a cluster of commodity hardware.
It is fine-grained parallelism which enables the extreme performance of SiliconFS. Historically, Multiple Instruction stream, Multiple Data stream (MIMD) architectures have been employed to attempt such parallelism.6 Traditional MIMD architectures, especially shared-memory implementations, require synchronization via a host operating system for memory coherence; this dependence often limits both overall performance and scalability. Modern MIMD architectures attempt parallelism by using a distributed memory design, using message passing or similar means to mediate synchronization issues. SiliconFS instead achieves fine-grained parallelism through its implementation in state machines, each of which control and enable specific functions. Two key features of this implementation which contribute greatly to the fine-grained parallelism are off-loading and pipelining.
Off-loading allows SiliconFS to independently process metadata and simultaneously move data to/from hosts and disks. Filesystem operations which do not require hardware acceleration through FPGAs are separated and sent to a metadata processor module, while operations in the data path are handled by a pipeline of FPGAs. Each filesystem path has dedicated memory, and in amounts specific to the operations required for that path. This off-loading is similar to traditional co-processor implementations (e.g., digital signal processors, systolic arrays, and certain graphics engines). Deciding exactly which operations are handled by which path is a crucial design characteristic for SiliconFS implementations.
In contrast, traditional shared memory architectures rely on CPUs for all filesystem operations and a single bus normally connects all CPUs to memory. In the SiliconFS design there is no single bus, and therefore no points of contention for memory access either. Even with distributed memory MIMD architectures still have bottlenecks (usually relating to message passing efficiencies at scale). The SiliconFS design avoids these issues as well: the filesystem paths are independent and do not require messages about data to be passed from one path to the other.
While the metadata processor module is dedicated to data management the FPGA pipelines can focus on the business of moving data as quickly as possible.
Pipelining is achieved when multiple filesystem operations are simultaneously overlapped in their execution sequence. For a NAS system pipelining means multiple data requests (usually from some number of independent hosts concurrently) overlapping in the execution pipeline.
SiliconFS achieves data pipelining by routing data operations to independent sets of FPGAs for accelerated processing. The operations are independent and have neither shared-memory nor message-passing dependencies.
6. MIMD architectures. http://en.wikipedia.org/wiki/MIMD
Manipulation of filesystem objects via the Object Store is central to SiliconFS’ design, thus the benefits of extreme performance and massive scalability owe their existence to the fine-grained parallelism inherent in the architecture. Host access to the filesystem, however, is a different story. The Object Store is largely hidden from hosts, behind storage virtualization layers designed to make life easier for the storage administrator. Host machines have no concept of objects or the Object Store, and accessing SiliconFS via standard NFS or CIFS protocols they expect to work with string names and file handles.
For those hosts that require or prefer block level access, BlueArc also supports the iSCSI protocol, which requires presentation of raw blocks of storage to the hosts over an Ethernet connection. The host formats, and lays down its own (host operating-system-specific) filesystem structure upon, the blocks of storage presented to it. To support this within the Object Store structure SiliconFS creates a single large object of up to 2 terabytes in size (this is the current iSCSI limit) within the Object Store, which is presented as a sequence of logical blocks to the host. Since the iSCSI volume is just another object to the Object Store, features like Snapshots or dynamic growth of the entire object are possible, offering additional benefits over traditional
iSCSI-based solutions.
BlueArc’s Open Storage philosophy means that SiliconFS handles all conversion of objects to agreed-upon conventions for host presentation – namely the standards of the NFS and CIFS network filesystem protocols. This conversion is done transparently to ensure perfect compatibility so that hosts see only the standards-based representation of files. The BlueArc platform is not so much a traditional NFS or CIFS server as it is a parallelized filesystem engine presenting files to NFS or CIFS hosts in the manner in which they expect to see those files. This is another reason why the limitations of traditional NFS or CIFS servers really do not apply to BlueArc’s architecture. Performance and scalability benefits are derived from the parallelism inherent in SiliconFS’ architectural design while the “view” of what the hosts expect is a function of SiliconFS’ rich virtualization layer.
2. Performance benefits
Beyond massive scalability in terms of overall data capacity, or in terms of billions of files, the SiliconFS design also enables two further performance benefits: high efficiency with a variety of I/O sizes and data access patterns, and near-linear scalability on I/O throughput with additional servers. Traditional filesystems are normally tuned for either small-block, random I/O workloads, or large-block, sequential workloads. Attempts to optimize filesystem performance for a wider range of application workloads normally involve the use of filesystem caches, read-ahead algorithms, adaptive schemes to avoid memory contention, etc. It is normally not possible to find a filesystem that works well for both small- and large-block data access pat-
terns, or certainly not well for both at the same time.
Consistency of filesystem performance regardless of block size is an important feature and benefit of SiliconFS. Together with BlueArc’s use of Intelligent Tiering for the creation and management of separate tiers of storage, storage administrators can design specific storage tiers for specific application workloads. Moreover, as the separate storage tiers can be united under a common namespace and could even be exported to hosts as a single mount point, SiliconFS can seamlessly merge both small- and large-block advantages into a single filesystem presentation to hosts. When contrasted with typical unoptimized filesystems, or even optimized filesystems with complicated caching and/or tuning workarounds, the advantages of SiliconFS’ simpler design become clear.
7. UFS and NFS Cookbook, http://nasconf.com/pres04/roch.pdf, provides a good overview, albeit somewhat dated, of file-system optimizations for Solaris. CITI Technical Report 06-04, http://www.citi.umich.edu/techreports/reports/citi-tr-06-4. pdf, describes typical filesystem inefficiencies of parallel filesystems in general along with a positioning of pNFS as a way to achieve optimization for both small- and large-block I/O patterns. Optimizing Input/Output Using Adaptive File System Policies by Madhyastha, et. al., http://users.soe.ucsc.edu/~tara/pubs/goddard.pdf, describes an adaptive process for optimizing filesystem performance based on continuous monitoring of application I/O patterns.
Because SiliconFS is not specifically designed for small-block, random workloads or large-block, sequential workloads, but happens to work well with either, the storage architect has more freedom in designing storage solutions. Confidence in SiliconFS to handle both small- and large-block workloads is backed by years of real-world data, and can be easily shown with straightforward tests, e.g., reports commissioned from The Tolly Group, a vendor-neutral benchmark validation organization. Using the well-known industry benchmark IOZone, The Tolly Group independently certified that SiliconFS delivers consistent results for both small and large block sizes. Copies of the report are freely available to anyone registering on The Tolly Group’s website. More information on the open-source IOZone benchmark is available at the IOZone website. Storage I/O bandwidth, however, is only one measure of a file server’s performance. Most storage architects concentrate on bandwidth as a measure of a system’s overall performance because
they assume the system will only deliver optimal bandwidth with one block size, or a limited range (usually very large block sizes only). Because SiliconFS can deliver excellent bandwidth with either small- or large-block workloads, and with just standard software clients (i.e., no proprietary parallel filesystem software), a better measure of system performance is I/O operations per second, or IOPS. Simplistically speaking, and absent other constraints, the delivered storage bandwidth is equal to IOPS multiplied by block size, so a truer measure of overall system performance is IOPS. Many storage vendors tend to shy away from pure IOPS benchmarks, preferring instead to state performance in terms of bandwidth and thus hide behind extremely large block sizes.
SPECsfs is the de facto vendor-neutral standard for all network file servers. Any storage vendor wishing to publically claim performance characterizations for their products must submit to the scrutiny of the SPECsfs benchmark; those vendors who do not submit data are quite often
the ones hiding behind large-block data sets as a way to disguise poor IOPS performance. The implication of not submitting a SPECsfs benchmark is that the product in question is designed for large-block use only, or is not a performance-oriented network storage product at all.
BlueArc first submitted SPECsfs benchmark data in September 2004, with the release of the first-generation Titan server product. Since that time BlueArc has consistently been the highest rated vendor on the SPECsfs benchmark, with the fastest rating for any 1- or 2-node server configuration by any company.10 BlueArc’s dominance of this benchmark has been used for many years to prove the superiority of SiliconFS’ performance compared to traditional file server solutions, and is part of the reason BlueArc is able to deliver lower costs for a given set of performance requirements: faster performance per server directly translates to fewer servers needed, which directly translates to fewer devices to deploy, manage, license, upgrade, etc. and thus lower overall costs.
When clustering together any number of file servers, a certain amount of performance is lost to inefficiencies of the clustering process (usually as a result of increased communications between the various servers in the cluster, particularly for metadata operations). These inefficiencies are known as “clustering overhead” and can be measured as a deviation in measured performance from what might otherwise be expected to be a linear multiple of the number of servers in the cluster. That is, for n servers one might expect n times the performance of single server (call that P). If the measured performance is lower than the product of n and P, that difference is the clustering overhead.
8. http://www.tolly.com/DocDetail.aspx?DocNumber=208351
9 http://www.iozone.org/
10. http://www.spec.org/sfs97r1/results/sfs97r1.html
The level of performance lost to clustering overhead for SiliconFS is less than 0.6%. This compares to much larger overhead losses from other vendors when going from 1- to 2-node clusters, typically 8-10% even with the addition of a large number of additional disk spindles.
The benefit of near-linear scalability is clear, and reinforces the benefits of filesystem performance predictability over a wide range of data access patterns. The storage architect can design specific storage tiers and is confident the design will scale consistently with the addition of further servers. For a filesystem designed to scale to petabytes and many billions of files, near-linear scalability and performance predictability are important characteristics.
More recently, increased focus has been placed on achieving high levels of performance with a relatively small number of spindles, which is another dimension to the performance story.
Future SPECsfs benchmarks will show SiliconFS can not only outperform competing filesystems, but can do so without requiring a large number of disk spindles.
3. Resiliency
Over the years, SiliconFS has added various filesystem mechanisms to tolerate different types of hardware storage failures and recover even in cases of catastrophic disk failure. One of the advantages of SiliconFS’ unique FPGA implementation is that data resiliency mechanisms that would have high performance impacts when implemented in software on traditional CPU-based solutions can be built into FPGA logic with negligible performance costs. Continuing engineering effort is being focused in this area so that higher levels of data protection prevent failures from occurring as much as possible, and recovery times are shortened in the event of failure in any case. Some of the data resiliency functionality currently provided by SiliconFS includes:
• Protection of critical metadata: SiliconFS protects all critical metadata via CRC checksums and end-to-end validators. Two copies of critical metadata objects are maintained, with each copy located on a different set of disks whenever possible. Failure recovery is implemented at a block level, making it possible to recover a failed filesystem even if both copies of the meta-data structure are impacted: as long as one good copy is available for each individual block of data, the complete metadata object may be reconstructed from the constituent pieces.
• Online consistency checking: SiliconFS provides various mechanisms that check data consistency as background processes. These mechanisms are designed to detect various forms of failures, including unreported errors occurring at a disk level, or errors occurring internally to hardware RAID controllers. Although most failures of this type are extremely rare, the detection mechanisms built into SiliconFS ensure that the filesystem can react quickly to unexpected problems and either avoid or mitigate filesystem failures.
• Versioning: SiliconFS has the ability to “roll back” to previous complete checkpoints, usually the most recently completed checkpoint. By maintaining multiple checkpoints SiliconFS can also roll back further than the most recent checkpoint if desired – this ability is dubbed N-way
Rollback. The effect is that the storage administrator can roll back entire filesystems to any arbitrary checkpoint that is complete, and very quickly too.
4. Dynamic storage expansion
An old storage aphorism says there are only two kinds of storage: new and full. Any filesystem must be able to deal with storage growth seamlessly or it will become very difficult to manage over time. The most difficult challenge when dealing with growing filesystems is maintaining consistent filesystem performance when new hardware is added (worst case) or increasing file-system performance with the addition of new hardware (best case). Older filesystems which do not allow for dynamic expansion with new hardware require the storage administrator to either create separate filesystems on old and new hardware or copy the data off of the old hardware and back on to the combined hardware. The former path preserved existing filesystem
11. A comparison of SPECsfs97R3.0 submissions from various vendors confirms typical clustering losses for typical CPU-based architectures.
performance but did nothing for increasing either performance or capacity of existing filesystems with the addition of new hardware. The latter could increase both performance and capacity of existing filesystems but only at the cost of a very laborious process involving large amounts of downtime. Dynamic expansion is the ability to automatically restripe data over both old and new hardware, without having to copy the data off and back on. The benefit of dynamic expansion to modern filesystems is to grow filesystems seamlessly while increasing both filesystem performance and total data capacity under management.
SiliconFS contains two separate but complementary features for dynamic storage expansion:
Dynamic Write Balancing (DWB) and Dynamic Read Balancing (DRB). DWB distributes writes intelligently across old and new storage together. As new storage is added the DWB algorithm distributes new data across both old and new storage whenever a write operation occurs, taking care to balance both performance and data capacity (that portion of total usable capacity that is used for data). As the blocks of data are distributed across more spindles, performance increases. SiliconFS takes advantage of new spindles immediately but in most cases best performance is achieved when existing data is restriped across all spindles. For this reason DWB is not the complete dynamic expansion story, for it operates only with new data on write operations.
For the complete story we need DRB as well.
A complementary feature to DWB, DRB utilizes DWB to complete SiliconFS’ dynamic expansion functionality. Whereas DWB can be thought of as an “always on” algorithm for write operations, DRB can be thought of as a “background process” which first reads and then rewrites data using DWB. When the DRB utility is started it begins rewriting files and stops once the data is balanced across all spindles. This process can take some time to complete if the amount of data to be restriped is considerable but eventually the DRB process will restripe and redistribute all data across all spindles in an automated fashion. Any hosts writing new data during the DRB process contribute to the balancing scheme.
A hidden benefit of dynamic storage expansion is SiliconFS’ ability to start small and deliver more and more performance as additional disk is added, and to do that cost-effectively. While the initial system performance may be short of the maximum possible, this ability to grow dynamically allows storage administrators to architect systems based on available budgets yet still “design in” total system performance as a function of predicted growth. This ability to scale granularly is hardly unique to SiliconFS, however delivering this benefit cost-effectively is another matter. The separation of function between the BlueArc server(s) running SiliconFS and the disk spindles underneath the filesystem is what enables this granular expansion capability on the most cost-effective basis possible. Contrast this benefit with what sounds like similar capabilities from other vendors and the difference is clear.
All clustered or parallel filesystem implementations contain the ability to scale performance granularly with additional disk hardware – this ability is one of the hallmarks of parallel filesystems in general. One may define “performance” simply or more narrowly, but most will agree that as parallel filesystems scale up performance increases. But as most filesystems do not divorce the servers from the disk attached to an individual server, total costs are greatly affected as the entire system scales up. With other filesystems the disk is tied to the server – it is not possible to add more disk without also adding more servers. Whereas SiliconFS gives storage administrators flexibility to scale performance (more servers) and data capacity (more disks) separately, other filesystems tie the two together and thereby increase total costs. Every additional server requires additional capital expense, additional licensing, additional support costs, additional rack space, increases power, cooling, and network port requirements, and generally adds to the complexity
of management. If the same level of performance can be achieved with only the addition of more disk spindles and the unused potential of the existing servers, why add all that extra hardware?
E. Beyond performance and scalability
At its essential core SiliconFS represents the intellectual property of the company, not the hardware platform or the back-end disk architectures, although all parts are needed to form a complete solution. While disk technology (and specifically SAN architectures, as opposed to JBOD storage) certainly is very important for the high performance, scalability, and even robustness of SiliconFS, it is the filesystem which enables the greatest utility for our customers. Enterprise storage is about enterprise data management features, not merely going fast or scaling large. With more than a decade of continuous development, SiliconFS provides a multi tude of features designed to make BlueArc storage solutions easier to deploy, easier to grow, more tolerant of failures, and far, far easier to manage for a variety of enterprise user environments. Many of these features can be broadly classified into Transparent Data Mobility, Data Protection, and Storage Virtualization sections. Other filesystems have some of these features. A few may have a feature or two that SiliconFS does not yet have. But only SiliconFS can draw so heavily on features from all three filesystem pillars and combine them with industry-leading scalability and performance and use only open, agreed-upon industry protocols without proprietary software.
There are a host of features any network-attached filesystem must have to be considered useful for most enterprise environments. While unnecessary for advanced functionality, enterprise customers have come to expect and rely upon these basic features as part of the definition of a network-attached filesystem. Features such as host-side network connection protocols (most often NFS and CIFS), SNMP support, anti-virus support, basic backup services, even Snapshots and replication are today considered to be part and parcel of network-attached storage solutions. Much of the engineering development in BlueArc’s early years centered on the development of these basic features, and today BlueArc offers the full suite of basic features as an integrated part of SiliconFS.
Advanced features, on the other hand, distinguish basic network-attached storage solutions from true enterprise-class filesystems. The ability to manage data on many storage tiers simultaneously, to migrate it among or between tiers, and even to third-party storage solutions, means that the storage architect can design those tiers for specific application or business requirements, and can tailor specific storage technologies for each stage of the data lifecycle, and still retain the flexibility to use a very wide range of storage technologies throughout.
The ability to centrally manage hundreds of disparate filesystems under a single, unified, global namespace, often with different (and concurrent) host connection protocols, means that the storage administrator can more easily manage a large heterogeneous user environment. The ability to automatically and seamlessly fail over server duties from one physical server to another means that the storage architect can avoid unplanned downtime, increase storage utilization across servers, and even load balance to maintain optimum performance. Such advanced storage features are what distinguish SiliconFS from the majority of its competitors. Advanced storage features are not easy to do well; that is the reason many freely available or less-developed filesystems do not have them, or cannot make them work well at scale.
F. Transparent Data Mobility
Transparent Data Mobility (TDM) is a powerful concept in data management. The term refers to the movement of data along various points in the data lifecycle. All data has a point of origin (an instrument for example), and data may need to be moved to where it is initially used (e.g., heavy computational processing), and moved again to where it may more properly be classified for later re-use (home directories are typical), and finally managed data is deposited elsewhere for long-term archival. Different types of data may have different lifecycles or intrinsic value.
It is entirely possible, even preferable, to design application-specific and/or user-specific tiers of storage for each stage of the data lifecycle. As multiple storage technologies are often the most appropriate match to each point in the lifecycle, the concept of tiered storage is central to an effective data management strategy. Certain tiers may be architected with difference performance characteristics in mind, or for better cost-effectiveness, or just so that the data they contain is bound to certain processes, applications, users, or groups. But simple storage tiering is not sufficient for an intelligent filesystem to deliver value to the storage administrator: for best value the ability to transparently move the data from tier to tier, keeping a single filesystem
presentation to the hosts, users, and applications, is far more effective. BlueArc’s term for data movement while maintaining a single filesystem view is called Transparent Data Mobility, and it has several components.
1. Intelligent Tiering
BlueArc has long championed the concept of tiered storage, and has for many years supported
the use of multiple storage technologies underneath SiliconFS. In the early years of disk stor-
age technology, the tiers were as simple as high-performance fibre-channel (FC) disks, and
slower but more cost-effective ATA disks, and only one choice of vendor for each technology.
Today those options have evolved into a number of choices of FC, SAS, SATA, SSD, and
hybrid SDRAM/SSD products from a number of technology vendors, all sold and supported
by BlueArc. While other storage vendors attempt to manipulate customers into just one or a
few disk technologies (usually supplied only by them), BlueArc offers our customers a way
to avoid vendor lock-in and expand their storage options. Today BlueArc sells and supports
many choices of storage technology, allowing our customers to design a very effective and
highly focused data management strategy using the most appropriate storage components

for every point in the data lifecycle. The BlueArc term for this concept is Intelligent Tiering.
BlueArc’s Intelligent Tiering allows customers to build scalable and flexible storage solutions
that offer the highest levels of performance and cost-effectiveness with simplified and consolidat-
ed storage management. Using the various tiers of storage available, customers can keep data on-
line longer without relying exclusively on tape technologies, minimizing the impact of backup,
replication, or disaster-recovery requirements as the strategy requires. Intelligent Tiering gives
data a longer disk lifecycle if desired, which can improve data access times for hosts and users.
2. Data Migrator
Merely offering choices of disk technology is not sufficient for an effective data strategy however.
Once the storage architect defines two or more storage tiers, movement of data between the tiers
becomes a critical design element. Specifically, policy-based movement of data between tiers is
what makes the TDM strategy really effective. BlueArc’s answer for this need is a product called
Data Migrator. The simple description is that Data Migrator is the policy-based engine which
allows storage administrators to implement their data movement policies. Data Migrator works
by allowing administrators to define policies, or even hierarchies of policies, which classify data
and move that data from tier to tier based on criteria defined. Metadata attributes such as file
type, file size, user or group ownership of file, last time of access, and dozens of other variables
can be used to craft extremely effective data movement policies. Data movement may also be
scheduled, running a policy check nightly, weekly, monthly, or whatever time period best suits
the strategy. Different policies may be defined based on available free space, thus allowing for
more aggressive migration policies when space is low. There is even a “what if” checkbox allow-
ing storage administrators to craft a policy and analyze its impact on the various storage tiers,
but without actually implementing the policy and initiating data movement.
Data Migrator solves one of the biggest challenges with out-of-band Information Lifecycle
Management (ILM) solutions, a common problem with products from other storage vendors.
When data is moved out-of-band, users must be notified of the new data location and applications have to be “reconnected” to the relocated files. Data Migrator is transparent to end-users
and applications and does not require external ILM or data management devices. Because
Data Migrator is an embedded feature of SiliconFS, all filesystem functions (e.g., Snapshots,
replication, quotas, etc.) work seamlessly as if the data were still on the original storage tier
and data integrity is maintained during the migration or recall. As far as end-users and appli-
cations are concerned the data has not moved at all. Users and applications see the data as if
it still existed in the original location, while SiliconFS keeps track of where the data actually
resides. For this reason BlueArc’s Data Migrator is often described by storage analysts as a
“transparent, policy-based data migration engine” for implementing ILM policies. But here
at BlueArc we know that Data Migrator is in fact the heart of the TDM concept.
3. Cross-Volume Links
Cross-Volume Links (CVL) and External Cross-Volume Links (XVL) are complementary tech-
nologies that extend the reach of Data Migrator. A cross-volume link is a zero-length file on
a source filesystem (the primary filesystem) which “points” at a corresponding file on a target
filesystem (the secondary filesystem). The pointer is stored in the Onode of the primary file.
A flag in the Onode is used to indicate it is a cross-volume link rather than a regular file, and
an extended Onode contains the information required to access the migrated file. All of the
metadata required for directory level operations (including owner, access mode and ACLs) are
maintained on the primary filesystem, so operations such as “ls –l” or “chmod” do not require
access to the secondary filesystem. Similarly, the information needed for quota tracking is main-
tained on the primary filesystem, so quotas reported will include migrated files on the secondary
filesystem as well.
The utility of the Cross-Volume Links to the TDM strategy becomes obvious once the stor-
age architect migrates data to external storage devices. Cross-Volume Links are designed to
operate either with internal BlueArc storage tiers or external, 3rd-party storage devices. It is the
incorporation of external storage devices which greatly extends the reach of Data Migrator, and
thus the entire BlueArc TDM strategy. Data can be migrated from tier to tier to tier, even to
external tiers, and still be managed and presented to hosts and applications as a single cohesive
whole. This is transparent, end-to-end data migration, a very powerful example of Transparent
Data Mobility. While the use of external devices as remote target filesystems is currently limited
to those devices which can be accessed via NFS or HTTP protocols, in theory future versions

of the XVL technology could make use of additional protocols, greatly expanding the list of 3rd-
party devices which could be incorporated into the BlueArc TDM strategy.
Repurposing of existing storage investments is another obvious benefit of TDM in general
and XVL in particular. Every customer has some storage platform in use before they learn of
BlueArc. Instead of throwing away that investment some customers may choose to take advan-
tage of TDM features and repurpose that 3rd-party storage within the BlueArc namespace,
perhaps as an archival tier or even a crude replication target. While other vendors attempt to
sweep the datacenter floor and encourage vendor lock-in, BlueArc would rather offer choices
and ease platform transitions for customers.
Expansion of the BlueArc ecosystem of data management partners is another benefit of XVL.
Current external XVL targets could be devices such as de-duplication and data-compression tiers
(BlueArc partners with Ocarina to provide this capability, for example), encrypted archive tiers
(Vormetric), content-archive storage tiers (Hitachi’s HCAP), or just about any 3rd-party device
accessible from Titan through NFSv3. Data migration can also be controlled via an API that has
been made available to selected BlueArc partners. The Hitachi Data Discovery Suite (HDDS)
product, for example, uses this API for optimized search and indexing as well as to control data
migration from SiliconFS to HCAP. As solution partners discover how to work with BlueArc’s
SiliconFS, more 3rd-party solutions will be incorporated into the data management framework,
giving customers the capability of using both BlueArc and 3rd-party storage devices within a
single, powerful, and transparent data migration strategy.
4. Dynamic Caching
Data Migrator may be the heart of TDM, but it is but one of several important features. It is
the combination of such features that make the BlueArc’s TDM design extremely robust and
unmatched by any other ILM solution in the storage industry. Complementing Data Migrator
are other features called Dynamic Caching and Data Relocation.
Dynamic Caching is a feature which reserves space on a storage tier for caching of “hot” files.
The space reserved is actually an entire filesystem unto itself, and as such can be as large as any
other filesystem in the BlueArc namespace. By definition, any file which is recently accessed may
have a copy also located in the Dynamic Cache. If the cache is created in a high-performance tier
of storage, this copy guarantees that any hot files are automatically on the highest performance
disk tier (which may actually be an SSD or a hybrid SDRAM/SSD tier). Having the cache
obviates the need for reverse data migration – why move the data back to the originating tier if
a copy of it already exists on the highest performance tier?
Cluster Read Caching is Dynamic Caching applied to a cluster of BlueArc servers (i.e., many
servers under a single namespace) or it may be applied to single BlueArc server. In the latter
case the feature is called Local Read Caching. When used with a cluster of BlueArc servers, each
server maintains its own Dynamic Cache, but is aware of the files accessed by all the other serv-
ers in the cluster. Copies of hot files from anywhere in the cluster therefore make their way to
every cache on every BlueArc server, which can result in dramatic aggregate read performance
improvements since every server can respond to any read request for a given set of hot files. In
this way Dynamic Caching works with Data Migrator to provide policy-based data movement
in both the forward and reverse senses simultaneously.
The read caching approach dynamically and transparently distributes and caches data to one
or more designated data sets across individual BlueArc servers within a cluster. Policy-driven
and fully automated, the Dynamic Caching transparently monitors file access patterns and
caches only those files necessary to satisfy individual host and application requests received by
SiliconFS. Customers with read-intensive workload profiles and a need to stage data in an opti-
mized workflow process can leverage read caching as a way to scale performance when and how
they need it. For many industries this capability translates to a common library of files, centrally
accessed, which increases performance on-demand as additional hosts are added for applications
which need to make use of the files in the library. Wherever storage systems are hitting hard
limitations with performance or scalable and sustainable client/server access, dynamic read cach-
ing can help to achieve new levels of optimization and speed time to results.
5. Data relocation
Data relocation is the final feature of the BlueArc TDM design. Customers may need to relocate
data for various reasons, e.g., optimizing workflow by moving certain data sets to a faster server
or load balancing data across a number of servers. Three different data relocation mechanisms
are provided:
• EVS Migration: (See next section of an explanation of the EVS feature.) EVS Migration
makes it possible to relocate a virtual server within a cluster or to a server outside of the
cluster that shares access to the same storage devices. EVS migration has minimal impact
on network hosts and once it has completed those hosts may access the data using the same
pathnames that were in use prior to the relocation. EVS migration is typically used for
adjusting workflows or vacating a server for scheduled maintenance.
• Filesystem relocation: any filesystem accessed via Cluster Namespace can be relocated to
another server within the cluster. Filesystem relocation has minimal impact on network hosts;
once completed the data is accessed using the same pathnames that were in use prior to the
relocation. Filesystem relocation is typically used to load balance within the unified namespace.
• Data relocation: data may be relocated from any given filesystem to another using a mecha-
nism referred to as Transfer of Primary Access (TPA). TPA makes it possible to relocate
individual directories as well as entire filesystems. TPA does however involve a small amount
of downtime, and data is no longer accessible using the same pathnames that were in use prior
to the relocation. TPA is generally used to better organize filesystems and/or directories within
them.
G. Data Protection
Organizations with business continuity planning needs will recognize the importance of data
protection features in their chosen storage platform. Much of the difference between enterprise
storage platforms and solutions designed for the desktop, home, or small-business lies in these
data protection features. All enterprise storage platforms have some measure of data protection
beyond basic schemes like the use of RAID. Most enterprise systems are designed to continue
operating even with major hardware failures; the system components are specifically designed
with a high degree of fault-tolerance, and to contain zero single-points-of-failure, ensuring
hardware redundancy at all levels. SiliconFS goes beyond other enterprise storage platforms and
also contains features designed to maximize system uptime, balance system load in real time,
and even allow for maintenance windows without the need to take the system off-line. Beyond
system robustness, SiliconFS also offer features for on-line data recovery, data replication, mir-
roring, backup, disaster recovery, and complete system monitoring capabilities.
BlueArc supports High Availability (HA) clustering of servers in a two-node Active/Active
configuration or an N-way (more than two nodes) clustered configuration. Clustered servers
provide NVRAM mirroring for enhanced data protection, automated filesystem failover, and
higher levels of performance as additional servers are added to the BlueArc cluster.
SiliconFS provides additional mechanisms for data protection. Three of the more important
mechanisms are snapshots, data replication, and data backup. Snapshots are generally described
as point-in-time copies of the filesystem, and are a very convenient way to give end-users a

way to “rollback” to a previous point in time to recover their own data. There are several data
replication options within SiliconFS; these may be described as either file- or block-based,
and synchronous or asynchronous. BlueArc provides a robust, flexible architecture for backup
options as well. Snapshots, data replication, and data backup together provide the storage
administrator with a range of choices for data protection. As with the choice of disk tiers,
SiliconFS provides the customer with an architecture capable of selecting data protection
options which are most suitable to the environment at hand.
1. High-Availability design
The foundation for BlueArc’s HA design is the concept of Virtual Servers (EVS, for Enterprise
Virtual Server). Virtual Servers are logical entities that reside on a physical server, in a manner
analogous to operating system virtualization techniques such as VMWare, Microsoft’s Hyper-V,
or Citrix’s XenServer. An EVS does not have physical interfaces per se, but instead has virtual
interfaces that map to the physical interface(s) of the server. As a result of the separation between
the physical and logical interfaces of the EVS and the actual server, an EVS can be migrated
from one physical server to another transparently. In a failover situation for an HA cluster, this
EVS migration is an automated process that can take place without system shutdown, and in
most cases can occur quickly enough that hosts using stateless protocols (e.g., NFSv3) will not
require unmounting and remounting of NFS exports.
Virtual Servers allow for centralized administration of the physical server and the various param-
eters which govern its operation, but give the storage administrator more flexibility to tailor to
various applications, e.g., home directories, databases, backup duties, data migration, etc. Each
EVS has storage dedicated to it, and access to the data is controlled by the Virtual Server Clients
map and not by the physical server itself. Each EVS may have its own IP address, its own data
management policies, and its own data exports/shares, and the assignment of these properties
follow the EVS as it is migrated between physical servers.
Virtual Servers and EVS migration is central to the HA design of SiliconFS, but migration may
also be useful in other scenarios. EVS migration may be used to load-balance operations between
multiple physical servers in a BlueArc cluster, or Virtual Servers may be purposely migrated (i.e.,
manual failover) in order to clear a physical server of traffic prior to a maintenance window.

Figure 1: Virtual Servers configured in a BlueArc Cluster
If the physical interfaces of the server are trunked (e.g., multiple gigabit Ethernet interfaces
together) all the defined Virtual Servers on the physical server use all of the trunked interfaces
in a failover configuration. Individual physical interfaces may also be assigned to specific Virtual
Servers, allowing a more granular level of throughput control for each EVS.
BlueArc servers communicate with each other over a dedicated, out-of-band, high-speed serial
interface (HSSI). The HSSI is used by the servers to propagate the server’s configuration (and
EVS information) to each other, as well as for mirroring NVRAM data between the servers. Use
of the HSSI link ensures that:
• The server configurations are always synchronized
• The surviving servers in an HA cluster can complete any outstanding data operation requests
active in NVRAM. In N-way configurations, BlueArc servers mirror NVRAM to neighboring
servers in a round-robin fashion. In the event of failure, the mirroring is re-established between
the remaining servers.
Host writes are acknowledged only when the write has been committed to the server’s battery-
backed NVRAM. The servers also exchange a cluster heartbeat across the HSI, and a secondary
heartbeat is also maintained via the sideband management network. This ensures that one server
will not prematurely take over the functions of a failed server in the cluster. An independent
management device, the Systems Management Unit (SMU), acts as a quorum device for even-
numbered HA cluster configurations. (Odd-numbered HA clusters do not necessarily require a
quorum device to ensure the cluster remains intact.) Use of quorum devices prevents split-brain
conditions; fencing conditions are communicated cluster-wide through redundant data paths to
ensure resource control and provide data integrity.
12. See http://techthoughts.typepad.com/managing_computers/2007/10/split-brain-quo.html for a description of split-brain
conditions in HA clusters and the use of fencing to ensure resource control.
SiliconFS buffers data in NVRAM until it is written to disk to protect it from failures as
well as power loss. When servers are configured in a cluster, the servers mirror their NVRAM
contents to other servers, thus ensuring data consistency in the event of failure of one of the
servers. When the surviving server(s) in the cluster assume the functions of the failed server, they
use the contents of the NVRAM mirror to complete any and all outstanding data transactions
that were not yet committed to disk, providing seamless failover or service migration from one
physical server to another.
2. Snapshots
Snapshots allow the storage administrator to capture a point-in-time of the filesystem – the
point-in-time image is a read-only view of the filesystem. Using point-in-time images (snapshots),
the storage administrator can:
• Allow end-users to retrieve files that have been deleted without administrator intervention
• Perform backups of the filesystem from a snapshot instead of using the live filesystem
Snapshots are rule-based, giving the flexibility to define them based on business policies. Rules-
based snapshots provide entity management, a more useful configuration compared to simpler
volume-based snapshot management. For example, hourly snapshot rules are managed as one
entity, daily/weekly rules are managed as a separate entity, and monthly rules are managed as a
third separate entity. There is also an implied hierarchy to snapshot rules: an hourly snapshot
will not overwrite a daily/weekly snapshot, etc. An hourly snapshot will overwrite only another
hourly rule, a daily/weekly snapshot will only overwrite a daily/weekly rule, etc.
The BlueArc snapshot mechanism works at the filesystem level using a pointer-based Onode
approach that writes new data on new blocks in the filesystem, preserving the original data
blocks. This method ensures no double-write penalty as seen with alternative copy-on-write
snapshot methods. As with other snapshot implementations, BlueArc snapshots are block-based,
meaning that only changed blocks of data are written to new locations on disk – the unchanged
data is not moved and both the snapshot and the live filesystem can access the same unchanged
data objects. This method vastly increases storage efficiency over file-based snapshot methods,
which must copy the entire file to the snapshot location if any part of that file changes.
Different snapshot policies may be defined by the storage administrator for each filesystem.
Should some data need snapshot protection and other data not, or if there is a large amount of
data churn on a particular filesystem, snapshots can be turned on or off on a per-filesystem basis
to manage the total disk capacity used by the snapshot feature.
Some key characteristics of SiliconFS snapshots are:
• Not all blocks are freed up upon snapshot deletion. Only those blocks which are exclusively
linked to the deleted snapshot are deleted. Other blocks which may be linked to other snap-
shots are not deleted until all of the linked snapshots are deleted.
• With other storage solutions, snapshots can impose a significant system overhead, particularly
when many filesystems are involved, or when a high degree of data churn is present. With
SiliconFS snapshots are created and manipulated in hardware; there is no performance loss
or additional system overhead on reads or writes.
• SiliconFS has aggressive object read-aheads to ensure high-performance read operations for all
snapshot activities.
• Open files are snapped as point-in-time, i.e., last saved or last changed blocks, and may need
to be coordinated with applications to ensure consistency.
The storage administrator may also configure the system so that snapshots are visible to end-
users, or not visible, on a per-export basis. File and directory permissions associated with the
filesystem are also preserved with the snapshot. This preservation allows end-users to access files
and directories from snapshots and maintains the security associated with access rights to the
volume. Because of this preservation the user cannot have permissions for the snapshot directory
above what they have for the live filesystem – files or directories that the user cannot access to
on the live file system are also blocked in the snapshot directory.
A snapshot of the volume may be taken in a number of ways:
• Automatically: via a prescribed rule
• Manually: using the servers’ GUI or CLI management interfaces
• Scripted: using a script; the storage administrator may automate the snapshot process, in a
manner similar to the rules-based method above
• Event based: using scripts and the BlueArc remote scripting tool, the storage administrator
may automate a snapshot based on trigger events generated by the server.
The storage administrator is able to view what percentage of the volume is consumed by the
live file system, and what percentage is used by snapshots. In order to ensure that there is always
available space for snapshots on the volume, the storage administrator may set aside (reserve)
disk space for snapshots, although such reservations are not required. The reserved space is
dedicated for snapshots and the storage administrator can define the level as a hard or soft limit.
Snapshots are stored within the related file system, so no space reservation is required unless the
administrator wants to guarantee a proscribed amount of disk capacity.
3. Replication features
Replication is the process of sharing data between redundant sources, as a method to ensure data
consistency, and to improve reliability and accessibility of the entire system. Replication differs
from backup in that replication aims to have the data in two or more places at once (theoreti-
cally, identical copies of the data in all locations at the same time), while backup aims to have
two or more copies of the data at different points in time. Despite these differences there are
many common design components in replication and backup, as both are designed with data
movement in mind. SiliconFS provides several robust mechanisms for data movement, many of
which are useful for replication scenarios.
Accelerated Data Copy (ADC) is a file-based, asynchronous method of data replication. ADC
allows the storage administrator to define a policy-based data migration, or a mass data migra-
tion to occur either among or between BlueArc servers. Using Intelligent Tiering, ADC can
move data among the various tiers of storage behind a single server in a non-intrusive fashion to
the hosts connected to that server, thereby providing an easy, automated method for data migra-
tion between storage tiers behind a single server. ADC uses the Network Data Management
Protocol (NDMP) to move data between servers. NDMP is an open protocol standard for
enterprise-wide backup of heterogeneous network-attached storage. The Data Migrator
feature of SiliconFS leverages ADC to manipulate data either intra- or inter-server as well.
Incremental Data Replication (IDR) is an optional replication feature of SiliconFS. Replication
occurs at the file level, and only files that have changed since the last scheduled replication are
replicated. Multiple schedules may be defined on a per-EVS basis with support for pre- and
post- scripting, enabling automated functions to occur prior to and immediately after the IDR
schedule. A useful example is the automatic quiescence of a database prior to the IDR and then
return of the database to an on-line state after the IDR process.
IDR also uses BlueArc’s ADC (NDMP basis) for data movement. IDR uses snapshots as a basis
for replication, maintaining the last snapshot as the reference point for the next repli cation to
occur to detect and track changes to files. Any delete, move and/or link operations occurring
between the two snapshot references is replicated on the destination volume. Using snapshots
as a means for replication also allows files that would normally be skipped because they are in
use to be replicated, ensuring that all files in the volume/directory are replicated and protected.
This offers a benefit over traditional replication schemes which may skip open files during the
replication cycle.
Incremental Block Replication (IBR) allows storage administrators to set up a scheduled,
incremental backup of volumes. Replication occurs at the block level, and only data blocks
that have changed since the last scheduled replication are replicated. Block-level replication
is extremely efficient, particularly if a volume has relatively small changes for large files, and
may be more bandwidth-friendly compared to file-based replication schemes. IBR may be either
synchronous or asynchronous depending on the configuration. As with IDR, multiple schedules
may be defined on a per-server basis with support for pre- and post- scripting, enabling auto-
mated functions to occur before and after the IBR schedule.
Depending on the disk technology being used, BlueArc also supports specific block-based
replication options available at the RAID controller level. Certain RAID controllers support
controller-to-controller block mirroring (synchronous or asynchronous). Normally dark fiber
connections between controller pairs are required for controller mirroring; depending on class
of hardware and features selected, replication distances can range from 500m to 10km. Through
the use of advanced fiber-optic wavelength-division multiplexing technologies, this range can be
considerably extended to 100km or more.
As opposed to replication, backup services seek to achieve a copy of the data at a specific point
in time, usually for off-line preservation. As with replication, SiliconFS leverages snapshots to
allow the storage administrator to perform backups while continuing to serve data to hosts with
the live filesystem.
SiliconFS supports NDMP versions 2, 3, and 4 for backup, and has an NDMP client built
into the server as well. SiliconFS supports LAN-free backup of data using either FC or Ethernet
networks and dedicated connections with dedicated bandwidth. This separation allows the
live filesystem to continue serving data to hosts unimpeded by the backup operations. NDMP-
compliant FC-attached tape libraries are recommended for most efficient use of available
connections and bandwidth when designing backup solutions. As the host Ethernet network is
not used in FC-attached backup scenarios, the storage administrator may decide to run backup
operations at any time instead of waiting for periods of low bandwidth utilization on the
Ethernet network (usually at night when users are less active), thereby greatly increasing the
available backup window.
Beyond high-availability, snapshots, replication, and backup, BlueArc’s call-home monitoring
service provides a further level of data protection capability in the form of predictive failure
analysis, as well as useful statistics for historical system usage and troubleshooting. The call-
home service complements business continuity features such as high-availability, snapshots,
replication, mirroring, and backup. The call-home service collects data from each BlueArc
customer which allows such collection, and relies upon the Systems Management Unit (SMU)
at each installation.
The SMU is a dedicated and integrated device which helps manage, configure, and monitor
BlueArc systems. In its present form the SMU is a 1U, rack-mountable device integrated into
the BlueArc solution; future versions will offer the ability to be integrated directly into the
BlueArc servers or the SMU may be virtualized via VMWare or similar virtualization software
and hosted on any customer server.
The SMU operates on the sideband management network of the BlueArc servers and does not
impede the transfer of any data between the hosts and BlueArc servers. Rather, the SMU only
deals with management functions for the system. Conceptually the SMU can be thought of as
an integrated syslog server, collecting management data from all devices (BlueArc servers, FC
fabric, and storage controllers) and presenting the storage administrator with a single plane of
glass for unified systems management.
H. Storage Virtualization
1. Cluster Namespace and Mixed-Mode host support
An individual or clustered set of BlueArc servers can present a single, unified or “global”
namespace to hosts. This unified namespace allows any host accessing any BlueArc clustered
server to see the same directory structure. The BlueArc term for this capability is called Cluster
Namespace (CNS). When implementing CNS each BlueArc server still owns its own filesys-
tems. When data is requested from a host to a given BlueArc server in the cluster that does not
own the filesystem in question, that BlueArc server transfers the request to the appropriate server
in the cluster.
Beyond host redirection, the principal advantage of using CNS is simplified storage manage-
ment at scale. CNS reduces the number of mount points and presents an abstraction layer
of the individual filesystems to the end-user or application, thereby giving the storage admini-
strator the freedom to leverage any filesystem within the BlueArc cluster for presentation
as a single directory with a hierarchical tree structure. This hierarchy allows for any number
of storage tiers to appear as a single filesystem to hosts, with a single mount point if desired. The
storage administrator achieves enormous flexibility with CNS; end-users or applications do not
necessarily know the type of physical storage on which their data resides. This abstraction allows
administrators to best match the type of storage to the classification of data on each tier, without
requiring users or applications to know the physical location of the different filesystems. All tiers
appear as a single large filesystem, incorporation whichever storage technologies are best suited,
scaling to petabytes. CNS also allows administrators the freedom to expand or change the underly-
ing storage architecture without having to notify users or re-write applications.
In enterprise consolidation scenarios, CNS provides a centralized file server structure. Windows,
UNIX, Linux, or other software-based file servers have much lower limits on the size of indi-
vidual filesystems (which is the predominant reason why enterprises have experienced such
a proliferation of such systems and now need to consolidate), and have lower performance
characteristics as well. With CNS storage administrators can scale to massive capacities and
provide a virtual CNS tree for all end-users and applications. As the number of users, projects,
and overall data capacity increases, more pseudo-directories can be added within the existing
CNS structure and immediately be accessed by the appropriate users. Increases in capacity,
number of filesystems, and/or aggregate performance requirements are all easily accommo-
dated with SiliconFS without changes to the way users mount the data exports. In conjunc-
tion with BlueArc’s virtual storage pool capabilities, CNS enables administrators to configure
filesystems to automatically grow as needed within pre-defined rules, eliminating downtime
associated with storage provisioning. The end-user view of the file structure never changes as it
dynamically scales to meet demands. The ability to access any data, anywhere within a BlueArc
CNS structure reduces end-user confusion, eliminates the need to rewrite applications, low-
ers administrator overhead, and provides a flexible, robust design to better accommodate the

storage needs of both today and tomorrow.
2. EVS and Secure EVS support
As discussed previously, Enterprise Virtual Servers and Secure EVS partitioning are two impor-
tant storage virtualization features. The EVS concept is central to SiliconFS’s high-availability
design, and provides the storage administrator with a powerful and convenient method to
perform maintenance, load balancing, and data migration functions without compromising
system uptime.
Virtual Servers may reside within one security domain (i.e., one user authentication framework)
or across multiple security domains allowing for separated, partitioned, secure operation. The
storage administrator does not have to sacrifice centralized storage and centralized management,
but can still make the BlueArc cluster look like many independent servers, each possibly with its
own security model. Such a configuration is very useful in academic research environ ments for
example, where it is common to have multiple departments sharing centralized resources, yet
each wanting to maintain their own independent security domains. Multi-tenant environments
for hosted data services (as typically offered by datacenter co-location facilities) also have the same
requirements and would enjoy the same benefits.
When configuring Virtual Servers, there are a few common resources which are defined
at the physical server level that each EVS inherits; e.g., physical server DNS and Domain
Name entries. Aside from these shared parameters, each EVS can be individually tailored
with additional parameters including CIFS shares and/or NFS exports, IP addresses, and
an independent host name.
3. BlueArc Virtualization framework
BlueArc provides a sophisticated storage virtualization framework, intended to free the storage
administrator from mundane storage architecture duties, simply management of SiliconFS,
and enable dynamic expansion of the total data capacity under management. The underlying
concept of the virtualization framework is the concept of Storage Pools, an abstraction object to
which storage administrators assign various properties, like maximum size for example.
Storage Pools are created from one or more System Drives (SDs), a BlueArc term for what
most readers will identify as Logical Unit Numbers, or LUNs, used in SCSI terminology to
identify the physical target for storage operations. Preferably, Storage Pools contain a large
number of SDs, spread out over a large number of physical RAID controllers. The greater
the number and spread of SDs the better the performance of the entire Storage Pool, and
the greater the initial data capacity may be.
Storage Pools provide a layer of virtualization over the SD itself; the most immediate and obvi-
ous advantage of this arrangement is that the Storage Pool can be sized independently of any
single SD. This virtualization frees the storage administrator from having to plan LUN sizing
ahead of time, or from having to copy data off particular LUNs in order to resize or otherwise
redefine an existing RAID array consisting of some number of LUNs. Such limitations plagued
early SAN implementations, as storage administrators could not know ahead of time how large
a particular RAID set might need to grow, nor how many or what type LUNs might be needed
(to say nothing of migrating data form older disk hardware to new as the infrastructure aged….)
Filesystems are defined within Storage Pools in SiliconFS, and a given Storage Pool may have
a large number of filesystems. Storage administrators create filesystems based on individual
application, user, group, or other business needs, but the raw filesystem is not itself exposed
to the end-user or application. Filesystems are individually bound to Enterprise Virtual Servers
– it is at the EVS layer that network exports are created (with dependencies on IP addresses,
for example). The host computer sees the BlueArc “server” as that EVS (with the specified IP
address) with whatever network exports are defined. In this way a single EVS may contain a
number of network exports, each of which specifies a filesystem underneath. When using the
BlueArc Cluster Namespace feature, a number of filesystems may be organized into a single tree
structure, with the root of the tree exported to hosts, so that the entire collection looks like a
single filesystem.
There are a number of advantages to filesystem and Storage Pool virtualization. Rebinding of
a single filesystem to a different EVS to achieve fine-grained control over load-balancing without needing to reconfigure the underlying storage, for example. Taking a snapshot of a single
filesystem, or rolling back an entire filesystem to any given consistent checkpoint is another.
Flexibility in sizing filesystems is a very useful advantage. Perhaps only a small amount of data
needs to be stored, but paying the performance penalty of using a small number of disks is

not a smart design, so why not spread the small amount of data in one small filesystem over
as large a Storage Pool as possible?
Figure 2: Schematic representation of SiliconFS with Storage Pools
Virtual Volumes (ViVols) are another component of the BlueArc Virtualization Framework, but
exist apart from Storage Pools. The concept and function of a Virtual Volume is the creation of
a logical container that allows for quotas to be applied to whatever filesystems reside within that
container. Because the ViVol is a logical object, as long as there is space available in the under-
lying storage the ViVol can be expanded (or contracted) automatically. Likewise, if a ViVol is
contracted or deleted the disk space consumed by it is returned to the underlying storage, freeing
up space which other ViVols may use.
Because ViVols have properties similar to that of any physical volume, the storage administrator
may easily manage and monitor various ViVols as separate entities. Yet ViVols are an adminis-
trative tool; end-users and applications have no concept or knowledge that they exist. ViVols can
be created anywhere on the underlying storage and may be dynamically expanded or contracted
at any time by adjusting the quota associated with them. SiliconFS supports quota definitions
on a per-user, -group, and/or -ViVol basis.
Quotas allow the administrator to define the amount of space that a volume, Virtual Volume,
group or user may consume. If a disk-based quota is set, the system reports the available space
available to the user or group, based on the defined quota limit. Disk-based quotas may
be defined in bytes, kilobytes, megabytes or gigabytes.13 All quota settings (Volume, Virtual
Volume, user and group quotas) have the same quota properties, namely:
• Quota limit
• Hard or soft limit
• Warning threshold
• Critical threshold
• e-mail recipient alert event list
Quota event generation is designed to provide the storage administrator with prompt notifica-
tion without an inundation of event messages. The warning message is sent when the quota
threshold is first exceeded.
13. This is the current configuration, future versions may allow quotas to be defined in larger increments.
resolved and a subsequent quota violation occurs. This setup avoid the hysteresis effect when
volumes fluctuate rapidly just below or just above a given quota threshold.
I. BlueArc Open Storage philosophy
A large amount of the hardware platform used by SiliconFS is proprietary, and must be almost by definition. Even if SiliconFS were open-source, how could others apply it to extend or change the implementation? SiliconFS must be executed largely in FPGAs and cannot be placed “on top of” normal host operating systems, unlike many other open-source filesystem offerings. A deep knowledge of VLSI programming is necessary for any software architect to understand how to implement even basic functions of SiliconFS.
For these reasons BlueArc is actively developing the BlueArc Data Management Framework, a set of open software API’s for which others may more readily program additional features. An example of this extensibility is the use of the Data Migrator feature for third-party enterprise storage products, like Hitachi’s Data Discovery Suite, or the integration of specific management software suites (e.g., Hitachi’s HiCommand) for specific RAID controller manufacturers.
Beyond the BlueArc Data Management Framework, BlueArc maintains an aggressively open storage and platform philosophy. Unlike other filesystem vendors BlueArc is not in the business of developing or selling proprietary software for host-side connectivity. Vendors of proprietary software protocols assert that anything open must be slow, clumsy, or somehow ill-suited to the tasks at hand (NFS is a typical example), but their view overlooks the benefits of ubiquity and standardization in the community, not to mention the avoidance of vendor lock-in. BlueArc instead prefers to rely on well-established, agreed-upon industry standards (preferably standards which do not change often). For this reason BlueArc helps develop and fully implements and supports protocols such as NFS, iSCSI, NDMP, etc. BlueArc would rather redesign the file server itself (for higher performance, larger scalability, advanced features, etc.) and keep the open protocols in place, rather than scrap the protocols and go the proprietary route.
The BlueArc Open Storage philosophy extends to other areas outside the filesystem as well, e.g., authentication frameworks and choice of back-end RAID controller manufacturers. Better to use open standards such as LDAP, Active Directory, or NIS/NIS+ than create another vendor-specific, closed solution. Better to offer the customer a range of well-supported RAID controller technologies to better architect specific storage tiers for specific business needs, rather than force fit a limited range of rebranded, OEM’ed, or otherwise proprietary technologies. The storage industry is already plagued by a number of closed storage manufacturers, each seeking to lock the customer in to their specific technology solutions. This direction is not in agreement with the BlueArc Open Storage philosophy. There have been many attempts to market what “open” storage really means. For BlueArc being “open” means working with as many protocols, frameworks, and technology manufacturers as feasible, a strict adherence to industry standards, and having the company’s future product direction driven by the voice of the customer. BlueArc cannot support every protocol, framework, or manufacturer out there (even the industry standard ones) but we can support the greatest number of them that satisfy the largest amount of customer requirements.
J. Future-proofing

The filesystem universe is littered with implementations which address narrow segments of the market. Distributed network filesystems are ubiquitous, providing persistent storage duties for simultaneous host access, and useful enterprise data management features as well, but are usually characterized by poor performance and limited scale. Shared-SAN filesystems are designed to open SAN architectures to network hosts, but suffer from lack of metadata scalability and are expensive to implement across the enterprise. Parallel filesystems offer tremendous storage bandwidth, very high scalability, and have built-in availability features, but are largely proprietary and extremely complex to implement and continuously tune for optimal performance. Achieving a balance between high performance, high scalability, enterprise data management features, and use of ubiquitous network filesystem protocols requires a “round” filesystem platform.
SiliconFS is round in the sense that it is a filesystem which provides high scalability, fast metadata processing performance, and excellent data movement speed under a wide range of host loads, usage patterns, and data types. Other filesystems are more like “point” solutions, optimized either for performance as defined by narrow criteria, or for specific features or applications. As such, these filesystems were designed to perform well only under certain loads, access patterns, and data types. Much of the marketing behind these filesystems is geared to defining the limited range where those filesystems perform well, and hiding from the broader, non-optimal situations. The BlueArc philosophy for SiliconFS is different in that it seeks to be useful across the widest range of loads, access patterns, and data types. Not specifically designed for any corner cases, SiliconFS offers the storage administrator filesystem flexibility and a degree of future-proofing: even unforeseen requirements can be handled with minimal effort or re-architecting of existing solutions.
K. Conclusions
BlueArc is proud to build robust, flexible, advanced storage solutions. SiliconFS is the foundation for one of the most open, most adaptable, most future-proof network storage solutions available – BlueArc is confident this family of products will meet many current and future storage needs for our customers in the most cost-effective manner possible.
Implementing BlueArc storage products has many key benefits for our enterprise customers:
• simplified, extremely easy-to-use data management
• a high degree of data protection for business continuity
• transparent data mobility for any number of different storage tiers
• industry-leading scalability
• exceptional performance
• low total cost-of-ownership
These attributes contribute to BlueArc’s strong presence in many industry segments. Worldwide customers on the forefront of their respective industries rely on BlueArc solutions for their critical application needs. These customers have reaped the benefits of SiliconFS to better serve the needs of their users, now and into the future.
With the SiliconFS architecture BlueArc has embarked on a journey which leverages its numerous successful deployments with customers in many markets and its leading performance and scalability to enable:
• Best metadata scalability for high-performance computing (HPC) environments
• Best-in-class single-server IOPS performance
• Industry-leading storage throughput using open, standardized network filesystem protocols
• A platform for unified storage management: scalable and predictable performance, unmatched scalability, and sophisticated data management functions in a single platform
While there may be other storage solutions for specific market segments, SiliconFS delivers the most functionality across a broad range of customer requirements and thus has the largest applicability in the world of filesystems available today. SiliconFS delivers the performance and scalability benefits of a SAN with the ease of management and client neutrality of NAS, all in one unified solution. Although our hardware architecture is unique, SiliconFS uses standard disk architectures, standard network protocols, standard management protocols, and standard backup protocols. BlueArc strongly believes in client neutrality, not in tying our customers to any vendor-specific solution. Ubiquity is the lingua franca of our storage solutions – we architect our platform to serve our customers’ needs.