Massively Scalable NAS Infrastructure Requirements

Understanding Baseline Capabilities in Dense File Storage Environments

The era of dense storage is just dawning and will require a different sort of storage infrastructure that can manageably scale to the multi-petabyte range. In the past, the HPC market had been the primary target for massively scalable storage platforms. Today, network attached storage solutions built on massively scalable storage platforms are increasingly penetrating commercial markets such as media and entertainment and online services, as well as secondary storage applications like backup, archiving and remote vaulting. Generally, these markets have required a specific type of performance, either small block I/O or specific throughput demands, but the rise of Web 2.0 applications brings with it a requirement for mixed workload support. Platforms designed to support the high throughput required for media and entertainment applications or the small block I/O performance of large, multi-user environments have not necessarily been able to handle these mixed workloads well. What is needed are platforms that can be configured appropriately to provide the levels of performance required to meet different vertical market requirements.
A certain set of baseline feature requirements is emerging for these massively scalable NAS environments, including massive capacity, performance, high availability, and manageability.
Each vertical market may have its own unique set of requirements for storage management capabilities, but massively scalable NAS platforms across all environments will very likely share this core set of requirements. This paper discusses these requirements in detail, maps them to customer requirements, and then examines Exanet’s ExaStore 2008 NAS platform.
Exanet, an early entrant into this space, has a mature platform with strong customer references, and can be configured to meet a variety of performance requirements. End users and vertical market-focused channel partners should be aware of ExaStore and its capabilities, as it established and has maintained a very high bar for NAS scalability and performance since 2004.
Dense File Storage Market Drivers
Data growth rates in the 50% - 100% range are not uncommon for most enterprises today. This phenomenal growth is driven by a number of factors, including the Web 2.0 phenomenon fueled by very dynamic applications, the growth of online services, and the increasing digitization of information, both personal and business oriented. Other related areas driving the need to store massive quantities of data include streaming media and secondary storage applications. Our discussions with end users indicate that 85% - 90% or more of this data is unstructured (file-based). The growth of unstructured data is hard to predict, and it tends to be highly variable. The impact of these developments will significantly change approaches to storage management. Traditional methods surrounding the management of file-based data, including the hardware and software platforms used to store it, are not well suited to handling the massive growth and variability of unstructured data synonymous with these new digital content-rich environments. The problem areas include:
· Infrastructure platforms designed to handle terabytes of data face severe technical and physical limitations in scaling to manage the petabytes of data most enterprises will require
· Growing system performance as the number of users expands in a large system has been difficult to do without “forklift” upgrades
· Manual techniques used today to manage provisioning, performance tuning, data redundancy, and capacity expansion cannot keep up with the scale of expansion driven by the new environment
· Conventional data protection processes built around point-in-time backups cannot handle the massive amounts of data
The new challenge is how to manage all of this data cost-effectively.
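To make the growth figures above concrete, here is a minimal sketch of compound capacity growth at the 50% and 100% annual rates mentioned in the text. The starting capacity of 100 TB, the 1000 TB target, and the function names are illustrative assumptions, not figures from this paper.

```python
def project_capacity(start_tb: float, annual_growth: float, years: int) -> list[float]:
    """Return projected capacity in TB for year 0 through `years`,
    assuming a constant compound annual growth rate (0.5 means 50%)."""
    return [start_tb * (1 + annual_growth) ** y for y in range(years + 1)]


def years_to_reach(start_tb: float, target_tb: float, annual_growth: float) -> int:
    """Return the number of whole years until capacity first meets or exceeds target_tb."""
    years = 0
    capacity = start_tb
    while capacity < target_tb:
        capacity *= 1 + annual_growth
        years += 1
    return years


if __name__ == "__main__":
    # Hypothetical starting point: 100 TB of file data today.
    for rate in (0.5, 1.0):
        print(f"growth {rate:.0%}: {project_capacity(100, rate, 3)} TB over 3 years, "
              f"{years_to_reach(100, 1000, rate)} years to reach 1000 TB")
```

Under these assumptions, a store that doubles every year grows tenfold in about four years, which is why a platform sized in terabytes today may need to be planned in petabytes.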
Storage Requirements in the Digital Content Age
Storage infrastructure platforms well-suited for large-scale storage environments share many characteristics across vertical markets.
They all require massive scalability, high availability, and aggressive $/GB costs, but what differentiates them is performance in specific application environments. Streaming media platforms must offer very high throughput, while Web 2.0 platforms must also offer excellent small-block I/O performance. We believe customers should evaluate storage infrastructure platforms for these environments based on the following four criteria:
Massive, linear scalability. This requirement covers a number of different areas that must be considered for any storage platform that must scale to support capacities in the tens of petabytes. The costs and management paradigms associated with monolithic storage architectures do not translate cost-effectively to multi-petabyte configurations, which require new approaches that will minimize energy and floor-space requirements as well as provide a manageable, cost-effective massively scalable storage platform. Monolithic platforms also offer less flexibility in accommodating growth modularly; scale-out platforms, which use clustering or grid architectures to combine large numbers of independent resources into a single logical solution, meet the requirement for massive scalability more cost-effectively and with more flexibility.
Upgradability is another key issue here. The new requirements for digital content management can require time horizons of 50-100 years (such as active archival storage) and therefore demand the ability to accommodate multiple technology generations over long periods of time without requiring disruptive forklift upgrades. Next, the hardware and software components of such a platform should support linear or nearly linear scalability as capacities move into the range of hundreds of terabytes and beyond. Grid and clustered storage architectures provide a sound infrastructure to achieve capacities in the multi-petabyte range but may introduce caching and internode communication algorithms that affect linear scalability.
And finally, it is important to understand any limitations that might exist around file system management semantics. What are the maximum sizes for the file systems and files?
What is the maximum number of files that can be supported?
Configurable high performance. On one end of the performance spectrum is small block I/O performance, which tends to be applicable to environments with large numbers of concurrent users and very high file counts. On the other end is throughput, a type of performance applicable to media, entertainment, and other types of video streaming environments that do not require low-access latencies. Because of their design, some platforms may be much better for certain applications than others, while other platforms may be tuned, based upon configuration, to provide high performance at different points along the performance continuum. Although some standard benchmarks are available, such as SPECsfs, which can provide some indication of system-level performance capabilities for certain environments, it is important to understand your performance requirements and ensure that any solution you’re considering can be configured to provide the type of performance you need. Understand the performance per node as well as top-end scalability, and understand the price/performance implications of each.
Solutions with higher node-level performance may allow you to meet your performance requirements with less hardware and lower costs (in terms of both CAPEX and OPEX).
Look for architectures that offer the flexibility to create a configuration well-tailored to meet specific performance requirements due to their ability to scale performance and capacity independently.
Grid or clustered storage architectures meet this requirement very well.
High availability. Several general areas must be considered here. First, how well does the solution handle hardware and software failures? Many applications for which these platforms will be used require 24x7 availability, so look for solutions that monitor components for availability, incorporate transparent, automatic recovery from failures, and support online replacement of failed components. How is data redundancy handled, and what impact does this have on disk rebuild times as well as overall costs?
Second, the platform should support nondisruptive performance and capacity expansion or reconfiguration. Third, consider what capabilities the platform offers to minimize the impact that common administrative tasks, like backup, will have on overall system availability.
Conventional backup to tape is not a workable data protection strategy for platforms that must scale to hundreds of terabytes and beyond. Look for integrated replication capabilities that allow replicas to be easily created and maintained without impacting system availability. The use of replication not only solves the “backup” problem for large-scale configurations, but also enables a remote site-disaster recovery option. Since replication relies on disk, this approach requires additional disk capacity, with its associated costs, that may not have been required in a tape-based data protection infrastructure, but in environments of this size, tape is not a workable option for data protection.
Centrally managed, unstructured data services support. The classic problem in traditional NAS environments has been that when the performance or capacity of a single filer is outgrown, introducing new filers creates separate name spaces and sets of file data that must be provisioned, protected, replicated, and maintained individually. For large environments, file management done in this way is just not workable. To meet the requirements of scale, vendors have introduced very scalable, single file systems as well as aggregation products that create a single, global namespace from a management point of view across multiple file systems.
Both ease the file management tasks associated with scale-out NAS solutions by allowing an administrator to manage an extremely large-scale, file-based environment as a single file system. Supporting the management of centralized heterogeneous resources can be a strong differentiator in this category; some vendors, while supporting very large environments, only manage their own underlying hardware.
A related feature required for environments of this scale is some level of self-management.
When new resources (performance or capacity) are added, does the system automatically rebalance itself, or does it require manual intervention? Is the system self-healing with respect to node failures? Without these types of features, administrators are still operating within the more conventional and manually intensive administrative model of traditional NAS.
And finally, look for feature parity with storage management capabilities provided within traditional NAS solutions such as thin provisioning, snapshots, and replication.
Certain vertical market solutions built on massively scalable NAS platforms may have specific storage management requirements, such as WORM (write once read many) capabilities and data de-duplication for active archival storage.
Spotlight on Exanet
Exanet designs scale-out NAS systems that support scalability into the multi-petabyte range. Since shipping its initial offerings in this space in 2004, Exanet has achieved good traction with web service providers, digital media/broadcasting, and telecommunications companies. Exanet sells a massively scalable file system software solution that supports NFS, CIFS, and Apple Filing Protocol access, among others, and is designed for deployment around a clustered storage backend leveraging heterogeneous storage.
Configurations are built up by combining file servers and storage arrays as appropriate to produce the desired performance characteristics. Exanet sells primarily through channel partners offering specific vertical market expertise. Exanet has a major partnership with IBM, although Exanet works with a number of other hardware (servers/storage), software (backup), and vertical market partners as well.
In 2008, Exanet expanded the capabilities of its platform with a set of enhancements dubbed ExaStore 2008. These enhancements include File Servers with quad-core processors, improved Windows support, the introduction of ExaMonitor, and some packaging changes. Support for the new quad-core processors offers a straightforward performance benefit, making Exanet the first and only scale-out NAS offering to date with this support. This is an example of the flexibility of Exanet’s architecture to incorporate new higher performance or power efficient hardware much more quickly than traditional NAS vendors. The improved Windows support comes in the form of integration with Microsoft Management Console for easier manageability in Windows-centric environments as well as native support for Windows ACLs.
ExaMonitor is a built-in performance and capacity monitor that also improves hardware monitoring for faster root cause analysis and improved capacity planning.
The packaging changes offer additional flexibility to partners and customers in buying and configuring scale-out NAS solutions. Although Exanet is a software developer, it is now offering integrated clustered NAS solutions. Complete systems are available from Exanet and its partners.
File server products designed to be used with heterogeneous storage offer a lower-entry price point for those customers that may want to re-purpose existing storage hardware.
How Does ExaStore 2008 Stack Up?
ExaStore can be configured to meet various performance and capacity requirements, which makes it a candidate basis for a variety of vertical market solutions. Although this is a common claim among some scale-out NAS vendors, the proof is in the pudding. Talk to Exanet’s customers in the same vertical as you, check the SPECsfs benchmarks (if relevant to your environment) available at www.spec.org, or better yet, try out a small, 2-node system against your specific workload.
Given that node performance varies by as much as 4x among various vendors today, choosing the right vendor to match your performance requirements can result in significant cost savings.
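The effect of a 4x spread in node performance on hardware count and cost can be sketched with simple arithmetic. The target load, per-node throughput figures, and per-node prices below are hypothetical placeholders chosen only to illustrate the calculation; they are not vendor data or published benchmark results.

```python
import math


def nodes_required(target_ops: float, ops_per_node: float) -> int:
    """Smallest whole number of nodes whose combined throughput meets target_ops,
    assuming throughput scales linearly with node count."""
    return math.ceil(target_ops / ops_per_node)


def cluster_cost(target_ops: float, ops_per_node: float, cost_per_node: float) -> float:
    """Hardware cost of the smallest cluster that meets target_ops."""
    return nodes_required(target_ops, ops_per_node) * cost_per_node


if __name__ == "__main__":
    target = 200_000  # hypothetical required ops/sec
    # Hypothetical vendors: B delivers 4x the per-node performance of A
    # at twice the per-node price.
    vendors = {"A": (10_000, 20_000.0), "B": (40_000, 40_000.0)}
    for name, (ops, price) in vendors.items():
        n = nodes_required(target, ops)
        print(f"vendor {name}: {n} nodes, ${cluster_cost(target, ops, price):,.0f}")
```

In this illustration the faster node costs twice as much but needs one quarter of the node count, so the total is half the cost, before counting the power, floor space, and management savings of running fewer nodes. Real clusters may not scale perfectly linearly, so measured results should replace the linear assumption where available.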
Massive linear scalability. First, ExaStore configurations clearly meet the multi-petabyte capacity requirement today. File servers are deployed in pairs, with each pair able to support up to 500TB of attached storage today. There is no limit to the number of pairs that may be deployed.
Clustered storage density can vary based on which heterogeneous storage is chosen for the storage nodes. ExaFS, Exanet’s massively scalable file system, supports files and file systems up to one exabyte (1024 petabytes) in size in a single pool and can hold up to 128 billion files.
Exanet’s DX Series storage arrays offer up to 48 SATA-2 drives in 4U of rackspace, RAID 6 with background parity checking and auto drive rebuild, and support for up to four 4Gb/sec Fibre Channel connections.
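As a rough sizing aid, the figures above (up to 500TB per file server pair, and one exabyte equal to 1024 petabytes) can be turned into a small calculation. This is a sketch under stated assumptions: it treats one petabyte as 1024 terabytes to stay consistent with the exabyte conversion in the text, and the target capacities and function names are hypothetical.

```python
import math

TB_PER_PAIR = 500   # from the text: each file server pair supports up to 500TB
TB_PER_PB = 1024    # assumption: binary units, matching 1 EB = 1024 PB in the text
PB_PER_EB = 1024    # from the text


def pairs_for_capacity(capacity_tb: float) -> int:
    """Minimum number of file server pairs needed to attach capacity_tb of storage."""
    return math.ceil(capacity_tb / TB_PER_PAIR)


def file_servers_for_capacity(capacity_tb: float) -> int:
    """File servers are deployed in pairs, so the server count is twice the pair count."""
    return 2 * pairs_for_capacity(capacity_tb)


if __name__ == "__main__":
    for pb in (1, 2, 10):  # hypothetical target capacities in petabytes
        tb = pb * TB_PER_PB
        print(f"{pb} PB -> {pairs_for_capacity(tb)} pairs, "
              f"{file_servers_for_capacity(tb)} file servers")
```

The calculation covers capacity only. A configuration sized this way may still need additional file servers to meet a performance target, since performance and capacity are configured independently.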
Second, ExaStore configurations can be built using heterogeneous storage hardware. ExaStore software runs on Intel Architecture servers and is designed to accommodate new generations of compatible servers. ExaStore’s ability to support heterogeneous storage offers the flexibility to choose backend drives that meet performance requirements in the areas of response time, duty cycle, capacity, etc. This and the fact that multiple generations of storage from multiple vendors are supported simultaneously give ExaStore a “future-proof” architecture. Expansion does not require forklift upgrades – it’s as easy as just adding more of the type of required resources (performance or capacity) to an existing configuration. This type of granular expandability is an easier, less disruptive way to support scaling than more monolithic approaches.
The ability to achieve linear scalability is dependent upon architecture and workload. Exanet’s architecture provides linear scalability, rebalancing workloads automatically as system configurations evolve over time (for resource additions or deletions) to maintain optimum performance. Exanet has SPECsfs numbers that indicate extremely linear scalability on the SPECsfs ops benchmark with an industry-leading price/performance metric: less than half the cost of the next closest published $/SFS ops competitor. The price/performance comparison is an important one to take into account when evaluating scale-out NAS platforms. Many vendors will be able to scale out to meet performance and capacity requirements, but the vendors that can allow you to achieve your objectives with the fewest number of nodes have a clear price/performance advantage and require less power, floor space, and management.
Configurable high performance. When evaluating scale-out NAS platforms, understand what your specific performance requirements are. Note that Web 2.0 environments tend to have a highly variable workload where the ability to configure a system at different points along the “small block I/O – throughput” performance continuum is important. Systems only good at throughput and not small block I/O may impose limitations if performance requirements change. ExaStore offers the option to configure performance and capacity independently. Its performance numbers and customer references back up the fact that its systems can be configured to achieve high performance at different points along the “small block I/O – throughput” performance continuum.
ExaStore’s SPECsfs ops numbers indicate industry-leading single file system NAS performance, with the next closest vendor’s published numbers showing roughly 1/3 as much performance per file system. They also indicate industry-leading SPECsfs response times, again based on comparisons of published data. ExaStore supports adaptive cache management algorithms with a strong bias for metadata, adaptive read-ahead caching for both data and directories, and localized allocation policies that minimize the amount of multi streaming on each LUN, all designed to allow it to scale to high performance levels for either small block I/O or throughput, depending on how the performance and capacity resources are configured.
High availability. ExaStore’s operating environment immediately identifies failures in nodes, disks, and controllers, rerouting any inflight requests around the failed components and re-distributing the workload in real time to other system components. This makes hardware failures completely transparent from an end-user point of view. Failed components are easily identified using ExaAdmin and can be replaced online without affecting file operations. The same self-managing, self-healing capabilities used to support online replacement also support the online addition, removal, and/or reconfiguration of resources non-disruptively.
Clustered storage subsystems offer RAID for data redundancy; these capabilities can be used transparently in ExaStore configurations. Supported RAID levels vary depending on which heterogeneous storage is chosen for the storage cluster, but ExaStore’s flexibility allows end users to choose storage that best meets their data redundancy requirements.
Exanet offers ExaSync, IP-based, asynchronous replication software, as an add-on option for any ExaStore configuration. Supporting scheduled volume- or directory-level replication, ExaSync can be used to create and maintain a local disk-based backup copy of the Exanet data store without impacting the availability of file services. ExaSync supports heterogeneous replication, so lower cost storage can be used for secondary copies if desired. ExaSync can be used with any of the ExaStore data management capabilities discussed in the next section, including ExaSnapshots, ExaRestore, ExaBalance, and ExaMonitor. Finally, ExaSync can be used to create remote site copies for disaster recovery purposes.
Centrally managed, unstructured data services support. ExaStore’s central component is ExaFS, a massively scalable single file system that supports NFS and CIFS access (among others) and is centrally managed. Supported file services and access protocols include Network Time Protocol, Network Information Service, Domain Name Service, Kerberos and LDAP, NFS v2/v3 over UDP/TCP, Apple Filing Protocol, File Transfer Protocol, and Secure Copy Protocol. On the Windows side, support covers Microsoft CIFS, Active Directory, ACLs, IP-based ACLs, local users/groups in mixed and native modes, and Oplocks. This makes ExaStore a very complete file-serving solution for both Windows and Unix environments.
The performance and capacity characteristics of ExaStore were discussed earlier, but ExaStore also includes a full range of tools that simplify the management of massive unstructured data stores. ExaVolumes allows administrators to identify separate containers for separate data sets without downtime. ExaSnapshots lets administrators set policies for logical, space-efficient snapshot creation and retention. ExaRestore allows rapid restores of complete data sets directly from disk. ExaBalance is the integrated load balancer that works to maintain optimized performance as configurations evolve over time. ExaMonitor offers performance monitoring and capacity trending capabilities from a web-based graphical user interface and is incorporated as part of ExaAdmin, the Exanet management console. With respect to the scalability of these tools, ExaVolumes supports up to 2000 volumes per system and LUNs up to 8TB in size, while ExaSnapshots supports up to 1500 snapshots per ExaVolume. Quota support includes quota definition at the user/group level, default quotas, hard and soft quotas, and notification.
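The quota features listed above (hard and soft quotas, default quotas, and notification) follow a pattern common to many file systems. The sketch below illustrates generic hard/soft quota semantics only; it is not Exanet's implementation, and every name, limit, and behavior in it is a hypothetical assumption.

```python
from dataclasses import dataclass
from enum import Enum


class QuotaResult(Enum):
    OK = "ok"                        # write allowed, no action needed
    SOFT_EXCEEDED = "soft_exceeded"  # write allowed, owner should be notified
    HARD_EXCEEDED = "hard_exceeded"  # write denied


@dataclass(frozen=True)
class Quota:
    soft_bytes: int
    hard_bytes: int


# Hypothetical default quota applied to any user without an explicit override.
DEFAULT_QUOTA = Quota(soft_bytes=80, hard_bytes=100)


def quota_for(user: str, overrides: dict[str, Quota], default: Quota = DEFAULT_QUOTA) -> Quota:
    """Return the user's explicit quota if one is defined, else the default quota."""
    return overrides.get(user, default)


def check_write(used_bytes: int, request_bytes: int, quota: Quota = DEFAULT_QUOTA) -> QuotaResult:
    """Classify a write by the usage that would result if it were accepted."""
    projected = used_bytes + request_bytes
    if projected > quota.hard_bytes:
        return QuotaResult.HARD_EXCEEDED
    if projected > quota.soft_bytes:
        return QuotaResult.SOFT_EXCEEDED
    return QuotaResult.OK


if __name__ == "__main__":
    overrides = {"alice": Quota(soft_bytes=800, hard_bytes=1000)}
    print(check_write(70, 20, quota_for("bob", overrides)))    # default quota applies
    print(check_write(70, 20, quota_for("alice", overrides)))  # explicit quota applies
```

The design choice illustrated is that a soft limit produces a warning while the write still succeeds, whereas a hard limit rejects the write outright. Real systems often add a grace period to the soft limit, which this sketch omits.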
ExaStore Differentiators
Several architectural choices differentiate Exanet from the competition, but they all contribute to Exanet’s ability to support massive scalability with industry-leading performance levels for different types of file-serving environments and very aggressive price/performance metrics. This is borne out by a number of different capabilities and results, including ExaFS’ ability to support 1 exabyte file and file system sizes, up to 128 billion files in a single file system, industry-leading node-level performance numbers, and price/performance metrics on published benchmarks. This ability to support high levels of both scalability and performance is based on Exanet’s highly efficient, latency-minimizing, scale-out cache design.
Unlike some other vendors in this space, Exanet has chosen to go with an approach built entirely on industry standards and heterogeneous support. ExaStore supports widely used, industry-standard file system access protocols and can be built using storage hardware from a variety of different vendors. This flexibility contributes to its ability to deliver the performance necessary in different types of environments, since back-end disk characteristics (performance, duty cycle, capacity, price/performance) can be selected to support required performance characteristics. Vendors with proprietary hardware designs, or those requiring that their systems be used with certain hardware sole-sourced through them, cannot offer the flexibility necessary to meet a variety of performance requirements with the same basic system.
Taneja Group Opinion
The dawn of dense storage computing is upon us, driven primarily by the growth of unstructured data. While vendors like Exanet were pioneers in this space, the entry of trusted storage suppliers is already starting to happen. With several years of experience under its belt, Exanet offers a more mature platform and stronger references in several key verticals than other players. Exanet has been an innovator in this space throughout its history and continues today with its industry-leading performance numbers in published benchmarks as well as its industry-first support for quad-core CPUs.
But it is really the scalability of its file system (ExaFS) that sets it apart functionally from other players. Its choice to deploy using industry standards, including commodity server and heterogeneous storage hardware, has very positive cost implications, and is just one factor supporting its industry-leading SPECsfs ops numbers today. Vendors like Exanet that support heterogeneous servers and storage in their scale-out NAS configurations also offer a level of “future proofing” not available from vendors with proprietary or single-source solutions. If you are looking for a massively scalable NAS solution, regardless of whether it is for small block I/O, throughput, or mixed workloads, ExaStore offers the configurability to give you petabyte-level scalability today along with extremely high performance at very aggressive price points.