PANASAS STORAGE FOR PETASCALE SYSTEMS

OVERVIEW
Panasas is a leading provider of storage for large scale, high performance systems. Our customers depend on high performance computing systems to solve demanding problems in energy exploration, financial analysis, climate modeling, computational fluid dynamics, manufacturing design, digital animation, computational physics, higher education, and many similar applications. A critical component of their high performance computing systems is the Panasas storage system, which lets thousands of compute nodes, organized into one or more clusters that communicate via high speed networks, manipulate large datasets. Without the right storage system, their investment in computational power and network infrastructure will be underutilized, either because of performance limitations or because of downtime due to reliability issues. Our customers choose Panasas because they know they can solve their large problems while relying on our equipment.
Many of our customers are preparing for a future that involves very large scale computations that require high performance access to petabytes of storage. This paper explains the elements of the Panasas system that are designed to handle very large scales. Our recent paper at the 2008 FAST conference provides a technical overview of the Panasas system. Our workshop paper at SC07 provides more background on our internal distributed system platform. Earlier work presented at SC04 describes our approach to high performance file-based RAID.
SCALABILITY IN THE PANASAS ARCHITECTURE
The elements of the system that provide scalability include:
• A distributed system platform that manages the rest of the system.
• Distributed block management using the object storage protocol.
• Distributed metadata management with a global namespace.
• Per-file RAID protection.
• Declustered RAID for scalable reconstruction performance.
• Fully redundant hardware and software with automatic fault handling.
The Panasas system is based on a distributed system platform that provides a scalable framework for managing a large collection of software and hardware components. Part of this is a common platform layer that includes the base operating system, a local process monitor, a local hardware agent, and a message passing agent that communicates with a global cluster manager. The cluster manager is a replicated service that uses a quorum-based voting protocol to make decisions and maintain a replicated copy of the overall system state. The cluster manager keeps track of services and hardware components, starting and stopping services as necessary, monitoring their state and the state of the hardware, and reacting to faults and changes in the environment.
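The quorum idea behind the replicated cluster manager can be illustrated with a minimal Python sketch. It assumes a simple strict-majority rule over a fixed number of replicas; the function names `has_quorum` and `commit_decision` are hypothetical, and the actual Panasas voting protocol is not specified in this paper.

```python
from typing import Dict


def has_quorum(votes_in_favor: int, replica_count: int) -> bool:
    """Return True when a strict majority of all replicas agree."""
    return votes_in_favor > replica_count // 2


def commit_decision(votes: Dict[str, bool], replica_count: int) -> bool:
    """Commit a state change only if a majority of replicas voted yes.

    `votes` maps a replica name (hypothetical) to its vote. Replicas that
    are unreachable are absent from `votes` but still count toward
    `replica_count`, so a minority partition can never commit.
    """
    yes_votes = sum(1 for vote in votes.values() if vote)
    return has_quorum(yes_votes, replica_count)
```

With three replicas, two agreeing replicas can commit a change even if the third is down, while a single isolated replica cannot.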
The Panasas file system is an application hosted by the distributed system platform. The separation of the file system from the overall cluster management means that the file system protocols can be optimized for performance while the management system is optimized for robustness. The system architecture allows for clean integration of other services such as backup agents, replication agents, and more.
Block management is a fundamental aspect of any storage system. The Panasas system delegates block management to StorageBlades that export an Object Storage Device (OSD) interface. Higher levels of the file system manage objects that are containers for data and attributes, and the StorageBlades implement the object abstraction that involves traditional block management. Each StorageBlade is a balanced component that has disks, a network interface, a processor, and memory.
As storage capacity scales up, the necessary computing resources to manage the storage and provide high bandwidth access are automatically scaled up at the same time. Files are striped across objects on different StorageBlades, so that even a single I/O stream benefits from distributed block allocation.
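To make the striping idea concrete, the following Python sketch maps a byte offset in a file to a component object and an offset inside that object. It assumes plain round-robin striping with a fixed stripe unit; the 64 KiB unit, the object count, and the function names are illustrative assumptions, not Panasas parameters.

```python
from typing import Set, Tuple


def locate(offset: int, stripe_unit: int, num_objects: int) -> Tuple[int, int]:
    """Map a file offset to (object index, offset within that object)."""
    unit_index = offset // stripe_unit   # which stripe unit of the file
    obj = unit_index % num_objects       # round-robin choice of object
    row = unit_index // num_objects      # full stripes that precede this unit
    return obj, row * stripe_unit + offset % stripe_unit


def objects_touched(start: int, length: int,
                    stripe_unit: int, num_objects: int) -> Set[int]:
    """Return the set of objects that a contiguous read or write touches."""
    first_unit = start // stripe_unit
    last_unit = (start + length - 1) // stripe_unit
    return {unit % num_objects for unit in range(first_unit, last_unit + 1)}
```

Under these assumptions, a single sequential 1 MiB read over 8 objects with a 64 KiB stripe unit touches all 8 objects, which is why one I/O stream can draw on several StorageBlades at once.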
Metadata management has two aspects of distribution. The first is that multiple metadata management services control different parts of the file system namespace. These run on different DirectorBlades so the system can be scaled up to harness the power of many DirectorBlades.
File system clients are responsible for
generating redundant data and they transmit data and parity in parallel to the StorageBlades. This
provides a natural scaling in RAID performance as the number of file system clients increases. A
unique property of the Panasas system is that clients can verify the RAID equation during reads to
provide true end-to-end data integrity checking. In addition, write performance remains very close
to read performance as the system scales up, in contrast to traditional RAID controllers that pay a
substantial write performance penalty in redundant configurations. Panasas data is fully protected
in highly available configurations without compromising write performance.
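The client-side role described above can be sketched in Python. The sketch assumes single-parity protection where the RAID equation is a bytewise XOR across equal-length stripe units (RAID 5 style); the paper does not state the exact equation Panasas uses, and all function names are hypothetical.

```python
from functools import reduce
from typing import List


def xor_parity(units: List[bytes]) -> bytes:
    """Compute bytewise XOR parity over equal-length stripe units."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def verify_stripe(units: List[bytes], parity: bytes) -> bool:
    """Client-side read check: does the stored parity match the data?"""
    return xor_parity(units) == parity


def rebuild_unit(surviving_units: List[bytes], parity: bytes) -> bytes:
    """Recover one missing data unit from the survivors and the parity."""
    return xor_parity(surviving_units + [parity])
```

Because the client computes the parity before sending and can recheck it after reading, a corrupted unit anywhere between the disk and the client shows up as a mismatch in `verify_stripe`.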
The system handles very large numbers of small files as well as it handles smaller numbers of very
large files. Small files start out mirrored on two StorageBlades so they are cheap to create, have
low space overhead, are efficient to write with small I/Os, and quick to rebuild after failures.
These are automatically converted to widely striped files as they grow in size to optimize bandwidth
and reduce parity overhead. The memory on each StorageBlade is used to cache hundreds of
thousands of object descriptors, as well as data, in order to optimize access to a hot working set of
files. That working set could be a small number of large files that are shared by a single computation
and spread out over many StorageBlades, or very large numbers of relatively small files used
by many concurrently running independent applications. The system scales its resources naturally
to handle either kind of workload.
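The trade-off between mirroring and wide striping can be shown with a short Python sketch. It assumes single-parity stripes, and the 64 KiB size threshold for switching layouts is purely hypothetical, since the paper does not give the actual conversion point.

```python
MIRROR_LIMIT_BYTES = 64 * 1024  # hypothetical threshold, not a Panasas value


def choose_layout(file_size: int) -> str:
    """Pick a layout by file size: small files mirrored, larger ones striped."""
    return "mirror" if file_size <= MIRROR_LIMIT_BYTES else "parity"


def redundancy_overhead(layout: str, stripe_width: int = 2) -> float:
    """Return redundant bytes stored per byte of user data.

    A two-way mirror stores one extra copy (overhead 1.0). A single-parity
    stripe of `stripe_width` units holds stripe_width - 1 data units and
    one parity unit, so its overhead is 1 / (stripe_width - 1).
    """
    if layout == "mirror":
        return 1.0
    if layout == "parity":
        if stripe_width < 2:
            raise ValueError("a parity stripe needs at least two units")
        return 1.0 / (stripe_width - 1)
    raise ValueError(f"unknown layout: {layout}")
```

Under these assumptions a mirrored file carries 100% space overhead, while a 9-wide stripe carries 12.5%, which is the reduction in parity overhead that wide striping buys for large files.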
The per-file RAID approach is exploited to provide scalable RAID rebuild. Parity groups are
declustered (i.e., spread out) among the StorageBlades, and DirectorBlades distribute the rebuild
work on a fine-grained, per-file basis. Thus the system naturally harnesses the power of many
disks, many network interfaces, and many computer systems to tackle the critically important
problem of RAID rebuild. The result is a parallel RAID rebuild system that scales RAID rebuild
performance in larger storage systems.
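The effect of declustering on rebuild can be illustrated with a small Python simulation. It assumes parity groups are placed on randomly chosen blades; the real placement algorithm is not described in this paper, and the blade count, group width, and function names are illustrative assumptions.

```python
import random
from collections import Counter
from typing import List


def place_groups(num_blades: int, group_width: int,
                 num_groups: int, seed: int = 0) -> List[List[int]]:
    """Place each parity group on `group_width` distinct, randomly chosen blades."""
    rng = random.Random(seed)
    return [rng.sample(range(num_blades), group_width) for _ in range(num_groups)]


def rebuild_load(groups: List[List[int]], failed_blade: int) -> Counter:
    """Count how many units each surviving blade must read to rebuild.

    Only groups that had a member on the failed blade need work, and each
    such group reads one unit from every one of its surviving members.
    """
    load: Counter = Counter()
    for group in groups:
        if failed_blade in group:
            for blade in group:
                if blade != failed_blade:
                    load[blade] += 1
    return load
```

For example, `rebuild_load(place_groups(100, 9, 2000), failed_blade=0)` spreads the reads over far more than the 8 peers a fixed 9-disk RAID set would use, so each surviving blade contributes only a small share of the rebuild work.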
ROADMAP TO PETASCALE
Today our largest single system is a 2 petabyte system at Los Alamos National Laboratory for the
Roadrunner supercomputer. This system is created from 1000 StorageBlades that each have two 1 TB
drives, processor, memory, and a 1GE interface. There are also 100 DirectorBlades that provide
metadata management and RAID rebuild. The blades are housed in a 4U chassis that holds 11
blades. Each chassis has redundant 10GE connections to the LANL scalable network infrastructure.
Each blade has two NICs routed through two different switch modules. The chassis has dual
redundant power supplies and a battery that runs the system for several minutes, which is long
enough to gracefully flush data to disk in the event of AC power loss.
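The figures quoted for this system can be cross-checked with a few lines of Python arithmetic. The blade, drive, and chassis numbers come from the text above; treating 1 PB as 1000 TB and assuming every chassis is fully populated are assumptions made here for the calculation.

```python
storage_blades = 1000      # from the text
drives_per_blade = 2       # from the text
tb_per_drive = 1           # from the text
director_blades = 100      # from the text
blades_per_chassis = 11    # from the text

# Raw capacity, using decimal units (1 PB = 1000 TB, an assumption).
raw_capacity_tb = storage_blades * drives_per_blade * tb_per_drive
raw_capacity_pb = raw_capacity_tb / 1000

# Chassis count, assuming every chassis holds a full set of 11 blades.
total_blades = storage_blades + director_blades
chassis_count = total_blades // blades_per_chassis
leftover_blades = total_blades % blades_per_chassis

print(raw_capacity_pb, chassis_count, leftover_blades)
```

The raw capacity works out to 2 PB, matching the stated system size, and the 1100 blades fill exactly 100 chassis with none left over.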
The next largest system at LANL is about half that size and has been in production for over two
years. It is a shared storage cluster accessed by 3 different compute clusters (TLCC, Lightning, and
Viewmaster). Commercial installations of our product typically range in size from 100 to 200
StorageBlades, and we have one commercial customer that has 500 StorageBlades in
one system. The commercial systems are all used in demanding 24x7 environments where they are
shared by hundreds or thousands of compute servers that run a wide variety of applications. Our
smallest configuration is 10 StorageBlades and 1 DirectorBlade in a single chassis, and it is easy
enough to manage that such systems are deployed on boats that collect seismic data for oil exploration.
Our blade chassis has a potential throughput of up to 2 GB/sec assuming both network switches
and all blade NICs are fully utilized. Our current blades can generate over 600 MB/sec from disk
out to file system clients. We plan to boost blade performance by the end of 2009. By bonding
the two network switches and doing further blade improvements we plan to reach the 2 GB/sec
mark by 2011.
We are introducing a multicore hardware platform in 2010 that couples a high performance server
with more drives. This will be a larger building block that will allow us to scale the storage system
to many petabytes without having to scale the number of computer systems we use to manage the
storage. This platform gives us flexibility to provide very large pools of storage (many petabytes)
either with the same high level of performance as our blades or with reduced performance
in order to lower the cost of the system. Both hardware platforms will use the same
file system architecture and can co-exist within the same file system. Our existing data migration
facilities will allow online migration of data between different storage pool classes.
Our largest systems today harness over 1000 computer systems to provide very high performance
access to 2 petabytes of data. We will be able to use the same distributed system architecture to
harness 1000 computer systems that are much more powerful than our current blades, and that
manage one or two orders of magnitude more disks than our current systems.
CONCLUSION
The road to reliable, high performance, petascale systems starts with a system foundation designed
to support large numbers of hardware components and software services. The distributed system
platform within the Panasas system is that foundation. Success on that road comes from experience
gained through larger and larger deployments. The Panasas system has been in production
for several years in a variety of demanding commercial and scientific environments. We have been
able to refine our approach and improve our internal software architecture based on that experience.
While it is easy to focus on performance, the real key to customer loyalty is an emphasis
on stability and reliability at scale. Performance will follow naturally from advances in hardware
technology. Panasas has proven its ability to organize large numbers of hardware and software
components to reliably support petabytes of storage in a single, high performance system. We are
ready to apply our architecture and experience to support much larger deployments as our customers
tackle ever larger problems.