The New NAS: Parallel Performance, Power and Scale

Until recently, Network Attached Storage (NAS) was rarely considered a solution for mission-critical and high-performance applications. NAS offers a rich environment for file-based management, which is often preferred by enterprise IT, but falls short of the reliability and block-level performance of Storage Area Networks (SAN). With the advent of flash memory, NAS solutions now possess the scale-up performance needed to compete with SAN. However, NAS performance is still gated by metadata operations, scale-out capabilities often lag, and clustering remains tied to proprietary solutions that cannot span vendors.

While the Network File System (NFS) protocol brings simplicity of use and management to file-based networked storage, clients (applications and their storage stacks) must communicate with a NAS device using NFS operations to retrieve information about or gain access to a file. These commands are sequential operations that add performance overhead, whether they act on metadata (such as getattr, lookup, and access) or on the file itself (such as read, write, and rename). Traditional NAS performance is therefore limited by how fast the device can service sequential NFS protocol requests and file activity.
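
To make that overhead concrete, the hypothetical Python sketch below models a client that must issue lookup, getattr, and access round trips sequentially for every file before any data is read. The operation list and the fixed 0.5 ms round-trip time are illustrative assumptions, not measurements of any particular NAS or network.

```python
# Illustrative model of sequential NFS metadata overhead (not a real NFS client).
# Assumes a fixed per-round-trip latency; real latencies vary widely.

RTT_MS = 0.5  # assumed network round-trip time per NFS operation, in milliseconds

# Metadata operations a traditional NFS v3 client typically issues before reading a file
METADATA_OPS = ["lookup", "getattr", "access"]

def metadata_time_ms(num_files: int, rtt_ms: float = RTT_MS) -> float:
    """Total time (ms) spent on metadata round trips alone, issued one after another."""
    ops_per_file = len(METADATA_OPS)   # each operation is a separate, sequential round trip
    return num_files * ops_per_file * rtt_ms

if __name__ == "__main__":
    for n in (1, 1_000, 100_000):
        print(f"{n:>7} files -> {metadata_time_ms(n):>10.1f} ms of metadata round trips")
```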

KEY BENEFITS
  • Overcome performance bottlenecks using existing storage
  • Improve application performance by moving data to your fastest storage tier without disruption
  • Increase clustered NAS performance with automated, non-disruptive load balancing at the file level, according to business objectives
  • Accelerate metadata operations
  • Automatically balance capacity across multiple vendors without disrupting applications

STEP OUT OF THE QUEUE: ACCELERATE METADATA PERFORMANCE IN PARALLEL

DataSphere is a metadata engine designed to decouple the architecturally rigid relationship between applications and where their data is stored, offloading metadata operations from the data path.

Figure 1 - DataSphere abstracts the data path from the metadata path to make it possible to connect different storage resources across a global namespace.

Offloading metadata access with DataSphere delivers predictable, low-latency metadata operations by guaranteeing that metadata requests do not get “stuck” in the queue behind other data requests. Rather than having to wait for sequential operations to complete, DataSphere can leverage parallel access with the latest optimizations of the standard NFS v4.2 protocol. Leveraging NFS v4.2 significantly speeds up metadata and small file operations by requiring less than half the protocol-specific network round trips that NFS v3 requires.
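
As a rough illustration of why batching matters, the sketch below compares an NFS v3-style client that issues each operation as its own round trip with an NFS v4.2-style client that bundles operations into COMPOUND requests. The operation groupings are simplified assumptions rather than an actual protocol trace.

```python
# Simplified round-trip comparison for opening and reading one small file.
# Operation lists are illustrative; a real trace depends on caching and mount options.

NFSV3_OPS = ["lookup", "getattr", "access", "read"]   # one round trip each

# NFS v4.x lets a client bundle several operations into a single COMPOUND request,
# so the same work can be expressed in far fewer round trips.
NFSV42_COMPOUNDS = [
    ["lookup", "getattr", "access"],   # one COMPOUND round trip
    ["open", "read"],                  # one COMPOUND round trip
]

v3_round_trips = len(NFSV3_OPS)
v42_round_trips = len(NFSV42_COMPOUNDS)

print(f"NFS v3-style round trips:   {v3_round_trips}")
print(f"NFS v4.2-style round trips: {v42_round_trips}")
print(f"Reduction: {1 - v42_round_trips / v3_round_trips:.0%}")
```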

Figure 2 - In traditional architectures, metadata requests can slow down performance as each operation needs to be completed before more important file operations can be executed.

Since DataSphere can place files transparently across different storage resources, the platform also overcomes the bandwidth limitations of a single storage resource and can serve large file requests rapidly. This breakthrough achieves unprecedented improvements in performance, efficiency and scalability.

Figure 3 - DataSphere significantly speeds up metadata and file operations by requiring less than half the network round trips compared to NFS v3.

The DataSphere platform leverages the latest advancements in NFS v4.2 environments natively by utilizing the NFS client that is part of mainstream Linux distributions. DataSphere Extended Services (DSX) are used to service legacy NFS v3 and SMB environments, and can be used in tandem with clients using the latest NFS v4.2 advancements. Regardless of the client type, DataSphere provides insight into how an application makes use of its data. With this knowledge, DataSphere can place or move files to different storage types or tiers without disrupting an application’s access, even while files are open and the data is in flight. This enables IT to ensure that applications will always meet the business’s performance, price, and protection requirements. DataSphere can easily create a powerful scale-out architecture that load balances files across arrays to accelerate application performance and reduce hotspots. DataSphere integrates with the cloud to offload cold data, reduces costs through increased utilization, and provides the freedom to combine NAS arrays from different vendors into a global namespace.
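
As a minimal sketch of how objective-driven placement might be expressed, the following hypothetical policy picks the cheapest tier that still satisfies a file’s latency objective. The tier names, latency figures, and cost values are assumptions for illustration and do not represent DataSphere’s internal interfaces.

```python
# Hypothetical objective-driven placement policy (illustrative only).
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    max_latency_ms: float   # latency this tier is assumed to deliver
    cost_per_gb: float      # relative cost

# Assumed tiers, ordered from fastest/most expensive to slowest/cheapest
TIERS = [
    Tier("flash-nas", 0.5, 1.00),
    Tier("hybrid-nas", 5.0, 0.40),
    Tier("cloud-object", 50.0, 0.05),
]

def choose_tier(required_latency_ms: float) -> Tier:
    """Pick the cheapest tier that still meets the file's latency objective."""
    candidates = [t for t in TIERS if t.max_latency_ms <= required_latency_ms]
    return min(candidates, key=lambda t: t.cost_per_gb) if candidates else TIERS[0]

# Example: a latency-sensitive file needs sub-millisecond service; an archive does not.
print(choose_tier(1.0).name)    # -> flash-nas
print(choose_tier(100.0).name)  # -> cloud-object
```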

SCALE-OUT NAS WITH CLUSTERS

A shared mount point is a practical and efficient approach to connecting many compute nodes to a clustered storage system. This method provides simple setup, maintenance and flexibility for a wide variety of applications. On the other hand, it can also constrain logical resources such as metadata and directory structures, which hinders application performance. For some clustered NAS solutions, this means the single performance bottleneck is the network connectivity between the application and the related storage share (node).

When configured with DataSphere, a group of NAS arrays can be logically pooled together as a scale-out cluster under the global namespace. With its unique metadata engine and out-of-band architecture, DataSphere creates a comprehensive scale-out NAS solution built from a pool of NAS arrays from one or more vendors. Using the global namespace, DataSphere presents a group of off-the-shelf NAS arrays as a cluster with new levels of scale-out performance and capabilities. Metadata operations move to DataSphere, allowing data to be load balanced across each NAS to promote parallel data access and higher application performance.
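
A minimal sketch of file-granular load balancing across a pooled namespace is shown below; the node names and the least-loaded heuristic are illustrative assumptions, not DataSphere’s actual placement algorithm.

```python
# Illustrative least-loaded placement of files across pooled NAS nodes.
from collections import defaultdict

NAS_NODES = ["nas-a", "nas-b", "nas-c"]   # hypothetical arrays in the namespace
load_gb = defaultdict(float)              # capacity already placed per node, in GB

def place_file(path: str, size_gb: float) -> str:
    """Assign a file to the node currently holding the least data."""
    node = min(NAS_NODES, key=lambda n: load_gb[n])
    load_gb[node] += size_gb
    return node

# Spreading frames of a large sequence keeps any single array from becoming a hotspot.
placement = {f"frame_{i:04d}.dpx": place_file(f"frame_{i:04d}.dpx", 0.05) for i in range(6)}
print(placement)
print(dict(load_gb))
```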

Figure 4 - DataSphere enables a group of NAS arrays to be logically pooled together as a scale-out cluster under the global namespace.

This scale-out cluster can extend into the cloud or other object storage and leverages block storage for multi-tier efficiencies. This means you no longer need to purchase proprietary, single-source clustered NAS solutions that offer only rudimentary scale-out capacity or performance.

Figure 5 - DataSphere creates a global namespace that connects different types of NAS storage and the cloud, then automatically places the right data on the right resource to meet IT-defined objectives.

Let’s examine a specific use case for this approach. Consider the production challenges of 4K-resolution video in the Media and Entertainment (M&E) industry, where the limits of today’s scale-out NAS solutions often force studios to use a compressed 4K workflow rather than the preferred uncompressed 4K format. One hour of uncompressed 4096x3072 10-bit RGB media at 24 frames per second requires over 4TB of storage.

Figure 6 - Parallel access enables DataSphere to overcome the performance bottlenecks of traditional NAS architectures.

The real challenge comes when uncompressed 4K image sequences consist of a single 50MB file for every frame of video. Processing 24 of these files per second demands low latency and sustained bandwidth of 1,200 MB per second. A single shared mount point on a 10Gb Ethernet network therefore lacks the necessary throughput. The problem worsens when several users work simultaneously, since there is no guarantee that each user’s application is mounted to a different share, so the shares contend for bandwidth. With DataSphere, a logical cluster with file-granular load balancing places the video frame files so that clients can access them in parallel over multiple 10GbE connections across multiple NAS nodes. Similar gains in efficiency are possible across numerous industry-specific use cases.
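
The arithmetic behind these figures can be checked with a short calculation. The 50MB frame size and raw 10GbE throughput are the approximations used above, and the two-node split is simply an example of spreading frames across the cluster.

```python
# Back-of-the-envelope check of the uncompressed 4K figures cited above.
frame_mb = 50            # approximate size of one uncompressed 4K frame, in MB
fps = 24                 # frames per second
gbe10_mb_per_s = 1250    # raw 10Gb Ethernet throughput in MB/s; usable is lower

required_mb_per_s = frame_mb * fps                 # sustained bandwidth needed
one_hour_tb = frame_mb * fps * 3600 / 1_000_000    # storage for one hour of footage

print(f"Required bandwidth: {required_mb_per_s} MB/s "
      f"({required_mb_per_s / gbe10_mb_per_s:.0%} of raw 10GbE)")
print(f"One hour of footage: ~{one_hour_tb:.1f} TB")

# Spreading frames across two NAS nodes' 10GbE links leaves headroom on each.
nodes = 2
print(f"Per-link load across {nodes} nodes: {required_mb_per_s / nodes:.0f} MB/s")
```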

VENDOR AGNOSTIC NAS CLUSTERING

Using standard NFS protocols, DataSphere combines virtually any vendor’s NAS arrays or clustered arrays into the global namespace. DataSphere leverages each resource based on its attributes for performance (IOPS, bandwidth, and latency), capacity, cost and reliability. DataSphere matches those capabilities to IT’s defined business objectives to direct file placement and movement.
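
Conceptually, a global namespace is a mapping from logical paths to whichever backend currently holds the data, regardless of vendor. The sketch below is a hypothetical illustration of that idea; the export paths, vendor labels, and attributes are invented for the example and are not DataSphere configuration.

```python
# Illustrative global-namespace catalog spanning arrays from different vendors.
# Export paths and attribute values are hypothetical.

CATALOG = {
    # logical path in the namespace -> (backend array, export, attributes)
    "/projects/renders": ("vendorA-flash",  "nfs://10.0.1.10/export1", {"latency_ms": 0.3}),
    "/projects/archive": ("vendorC-object", "s3://cold-bucket",        {"latency_ms": 50.0}),
    "/home":             ("vendorB-hybrid", "nfs://10.0.2.20/home",    {"latency_ms": 2.0}),
}

def resolve(logical_path: str):
    """Map a namespace path to whichever backend currently holds the data."""
    for prefix, backend in CATALOG.items():
        if logical_path.startswith(prefix):
            return backend
    raise FileNotFoundError(logical_path)

# Clients always address the logical path; the backend can change without their knowledge.
print(resolve("/projects/renders/frame_0001.dpx")[0])   # -> vendorA-flash
```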

Figure 7 - DataSphere makes it simple to integrate different storage resources - even from different vendors - into a single global namespace that can be accessed by all data.

With DataSphere, IT gains the ability to grow capacity and performance even when the deployed arrays have been scaled up as far as possible. Customers can leverage newer, competitive products without the risk, burden, expense and time of data migration. Once additional arrays are mounted and capacity increases, DataSphere load balances data to ensure objectives are met without impacting active applications.

EXTEND AND TIER A SCALE-OUT NAS CLUSTER

DataSphere’s ability to actively migrate data between storage volumes makes it possible to easily tier data across groups of NAS arrays and clusters. This can be done at a fine granularity with operations driven by application activity and IT cost objectives.

DataSphere collects data and determines which NAS stores should be used as primary performance tiers and which should serve as low-cost, high-capacity secondary tiers such as cloud or object storage. With user-defined characteristics and real-time statistics, DataSphere makes the best use of your assets based on the attributes of each backend system. The DataSphere metadata engine instructs DataSphere Extended Services (DSX) to place or move data between tiers to meet business goals and improve the efficiency, performance, and scalability of software applications, even when data is actively being accessed. If data is in flight to another tier, the DSX Data Mover function ensures that all reads and writes complete atomically.

It’s well known that storage becomes underutilized over time due to the accumulation of colder, less-used data that resides on expensive primary storage. Extending the global namespace to the cloud allows non-disruptive offloading of cooled-off data from the scale-out NAS cluster to low-cost object stores. Just as with performance objectives, DataSphere can use time and application activity to decide when to demote data to the cloud tier and make the more valuable primary storage tiers available for newly created and more active data. IT saves budget by avoiding unnecessary purchases of more expensive primary storage. Of course, if the same cold data is accessed in the future, it can be automatically promoted back to a more performant tier of storage.
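
As a hedged sketch of time-based tiering, the snippet below demotes files that have gone untouched past a cutoff and promotes them again when accessed. The 90-day threshold, tier names, and file paths are assumptions for illustration, not DataSphere’s actual policy engine.

```python
# Illustrative age-based demotion/promotion policy (not DataSphere's actual logic).
import time

DEMOTE_AFTER_DAYS = 90   # assumed inactivity cutoff
DAY = 86_400

files = {  # path -> (tier, last_access_epoch); values are hypothetical
    "/projects/q1_report.mov": ("flash-nas", time.time() - 200 * DAY),
    "/projects/current_cut.mov": ("flash-nas", time.time() - 2 * DAY),
}

def demote_cold_files():
    """Move long-idle files to the cloud tier to free primary capacity."""
    now = time.time()
    for path, (tier, last_access) in files.items():
        if tier == "flash-nas" and now - last_access > DEMOTE_AFTER_DAYS * DAY:
            files[path] = ("cloud-object", last_access)

def on_access(path: str):
    """Promote data back to a performant tier when it becomes active again."""
    tier, _ = files[path]
    files[path] = ("flash-nas" if tier == "cloud-object" else tier, time.time())

demote_cold_files()
print(files["/projects/q1_report.mov"][0])   # -> cloud-object
on_access("/projects/q1_report.mov")
print(files["/projects/q1_report.mov"][0])   # -> flash-nas
```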

RIGHT DATA, RIGHT PLACE, RIGHT TIME WITH DATASPHERE

With DataSphere, admins can deliver higher performance with new forms of scale-out NAS storage solutions built from existing scale-out NAS deployments. DataSphere makes it possible to combine NAS arrays from different vendors for cost savings and agility, create logical storage tiers for improved capacity efficiency, define performance tiers for increased application throughput, automate data placement with objectives, upgrade storage without disruption, and leverage the cloud today, all seamlessly and without changing applications. DataSphere expands architectural storage choices to meet both IT’s budget constraints and the application demands of the business.
