Kick NAS Performance into High Gear
Posted in tech
Data is exploding, and enterprises are on a storage spending spree just trying to keep up. In fact, IDC reports that in Q1 of 2017, total capacity shipments were up 41.4% year over year, reaching 50.1 exabytes of storage capacity shipped during the quarter. As IT departments continue to increase their spending on capacity, few realize that their existing storage is a diamond in the rough that can be fully utilized by fixing the inefficiencies created by storage silos.
The DataSphere metadata engine virtualizes application data by separating the data (physical) path from the metadata (control) path. This eliminates storage silos by making all storage devices simultaneously accessible to applications within a global namespace. The DataSphere software intelligently moves and places even active data across storage resources using information it gathers about metadata and data. With DataSphere, storage is put to work in the service of data, making storage resources more efficient, powerful and simple to manage.
In this four-post blog series, we examine the inefficiencies of traditional NAS systems and how DataSphere solves them. This first post discusses how DataSphere improves NAS performance. Future posts will describe how DataSphere improves capacity utilization, automates data movement across different NAS types and the cloud, and enhances the capabilities of existing NAS systems.
Traditional NAS: In-Band Metadata and Manual Load Balancing Degrade Performance
With traditional NAS systems, the metadata (control) path lies in the data path. As a result, metadata operations queue behind data operations and must wait for them to complete, which can slow application response times and disrupt business. Making metadata wait for data is like making a grain of sugar wait in line behind an elephant: a tiny metadata request sits stuck behind a massive data transfer. Enterprises address this problem by trying to distribute application data evenly across the nodes in a cluster, but it’s common for an I/O spike to create resource contention, in which one application consumes all storage resources and slows or halts the responsiveness of the others.
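A back-of-the-envelope sketch makes the queuing penalty concrete. The operation times below are illustrative assumptions, not measured NAS or DataSphere figures:

```python
# Hypothetical timings: a tiny metadata operation sharing a single
# in-band queue with a large data transfer, versus traveling on a
# separate out-of-band control path.
DATA_OP_MS = 200.0    # large sequential read ahead in the queue (assumed)
METADATA_OP_MS = 0.5  # a stat/lookup-style metadata operation (assumed)

# In-band: the metadata op must wait for the data op to finish first.
in_band_ms = DATA_OP_MS + METADATA_OP_MS

# Out-of-band: the metadata path never queues behind data.
out_of_band_ms = METADATA_OP_MS

print(f"in-band:     {in_band_ms} ms")
print(f"out-of-band: {out_of_band_ms} ms")
```

Even with generous assumptions, the half-millisecond metadata operation pays a 400x latency penalty simply because it is stuck behind bulk data on the same path.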
When a node in a cluster becomes oversubscribed, hot spots that slow applications become more common and more severe. Enterprises typically avoid oversubscription through massive overprovisioning. When budgets don’t allow overprovisioning, IT addresses oversubscription by manually redistributing data across other nodes in the cluster. This process is disruptive, often requiring applications to be stalled or processing queues to be shut down, and can halt business productivity for hours or even days.
Manual redistribution can temporarily solve predictable and permanent hot spots, but I/O spikes are often unpredictable and transient. In these cases, IT rarely has enough time to find the problem application and the node its data is on, let alone fix the problem. The complaints just have to be endured until the activity subsides. Enterprises often attempt to fix transient hot spotting by throwing more hardware at the problem, but this won’t help much because active data on the problem storage node cannot be moved. This means adding more storage nodes only helps the remaining storage nodes in the cluster—not the storage node experiencing the hot spot.
DataSphere Streamlines Performance, Automatically Remediates Hot Spots
DataSphere solves hot spots in several ways. First, it streamlines data and metadata performance by moving metadata out of the data path. This guarantees that metadata operations do not get “stuck” in the queue behind other data requests, which improves the performance of individual storage nodes. Second, it improves the aggregate performance of the cluster by enabling applications to access multiple storage nodes in parallel, eliminating the storage bottleneck that develops when applications can only access a single storage node. The following diagram illustrates these improvements:
Figure 1 - DataSphere eliminates queuing between metadata and data, while enabling multiple storage nodes to be accessed in parallel.
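A similar sketch shows why parallel access to multiple nodes matters. The per-node bandwidth, node count, and dataset size here are illustrative assumptions, not product specifications:

```python
# Hypothetical figures: time to read a large dataset when a client is
# pinned to a single storage node versus reading from the whole cluster
# in parallel.
NODE_MBPS = 500        # assumed per-node bandwidth (MB/s)
NODES = 4              # assumed cluster size
DATASET_MB = 100_000   # assumed 100 GB dataset

single_node_s = DATASET_MB / NODE_MBPS          # one node serves everything
parallel_s = DATASET_MB / (NODE_MBPS * NODES)   # all nodes serve in parallel

print(f"single node: {single_node_s:.0f} s")
print(f"{NODES} nodes in parallel: {parallel_s:.0f} s")
```

Under these assumptions, aggregate throughput scales with the number of nodes an application can reach, which is the linear scaling described above.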
DataSphere also enables live data mobility without interrupting application access. This enables enterprises to automate the distribution (and redistribution) of data, on demand, according to business objectives across performance, protection, and price. Hot spots are remediated automatically, as soon as they arise, increasing service levels. Each storage node is also used more efficiently, while performance and capacity scale linearly to reduce costs.
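The automated remediation described above can be pictured as a simple policy loop. This is a toy illustration only: the node names, utilization numbers, and threshold are all hypothetical, and this is not DataSphere’s actual algorithm:

```python
# Toy hot-spot remediation: when a node exceeds a utilization threshold,
# move its hottest file to the least-loaded node (all values assumed).
node_load = {"nas-a": 95, "nas-b": 45, "nas-c": 30}  # % utilization
file_heat = {"f1": 30, "f2": 25, "f3": 40}           # load each file adds to nas-a

THRESHOLD = 80
hot_node = max(node_load, key=node_load.get)
if node_load[hot_node] > THRESHOLD:
    target = min(node_load, key=node_load.get)       # least-loaded node
    victim = max(file_heat, key=file_heat.get)       # hottest file on the hot node
    node_load[hot_node] -= file_heat[victim]
    node_load[target] += file_heat[victim]

print(node_load)
```

Because data can move while it remains live, a loop like this can run continuously in the background rather than waiting for a maintenance window.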
With DataSphere, enterprises can achieve far greater performance from their existing NAS systems, while delivering linearly scaling performance as more nodes are added. This enables enterprises to support today’s demanding workloads at far lower costs, and with far less maintenance. In the next blog post, we will examine how DataSphere solves the problem of inefficient NAS capacity utilization. If you can’t wait that long to start investigating how much DataSphere can improve NAS performance, connect with us at firstname.lastname@example.org.