Fix It For Post: Streamline M&E Storage Performance
In this blog series, we discuss how Media and Entertainment
companies working in visual effects (VFX) and post production use DataSphere to
simplify scale-out architectures and make existing resources more powerful and
effective. The series consists of three posts that compare DataSphere to a
leading traditional scale-out system. This first post discusses how DataSphere
improves performance to accelerate time to market. Future posts will describe how
DataSphere reduces infrastructure needs to lower cost, and how DataSphere
increases visibility and automates remediation to improve manageability.
For background, DataSphere allows organizations to orchestrate data non-disruptively across all of their storage resources. With DataSphere, enterprises assign Service Level Objectives to data that define performance and protection needs. If you’d like to read more, check out our DataSphere page.
DataSphere Distributes Workloads Dynamically and Granularly
Traditional scale-out architectures use clusters of storage nodes, with storage attached to each node and data dispersed across the cluster. These systems allow organizations to distribute load across nodes, but the assignment of clients to nodes is static and performed in a round-robin fashion. Once a client is assigned to a node, it typically stays assigned to that node unless the node goes down or an admin manually reassigns the client to a different node. Distribution across nodes is also very coarse, as an entire client is assigned to a node’s IP address. This lack of granularity becomes a real problem when clients have to compete for resources on a busy node.
Figure 1. Data for the entire client is limited to one node. Application response times slow when the node is busy.
By comparison, DataSphere assigns data to nodes at the file level. This enables workloads to be distributed very finely, across multiple nodes. DataSphere uses visibility into file access on the client and workloads on the nodes to assign data to nodes intelligently, according to what the data needs and what resources nodes have available.
Figure 2. DataSphere can assign data at a granular level to evenly distribute data access and workloads.
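To make the difference in granularity concrete, here is a minimal sketch in Python. It is not DataSphere's actual placement logic, which this post does not describe in detail. The client names, node names, file paths, relative request costs, and the simple "least-loaded node" rule are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import cycle


def assign_clients_round_robin(clients, nodes):
    """Static, coarse assignment: each client is pinned to one node."""
    node_cycle = cycle(nodes)
    return {client: next(node_cycle) for client in clients}


def node_load_client_level(file_requests, client_to_node):
    """Every request from a client lands on that client's pinned node."""
    load = Counter()
    for client, _path, cost in file_requests:
        load[client_to_node[client]] += cost
    return load


def node_load_file_level(file_requests, nodes):
    """Granular assignment: each file goes to the currently least-loaded node."""
    load = Counter({node: 0 for node in nodes})
    for _client, _path, cost in file_requests:
        target = min(nodes, key=lambda n: load[n])
        load[target] += cost
    return load


# Hypothetical workload: (client, file path, relative cost of serving the file).
requests = [
    ("render-01", "/shots/a/frame0001.exr", 8),
    ("render-01", "/shots/a/frame0002.exr", 8),
    ("render-01", "/shots/a/frame0003.exr", 8),
    ("render-01", "/shots/a/frame0004.exr", 8),
    ("edit-01", "/edits/cut.xml", 1),
    ("edit-02", "/edits/notes.txt", 1),
]
nodes = ["node-a", "node-b", "node-c"]

pinned = assign_clients_round_robin(["render-01", "edit-01", "edit-02"], nodes)
coarse = node_load_client_level(requests, pinned)
fine = node_load_file_level(requests, nodes)

print("client-level load:", dict(coarse))
print("file-level load:  ", dict(fine))
```

In this toy workload, the busy render client sends all of its traffic to a single node under client-level assignment, while file-level assignment spreads the same requests across all three nodes. A real system would weigh far more than a single cost number, such as the data's objectives and each node's available resources.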
DataSphere’s ability to distribute data is enhanced by its ability to incorporate storage of all types into a global dataspace. This enables DataSphere to place data on the ideal storage for the data’s requirements. For example, DataSphere can dynamically place data such as temp and scratch files on NVMe flash in clients, active data on performance storage, cool data on capacity storage, and cold data in the cloud, automatically and non-disruptively. This increases peak system performance and also makes individual application performance more consistent and predictable.
Figure 3. DataSphere expands storage choice and automatically aligns data needs with node supply.
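The tiering example above can be sketched as a simple placement rule. This is an illustrative assumption, not DataSphere's policy engine: the `FileInfo` fields, the tier names, and the 7-day and 90-day thresholds are hypothetical, and a real deployment would derive placement from the Service Level Objectives assigned to the data.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    is_scratch: bool        # temp or scratch file created by an application
    days_since_access: int  # age of the most recent access, in days


def choose_tier(info, active_days=7, cool_days=90):
    """Map a file to a storage tier using hypothetical age thresholds."""
    if info.is_scratch:
        return "client-nvme"   # temp and scratch data stays on NVMe flash in the client
    if info.days_since_access <= active_days:
        return "performance"   # active data
    if info.days_since_access <= cool_days:
        return "capacity"      # cool data
    return "cloud"             # cold data


files = [
    FileInfo("/scratch/sim_cache.tmp", True, 0),
    FileInfo("/shots/a/frame0001.exr", False, 2),
    FileInfo("/shots/old/plate.exr", False, 45),
    FileInfo("/archive/2014/final.mov", False, 900),
]
placement = {f.path: choose_tier(f) for f in files}
for path, tier in placement.items():
    print(f"{path} -> {tier}")
```

The point of the sketch is that placement follows the data's current needs rather than where the data happened to be written first. Re-evaluating the rule as access patterns change is what would move a file from one tier to another.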
Simplifying Metadata Management
As scale-out systems grow, metadata management becomes an increasingly common performance chokepoint. Simple operations such as opening and closing a file take four to six roundtrips between client and storage to complete. Because M&E workloads can access millions of files during normal operation, this overhead can significantly slow productivity.
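A rough back-of-the-envelope calculation shows how quickly this overhead adds up. The four-to-six roundtrip figure comes from the paragraph above. The file count of one million, the 500-microsecond roundtrip time, and the assumption that roundtrips happen strictly one after another are illustrative and not measurements.

```python
def metadata_overhead_seconds(file_count, roundtrips_per_file, rtt_microseconds):
    """Total time spent on metadata roundtrips, assuming they run serially."""
    total_microseconds = file_count * roundtrips_per_file * rtt_microseconds
    return total_microseconds / 1_000_000


FILES = 1_000_000  # assumed number of files touched by a job
RTT_US = 500       # assumed client-to-storage roundtrip time, in microseconds

low = metadata_overhead_seconds(FILES, 4, RTT_US)
high = metadata_overhead_seconds(FILES, 6, RTT_US)
print(f"metadata overhead: {low:.0f} to {high:.0f} seconds")
```

Under these assumptions, the job spends roughly 33 to 50 minutes on roundtrips alone before any file data is read. Real clients overlap many requests, so actual numbers will differ, but the cost still scales with both the file count and the number of roundtrips per operation.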
DataSphere resolves the metadata chokepoint by separating the metadata (control) path from the data path. A recent blog post on accelerating file-intensive applications discussed how offloading metadata operations increases performance. DataSphere offloads metadata operations to a server that is purpose built to handle the heaviest metadata workloads and reduces the number of roundtrips for file operations.
Figure 4. DataSphere accelerates performance by separating the metadata (control) path from the data path.
Cache Isn’t King Any Longer
With traditional systems, organizations commonly mitigate performance problems by purchasing and deploying expensive caching appliances that sit between the node cluster and the clients. This Band-Aid approach works when data is in the cache, but when it isn’t, clients must wait for the data to be pulled from storage into the cache. As with nodes, clients are not spread evenly across caches: customers in the M&E industry have told us that up to 70% of clients end up hitting the same caching server. This means more cache misses, which makes both the performance gains from caching and application response times unpredictable.
Figure 5. Cache misses require extra hops to retrieve data, adding latency and slowing application response times.
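The effect of cache misses on response time can be estimated with a simple weighted average. The formula is the standard expected-latency calculation. The hit ratios and the latency values of 1 ms for a hit and 20 ms for a miss are assumptions for illustration, not figures from any real caching appliance.

```python
def expected_latency_ms(hit_ratio, hit_ms, miss_ms):
    """Average latency per request for a given cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


HIT_MS = 1.0    # assumed latency when data is already in the cache
MISS_MS = 20.0  # assumed latency when data must first be pulled from storage

for ratio in (0.95, 0.90, 0.50):
    avg = expected_latency_ms(ratio, HIT_MS, MISS_MS)
    print(f"hit ratio {ratio:.0%}: about {avg:.2f} ms per request")
```

Because a miss costs so much more than a hit, even a modest drop in hit ratio raises the average noticeably, and individual requests swing between the two extremes. That swing is the unpredictability described above.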
By eliminating the biggest chokepoints in scale-out systems, DataSphere can eliminate the need for many organizations to purchase separate caching appliances and software. For example, DataSphere can place temp and scratch data on NVMe flash in clients, active primary data on performance tiers, cool data on capacity tiers, and cold data in the cloud, automatically and non-disruptively.
Figure 6. DataSphere eliminates the need for caching appliances, making application response times fast and predictable.
DataSphere eliminates complexity that slows performance in scale-out systems. In the next blog post, we will examine how DataSphere also simplifies manageability by increasing visibility into performance problems and giving admins a way to automate remediation. If you can’t wait that long to start investigating how much DataSphere could save you, connect with us at firstname.lastname@example.org.