Over the last few years, few breakthroughs in enterprise IT have been as disruptive to “business as usual” as the cloud. End-user software is now more collaborative and current in the cloud, and these benefits of increased flexibility and agility also extend into how enterprises are storing their data. Yet the cloud also represents a new tier of storage to integrate and manage, presenting a number of adoption challenges to IT teams tasked with adding the cloud to their infrastructure.
DataSphere is a metadata engine designed to decouple applications from storage, breaking an architecturally rigid relationship to achieve unprecedented improvements in performance, efficiency and scalability. It leverages the latest advancements in NFS 4.2 environments natively, or can be deployed with DataSphere Extended Services (DSX) in legacy NFS v3, SMB 2.1 and 3.x environments, to gain insight into how an application makes use of its data. With this knowledge, DataSphere can place or move data to different storage types or tiers – including the cloud – without disrupting an application’s access, even while the data is in flight.
DataSphere enables IT and application owners to assign value to the petabytes of data that they have under management. With minimal configuration, existing storage is utilized optimally according to those values throughout the data’s lifetime. DataSphere helps enterprises reduce the total cost of ownership (TCO) for managing and storing data by providing the ability to define and account for data movement based on price-to-performance targets. This means that the cloud can now be viewed and used as a low-cost, lower-performance, highly reliable storage resource. In addition to the benefits of archiving to the cloud, DataSphere enables additional savings and agility by leveraging the cloud as a unique tier to store cold data, manage snapshots, ensure data governance, and enable on-demand usage models.
- Seamlessly add single or multiple cloud storage tiers
- Move data to the cloud and back without disrupting application access
- Automatically deduplicate and compress data before sending it to the cloud
- Restore on demand only the files needed from the cloud to minimize cloud bandwidth charges
- Free existing storage capacity and eliminate the need to purchase and deploy new storage by moving less frequently used data to the cloud
THE CLOUD ON YOUR HORIZON
A growing number of enterprises are adopting a “cloud first” mentality. Gartner reports that the worldwide public cloud services market is projected to grow 18 percent in 2017 to total $246.8 billion, up from $209.2 billion in 2016. In part, this is because the cloud can deliver significant cost savings to companies by serving as an archive for cold data. When it comes to cloud archival, the challenge is determining what data can be safely archived, and how to move that data once it is identified. Managing data by objectives with DataSphere allows IT to easily identify data that meets the enterprise’s criteria for moving to the cloud. DataSphere automates the movement of that data to the cloud and back as needed without requiring intervention by IT.
Once DataSphere is deployed, there are important benefits that save the organization time and money. Many archiving solutions use simple rules, such as file creation date, as a trigger to archive data. An old file that is accessed extensively over time could be inadvertently archived simply because it has not been changed. This creates a productivity problem for the end user and requires IT intervention to retrieve the file and ensure it does not get moved again in the future. With DataSphere, objectives automatically ensure this important file stays on primary storage.
If the company is using the cloud as a store for their backup data, restores can be costly due to the bandwidth charges associated with retrieving data from the cloud provider. For example, if a company needed to restore a single file, they would still need to pay the bandwidth charge to move the entire backup bundle on premises and then rehydrate the bundle to restore the file. If the bundle contained video and audio files, these bandwidth charges could be significant. DataSphere maintains access to data in the cloud as files. This means that companies can restore just the file that is needed, minimizing cloud bandwidth charges.
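To make the savings concrete, here is a rough back-of-the-envelope sketch. The egress rate, bundle size, and file size are illustrative assumptions, not actual provider or DataSphere pricing:

```python
# Hypothetical comparison of cloud egress charges when restoring one file.
# The rate and sizes below are illustrative assumptions only.
EGRESS_PER_GB = 0.09          # assumed $/GB transferred out of the cloud

bundle_size_gb = 500.0        # whole backup bundle (video/audio heavy)
single_file_gb = 2.0          # the one file actually needed

cost_full_bundle = bundle_size_gb * EGRESS_PER_GB
cost_single_file = single_file_gb * EGRESS_PER_GB

print(f"Restore whole bundle: ${cost_full_bundle:.2f}")   # $45.00
print(f"Restore single file:  ${cost_single_file:.2f}")   # $0.18
```

Under these assumptions, file-level restore cuts the egress charge from $45.00 to $0.18 for a single 2 GB file.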
GAIN BUSINESS INTELLIGENCE BY TAMING THE BIG DATA BEAST
More companies are looking to gain insight through business intelligence and data warehousing (BI and DW) applications that analyze highly transient data and mine old data. Since companies have difficulty determining whether data might become valuable in the future, they rarely delete it. Some organizations are even using their Big Data platform as a Backup/DR/archive repository, which compounds the amount of data stored in an enterprise.
According to IBM, “approximately 75 percent of the data stored is typically inactive, rarely accessed by any user, process or application. An estimated 90 percent of all data access requests are serviced by new data—usually data that is less than a year old.” As the amount of mostly unused data companies are storing continues to increase, using the cloud with DataSphere is an effective solution to the storage sprawl created by big data. DataSphere uniquely allows data to remain accessible by applications and can seamlessly promote data to other storage tiers that suit an application’s data demands.
1. IBM, “The fundamentals of data lifecycle management in the era of big data,” 2013.
SNAPSHOTS AS BACKUP?
Many companies augment their traditional backup and recovery with array-based snapshots as a convenient shortcut to back up or restore an entire dataset to a previous point in time. This adds risk, as the snapshots won’t be available if the storage system fails. With DataSphere, backups or snapshots of production data can be moved to another location for disaster recovery and for restoration of deleted or modified files. Retaining a series of snapshots to keep data safe also requires companies to purchase additional capacity; over time this can get very expensive, or IT is forced to delete older snapshots prematurely.
With DataSphere, enterprises can set objectives and use the cloud to move aged snapshots off production infrastructure. When data needs to be recovered from a series of older snapshots, the snapshots can be accessed through management software while DataSphere automatically moves the data back to the production environment to complete the recovery.
Figure 1 - By creating a global namespace that can integrate existing and new flash, cloud and shared storage, DataSphere gets the right data to the right resource to meet IT-defined objectives automatically and without application interruption.
Hierarchical Storage Management (HSM) has existed conceptually since the beginnings of commercial data processing, as admins attempt to move data between storage tiers for either better performance or lower-cost storage. In practice, however, HSM tends to treat higher-performing storage tiers as a cache, and less frequently accessed data gets archived.
Companies often use HSM solutions to help with data management, but they suffer from many deficiencies: limited or no cloud support, complex management, proprietary protocols and inflexibility in data movement. DataSphere can move data between tiers and seamlessly integrate a cloud tier according to objectives that define the performance, price and protection that the data requires. With these abilities, DataSphere automates the movement of data to overcome the following HSM challenges:
- Limited or no cloud support: Most solutions do not integrate with the cloud, and those that do require applications to be modified to use data retrieved from the cloud.
- Complex management: While HSM insulates end users from data movement, policy creation is extraordinarily complex for the storage administrator. To create good policies, an admin must figure out how to classify data, when data can be safely archived, when data can be moved and manually ensure that each tier has sufficient capacity to accomplish the activity.
- Vendor and protocol specific: HSM solutions typically work only for specific storage hardware from specific vendors, or may support only specific storage protocols (file, block, or object).
- Reliance on cache for performance: Many HSM products use caches to mask storage performance deficiencies. Since it’s too expensive to cache all data sets in use, application performance can be unpredictable because a cache cannot tell which data is most important.
- Tiering focus is one-way: Most HSM solutions assume the data lifecycle is a steady progression down storage tiers. While many solutions enable data to be recovered from an archive, recovery can be manual (consuming time and resources), or require creative policy building for specific use cases.
DataSphere provides the ability to manage data throughout its lifetime to finally achieve the goals of HSM. It automates the movement and placement of any data to deliver IT-defined access rates (high, medium, low, slow), availability (levels of accessibility), durability (protection against data loss), and security (protection against unauthorized access) from the time data is created until it is retired to long-term storage, including the cloud.
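The objective model described above can be illustrated with a small conceptual sketch. The `Objective` and `Tier` types and the `place` function below are hypothetical constructs for illustration, not DataSphere's actual API; the idea is simply that data lands on the cheapest tier that meets every defined service level:

```python
# Conceptual sketch of objective-driven placement (hypothetical model,
# not DataSphere internals): pick the cheapest tier meeting all objectives.
from dataclasses import dataclass

@dataclass
class Objective:
    access: str       # "slow", "low", "medium", or "high" access rate
    durability: int   # required nines of protection against data loss
    secure: bool      # protection against unauthorized access required

@dataclass
class Tier:
    name: str
    access: str
    durability: int
    secure: bool
    cost_per_gb: float

ACCESS_RANK = {"slow": 0, "low": 1, "medium": 2, "high": 3}

def place(objective: Objective, tiers: list) -> Tier:
    """Return the cheapest tier that satisfies every service level."""
    eligible = [
        t for t in tiers
        if ACCESS_RANK[t.access] >= ACCESS_RANK[objective.access]
        and t.durability >= objective.durability
        and (t.secure or not objective.secure)
    ]
    return min(eligible, key=lambda t: t.cost_per_gb)

tiers = [
    Tier("flash", "high", 11, True, 0.50),
    Tier("nas", "medium", 11, True, 0.10),
    Tier("cloud", "slow", 13, True, 0.02),
]
cold = Objective(access="slow", durability=11, secure=True)
print(place(cold, tiers).name)  # cold data lands on the cloud tier
```

Changing an objective (say, from "slow" to "high" access) simply changes which tiers are eligible, and the placement engine handles the move.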
AUTOMATE CLOUD DATA MOVEMENT AND PLACEMENT WITH OBJECTIVES
It is well documented that 75% or more of the data stored in a business sits idle, consuming valuable space on expensive storage. At the same time, data is rarely deleted because it is impossible for the admin to determine its value. DataSphere remedies this by moving older, unused data to low-cost cloud storage, driven by objectives, which in turn frees capacity on higher-value storage. Importantly, moving data to the cloud with DataSphere does not take the data offline.
DataSphere automates the movement and placement of data throughout its lifecycle, across storage from any vendor, while integrating seamlessly with multiple clouds. This enables enterprises to use the cloud to keep unused data accessible while preserving storage capacity and performance for more active files.
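As a conceptual illustration of identifying cold data for cloud tiering, the sketch below flags files by last-access time. The 180-day threshold, the catalog structure, and the `cold_files` helper are hypothetical illustrations, not DataSphere internals:

```python
# Hypothetical sketch: flag files not accessed within a threshold as
# candidates for a cloud tier. Paths and dates below are made up.
from datetime import datetime, timedelta

def cold_files(catalog, threshold_days=180, now=None):
    """Return paths whose last access predates the threshold.

    `catalog` maps file path -> last-access datetime, as a metadata
    engine might track from filesystem telemetry.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=threshold_days)
    return [path for path, atime in catalog.items() if atime < cutoff]

now = datetime(2017, 6, 1)
catalog = {
    "/proj/report.docx": datetime(2017, 5, 20),   # recently used: keep
    "/proj/raw_2015.mp4": datetime(2015, 3, 1),   # cold: tier to cloud
}
print(cold_files(catalog, threshold_days=180, now=now))
```

A real system would combine access age with other signals (size, owner, governance rules) rather than a single cutoff, but the principle is the same.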
With the cloud seamlessly integrated as an active archive, data remains visible and accessible by applications, even while it is in motion. Applications do not need to be modified to use the data, regardless of whether it is archived in on-premises object storage or in a public cloud. With DataSphere, companies can move snapshots to a cloud tier and free up large amounts of capacity. The data remains accessible even to previously dormant applications, and DataSphere will move it back from the cloud to a more appropriate store if objectives allow. DataSphere automatically provisions data to the appropriate storage and automatically moves data if resources become contended.
IT can also respond with agility to unexpected changes. If an application requires a change with respect to performance or protection, IT can simply change the objective and let DataSphere handle the rest. This provides enterprises with the flexibility to adapt to modern data demands that may require multidirectional movement between tiers. DataSphere insulates end users from any performance or accessibility impact during data movement.
These simple and intuitive policies make life easy for IT, even when adopting a new resource like the cloud. Capacity planning becomes a breeze: admins no longer need to decide what tiers to create or guess how much capacity each tier will require. Admins can view individual and aggregate resources and deploy additional performance or capacity as needed in minutes. DataSphere automatically rebalances data across new resources, eliminating the need to purchase and deploy more capacity than truly needed.
STREAMLINED CLOUD ADMINISTRATION
Storage administrators can create objectives for data performance and protection and make them available to application owners, who can then assign those policies to their data as they see fit. This can simplify IT’s job immensely, as DataSphere automatically handles the initial placement and subsequent movement of data to meet objectives, all the way to and back from the cloud.
AUTOMATE AGILITY AND RESPOND TO CHANGING BUSINESS NEEDS
DataSphere gives petabyte-scale enterprises the ability to automate the movement of data from creation to archival, including the integration of public clouds as an active archive. DataSphere also automates many core management tasks, making it easy for companies to maximize storage efficiency and cost savings, while ensuring the performance and protection required to meet service levels.
Use the Primary Data TCO Calculator on our web site to see how much DataSphere might save you.