How We Do It

The Technology

The DataSphere architecture simplifies operations for customers while helping enterprises align the right data to the right storage at the right time. From its metadata engine and live data mobility to its high-performance scale-out NAS technology, DataSphere is designed to overcome the limitations of traditional storage and help petabyte-scale enterprises respond to changing business demands.

Your Most Valuable Asset: Metadata

  • Metadata and machine learning unlock intelligent data management: As a metadata engine, DataSphere is designed to decouple applications from the location of their data, removing the architecturally rigid relationship between the two. Offloading metadata access with DataSphere delivers predictable, low-latency metadata operations by ensuring that metadata operations do not get “stuck” in the queue behind other data requests.
  • Parallel access across storage maximizes data performance: Rather than having to wait for sequential operations to complete, DataSphere can leverage parallel access with the latest optimizations of the standard NFS v4.2 protocol. NFS v4.2 significantly speeds up metadata and small-file operations by requiring fewer than half of the protocol-specific network round trips that NFS v3 needs.
  • Live mobility frees your data from gravity: DataSphere collects metadata about each client’s data access and how it experiences storage (IOPS, latency, bandwidth, and availability). Intelligent analytics are then applied against business objectives, and data is moved as needed to achieve the desired levels of performance, cost, and reliability. DataSphere makes real-time automated decisions for data placement, moves data without disruption to overcome or prevent outages, and maintains alignment to service level agreements or objectives.

How Machine Learning Makes Intelligent Decisions for Your Objectives

DataSphere provides clients access to billions of files across multiple storage devices in parallel. Performance is accelerated by balancing I/O load at a file level across the storage devices and by offloading the metadata tasks from the storage devices so they are free to serve more data.
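File-level load balancing can be illustrated with a minimal sketch. It assumes a simple greedy policy (busiest file first, least-loaded store next); the function name, paths, and load figures are hypothetical and are not DataSphere's actual placement algorithm.

```python
import heapq

def balance_files(file_loads, stores):
    """Greedily assign each file to the store with the least accumulated I/O load.

    file_loads: dict of logical path -> expected I/O load (hypothetical units).
    stores: list of store names.
    """
    # Min-heap of (accumulated_load, store_name); all stores start empty.
    heap = [(0.0, name) for name in stores]
    heapq.heapify(heap)
    placement = {}
    # Place the busiest files first so they spread across stores.
    for path, load in sorted(file_loads.items(), key=lambda kv: -kv[1]):
        total, name = heapq.heappop(heap)
        placement[path] = name
        heapq.heappush(heap, (total + load, name))
    return placement

files = {"/a": 500, "/b": 300, "/c": 200, "/d": 100}
placement = balance_files(files, ["store1", "store2"])
```

In this example the two busiest files land on different stores, and the remaining files fill in behind them so that neither store carries a disproportionate share of the load.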

DataSphere continuously collects telemetry in the form of metadata to learn the IOPS, bandwidth, and latency from each client, for each file accessed. This provides a rich understanding of how storage devices are performing, which files are active, and whether application data is out of alignment with objectives. If data falls out of alignment, DataSphere automatically moves it to the right storage tier without disruption to running applications.
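As a rough illustration of the alignment check described above, the sketch below compares per-file telemetry samples with a latency objective. The `Sample` and `LatencyObjective` types, their fields, and the averaging rule are assumptions made for this example, not DataSphere data structures.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Sample:
    """One hypothetical telemetry sample reported by a client for a file."""
    iops: float
    bandwidth_mbps: float
    latency_ms: float

@dataclass
class LatencyObjective:
    """Hypothetical objective: average latency must stay at or below this value."""
    max_latency_ms: float

def out_of_alignment(samples, objective):
    """Return True when the observed average latency violates the objective."""
    if not samples:
        return False  # no telemetry yet, nothing to judge
    avg_latency = mean(s.latency_ms for s in samples)
    return avg_latency > objective.max_latency_ms

samples = [Sample(1200, 80, 2.0), Sample(900, 60, 4.0)]
needs_move = out_of_alignment(samples, LatencyObjective(max_latency_ms=2.5))
```

A file flagged this way would become a candidate for movement to a tier that can meet the objective.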

DataSphere makes real-time automated decisions for data placement, moves data without disruption to overcome or prevent outages, and maintains alignment to service level agreements or objectives. DataSphere manages data using DSX Data Portals and Data Movers. Data can flow across heterogeneous storage types, including the cloud. Data in flight remains accessible to applications while it moves from one store to another.

How Do We Free Data from Its Gravitational Pull?

A Global Namespace Makes Storage More Efficient

DataSphere pools multiple physical storage resources and presents a virtualized single logical namespace to clients. The global namespace greatly simplifies management, while using open standards-based protocols to easily connect clients to storage.
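The idea of a single logical namespace over several physical stores can be sketched as a lookup table from logical paths to physical locations. The class and method names below are hypothetical and serve only to show why clients are unaffected when data changes location.

```python
class GlobalNamespace:
    """Toy namespace: maps a logical path to a (store, physical path) pair."""

    def __init__(self):
        self._locations = {}

    def place(self, logical_path, store, physical_path):
        """Record where a file physically lives."""
        self._locations[logical_path] = (store, physical_path)

    def resolve(self, logical_path):
        """Return the current physical location for a logical path."""
        return self._locations[logical_path]

    def move(self, logical_path, new_store, new_physical_path):
        """Repoint the logical path; the name clients use does not change."""
        if logical_path not in self._locations:
            raise KeyError(logical_path)
        self._locations[logical_path] = (new_store, new_physical_path)

ns = GlobalNamespace()
ns.place("/projects/report.dat", "nas-1", "/vol0/0001")
ns.move("/projects/report.dat", "flash-1", "/vol3/0042")
location = ns.resolve("/projects/report.dat")
```

Clients keep using `/projects/report.dat` throughout; only the mapping behind it changes.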

DataSphere Extended Services (DSX)

DataSphere makes real-time automated decisions for data placement, moves data without disruption to overcome or prevent outages, and maintains compliance to service level agreements or objectives.

DataSphere manages data using DSX Data Portals and Data Movers, flowing data across heterogeneous storage types, including the cloud. Data in flight remains accessible to applications while it moves from one store to another. Data Stores extend DataSphere’s global namespace to block storage, including HDDs, SSDs, and NVMe devices.

Add Intelligence to Data Management with Objectives

To dynamically and automatically respond to evolving business demands, DataSphere uses objectives that define an application’s data performance, cost, and reliability goals for data’s operational life. Managing by IT-defined objectives ensures the right data is on the right storage at the right time.

  • Get started quickly with an easy-to-use workflow to create tiers and objectives, then expand with the deeper control of Objective Expressions.
  • Data placement and data mobility can be entirely controlled by Objective Expressions.
  • Performance telemetry is delivered as part of the NFS v4.2 protocol every 15 seconds, with no additional client software required.
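Managing by objectives can be pictured as choosing the least expensive tier that still satisfies the stated goals. The sketch below is an assumption-laden illustration: the `Tier` and `PlacementObjective` types, their fields, and every number are invented for the example and do not reflect Objective Expression syntax or real tier characteristics.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Tier:
    """Hypothetical description of a storage tier."""
    name: str
    latency_ms: float
    cost_per_gb: float
    durability_nines: int

@dataclass(frozen=True)
class PlacementObjective:
    """Hypothetical objective covering performance and reliability."""
    max_latency_ms: float
    min_durability_nines: int

def choose_tier(tiers: List[Tier], objective: PlacementObjective) -> Optional[Tier]:
    """Return the cheapest tier that meets the objective, or None if none does."""
    candidates = [
        t for t in tiers
        if t.latency_ms <= objective.max_latency_ms
        and t.durability_nines >= objective.min_durability_nines
    ]
    return min(candidates, key=lambda t: t.cost_per_gb, default=None)

# Illustrative values only.
tiers = [
    Tier("server-flash", 0.2, 0.50, 5),
    Tier("nas", 2.0, 0.10, 6),
    Tier("cloud", 50.0, 0.02, 11),
]
chosen = choose_tier(tiers, PlacementObjective(max_latency_ms=5.0, min_durability_nines=5))
```

Here both the flash and NAS tiers meet the objective, so the cheaper NAS tier is selected; tightening the latency goal would shift the choice to flash.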

Tier Data on the Right Storage at the Right Time

Maximize the Unique Features of Each Storage Resource

Thanks to a wide range of capabilities across performance, protection, and price, today’s IT professionals have more choice than ever before when selecting a storage type or vendor to meet an application or business need. Given the storage diversity found in most petabyte-scale enterprises today, the challenge for IT is quickly becoming how to ensure the right resource is serving the right data at the right time.

Flash in a server is ultra-fast storage that can be attached via PCI Express to serve as a very low-latency, high-IOPS, direct-attached storage tier, but it comes at a premium cost. Network-attached flash in an array also brings more performance to primary storage at a high cost. Classic shared or networked NAS and SAN storage is known for high reliability and capacity, while cloud storage offers expandability at low cost, but with slower, near-line performance suited to cold data and archiving. Each of these storage types provides a unique price-performance capability with different levels of data reliability, and the choices get even broader when considering emerging technologies.

No matter which storage technologies your enterprise is using today, or adding in the future, DataSphere’s global namespace allows you to define the attributes of each resource and automatically align the right resource to meet your objectives.

Increase Performance: Out-of-Band Operation and Data Access

In modern enterprise architectures, out-of-band management is the separation of administration (control) from application data. DataSphere uses this architectural approach because of its many significant advantages over in-band or gateway-based solutions. These include:

  • Native Data Access: Applications do not see increased latency because they directly access storage devices containing the data, rather than passing through a gateway or agent.
  • Scalable: File-granular load balancing with parallel access across multiple stores increases application performance.
  • Fast Metadata Performance: With dedicated metadata servers, metadata operations are never stuck behind data payloads and are always executed with low-latency performance. Focusing only on metadata without the burden of data requests allows DataSphere to support billions of data objects within a single namespace.
  • Virtualizing the View of Data: DataSphere creates a global namespace with a unified view of application data on top of heterogeneous storage.
  • Data Orchestration: By virtualizing the view of data, DataSphere gains the ability to move data between different storage tiers without application disruption, according to IT-defined objectives for performance, reliability, and availability.
  • Storage Agnostic: Operating out-of-band enables DataSphere to support any storage type, from any vendor, for unprecedented choice and flexibility in meeting business needs.
  • Highly Reliable: With DataSphere, data integrity continues to be fulfilled by the designated storage devices. If you have invested in a reliable, redundant storage system, you continue to get all its benefits. DataSphere knows the capabilities of the storage and will place data on systems that can meet data policies.

By separating the control plane from the data plane, DataSphere can achieve enterprise-class, mission-critical reliability and scalability while maintaining performance, even while applications are running and data is in motion.
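The separation of control plane and data plane can be sketched as two steps: ask a metadata service where a file lives, then read the bytes directly from that store. All class and function names below are hypothetical; this is a toy model of the out-of-band pattern, not DataSphere's interfaces.

```python
class MetadataServer:
    """Control plane: answers 'where is this file?' and never touches file data."""

    def __init__(self, layouts):
        self._layouts = layouts  # logical path -> (store name, key)
        self.requests = 0

    def get_layout(self, path):
        self.requests += 1
        return self._layouts[path]

class DataStore:
    """Data plane: serves file contents directly to clients."""

    def __init__(self, blobs):
        self._blobs = blobs  # key -> bytes
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self._blobs[key]

def read_file(path, mds, stores):
    """Fetch the layout out-of-band, then read straight from the store."""
    store_name, key = mds.get_layout(path)  # small metadata request
    return stores[store_name].read(key)     # payload bypasses the metadata server

mds = MetadataServer({"/db/table.dat": ("flash-1", "obj-7")})
stores = {"flash-1": DataStore({"obj-7": b"payload"})}
data = read_file("/db/table.dat", mds, stores)
```

Because the payload never passes through the metadata server, metadata requests stay small and are not queued behind large data transfers.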

Avoid Agents with Common Storage Protocols

Standards-based protocols simplify access, adoption, and customer use, and also provide a vendor-independent, future-proof method to access client data. DataSphere supports several standards-based protocols to capture telemetry on how an application uses its data or to virtualize a client's view of its data. This avoids a common pitfall of solutions that require an agent or a custom driver: the difficulty of managing updates over time across thousands or tens of thousands of clients.

DataSphere supports two major protocol families for accessing managed data: NFS (Network File System) and SMB (Server Message Block). Any data under management, no matter what type of storage it is stored on, can be accessed using either protocol.

DataSphere does not require users to store their data with the same protocol they use to access it. For example, SMB access to a file is fully supported when the file is stored using NFS v3 or even Amazon S3 object protocols. The DataSphere Extended Services (DSX) nodes in a DataSphere environment act as the protocol access points for NFS v3 and SMB clients, allowing the environment to scale with as many access points as required.

For native support, NFS v4.2 clients can connect directly to multiple storage devices and scale to higher I/O performance with parallel file access. Native access with NFS v4.2 also enables additional enhancements, such as offloaded file cloning and space efficiency improvements. DataSphere supports backend storage volumes using NFS v3, block, or Amazon S3-compatible protocols.

NFS v4.2 – Primary Data Drives Industry-Standard Protocols

For several years now, Primary Data has been the top contributor to the open-source NFS protocol, and playing this leadership role has been key to the success of our technologies. Driving innovation in the 4.2 release has enabled some important features, including:

  • Live data migration
  • Free performance telemetry from clients
  • Enhanced security
  • Enterprise support from most of the major Linux distributions
  • Improved network efficiency, making this the fastest NFS release yet

Figure: NFS contributions since October 2013, report generated June 22, 2017
