WHAT IS DSX?
Primary Data’s DataSphere Extended Services (DSX) is the data workhorse for the DataSphere Metadata Engine. DSX performs data mobility, connects to cloud storage and provides access for legacy clients. Installed on physical or virtual machines, DSX instances are deployed as a scale-out architecture and support several key functions to extend data orchestration across the storage infrastructure and into the cloud.
To ease customer adoption, DataSphere’s Metadata Engine and DSX use open-source industry standards that are well understood by customers and supported by all major storage vendors. To that end, Primary Data has made significant contributions to the Linux kernel and has been the top contributor to NFS since 2013.
- Live data mobility for non-disruptive tiering
- Cloud connector for easy integration with S3 object storage
- Protocol translation for legacy compatibility
- Data Store for connecting in-server storage to the broader infrastructure
- Scale-out to increase performance across storage infrastructure
- Performance acceleration with up to 60% better response time and 5X more workloads
HOW DOES IT WORK?
DSX instances are the bridge that connects the DataSphere Metadata Engine to customers’ clients, applications, and servers. DSX communicates metadata to and from DataSphere, orchestrates the flow of data between different storage systems according to policies managed by DataSphere, allows server-attached storage to be globally accessed, and connects to the cloud for cold data tiering. To perform its many tasks, any DSX instance can be configured to take on various personalities:
When DataSphere’s analytics engine determines that data needs to move to meet an administrator’s objectives, the Data Mover personality of DSX gets to work. Data in flight between storage arrays remains accessible for both reads and writes, so applications are not disrupted.
Many environments still have legacy protocols such as NFS v3 or SMB 2.1/3.x. Acting as a Data Portal, DSX performs the protocol translation so that data on systems using these protocols is available in the DataSphere global namespace. DSX can also be federated to load-balance demands from multiple clients, ensuring availability in case a DSX instance goes offline.
Connections to S3-compatible cloud-based storage can be managed by one or more DSX instances simultaneously. With the Cloud Connector, files moved to the cloud remain available in the DataSphere global namespace and can be accessed by clients. DSX supports multiple secure (HTTPS) cloud endpoints, facilitating cloud-to-cloud mobility. In-line data reduction using content-based, variable block size deduplication and compression preserves network bandwidth and improves performance.
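The data reduction step described above can be illustrated with a minimal Python sketch of content-based, variable block size deduplication followed by compression. This is not Primary Data's implementation: the gear-style rolling hash, the chunk-size parameters, the in-memory dictionary standing in for a chunk store, and all function names are assumptions chosen for illustration.

```python
import hashlib
import zlib

# Hypothetical toy parameters, far smaller than a production system would use.
MIN_CHUNK = 32               # never cut a chunk shorter than this many bytes
MAX_CHUNK = 512              # always cut once a chunk reaches this many bytes
BOUNDARY_MASK = 0xFC000000   # cut when the top 6 hash bits are all zero

# One fixed pseudo-random 32-bit value per byte value (gear-style hash table).
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big")
        for i in range(256)]


def split_chunks(data):
    """Split data into variable-size chunks at content-defined boundaries."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Each shift pushes older bytes out of the 32-bit hash, so the value
        # depends only on the most recent 32 bytes of content.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0
        if at_boundary or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def store_file(data, chunk_store):
    """Store only unseen chunks (compressed); return the list of chunk IDs."""
    recipe = []
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunk_store:
            chunk_store[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return recipe


def restore_file(recipe, chunk_store):
    """Rebuild the original bytes from a list of chunk IDs."""
    return b"".join(zlib.decompress(chunk_store[d]) for d in recipe)
```

Because boundaries are chosen from the content rather than from fixed offsets, inserting a few bytes near the start of a file typically changes only the first chunks; the later chunks keep their fingerprints and do not need to be sent again.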
As a Data Store, DSX can extend the use of block-based storage into DataSphere’s global namespace. The block storage can be anything from HDDs and SSDs to NVMe, enabling enterprises to easily integrate and maximize the value of existing and new storage resources by making them available to the broader infrastructure.
Figure 1. How the different DSX personalities interact with data inside the global namespace.
HOW IS IT LICENSED?
DataSphere DSX follows a per-instance software subscription model.
DSX DATA PORTAL – PERFORMANCE ACCELERATION FOR LEGACY PROTOCOLS
For NFS v3 or SMB, DSX must operate in the data path as a Data Portal, performing both protocol translation and read caching. In this scenario, DSX scales out to handle large client environments. Its built-in read caching increases application performance across the entire infrastructure, and it automatically maintains cache coherency among all nodes so that every read returns the most current data available.
DSX Data Portal performs best when scaled out, taking full advantage of both read caching by DSX and the offloading of all metadata transactions to DataSphere. The scenario outlined in Figure 3 shows how DSX can boost a typical storage array operating at a load of 150K IOPS to support 750K application IOPS, a 5X total increase.
Figure 2. Assuming 80% read cache hit in DSX Data Portal, based on an 80/20 R/W IO mix.
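The arithmetic behind the 150K to 750K figure can be sketched as follows. The source does not break the 5X number down, so this is one reading under a stated assumption: that 80% of all application I/Os are absorbed by DSX before reaching the array. The function name and the percentage-based interface are hypothetical.

```python
def supported_app_iops(array_iops, absorbed_pct):
    """Application IOPS sustainable when absorbed_pct percent of I/Os
    never reach the array (assumption: the array is the only bottleneck)."""
    if not 0 <= absorbed_pct < 100:
        raise ValueError("absorbed_pct must be in the range [0, 100)")
    return array_iops * 100 / (100 - absorbed_pct)


ARRAY_IOPS = 150_000  # array load stated in the text

# Reading 1 (assumption): 80% of all application I/Os are absorbed.
total = supported_app_iops(ARRAY_IOPS, 80)
print(total, total / ARRAY_IOPS)  # 750000.0 5.0

# Reading 2 (assumption): only reads are cached; 80% of I/Os are reads
# and 80% of those reads hit the cache, so 64% of I/Os are absorbed.
absorbed = 80 * 80 / 100
reads_only = supported_app_iops(ARRAY_IOPS, absorbed)
print(round(reads_only / ARRAY_IOPS, 2))  # 2.78
```

Under the first reading the array sees one fifth of the application load, which matches the 5X figure. Under the second, read caching alone accounts for roughly 2.8X, so any further gain would have to come from other effects.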
The DSX Data Portal can improve response time by up to 60% while having negligible impact on IOPS per individual node on a typical storage platform deployed today. Our benchmarks show that a single DSX Data Portal can nearly saturate a 10 GbE connection, delivering impressive throughput and latency on both NFS and SMB protocols.
Figure 3: This scenario assumes an 80% cache hit rate. Typical storage performance numbers are an aggregate of hybrid-arrays deployed across enterprises today.
Note: The workload is generated with the Flexible I/O Tester (fio). Performance is measured using 16 Linux or Windows 10 clients, each with 16 threads, on a whitebox server with 4x NVMe drives running Primary Data DSX Store software. Benchmarks are measured from a physical server, separate from the Data Portal itself, to more realistically mimic a production deployment scenario.
About Primary Data
Primary Data develops intelligence and automation software for enterprise data management across on-premises IT infrastructure and into the cloud. Its DataSphere platform combines metadata management and machine learning to move the right data to the right place at the right time across a global namespace, automatically and without application disruption. DataSphere makes heterogeneous data stores simultaneously available to all applications, enabling enterprises operating at petabyte scale to easily manage billions of files, automate data migration, integrate the cloud, and scale out NAS performance while getting the most value out of infrastructure investments on a per-client, per-file basis. To learn more, visit us at PrimaryData.com, follow us at Facebook.com/PdDataSphere, or find us on Twitter at @Primary_Data.