Cascade Architecture

Overview

Cascade cloud storage software is deployed on servers called data nodes. As a cluster, these nodes virtualise and federate the back-end capacity supplied by Cascade storage, block, file or public cloud object sources. Each data node in the Cascade system runs a complete software stack made up of the appliance operating system and the Cascade core software. All data nodes run an identical software image to ensure maximum reliability and fully symmetrical operation of the system. Cascade data nodes can serve as both an object repository and an access point for your applications. They can also take over the functions of other nodes in the event of node failure.

 

A Cascade system is inherently a distributed system, spreading key functions, such as metadata management and storage placement, across all nodes. To process incoming client requests, software components on one node interact with components on other nodes through a private back-end network. All runtime operations are distributed among the access nodes. No single node becomes a bottleneck because each node bears equal responsibility for processing requests, storing data and sustaining the overall health of the system. The nodes work cooperatively to ensure system reliability and performance.

 

The Cascade distributed processing scheme allows the system to scale linearly to accommodate capacity growth or additional application clients. When a new node is added to the Cascade system, the cluster automatically integrates that node into the overall workflow without manual intervention.

 

By incorporating support for off-site public cloud storage targets, Cascade encourages adoption of hybrid cloud configurations, which can lower the cost of storing older, less-active data. By trading some performance and latency, you gain near-instant capacity elasticity while retaining a single point of management for both new and old data.

 

Cascade Architecture:


Shared Storage

Shared storage allows you to make hardware investments based on application need rather than as an artefact of architecture design. For example, as the number of your clients grows, there is generally a proportional increase in the Cascade workload. Cascade data nodes can be scaled to linearly improve small-object performance and large-object throughput, or to increase the CPU power available to Cascade search and data services.

 

Alternatively, you may decide to tackle a new application that needs to store larger media or video files. In this case, Cascade is not driving a high rate of new I/O so much as it is directing many large files. Additional Cascade Storage can quickly add several petabytes to your virtualised storage pool. Cascade data nodes and Cascade Storage are networked through a combination of 10Gb Ethernet ports and VLANs in a loosely coupled architecture. Cascade Storage is particularly well suited to storage scaling.


Consistency

Cascade spans multiple geographical sites. When considered from a single-site perspective, the Cascade design favours consistency and availability as defined by Brewer's CAP theorem. The theorem postulates that a distributed system of nodes can satisfy at most two of these three properties:

 

  • Consistency: All nodes see the same data at the same time.
  • Availability: Every request is guaranteed to receive a response indicating whether it succeeded or failed.
  • Partition tolerance: The system continues to operate despite arbitrary message loss or failure of part of the system.

 

Within a single site, Cascade never returns stale data, which is useful for applications that require strong data consistency. While Cascade can handle many forms of partition failure, it does require that a majority of the Cascade data nodes (total nodes / 2 + 1) be available and communicating with each other in order to accept write requests. Reads can be processed with as few as one surviving node.
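As a rough sketch of that quorum rule, the following Python fragment computes the write majority for a cluster of a given size. The function names and the sample cluster size are illustrative assumptions; only the arithmetic reflects the behaviour described above.

    # Illustrative sketch of the majority-quorum rule described above.
    # Function names and the sample cluster size are assumptions; only the
    # arithmetic (writes need total_nodes/2 + 1 nodes, reads can be served
    # by a single surviving node) reflects the documented behaviour.

    def write_quorum(total_nodes: int) -> int:
        """Minimum number of communicating data nodes needed to accept writes."""
        return total_nodes // 2 + 1

    def can_accept_writes(total_nodes: int, healthy_nodes: int) -> bool:
        return healthy_nodes >= write_quorum(total_nodes)

    def can_serve_reads(healthy_nodes: int) -> bool:
        return healthy_nodes >= 1

    if __name__ == "__main__":
        cluster_size = 8                      # hypothetical 8-node cluster
        for healthy in (8, 5, 4, 1):
            writes = "yes" if can_accept_writes(cluster_size, healthy) else "no"
            reads = "yes" if can_serve_reads(healthy) else "no"
            print(f"{healthy}/{cluster_size} nodes up: writes={writes}, reads={reads}")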

 

When a Cascade deployment spans two or more sites, supporting an active-active global namespace, Cascade favours data availability and partition tolerance over strict consistency. This is the favoured model for public cloud deployments and is referred to as an eventually consistent model. In response to a whole-site outage, Cascade may deliver data from a surviving site that was not yet consistent with the failed site. This effect is a result of asynchronous replication, but it is minimised by Cascade’s global access topology, which performs hyper-replication of metadata.

 

Hyper-replication is possible because each Cascade system maintains a separate structure for object data and object metadata. When an application writes an object, the metadata is stored in a separate but parallel branch of Cascade’s internal file system. This physical separation enables several unique capabilities. One of these is improved data consistency between sites, because Cascade prioritises replication of metadata over replication of the object itself. Intersite consistency is less affected by network speed or object size, and participating sites become aware of new or modified objects more quickly. A physically separate structure for metadata is also key to Cascade search, tiering and fencing capabilities. To find out more, see Object Storage Architecture. When all sites are optimal, each Cascade instance can respond to I/O requests using local resources, unaffected by the speed or latency of the WAN interconnecting the sites.
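The prioritisation described above can be pictured as a replication queue in which metadata records are drained ahead of the larger object bodies they describe. The Python sketch below is purely conceptual; the class, queue and payload names are assumptions and do not represent Cascade internals.

    # Conceptual sketch only: models the idea that small metadata records are
    # replicated to peer sites ahead of the larger object data they describe.
    # Class and attribute names are assumptions, not Cascade APIs.

    import heapq
    from dataclasses import dataclass, field
    from itertools import count

    METADATA, OBJECT_DATA = 0, 1          # lower number drains first

    @dataclass(order=True)
    class ReplicationItem:
        priority: int
        seq: int
        payload: str = field(compare=False)

    class ReplicationQueue:
        def __init__(self):
            self._heap = []
            self._seq = count()

        def enqueue_write(self, object_name: str):
            # Each client write produces a small metadata record and the object
            # body; the metadata record is queued at the higher priority.
            heapq.heappush(self._heap,
                           ReplicationItem(METADATA, next(self._seq), f"meta:{object_name}"))
            heapq.heappush(self._heap,
                           ReplicationItem(OBJECT_DATA, next(self._seq), f"data:{object_name}"))

        def drain(self):
            while self._heap:
                yield heapq.heappop(self._heap).payload

    q = ReplicationQueue()
    q.enqueue_write("invoice-0001.pdf")
    q.enqueue_write("invoice-0002.pdf")
    print(list(q.drain()))
    # ['meta:invoice-0001.pdf', 'meta:invoice-0002.pdf',
    #  'data:invoice-0001.pdf', 'data:invoice-0002.pdf']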


Capacity

Cascade supports configurations exceeding 2.9 exabytes. Realising the full potential of Cascade’s scalable storage depends on its multitenancy management, delegation and provisioning features. There is no need to pre-purchase or reserve storage for specific applications, as you can grow capacity incrementally as demand increases. Service options such as versioning and compliance address different data usage patterns, and automated tiering plans lower the cost of carrying older data.


Administration

The key administrative roles and features are described below.

System

These CCL management roles cannot read or write data, but they do control how physical storage resources are virtualised and monitored. They design service plans to govern data placement, how it ages and how it is retired. These managers prioritise system services, create tenants and delegate control over capacity using a quota system.

Tenants

Tenants provide management and control isolation at an organisational level, but are bounded by policies set by the system-level administrator. A tenant typically represents an actual organisation, such as a company or a department within a company, that uses a portion of a repository. A tenant can also correspond to an individual person. A Cascade cluster can have many Cascade tenants, each of which can own and manage many namespaces.

Tenant-level Administration

Each tenant has its own tenant administrator, who can:

  • Create and manage namespaces for application use
  • Control namespace capacity through quotas, and define user membership, access protocols and service policies
  • Define which users can read, write, delete or search a namespace.

 

The Cascade system-level administrator controls the number of namespaces each Cascade tenant can create.

Namespace

A namespace is the smallest unit of Cascade multitenancy capacity partitioning. Namespaces are thin-provisioned and carved from the common virtualised storage pool. They provide:

  • The mechanism for separating the data stored by different applications, business units or customers. Access to one namespace does not grant a user access to any other namespace. Objects stored in one namespace are not visible in any other namespace.
  • Segregation of data, while tenants provide segregation of management.

 

Applications access Cascade namespaces through the Cascade REST, S3, Swift, WebDAV, CIFS (SMB 3.1.1), NFS v3 and SMTP protocols. These protocols can support authenticated and/or anonymous access. When an application writes a file, Cascade conceptually puts it in an object container along with associated metadata that describes the data. Although Cascade is designed for WORM (write once, read many) access to information, namespaces can be enabled with versioning to permit write and re-write I/O semantics.
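As an example of the S3-compatible access path, the following Python sketch writes and reads an object using the standard boto3 client. The endpoint URL, credentials, bucket (namespace) name and object key are placeholders rather than values defined by Cascade; substitute the values configured for your tenant.

    # Hypothetical example of object access through the S3-compatible
    # interface. The endpoint, credentials and bucket/namespace name are
    # placeholders; substitute the values configured for your tenant.

    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://tenant1.cascade.example.com",   # placeholder endpoint
        aws_access_key_id="ACCESS_KEY_PLACEHOLDER",
        aws_secret_access_key="SECRET_KEY_PLACEHOLDER",
    )

    # Write a file into the "finance" namespace (exposed as an S3 bucket),
    # attaching custom metadata that travels with the object.
    with open("q1-summary.pdf", "rb") as body:
        s3.put_object(
            Bucket="finance",
            Key="reports/2024/q1-summary.pdf",
            Body=body,
            Metadata={"department": "finance", "doc-type": "report"},
        )

    # Read the object back and show its custom metadata.
    response = s3.get_object(Bucket="finance", Key="reports/2024/q1-summary.pdf")
    print(response["Metadata"])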

 

Cascade hierarchy:

 

Dashboard

Using the web-based overview dashboard, the system-level administrator can quickly assess Cascade cluster status. The single-pane summary displays colour-coded health alerts, data services, major events and the total capacity consumed by all tenants and namespaces. A single click drills down into any of the 500+ alerts or events, and you can optionally enable email notifications or system logging (syslog).

 

The overview dashboard for a tenant administrator displays a summary of events and the total capacity consumed by all of its defined namespaces. The panel provides one-click drill-down into any event; events can also be forwarded to an email address.

Namespace Configuration Templates

Each tenant administrator is delegated authority over an allotted capacity. Namespace configuration templates help them create namespaces, configure permitted protocols, and set capacity quotas and policies for retention, disposition, indexing and search. Optionally, configuration can be carried out through the REST API or Microsoft PowerShell utilities.
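To illustrate the general shape of scripted configuration, the Python sketch below posts a namespace definition to a management REST endpoint. The URL path, port, payload fields and authentication scheme are assumptions made for illustration and are not the documented Cascade API; consult the Cascade management API reference for the actual interface.

    # Illustrative only: the management URL, payload fields and authentication
    # shown here are assumptions, not the documented Cascade REST interface.
    # The sketch shows how a tenant administrator might create a namespace
    # from a configuration template using a scripted call.

    import requests

    MGMT_URL = "https://tenant1.cascade.example.com:9090/api/tenants/tenant1/namespaces"  # placeholder

    namespace_config = {
        "name": "engineering-builds",
        "hardQuota": "5 TB",                                    # capacity quota
        "versioningEnabled": True,                              # permit re-write semantics
        "searchEnabled": True,                                  # index objects for search
        "protocols": {"rest": True, "s3": True, "nfs": False},  # permitted protocols
    }

    response = requests.post(
        MGMT_URL,
        json=namespace_config,
        auth=("tenant-admin", "PASSWORD_PLACEHOLDER"),          # placeholder credentials
    )
    response.raise_for_status()
    print("Namespace created, HTTP status:", response.status_code)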

Enterprise Mode

The tenant administrator is permitted to create namespaces with an enterprise retention policy. While normal users cannot delete objects under enterprise retention, a tenant administrator can be enabled to perform audit-logged privileged deletes.

Compliance Mode

The tenant administrator can be permitted to create namespaces with a compliance retention policy. Objects under compliance retention cannot be deleted through any user or administrative action until their expiry date. Industry-specific regulations sometimes mandate immutable compliance modes to protect electronic business records. Use this mode with care, as experimenting can create permanently undeletable content.


 

 
