Data Access and Protection

Access Cluster Data

Access Data

Access Data with HDFS

Access Data with MapR-FS and NFS

Features

  • Files in MapR-FS are fully read-write
    • Read files as they are written
    • Overwrite files
    • Modify files in place
  • Direct-access NFS
    • Easily access data with Linux commands

Direct-Access NFS

  • Read Performance

Mounting the Cluster

  • Mount your cluster file system locally
    • /mapr/my.cluster.com/ by default
  • Use Hadoop jobs or standard Linux commands

Handling Data

Set up Client Access

Types of Client Access

There are several ways to set up client access with MapR:

  1. Direct Access NFS™
  2. MapR Client
  3. MapR POSIX Client

Direct Access NFS

  1. Overview
    • Mount the cluster via NFS
    • Read and write cluster data directly
    • NFS mounting models:
      • Gateway
      • Collocation
      • Self-mounting
  2. Gateway

  3. Collocation

  4. Self-Mounting
    • Run Direct Access NFS on cluster node and mount via localhost
  5. Mounting Procedure
    • From the client machine:
      $ mkdir /mapr
      $ mount -o tcp,vers=3,nolock <MapR NFS_node>:/mapr /mapr
      $ ls /mapr/<my cluster name>/<path_from_mfs_root>
      
    • With local NFS server:
      $ mkdir /mapr
      $ mount -o tcp,vers=3,nolock localhost:/mapr /mapr
      $ ls /mapr/<my cluster name>/<path_from_mfs_root>
      

When mounting a local NFS server on a cloud-based cluster, use localhost or the loopback address 127.0.0.1. Don’t use the node’s IP address, because that routes traffic out of the network and back in.
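
The choice between a remote NFS node and the loopback address can be sketched as a small shell helper. This is only an illustration: nfs_mount_cmd and the host name nfs-node-01 are hypothetical, and the helper prints the mount command instead of running it. The options use vers=3, the spelling of the NFS version option documented in Linux nfs(5).

```shell
# nfs_mount_cmd is a hypothetical helper, not a MapR tool.
# $1 = "local" when the NFS server runs on this node, otherwise the NFS node's host name
# $2 = mount point (defaults to /mapr)
nfs_mount_cmd() {
  server="$1"
  mount_point="${2:-/mapr}"
  if [ "$server" = "local" ]; then
    # Loopback keeps traffic on the node instead of routing it out and back in.
    server="localhost"
  fi
  echo "mount -o tcp,vers=3,nolock ${server}:/mapr ${mount_point}"
}

nfs_mount_cmd local         # prints: mount -o tcp,vers=3,nolock localhost:/mapr /mapr
nfs_mount_cmd nfs-node-01   # prints: mount -o tcp,vers=3,nolock nfs-node-01:/mapr /mapr
```

Printing the command first makes it easy to review before running it as root on the client.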

MapR Clients

MapR Client

The MapR client is available for non-cluster nodes. Instructions differ based on OS.

MapR POSIX Client

In addition to regular client access, direct POSIX access is supported, on Linux only. Compared to native NFS, the POSIX client:

  • Compresses data on the client side before sending it to the cluster.
  • Uses a secure direct connection between client and cluster.

MapR POSIX Client Types

  • Starting with MapR 5.1, FUSE-based POSIX clients are released in place of the NFS loopback client.
  • Basic
    • Up to 1 GB/sec
    • 10 free
  • Platinum
    • Up to 5 GB/sec (hyper-threading disabled)
    • Paid

Configure Virtual IP Addresses

What is a VIP?

  • A virtual IP address
  • A “pool” of static IP addresses
  • If the connection to the static IP address goes down, the VIP switches to another address in the pool

Static IP versus Virtual IP

Static IP

A client connects directly to the IP of the node.

VIP

Alternatively, a client can connect to a virtual IP address that is backed by a pool of nodes. When node1 goes down, the client is automatically connected to another node.

Making NFS Highly Available

Configure VIPs

  1. Configure VIPs through the MCS or through the CLI: $ maprcli virtualip add

  2. Create a VIP pool on the desired subnet of a network interface

  3. Add nodes to, or remove nodes from, the VIP pool
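
As a rough sketch of the CLI route, a range of VIPs might be added as shown here. The wrapper add_vip_range, the run helper, and the addresses are illustrative assumptions. The script prints the command by default and only executes it when DRY_RUN=0 is set on a cluster node.

```shell
# Print the command by default; set DRY_RUN=0 to execute it on a cluster node.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# add_vip_range is a hypothetical wrapper around `maprcli virtualip add`.
# $1 = first VIP in the range, $2 = last VIP in the range, $3 = netmask
add_vip_range() {
  run maprcli virtualip add -virtualip "$1" -virtualipend "$2" -netmask "$3"
}

# Example addresses only; substitute a free range on your NFS subnet.
add_vip_range 10.10.100.1 10.10.100.4 255.255.255.0
```

Running maprcli virtualip list afterwards is one way to check which node each VIP was assigned to.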

Review VIPs

  1. Review or Modify VIP Pools

  2. View VIP Assignments

  3. Run ifconfig to see the virtual interfaces created for the VIPs

Summary

  • Each VIP will route connections to a single node
    • Client connections will remain on the same node until failover
  • VIPs do not provide load balancing
    • Set up round-robin DNS on your DNS server
    • Or use third-party load balancers instead
  • You can select which NICs are assigned to each VIP
    • If not all NICs are in the same subnet
    • If you wish to restrict VIP assignments to specific NICs

Control Access to the Cluster

Who Can Do What?

Depends on permissions granted:

  • At the cluster level
    • MapR cluster operation permissions
  • At the volume level
    • MapR volume operation permissions
    • Root directory permissions (POSIX)
    • Access Control Expressions (ACEs) to access volume data
  • At the file/directory level
    • UNIX mode bits (POSIX)
    • MapR ACEs

Access Control Expressions (ACEs)

Working with Whole Volume ACEs
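
A whole-volume ACE is a boolean expression over principals such as u:<user> and g:<group>. The sketch below builds such an expression and prints a maprcli command that would apply it. The helper ace_any_of, the principals, and the volume name projects are placeholders, and the -readAce and -writeAce options are assumed to be available in your release.

```shell
# ace_any_of is a hypothetical helper: join principals with "|" (logical OR).
ace_any_of() {
  ace=""
  for principal in "$@"; do
    if [ -z "$ace" ]; then
      ace="$principal"
    else
      ace="$ace|$principal"
    fi
  done
  echo "$ace"
}

# Example principals only: user alice and group analysts.
read_ace=$(ace_any_of u:alice g:analysts)
write_ace=$(ace_any_of u:alice)

# Print the command rather than run it; "projects" is a placeholder volume name.
echo "maprcli volume modify -name projects -readAce '$read_ace' -writeAce '$write_ace'"
```

The single quotes in the printed command matter: without them the shell would treat | as a pipe.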

Snapshots

Understand Snapshots

Overview

What is a Snapshot?

  • A view of a source volume at a specific point in time
  • Use to:
    • Recover from user errors, data corruption, or program errors
    • Create static data sets for queries or auditing
    • Provide better protection against these errors than replication or mirroring

Characteristics of a Snapshot

  • Read-only point-in-time image of a volume
  • Doesn’t copy data; it simply creates pointers to the data
    • Tiny space penalty
    • Very quick (seconds)
  • Can be scheduled or taken on demand
    • Persists until user-set expiration

Where Do Snapshots Go?

  • Top level of every volume
    • .snapshot directory
    • Exists even if empty
  • Not listed by the ls -ltarh command
  • Accessible via NFS or the Hadoop shell if .snapshot is included in the path
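
Because .snapshot never appears in a directory listing, the path has to be spelled out explicitly. A minimal sketch of how that path is composed follows. The helper snapshot_path, the volume path, and the snapshot name are made up for illustration.

```shell
# snapshot_path is a hypothetical helper that builds the path of a file inside a snapshot.
# $1 = volume mount path, $2 = snapshot name, $3 = path relative to the volume root
snapshot_path() {
  # ${1%/} drops one trailing slash so the result has no double slash.
  echo "${1%/}/.snapshot/$2/$3"
}

snapshot_path /mapr/my.cluster.com/projects snap-2018-12-04 reports/q3.csv
# prints: /mapr/my.cluster.com/projects/.snapshot/snap-2018-12-04/reports/q3.csv
```

The same path shape works with the Hadoop shell if the /mapr/<cluster name> prefix is dropped.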

Snapshot Architecture

Use Case: Protection From Human Error

Situation

You know human error is one of the most common causes of lost or corrupted data, and you want to protect against loss of time and data.

Solution

Make frequent snapshots of key data. To control space requirements, set reasonable retain times.

Configure and Use Snapshots

Take a Snapshot

Manual Snapshot: MCS

Manual Snapshot: CLI

  • Command line:
    $ maprcli volume snapshot create -volume <vol name> -snapshotname <snapshot name>
    
  • Manual snapshots do not expire automatically

Snapshot Schedules

Scheduled snapshots expire automatically. Normal, Important and Critical are pre-defined schedules.

  1. Normal

  2. Important

  3. Critical

  4. Custom Schedules

    • You can create custom schedules through
      • the MCS
      • or by using the CLI:
        maprcli schedule create -schedule '{"name":"Schedule-1","rules":[{"frequency":"once","retain":"1w","time":13,"date":"12/4/2018"}]}'
        

Schedule a Snapshot
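
From the CLI, scheduling a snapshot amounts to attaching a schedule ID to a volume. The sketch below assumes the ID is looked up with maprcli schedule list. The wrapper attach_schedule, the volume name projects, and the ID 3 are placeholders. Commands are printed by default and executed only with DRY_RUN=0.

```shell
# Print the command by default; set DRY_RUN=0 to execute it on a cluster node.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# attach_schedule is a hypothetical wrapper.
# $1 = volume name, $2 = numeric schedule ID taken from the schedule list
attach_schedule() {
  run maprcli volume modify -name "$1" -schedule "$2"
}

run maprcli schedule list      # look up the ID of the schedule you want
attach_schedule projects 3     # 3 is a placeholder ID
```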

Display Schedules

Recover Data From a Snapshot

  1. Create a Recovery Directory

  2. Copy Files With Hadoop Commands

  3. Copy Files With Linux Commands (if the cluster file system is mounted)
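
The first two steps can be sketched with the Hadoop shell as follows. The helper recover_from_snapshot and all paths are hypothetical. By default the script only prints the commands; set DRY_RUN=0 to run them.

```shell
# Print the command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# recover_from_snapshot is a hypothetical helper; all paths are cluster paths.
# $1 = volume path, $2 = snapshot name, $3 = file path relative to the volume,
# $4 = recovery directory
recover_from_snapshot() {
  # Create the recovery directory, then copy the file out of the snapshot.
  run hadoop fs -mkdir -p "$4" &&
    run hadoop fs -cp "${1%/}/.snapshot/$2/$3" "$4/"
}

recover_from_snapshot /projects snap1 reports/q3.csv /projects/recovered
```

With the cluster mounted over NFS, the same copy can be done with cp against the corresponding path under /mapr.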

Preserve a Snapshot

Preserving a snapshot removes its expiration date

Snapshot Maintenance

  • Scheduled snapshots are deleted automatically
  • Manual snapshots must be deleted manually (through the CLI or MCS)
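
Preserving and cleaning up snapshots from the CLI might look like the sketch below. Both wrappers, the volume name projects, and the snapshot name before-upgrade are illustrative assumptions. Commands are printed by default and executed only with DRY_RUN=0.

```shell
# Print the command by default; set DRY_RUN=0 to execute it on a cluster node.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Both helpers are hypothetical wrappers around maprcli.
# Remove the expiration date from every snapshot of a volume ($1 = volume name).
preserve_snapshots() {
  run maprcli volume snapshot preserve -volume "$1"
}

# Delete one manual snapshot ($1 = volume name, $2 = snapshot name).
remove_snapshot() {
  run maprcli volume snapshot remove -volume "$1" -snapshotname "$2"
}

run maprcli volume snapshot list -volume projects   # review what exists first
preserve_snapshots projects
remove_snapshot projects before-upgrade
```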

Mirrors

Describe Mirroring

About Mirror Volumes

How Mirrors Are Created

Configure and Use Local Mirrors

Create a Local Mirror Volume

Create a Mirror Volume

Set a Mirror Schedule

Manually Start the Mirror
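
A local mirror can be created and started from the CLI roughly as follows. The wrappers and the volume, cluster, and path names are placeholders. The -type mirror spelling is an assumption about your release, since older releases may expect a numeric type. Commands are printed by default and executed only with DRY_RUN=0.

```shell
# Print the command by default; set DRY_RUN=0 to execute it on a cluster node.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical wrapper: create a mirror volume of an existing source volume.
# $1 = mirror volume name, $2 = source volume name, $3 = cluster name, $4 = mount path
create_local_mirror() {
  run maprcli volume create -name "$1" -source "$2@$3" -type mirror -path "$4"
}

# Hypothetical wrapper: start mirroring into the named mirror volume.
start_mirror() {
  run maprcli volume mirror start -name "$1"
}

create_local_mirror projects-mirror projects my.cluster.com /mirrors/projects
start_mirror projects-mirror
```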

Promote a Mirror

  • Promoting a mirror changes it from a read-only volume to a standard read-write volume.
  • Typically, a mirror is promoted when the source volume remains unavailable.
  • Once a mirror is promoted, any mirroring schedule associated with the volume is disabled.

Rules for Promoted Mirrors

Important!

  • To use promotable mirrors, the source and destination volumes must have the same name and mount point
  • Mirror volumes that are promoted to standard volumes cannot be written to until they are explicitly mounted:
    $ maprcli volume modify -name <name> -type rw
    $ maprcli volume mount -name <name> -path <path>

Use Cascading and Remote Mirrors

Cascading Mirrors

  • A cascading mirror is a mirror of a mirror of a source volume.
  • Cascading is useful for deployment

  • Deployment Without Cascading Mirrors

  • Deployment With Cascading Mirrors

Remote Mirrors

  1. Local vs. Remote Mirrors

  2. Remote Mirrors and Data Protection

Review of Mirrors

  • Local mirrors
    • Source and destination volumes in the same cluster
  • Cascading mirrors
    • Source and destination volumes are both mirrors
  • Remote mirrors
    • Source and destination volumes in different clusters
