Cluster Maintenance
Intro
This post teaches you how to monitor the cluster’s health, manage cluster resources, and perform ongoing maintenance and troubleshooting to keep the cluster in top working order.
Monitor and Manage the Cluster
Monitor the Cluster
Monitoring Tools
Tools
The cluster can be monitored using MapR tools such as the MCS and the maprcli CLI, as well as external tools like Ganglia and Nagios.
Nagios
- Nagios can monitor network and host resources
- Configuration script generator
- Acts as a conduit for alarms to go through standard escalation
- Integrates alarms from sources other than MapR's built-in alarms
Ganglia
- Monitors clusters and grids
- Scalable, distributed system monitoring tool
- Unlike Nagios, Ganglia is not used to raise alarms but to report cluster health
- Shows cluster status over a given time period
- Install it on CLDB nodes to gather CLDB metrics such as CPU, memory, active server nodes, and volumes created, for both the node and the overall cluster
Maintenance Tasks
- Overview
- Regular maintenance
- Replace disk(s)
- Maintain nodes
- Upgrade software
- Clean out log files
- Design changes
- Make cluster HA
- Expand cluster
- Balance services
- Regular maintenance
- Sample Schedule: shown here is a possible monitoring and maintenance schedule
Configure and Respond to Alarms
MapR Alarms
About Alarms
- Alarms are raised to notify users of a problem
- When an alarm is raised, an alert is posted to the MCS
- Alerts can also be sent via email
- Custom email addresses can be configured so that specific alerts go to the accountable entities
Alarm Classifications
Alarms can be classified as:
- Cluster Alarms:
  - Cluster space
  - Licensing issues
  - CLDB problems
- Volume Alarms:
  - Specific to a volume
  - Snapshot and mirroring failures
  - Under-replicated and unavailable data
  - Volume topology problems
- Node Alarms:
  - Specific to a node
  - Disk failures
  - Time-skew problems
  - Failed services
To list all available alarm names:
$ maprcli alarm names
View Alarms
Alarms can be viewed either in the MCS Dashboard or with the MapR CLI. To see all alarms:
$ maprcli alarm list
View Alarm Details
View alarm details in the Dashboard, investigate the problem, and restart the affected services or nodes.
Configure Alerts
Overview
Before alarms are set, configure SMTP and user email addresses:
- Set up SMTP
- Configure email addresses of cluster users
- Set up custom alerts
MCS and CLI
Standard alerts send email to the entity involved; for example, volume quota alarms go to the volume's accountable entity. Additional addresses can be defined so that other target users are notified as well.
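As a hedged sketch of how an extra address might be attached to a specific alarm (the alarm-name placeholder and email address are illustrative; verify the maprcli alarm config syntax against your MapR version):
# List the current per-alarm notification settings
$ maprcli alarm config load
# Route the named alarm to an additional address; the "1" enables email for this alarm
$ maprcli alarm config save -values "<alarm name>,1,storage-team@example.com"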
Test Alerts
It's a good idea to test alerts, because email addresses and SMTP may not have been configured when the alarms were defined. Raising a test alarm (see the sketch below):
- Raises the specified alarm
- Its description appears in maprcli alarm list output and in the MCS
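A minimal sketch of such a test, assuming an alarm name taken from maprcli alarm names and an illustrative description:
# Raise a test alarm so email delivery can be verified
$ maprcli alarm raise -alarm <alarm name> -description "Test alert - please ignore"
# Confirm it appears, then clear it
$ maprcli alarm list
$ maprcli alarm clear -alarm <alarm name>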
Verify Configuration
Respond to Alarms
Determine Urgency
- Alarms differ in urgency
- Some resolve on their own
- Some are early warnings
- Determine:
- Underlying cause
- What action is needed
- How soon action should be taken
- Many alarms are self-explanatory
- Refer to documentation for details
Example: Normal Operation
CLUSTER_ALARM_UPGRADE_IN_PROGRESS
- Normal operation
- Alarm will clear when upgrade is done
Example: Schedule Action
ALARM_ADVISORYQUOTA_EXCEEDED
- Urgency depends on:
- Rate of data growth
- Differential between advisory and hard quotas
- Which volume or AE
Example: Degraded Performance
NODE_ALARM_SERVICE_*_DOWN
- May impact a single node, or entire cluster
- Criticality varies by service
Example: Urgent Situation
NODE_ALARM_TIME_SKEW
- Alarm raised for time skew
- Maximum differential: 20 seconds
- Recommended differential: <500 ms
- System time critical for ZooKeeper functionality
- Use NTP
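Since correct system time is critical, a quick check of NTP health on a node can be useful. This is a sketch that assumes the classic ntpd tooling is installed; your cluster may use chrony or another client instead:
# Show the NTP peers this node is synchronizing with
$ ntpq -p
# Confirm the ntpd service is running
$ service ntpd status
# Query (without setting) the clock offset against a public pool server
$ ntpdate -q pool.ntp.org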
Balance Cluster Resources
As the cluster grows and changes, it is important to balance the location and replication of data. Two balancers are available:
- Disk balancer
- Redistributes data
- Role balancer
- Changes container replication role
Balancing starts 30 minutes after CLDB startup by default; change this with the cldb.balancer.startup.interval.sec parameter.
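For instance, the startup delay could be shortened like this (a sketch; the 600-second value is illustrative):
# Start balancing 10 minutes after CLDB startup instead of the default 30
$ maprcli config save -values {"cldb.balancer.startup.interval.sec" : "600"}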
Disk and Role Balancers
Disk Balancer
- Moves containers between storage pools
- Ensures disk usage on nodes is similar
The disk balancer is particularly useful when new nodes or a new rack are added to a mature cluster.
Disk Balancer Details
There are two primary disk balancer settings:
- Threshold: the percentage of utilization at which a storage pool becomes eligible for rebalancing. Can be set between 10% and 99%.
- Concurrent disk rebalancers: the percentage of data that can be actively rebalanced at any one time, which prevents rebalancing from disrupting normal cluster operations. Can be set from 2% to 30%.
Role Balancer
- Changes the replication roles of containers so that all nodes have an equal share of master containers
- Distributes master containers among nodes
- Evenly distributing replication roles balances network bandwidth and spreads the write load across the cluster
- Write operations in MapR-FS and reads in MapR-DB are performed on the master container, so role balancing also distributes table reads
Role Balancer Details
There are two primary role balancer settings:
Enabling Balancers Through the CLI
Balancing: CLI
- Set a balancer parameter with:
maprcli config save -values {"<parameter>" : "<value>"}
- For example, turn the balancers on (unpause them):
maprcli config save -values { "cldb.balancer.disk.paused" : "0"}
maprcli config save -values { "cldb.balancer.role.paused" : "0"}
- Check current values:
maprcli config load -json | grep balancer
Balancer Parameters
Some of the disk and role balancing parameters are listed here.
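As a hedged illustration, two disk balancer parameters commonly documented for MapR cover the eligibility threshold and the concurrent-rebalancing limit described above; verify the parameter names and current values against maprcli config load output for your release:
# Raise the utilization threshold at which a storage pool becomes eligible for rebalancing
$ maprcli config save -values {"cldb.balancer.disk.threshold.percentage" : "80"}
# Limit how much data may be actively rebalanced at once
$ maprcli config save -values {"cldb.balancer.disk.max.switches.in.nodes.percentage" : "5"}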
Manage Logs and Snapshots
Manage Snapshots
- Scheduled snapshots expire automatically
- Manual snapshots must be deleted
- Check regularly (at least monthly)
- Use schedules if possible
Remove Snapshots
To list snapshots that never expire (manual snapshots) and remove them:
maprcli volume snapshot list -columns snapshotid -filter [expirytime=="0"]
maprcli volume snapshot remove -snapshots <ID1, ID2, ID3 ...>
Manage Log Files
- Overview
  - Some logs are managed automatically
    - Job logs
    - Data audit logs (if enabled)
  - Others must be cleaned out manually
    - maprcli audit logs
    - Cluster audit logs (if enabled)
- Retention times
  - Retain logs longer if needed for troubleshooting
    - Example: job logs with local logging are deleted after 3 hours by default
  - Shorten retention times if logs get too large
Add and Remove Services
Removing Services
The general procedure for removing a service from a node is outlined below.
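As a rough sketch, assuming the NFS service and a Red Hat-style package manager (service, node, and package names are illustrative):
# Stop the service on the target node
$ maprcli node services -name nfs -action stop -nodes node05
# Remove the service package from the node
$ yum remove mapr-nfs
# Re-run configure.sh so Warden stops tracking the removed role
$ /opt/mapr/server/configure.sh -R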
Adding services
Adding a service is even easier than removing one:
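A minimal sketch, again assuming the NFS service and a Red Hat-style package manager:
# Install the service package on the node
$ yum install mapr-nfs
# Re-run configure.sh; Warden detects the new role and starts the service
$ /opt/mapr/server/configure.sh -R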
Disk and Node Maintenance
Replace a Failed Disk
Disk Failure
Replacing failed disks is a normal part of cluster maintenance. Factors impacting disk failure rate:
- Number of Disks
- Disk Reliability
- Data Center Environment
What Happens When a Disk Fails?
Disk Failure Information
Look at the disk failure report and perform the recommended recovery methods.
- Disk Failure Report
- Recovery Methods
- Alarms Raised
- VOLUME_ALARM_DATA_UNDER_REPLICATED
  - Data from the failed disk is still available on other disks, but does not meet the minimum replication requirement
  - The data is automatically re-replicated from another copy, and the alarm is cleared
- VOLUME_ALARM_DATA_UNAVAILABLE
  - Raised if the only copy of the data was on the failed disk, or if all disks holding that data failed
  - Run /opt/mapr/server/fsck to attempt to recover
  - The storage pool can be brought back online if fsck clears the disk issues
  - There is a risk of data loss
Caution: risk of data loss when running fsck -r
Replace a Failed Disk: MCS
To replace a failed disk in the MCS:
- Locate the failed disk
- Remove the disk from MapR-FS and physically replace the failed disk
- Add the new or repaired disk
- Remove the failed disk report when the replacement is complete
- Verify that a new storage pool was generated for the new disk
Replace a Failed Disk: CLI
To replace a failed disk from the CLI, perform the following steps:
- Run maprcli disk remove
- Note the drives removed; physically replace the failed drive
- Remove the failed disk report log
- Run maprcli disk add with the list of disks being replaced
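A sketch of those steps, assuming an illustrative host and device name and the default location of the failed-disk report:
# Remove the failed disk from MapR-FS
$ maprcli disk remove -host node03 -disks /dev/sdd
# After physically replacing the drive, clear the failed-disk report
$ rm /opt/mapr/logs/faileddisk.log
# Add the replacement disk back; MapR-FS formats it into a new storage pool
$ maprcli disk add -host node03 -disks /dev/sdd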
Add and Remove Disks
Best Practices
- Add disks in groups to form a storage pool
  - If you add a single disk, a storage pool with a single disk is created
  - If you add multiple disks, MapR automatically generates storage pool(s) from them
- Add disks with homogeneous sizes and speeds
  - Optimizes performance
  - Maximizes capacity
  - If you add heterogeneous disks, the performance and capacity of each (randomly grouped) pool is limited by its slowest and smallest disk
- Add disks in groups by size (or by performance, such as SSD vs. HDD); this ensures that only disks of the same size go into a storage pool
Procedures: Adding and Removing Disks
Add Disks
After physically installing the new disks, add them to MapR-FS using the MCS or the CLI:
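For the CLI path, a sketch with illustrative host and device names (adding the disks together lets MapR group them into a storage pool):
# Add three new disks on one node; MapR-FS forms storage pool(s) from them
$ maprcli disk add -host node03 -disks /dev/sde,/dev/sdf,/dev/sdg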
Remove Disks
- Removing a disk removes the entire storage pool (all the other disks in the pool too)
- Removed disks no longer contain usable data
- Make sure the data exists elsewhere (the storage pool's contents are replicated); the GUI warns if the data is not replicated elsewhere
Perform Node Maintenance
Decommissioning vs. Maintenance
- Decommissioning: permanently remove the node from the cluster
  - Failed services
  - Unrecoverable disks
  - Corrupted filesystems or OS
- Maintenance: plan to perform an action but keep the node as part of the cluster
  - Upgrade the OS
  - Add or replace disk drives
  - Diagnose problems
Decommission a Node
- Move the node to the /decommissioned topology
  - The topology doesn't need to be named that, but move the node to a topology outside of /data
  - Wait for replication to finish
  - Verify that the volume information has been removed
- Stop Warden and ZooKeeper
- Run maprcli node remove
- Remove the installed packages
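A sketch of the CLI steps, with an illustrative node name and a server ID placeholder (obtain the ID with maprcli node list -columns id):
# Move the node out of /data so its data re-replicates elsewhere
$ maprcli node move -serverids <server id> -topology /decommissioned
# Once replication is complete, stop the MapR services on the node
$ service mapr-warden stop
$ service mapr-zookeeper stop    # only if ZooKeeper runs on this node
# Remove the node from the cluster
$ maprcli node remove -nodes node07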
Clean Up
- Remove the /opt/mapr directory
- Remove MapR core files in /opt/cores
- If the removed node ran CLDB or ZooKeeper, run configure.sh on all other nodes to maintain HA
  - configure.sh -C if the CLDB nodes changed
  - configure.sh -Z if the ZooKeeper nodes changed
Perform Maintenance on a Node
Maintenance Overview
- No need to decommission
- Run maprcli node maintenance
- Replication starts after the specified -timeoutminutes
- If multiple nodes need maintenance, work on them sequentially
  - Don't take more than half of the ZooKeepers offline
  - Don't take all CLDB or ResourceManager nodes offline
Perform Maintenance
- Issue node maintenance command:
$ maprcli node maintenance -nodes <node> -timeoutminutes <minutes>
- Stop Warden
- Stop ZooKeeper (if installed on node)
- Perform maintenance and reboot
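Putting that together, a sketch with an illustrative node name and timeout:
# Put the node into maintenance mode for up to 60 minutes
$ maprcli node maintenance -nodes node04 -timeoutminutes 60
# Then stop its services before doing the work
$ service mapr-warden stop
$ service mapr-zookeeper stop    # only if ZooKeeper runs on this node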
Add Nodes
Expand a Cluster
When a new node is added to an existing cluster, you need to consider:
- Node topology
- How management services are dispersed throughout the cluster
- How data is populated
Node Topology
- A rack is a typical failure domain, since some outages can take down an entire rack
- A rack can hold 15 or more nodes, but consider spreading nodes evenly across three racks
- By default there are three replicas of each data container; having three racks allows three sub-topologies, so containers can be replicated across all three racks
- Use fewer nodes per rack to distribute cluster nodes
Management Services
- If you have multiple management services on one node, it is recommended to move some of them to the newly added node
- It is recommended to run the NFS service on all nodes
Data Considerations
- New data is written randomly, weighted toward the new nodes, so data is gradually distributed
- Alternatively, distribute data more quickly by running the disk balancer after a new node is added
Add Nodes
Here is an overview of the steps performed when adding a new node; refer to the MapR documentation for more details.
Prepare
- Set the default topology to /newnodes (optional)
- Verify node specifications
- Create the MapR user
- Add the MapR repository
Install and Configure
- Add MapR package key
- Install MapR service packages
- Run configure.sh
$ configure.sh -C <cldb nodes> -Z <zk nodes> -N <cluster name> -M7
- Start Warden on new node
Finish Up
- Format and add disks
- Set node topology
- Mount the cluster
  - Optional; only if NFS is installed
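A sketch of the finishing steps, assuming an illustrative disk-list file, server ID, rack topology, and an NFS-enabled node:
# Format and add the node's disks to MapR-FS
$ /opt/mapr/server/disksetup -F /tmp/disks.txt
# Place the new node in its rack topology
$ maprcli node move -serverids <server id> -topology /data/rack2
# Mount the cluster over NFS (only if the NFS service is installed)
$ mount -o hard,nolock <nfs node>:/mapr /mapr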
Troubleshooting
Troubleshooting Different Problem Types
How Do You Know There's a Problem?
The three common giveaways are:
- Alarm raised
- Performance issues
- Jobs failing
Resources
There are several resources to consult during troubleshooting.
- Log files
- MapR documentation
- MapR Answers
- Apache forums and websites
- MapR Support (with Converged Enterprise Edition)
Log File Locations
This table lists some common log types and their default locations. Some are found in the local filesystem, while others are in MapR-FS.
Log File Retention
Many logs expire automatically, so it is possible for a long-running job's logs to be deleted before the job completes. You can change the retention times.
Troubleshooting Scenarios
Service Not Running
- Typically corresponds to a raised alarm
- Warden will attempt to start or restart the service 3 times
- Each service has its own logs in /opt/mapr/logs
- Don't forget to check the Warden logs
Service Problems
- Open two terminal windows
- Window 1:
$ service mapr-warden stop
- Window 2:
$ tail -f /opt/mapr/logs/warden.log
- Window 1:
$ service mapr-warden start
Cluster Startup Issues
- ZooKeepers up?
$ service mapr-zookeeper qstatus
- Review warden.log and cldb.log
- Get maprcli node cldbmaster to succeed
- Troubleshoot individual services
Data Not Available
- Is affected volume node-specific?
/var/mapr/local/<node name>
- Is node offline?
- Failed disks and nodes
- Permission problems
Jobs Not Running
- Start with the YARN job logs
  - Command line
  - JobHistoryServer page (<IP address>:19888)
- Permissions problems
- Resource problems
- It is recommended to enable YARN log aggregation so job logs are retained centrally in MapR-FS for a long time (see the example below)
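With log aggregation enabled, a finished application's logs can be pulled from the command line; the application ID below is illustrative:
# Fetch aggregated logs for a completed application
$ yarn logs -applicationId application_1557891234567_0001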
Degraded Performance
- Check /opt/mapr/logs/mfs.log-3
- Failures communicating with the CLDB nodes or other servers?
- Check node resources
- CPU, memory, swap space, disk space, network
- Isolate management services
Use Support Utilities
Tools
Core Files
- Copies of the contents of memory, created when certain anomalies are detected
- Located in /opt/cores
- Name includes service involved
- An alarm is raised when a core file is created
- No immediate action required
- Service will automatically restart
- Forward to MapR Support when possible
fsck
- Find/fix file system inconsistencies: /opt/mapr/server/fsck
- Repair storage pools after disk failure
- Verification or repair
  - Use -r for repair
Caution: Using fsck -r can result in loss of data. Contact support first!
gfsck
- Find/fix cluster/volume/snapshot inconsistencies: /opt/mapr/bin/gfsck
- Scans and repairs a cluster, volume, or snapshot
- Typical process:
  - Take the storage pool offline
  - Run fsck on the storage pool
  - Bring the storage pool back online
  - Run gfsck on the affected cluster/volumes/snapshots
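A sketch of that process with illustrative names; the storage pool path and name come from mrconfig sp list, and the gfsck argument style varies by release, so check the documentation for your version before running it:
# Take the affected storage pool offline
$ /opt/mapr/server/mrconfig sp offline /dev/sdd
# Verify the pool (add -r for repair only after contacting support)
$ /opt/mapr/server/fsck -n SP1
# Bring the pool back online
$ /opt/mapr/server/mrconfig sp online /dev/sdd
# Then check the affected volume cluster-wide
$ /opt/mapr/bin/gfsck rwvolume=myvolume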
mapr-support-dump.sh
- Collects information/logs from a single node
$ /opt/mapr/support/tools/mapr-support-dump.sh
- Output is written to /opt/mapr/support/dump
- Creates a .tar file
mapr-support-collect.sh
- Collects information/logs from entire cluster
$ /opt/mapr/support/tools/mapr-support-collect.sh
- Output is written to /opt/mapr/support/collect
- Uses SSH and SCP
- Creates a .tar file