XMover Handbook

Installation

Install using uv (recommended) or pip:

uv tool install cratedb-toolkit

# Alternatively use `pip`.
# pip install --user cratedb-toolkit

Create a .env file with your CrateDB connection details:

CRATE_CONNECTION_STRING=https://your-cluster.cratedb.net:4200
CRATE_USERNAME=your-username
CRATE_PASSWORD=your-password
CRATE_SSL_VERIFY=true

Quick Start

Test Connection

xmover test-connection

Analyze Cluster

# Complete cluster analysis
xmover analyze

# Analyze specific table
xmover analyze --table my_table

Find Movement Candidates

# Find shards that can be moved (40-60GB by default)
xmover find-candidates

# Custom size range
xmover find-candidates --min-size 20 --max-size 100

Generate Recommendations

# Dry run (default) - shows what would be recommended
xmover recommend

# Generate actual SQL commands
xmover recommend --execute

# Prioritize space over zone balancing
xmover recommend --prioritize-space

Shard Distribution Analysis

This view focuses specifically on large tables.

# Analyze distribution anomalies for top 10 largest tables
xmover shard-distribution

# Analyze more tables
xmover shard-distribution --top-tables 20

# Detailed health report for specific table
xmover shard-distribution --table my_table

Zone Analysis

# Check zone balance
xmover check-balance

# Detailed zone analysis with shard-level details
xmover zone-analysis --show-shards

Advanced Troubleshooting

# Validate specific moves before execution
xmover validate-move SCHEMA.TABLE SHARD_ID FROM_NODE TO_NODE

# Explain CrateDB error messages
xmover explain-error "your error message here"

Commands Reference

analyze

Analyzes current shard distribution across nodes and zones.

Options:

  • --table, -t: Analyze specific table only

Example:

xmover analyze --table events

find-candidates

Finds shards suitable for movement based on size and health criteria.

Options:

  • --table, -t: Find candidates in specific table only

  • --min-size: Minimum shard size in GB (default: 40)

  • --max-size: Maximum shard size in GB (default: 60)

  • --node: Only show candidates from this specific source node (e.g., data-hot-4)
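The size, table, and node options above amount to a simple filter over the shard list. A minimal Python sketch of that idea follows, assuming inclusive size bounds; the `Shard` class and `filter_candidates` function are hypothetical names for illustration and are not part of XMover.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Shard:
    """Hypothetical record describing one shard copy."""
    table: str
    shard_id: int
    node: str
    size_gb: float


def filter_candidates(
    shards: Iterable[Shard],
    min_size: float = 40.0,   # mirrors the documented --min-size default
    max_size: float = 60.0,   # mirrors the documented --max-size default
    table: Optional[str] = None,
    node: Optional[str] = None,
) -> List[Shard]:
    """Keep shards inside the size range, optionally limited to a table or node."""
    return [
        s
        for s in shards
        # Assumption: both bounds are inclusive.
        if min_size <= s.size_gb <= max_size
        and (table is None or s.table == table)
        and (node is None or s.node == node)
    ]
```

For example, `filter_candidates(shards, min_size=30, max_size=60, node="data-hot-4")` corresponds to the second command shown under Examples.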

Examples:

# Find candidates in size range for specific table
xmover find-candidates --min-size 20 --max-size 50 --table logs

# Find candidates on a specific node
xmover find-candidates --min-size 30 --max-size 60 --node data-hot-4

recommend

Generates intelligent shard movement recommendations for cluster rebalancing.

Options:

  • --table, -t: Generate recommendations for specific table only

  • --min-size: Minimum shard size in GB (default: 40)

  • --max-size: Maximum shard size in GB (default: 60)

  • --zone-tolerance: Zone balance tolerance percentage (default: 10)

  • --min-free-space: Minimum free space required on target nodes in GB (default: 100)

  • --max-moves: Maximum number of move recommendations (default: 10)

  • --max-disk-usage: Maximum disk usage percentage for target nodes (default: 85)

  • --validate/--no-validate: Validate move safety (default: True)

  • --prioritize-space/--prioritize-zones: Prioritize available space over zone balancing (default: --prioritize-zones)

  • --dry-run/--execute: Show recommendations only, or generate the SQL commands with --execute (default: --dry-run)

  • --node: Only recommend moves from this specific source node (e.g., data-hot-4)

Examples:

# Dry run with zone balancing priority
xmover recommend --prioritize-zones

# Generate SQL for space optimization
xmover recommend --prioritize-space --execute

# Focus on specific table with custom parameters
xmover recommend --table events --min-size 10 --max-size 30 --execute

# Target space relief for a specific node
xmover recommend --prioritize-space --min-size 30 --max-size 60 --node data-hot-4

# Allow higher disk usage for urgent moves
xmover recommend --prioritize-space --max-disk-usage 90

zone-analysis

Provides detailed analysis of zone distribution and potential conflicts.

Options:

  • --table, -t: Analyze zones for specific table only

  • --show-shards/--no-show-shards: Show individual shard details (default: False)

Example:

xmover zone-analysis --show-shards --table critical_data

check-balance

Checks zone balance for shards with configurable tolerance.

Options:

  • --table, -t: Check balance for specific table only

  • --tolerance: Zone balance tolerance percentage (default: 10)
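One way to read a tolerance percentage is as the allowed deviation of each zone's shard count from the average across zones. The Python sketch below works under that assumption; the handbook does not state the exact formula XMover uses, and the function names are hypothetical.

```python
from typing import Dict


def zone_deviation(counts: Dict[str, int]) -> Dict[str, float]:
    """Percent deviation of each zone's shard count from the mean across zones."""
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    if mean == 0:
        return {zone: 0.0 for zone in counts}
    return {zone: (count - mean) / mean * 100 for zone, count in counts.items()}


def unbalanced_zones(counts: Dict[str, int], tolerance: float = 10.0) -> Dict[str, float]:
    """Zones whose deviation exceeds the tolerance (assumed definition of imbalance)."""
    return {
        zone: deviation
        for zone, deviation in zone_deviation(counts).items()
        if abs(deviation) > tolerance
    }
```

With shard counts of 100, 100, and 130 across three zones, the mean is 110, so the third zone sits about 18% above average and would be flagged at the default tolerance of 10, while the other two (about 9% below) would not.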

Example:

xmover check-balance --tolerance 15

validate-move

Validates a specific shard move before execution to prevent errors.

Arguments:

  • SCHEMA_TABLE: Schema and table name (format: schema.table)

  • SHARD_ID: Shard ID to move

  • FROM_NODE: Source node name

  • TO_NODE: Target node name

Examples:

# Standard validation
xmover validate-move CUROV.maddoxxxS 4 data-hot-1 data-hot-3

# Allow higher disk usage for urgent moves
xmover validate-move CUROV.tendedero 4 data-hot-1 data-hot-3 --max-disk-usage 90

explain-error

Explains CrateDB allocation error messages and provides troubleshooting guidance.

Arguments:

  • ERROR_MESSAGE: The CrateDB error message to analyze (optional - can be provided interactively)

Examples:

# Interactive mode
xmover explain-error

# Direct analysis
xmover explain-error "NO(a copy of this shard is already allocated to this node)"

monitor-recovery

Monitors active shard recovery operations on the cluster.

Options:

  • --table, -t: Monitor recovery for specific table only

  • --node, -n: Monitor recovery on specific node only

  • --watch, -w: Continuously monitor (refresh every 10s)

  • --refresh-interval: Refresh interval for watch mode in seconds (default: 10)

  • --recovery-type: Filter by recovery type - PEER, DISK, or all (default: all)

  • --include-transitioning: Include recently completed recoveries (DONE stage)

Examples:

# Check current recovery status
xmover monitor-recovery

# Monitor specific table recoveries
xmover monitor-recovery --table PartioffD

# Continuous monitoring with custom refresh rate
xmover monitor-recovery --watch --refresh-interval 5

# Monitor only PEER recoveries on specific node
xmover monitor-recovery --node data-hot-1 --recovery-type PEER

# Include completed recoveries still transitioning
xmover monitor-recovery --watch --include-transitioning

Recovery Types:

  • PEER: Copying shard data from another node (replication/relocation)

  • DISK: Rebuilding shard from local data (after restart/disk issues)
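The --recovery-type and --include-transitioning options can be thought of as two filters applied to a list of recovery records. The following Python sketch illustrates this under the assumption that DONE recoveries are hidden unless explicitly requested; the record layout (plain dictionaries with `type` and `stage` keys) and the function name are hypothetical.

```python
from typing import Dict, Iterable, List


def filter_recoveries(
    recoveries: Iterable[Dict[str, str]],
    recovery_type: str = "all",
    include_transitioning: bool = False,
) -> List[Dict[str, str]]:
    """Apply the recovery-type and DONE-stage filters to recovery records."""
    selected = []
    for record in recoveries:
        # "all" disables the type filter; otherwise match PEER or DISK exactly.
        if recovery_type != "all" and record["type"] != recovery_type:
            continue
        # Assumption: completed (DONE) recoveries are skipped by default.
        if record["stage"] == "DONE" and not include_transitioning:
            continue
        selected.append(record)
    return selected
```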

test-connection

Tests the connection to CrateDB and displays basic cluster information.

Operation Modes

Analysis vs Operational Views

XMover provides two distinct views of your cluster:

  1. Analysis View (analyze, zone-analysis): Includes ALL shards regardless of state for complete cluster visibility

  2. Operational View (find-candidates, recommend): Only includes healthy shards (STARTED + 100% recovered) for safe operations
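The difference between the two views can be sketched as a health filter that only the operational view applies. The Python below is illustrative only: `ShardState`, `analysis_view`, and `operational_view` are hypothetical names, and the health rule is taken from the description above (STARTED and 100% recovered).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ShardState:
    """Hypothetical record with the two fields the health rule needs."""
    shard_id: int
    routing_state: str        # e.g. STARTED, INITIALIZING, RELOCATING
    recovery_percent: float


def analysis_view(shards: Iterable[ShardState]) -> List[ShardState]:
    """Analysis view: every shard, regardless of state."""
    return list(shards)


def operational_view(shards: Iterable[ShardState]) -> List[ShardState]:
    """Operational view: only shards that are STARTED and fully recovered."""
    return [
        s
        for s in shards
        if s.routing_state == "STARTED" and s.recovery_percent >= 100.0
    ]
```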

Prioritization Modes

When generating recommendations, you can choose between two prioritization strategies:

  1. Zone Balancing Priority (default): Focuses on achieving optimal zone distribution first, then considers available space

  2. Space Priority: Prioritizes moving shards to nodes with more available space, regardless of zone balance
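The two strategies can be pictured as two different sort orders over the possible target nodes. The Python sketch below is one plausible reading, not XMover's actual ranking: zone priority prefers targets in the zone holding the fewest shards and breaks ties by free space, while space priority sorts by free space alone. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Target:
    """Hypothetical candidate target node."""
    node: str
    zone: str
    free_gb: float


def rank_targets(
    targets: Iterable[Target],
    zone_counts: Dict[str, int],
    prioritize_space: bool = False,
) -> List[Target]:
    """Order candidate targets according to the chosen prioritization mode."""
    if prioritize_space:
        # Space priority: most free space first, zone balance ignored.
        return sorted(targets, key=lambda t: -t.free_gb)
    # Zone priority (assumed): least-loaded zone first, then most free space.
    return sorted(targets, key=lambda t: (zone_counts.get(t.zone, 0), -t.free_gb))
```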

Safety Features

  • Zone Conflict Detection: Prevents moves that would place multiple copies of the same shard in the same zone

  • Capacity Validation: Ensures target nodes have sufficient free space

  • Health Checks: Only operates on healthy shards (STARTED routing state + 100% recovery)

  • SQL Quoting: Properly quotes schema and table names in generated SQL commands
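To illustrate the SQL quoting point, here is a minimal Python sketch that builds a shard move statement with quoted identifiers. It assumes the generated command is CrateDB's `ALTER TABLE ... REROUTE MOVE SHARD` statement; the handbook does not show the exact SQL XMover emits, and the helper names are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote a string literal, escaping embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def move_statement(schema: str, table: str, shard_id: int, from_node: str, to_node: str) -> str:
    """Build a REROUTE MOVE SHARD statement (assumed form of the generated SQL)."""
    return (
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} "
        f"REROUTE MOVE SHARD {int(shard_id)} "
        f"FROM {quote_literal(from_node)} TO {quote_literal(to_node)};"
    )
```

Quoting matters for mixed-case names such as CUROV.maddoxxxS, which would otherwise be folded to lowercase when the statement is parsed.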

Example Workflows

Regular Cluster Maintenance

  1. Analyze current state:

xmover analyze

  2. Check for zone imbalances:

xmover check-balance

  3. Generate and review recommendations:

xmover recommend --dry-run

  4. Execute safe moves:

xmover recommend --execute
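The maintenance steps above could also be scripted. The following Python sketch chains them with `subprocess`, using only the xmover subcommands and flags listed in this handbook; the function names are hypothetical, and the final `--execute` step is opt-in so that the script stays read-only by default.

```python
import subprocess
from typing import Callable, List


def maintenance_commands(execute: bool = False) -> List[List[str]]:
    """Return the maintenance steps as argument lists, in order."""
    commands = [
        ["xmover", "analyze"],
        ["xmover", "check-balance"],
        ["xmover", "recommend", "--dry-run"],
    ]
    if execute:
        # Only generate SQL commands when explicitly requested.
        commands.append(["xmover", "recommend", "--execute"])
    return commands


def run_all(commands: List[List[str]], runner: Callable = subprocess.run) -> None:
    """Run each command in turn, stopping at the first non-zero exit status."""
    for argv in commands:
        runner(argv, check=True)
```

Calling `run_all(maintenance_commands())` runs the first three steps; review the dry-run output before repeating with `execute=True`.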

Targeted Node Relief

When a specific node is running low on space:

  1. Check which node needs relief:

xmover analyze

  2. Generate recommendations for that specific node:

xmover recommend --prioritize-space --node data-hot-4 --dry-run

  3. Execute the moves:

xmover recommend --prioritize-space --node data-hot-4 --execute

Monitoring Shard Recovery Operations

After executing shard moves, monitor the recovery progress:

  1. Execute moves and monitor recovery:

# Execute moves
xmover recommend --node data-hot-1 --execute

# Monitor the resulting recoveries
xmover monitor-recovery --watch

  2. Monitor specific table or node recovery:

# Monitor specific table
xmover monitor-recovery --table shipmentFormFieldData --watch

# Monitor specific node
xmover monitor-recovery --node data-hot-4 --watch

# Monitor including completed recoveries
xmover monitor-recovery --watch --include-transitioning

  3. Check recovery after node maintenance:

# After bringing a node back online
xmover monitor-recovery --node data-hot-3 --recovery-type DISK

Manual Shard Movement

  1. Validate the move first:

xmover validate-move SCHEMA.TABLE SHARD_ID FROM_NODE TO_NODE

  2. Generate safe recommendations:

xmover recommend --prioritize-space --execute

  3. Monitor shard health after moves

Troubleshooting Zone Conflicts

  1. Identify conflicts:

xmover zone-analysis --show-shards

  2. Generate targeted fixes:

xmover recommend --prioritize-zones --execute

Configuration

Environment Variables

  • CRATE_CONNECTION_STRING: CrateDB HTTP endpoint (required)

  • CRATE_USERNAME: Username for authentication (optional)

  • CRATE_PASSWORD: Password for authentication (optional)

  • CRATE_SSL_VERIFY: Enable SSL certificate verification (default: true)
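As an illustration of how these four variables fit together, the Python sketch below reads them from a mapping such as `os.environ`, enforces the one required value, and applies the documented default for SSL verification. The `CrateSettings` class, the `load_settings` function, and the accepted spellings for a false value are assumptions, not XMover's own configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CrateSettings:
    """Hypothetical container for the documented environment variables."""
    connection_string: str
    username: Optional[str]
    password: Optional[str]
    ssl_verify: bool


def load_settings(env: Mapping[str, str] = os.environ) -> CrateSettings:
    """Read the documented variables, failing if the required one is missing."""
    connection_string = env.get("CRATE_CONNECTION_STRING", "").strip()
    if not connection_string:
        raise ValueError("CRATE_CONNECTION_STRING is required")
    # Assumption: these spellings disable verification; anything else enables it.
    raw_verify = env.get("CRATE_SSL_VERIFY", "true").strip().lower()
    ssl_verify = raw_verify not in {"false", "0", "no", "off"}
    return CrateSettings(
        connection_string=connection_string,
        username=env.get("CRATE_USERNAME") or None,
        password=env.get("CRATE_PASSWORD") or None,
        ssl_verify=ssl_verify,
    )
```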

Connection String Format

https://hostname:port

The tool automatically appends /_sql to the endpoint.
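A small Python sketch of that behavior, assuming the suffix is added once and a trailing slash is tolerated (the function name is hypothetical):

```python
def sql_endpoint(connection_string: str) -> str:
    """Derive the SQL endpoint URL from a https://hostname:port connection string."""
    base = connection_string.rstrip("/")
    # Assumption: a connection string that already ends in /_sql is left alone.
    if base.endswith("/_sql"):
        return base
    return base + "/_sql"
```

So `https://your-cluster.cratedb.net:4200` becomes `https://your-cluster.cratedb.net:4200/_sql`, which means the value in CRATE_CONNECTION_STRING should not include the path itself.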

Safety Considerations

⚠️ Important Safety Notes:

  1. Always test in non-production environments first

  2. Monitor shard health after each move before proceeding with additional moves

  3. Ensure adequate cluster capacity before decommissioning nodes

  4. Verify zone distribution after rebalancing operations

  5. Keep backups current before performing large-scale moves

Troubleshooting

XMover provides comprehensive troubleshooting tools to help diagnose and resolve shard movement issues.

Quick Diagnosis Commands

# Validate a specific move before execution
xmover validate-move SCHEMA.TABLE SHARD_ID FROM_NODE TO_NODE

# Explain CrateDB error messages
xmover explain-error "your error message here"

# Check zone distribution for conflicts
xmover zone-analysis --show-shards

# Verify overall cluster health
xmover analyze

Common Issues and Solutions

  1. Zone Conflicts

    Error: "NO(a copy of this shard is already allocated to this node)"
    
    • Cause: Target node already has a copy of the shard

    • Solution: Use xmover zone-analysis --show-shards to find alternative targets

    • Prevention: Always use xmover validate-move before executing moves

  2. Zone Allocation Limits

    Error: "too many copies of the shard allocated to nodes with attribute [zone]"
    
    • Cause: CrateDB’s zone awareness prevents too many copies in the same zone

    • Solution: Move shard to a different availability zone

    • Prevention: Use xmover recommend which respects zone constraints

  3. Insufficient Space

    Error: "not enough disk space"
    
    • Cause: Target node lacks sufficient free space

    • Solution: Choose a node with more capacity or free up space

    • Check: xmover analyze to see available space per node

  4. High Disk Usage Blocking Moves

    Error: "Target node disk usage too high (85.3%)"
    
    • Cause: Target node exceeds default 85% disk usage threshold

    • Solution: Use --max-disk-usage to allow higher usage for urgent moves

    • Example: xmover recommend --max-disk-usage 90 --prioritize-space

  5. No Recommendations Generated

    • Cause: Cluster may already be well balanced

    • Solution: Adjust size filters or check xmover check-balance

    • Try: --prioritize-space mode for capacity-based moves
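The first two issues above (a copy already on the target node, and too many copies in one zone) can be checked before a move is attempted. The Python sketch below shows one way to express that check; it is a simplified illustration that allows at most one copy per zone, and `Copy` and `move_conflict` are hypothetical names rather than XMover internals.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Copy:
    """Hypothetical record for one existing copy (primary or replica) of a shard."""
    node: str
    zone: str


def move_conflict(
    copies: Iterable[Copy], from_node: str, to_node: str, to_zone: str
) -> Optional[str]:
    """Return a conflict label for a proposed move, or None if it looks safe."""
    # The copy being moved leaves its node, so it cannot conflict with itself.
    remaining = [c for c in copies if c.node != from_node]
    if any(c.node == to_node for c in remaining):
        return "node_conflict"   # target already holds a copy of this shard
    if any(c.zone == to_zone for c in remaining):
        return "zone_conflict"   # target zone already holds another copy
    return None
```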

Error Message Decoder

Use the built-in error decoder for complex CrateDB messages:

# Interactive mode - paste your error message
xmover explain-error

# Direct analysis
xmover explain-error "NO(a copy of this shard is already allocated to this node)"

Configurable Safety Thresholds

XMover uses configurable safety thresholds to prevent risky moves:

Disk Usage Threshold (default: 85%)

# Allow moves to nodes with higher disk usage
xmover recommend --max-disk-usage 90 --prioritize-space

# For urgent space relief
xmover validate-move <SCHEMA.TABLE> <SHARD_ID> <FROM> <TO> --max-disk-usage 95

When to Adjust Thresholds:

  • Emergency situations: Increase to 90-95% for critical space relief

  • Conservative operations: Decrease to 75-80% for safer moves

  • Staging environments: Can be more aggressive (90%+)

  • Production: Keep conservative (80-85%)
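The threshold check itself is simple arithmetic. The Python sketch below assumes the comparison is made against the target node's current disk usage, which is consistent with the error message quoted earlier but is not confirmed by this handbook; the function names are hypothetical.

```python
from typing import Optional


def disk_usage_percent(used_gb: float, total_gb: float) -> float:
    """Disk usage of a node as a percentage of its total capacity."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return used_gb / total_gb * 100


def check_target(used_gb: float, total_gb: float, max_disk_usage: float = 85.0) -> Optional[str]:
    """Return an error message if the target exceeds the threshold, else None."""
    percent = disk_usage_percent(used_gb, total_gb)
    if percent > max_disk_usage:
        return f"Target node disk usage too high ({percent:.1f}%)"
    return None
```

A node using 853 GB of 1000 GB sits at 85.3% and is rejected at the default of 85, but `check_target(853, 1000, max_disk_usage=90)` returns `None`, which mirrors what raising --max-disk-usage to 90 is meant to achieve.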

Advanced Troubleshooting

For detailed troubleshooting procedures, see "Troubleshooting CrateDB using XMover", which covers:

  • Step-by-step diagnostic procedures

  • Emergency recovery procedures

  • Best practices for safe operations

  • Complete error reference guide

Debug Information

All commands provide detailed safety validation messages and explanations for any issues detected.