XMover Handbook¶
Installation¶
Install using uv (recommended) or pip:
uv tool install cratedb-toolkit
# Alternatively use `pip`.
# pip install --user cratedb-toolkit
Create a .env file with your CrateDB connection details:
CRATE_CONNECTION_STRING=https://your-cluster.cratedb.net:4200
CRATE_USERNAME=your-username
CRATE_PASSWORD=your-password
CRATE_SSL_VERIFY=true
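Helper scripts that sit next to XMover can read the same file with a few lines of Python. The sketch below is a minimal KEY=VALUE parser written for illustration; it is not the loader XMover itself uses, and `parse_env` and `as_bool` are hypothetical helper names.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def as_bool(value: str) -> bool:
    """Interpret common truthy spellings (an assumption, not XMover's rule)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


example = """\
CRATE_CONNECTION_STRING=https://your-cluster.cratedb.net:4200
CRATE_USERNAME=your-username
CRATE_PASSWORD=your-password
CRATE_SSL_VERIFY=true
"""
settings = parse_env(example)
verify_ssl = as_bool(settings["CRATE_SSL_VERIFY"])
```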
Quick Start¶
Test Connection¶
xmover test-connection
Analyze Cluster¶
# Complete cluster analysis
xmover analyze
# Analyze specific table
xmover analyze --table my_table
Find Movement Candidates¶
# Find shards that can be moved (40-60GB by default)
xmover find-candidates
# Custom size range
xmover find-candidates --min-size 20 --max-size 100
Generate Recommendations¶
# Dry run (default) - shows what would be recommended
xmover recommend
# Generate actual SQL commands
xmover recommend --execute
# Prioritize space over zone balancing
xmover recommend --prioritize-space
Zone Analysis¶
# Check zone balance
xmover check-balance
# Detailed zone analysis with shard-level details
xmover zone-analysis --show-shards
Advanced Troubleshooting¶
# Validate specific moves before execution
xmover validate-move SCHEMA.TABLE SHARD_ID FROM_NODE TO_NODE
# Explain CrateDB error messages
xmover explain-error "your error message here"
Commands Reference¶
analyze¶
Analyzes current shard distribution across nodes and zones.
Options:
- --table, -t: Analyze specific table only
Example:
xmover analyze --table events
find-candidates¶
Finds shards suitable for movement based on size and health criteria.
Options:
- --table, -t: Find candidates in specific table only
- --min-size: Minimum shard size in GB (default: 40)
- --max-size: Maximum shard size in GB (default: 60)
- --node: Only show candidates from this specific source node (e.g., data-hot-4)
Examples:
# Find candidates in size range for specific table
xmover find-candidates --min-size 20 --max-size 50 --table logs
# Find candidates on a specific node
xmover find-candidates --min-size 30 --max-size 60 --node data-hot-4
recommend¶
Generates intelligent shard movement recommendations for cluster rebalancing.
Options:
- --table, -t: Generate recommendations for specific table only
- --min-size: Minimum shard size in GB (default: 40)
- --max-size: Maximum shard size in GB (default: 60)
- --zone-tolerance: Zone balance tolerance percentage (default: 10)
- --min-free-space: Minimum free space required on target nodes in GB (default: 100)
- --max-moves: Maximum number of move recommendations (default: 10)
- --max-disk-usage: Maximum disk usage percentage for target nodes (default: 85)
- --validate/--no-validate: Validate move safety (default: True)
- --prioritize-space/--prioritize-zones: Prioritize available space or zone balancing (default: --prioritize-zones)
- --dry-run/--execute: With --dry-run, show what would be recommended without generating SQL commands; with --execute, generate them (default: --dry-run)
- --node: Only recommend moves from this specific source node (e.g., data-hot-4)
Examples:
# Dry run with zone balancing priority
xmover recommend --prioritize-zones
# Generate SQL for space optimization
xmover recommend --prioritize-space --execute
# Focus on specific table with custom parameters
xmover recommend --table events --min-size 10 --max-size 30 --execute
# Target space relief for a specific node
xmover recommend --prioritize-space --min-size 30 --max-size 60 --node data-hot-4
# Allow higher disk usage for urgent moves
xmover recommend --prioritize-space --max-disk-usage 90
zone-analysis¶
Provides detailed analysis of zone distribution and potential conflicts.
Options:
- --table, -t: Analyze zones for specific table only
- --show-shards/--no-show-shards: Show individual shard details (default: False)
Example:
xmover zone-analysis --show-shards --table critical_data
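To make "potential conflicts" concrete: a conflict here is taken to mean two copies of the same shard (primary or replica) located in the same zone. The sketch below finds such cases in a hand-written list. The `ShardCopy` record, its field names, and the sample data are illustrative assumptions, not XMover's data model or output.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class ShardCopy(NamedTuple):
    table: str      # schema-qualified table name
    shard_id: int
    node: str
    zone: str


def find_zone_conflicts(copies: List[ShardCopy]) -> Dict[Tuple[str, int], List[str]]:
    """Map (table, shard_id) to the zones that hold more than one copy."""
    zones_by_shard = defaultdict(list)
    for copy in copies:
        zones_by_shard[(copy.table, copy.shard_id)].append(copy.zone)

    conflicts = {}
    for key, zones in zones_by_shard.items():
        duplicated = sorted({z for z in zones if zones.count(z) > 1})
        if duplicated:
            conflicts[key] = duplicated
    return conflicts


copies = [
    ShardCopy("doc.events", 0, "data-hot-1", "zone-a"),
    ShardCopy("doc.events", 0, "data-hot-2", "zone-a"),  # same zone as above
    ShardCopy("doc.events", 1, "data-hot-1", "zone-a"),
    ShardCopy("doc.events", 1, "data-hot-3", "zone-b"),
]
conflicts = find_zone_conflicts(copies)
```

In this sample, only shard 0 of doc.events is reported, because both of its copies sit in zone-a.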
check-balance¶
Checks zone balance for shards with configurable tolerance.
Options:
- --table, -t: Check balance for specific table only
- --tolerance: Zone balance tolerance percentage (default: 10)
Example:
xmover check-balance --tolerance 15
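One way to read a tolerance percentage is as the allowed deviation of each zone's shard count from the mean across zones. The sketch below works under that assumption; the handbook does not state the exact formula XMover applies, and `zone_deviation` and `is_balanced` are hypothetical names.

```python
from typing import Dict


def zone_deviation(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage deviation of each zone's shard count from the mean."""
    mean = sum(counts.values()) / len(counts)
    if mean == 0:
        return {zone: 0.0 for zone in counts}
    return {zone: (n - mean) * 100 / mean for zone, n in counts.items()}


def is_balanced(counts: Dict[str, int], tolerance: float = 10.0) -> bool:
    """True when every zone stays within +/- tolerance percent of the mean."""
    return all(abs(d) <= tolerance for d in zone_deviation(counts).values())


shards_per_zone = {"zone-a": 100, "zone-b": 112, "zone-c": 88}
balanced_default = is_balanced(shards_per_zone)          # 12% off the mean
balanced_relaxed = is_balanced(shards_per_zone, 15.0)    # within 15%
```

With these sample counts the cluster fails the default 10% tolerance but passes at 15%, which is the kind of difference the --tolerance option controls.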
validate-move¶
Validates a specific shard move before execution to prevent errors.
Arguments:
- SCHEMA_TABLE: Schema and table name (format: schema.table)
- SHARD_ID: Shard ID to move
- FROM_NODE: Source node name
- TO_NODE: Target node name
Examples:
# Standard validation
xmover validate-move CUROV.maddoxxxS 4 data-hot-1 data-hot-3
# Allow higher disk usage for urgent moves
xmover validate-move CUROV.tendedero 4 data-hot-1 data-hot-3 --max-disk-usage 90
explain-error¶
Explains CrateDB allocation error messages and provides troubleshooting guidance.
Arguments:
- ERROR_MESSAGE: The CrateDB error message to analyze (optional - can be provided interactively)
Examples:
# Interactive mode
xmover explain-error
# Direct analysis
xmover explain-error "NO(a copy of this shard is already allocated to this node)"
monitor-recovery¶
Monitors active shard recovery operations on the cluster.
Options:
- --table, -t: Monitor recovery for specific table only
- --node, -n: Monitor recovery on specific node only
- --watch, -w: Continuously monitor (refresh every 10s)
- --refresh-interval: Refresh interval for watch mode in seconds (default: 10)
- --recovery-type: Filter by recovery type - PEER, DISK, or all (default: all)
- --include-transitioning: Include recently completed recoveries (DONE stage)
Examples:
# Check current recovery status
xmover monitor-recovery
# Monitor specific table recoveries
xmover monitor-recovery --table PartioffD
# Continuous monitoring with custom refresh rate
xmover monitor-recovery --watch --refresh-interval 5
# Monitor only PEER recoveries on specific node
xmover monitor-recovery --node data-hot-1 --recovery-type PEER
# Include completed recoveries still transitioning
xmover monitor-recovery --watch --include-transitioning
Recovery Types:
- PEER: Copying shard data from another node (replication/relocation) 
- DISK: Rebuilding shard from local data (after restart/disk issues) 
test-connection¶
Tests the connection to CrateDB and displays basic cluster information.
Operation Modes¶
Analysis vs Operational Views¶
XMover provides two distinct views of your cluster:
- Analysis View (analyze, zone-analysis): Includes ALL shards regardless of state for complete cluster visibility
- Operational View (find-candidates, recommend): Only includes healthy shards (STARTED + 100% recovered) for safe operations
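The difference between the two views can be sketched as a filter. The example below keeps every shard for the analysis view and applies the health and size criteria for the operational view. The `Shard` record and its field names are assumptions made for illustration, and treating the size bounds as inclusive is also an assumption.

```python
from typing import List, NamedTuple


class Shard(NamedTuple):
    table: str
    shard_id: int
    node: str
    size_gb: float
    routing_state: str       # e.g. "STARTED", "RELOCATING", "INITIALIZING"
    recovery_percent: float


def is_healthy(shard: Shard) -> bool:
    """Healthy as described above: STARTED and fully recovered."""
    return shard.routing_state == "STARTED" and shard.recovery_percent >= 100.0


def move_candidates(shards: List[Shard],
                    min_size: float = 40.0,
                    max_size: float = 60.0) -> List[Shard]:
    """Operational view: healthy shards inside the size range."""
    return [s for s in shards
            if is_healthy(s) and min_size <= s.size_gb <= max_size]


shards = [
    Shard("doc.events", 0, "data-hot-1", 45.0, "STARTED", 100.0),
    Shard("doc.events", 1, "data-hot-1", 52.0, "RELOCATING", 100.0),
    Shard("doc.events", 2, "data-hot-2", 48.0, "INITIALIZING", 63.5),
    Shard("doc.logs", 0, "data-hot-2", 12.0, "STARTED", 100.0),
]
analysis_view = shards                        # everything, regardless of state
operational_view = move_candidates(shards)    # only safe-to-move shards
```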
Prioritization Modes¶
When generating recommendations, you can choose between two prioritization strategies:
- Zone Balancing Priority (default): Focuses on achieving optimal zone distribution first, then considers available space 
- Space Priority: Prioritizes moving shards to nodes with more available space, regardless of zone balance 
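The two strategies can be pictured as two sort orders over candidate target nodes. The sketch below is only an illustration of that idea under assumed inputs. XMover's actual scoring is not described in this handbook, and `Target` and `rank_targets` are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class Target(NamedTuple):
    node: str
    zone: str
    free_gb: float


def rank_targets(targets: List[Target],
                 shards_per_zone: Dict[str, int],
                 prioritize_space: bool = False) -> List[Target]:
    """Order candidate targets according to the chosen strategy."""
    if prioritize_space:
        # Space priority: most free space first, zone balance ignored.
        return sorted(targets, key=lambda t: -t.free_gb)
    # Zone priority: least-loaded zone first, free space as the tie-breaker.
    return sorted(targets,
                  key=lambda t: (shards_per_zone.get(t.zone, 0), -t.free_gb))


targets = [
    Target("data-hot-1", "zone-a", 500.0),
    Target("data-hot-2", "zone-b", 200.0),
    Target("data-hot-3", "zone-b", 300.0),
]
shards_per_zone = {"zone-a": 12, "zone-b": 8}

by_zone = [t.node for t in rank_targets(targets, shards_per_zone)]
by_space = [t.node for t in rank_targets(targets, shards_per_zone, True)]
```

Here zone priority prefers the nodes in the lighter zone-b, while space priority picks data-hot-1 first because it has the most free space.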
Safety Features¶
- Zone Conflict Detection: Prevents moves that would place multiple copies of the same shard in the same zone 
- Capacity Validation: Ensures target nodes have sufficient free space 
- Health Checks: Only operates on healthy shards (STARTED routing state + 100% recovery) 
- SQL Quoting: Properly quotes schema and table names in generated SQL commands 
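Quoting matters because CrateDB folds unquoted identifiers to lowercase, so a mixed-case table name such as shipmentFormFieldData only resolves when it is double-quoted. The sketch below builds a move statement using CrateDB's ALTER TABLE ... REROUTE MOVE SHARD syntax. The helper names are hypothetical, and the exact text XMover emits may differ.

```python
def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote a string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def move_shard_sql(schema: str, table: str, shard_id: int,
                   from_node: str, to_node: str) -> str:
    """Build a shard move statement with quoted schema and table names."""
    return (
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} "
        f"REROUTE MOVE SHARD {int(shard_id)} "
        f"FROM {quote_literal(from_node)} TO {quote_literal(to_node)};"
    )


statement = move_shard_sql("doc", "shipmentFormFieldData", 4,
                           "data-hot-1", "data-hot-3")
print(statement)
```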
Example Workflows¶
Regular Cluster Maintenance¶
- Analyze current state: 
xmover analyze
- Check for zone imbalances: 
xmover check-balance
- Generate and review recommendations: 
xmover recommend --dry-run
- Generate SQL commands for safe moves: 
xmover recommend --execute
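The read-only part of this routine lends itself to scripting. The sketch below runs the first three steps in order and stops at the first failure, using only the commands listed above. It assumes `xmover` is on the PATH, and the `run_steps` helper is a hypothetical name. The final `--execute` step is deliberately left out so that a person reviews the dry-run output first.

```python
import shlex
import subprocess
from typing import Callable, List

# Read-only steps from the workflow above, in order.
STEPS: List[List[str]] = [
    ["xmover", "analyze"],
    ["xmover", "check-balance"],
    ["xmover", "recommend", "--dry-run"],
]


def run_steps(steps: List[List[str]],
              runner: Callable = subprocess.run) -> None:
    """Run each command in turn; check=True aborts on a non-zero exit code."""
    for argv in steps:
        print("$", shlex.join(argv))
        runner(argv, check=True)
```

Calling `run_steps(STEPS)` performs the checks. Passing a different `runner` makes it possible to log or test the sequence without touching a cluster.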
Targeted Node Relief¶
When a specific node is running low on space:
- Check which node needs relief: 
xmover analyze
- Generate recommendations for that specific node: 
xmover recommend --prioritize-space --node data-hot-4 --dry-run
- Generate SQL commands for the moves: 
xmover recommend --prioritize-space --node data-hot-4 --execute
Monitoring Shard Recovery Operations¶
After executing shard moves, monitor the recovery progress:
- Execute moves and monitor recovery: 
# Execute moves
xmover recommend --node data-hot-1 --execute
# Monitor the resulting recoveries
xmover monitor-recovery --watch
- Monitor specific table or node recovery: 
# Monitor specific table
xmover monitor-recovery --table shipmentFormFieldData --watch
# Monitor specific node
xmover monitor-recovery --node data-hot-4 --watch
# Monitor including completed recoveries
xmover monitor-recovery --watch --include-transitioning
- Check recovery after node maintenance: 
# After bringing a node back online
xmover monitor-recovery --node data-hot-3 --recovery-type DISK
Manual Shard Movement¶
- Validate the move first: 
xmover validate-move SCHEMA.TABLE SHARD_ID FROM_NODE TO_NODE
- Generate safe recommendations: 
xmover recommend --prioritize-space --execute
- Monitor shard health after moves 
Troubleshooting Zone Conflicts¶
- Identify conflicts: 
xmover zone-analysis --show-shards
- Generate targeted fixes: 
xmover recommend --prioritize-zones --execute
Configuration¶
Environment Variables¶
- CRATE_CONNECTION_STRING: CrateDB HTTP endpoint (required)
- CRATE_USERNAME: Username for authentication (optional)
- CRATE_PASSWORD: Password for authentication (optional)
- CRATE_SSL_VERIFY: Enable SSL certificate verification (default: true)
Connection String Format¶
https://hostname:port
The tool automatically appends /_sql to the endpoint.
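For reference, the resulting URL is CrateDB's HTTP endpoint, which accepts a POST request with a JSON body of the form {"stmt": "..."}. The sketch below builds such a request with the standard library without sending it. The helper names are hypothetical, and stripping a trailing slash is an assumption about how the connection string is normalized.

```python
import base64
import json
import urllib.request
from typing import Optional


def sql_endpoint(connection_string: str) -> str:
    """Append /_sql to the base URL, tolerating a trailing slash."""
    return connection_string.rstrip("/") + "/_sql"


def build_request(connection_string: str, stmt: str,
                  username: Optional[str] = None,
                  password: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying one SQL statement."""
    headers = {"Content-Type": "application/json"}
    if username:
        credentials = f"{username}:{password or ''}".encode()
        headers["Authorization"] = "Basic " + base64.b64encode(credentials).decode()
    body = json.dumps({"stmt": stmt}).encode()
    return urllib.request.Request(sql_endpoint(connection_string),
                                  data=body, headers=headers, method="POST")


request = build_request("https://hostname:4200", "SELECT 1",
                        username="your-username", password="your-password")
```

Sending it with `urllib.request.urlopen(request)` requires a reachable cluster, so that step is left out here.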
Safety Considerations¶
⚠️ Important Safety Notes:
- Always test in non-production environments first 
- Monitor shard health after each move before proceeding with additional moves 
- Ensure adequate cluster capacity before decommissioning nodes 
- Verify zone distribution after rebalancing operations 
- Keep backups current before performing large-scale moves 
Troubleshooting¶
XMover provides comprehensive troubleshooting tools to help diagnose and resolve shard movement issues.
Quick Diagnosis Commands¶
# Validate a specific move before execution
xmover validate-move SCHEMA.TABLE SHARD_ID FROM_NODE TO_NODE
# Explain CrateDB error messages
xmover explain-error "your error message here"
# Check zone distribution for conflicts
xmover zone-analysis --show-shards
# Verify overall cluster health
xmover analyze
Common Issues and Solutions¶
- Zone Conflicts
  - Error: "NO(a copy of this shard is already allocated to this node)"
  - Cause: Target node already has a copy of the shard
  - Solution: Use xmover zone-analysis --show-shards to find alternative targets
  - Prevention: Always use xmover validate-move before executing moves
- Zone Allocation Limits
  - Error: "too many copies of the shard allocated to nodes with attribute [zone]"
  - Cause: CrateDB’s zone awareness prevents too many copies in same zone
  - Solution: Move shard to a different availability zone
  - Prevention: Use xmover recommend which respects zone constraints
- Insufficient Space
  - Error: "not enough disk space"
  - Cause: Target node lacks sufficient free space
  - Solution: Choose node with more capacity or free up space
  - Check: xmover analyze to see available space per node
- High Disk Usage Blocking Moves
  - Error: "Target node disk usage too high (85.3%)"
  - Cause: Target node exceeds default 85% disk usage threshold
  - Solution: Use --max-disk-usage to allow higher usage for urgent moves
  - Example: xmover recommend --max-disk-usage 90 --prioritize-space
- No Recommendations Generated
  - Cause: Cluster may already be well balanced
  - Solution: Adjust size filters or check xmover check-balance
  - Try: --prioritize-space mode for capacity-based moves
Error Message Decoder¶
Use the built-in error decoder for complex CrateDB messages:
# Interactive mode - paste your error message
xmover explain-error
# Direct analysis
xmover explain-error "NO(a copy of this shard is already allocated to this node)"
Configurable Safety Thresholds¶
XMover uses configurable safety thresholds to prevent risky moves:
Disk Usage Threshold (default: 85%)
# Allow moves to nodes with higher disk usage
xmover recommend --max-disk-usage 90 --prioritize-space
# For urgent space relief
xmover validate-move <SCHEMA.TABLE> <SHARD_ID> <FROM> <TO> --max-disk-usage 95
When to Adjust Thresholds:
- Emergency situations: Increase to 90-95% for critical space relief 
- Conservative operations: Decrease to 75-80% for safer moves 
- Staging environments: Can be more aggressive (90%+) 
- Production: Keep conservative (80-85%) 
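The effect of these thresholds can be sketched as a simple check on a target node. The example below compares the node's current disk usage against `max_disk_usage` and its free space against a minimum, using the documented defaults of 85% and 100 GB. Whether XMover evaluates current or projected usage is not stated here, so the logic, the `check_target` name, and the message wording are assumptions.

```python
from typing import List


def check_target(used_gb: float, total_gb: float,
                 max_disk_usage: float = 85.0,
                 min_free_gb: float = 100.0) -> List[str]:
    """Return the reasons a target node would be rejected (empty if none)."""
    problems = []
    usage = used_gb * 100 / total_gb
    if usage > max_disk_usage:
        problems.append(f"disk usage too high ({usage:.1f}%)")
    free_gb = total_gb - used_gb
    if free_gb < min_free_gb:
        problems.append(f"free space too low ({free_gb:.0f} GB)")
    return problems


default_check = check_target(853, 1000)                      # 85.3% used
relaxed_check = check_target(853, 1000, max_disk_usage=90)   # urgent move
```

A node at 85.3% usage is rejected under the default threshold and accepted once the limit is raised to 90%, which mirrors the --max-disk-usage examples above.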
Advanced Troubleshooting¶
For detailed troubleshooting procedures, see Troubleshooting CrateDB using XMover, which covers:
- Step-by-step diagnostic procedures 
- Emergency recovery procedures 
- Best practices for safe operations 
- Complete error reference guide 
Debug Information¶
All commands provide detailed safety validation messages and explanations for any issues detected.