XMover Handbook
Installation
Install using uv (recommended) or pip:
uv tool install cratedb-toolkit
# Alternatively use `pip`.
# pip install --user cratedb-toolkit
Create an .env file with your CrateDB connection details:
CRATE_CONNECTION_STRING=https://your-cluster.cratedb.net:4200
CRATE_USERNAME=your-username
CRATE_PASSWORD=your-password
CRATE_SSL_VERIFY=true
Quick Start
Test Connection
xmover test-connection
Analyze Cluster
# Complete cluster analysis
xmover analyze
# Analyze specific table
xmover analyze --table my_table
Find Movement Candidates
# Find shards that can be moved (40-60 GB by default)
xmover find-candidates
# Custom size range
xmover find-candidates --min-size 20 --max-size 100
Generate Recommendations
# Dry run (default) - shows what would be recommended
xmover recommend
# Generate actual SQL commands
xmover recommend --execute
# Prioritize space over zone balancing
xmover recommend --prioritize-space
Zone Analysis
# Check zone balance
xmover check-balance
# Detailed zone analysis with shard-level details
xmover zone-analysis --show-shards
Advanced Troubleshooting
# Validate specific moves before execution
xmover validate-move SCHEMA.TABLE SHARD_ID FROM_NODE TO_NODE
# Explain CrateDB error messages
xmover explain-error "your error message here"
Commands Reference
analyze
Analyzes current shard distribution across nodes and zones.
Options:
--table, -t: Analyze specific table only
Example:
xmover analyze --table events
find-candidates
Finds shards suitable for movement based on size and health criteria.
Options:
--table, -t: Find candidates in specific table only
--min-size: Minimum shard size in GB (default: 40)
--max-size: Maximum shard size in GB (default: 60)
--node: Only show candidates from this specific source node (e.g., data-hot-4)
Examples:
# Find candidates in size range for specific table
xmover find-candidates --min-size 20 --max-size 50 --table logs
# Find candidates on a specific node
xmover find-candidates --min-size 30 --max-size 60 --node data-hot-4
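The size, table, and node filters above, combined with the health criterion (STARTED routing state and 100% recovered), can be sketched in Python as follows. This is only an illustration of the selection logic, not XMover's implementation: the Shard record, its field names, and the find_candidates function are hypothetical, and treating both size bounds as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Shard:
    """Hypothetical shard record; field names are illustrative only."""
    table: str
    shard_id: int
    node: str
    size_gb: float
    routing_state: str        # e.g. "STARTED" or "RELOCATING"
    recovery_percent: float   # 0.0 to 100.0


def find_candidates(
    shards: List[Shard],
    min_size: float = 40.0,   # GB, mirrors the documented --min-size default
    max_size: float = 60.0,   # GB, mirrors the documented --max-size default
    table: Optional[str] = None,
    node: Optional[str] = None,
) -> List[Shard]:
    """Return healthy shards within the size range, optionally filtered."""
    result = []
    for shard in shards:
        # Health criterion: STARTED routing state and fully recovered.
        if shard.routing_state != "STARTED" or shard.recovery_percent < 100.0:
            continue
        # Assumption: both size bounds are inclusive.
        if not (min_size <= shard.size_gb <= max_size):
            continue
        if table is not None and shard.table != table:
            continue
        if node is not None and shard.node != node:
            continue
        result.append(shard)
    return result
```

Calling find_candidates(shards, node="data-hot-4") would correspond roughly to the --node example above, under these assumptions.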
recommend
Generates intelligent shard movement recommendations for cluster rebalancing.
Options:
--table, -t: Generate recommendations for specific table only
--min-size: Minimum shard size in GB (default: 40)
--max-size: Maximum shard size in GB (default: 60)
--zone-tolerance: Zone balance tolerance percentage (default: 10)
--min-free-space: Minimum free space required on target nodes in GB (default: 100)
--max-moves: Maximum number of move recommendations (default: 10)
--max-disk-usage: Maximum disk usage percentage for target nodes (default: 85)
--validate/--no-validate: Validate move safety (default: True)
--prioritize-space/--prioritize-zones: Prioritize available space over zone balancing (default: False)
--dry-run/--execute: Show what would be done without generating SQL commands (default: True)
--node: Only recommend moves from this specific source node (e.g., data-hot-4)
Examples:
# Dry run with zone balancing priority
xmover recommend --prioritize-zones
# Generate SQL for space optimization
xmover recommend --prioritize-space --execute
# Focus on specific table with custom parameters
xmover recommend --table events --min-size 10 --max-size 30 --execute
# Target space relief for a specific node
xmover recommend --prioritize-space --min-size 30 --max-size 60 --node data-hot-4
# Allow higher disk usage for urgent moves
xmover recommend --prioritize-space --max-disk-usage 90
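With --execute, recommend generates SQL commands. The handbook does not show their exact text, so the following Python sketch only illustrates what a quoted shard-move statement could look like, assuming it is based on CrateDB's ALTER TABLE ... REROUTE MOVE SHARD statement. The helper names (quote_ident, quote_literal, move_shard_sql) and the example schema "doc" are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def move_shard_sql(schema: str, table: str, shard_id: int,
                   from_node: str, to_node: str) -> str:
    """Build a shard-move statement (assumed format, not XMover's verified output)."""
    return (
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} "
        f"REROUTE MOVE SHARD {int(shard_id)} "
        f"FROM {quote_literal(from_node)} TO {quote_literal(to_node)};"
    )


print(move_shard_sql("doc", "events", 4, "data-hot-1", "data-hot-3"))
```

Quoting the schema and table separately keeps mixed-case or otherwise unusual names intact.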
zone-analysis
Provides detailed analysis of zone distribution and potential conflicts.
Options:
--table, -t: Analyze zones for specific table only
--show-shards/--no-show-shards: Show individual shard details (default: False)
Example:
xmover zone-analysis --show-shards --table critical_data
check-balance
Checks zone balance for shards with configurable tolerance.
Options:
--table, -t: Check balance for specific table only
--tolerance: Zone balance tolerance percentage (default: 10)
Example:
xmover check-balance --tolerance 15
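The handbook does not define how the tolerance percentage is applied. One plausible reading, sketched below in Python, is that each zone's shard count may deviate from the per-zone average by at most the tolerance. Both the metric and the function names (zone_deviations, is_balanced) are assumptions made for illustration; XMover may compute balance differently.

```python
from typing import Dict


def zone_deviations(shards_per_zone: Dict[str, int]) -> Dict[str, float]:
    """Percentage deviation of each zone's shard count from the per-zone average."""
    total = sum(shards_per_zone.values())
    if not shards_per_zone or total == 0:
        return {}
    target = total / len(shards_per_zone)
    return {
        zone: (count - target) / target * 100.0
        for zone, count in shards_per_zone.items()
    }


def is_balanced(shards_per_zone: Dict[str, int], tolerance: float = 10.0) -> bool:
    """True if every zone is within +/- tolerance percent of the average (assumed rule)."""
    deviations = zone_deviations(shards_per_zone)
    return all(abs(dev) <= tolerance for dev in deviations.values())
```

Under this reading, zone counts of 100, 112, and 88 would fail the default tolerance of 10 but pass with --tolerance 15.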
validate-move
Validates a specific shard move before execution to prevent errors.
Arguments:
SCHEMA_TABLE: Schema and table name (format: schema.table)
SHARD_ID: Shard ID to move
FROM_NODE: Source node name
TO_NODE: Target node name
Examples:
# Standard validation
xmover validate-move CUROV.maddoxxxS 4 data-hot-1 data-hot-3
# Allow higher disk usage for urgent moves
xmover validate-move CUROV.tendedero 4 data-hot-1 data-hot-3 --max-disk-usage 90
explain-error
Explains CrateDB allocation error messages and provides troubleshooting guidance.
Arguments:
ERROR_MESSAGE: The CrateDB error message to analyze (optional; can be provided interactively)
Examples:
# Interactive mode
xmover explain-error
# Direct analysis
xmover explain-error "NO(a copy of this shard is already allocated to this node)"
monitor-recovery
Monitors active shard recovery operations on the cluster.
Options:
--table, -t: Monitor recovery for specific table only
--node, -n: Monitor recovery on specific node only
--watch, -w: Continuously monitor (refresh every 10s)
--refresh-interval: Refresh interval for watch mode in seconds (default: 10)
--recovery-type: Filter by recovery type - PEER, DISK, or all (default: all)
--include-transitioning: Include recently completed recoveries (DONE stage)
Examples:
# Check current recovery status
xmover monitor-recovery
# Monitor specific table recoveries
xmover monitor-recovery --table PartioffD
# Continuous monitoring with custom refresh rate
xmover monitor-recovery --watch --refresh-interval 5
# Monitor only PEER recoveries on specific node
xmover monitor-recovery --node data-hot-1 --recovery-type PEER
# Include completed recoveries still transitioning
xmover monitor-recovery --watch --include-transitioning
Recovery Types:
PEER: Copying shard data from another node (replication/relocation)
DISK: Rebuilding shard from local data (after restart/disk issues)
test-connection
Tests the connection to CrateDB and displays basic cluster information.
Operation Modes
Analysis vs Operational Views
XMover provides two distinct views of your cluster:
Analysis View (analyze, zone-analysis): Includes ALL shards regardless of state for complete cluster visibility
Operational View (find-candidates, recommend): Only includes healthy shards (STARTED + 100% recovered) for safe operations
Prioritization Modes
When generating recommendations, you can choose between two prioritization strategies:
Zone Balancing Priority (default): Focuses on achieving optimal zone distribution first, then considers available space
Space Priority: Prioritizes moving shards to nodes with more available space, regardless of zone balance
Safety Features
Zone Conflict Detection: Prevents moves that would place multiple copies of the same shard in the same zone
Capacity Validation: Ensures target nodes have sufficient free space
Health Checks: Only operates on healthy shards (STARTED routing state + 100% recovery)
SQL Quoting: Properly quotes schema and table names in generated SQL commands
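As an illustration of the zone conflict check listed above, the following Python sketch flags a move when the target node, or another node in the target's zone, already holds a copy of the same shard. It is a simplified model, not XMover's code: the function name has_zone_conflict, the copy_nodes and node_zone inputs, and the zone labels used in the example are all hypothetical.

```python
from typing import Dict, List


def has_zone_conflict(
    copy_nodes: List[str],      # nodes currently holding a copy of the shard
    from_node: str,             # node the copy would be moved away from
    to_node: str,               # proposed target node
    node_zone: Dict[str, str],  # node name -> zone label
) -> bool:
    """Return True if the move would put two copies on one node or in one zone."""
    # Copies that stay where they are once the move has completed.
    remaining = [node for node in copy_nodes if node != from_node]
    if to_node in remaining:
        return True  # the target node already has a copy of this shard
    target_zone = node_zone[to_node]
    return any(node_zone[node] == target_zone for node in remaining)


# Hypothetical cluster layout for demonstration purposes.
zones = {"data-hot-1": "a", "data-hot-2": "b", "data-hot-3": "b", "data-hot-4": "c"}
print(has_zone_conflict(["data-hot-1", "data-hot-2"], "data-hot-1", "data-hot-3", zones))
```

In this layout, moving the copy from data-hot-1 to data-hot-3 would conflict because data-hot-2 already holds a copy in the same zone, while data-hot-4 would be an acceptable target.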
Example Workflows
Regular Cluster Maintenance
Analyze current state:
xmover analyze
Check for zone imbalances:
xmover check-balance
Generate and review recommendations:
xmover recommend --dry-run
Execute safe moves:
xmover recommend --execute
Targeted Node Relief
When a specific node is running low on space:
Check which node needs relief:
xmover analyze
Generate recommendations for that specific node:
xmover recommend --prioritize-space --node data-hot-4 --dry-run
Execute the moves:
xmover recommend --prioritize-space --node data-hot-4 --execute
Monitoring Shard Recovery Operations
After executing shard moves, monitor the recovery progress:
Execute moves and monitor recovery:
# Execute moves
xmover recommend --node data-hot-1 --execute
# Monitor the resulting recoveries
xmover monitor-recovery --watch
Monitor specific table or node recovery:
# Monitor specific table
xmover monitor-recovery --table shipmentFormFieldData --watch
# Monitor specific node
xmover monitor-recovery --node data-hot-4 --watch
# Monitor including completed recoveries
xmover monitor-recovery --watch --include-transitioning
Check recovery after node maintenance:
# After bringing a node back online
xmover monitor-recovery --node data-hot-3 --recovery-type DISK
Manual Shard Movement
Validate the move first:
xmover validate-move SCHEMA.TABLE SHARD_ID FROM_NODE TO_NODE
Generate safe recommendations:
xmover recommend --prioritize-space --execute
Monitor shard health after moves
Troubleshooting Zone Conflicts
Identify conflicts:
xmover zone-analysis --show-shards
Generate targeted fixes:
xmover recommend --prioritize-zones --execute
Configuration
Environment Variables
CRATE_CONNECTION_STRING: CrateDB HTTP endpoint (required)
CRATE_USERNAME: Username for authentication (optional)
CRATE_PASSWORD: Password for authentication (optional)
CRATE_SSL_VERIFY: Enable SSL certificate verification (default: true)
Connection String Format
https://hostname:port
The tool automatically appends /_sql to the endpoint.
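A minimal Python sketch of how a base URL could be turned into the SQL endpoint is shown below. The function name build_sql_endpoint is hypothetical, and the handling of trailing slashes and of an already present /_sql suffix is an assumption rather than documented XMover behavior.

```python
def build_sql_endpoint(connection_string: str) -> str:
    """Return the HTTP SQL endpoint for a base URL such as https://hostname:port."""
    base = connection_string.rstrip("/")
    # Assumption: do not append the suffix twice if it is already present.
    if base.endswith("/_sql"):
        return base
    return base + "/_sql"


print(build_sql_endpoint("https://your-cluster.cratedb.net:4200"))
```

This is why CRATE_CONNECTION_STRING only needs the scheme, hostname, and port.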
Safety Considerations
⚠️ Important Safety Notes:
Always test in non-production environments first
Monitor shard health after each move before proceeding with additional moves
Ensure adequate cluster capacity before decommissioning nodes
Verify zone distribution after rebalancing operations
Keep backups current before performing large-scale moves
Troubleshooting
XMover includes commands for diagnosing and resolving shard movement issues.
Quick Diagnosis Commands
# Validate a specific move before execution
xmover validate-move SCHEMA.TABLE SHARD_ID FROM_NODE TO_NODE
# Explain CrateDB error messages
xmover explain-error "your error message here"
# Check zone distribution for conflicts
xmover zone-analysis --show-shards
# Verify overall cluster health
xmover analyze
Common Issues and Solutions
Zone Conflicts
Error: "NO(a copy of this shard is already allocated to this node)"
Cause: Target node already has a copy of the shard
Solution: Use xmover zone-analysis --show-shards to find alternative targets
Prevention: Always use xmover validate-move before executing moves
Zone Allocation Limits
Error: "too many copies of the shard allocated to nodes with attribute [zone]"
Cause: CrateDB’s zone awareness prevents too many copies in the same zone
Solution: Move the shard to a different availability zone
Prevention: Use xmover recommend, which respects zone constraints
Insufficient Space
Error: "not enough disk space"
Cause: Target node lacks sufficient free space
Solution: Choose a node with more capacity or free up space
Check: Run xmover analyze to see available space per node
High Disk Usage Blocking Moves
Error: "Target node disk usage too high (85.3%)"
Cause: Target node exceeds the default 85% disk usage threshold
Solution: Use --max-disk-usage to allow higher usage for urgent moves
Example:
xmover recommend --max-disk-usage 90 --prioritize-space
No Recommendations Generated
Cause: Cluster may already be well balanced
Solution: Adjust size filters or check xmover check-balance
Try: --prioritize-space mode for capacity-based moves
Error Message Decoder
Use the built-in error decoder for complex CrateDB messages:
# Interactive mode - paste your error message
xmover explain-error
# Direct analysis
xmover explain-error "NO(a copy of this shard is already allocated to this node)"
Configurable Safety Thresholds
XMover uses configurable safety thresholds to prevent risky moves:
Disk Usage Threshold (default: 85%)
# Allow moves to nodes with higher disk usage
xmover recommend --max-disk-usage 90 --prioritize-space
# For urgent space relief
xmover validate-move SCHEMA.TABLE SHARD_ID FROM_NODE TO_NODE --max-disk-usage 95
When to Adjust Thresholds:
Emergency situations: Increase to 90-95% for critical space relief
Conservative operations: Decrease to 75-80% for safer moves
Staging environments: Can be more aggressive (90%+)
Production: Keep conservative (80-85%)
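To see how a threshold choice plays out, the following Python sketch computes the disk usage a target node would reach after receiving a shard and compares it with the threshold. The handbook does not say whether XMover checks current or projected usage, so using the projected value here is an assumption, and the function names projected_disk_usage and target_allowed are hypothetical.

```python
def projected_disk_usage(used_gb: float, total_gb: float, shard_size_gb: float) -> float:
    """Disk usage in percent after adding a shard of the given size to the node."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return (used_gb + shard_size_gb) * 100.0 / total_gb


def target_allowed(
    used_gb: float,
    total_gb: float,
    shard_size_gb: float,
    max_disk_usage: float = 85.0,  # mirrors the documented --max-disk-usage default
) -> bool:
    """True if the node stays at or below the threshold after the move (assumed rule)."""
    return projected_disk_usage(used_gb, total_gb, shard_size_gb) <= max_disk_usage
```

For a 1000 GB node with 830 GB used, a 50 GB shard would bring usage to about 88%, which this check rejects at the default threshold and accepts with a threshold of 90.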
Advanced Troubleshooting
For detailed troubleshooting procedures, see Troubleshooting CrateDB using XMover, which covers:
Step-by-step diagnostic procedures
Emergency recovery procedures
Best practices for safe operations
Complete error reference guide
Debug Information
All commands provide detailed safety validation messages and explanations for any issues detected.