Changelog
Unreleased
- Settings: Stop flagging `gateway.recover_after_time` as a difference when both `gateway.expected_nodes` and `gateway.expected_data_nodes` are unset (`-1`).
- Admin: Added XMover - CrateDB shard analyzer and movement tool. Thanks, @WalBeh. 
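The settings rule in this release can be sketched as follows. This is a minimal illustration, not the toolkit's implementation: the function name `should_flag_recover_after_time` and the plain-dictionary input are assumptions made for the example.

```python
UNSET = -1  # value that marks both settings as "unset" in the entry above


def should_flag_recover_after_time(settings: dict) -> bool:
    """Decide whether a deviating gateway.recover_after_time should be reported.

    Hypothetical helper: `settings` maps setting names to their current values.
    """
    expected_nodes = settings.get("gateway.expected_nodes", UNSET)
    expected_data_nodes = settings.get("gateway.expected_data_nodes", UNSET)
    # When neither expectation is configured, the difference is not reported.
    both_unset = expected_nodes == UNSET and expected_data_nodes == UNSET
    return not both_unset
```

A comparison routine would call this guard before adding `gateway.recover_after_time` to its list of differences.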
2025/08/19 v0.0.41
- I/O: Updated to `influxio-0.6.0`. Thanks, @ZillKhan.
  - The target table is now the measurement name when importing ILP files.
  - The InfluxDB source URL accepts a `timeout` query parameter (seconds) to configure the network timeout when talking to the InfluxDB API.
  - For ILP imports, the CrateDB URL no longer needs a table component; you can point it at the schema only (the measurement determines the table).
2025/08/19 v0.0.40
- I/O: Fixed MongoDB CDC invocation. Thanks, Mỹ Duyên. 
2025/08/19 v0.0.39
- OCI: Started producing image `ghcr.io/crate/cratedb-toolkit-ingest`
- I/O: Added drivers for ODBC and Oracle to `cratedb-toolkit-ingest`
- I/O: Updated BSON library to support ARM64 
2025/08/14 v0.0.38
- I/O: Updated to `ingestr>=0.13.61`
- CFR: Improved log output
- CFR: Fixed double quoting of table name. Thanks, @karynzv.
- CFR: When importing, started using `replace` strategy instead of `append`
- CFR: Improved importing data re. type mapping without NumPy
- CFR: Truncated target table before importing, using `append` strategy again, because `replace` doesn’t do the right DDL.
- I/O: Tuned down ingestr, it masked native I/O adapters 
2025/07/01 v0.0.37
- Settings: Fixed comparison of `0s` vs. `0ms`. Thanks, @hlcianfagna.
- DMS: Provided a recipe file to relay primary key and column type map information 
- DMS: Provided a recipe option to ignore processing DMS control DDL events 
- DMS: Started using the “direct” column mapping by default, retaining the “universal” column mapping optionally. 
- Dependencies: Updated to `commons-codec>=0.0.23`
- I/O: Adapter for PostgreSQL full-load using ingestr 
- I/O: Added documentation about ingestr adapter 
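The `0s` vs. `0ms` fix in this release concerns duration values that are written differently but mean the same thing. One way to compare such values is to normalize both to a common unit first. The sketch below illustrates that idea only; the helper names and the supported unit suffixes are assumptions, not the toolkit's code.

```python
import re

# Multipliers to milliseconds for an assumed subset of unit suffixes.
_UNIT_TO_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
_DURATION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(ms|s|m|h|d)\s*$")


def to_milliseconds(value: str) -> float:
    """Convert a duration string such as '5s' or '250ms' to milliseconds."""
    match = _DURATION.match(value)
    if match is None:
        raise ValueError(f"Unsupported duration: {value!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_TO_MS[unit]


def durations_equal(left: str, right: str) -> bool:
    """Compare two duration strings by value rather than by spelling."""
    return to_milliseconds(left) == to_milliseconds(right)
```

With this normalization, `durations_equal("0s", "0ms")` holds, while a plain string comparison would report a difference.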
2025/06/23 v0.0.36
- Dependencies: Migrated from `zyp` to `tikray`. It’s effectively the same, but provided using a dedicated package now
- Dependencies: Updated to `croud-1.14`
- Dependencies: Updated to `async-kinesis-2.0.0`. Thanks, @hampsterx.
- CDC: Added canonical SQL example for PostgreSQL from Ibis 
- CDC: Enabled loading DMS events from Kinesis streams and stream-dump files 
- CDC: Added subcommand `ctk dms table-mappings`
2025/05/13 v0.0.35
- Added lost `pytest` dependencies to `cratedb-toolkit[testing]`
2025/05/13 v0.0.34
- Downgraded to sqlalchemy-cratedb 0.41, because version 0.42 is not GA yet
2025/05/12 v0.0.33
- CFR: Enhanced job statistics with optional reporting database support. Thanks, @WalBeh. 
- Settings: Added settings comparison utility. Thanks, @WalBeh. 
- Meta: Added parser for `https://cratedb.com/releases.json` file. Thanks, @WalBeh.
- CFR: Added the ability to anonymize queries recorded by `collect`
- Cloud API: SDK and CLI for CrateDB Cloud Cluster and Import APIs. Supports headless/unattended operations on CrateDB Cloud clusters, covering deploy/start/resume and data import procedures using fluent API and CLI. 
- Cloud API: Added JWT authentication to client API and `ctk shell`.
- Cloud API: Added `health` and `ping` subcommands to `ctk cluster`
- CLI: Downgraded to Click 8.1, as the code is not compatible with 8.2 yet 
Breaking changes
Naming things for CLI options and environment variables:
- Converged `--cratedb-sqlalchemy-url` vs. `--cratedb-http-url` options into single `--cluster-url`
- Converged `CRATEDB_SQLALCHEMY_URL` vs. `CRATEDB_HTTP_URL` env vars into single `CRATEDB_CLUSTER_URL`
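Scripts that still export the previous environment variables may want a small migration shim during the transition. The following is a sketch for user-side code, not part of the toolkit; the function name `resolve_cluster_url` is hypothetical, and it assumes the previous values are acceptable as-is for the new variable.

```python
import os

NEW_VAR = "CRATEDB_CLUSTER_URL"
# Previous variable names, checked in this order as a fallback.
OLD_VARS = ("CRATEDB_SQLALCHEMY_URL", "CRATEDB_HTTP_URL")


def resolve_cluster_url(environ=None):
    """Return the cluster URL, preferring the new variable over the old ones."""
    environ = os.environ if environ is None else environ
    if environ.get(NEW_VAR):
        return environ[NEW_VAR]
    for name in OLD_VARS:
        if environ.get(name):
            return environ[name]
    return None
```

The resolved value can then be exported as `CRATEDB_CLUSTER_URL` or passed to `--cluster-url`.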
2025/04/23 v0.0.32
- MCP: Add subsystem providing a few server and client utilities through the `ctk query mcp {list,inquire,launch}` subcommands.
- Docs API: Added extractors for CrateDB functions and settings 
- Connect: Respect `sslmode` URI parameter when converting SQLAlchemy connection URLs to `http(s)://`
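To illustrate the `sslmode` entry, the sketch below converts a SQLAlchemy-style URL into an HTTP URL using only the standard library. It is not the toolkit's implementation: the function name `to_http_url` is hypothetical, and the set of `sslmode` values treated as "encrypted" is an assumption borrowed from common PostgreSQL-style conventions.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

# Assumption: these sslmode values request an encrypted connection.
_SSL_ON = {"require", "verify-ca", "verify-full"}


def to_http_url(sqlalchemy_url: str) -> str:
    """Map a SQLAlchemy connection URL to http:// or https://, based on sslmode."""
    parts = urlsplit(sqlalchemy_url)
    query = parse_qs(parts.query)
    # Treat a missing parameter as an unencrypted connection (assumption).
    sslmode = query.get("sslmode", ["disable"])[0]
    scheme = "https" if sslmode in _SSL_ON else "http"
    # Keep credentials, host, and port; drop path and query.
    return urlunsplit((scheme, parts.netloc, "", "", ""))
```

For example, `crate://localhost:4200/?sslmode=require` maps to an `https://` URL, while a URL without the parameter maps to `http://`.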
2025/01/31 v0.0.31
- Fixed connectivity for `jobstats collect`
- Refactored code and improved CLI interface of `ctk info` vs. `ctk cfr`
- Dependencies: Updated to `crate-2.0.0`, which uses `orjson` for JSON marshalling
- CFR: Job statistics and slow-query exploration per Marimo notebook. Thanks, @WalBeh. 
2025/01/13 v0.0.30
- Dependencies: Minimize dependencies of core installation, defer `polars` to `cratedb-toolkit[io]`.
- Fixed `ctk cfr info record` about too large values of `ulimit_hard`
- Improved `ctk shell` to also talk to CrateDB standalone databases
- Added basic utility command `ctk tail`, for tailing a database table, and optionally following the tail
- Table Loader: Added capability to load InfluxDB Line Protocol (ILP) files
- Query Collector: Now respects `CRATEDB_CLUSTER_URL` environment variable
2024/10/13 v0.0.29
- MongoDB: Added Zyp transformations to the CDC subsystem, making it more symmetric to the full-load procedure. 
- Query Converter: Added very basic expression converter utility with CLI interface 
- DynamoDB: Added query expression converter for relocating object references, to support query migrations after the breaking change with the SQL DDL schema, by v0.0.27. 
2024/10/09 v0.0.28
- IO: Improved `BulkProcessor` when running per-record operations by also checking `rowcount` for handling `INSERT OK, 0 rows` responses
- MongoDB: Fixed BSON decoding of `{"$date": 1180690093000}` timestamps by updating to commons-codec 0.0.21.
- Testcontainers: Don’t always pull the OCI image before starting. It is unfortunate in disconnected situations. 
2024/10/01 v0.0.27
- MongoDB: Updated to pymongo 4.9
- DynamoDB: Change CrateDB data model to use (`pk`, `data`, `aux`) columns. Attention: This is a breaking change.
2024/09/26 v0.0.26
- MongoDB: Configure `MongoDBCrateDBConverter` after updating to commons-codec 0.0.18
- DynamoDB CDC: Fix `MODIFY` operation to also propagate deleted attributes
2024/09/22 v0.0.25
- Table Loader: Improved conditional handling of “transformation” parameter
- Table Loader: Improved status reporting and error logging in `BulkProcessor`
- MongoDB: Improve error reporting
- MongoDB Full: Polars’ `read_ndjson` doesn’t load MongoDB JSON data well, use `fsspec` and `orjson` instead
- MongoDB Full: Improved initialization of transformation subsystem
- MongoDB Adapter: Improved performance when computing collection cardinality by using `collection.estimated_document_count()`
- MongoDB Full: Optionally use `limit` parameter as number of total records
- MongoDB Adapter: Evaluate `_id` filter field by upcasting to `bson.ObjectId`, to convey a filter that makes `ctk load table` process a single document, identified by its OID
- MongoDB Dependencies: Update to commons-codec 0.0.17 
2024/09/19 v0.0.24
- MongoDB Full: Refactor transformation subsystem to `commons-codec`
- MongoDB: Update to commons-codec v0.0.16 
2024/09/16 v0.0.23
- MongoDB: Unlock processing multiple collections, either from server database, or from filesystem directory
- MongoDB: Unlock processing JSON files from HTTP resource, using `https+bson://`
- MongoDB: Optionally filter server collection using MongoDB query expression
- MongoDB: Improve error handling wrt. bulk operations vs. usability
- DynamoDB CDC: Add `ctk load table` interface for processing CDC events
- DynamoDB CDC: Accept a few more options for the Kinesis Stream: batch-size, create, create-shards, start, seqno, idle-sleep, buffer-time 
- DynamoDB Full: Improve error handling wrt. bulk operations vs. usability 
2024/09/10 v0.0.22
- MongoDB: Rename columns with leading underscores to use double leading underscores 
- MongoDB: Add support for UUID types 
- MongoDB: Improve reading timestamps in previous BSON formats 
- MongoDB: Fix processing empty arrays/lists. By default, assume `TEXT` as inner type.
- MongoDB: For `ctk load table`, use “partial” scan for inferring the collection schema, based on the first 10,000 documents.
- MongoDB: Stop leaking `UNKNOWN` fields into SQL DDL. This means relevant column definitions will not be included in the SQL DDL.
- MongoDB: Make `ctk load table` use the `data OBJECT(DYNAMIC)` mapping strategy.
- MongoDB: Sanitize lists of varying objects 
- MongoDB: Add treatment option for applying special treatments to certain items on real-world data 
- MongoDB: Use pagination on source collection, for creating batches towards CrateDB 
- MongoDB: Unlock importing MongoDB Extended JSON files using `file+bson://...`
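The column renaming rule from this release (leading underscores become double leading underscores) can be pictured with a short sketch. The helper names are hypothetical, and the sketch assumes that only top-level names with exactly one leading underscore are affected; the actual transformation in the toolkit may differ.

```python
def rename_leading_underscore(name: str) -> str:
    """Turn a single leading underscore into a double one; leave other names alone."""
    if name.startswith("_") and not name.startswith("__"):
        return "_" + name
    return name


def rename_columns(document: dict) -> dict:
    """Apply the renaming rule to all top-level field names of a document."""
    return {rename_leading_underscore(key): value for key, value in document.items()}
```

Applied to a document with an `_id` field, the result carries the value under `__id`, while fields such as `name` are left unchanged.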
2024/09/02 v0.0.21
- DynamoDB: Add special decoding for varied lists. Store them into a separate `OBJECT(IGNORED)` column in CrateDB.
- DynamoDB: Add pagination support for `full-load` table loader
2024/08/27 v0.0.20
- DMS/DynamoDB: Fix table name quoting within CDC processor handler 
2024/08/26 v0.0.19
- MongoDB: Fix and verify Zyp transformations 
- DMS/DynamoDB/MongoDB I/O: Use SQL with parameters instead of inlining values 
2024/08/21 v0.0.18
- Dependencies: Unpin commons-codec, to always use the latest version 
- Dependencies: Unpin lorrystream, to always use the latest version 
- MongoDB: Improve type mapper by discriminating between `INTEGER` and `BIGINT`
- MongoDB: Improve type mapper by supporting BSON `DatetimeMS`, `Decimal128`, and `Int64` types
2024/08/19 v0.0.17
- Processor: Updated Kinesis Lambda processor to understand AWS DMS
- MongoDB: Fix missing output on STDOUT for `migr8 export`
- MongoDB: Improve timestamp parsing by using `python-dateutil`
- MongoDB: Converge `_id` input field to `id` column instead of dropping it
- MongoDB: Make user interface use stderr, so stdout is for data only
- MongoDB: Make `migr8 extract` write to stdout by default
- MongoDB: Make `migr8 translate` read from stdin by default
- MongoDB: Improve user interface messages
- MongoDB: Strip single leading underscore character from all top-level fields
- MongoDB: Map OID types to CrateDB TEXT columns
- MongoDB: Make `migr8 extract` and `migr8 export` accept the `--limit` option
- MongoDB: Fix indentation in prettified SQL output of `migr8 translate`
- MongoDB: Add capability to give type hints and add transformations 
- Dependencies: Adjust code for lorrystream version 0.0.3 
- Dependencies: Update to lorrystream 0.0.4 and commons-codec 0.0.7 
- DynamoDB: Add table loader for full-load operations 
2024/07/25 v0.0.16
- `ctk load table`: Added support for MongoDB Change Streams
- Fix dependency with the `kaggle` package, downgrade to `kaggle==1.6.14`
- DynamoDB CDC: Add demo to support reading DynamoDB change data capture 
2024/07/08 v0.0.15
- IO: Added the `if-exists` query parameter by updating to influxio 0.4.0.
- Rockset: Added CrateDB Rockset Adapter, an HTTP API emulation layer
- MongoDB: Added adapter amalgamating PyMongo to use CrateDB as backend
- SQLAlchemy: Clean up and refactor SQLAlchemy polyfills to `cratedb_toolkit.util.sqlalchemy`
- CFR: Build as a self-contained program using PyInstaller 
- CFR: Publish self-contained application bundle to GitHub Workflow Artifacts 
2024/06/18 v0.0.14
- Add `ctk info` and `ctk cfr` diagnostics programs
- Remove support for Python 3.7
- SQLAlchemy dialect: Use `sqlalchemy-cratedb>=0.37.0`. This includes the fix to the `get_table_names()` reflection method.
2024/06/11 v0.0.13
- Dependencies: Migrate from `crate[sqlalchemy]` to `sqlalchemy-cratedb`
2024/05/30 v0.0.12
- Fix InfluxDB Cloud <-> CrateDB Cloud connectivity by using `ssl=true` query argument also for `influxdb2://` source URLs.
2024/05/30 v0.0.11
- Fix InfluxDB Cloud <-> CrateDB Cloud connectivity by propagating `ssl=true` query argument. Update dependencies to `influxio>=0.2.1,<1`.
2024/04/10 v0.0.10
- Dependencies: Unpin upper version bound of `dask`. Otherwise, compatibility issues cannot be resolved quickly, like with Python 3.11.9. https://github.com/dask/dask/issues/11038
2024/03/22 v0.0.9
- Dependencies: Use `dask[dataframe]`
2024/03/11 v0.0.8
- datasets: Fix compatibility with Python 3.7
2024/03/07 v0.0.7
- datasets: Fix dataset loader
2024/03/07 v0.0.6
- Added `cratedb_toolkit.datasets` subsystem, for acquiring datasets from cratedb-datasets and Kaggle.
2024/02/12 v0.0.5
- Do not always activate pytest11 entrypoint to pytest fixture `cratedb_service`, as it depends on the `testcontainers` package, which is not always installed.
2024/02/10 v0.0.4
- Packaging: Use `cloud` extra to install relevant packages
- Dependencies: Add `testing` extra, which installs `testcontainers` only
- Testing: Export `cratedb_service` fixture as pytest11 entrypoint
- Sandbox: Reduce number of extras by just using `all`
2024/01/18 v0.0.3
- Add SQL runner utility primitives to `io.sql` namespace
- Add `import_csv_pandas` and `import_csv_dask` utility primitives
- data: Add subsystem for “loading” data.
- Add SDK and CLI for CrateDB Cloud Data Import APIs `ctk load table ...`
- Add `migr8` program from previous repository
- InfluxDB: Add adapter for `influxio`
- MongoDB: Add `migr8` program from previous repository
- MongoDB: Improve UX by using `ctk load table mongodb://...`
- load table: Refactor to use more OO
- Add `examples/cloud_import.py`
- Adapt testcontainers to be agnostic of the testing framework. Thanks, @pilosus. 
2023/11/06 v0.0.2
- CLI: Upgrade to `click-aliases>=1.0.2`, fixing erroring out when no group aliases are specified.
- Add support for Python 3.12 
- SQLAlchemy: Improve UNIQUE constraints polyfill to accept multiple column names, for emulating unique composite keys. 
2023/10/10 v0.0.1
- SQLAlchemy: Add a few patches and polyfills, which do not fit well into the vanilla Python driver / SQLAlchemy dialect.
- Retention: Refactor strategies `delete`, `reallocate`, and `snapshot`, to standalone variants.
- Retention: Bundle configuration and runtime settings into `Settings` entity, and use more OO instead of weak dictionaries: Add `RetentionStrategy`, `TableAddress`, and `Settings` entities, to improve information passing throughout the application and the SQL templates.
- Retention: Add `--schema` option, and `CRATEDB_EXT_SCHEMA` environment variable, to configure the database schema used to store the retention policy table. The default value is `ext`.
- Retention: Use fully qualified table names everywhere.
- Retention: Fix: Compensate for `DROP REPOSITORY` now returning `RepositoryMissingException` when the repository does not exist. With previous versions of CrateDB, it was `RepositoryUnknownException`.
2023/06/27 v0.0.0
- Import “data retention” implementation from https://github.com/crate/crate-airflow-tutorial. Thanks, @hammerhead.