# Changelog ## Unreleased - Admin: Added XMover - CrateDB shard analyzer and movement tool. Thanks, @WalBeh. ## 2025/08/19 v0.0.41 - I/O: Updated to `influxio-0.6.0`. Thanks, @ZillKhan. - The target table is now the measurement name when importing ILP files. - The InfluxDB source URL accepts a `timeout` query parameter (seconds) to configure the network timeout when talking to the InfluxDB API. - For ILP imports, the CrateDB URL no longer needs a table component; you can point it at the schema only (the measurement determines the table). ## 2025/08/19 v0.0.40 - I/O: Fixed MongoDB CDC invocation. Thanks, Mỹ Duyên. ## 2025/08/19 v0.0.39 - OCI: Started producing image `ghcr.io/crate/cratedb-toolkit-ingest` - I/O: Added drivers for ODBC and Oracle to `cratedb-toolkit-ingest` - I/O: Updated BSON library to support ARM64 ## 2025/08/14 v0.0.38 - I/O: Updated to `ingestr>=0.13.61` - CFR: Improved log output - CFR: Fixed double quoting of table name. Thanks, @karynzv. - CFR: When importing, started using `replace` strategy instead of `append` - CFR: Improved importing data re. type mapping without NumPy - CFR: Truncated target table before importing, using `append` strategy again, because `replace` doesn't do the right DDL. - I/O: Tuned down ingestr, it masked native I/O adapters ## 2025/07/01 v0.0.37 - Settings: Fixed comparison of `0s` vs `0ms`. Thanks, @hlcianfagna. - DMS: Provided a recipe file to relay primary key and column type map information - DMS: Provided a recipe option to ignore processing DMS control DDL events - DMS: Started using the "direct" column mapping by default, retaining the "universal" column mapping optionally. - Dependencies: Updated to `commons-codec>=0.0.23` - I/O: Adapter for PostgreSQL full-load using ingestr - I/O: Added documentation about [ingestr] adapter [ingestr]: https://github.com/bruin-data/ingestr ## 2025/06/23 v0.0.36 - Dependencies: Migrated from `zyp` to `tikray`. It's effectively the same, but provided using a dedicated package now - Dependencies: Updated to `croud-1.14` - Dependencies: Updated to `async-kinesis-2.0.0`. Thanks, @hampsterx. - CDC: Added canonical SQL example for PostgreSQL from Ibis - CDC: Enabled loading DMS events from Kinesis streams and stream-dump files - CDC: Added subcommand `ctk dms table-mappings` ## 2025/05/13 v0.0.35 - Added lost `pytest` dependencies to `cratedb-toolkit[testing]` ## 2025/05/13 v0.0.34 - Downgraded to sqlalchemy-cratedb 0.41, version 0.42 is not GA yet ## 2025/05/12 v0.0.33 - CFR: Enhanced job statistics with optional reporting database support. Thanks, @WalBeh. - Settings: Added settings comparison utility. Thanks, @WalBeh. - Meta: Added parser for `https://cratedb.com/releases.json` file. Thanks, @WalBeh. - CFR: Added the ability to anonymize queries recorded by `collect` - Cloud API: SDK and CLI for CrateDB Cloud Cluster and Import APIs. Supports headless/unattended operations on CrateDB Cloud clusters, covering deploy/start/resume and data import procedures using fluent API and CLI. - Cloud API: Added JWT authentication to client API and `ctk shell`. - Cloud API: Added `health` and `ping` subcommands to `ctk cluster` - CLI: Downgraded to Click 8.1, as the code is not compatible with 8.2 yet **Breaking changes** Naming things for CLI options and environment variables: - Converged `--cratedb-sqlalchemy-url` vs. `--cratedb-http-url` options into single `--cluster-url` - Converged `CRATEDB_SQLALCHEMY_URL` vs. `CRATEDB_HTTP_URL` env vars into single `CRATEDB_CLUSTER_URL` ## 2025/04/23 v0.0.32 - MCP: Add subsystem providing a few server and client utilities through the `ctk query mcp {list,inquire,launch}` subcommands. - Docs API: Added extractors for CrateDB functions and settings - Connect: Respect `sslmode` URI parameter when converting SQLAlchemy connection URLs to `http(s)://` ## 2025/01/31 v0.0.31 - Fixed connectivity for `jobstats collect` - Refactored code and improved CLI interface of `ctk info` vs. `ctk cfr` - Dependencies: Updated to `crate-2.0.0`, which uses `orjson` for JSON marshalling - CFR: Job statistics and slow-query exploration per Marimo notebook. Thanks, @WalBeh. ## 2025/01/13 v0.0.30 - Dependencies: Minimize dependencies of core installation, defer `polars` to `cratedb-toolkit[io]`. - Fixed `ctk cfr info record` about too large values of `ulimit_hard` - Improved `ctk shell` to also talk to CrateDB standalone databases - Added basic utility command `ctk tail`, for tailing a database table, and optionally following the tail - Table Loader: Added capability to load InfluxDB Line Protocol (ILP) files - Query Collector: Now respects `CRATEDB_CLUSTER_URL` environment variable ## 2024/10/13 v0.0.29 - MongoDB: Added Zyp transformations to the CDC subsystem, making it more symmetric to the full-load procedure. - Query Converter: Added very basic expression converter utility with CLI interface - DynamoDB: Added query expression converter for relocating object references, to support query migrations after the breaking change with the SQL DDL schema, by v0.0.27. ## 2024/10/09 v0.0.28 - IO: Improved `BulkProcessor` when running per-record operations by also checking `rowcount` for handling `INSERT OK, 0 rows` responses - MongoDB: Fixed BSON decoding of `{"$date": 1180690093000}` timestamps by updating to commons-codec 0.0.21. - Testcontainers: Don't always pull the OCI image before starting. It is unfortunate in disconnected situations. ## 2024/10/01 v0.0.27 - MongoDB: Updated to pymongo 4.9 - DynamoDB: Change CrateDB data model to use (`pk`, `data`, `aux`) columns Attention: This is a breaking change. ## 2024/09/26 v0.0.26 - MongoDB: Configure `MongoDBCrateDBConverter` after updating to commons-codec 0.0.18 - DynamoDB CDC: Fix `MODIFY` operation to also propagate deleted attributes ## 2024/09/22 v0.0.25 - Table Loader: Improved conditional handling of "transformation" parameter - Table Loader: Improved status reporting and error logging in `BulkProcessor` - MongoDB: Improve error reporting - MongoDB Full: Polars' `read_ndjson` doesn't load MongoDB JSON data well, use `fsspec` and `orjson` instead - MongoDB Full: Improved initialization of transformation subsystem - MongoDB Adapter: Improved performance of when computing collection cardinality by using `collection.estimated_document_count()` - MongoDB Full: Optionally use `limit` parameter as number of total records - MongoDB Adapter: Evaluate `_id` filter field by upcasting to `bson.ObjectId`, to convey a filter that makes `ctk load table` process a single document, identified by its OID - MongoDB Dependencies: Update to commons-codec 0.0.17 ## 2024/09/19 v0.0.24 - MongoDB Full: Refactor transformation subsystem to `commons-codec` - MongoDB: Update to commons-codec v0.0.16 ## 2024/09/16 v0.0.23 - MongoDB: Unlock processing multiple collections, either from server database, or from filesystem directory - MongoDB: Unlock processing JSON files from HTTP resource, using `https+bson://` - MongoDB: Optionally filter server collection using MongoDB query expression - MongoDB: Improve error handling wrt. bulk operations vs. usability - DynamoDB CDC: Add `ctk load table` interface for processing CDC events - DynamoDB CDC: Accept a few more options for the Kinesis Stream: batch-size, create, create-shards, start, seqno, idle-sleep, buffer-time - DynamoDB Full: Improve error handling wrt. bulk operations vs. usability ## 2024/09/10 v0.0.22 - MongoDB: Rename columns with leading underscores to use double leading underscores - MongoDB: Add support for UUID types - MongoDB: Improve reading timestamps in previous BSON formats - MongoDB: Fix processing empty arrays/lists. By default, assume `TEXT` as inner type. - MongoDB: For `ctk load table`, use "partial" scan for inferring the collection schema, based on the first 10,000 documents. - MongoDB: Skip leaking `UNKNOWN` fields into SQL DDL. This means relevant column definitions will not be included into the SQL DDL. - MongoDB: Make `ctk load table` use the `data OBJECT(DYNAMIC)` mapping strategy. - MongoDB: Sanitize lists of varying objects - MongoDB: Add treatment option for applying special treatments to certain items on real-world data - MongoDB: Use pagination on source collection, for creating batches towards CrateDB - MongoDB: Unlock importing MongoDB Extended JSON files using `file+bson://...` ## 2024/09/02 v0.0.21 - DynamoDB: Add special decoding for varied lists. Store them into a separate `OBJECT(IGNORED)` column in CrateDB. - DynamoDB: Add pagination support for `full-load` table loader ## 2024/08/27 v0.0.20 - DMS/DynamoDB: Fix table name quoting within CDC processor handler ## 2024/08/26 v0.0.19 - MongoDB: Fix and verify Zyp transformations - DMS/DynamoDB/MongoDB I/O: Use SQL with parameters instead of inlining values ## 2024/08/21 v0.0.18 - Dependencies: Unpin commons-codec, to always use the latest version - Dependencies: Unpin lorrystream, to always use the latest version - MongoDB: Improve type mapper by discriminating between `INTEGER` and `BIGINT` - MongoDB: Improve type mapper by supporting BSON `DatetimeMS`, `Decimal128`, and `Int64` types ## 2024/08/19 v0.0.17 - Processor: Updated Kinesis Lambda processor to understand AWS DMS - MongoDB: Fix missing output on STDOUT for `migr8 export` - MongoDB: Improve timestamp parsing by using `python-dateutil` - MongoDB: Converge `_id` input field to `id` column instead of dropping it - MongoDB: Make user interface use stderr, so stdout is for data only - MongoDB: Make `migr8 extract` write to stdout by default - MongoDB: Make `migr8 translate` read from stdin by default - MongoDB: Improve user interface messages - MongoDB: Strip single leading underscore character from all top-level fields - MongoDB: Map OID types to CrateDB TEXT columns - MongoDB: Make `migr8 extract` and `migr8 export` accept the `--limit` option - MongoDB: Fix indentation in prettified SQL output of `migr8 translate` - MongoDB: Add capability to give type hints and add transformations - Dependencies: Adjust code for lorrystream version 0.0.3 - Dependencies: Update to lorrystream 0.0.4 and commons-codec 0.0.7 - DynamoDB: Add table loader for full-load operations ## 2024/07/25 v0.0.16 - `ctk load table`: Added support for MongoDB Change Streams - Fix dependency with the `kaggle` package, downgrade to `kaggle==1.6.14` - DynamoDB CDC: Add demo to support reading DynamoDB change data capture ## 2024/07/08 v0.0.15 - IO: Added the `if-exists` query parameter by updating to influxio 0.4.0. - Rockset: Added CrateDB Rockset Adapter, a HTTP API emulation layer - MongoDB: Added adapter amalgamating PyMongo to use CrateDB as backend - SQLAlchemy: Clean up and refactor SQLAlchemy polyfills to `cratedb_toolkit.util.sqlalchemy` - CFR: Build as a self-contained program using PyInstaller - CFR: Publish self-contained application bundle to GitHub Workflow Artifacts ## 2024/06/18 v0.0.14 - Add `ctk info` and `ctk cfr` diagnostics programs - Remove support for Python 3.7 - SQLAlchemy dialect: Use `sqlalchemy-cratedb>=0.37.0` This includes the fix to the `get_table_names()` reflection method. ## 2024/06/11 v0.0.13 - Dependencies: Migrate from `crate[sqlalchemy]` to `sqlalchemy-cratedb` ## 2024/05/30 v0.0.12 - Fix InfluxDB Cloud <-> CrateDB Cloud connectivity by using `ssl=true` query argument also for `influxdb2://` source URLs. ## 2024/05/30 v0.0.11 - Fix InfluxDB Cloud <-> CrateDB Cloud connectivity by propagating `ssl=true` query argument. Update dependencies to `influxio>=0.2.1,<1`. ## 2024/04/10 v0.0.10 - Dependencies: Unpin upper version bound of `dask`. Otherwise, compatibility issues can not be resolved quickly, like with Python 3.11.9. https://github.com/dask/dask/issues/11038 ## 2024/03/22 v0.0.9 - Dependencies: Use `dask[dataframe]` ## 2024/03/11 v0.0.8 - datasets: Fix compatibility with Python 3.7 ## 2024/03/07 v0.0.7 - datasets: Fix dataset loader ## 2024/03/07 v0.0.6 - Added `cratedb_toolkit.datasets` subsystem, for acquiring datasets from cratedb-datasets and Kaggle. ## 2024/02/12 v0.0.5 - Do not always activate pytest11 entrypoint to pytest fixture `cratedb_service`, as it depends on the `testcontainers` package, which is not always installed. ## 2024/02/10 v0.0.4 - Packaging: Use `cloud` extra to install relevant packages - Dependencies: Add `testing` extra, which installs `testcontainers` only - Testing: Export `cratedb_service` fixture as pytest11 entrypoint - Sandbox: Reduce number of extras by just using `all` ## 2024/01/18 v0.0.3 - Add SQL runner utility primitives to `io.sql` namespace - Add `import_csv_pandas` and `import_csv_dask` utility primitives - data: Add subsystem for "loading" data. - Add SDK and CLI for CrateDB Cloud Data Import APIs `ctk load table ...` - Add `migr8` program from previous repository - InfluxDB: Add adapter for `influxio` - MongoDB: Add `migr8` program from previous repository - MongoDB: Improve UX by using `ctk load table mongodb://...` - load table: Refactor to use more OO - Add `examples/cloud_import.py` - Adapt testcontainers to be agnostic of the testing framework. Thanks, @pilosus. ## 2023/11/06 v0.0.2 - CLI: Upgrade to `click-aliases>=1.0.2`, fixing erroring out when no group aliases are specified. - Add support for Python 3.12 - SQLAlchemy: Improve UNIQUE constraints polyfill to accept multiple column names, for emulating unique composite keys. ## 2023/10/10 v0.0.1 - SQLAlchemy: Add a few patches and polyfills, which do not fit well into the vanilla Python driver / SQLAlchemy dialect. - Retention: Refactor strategies `delete`, `reallocate`, and `snapshot`, to standalone variants. - Retention: Bundle configuration and runtime settings into `Settings` entity, and use more OO instead of weak dictionaries: Add `RetentionStrategy`, `TableAddress`, and `Settings` entities, to improve information passing throughout the application and the SQL templates. - Retention: Add `--schema` option, and `CRATEDB_EXT_SCHEMA` environment variable, to configure the database schema used to store the retention policy table. The default value is `ext`. - Retention: Use full-qualified table names everywhere. - Retention: Fix: Compensate for `DROP REPOSITORY` now returning `RepositoryMissingException` when the repository does not exist. With previous versions of CrateDB, it was `RepositoryUnknownException`. ## 2023/06/27 v0.0.0 - Import "data retention" implementation from . Thanks, @hammerhead.