I/O Subsystem

Load and extract data into/from CrateDB.

About

Transfer data from AWS DMS, Databricks, DuckDB, DynamoDB, InfluxDB, MongoDB, MongoDB Atlas, MotherDuck, PostgreSQL, and many more databases, platforms, and services to CrateDB or CrateDB Cloud.

What’s inside

A one-stop command, ctk load table, to load data into database tables.

Installation

Latest release.

pip install --upgrade 'cratedb-toolkit[all]'

Development version.

pip install --upgrade 'cratedb-toolkit[all] @ git+https://github.com/crate/cratedb-toolkit.git'

General Notes

By default, the table name will be derived from the name of the input resource. If you would like to specify a different table name, use the --table command line option, or the CRATEDB_TABLE environment variable.

When aiming to write into a table in a different database schema, use the --schema command line option, or the CRATEDB_SCHEMA environment variable. When omitting this parameter, the default value doc will be used.

Cloud Import API

Using the CrateDB Cloud Import API, you can import files in CSV, JSON, and Parquet formats.

Prerequisites

Authenticate with CrateDB Cloud using one of those identity providers: Cognito, Azure AD, GitHub, Google.

croud login --idp azuread

To discover the list of available database clusters.

croud clusters list

Define the cluster id of your CrateDB Cloud Cluster you are aiming to connect to, and its connection credentials.

export CRATEDB_CLUSTER_ID='<YOUR_CLUSTER_ID>'

Usage

Load data into database table.

ctk load table 'https://cdn.crate.io/downloads/datasets/cratedb-datasets/cloud-tutorials/data_weather.csv.gz'
ctk load table 'https://cdn.crate.io/downloads/datasets/cratedb-datasets/cloud-tutorials/data_marketing.json.gz'
ctk load table 'https://cdn.crate.io/downloads/datasets/cratedb-datasets/timeseries/yc.2019.07-tiny.parquet.gz'

Query and aggregate data using SQL.

ctk shell --command="SELECT * FROM data_weather LIMIT 10;"
ctk shell --command="SELECT * FROM data_weather LIMIT 10;" --format=csv
ctk shell --command="SELECT * FROM data_weather LIMIT 10;" --format=json