# CrateDB

> CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is based on Lucene, inherits technologies from Elasticsearch, and is compatible with PostgreSQL.

Things to remember when working with CrateDB are:

- CrateDB is a distributed database written in Java, where individual nodes form a database cluster, using a shared-nothing architecture.
- CrateDB brings together fundamental components for managing big data after the Hadoop and Spark batch-processing era, similar to what Teradata, BigQuery, and Snowflake do.
- Clients can connect to CrateDB using HTTP or the PostgreSQL wire protocol.
- The default TCP ports of CrateDB are 4200 for the HTTP interface and 5432 for the PostgreSQL interface.
- After connecting to CrateDB, the language of choice is SQL, which is mostly compatible with PostgreSQL's SQL dialect.
- The data storage layer is based on Lucene; the data distribution layer was inspired by Elasticsearch.
- Storage concepts of CrateDB include partitioning and sharding to manage data that does not fit on a single machine.
- CrateDB Cloud offers a managed option for running CrateDB and provides additional features like automated backups, data ingest / ETL utilities, or scheduling of recurring jobs.
- Get started with CrateDB Cloud at `https://console.cratedb.cloud`.
- CrateDB can also run on-premises, ideally using its Docker/OCI image `docker.io/crate`. Nightly images are available as `docker.io/crate/crate:nightly`.
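
As a minimal sketch of the HTTP connectivity described above, the following Python snippet builds a request that submits an SQL statement to CrateDB's `/_sql` endpoint on the default HTTP port 4200. The helper name `build_sql_request` and the host `localhost` are assumptions for illustration, not part of CrateDB; the request is only constructed here, and sending it requires a running node.

```python
import json
import urllib.request


def build_sql_request(stmt, args=None, host="localhost", port=4200):
    """Build (but do not send) an HTTP request for CrateDB's `/_sql` endpoint.

    `build_sql_request`, `host`, and `port` are illustrative assumptions;
    4200 is the default HTTP port mentioned above.
    """
    payload = {"stmt": stmt}
    if args is not None:
        # Positional parameters for `?` placeholders in the statement.
        payload["args"] = list(args)
    return urllib.request.Request(
        url=f"http://{host}:{port}/_sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sql_request("SELECT name FROM sys.cluster")

# To run the statement against a live node, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Clients that prefer the PostgreSQL wire protocol would instead point a PostgreSQL driver at port 5432.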

## Docs

- [CrateDB README](https://raw.githubusercontent.com/crate/crate/refs/heads/master/README.rst): README about CrateDB.
- [Welcome to CrateDB](https://cratedb.com/docs/guide/_sources/index.md.txt): Benefits of CrateDB at a glance.
- [CrateDB reference documentation](https://cratedb.com/docs/crate/reference/en/latest/_sources/index.rst.txt): The reference documentation of CrateDB.
- [Concept: Clustering](https://cratedb.com/docs/crate/reference/en/latest/_sources/concepts/clustering.rst.txt): How the distributed SQL database CrateDB uses a shared-nothing architecture to form high-availability, resilient database clusters with minimal configuration effort.
- [Concept: Distributed joins](https://cratedb.com/docs/crate/reference/en/latest/_sources/concepts/joins.rst.txt): Make joins work on large volumes of distributed data.
- [Concept: Storage and consistency](https://cratedb.com/docs/crate/reference/en/latest/_sources/concepts/storage-consistency.rst.txt): How CrateDB stores and distributes state across the cluster and what consistency and durability guarantees are provided. 
- [Concept: Resiliency](https://cratedb.com/docs/crate/reference/en/latest/_sources/concepts/resiliency.rst.txt): How CrateDB copes with network, disk, or machine failures.
- [CrateDB reference: Partitioned tables](https://cratedb.com/docs/crate/reference/en/latest/_sources/general/ddl/partitioned-tables.rst.txt): A partitioned table is a virtual table consisting of zero or more partitions. A partition is similar to a regular single table and consists of one or more shards. A table becomes a partitioned table by defining partition columns. When a record with a new distinct combination of values for the configured partition columns is inserted, a new partition is created, and the document is inserted into this new partition. 
- [CrateDB reference: Storage](https://cratedb.com/docs/crate/reference/en/latest/_sources/general/ddl/storage.rst.txt): Data storage options can be tuned for each column similar to how indexing is defined. Using the Column Store limits the values of TEXT columns to a maximal length of 32766 bytes. Turning off the Column Store in conjunction with turning off indexing will remove the length limitation. 
- [CrateDB reference: Replication](https://cratedb.com/docs/crate/reference/en/latest/_sources/general/ddl/replication.rst.txt): You can configure CrateDB to replicate tables. When you configure replication, CrateDB will try to ensure that every table shard has one or more copies available at all times. This ensures data resiliency when individual cluster nodes go offline for maintenance. 
- [CrateDB reference: Views](https://cratedb.com/docs/crate/reference/en/latest/_sources/general/ddl/views.rst.txt): Views are stored named queries which can be used in place of table names. They’re resolved at runtime and can be used to simplify common queries. Database views have special privilege properties. 
- [Data modeling: Sequences](https://cratedb.com/docs/guide/_sources/start/modelling/primary-key.md.txt): About autogenerated sequences and PRIMARY KEY values in CrateDB.
- [Data modeling: Optimistic Concurrency Control](https://cratedb.com/docs/crate/reference/en/latest/_sources/general/occ.rst.txt): Even though CrateDB does not support transactions, optimistic concurrency control can be achieved by using the internal system columns `_seq_no` and `_primary_term`. 
- [Guide: CrateDB sharding](https://cratedb.com/docs/guide/_sources/performance/sharding.md.txt): A best practice guide about sharding with CrateDB.
- [Guide: CrateDB query optimization](https://cratedb.com/docs/guide/_sources/performance/optimization.md.txt): Essential principles for optimizing queries in CrateDB while avoiding the most common pitfalls.
- [Guide: Design for scale](https://cratedb.com/docs/guide/_sources/performance/scaling.md.txt): Critical design considerations to successfully scale CrateDB in large production environments to ensure performance and reliability as workloads grow. 
- [Integration Tutorials I](https://cratedb.com/docs/guide/_sources/integrate/index.md.txt): Integrating 3rd party software with CrateDB.
- [Integration Tutorials II](https://community.cratedb.com/raw/1015/1): Overview of CrateDB integration tutorials.

## API

- [CrateDB reference: HTTP interface](https://cratedb.com/docs/crate/reference/en/latest/_sources/interfaces/http.rst.txt): CrateDB provides an HTTP endpoint that can be used to submit SQL queries.
- [CrateDB reference: PostgreSQL interface](https://cratedb.com/docs/crate/reference/en/latest/_sources/interfaces/postgres.rst.txt): CrateDB supports the PostgreSQL wire protocol v3.
- [CrateDB reference: Information schema](https://cratedb.com/docs/crate/reference/en/latest/_sources/general/information-schema.rst.txt): `information_schema` is a special schema that contains virtual tables which are read-only and can be queried to get information about the state of the cluster. 
- [CrateDB reference: Users and roles management](https://cratedb.com/docs/crate/reference/en/latest/_sources/admin/user-management.rst.txt): User and role account information is stored in the cluster metadata of CrateDB. Standard SQL statements are supported to create, alter, and drop users and roles. You need this knowledge to work with permissions in CrateDB.
- [CrateDB reference: Privileges](https://cratedb.com/docs/crate/reference/en/latest/_sources/admin/privileges.rst.txt): To execute statements, a user needs to have the required privileges. CrateDB has a built-in superuser account (`crate`) which has the privilege to do anything. The privileges of other users and roles have to be managed using the `GRANT`, `DENY` or `REVOKE` statements. The privileges that can be granted, denied or revoked are: `DQL`, `DML`, `DDL`, `AL`. The privileges can be granted on different classes: `CLUSTER`, `SCHEMA`, `TABLE`, `VIEW`. You need this knowledge to work with permissions in CrateDB. 
- [CrateDB tutorial: Multi-tenancy with CrateDB](https://community.cratedb.com/raw/1153/1): Multi-tenancy is an architecture in which different tenants share a single software instance. CrateDB does not support the creation of multiple databases and catalogs as some other solutions (e.g., PostgreSQL) do. However, there are several ways to implement multi-tenancy in CrateDB, and, as is often the case, which one works best depends on a variety of options and trade-offs. The article illustrates two methods for sharing a single CrateDB instance between multiple tenants. You need this knowledge to work with permissions in CrateDB.
- [CrateDB SQL reference: Syntax](https://cratedb.com/docs/crate/reference/en/latest/_sources/sql/index.rst.txt): You can use Structured Query Language (SQL) to query your data.
- [CrateDB SQL reference: CREATE TABLE](https://cratedb.com/docs/crate/reference/en/latest/_sources/sql/statements/create-table.rst.txt): The `CREATE TABLE` command creates a new, initially empty table. The command accepts many important parameters for controlling CrateDB's special features, most notably `CLUSTERED` and `PARTITIONED BY`, and exposes many further options through its `WITH` modifier.
- [CrateDB SQL reference: CREATE TABLE AS](https://cratedb.com/docs/crate/reference/en/latest/_sources/sql/statements/create-table-as.rst.txt): `CREATE TABLE AS` will create a new table and insert rows based on the specified query. 
- [CrateDB SQL reference: CREATE FOREIGN TABLE](https://cratedb.com/docs/crate/reference/en/latest/_sources/sql/statements/create-foreign-table.rst.txt): `CREATE FOREIGN TABLE` is a DDL statement that creates a new foreign table. A foreign table is a view onto data in a foreign system. 
- [CrateDB SQL reference: ALTER TABLE](https://cratedb.com/docs/crate/reference/en/latest/_sources/sql/statements/alter-table.rst.txt): The `ALTER TABLE` command can be used to modify an existing table definition. It provides options to add columns, modify constraints, enable or disable table parameters, and execute a shard reroute allocation. The `REROUTE` command provides various options to manually control the allocation of shards. It allows the enforcement of explicit allocations, cancellations, and the moving of shards between nodes in a cluster, giving you the ability to re-balance the cluster state manually.
- [CrateDB SQL reference: COPY FROM](https://cratedb.com/docs/crate/reference/en/latest/_sources/sql/statements/copy-from.rst.txt): The `COPY FROM` command copies data from a URI to the specified table.
- [CrateDB SQL reference: COPY TO](https://cratedb.com/docs/crate/reference/en/latest/_sources/sql/statements/copy-to.rst.txt): The `COPY TO` command exports the contents of a table to one or more files into a given directory with unique filenames. 
- [CrateDB SQL reference: Scalar functions](https://cratedb.com/docs/crate/reference/en/latest/_sources/general/builtins/scalar-functions.rst.txt): Scalar functions are functions that return scalars.
- [CrateDB SQL reference: Aggregation functions](https://cratedb.com/docs/crate/reference/en/latest/_sources/general/builtins/aggregation.rst.txt): When selecting data from CrateDB, you can use an aggregate function to calculate a single summary value for one or more columns. 
- [CrateDB SQL reference: Table functions](https://cratedb.com/docs/crate/reference/en/latest/_sources/general/builtins/table-functions.rst.txt): Table functions are functions that produce a set of rows. They can be used in place of a relation in the `FROM` clause. 
- [CrateDB SQL reference: Window functions](https://cratedb.com/docs/crate/reference/en/latest/_sources/general/builtins/window-functions.rst.txt): Window functions are functions which perform a computation across a set of rows which are related to the current row. This is comparable to aggregation functions, but window functions do not cause multiple rows to be grouped into a single row. 
- [CrateDB SQL reference: User-defined functions](https://cratedb.com/docs/crate/reference/en/latest/_sources/general/user-defined-functions.rst.txt): CrateDB supports user-defined functions.
- [CrateDB SQL reference: Arithmetic operators](https://cratedb.com/docs/crate/reference/en/latest/_sources/general/builtins/arithmetic.rst.txt): Arithmetic operators perform mathematical operations on numeric values including timestamps. 
- [CrateDB SQL reference: Bit operators](https://cratedb.com/docs/crate/reference/en/latest/_sources/general/builtins/bit-operators.rst.txt): Bit operators perform bitwise operations on numeric integral values and bit strings. 
- [CrateDB SQL reference: Comparison operators](https://cratedb.com/docs/crate/reference/en/latest/_sources/general/builtins/comparison-operators.rst.txt): A comparison operator tests the relationship between two values and returns a corresponding value of `true`, `false`, or `NULL`. 
- [CrateDB SQL reference: Array comparison operators](https://cratedb.com/docs/crate/reference/en/latest/_sources/general/builtins/array-comparisons.rst.txt): An array comparison operator tests the relationship between a value and an array and returns `true`, `false`, or `NULL`. 
- [CrateDB SQL reference: Subquery expressions](https://cratedb.com/docs/crate/reference/en/latest/_sources/general/builtins/subquery-expressions.rst.txt): Some operators can be used with an uncorrelated subquery to form a *subquery expression* that returns a boolean value (i.e., `true` or `false`) or `NULL`. 
- [CrateDB cluster-wide settings](https://cratedb.com/docs/crate/reference/en/latest/_sources/config/cluster.rst.txt): Cluster-wide settings can be read by querying the `sys.cluster.settings` column. Most cluster settings can be changed at runtime. 
- [CrateDB node-specific settings](https://cratedb.com/docs/crate/reference/en/latest/_sources/config/node.rst.txt): Node-specific settings of CrateDB.

## Examples

- [CrateDB SQL gallery](https://raw.githubusercontent.com/crate/cratedb-toolkit/refs/tags/v0.0.31/cratedb_toolkit/info/library.py): A collection of SQL queries and utilities suitable for diagnostics on CrateDB.
- [CrateDB Toolkit: Import example datasets](https://cratedb-toolkit.readthedocs.io/_sources/datasets.md.txt): CrateDB Toolkit's `cratedb_toolkit.datasets.load_dataset` API primitive can be used to load curated datasets from <https://github.com/crate/cratedb-datasets> into your database programmatically, using a few lines of Python. The API is suitable to be used for exploring CrateDB in Python programs and scientific notebooks. 

## Optional

- [CrateDB features](https://cratedb.com/docs/guide/_sources/feature/index.md.txt): All features of CrateDB at a glance.
- [Feature: SQL](https://cratedb.com/docs/guide/_sources/feature/sql/index.md.txt): CrateDB’s features are available using plain SQL, and it is wire-protocol compatible with PostgreSQL.
- [Feature: Connectivity](https://cratedb.com/docs/guide/_sources/connect/index.md.txt): All CrateDB connectivity options at a glance: Drivers, adapters, connectors, frameworks.
- [Feature: Document Store](https://cratedb.com/docs/guide/_sources/feature/document/index.md.txt): Efficiently store JSON documents or other structured data, also nested, using CrateDB’s OBJECT and ARRAY container data types, and query this data with ease. 
- [Feature: Relational / JOINs](https://cratedb.com/docs/guide/_sources/feature/relational/index.md.txt): CrateDB implements distributed joins.
- [Feature: Full-text Search](https://cratedb.com/docs/guide/_sources/feature/search/fts/index.md.txt): BM25 term search based on Apache Lucene, using SQL - CrateDB is all you need.
- [Feature: Geospatial Search](https://cratedb.com/docs/guide/_sources/feature/search/geo/index.md.txt): CrateDB supports location data for efficiently storing and querying geographic and spatial/geospatial data.
- [Feature: Vector Search](https://cratedb.com/docs/guide/_sources/feature/search/vector/index.md.txt): Vector search on machine learning embeddings - CrateDB is all you need.
- [Feature: Hybrid Search](https://cratedb.com/docs/guide/_sources/feature/search/hybrid/index.md.txt): Combined BM25 term search and vector search based on Apache Lucene, using SQL - CrateDB is all you need.
- [Feature: BLOB Store](https://cratedb.com/docs/guide/_sources/feature/blob/index.md.txt): CrateDB provides a blob/object storage subsystem accessible via HTTP, similar to AWS S3.
- [Feature: Clustering](https://cratedb.com/docs/guide/_sources/feature/cluster/index.md.txt): CrateDB provides scalability through partitioning, sharding, and replication.
- [Feature: Snapshots](https://cratedb.com/docs/guide/_sources/feature/snapshot/index.md.txt): CrateDB provides a backup mechanism based on snapshots.
- [Feature: Cloud Native](https://cratedb.com/docs/guide/_sources/feature/cloud/index.md.txt): CrateDB is designed to support cloud computing from the beginning.
- [Feature: Storage Layer](https://cratedb.com/docs/guide/_sources/feature/storage/index.md.txt): The CrateDB storage layer is based on Lucene. By default, all fields are indexed, nested or not, but the indexing can be turned off selectively. 
- [Feature: Hybrid Index](https://cratedb.com/docs/guide/_sources/feature/index/index.md.txt): CrateDB indexes all columns by default, for lightning-fast query responses at your fingertips.
- [Feature: Advanced Querying](https://cratedb.com/docs/guide/_sources/feature/query/index.md.txt): About all the advanced querying features of CrateDB, unifying data types and query characteristics. Mix full-text search with time series aspects, and run powerful aggregations or other kinds of complex queries on your data. CrateDB supports effective time series analysis with fast aggregations, relational features for JOIN operations, and a rich set of built-in functions. 
- [Feature: Generated Columns](https://cratedb.com/docs/guide/_sources/feature/generated/index.md.txt): CrateDB's SQL DDL statements support defining generated columns. The values of those columns are computed by applying a generation expression in the context of the current row. The generation expression can reference the values of other columns.
- [Feature: Server-Side Cursors](https://cratedb.com/docs/guide/_sources/feature/cursor/index.md.txt): CrateDB implements the SQL Standard feature F431 (read-only scrollable cursor), also known as server-side cursors or portals.
- [Feature: Foreign Data Wrapper](https://cratedb.com/docs/guide/_sources/feature/fdw/index.md.txt): Like the PostgreSQL FDW implementation, CrateDB offers the possibility to access database tables on remote database servers as if they were stored within CrateDB.
- [Feature: User-Defined Functions](https://cratedb.com/docs/guide/_sources/feature/udf/index.md.txt): CrateDB supports user-defined functions (UDFs) that can be written in JavaScript.
- [Feature: Cross-Cluster Replication](https://cratedb.com/docs/guide/_sources/feature/ccr/index.md.txt): Cross-cluster replication, also called logical replication, is a method of data replication across multiple clusters. 
- [LangChain and CrateDB](https://raw.githubusercontent.com/crate/cratedb-examples/refs/heads/main/topic/machine-learning/langchain/README.md): Get started with LangChain and CrateDB.