Querying Crate

This section provides an overview on how to query documents using SQL. See Data Definition in Crate for information about Table creation and other Data Definition statements.

Retrieving Data

Retrieving data from Crate is done by using a SQL SELECT statement. The response to a SELECT query contains the column names of the result and the actual result rows as a two-dimensional array of values.

A simple select:

cr> select name, position from locations order by "_id" limit 2
+-------------------+----------+
| name              | position |
+-------------------+----------+
| North West Ripple | 1        |
| Arkintoofle Minor | 3        |
+-------------------+----------+
SELECT 2 rows in set (... sec)

If the ‘*’ operator is used, all properties defined in the schema are returned for each object:

cr> select * from locations order by "_id" limit 1 offset 1
+--------------------------+--------------...---+--------+-------------------+----------+------+
| date                     | description        | kind   | name              | position | race |
+--------------------------+--------------...---+--------+-------------------+----------+------+
| 1979-10-12T00:00:00.000Z | Motivated by ...s. | Planet | Arkintoofle Minor | 3        | None |
+--------------------------+--------------...---+--------+-------------------+----------+------+
SELECT 1 row in set (... sec)

Aliases can be used to change the output name of the columns:

cr> select name as n from locations order by "_id" limit 1
+-------------------+
| n                 |
+-------------------+
| North West Ripple |
+-------------------+
SELECT 1 row in set (... sec)

Where Clause

A simple where clause example using an equality operator:

cr> select description from locations where "_id" = '1'
+-----------------------------------------...-------------------------------------------+
| description                                                                           |
+-----------------------------------------...-------------------------------------------+
| Relative to life on NowWhat, living on a...er by a factor of about seventeen million. |
+-----------------------------------------...-------------------------------------------+
SELECT 1 row in set (... sec)

Usual comparison operators to be used in filters are available for string, integer/long, float/double and date type:

Operator Description
< less than
> greater than
<= less than or equal to
>= greater than or equal to
= equal
<> not equal
!= not equal - same as <>
is not null field is not null and not missing
is null field is null or missing
like matches a part of the given value

Note

The field ‘name’ used for the request above is defined as ‘not_analyzed’ in the schema. For an ‘analyzed’ field the result may differ depending on the used Analyzer/Tokenizer. For details regarding analysis please refer to the elasticsearch analysis site.

For strings a lexicographical comparison is performed based on the Lucene TermRangeQuery:

cr> select name from locations where name > 'Argabuthon' order by name
+------------------------------------+
| name                               |
+------------------------------------+
| Arkintoofle Minor                  |
| Bartledan                          |
| Galactic Sector QQ7 Active J Gamma |
| North West Ripple                  |
| Outer Eastern Rim                  |
+------------------------------------+
SELECT 5 rows in set (... sec)

For details please refer to the Apache Lucene site.

In most SQL databases there is a distinction between NULL and empty string. Since Crate is a schemaless document oriented database there is the third possibility that a row doesn’t contain a field. Due to this is null will return all rows where the fields value is NULL or where the field doesn’t exist at all. is not null also behaves this way. E.g. the following query returns 2 rows since the is null filter matches all rows and a limit is specified:

cr> select name from locations where missing is null order by "_id" limit 2
+-------------------+
| name              |
+-------------------+
| North West Ripple |
| Arkintoofle Minor |
+-------------------+
SELECT 2 rows in set (... sec)

Number and date field comparison behave as expected from standard SQL. The following example uses one of the supported ISO date formats:

cr> select date, position from locations where date <= '1979-10-12' and
... position < 3 order by position
+--------------------------+----------+
| date                     | position |
+--------------------------+----------+
| 1979-10-12T00:00:00.000Z | 1        |
| 1979-10-12T00:00:00.000Z | 2        |
+--------------------------+----------+
SELECT 2 rows in set (... sec)

All supported default ISO date formats are comparable. For a full list please refer to the elasticsearch date-format site.

For custom date types, or defined date formats in the object mapping the corresponding format should be used for a comparison. Otherwise the operation may fail.

Like

Crate supports the LIKE operator. This operator can be used to query for rows where only part of a columns value should match something. For example to get all locations where the name starts with ‘Ar’ the following query can be used:

cr> select name from locations where name like 'Ar%' order by name asc
+-------------------+
| name              |
+-------------------+
| Argabuthon        |
| Arkintoofle Minor |
+-------------------+
SELECT 2 rows in set (... sec)

The following wildcard operators are available:

% A substitute for zero or more characters
_ A substitute for a single character

The wildcard operators may be used at any point in the string literal. For example a more complicated like clause could look like this:

cr> select name from locations where name like '_r%a%' order by name asc
+------------+
| name       |
+------------+
| Argabuthon |
+------------+
SELECT 1 row in set (... sec)

In order so search for the wildcard characters themselves it is possible to escape them using a backslash:

cr> select description from locations where description like '%\%' order by description asc
+-------------------------+
| description             |
+-------------------------+
| The end of the Galaxy.% |
+-------------------------+
SELECT 1 row in set (... sec)

Note

Queries with a like clause can be quite slow. Especially if the like clause starts with a wildcard character. Because in that case CRATE has to iterate over all rows and can’t utilize the index. For better performance consider using a fulltext index.

Limits

As unlimited SELECT queries could break your cluster if the matching rows exceed your node’s RAM, SELECT statements are limited by default to 1000 rows. You can expand this limit by using an explicit LIMIT-clause. But you are encouraged to make use of a windowing using LIMIT and OFFSET to iterate through all the results of a potentially large resultset instead of expanding the default limit.

Inner/Nested Objects

Crate supports an object data type, used for simple storing a whole object into a column and it’s even possible to select and query for properties of such objects.

Select a property of an inner object:

cr> select name, race['name'] from locations where name = 'Bartledan'
+-----------+----------------+
| name      | race['name']   |
+-----------+----------------+
| Bartledan | Bartledannians |
+-----------+----------------+
SELECT 1 row in set (... sec)

Query for a property of an inner object:

cr> select name, race['name'] from locations where race['name'] = 'Bartledannians'
+-----------+----------------+
| name      | race['name']   |
+-----------+----------------+
| Bartledan | Bartledannians |
+-----------+----------------+
SELECT 1 row in set (... sec)

Data Aggregation

Currently the only supported aggregation function in crate is count(*):

cr> select count(*) from locations
+----------+
| COUNT(*) |
+----------+
| 13       |
+----------+
SELECT 1 row in set (... sec)

Note

The count() aggregate function can also be called with parameter substitution. But only if the value evaluates to a ‘*’.

Note

count(columName) works also, but currently only on a primary key column.

Group by

Crate supports the group by clause. This clause can be used to group the resulting rows by the value(s) of one or more columns. That means that rows that contain duplicate values will be merged together.

This is useful if used in conjunction with aggregation functions:

cr> select count(*), kind from locations group by kind order by count(*) desc, kind asc
+----------+-------------+
| COUNT(*) | kind        |
+----------+-------------+
| 5        | Planet      |
| 4        | Galaxy      |
| 4        | Star System |
+----------+-------------+
SELECT 3 rows in set (... sec)

Note

All columns that are used either as result column or in the order by clause have to be used within the group by clause. Otherwise the statement won’t execute.

Inserting data

Inserting data to Crate is done by using the SQL INSERT statement.

Note

The column list at Crate is always ordered alphabetically by column name and so must the inserted column values.

Inserting a row:

cr> insert into locations values ('2013-09-12T21:43:59.000Z', 'Blagulon Kappa is the planet to which the police are native.', 'Planet', 'Blagulon Kappa', 7)
INSERT OK, 1 row affected (... sec)

Inserting multiple rows at once (aka. bulk insert) can be done by defining multiple values for the INSERT statement:

cr> insert into locations (date, description, kind, name, position) values
... ('2013-09-12T21:43:59.000Z', 'Blagulon Kappa is the planet to which the police are native.', 'Planet', 'Blagulon Kappa', 7),
... ('2013-09-13T16:43:59.000Z', 'Brontitall is a planet with a warm, rich atmosphere and no mountains.', 'Planet', 'Brontitall', 10)
INSERT OK, 2 rows affected (... sec)

Updating data

In order to update documents in Crate the SQL UPDATE statement can be used:

cr> update locations set description = 'Updated description' where name = 'Bartledan'
UPDATE OK, 1 row affected (... sec)

Updating nested objects is also supported:

cr> update locations set race['name'] = 'Human' where name = 'Bartledan'
UPDATE OK, 1 row affected (... sec)

Note

If the same documents are updated concurrently an VersionConflictException might occur. Crate contains a retry logic that tries to resolve the conflict automatically. But if it fails more than 3 times the error is returned to the user.

Deleting data

Deleting rows in Crate is done using the SQL DELETE statement:

cr> delete from locations where position > 3
DELETE OK, ... rows affected (... sec)