This section provides an overview of how to query documents using SQL. See Data Definition in Crate for information about table creation and other data definition statements.
Retrieving data from Crate is done by using a SQL SELECT statement. The response to a SELECT query contains the column names of the result and the actual result rows as a two-dimensional array of values.
A simple select:
cr> select name, position from locations order by id limit 2;
+-------------------+----------+
| name | position |
+-------------------+----------+
| North West Ripple | 1 |
| Arkintoofle Minor | 3 |
+-------------------+----------+
SELECT 2 rows in set (... sec)
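The response shape described above, column names plus a two-dimensional array of rows, can be turned into one record per row on the client side. The following Python sketch assumes the response has already been decoded into a dict with the keys `cols` and `rows`. Both key names and the helper `rows_as_dicts` are assumptions made for illustration, so check them against the client you actually use.

```python
from typing import Any, Dict, List


def rows_as_dicts(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair every value in a result row with its column name."""
    cols = response["cols"]
    return [dict(zip(cols, row)) for row in response["rows"]]


# Hand-written response mirroring the simple select above.
# The "cols" and "rows" key names are assumed, not taken from this page.
response = {
    "cols": ["name", "position"],
    "rows": [["North West Ripple", 1], ["Arkintoofle Minor", 3]],
}

records = rows_as_dicts(response)
for record in records:
    print(record["name"], record["position"])
```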
If the ‘*’ operator is used, all properties defined in the schema are returned for each object. The query below names its columns explicitly instead:
cr> select date, description, kind, name, position, race from locations order by id limit 1 offset 1;
+--------------+--------------...---+--------+-------------------+----------+------+
| date | description | kind | name | position | race |
+--------------+--------------...---+--------+-------------------+----------+------+
| 308534400000 | Motivated by ...s. | Planet | Arkintoofle Minor | 3 | NULL |
+--------------+--------------...---+--------+-------------------+----------+------+
SELECT 1 row in set (... sec)
Aliases can be used to change the output name of the columns:
cr> select name as n from locations order by id limit 1;
+-------------------+
| n |
+-------------------+
| North West Ripple |
+-------------------+
SELECT 1 row in set (... sec)
If DISTINCT is specified, only one row is kept from each set of duplicate rows. All other duplicates are removed from the result set:
cr> select distinct date from locations order by date;
+---------------+
| date |
+---------------+
| 308534400000 |
| 1367366400000 |
| 1373932800000 |
+---------------+
SELECT 3 rows in set (... sec)
A simple where clause example using an equality operator:
cr> select description from locations where id = '1';
+-----------------------------------------...-------------------------------------------+
| description |
+-----------------------------------------...-------------------------------------------+
| Relative to life on NowWhat, living on a...er by a factor of about seventeen million. |
+-----------------------------------------...-------------------------------------------+
SELECT 1 row in set (... sec)
The usual comparison operators are available for filtering on all simple data types:
Operator | Description |
---|---|
< | less than |
> | greater than |
<= | less than or equal to |
>= | greater than or equal to |
= | equal |
<> | not equal |
!= | not equal - same as <> |
is not null | field is not null and not missing |
is null | field is null or missing |
like | matches a part of the given value |
Note
The field ‘name’ used in the string comparison example is defined as ‘not_analyzed’ in the schema. For an ‘analyzed’ field, the result may differ depending on the Analyzer/Tokenizer used.
For strings a lexicographical comparison is performed based on the Lucene TermRangeQuery:
cr> select name from locations where name > 'Argabuthon' order by name;
+------------------------------------+
| name |
+------------------------------------+
| Arkintoofle Minor |
| Bartledan |
| Galactic Sector QQ7 Active J Gamma |
| North West Ripple |
| Outer Eastern Rim |
+------------------------------------+
SELECT 5 rows in set (... sec)
For details please refer to the Apache Lucene site.
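To make the lexicographical ordering concrete, here is a small Python sketch that applies the same `> 'Argabuthon'` filter to a list of names collected from result listings on this page. Python compares strings by Unicode code point, which for these ASCII names reproduces the ordering shown above. Whether that matches the Lucene term ordering for arbitrary non-ASCII input is not something this sketch establishes.

```python
# Names gathered from result listings elsewhere on this page.
names = [
    "Outer Eastern Rim", "Aldebaran", "Alpha Centauri",
    "Allosimanius Syneca", "North West Ripple",
    "Galactic Sector QQ7 Active J Gamma", "Algol", "Argabuthon",
    "Arkintoofle Minor", "Altair", "Bartledan",
]

# Character-by-character comparison: "Ark..." sorts after "Arg...",
# while "Alt..." sorts before it because "l" < "r".
greater = sorted(name for name in names if name > "Argabuthon")

for name in greater:
    print(name)

# Upper-case letters sort before lower-case ones in this ordering.
print("Zeta" < "alpha")
```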
Number and date field comparisons behave as expected from standard SQL. The following example uses one of the supported ISO date formats:
cr> select date, position from locations where date <= '1979-10-12' and
... position < 3 order by position;
+--------------+----------+
| date | position |
+--------------+----------+
| 308534400000 | 1 |
| 308534400000 | 2 |
+--------------+----------+
SELECT 2 rows in set (... sec)
For a detailed explanation of the supported ISO date formats please refer to the joda date_optional_time site.
For custom date types or date formats defined in the object mapping, the corresponding format should be used in the comparison. Otherwise, the operation may fail.
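The numeric `date` values in the result above are consistent with milliseconds since the Unix epoch. The Python sketch below converts the ISO date used in the query into that representation. Interpreting the date as midnight UTC is an assumption of this sketch, and `iso_date_to_millis` is a hypothetical helper, not part of Crate.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_date_to_millis(iso_date: str) -> int:
    """Convert a YYYY-MM-DD date (taken as midnight UTC) to epoch milliseconds."""
    day = datetime.strptime(iso_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating point rounding.
    return (day - EPOCH) // timedelta(milliseconds=1)


millis = iso_date_to_millis("1979-10-12")
print(millis)
```

The printed value equals the `308534400000` shown in the `date` column, which is why both rows satisfy `date <= '1979-10-12'`.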
Crate supports the LIKE operator. This operator can be used to query for rows where only part of a column’s value should match a pattern. For example, to get all locations where the name starts with ‘Ar’, the following query can be used:
cr> select name from locations where name like 'Ar%' order by name asc;
+-------------------+
| name |
+-------------------+
| Argabuthon |
| Arkintoofle Minor |
+-------------------+
SELECT 2 rows in set (... sec)
The following wildcard operators are available:
Wildcard | Description |
---|---|
% | A substitute for zero or more characters |
_ | A substitute for a single character |
The wildcard operators may be used at any point in the string literal. For example, a more complicated LIKE clause could look like this:
cr> select name from locations where name like '_r%a%' order by name asc;
+------------+
| name |
+------------+
| Argabuthon |
+------------+
SELECT 1 row in set (... sec)
To search for the wildcard characters themselves, escape them with a backslash:
cr> select description from locations where description like '%\%' order by description asc;
+-------------------------+
| description |
+-------------------------+
| The end of the Galaxy.% |
+-------------------------+
SELECT 1 row in set (... sec)
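The wildcard and escaping rules above can be modelled outside the database by translating a LIKE pattern into a regular expression. This Python sketch is only an illustration of the matching rules as described on this page. It says nothing about how Crate evaluates LIKE internally, and `like_to_regex` and `like` are hypothetical helper names.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a LIKE pattern: % is any run of characters, _ is one character."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash makes the next character literal.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Argabuthon", "_r%a%"))
print(like("Arkintoofle Minor", "_r%a%"))
print(like("The end of the Galaxy.%", r"%\%"))
```

The first and third calls match and the second does not, mirroring the query results above: ‘Arkintoofle Minor’ has no ‘a’ after its second character.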
Note
Queries with a LIKE clause can be quite slow, especially if the pattern starts with a wildcard character, because in that case Crate has to iterate over all rows and can’t utilize the index. For better performance, consider using a fulltext index.
Unlimited SELECT queries could break your cluster if the matching rows exceed your node’s RAM, so SELECT statements are limited to 10000 rows by default. You can raise this limit with an explicit LIMIT clause. However, instead of raising the default limit, you are encouraged to page through a potentially large result set using LIMIT and OFFSET.
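Paging with LIMIT and OFFSET can be sketched as a small client-side loop. In the Python example below, `execute` stands for whatever function your client offers to run a statement and return its rows. It is a placeholder, and `fake_execute` is an in-memory stand-in so the sketch runs without a cluster. The base query should carry an ORDER BY so that consecutive pages are stable.

```python
from typing import Callable, Iterator, List, Sequence

Row = Sequence[object]


def iter_all_rows(
    execute: Callable[[str], List[Row]],
    base_query: str,
    page_size: int = 1000,
) -> Iterator[Row]:
    """Yield every row of base_query, fetching page_size rows per statement."""
    offset = 0
    while True:
        page = execute(f"{base_query} limit {page_size} offset {offset}")
        yield from page
        if len(page) < page_size:
            return  # a short (or empty) page means nothing is left
        offset += page_size


# In-memory stand-in for a real client call (hypothetical).
TABLE = [[n] for n in range(25)]
statements: List[str] = []


def fake_execute(statement: str) -> List[Row]:
    statements.append(statement)
    words = statement.split()
    limit = int(words[words.index("limit") + 1])
    offset = int(words[words.index("offset") + 1])
    return TABLE[offset:offset + limit]


rows = list(
    iter_all_rows(fake_execute, "select id from locations order by id", page_size=10)
)
print(len(rows), len(statements))
```

With 25 rows and a page size of 10, the loop issues three statements and stops when the last page comes back short.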
Crate supports an object data type, which stores a whole object in a single column. It is also possible to select and query for properties of such objects.
Select a property of an inner object:
cr> select name, race['name'] from locations where name = 'Bartledan';
+-----------+----------------+
| name | race['name'] |
+-----------+----------------+
| Bartledan | Bartledannians |
+-----------+----------------+
SELECT 1 row in set (... sec)
Query for a property of an inner object:
cr> select name, race['name'] from locations where race['name'] = 'Bartledannians';
+-----------+----------------+
| name | race['name'] |
+-----------+----------------+
| Bartledan | Bartledannians |
+-----------+----------------+
SELECT 1 row in set (... sec)
Crate supports aggregations by using the aggregation functions listed below on SELECT statements.
Aggregation works on all the rows that match a query or on all matching rows in every distinct group of a GROUP BY statement. Aggregating SELECT statements without GROUP BY will always return one row, as they do an aggregation operation on the matching rows as one group.
Name | Arguments | Description | Return Type |
---|---|---|---|
COUNT(*) | star (*), a constant, or the primary key column | Counts the number of rows that match the query. | long |
MIN | column name of a numeric, timestamp or string column | Returns the smallest of the values in the argument column. For strings this means the lexicographically smallest. NULL values are ignored. | the input column type, or NULL if all values of all matching rows in that column are NULL |
MAX | column name of a numeric, timestamp or string column | Returns the biggest of the values in the argument column. For strings this means the lexicographically biggest. NULL values are ignored. | the input column type, or NULL if all values of all matching rows in that column are NULL |
SUM | column name of a numeric or timestamp column | Returns the sum of the values in the argument column. NULL values are ignored. | double, or NULL if all values of all matching rows in that column are NULL |
AVG | column name of a numeric or timestamp column | Returns the arithmetic mean of the values in the argument column. NULL values are ignored. | double, or NULL if all values of all matching rows in that column are NULL |
ANY | column name of a primitive typed column (all but object) | Returns an undefined value out of all the values in the argument column. Can be NULL. | the input column type, or NULL if some value of the matching rows in that column is NULL |
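The NULL handling described in the table can be modelled in a few lines of Python, with `None` standing in for NULL. This is a sketch of the documented semantics for MIN, MAX, SUM and AVG only, not of Crate's implementation. The `agg_*` helper names are hypothetical, and `agg_count` reflects the assumption that counting a column skips NULL values, as the `count(name)` example in this section suggests.

```python
from typing import List, Optional, Sequence


def _present(values: Sequence[Optional[float]]) -> List[float]:
    """Drop NULL (None) values, as the aggregations above do."""
    return [v for v in values if v is not None]


def agg_count(values: Sequence[Optional[float]]) -> int:
    # Assumption: counting a column skips NULL values.
    return len(_present(values))


def agg_min(values: Sequence[Optional[float]]) -> Optional[float]:
    present = _present(values)
    return min(present) if present else None


def agg_max(values: Sequence[Optional[float]]) -> Optional[float]:
    present = _present(values)
    return max(present) if present else None


def agg_sum(values: Sequence[Optional[float]]) -> Optional[float]:
    present = _present(values)
    # SUM is documented to return a double, hence the float conversion.
    return float(sum(present)) if present else None


def agg_avg(values: Sequence[Optional[float]]) -> Optional[float]:
    present = _present(values)
    return sum(present) / len(present) if present else None


positions = [1, 3, None]
print(agg_count(positions), agg_min(positions), agg_max(positions))
print(agg_sum(positions), agg_avg(positions))
print(agg_sum([None, None]))
```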
Some Examples:
cr> select count(*) from locations;
+----------+
| count(*) |
+----------+
| 13 |
+----------+
SELECT 1 row in set (... sec)
cr> select count(*) from locations where kind='Planet';
+----------+
| count(*) |
+----------+
| 5 |
+----------+
SELECT 1 row in set (... sec)
cr> select count(name), count(*) from locations;
+-------------+----------+
| count(name) | count(*) |
+-------------+----------+
| 12 | 13 |
+-------------+----------+
SELECT 1 row in set (... sec)
cr> select max(name) from locations;
+-------------------+
| max(name) |
+-------------------+
| Outer Eastern Rim |
+-------------------+
SELECT 1 row in set (... sec)
cr> select min(date) from locations;
+--------------+
| min(date) |
+--------------+
| 308534400000 |
+--------------+
SELECT 1 row in set (... sec)
cr> select count(*), kind from locations group by kind order by kind asc;
+----------+-------------+
| count(*) | kind |
+----------+-------------+
| 4 | Galaxy |
| 5 | Planet |
| 4 | Star System |
+----------+-------------+
SELECT 3 rows in set (... sec)
cr> select max(position), kind from locations group by kind order by max(position) desc;
+---------------+-------------+
| max(position) | kind |
+---------------+-------------+
| 6 | Galaxy |
| 5 | Planet |
| 4 | Star System |
+---------------+-------------+
SELECT 3 rows in set (... sec)
cr> select min(name), kind from locations group by kind order by min(name) asc;
+------------------------------------+-------------+
| min(name) | kind |
+------------------------------------+-------------+
| | Planet |
| Aldebaran | Star System |
| Galactic Sector QQ7 Active J Gamma | Galaxy |
+------------------------------------+-------------+
SELECT 3 rows in set (... sec)
cr> select count(*), min(name), kind from locations group by kind order by kind;
+----------+------------------------------------+-------------+
| count(*) | min(name) | kind |
+----------+------------------------------------+-------------+
| 4 | Galactic Sector QQ7 Active J Gamma | Galaxy |
| 5 | | Planet |
| 4 | Aldebaran | Star System |
+----------+------------------------------------+-------------+
SELECT 3 rows in set (... sec)
cr> select sum(position) as sum_positions, kind from locations group by kind order by sum_positions;
+---------------+-------------+
| sum_positions | kind |
+---------------+-------------+
| 10.0 | Star System |
| 13.0 | Galaxy |
| 15.0 | Planet |
+---------------+-------------+
SELECT 3 rows in set (... sec)
Crate supports the group by clause. This clause can be used to group the resulting rows by the value(s) of one or more columns. That means that rows that contain duplicate values will be merged together.
This is useful if used in conjunction with aggregation functions:
cr> select count(*), kind from locations group by kind order by count(*) desc, kind asc;
+----------+-------------+
| count(*) | kind |
+----------+-------------+
| 5 | Planet |
| 4 | Galaxy |
| 4 | Star System |
+----------+-------------+
SELECT 3 rows in set (... sec)
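What the grouped query above computes can be reproduced client-side as a sanity check. The Python sketch below groups a list of `kind` values, counts each group, and applies the same ordering (`count(*) desc, kind asc`). The input list is made up to match the counts shown in the result, since the individual rows are not listed on this page.

```python
from collections import Counter

# One entry per row; counts chosen to match the result above.
kinds = ["Planet"] * 5 + ["Galaxy"] * 4 + ["Star System"] * 4

# Rows with the same kind are merged into one group with a count.
counts = Counter(kinds)

# Order by count descending, then by kind ascending.
ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for kind, count in ordered:
    print(count, kind)
```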
Note
All columns that are used as result columns or in the ORDER BY clause must also appear in the GROUP BY clause, unless they are used only as arguments of an aggregation function. Otherwise the statement won’t execute.
Note
Grouping on multi-value fields doesn’t work. If such a field is encountered during a GROUP BY operation, an error is thrown.
To use fulltext search on a column, a fulltext index with an analyzer must be created on that column when the table is created. See Indices and fulltext search for details.
A fulltext search is done using the match(columnName, queryTerm) helper function:
cr> select name from locations where match(name_description_ft, 'time');
+-----------+
| name |
+-----------+
| Altair |
| Bartledan |
+-----------+
SELECT 2 rows in set (... sec)
In order to get the match score of the fulltext search, an internal system column _score can be selected:
cr> select name, "_score" from locations where match(name_description_ft, 'time');
+-----------+------------+
| name | _score |
+-----------+------------+
| Altair | 0.56319076 |
| Bartledan | 0.4590714 |
+-----------+------------+
SELECT 2 rows in set (... sec)
The results of a match(...) query are sorted by _score in descending order by default. It is possible to use ascending order instead:
cr> select name, "_score" from locations where match(name_description_ft, 'time')
... order by "_score" asc;
+-----------+------------+
| name | _score |
+-----------+------------+
| Bartledan | 0.4590714 |
| Altair | 0.56319076 |
+-----------+------------+
SELECT 2 rows in set (... sec)
Note
The terms used for a fulltext search are combined using an OR operator.
A negative fulltext search can be done using a NOT clause:
cr> select name, "_score" from locations where not match(name_description_ft, 'time');
+------------------------------------+--------+
| name | _score |
+------------------------------------+--------+
| Outer Eastern Rim | 1.0 |
| Aldebaran | 1.0 |
| Alpha Centauri | 1.0 |
| Allosimanius Syneca | 1.0 |
| NULL | 1.0 |
| North West Ripple | 1.0 |
| Galactic Sector QQ7 Active J Gamma | 1.0 |
| Algol | 1.0 |
| Argabuthon | 1.0 |
| Arkintoofle Minor | 1.0 |
| | 1.0 |
+------------------------------------+--------+
SELECT 11 rows in set (... sec)
It is possible to filter results by the _score column. However, its value is a ratio relative to the highest score among all results, so it is never absolute and cannot be compared directly across searches, which makes the use cases for such a filter very limited. Only the greater-than and greater-than-or-equal operators can be used to filter on the _score column.
For demonstration purposes, here is such a filter:
cr> select name, "_score" from locations where match(name_description_ft, 'time')
... and "_score" > 0.9;
+--------+-----------+
| name | _score |
+--------+-----------+
| Altair | 0.9204767 |
+--------+-----------+
SELECT 1 row in set (... sec)
As you may have noticed, the _score value has changed for the same query text and document. This is because it is a ratio related to all results, and filtering on _score leaves fewer or different results.
Warning
As noted above, _score is a ratio and is not comparable across searches, so use it for filtering only if you know what you’re doing and at your own risk.
Inserting data into Crate is done by using the SQL INSERT statement.
Note
The column list in Crate is always ordered alphabetically by column name, and the inserted column values must follow the same order.
Inserting a row:
cr> insert into locations values ('14', '2013-09-12T21:43:59.000Z', 'Blagulon Kappa is the planet to which the police are native.', 'Planet', 'Blagulon Kappa', 7);
INSERT OK, 1 row affected (... sec)
Inserting multiple rows at once (also known as bulk insert) can be done by defining multiple values for the INSERT statement:
cr> insert into locations (id, date, description, kind, name, position) values
... ('14', '2013-09-12T21:43:59.000Z', 'Blagulon Kappa is the planet to which the police are native.', 'Planet', 'Blagulon Kappa', 7),
... ('15', '2013-09-13T16:43:59.000Z', 'Brontitall is a planet with a warm, rich atmosphere and no mountains.', 'Planet', 'Brontitall', 10);
INSERT OK, 2 rows affected (... sec)
In order to update documents in Crate the SQL UPDATE statement can be used:
cr> update locations set description = 'Updated description' where name = 'Bartledan';
UPDATE OK, 1 row affected (... sec)
Updating nested objects is also supported:
cr> update locations set race['name'] = 'Human' where name = 'Bartledan';
UPDATE OK, 1 row affected (... sec)
Note
If the same documents are updated concurrently, a VersionConflictException might occur. Crate contains retry logic that tries to resolve the conflict automatically. If it fails more than 3 times, the error is returned to the user.
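The retry behaviour described in the note can be pictured with the following Python sketch. It is a generic illustration of "retry on conflict, give up after more than 3 failures", not Crate's actual code. The `VersionConflict` exception class, `run_with_retries` and the flaky demo operation are all hypothetical names.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class VersionConflict(Exception):
    """Stand-in for a version conflict raised by a concurrent update."""


def run_with_retries(operation: Callable[[], T], max_failures: int = 3) -> T:
    """Run operation, retrying on conflict until it failed more than max_failures times."""
    failures = 0
    while True:
        try:
            return operation()
        except VersionConflict:
            failures += 1
            if failures > max_failures:
                raise  # hand the error back to the caller


# Demo: an update that conflicts twice before it goes through.
attempts = {"count": 0}


def flaky_update() -> str:
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise VersionConflict("document changed concurrently")
    return "UPDATE OK"


result = run_with_retries(flaky_update)
print(result, "after", attempts["count"], "attempts")
```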
Deleting rows in Crate is done using the SQL DELETE statement:
cr> delete from locations where position > 3;
DELETE OK, ... rows affected (... sec)
Data in Crate is eventually consistent. Data written by one statement is not guaranteed to be visible to a SELECT statement that immediately follows it and touches the affected rows.
If required, a table can be refreshed explicitly to ensure that the latest state of the table gets fetched:
cr> refresh table locations;
REFRESH OK (... sec)
A table is refreshed periodically with a specified refresh interval. By default, the refresh interval is set to 1000 milliseconds. The refresh interval of a table can be changed with the table parameter refresh_interval (see refresh_interval).
Data can be imported into Crate using the COPY FROM SQL statement. Currently the only supported data format is JSON, with each line representing one entry.
Example JSON data:
{"id": 1, "quote": "Don't panic"}
{"id": 2, "quote": "Would it save you a lot of time if I just gave up and went mad now?"}
Note
Existing entries will be overwritten on import.
Note
The COPY FROM statement will not convert or validate your data. Please make sure that it fits your schema.
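Because COPY FROM does not validate the data, it can help to produce and check the import file before running the statement. The Python sketch below writes records as one JSON object per line and then reports lines that are not valid JSON objects or lack expected keys. The helper names, the temporary file location and the required keys `id` and `quote` are assumptions taken from the example data above, not from the actual `quotes` schema.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, List


def write_json_lines(path: str, records: Iterable[Dict[str, object]]) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def find_bad_lines(path: str, required_keys: Iterable[str]) -> List[int]:
    """Return 1-based numbers of lines that are not objects with all required keys."""
    required = set(required_keys)
    bad = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                bad.append(number)
                continue
            if not isinstance(entry, dict) or not required.issubset(entry):
                bad.append(number)
    return bad


quotes = [
    {"id": 1, "quote": "Don't panic"},
    {"id": 2, "quote": "Would it save you a lot of time if I just gave up and went mad now?"},
]

path = os.path.join(tempfile.mkdtemp(), "quotes.json")
write_json_lines(path, quotes)
bad_lines = find_bad_lines(path, ["id", "quote"])
print(bad_lines)
```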
An example import from a file URI:
cr> copy quotes from 'file:///tmp/import_data/quotes.json';
COPY OK, 3 rows affected (... sec)
If all files inside a directory should be imported, a * wildcard has to be used:
cr> copy quotes from '/tmp/import_data/*' with (concurrency=1, bulk_size=4);
COPY OK, 3 rows affected (... sec)
This wildcard can also be used to only match certain files:
cr> copy quotes from '/tmp/import_data/qu*.json';
COPY OK, 3 rows affected (... sec)
See COPY FROM for more information.
Data can be exported using the COPY TO statement. The export is distributed: every node that holds data of the table exports its own part.
Note
Data is written per shard, so if there is more than one shard of the exported table on the same node, the output file will get corrupted due to concurrent writes to the same file. The example below shows a way to prevent such cases.
This example shows how to export a given table into files named after the table and shard id with gzip compression:
cr> refresh table quotes;
REFRESH OK...
cr> copy quotes to format('/tmp/%s_%s.json',
... sys.shards.table_name, sys.shards.id)
... with (compression='gzip');
COPY OK, 3 rows affected ...
For further details see COPY TO.
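The reason the format string in the example avoids corrupted output is that each shard gets its own file name. The Python sketch below mimics the `'/tmp/%s_%s.json'` pattern with Python's own `%` formatting. The shard ids 0 to 2 are made up for illustration, and the sketch says nothing about any extension the compression option might add.

```python
# Same pattern as in the COPY TO example above.
PATTERN = "/tmp/%s_%s.json"

table_name = "quotes"
shard_ids = [0, 1, 2]  # hypothetical shard ids

# One output path per shard, so no two shards write to the same file.
paths = [PATTERN % (table_name, shard_id) for shard_id in shard_ids]

for path in paths:
    print(path)
```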