==============
Querying Crate
==============

This section provides an overview on how to query documents using SQL.  See
:ref:`sql_ddl` for information about `Table creation` and other Data Definition
statements.

Retrieving Data
===============

Retrieving data from ``Crate`` is done by using a SQL ``SELECT`` statement.
The response to a ``SELECT`` query contains the column names of the result and
the actual result rows as a two-dimensional array of values.

A simple select::

    cr> select name, position from locations order by id limit 2
    +-------------------+----------+
    | name              | position |
    +-------------------+----------+
    | North West Ripple | 1        |
    | Arkintoofle Minor | 3        |
    +-------------------+----------+
    SELECT 2 rows in set (... sec)

If the '*' operator is used, all properties defined in the schema are returned
for each object::

    cr> select date, description, kind, name, position, race from locations order by id limit 1 offset 1
    +--------------+--------------...---+--------+-------------------+----------+------+
    | date         | description        | kind   | name              | position | race |
    +--------------+--------------...---+--------+-------------------+----------+------+
    | 308534400000 | Motivated by ...s. | Planet | Arkintoofle Minor | 3        | NULL |
    +--------------+--------------...---+--------+-------------------+----------+------+
    SELECT 1 row in set (... sec)

Aliases can be used to change the output name of the columns::

    cr> select name as n from locations order by id limit 1
    +-------------------+
    | n                 |
    +-------------------+
    | North West Ripple |
    +-------------------+
    SELECT 1 row in set (... sec)

Where Clause
------------

A simple where clause example using an equality operator::

    cr> select description from locations where id = '1'
    +-----------------------------------------...-------------------------------------------+
    | description                                                                           |
    +-----------------------------------------...-------------------------------------------+
    | Relative to life on NowWhat, living on a...er by a factor of about seventeen million. |
    +-----------------------------------------...-------------------------------------------+
    SELECT 1 row in set (... sec)

Usual comparison operators to be used in filters are available for
all simple data types:

    ===========  =================================
    Operator     Description
    ===========  =================================
    <            less than
    >            greater than
    <=           less than or equal to
    >=           greater than or equal to
    =            equal
    <>           not equal
    !=           not equal - same as <>
    is not null  field is not null and not missing
    is null      field is null or missing
    like         matches a part of the given value
    ===========  =================================

.. note::

  The field 'name' used for the request above is defined as
  'not_analyzed' in the schema. For an 'analyzed' field the result may
  differ depending on the used Analyzer/Tokenizer.

For strings a lexicographical comparison is performed based on the
Lucene TermRangeQuery::

    cr> select name from locations where name > 'Argabuthon' order by name
    +------------------------------------+
    | name                               |
    +------------------------------------+
    | Arkintoofle Minor                  |
    | Bartledan                          |
    | Galactic Sector QQ7 Active J Gamma |
    | North West Ripple                  |
    | Outer Eastern Rim                  |
    +------------------------------------+
    SELECT 5 rows in set (... sec)

For details please refer to the `Apache Lucene`_ site.

.. _`Apache Lucene`: http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/Query.html

Number and date field comparison behave as expected from standard SQL.
The following example uses one of the supported ISO date formats::

    cr> select date, position from locations where date <= '1979-10-12' and
    ... position < 3 order by position
    +--------------+----------+
    | date         | position |
    +--------------+----------+
    | 308534400000 | 1        |
    | 308534400000 | 2        |
    +--------------+----------+
    SELECT 2 rows in set (... sec)


For a detailed explanation of the supported ISO date formats please refer to the `joda date_optional_time`_ site.

.. _`joda date_optional_time`: http://joda-time.sourceforge.net/api-release/org/joda/time/format/ISODateTimeFormat.html#dateOptionalTimeParser%28%29

For custom date types, or defined date formats in the object mapping
the corresponding format should be used for a comparison. Otherwise
the operation may fail.

Like
----

`Crate` supports the `LIKE` operator. This operator can be used to query for
rows where only part of a columns value should match something. For example to
get all locations where the name starts with 'Ar' the following query can be
used::

    cr> select name from locations where name like 'Ar%' order by name asc
    +-------------------+
    | name              |
    +-------------------+
    | Argabuthon        |
    | Arkintoofle Minor |
    +-------------------+
    SELECT 2 rows in set (... sec)

The following wildcard operators are available:

    == ========================================
    %  A substitute for zero or more characters
    _  A substitute for a single character
    == ========================================

The wildcard operators may be used at any point in the string literal. For
example a more complicated like clause could look like this::

    cr> select name from locations where name like '_r%a%' order by name asc
    +------------+
    | name       |
    +------------+
    | Argabuthon |
    +------------+
    SELECT 1 row in set (... sec)

In order so search for the wildcard characters themselves it is possible to
escape them using a backslash::

    cr> select description from locations where description like '%\%' order by description asc
    +-------------------------+
    | description             |
    +-------------------------+
    | The end of the Galaxy.% |
    +-------------------------+
    SELECT 1 row in set (... sec)

.. note::

    Queries with a like clause can be quite slow. Especially if the like clause
    starts with a wildcard character. Because in that case CRATE has to iterate
    over all rows and can't utilize the index. For better performance consider
    using a fulltext index.

Limits
------

As unlimited SELECT queries could break your cluster if the matching rows exceed your node's RAM,
SELECT statements are limited by default to **10000** rows.
You can expand this limit by using an explicit LIMIT-clause.
But you are encouraged to make use of a windowing using LIMIT and OFFSET to iterate through all the results
of a potentially large resultset instead of expanding the default limit.

Inner/Nested Objects
--------------------

``Crate`` supports an ``object`` data type, used for simple storing a whole object into a
column and it's even possible to select and query for properties of such objects.

Select a property of an inner object::

    cr> select name, race['name'] from locations where name = 'Bartledan'
    +-----------+----------------+
    | name      | race['name']   |
    +-----------+----------------+
    | Bartledan | Bartledannians |
    +-----------+----------------+
    SELECT 1 row in set (... sec)


Query for a property of an inner object::

    cr> select name, race['name'] from locations where race['name'] = 'Bartledannians'
    +-----------+----------------+
    | name      | race['name']   |
    +-----------+----------------+
    | Bartledan | Bartledannians |
    +-----------+----------------+
    SELECT 1 row in set (... sec)


.. _sql_dml_aggregation:

Data Aggregation
----------------

Crate supports aggregations by using the aggregation functions listed below on ``SELECT`` statements.

Aggregation works on all the rows that match a query or on
all matching rows in every distinct *group* of a ``GROUP BY`` statement.
Aggregating ``SELECT`` statements without ``GROUP BY`` will
always return one row, as they do an aggregation operation on the matching rows as one group.

.. seealso::

    :ref:`aggregation`

+---------+---------------+----------------------------------+-----------------------+
| Name    | Arguments     | Description                      | Return Type           |
+=========+===============+==================================+=======================+
| COUNT(*)| Star as       | Counts the number of rows        | long                  |
|         | Parameter or  | that match the query.            |                       |
|         | as Constant   |                                  |                       |
|         | or the primary|                                  |                       |
|         | key column    |                                  |                       |
+---------+---------------+----------------------------------+-----------------------+
| MIN     | column name of| returns the smallest of the      | the input             |
|         | a numeric,    | values in the argument column    | column type or NULL   |
|         | timestamp     | in case of strings this          | if all values in that |
|         | or string     | means the lexicographically      | matching rows in that |
|         | column        | smallest. NULL-values are ignored| column are NULL       |
+---------+---------------+----------------------------------+-----------------------+
| MAX     | column name of| returns the biggest of the       | the input             |
|         | a numeric,    | values in the argument column    | column type or NULL   |
|         | timestamp     | in case of strings this          | if all values of all  |
|         | or string     | means the lexicographically      | matching rows in that |
|         | column        | biggest. NULL-values are ignored | column are NULL       |
+---------+---------------+----------------------------------+-----------------------+
| SUM     | column name of| returns the sum of the values in | double or NULL        |
|         | a numeric or  | the argument column.             | if all values of all  |
|         | timestamp     | NULL-values are ignored.         | matching rows in that |
|         | column        |                                  | column are NULL       |
+---------+---------------+----------------------------------+-----------------------+
| AVG     | column name of| returns the arithmetic mean of   | double or NULL        |
|         | a numeric or  | the values in the argument       | if all values of all  |
|         | timestamp     | column.                          | matching rows in that |
|         | column        | NULL-values are ignored.         | column are NULL       |
+---------+---------------+----------------------------------+-----------------------+
| ANY     | column name of| returns an undefined value of    | the input             |
|         | a primitive   | all the values in the argument   | column type or NULL   |
|         | typed         | column. Can be NULL.             | if some value of the  |
|         | column        |                                  | matching rows in that |
|         | (all but      |                                  | column is NULL        |
|         | object)       |                                  |                       |
+---------+---------------+----------------------------------+-----------------------+

Some Examples::

    cr> select count(*) from locations
    +----------+
    | count(*) |
    +----------+
    | 13       |
    +----------+
    SELECT 1 row in set (... sec)

    cr> select count(*) from locations where kind='Planet'
    +----------+
    | count(*) |
    +----------+
    | 5        |
    +----------+
    SELECT 1 row in set (... sec)

    cr> select count(name), count(*) from locations
    +-------------+----------+
    | count(name) | count(*) |
    +-------------+----------+
    | 12          | 13       |
    +-------------+----------+
    SELECT 1 row in set (... sec)

    cr> select max(name) from locations
    +-------------------+
    | max(name)         |
    +-------------------+
    | Outer Eastern Rim |
    +-------------------+
    SELECT 1 row in set (... sec)

    cr> select min(date) from locations
    +--------------+
    | min(date)    |
    +--------------+
    | 308534400000 |
    +--------------+
    SELECT 1 row in set (... sec)

    cr> select count(*), kind from locations group by kind order by kind asc
    +----------+-------------+
    | count(*) | kind        |
    +----------+-------------+
    | 4        | Galaxy      |
    | 5        | Planet      |
    | 4        | Star System |
    +----------+-------------+
    SELECT 3 rows in set (... sec)

    cr> select max(position), kind from locations group by kind order by max(position) desc
    +---------------+-------------+
    | max(position) | kind        |
    +---------------+-------------+
    | 6             | Galaxy      |
    | 5             | Planet      |
    | 4             | Star System |
    +---------------+-------------+
    SELECT 3 rows in set (... sec)

    cr> select min(name), kind from locations group by kind order by min(name) asc
    +------------------------------------+-------------+
    | min(name)                          | kind        |
    +------------------------------------+-------------+
    |                                    | Planet      |
    | Aldebaran                          | Star System |
    | Galactic Sector QQ7 Active J Gamma | Galaxy      |
    +------------------------------------+-------------+
    SELECT 3 rows in set (... sec)

    cr> select count(*), min(name), kind from locations group by kind order by kind
    +----------+------------------------------------+-------------+
    | count(*) | min(name)                          | kind        |
    +----------+------------------------------------+-------------+
    | 4        | Galactic Sector QQ7 Active J Gamma | Galaxy      |
    | 5        |                                    | Planet      |
    | 4        | Aldebaran                          | Star System |
    +----------+------------------------------------+-------------+
    SELECT 3 rows in set (... sec)

    cr> select sum(position) as sum_positions, kind from locations group by kind order by sum_positions
    +---------------+-------------+
    | sum_positions | kind        |
    +---------------+-------------+
    | 10.0          | Star System |
    | 13.0          | Galaxy      |
    | 15.0          | Planet      |
    +---------------+-------------+
    SELECT 3 rows in set (... sec)

.. _sql_dml_group_by:

Group by
--------

Crate supports the `group by` clause. This clause can be used to group the
resulting rows by the value(s) of one or more columns. That means that rows
that contain duplicate values will be merged together.

This is useful if used in conjunction with aggregation functions::

    cr> select count(*), kind from locations group by kind order by count(*) desc, kind asc
    +----------+-------------+
    | count(*) | kind        |
    +----------+-------------+
    | 5        | Planet      |
    | 4        | Galaxy      |
    | 4        | Star System |
    +----------+-------------+
    SELECT 3 rows in set (... sec)

.. note::

    All columns that are used either as result column or in the order by clause
    have to be used within the group by clause. Otherwise the statement won't
    execute.

.. note::

    Grouping on multi-value fields doesn't work. If such a field is encountered
    during a group by operation an error is thrown.

Fulltext search
---------------

In order to use fulltext searches on columns, one must create a
:ref:`fulltext index with an analyzer <sql_ddl_index_fulltext>` on the related column
upfront on table creation, see :ref:`fulltext-indices` for details.

A fulltext search is done using the ``match(columnName, queryTerm)`` helper function::

    cr> select name from locations where match(name_description_ft, 'time')
    +-----------+
    | name      |
    +-----------+
    | Altair    |
    | Bartledan |
    +-----------+
    SELECT 2 rows in set (... sec)

In order to get the match score of the fulltext search, an internal system column
:ref:`_score <sql_ddl_system_column_score>` can be selected::

    cr> select name, \"_score\" from locations where match(name_description_ft, 'time')
    +-----------+------------+
    | name      | _score     |
    +-----------+------------+
    | Altair    | 0.56319076 |
    | Bartledan | 0.4590714  |
    +-----------+------------+
    SELECT 2 rows in set (... sec)

The results of a ``match(...)`` query are sorted by :ref:`_score <sql_ddl_system_column_score>`
in a descending order by default.
Of course it is possible to change it to use an ascending order instead::

    cr> select name, \"_score\" from locations where match(name_description_ft, 'time')
    ... order by \"_score\" asc
    +-----------+------------+
    | name      | _score     |
    +-----------+------------+
    | Bartledan | 0.4590714  |
    | Altair    | 0.56319076 |
    +-----------+------------+
    SELECT 2 rows in set (... sec)

.. note::

  The terms used for a fulltext search are analyzed by an ``OR`` operator.

Negative search
...............

A negative fulltext search can be done using a ``NOT`` clause::

    cr> select name, \"_score\" from locations where not match(name_description_ft, 'time')
    +------------------------------------+--------+
    | name                               | _score |
    +------------------------------------+--------+
    | Outer Eastern Rim                  | 1.0    |
    | Aldebaran                          | 1.0    |
    | Alpha Centauri                     | 1.0    |
    | Allosimanius Syneca                | 1.0    |
    | NULL                               | 1.0    |
    | North West Ripple                  | 1.0    |
    | Galactic Sector QQ7 Active J Gamma | 1.0    |
    | Algol                              | 1.0    |
    | Argabuthon                         | 1.0    |
    | Arkintoofle Minor                  | 1.0    |
    |                                    | 1.0    |
    +------------------------------------+--------+
    SELECT 11 rows in set (... sec)


Filter by :ref:`_score <sql_ddl_system_column_score>`
.....................................................

It is possible to filter results by the :ref:`_score <sql_ddl_system_column_score>` column
but as it's value is a ratio related to the highest score of all result and thus never
absolute and directly comparable across searches its use-case is very limited if any.
It is only possible to filter by greater-than or greater-equals operator on the
:ref:`_score <sql_ddl_system_column_score>` column.

Anyway let's do it here for demonstration purpose::

    cr> select name, \"_score\" from locations where match(name_description_ft, 'time')
    ... and \"_score\" > 0.9
    +--------+-----------+
    | name   | _score    |
    +--------+-----------+
    | Altair | 0.9204767 |
    +--------+-----------+
    SELECT 1 row in set (... sec)

As you maybe noticed, the :ref:`_score <sql_ddl_system_column_score>` value has changed for the
same query text and document because it's a ratio related to all results,
and if we limit it by filtering on :ref:`_score <sql_ddl_system_column_score>`,
we'll have fewer or different results.

.. warning::

  As noted above :ref:`_score <sql_ddl_system_column_score>` is a ratio and not comparable
  across search so use it only for filtering if you know what you're doing and on your own risk.

Inserting data
==============

Inserting data to ``Crate`` is done by using the SQL ``INSERT`` statement.

.. note::

    The column list at ``Crate`` is always ordered alphabetically by column name
    and so must the inserted column values.

Inserting a row::

    cr> insert into locations values ('14', '2013-09-12T21:43:59.000Z', 'Blagulon Kappa is the planet to which the police are native.', 'Planet', 'Blagulon Kappa', 7)
    INSERT OK, 1 row affected (... sec)

Inserting multiple rows at once (aka. bulk insert) can be done by defining
multiple values for the ``INSERT`` statement::

    cr> insert into locations (id, date, description, kind, name, position) values
    ... ('14', '2013-09-12T21:43:59.000Z', 'Blagulon Kappa is the planet to which the police are native.', 'Planet', 'Blagulon Kappa', 7),
    ... ('15', '2013-09-13T16:43:59.000Z', 'Brontitall is a planet with a warm, rich atmosphere and no mountains.', 'Planet', 'Brontitall', 10)
    INSERT OK, 2 rows affected (... sec)


Updating data
=============

In order to update documents in `Crate` the SQL `UPDATE` statement can be
used::

    cr> update locations set description = 'Updated description' where name = 'Bartledan'
    UPDATE OK, 1 row affected (... sec)

Updating nested objects is also supported::

    cr> update locations set race['name'] = 'Human' where name = 'Bartledan'
    UPDATE OK, 1 row affected (... sec)

.. note::

    If the same documents are updated concurrently an VersionConflictException
    might occur. Crate contains a retry logic that tries to resolve the
    conflict automatically. But if it fails more than 3 times the error is
    returned to the user.

Deleting data
=============

Deleting rows in Crate is done using the SQL ``DELETE`` statement::

    cr> delete from locations where position > 3
    DELETE OK, ... rows affected (... sec)


.. _importing_data:

Importing data
==============

Using the ``COPY FROM`` SQL statement, data can be imported into Crate.
Currently the only supported data format is JSON, one line is representing one entry.

Example JSON data::

    {"id": 1, "quote": "Don't panic"}
    {"id": 2, "quote": "Would it save you a lot of time if I just gave up and went mad now?"}

.. note::

  Existing entries will be overwritten on import.

.. note::

  The ``COPY FROM`` statement will not convert or validate your data.
  Please make sure that it fits your schema.

Import from path
----------------

Using a path to copy from, data will be imported distributed on every node on which the path
points to an existing file. This also means, if the same file with the same content is available
on all nodes, every node is concurrently importing the same data which does not make so much
sense. It's more useful, for example, to split the data to import across all nodes into a
directory, so every node is importing a different portion of the overall data.

In short, just put a file on every node and import the data using the ``COPY FROM`` statement.

The path can be a directory, a file or a directory with a `PCRE`_ file pattern suffix.
If the path is relative, it is based on the absolute path of each node's data location.

Let's demonstrate this using a relative file path::

    cr> copy quotes from 'import_data/quotes.json'
    COPY OK, 3 rows affected (... sec)

Using a relative directory path::

    cr> copy quotes from 'import_data'
    COPY OK, 3 rows affected (... sec)

And finally import data using a directory path and a `PCRE`_ file pattern  suffix::

    cr> copy quotes from 'import_data/(\\D).json'
    COPY OK, 3 rows affected (... sec)

Also one could use a defined set of variables inside a path. They are defined by using the
`${variable_name}` syntax.
Following variables are supported::

    =======  ============
    Name     Description
    =======  ============
    node     Node name
    cluster  Cluster name
    table    Table name
    =======  ============

Example using path variables::

    cr> copy quotes from 'import_data/${table}.json'
    COPY OK, 3 rows affected (... sec)

Path variables can be freely combined with `PCRE`_ patterns::

    cr> copy quotes from 'import_data/${table}(\\D*).json'
    COPY OK, 3 rows affected (... sec)

.. _PCRE: http://www.pcre.org/
