.. _partitioned-tables:

==================
Partitioned Tables
==================

A partitioned table is a virtual table that can be created
by naming one or more columns by which it is splitted into
separate internal tables, called ``partitions``.

When a record with a new distinct combination of values for
the configured ``PARTITIONED BY`` columns is inserted, a new
partition is created and the document will be inserted into this partition.

You will end up with separate partitions under the hood that
can be queried like a single table.

If you are usually interested in separate partitions of your data only,
as might be the case for e.g. analyzing time based log data.
you can query them much much faster because you don't have to
iterate over all rows of all partitions.

Deletion is faster too if you delete whole partitions at once,
as a whole table can be deleted and no expensive query is involved.

Creation
========

It can be created using the :ref:`ref-create-table` statement
using the :ref:`ref_clustered_clause`::

    cr> CREATE TABLE parted_table (
    ...   id long,
    ...   title string,
    ...   content string,
    ...   width double,
    ...   day timestamp
    ... ) CLUSTERED BY (title) INTO 4 SHARDS PARTITIONED BY (day);
    CREATE OK (... sec)

This creates an empty partitioned table which is not yet backed by
real partitions. Nonetheless does it behave like a *normal* table.

Information Schema
==================

This table shows up in the ``information_schema.tables`` table,
recognizable as partitioned table by a non null ``partitioned_by`` column::

    cr> SELECT * FROM information_schema.tables
    ... WHERE table_name='parted_table';
    +-------------+--------------+------------------+--------------------+--------------+----------------+
    | schema_name | table_name   | number_of_shards | number_of_replicas | clustered_by | partitioned_by |
    +-------------+--------------+------------------+--------------------+--------------+----------------+
    | doc         | parted_table | 4                | 1                  | title        | [u'day']       |
    +-------------+--------------+------------------+--------------------+--------------+----------------+
    SELECT 1 row in set (... sec)

    cr> SELECT * FROM information_schema.columns
    ... WHERE schema_name = 'doc' AND table_name = 'parted_table'
    ... ORDER BY schema_name, table_name, column_name;
    +-------------+--------------+-------------+------------------+-----------+
    | schema_name | table_name   | column_name | ordinal_position | data_type |
    +-------------+--------------+-------------+------------------+-----------+
    | doc         | parted_table | content     | 1                | string    |
    | doc         | parted_table | day         | 2                | timestamp |
    | doc         | parted_table | id          | 3                | long      |
    | doc         | parted_table | title       | 4                | string    |
    | doc         | parted_table | width       | 5                | double    |
    +-------------+--------------+-------------+------------------+-----------+
    SELECT 5 rows in set (... sec)

And so on.

You can get information about the partitions of a partitioned table
by querying the ``information_schema.table_partitions`` table::

    cr> SELECT count(*) as partition_count FROM information_schema.table_partitions
    ... WHERE schema_name = 'doc' AND table_name = 'parted_table'
    ... ORDER BY partition_ident;
    +-----------------+
    | partition_count |
    +-----------------+
    | 0               |
    +-----------------+
    SELECT 1 row in set (... sec)

As this table is still empty, no partitions have been created.

Insert
======

::

    cr> INSERT INTO parted_table (id, title, width, day)
    ... values (1, 'Don''t Panic', 19.5, '2014-04-08');
    INSERT OK, 1 row affected (... sec)

    cr> SELECT partition_ident, "values" FROM information_schema.table_partitions
    ... WHERE schema_name = 'doc' AND table_name = 'parted_table'
    ... ORDER BY partition_ident;
    +--------------------------+-------------------------+
    | partition_ident          | values                  |
    +--------------------------+-------------------------+
    | 04732cpp6osj2d9i60o30c1g | {u'day': 1396915200000} |
    +--------------------------+-------------------------+
    SELECT 1 row in set (... sec)

On subsequent inserts with the same ``PARTITIONED BY`` column values,
no additional partition is created::

    cr> INSERT INTO parted_table (id, title, width, day)
    ... values (2, 'Time is an illusion, lunchtime doubly so', 0.7, '2014-04-08');
    INSERT OK, 1 row affected (... sec)

    cr> REFRESH TABLE parted_table;
    REFRESH OK (... sec)

    cr> SELECT partition_ident, "values" FROM information_schema.table_partitions
    ... WHERE schema_name = 'doc' AND table_name = 'parted_table'
    ... ORDER BY partition_ident;
    +--------------------------+-------------------------+
    | partition_ident          | values                  |
    +--------------------------+-------------------------+
    | 04732cpp6osj2d9i60o30c1g | {u'day': 1396915200000} |
    +--------------------------+-------------------------+
    SELECT 1 row in set (... sec)


Update
======

Updating partitioned tables has one big limitation.
``PARTITIONED BY`` columns cannot be changed, because this
would involve moving all affected documents which is no
atomic operation and could lead to inconsistent state::

    cr> UPDATE parted_table set content='now panic!', day = '2014-04-07'
    ... WHERE id=1;
    SQLParseException[Updating a partitioned-by column is currently not supported]; ...

    cr> UPDATE parted_table set content='now panic!'
    ... WHERE id=2;
    UPDATE OK, 1 row affected (... sec)

    cr> REFRESH TABLE parted_table;
    REFRESH OK (... sec)

    cr> SELECT * from parted_table WHERE id=2;
    +------------+---------------+----+------------------------------------------+-------+
    | content    | day           | id | title                                    | width |
    +------------+---------------+----+------------------------------------------+-------+
    | now panic! | 1396915200000 | 2  | Time is an illusion, lunchtime doubly so | 0.7   |
    +------------+---------------+----+------------------------------------------+-------+
    SELECT 1 row in set (... sec)

Delete
======

Deleting with a where clause matching all rows of a partition
will drop the whole partition instead of deleting every
matching document, which is a lot faster::

    cr> delete from parted_table where day=1396915200000;
    DELETE OK, 0 rows affected (... sec)

    cr> SELECT count(*) as partition_count FROM information_schema.table_partitions
    ... WHERE schema_name = 'doc' AND table_name = 'parted_table'
    ... ORDER BY partition_ident;
    +-----------------+
    | partition_count |
    +-----------------+
    | 0               |
    +-----------------+
    SELECT 1 row in set (... sec)

Alter
=====

Partitioned tables parameters can be changed as usual with the :ref:`ref-alter-table` statement.
All common ``ALTER TABLE`` parameters affect all existing and future (to be created) partitions::

    cr> ALTER TABLE parted_table reset (number_of_replicas)
    ALTER OK (... sec)


Limitations
===========

* ``PARTITIONED BY`` columns cannot be used in ``ORDER BY``
* ``PARTITIONED BY`` columns cannot be updated
* where clauses cannot contain queries like ``partitioned_by_column='x' OR normal_column=x``


.. Hidden: drop table::

    cr> drop table parted_table;
    DROP OK (... sec)

