
.. _aggregation:

===========
Aggregation
===========

There are two ways of doing aggregation, one on the result of a whole query and one
on the separate groups of a :ref:`sql_dml_group_by` query.
Aggregation Function will always return one result value for a single group.
You can imagine the matching rows of a ``SELECT`` statement without ``GROUP BY``
as one group. One row will always be returned.

For a short summary of aggregation functions see :ref:`sql_dml_aggregation`.

COUNT
=====


.. _aggregation-count-star:

COUNT(*)
--------

This aggregation function simply returns the number of rows that match the query.

`count(columName)` is also possible, but currently only works on a primary key column.
The semantics are the same.
The return value is always of type ``long``.
::

    cr> select count(*) from locations
    +----------+
    | COUNT(*) |
    +----------+
    | 13       |
    +----------+
    SELECT 1 row in set (... sec)

``COUNT(*)`` can also be used on group by queries::

    cr> select count(*), kind from locations group by kind order by kind asc
    +----------+-------------+
    | COUNT(*) | kind        |
    +----------+-------------+
    | 4        | Galaxy      |
    | 5        | Planet      |
    | 4        | Star System |
    +----------+-------------+
    SELECT 3 rows in set (... sec)

.. note::

    The count() aggregate function can also be called with parameter
    substitution. But only if the value evaluates to a '*'.

.. seealso::

    :doc:`rest`

COUNT(columnName)
-----------------

In contrast to the :ref:`aggregation-count-star` function
the ``COUNT`` function used with a column name as parameter will return the number of rows
with a non-``NULL`` value in that column.

Example::

    cr> select count(name), count(*), date from locations group by date
    ... order by count(name) desc, count(*) desc
    +-------------+----------+---------------+
    | COUNT(name) | COUNT(*) | date          |
    +-------------+----------+---------------+
    | 7           | 8        | 1373932800000 |
    | 4           | 4        | 308534400000  |
    | 1           | 1        | 1367366400000 |
    +-------------+----------+---------------+
    SELECT 3 rows in set (... sec)

COUNT(DISTINCT columnName)
--------------------------

The `count` aggregation function also supports the `DISTINCT` keyword. This
keyword changes the behaviour of the function so that it will only count the number
of distinct values in this column that are not ``NULL``::

    cr> select count(distinct kind), count(*), date from locations group by date
    ... order by count(distinct kind) desc, count(*) desc
    +----------------------+----------+---------------+
    | COUNT(DISTINCT kind) | COUNT(*) | date          |
    +----------------------+----------+---------------+
    | 3                    | 8        | 1373932800000 |
    | 3                    | 4        | 308534400000  |
    | 1                    | 1        | 1367366400000 |
    +----------------------+----------+---------------+
     SELECT 3 rows in set (... sec)

    cr> select count(distinct kind) from locations
    +----------------------+
    | COUNT(DISTINCT kind) |
    +----------------------+
    | 3                    |
    +----------------------+
    SELECT 1 row in set (... sec)

MIN
===

The ``MIN`` aggregation function returns the smallest value in a column that is not ``NULL``.
Its single argument is a column name and its return value is always of the type of that column.
Example::

    cr> select min(position), kind from locations where name not like 'North %' group by kind order by min(position) asc, kind asc
    +---------------+-------------+
    | MIN(position) | kind        |
    +---------------+-------------+
    | 1             | Planet      |
    | 1             | Star System |
    | 2             | Galaxy      |
    +---------------+-------------+
    SELECT 3 rows in set (... sec)


    cr> select min(date) from locations
    +--------------+
    | MIN(date)    |
    +--------------+
    | 308534400000 |
    +--------------+
    SELECT 1 row in set (... sec)

``MIN`` returns ``NULL`` if the column does not contain any value but ``NULL``.
It is allowed on numeric columns (``byte``, ``short``, ``integer``, ``long``, ``float``, ``double``),
on ``timestamp`` and ``string`` columns. On ``string`` columns it will return the lexicographically
smallest.::

    cr> select min(name), kind from locations group by kind order by kind asc
    +------------------------------------+-------------+
    | MIN(name)                          | kind        |
    +------------------------------------+-------------+
    | Galactic Sector QQ7 Active J Gamma | Galaxy      |
    |                                    | Planet      |
    | Aldebaran                          | Star System |
    +------------------------------------+-------------+
    SELECT 3 rows in set (... sec)


MAX
===

It behaves exactly like ``MIN`` but returns the biggest value in a column that is not ``NULL``.
Some Examples::

    cr> select max(position), kind from locations group by kind order by kind desc
    +---------------+-------------+
    | MAX(position) | kind        |
    +---------------+-------------+
    | 4             | Star System |
    | 5             | Planet      |
    | 6             | Galaxy      |
    +---------------+-------------+
    SELECT 3 rows in set (... sec)


    cr> select max(position) from locations
    +---------------+
    | MAX(position) |
    +---------------+
    | 6             |
    +---------------+
    SELECT 1 row in set (... sec)


    cr> select max(name), kind from locations group by kind order by max(name) desc
    +-------------------+-------------+
    | MAX(name)         | kind        |
    +-------------------+-------------+
    | Outer Eastern Rim | Galaxy      |
    | Bartledan         | Planet      |
    | Altair            | Star System |
    +-------------------+-------------+
    SELECT 3 rows in set (... sec)


SUM
===

The ``SUM`` aggregation function returns the sum of all the values in a column that are not ``NULL``
as a double value. Its single argument is the column name of a numeric column or timestamp column.
``SUM`` can not be used on other column types.

::

    cr> select sum(position), kind from locations group by kind order by sum(position) asc
    +---------------+-------------+
    | SUM(position) | kind        |
    +---------------+-------------+
    | 10.0          | Star System |
    | 13.0          | Galaxy      |
    | 15.0          | Planet      |
    +---------------+-------------+
    SELECT 3 rows in set (... sec)

    cr> select sum(position) as position_sum from locations
    +--------------+
    | position_sum |
    +--------------+
    | 38.0         |
    +--------------+
    SELECT 1 row in set (... sec)

    cr> select sum(name), kind from locations group by kind order by sum(name) desc
    SQLParseException[Invalid column type 'string' for aggregate function SUM]

AVG
===

The ``AVG`` aggregation function returns the arithmetic mean, the *average*,
of all values in a column that are not ``NULL`` as a double value. It accepts all numeric columns
and timestamp columns as single argument. Using ``AVG`` on other column types is not allowed.

Example::

    cr> select avg(position), kind from locations group by kind order by kind
    +---------------+-------------+
    | AVG(position) | kind        |
    +---------------+-------------+
    | 3.25          | Galaxy      |
    | 3.0           | Planet      |
    | 2.5           | Star System |
    +---------------+-------------+
    SELECT 3 rows in set (... sec)


ANY
===

The ``ANY`` aggregation function returns a single value of a column.
Which value it returns is not defined.
It accepts references to columns of all primitive types.
Using ``ANY`` on ``Craty`` columns is not supported.
Its return type is the type of its parameter column and can be ``NULL``
if the column contains ``NULL`` values.

Example::

    cr> select any(position) from locations
    +---------------+
    | ANY(position) |
    +---------------+
    | ...           |
    +---------------+
    SELECT 1 row in set (... sec)

    cr> select any(name), kind from locations
    ... where kind!='Planet' group by kind order by kind desc
    +-...-+-------------+
    | ... | kind        |
    +-...-+-------------+
    | ... | Star System |
    | ... | Galaxy      |
    +-...-+-------------+
    SELECT 2 rows in set (... sec)


An example use case is to group a table with many rows per user by ``user_id`` and get the ``username``
for every group, that means every user. This works as rows with same ``user_id`` have the same ``username``.
This method performs better than grouping on ``username`` as grouping on number types is generally faster than
on strings.
The advantage is that the ``ANY`` function does very little to no computation
as for example ``MAX`` aggregation function would do.
