
.. _aggregation:

===========
Aggregation
===========

There are two ways of doing aggregation, one on the result of a whole query and one
on the separate groups of a :ref:`sql_dml_group_by` query.
Aggregation Function will always return one result value for a single group.
You can imagine the matching rows of a ``SELECT`` statement without ``GROUP BY``
as one group. One row will always be returned.

For a short summary of aggregation functions see :ref:`sql_dml_aggregation`.

count
=====


.. _aggregation-count-star:

count(*)
--------

This aggregation function simply returns the number of rows that match the query.

`count(columName)` is also possible, but currently only works on a primary key column.
The semantics are the same.
The return value is always of type ``long``.
::

    cr> select count(*) from locations;
    +----------+
    | count(*) |
    +----------+
    | 13       |
    +----------+
    SELECT 1 row in set (... sec)

``count(*)`` can also be used on group by queries::

    cr> select count(*), kind from locations group by kind order by kind asc;
    +----------+-------------+
    | count(*) | kind        |
    +----------+-------------+
    | 4        | Galaxy      |
    | 5        | Planet      |
    | 4        | Star System |
    +----------+-------------+
    SELECT 3 rows in set (... sec)

count(columnName)
-----------------

In contrast to the :ref:`aggregation-count-star` function
the ``count`` function used with a column name as parameter will return the number of rows
with a non-``NULL`` value in that column.

Example::

    cr> select count(name), count(*), date from locations group by date
    ... order by count(name) desc, count(*) desc;
    +-------------+----------+---------------+
    | count(name) | count(*) | date          |
    +-------------+----------+---------------+
    | 7           | 8        | 1373932800000 |
    | 4           | 4        | 308534400000  |
    | 1           | 1        | 1367366400000 |
    +-------------+----------+---------------+
    SELECT 3 rows in set (... sec)

count(distinct columnName)
--------------------------

The `count` aggregation function also supports the `distinct` keyword. This
keyword changes the behaviour of the function so that it will only count the number
of distinct values in this column that are not ``NULL``::

    cr> select count(distinct kind), count(*), date from locations group by date
    ... order by count(distinct kind) desc, count(*) desc;
    +----------------------+----------+---------------+
    | count(DISTINCT kind) | count(*) | date          |
    +----------------------+----------+---------------+
    | 3                    | 8        | 1373932800000 |
    | 3                    | 4        | 308534400000  |
    | 1                    | 1        | 1367366400000 |
    +----------------------+----------+---------------+
     SELECT 3 rows in set (... sec)

    cr> select count(distinct kind) from locations;
    +----------------------+
    | count(DISTINCT kind) |
    +----------------------+
    | 3                    |
    +----------------------+
    SELECT 1 row in set (... sec)

min
===

The ``min`` aggregation function returns the smallest value in a column that is not ``NULL``.
Its single argument is a column name and its return value is always of the type of that column.
Example::

    cr> select min(position), kind from locations where name not like 'North %' group by kind order by min(position) asc, kind asc;
    +---------------+-------------+
    | min(position) | kind        |
    +---------------+-------------+
    | 1             | Planet      |
    | 1             | Star System |
    | 2             | Galaxy      |
    +---------------+-------------+
    SELECT 3 rows in set (... sec)


    cr> select min(date) from locations;
    +--------------+
    | min(date)    |
    +--------------+
    | 308534400000 |
    +--------------+
    SELECT 1 row in set (... sec)

``min`` returns ``NULL`` if the column does not contain any value but ``NULL``.
It is allowed on numeric columns (``byte``, ``short``, ``integer``, ``long``, ``float``, ``double``),
on ``timestamp`` and ``string`` columns. On ``string`` columns it will return the lexicographically
smallest.::

    cr> select min(name), kind from locations group by kind order by kind asc;
    +------------------------------------+-------------+
    | min(name)                          | kind        |
    +------------------------------------+-------------+
    | Galactic Sector QQ7 Active J Gamma | Galaxy      |
    |                                    | Planet      |
    | Aldebaran                          | Star System |
    +------------------------------------+-------------+
    SELECT 3 rows in set (... sec)


max
===

It behaves exactly like ``min`` but returns the biggest value in a column that is not ``NULL``.
Some Examples::

    cr> select max(position), kind from locations group by kind order by kind desc;
    +---------------+-------------+
    | max(position) | kind        |
    +---------------+-------------+
    | 4             | Star System |
    | 5             | Planet      |
    | 6             | Galaxy      |
    +---------------+-------------+
    SELECT 3 rows in set (... sec)


    cr> select max(position) from locations;
    +---------------+
    | max(position) |
    +---------------+
    | 6             |
    +---------------+
    SELECT 1 row in set (... sec)


    cr> select max(name), kind from locations group by kind order by max(name) desc;
    +-------------------+-------------+
    | max(name)         | kind        |
    +-------------------+-------------+
    | Outer Eastern Rim | Galaxy      |
    | Bartledan         | Planet      |
    | Altair            | Star System |
    +-------------------+-------------+
    SELECT 3 rows in set (... sec)


sum
===

The ``sum`` aggregation function returns the sum of all the values in a column that are not ``NULL``
as a double value. Its single argument is the column name of a numeric column or timestamp column.
``sum`` can not be used on other column types.

::

    cr> select sum(position), kind from locations group by kind order by sum(position) asc;
    +---------------+-------------+
    | sum(position) | kind        |
    +---------------+-------------+
    | 10.0          | Star System |
    | 13.0          | Galaxy      |
    | 15.0          | Planet      |
    +---------------+-------------+
    SELECT 3 rows in set (... sec)

    cr> select sum(position) as position_sum from locations;
    +--------------+
    | position_sum |
    +--------------+
    | 38.0         |
    +--------------+
    SELECT 1 row in set (... sec)

    cr> select sum(name), kind from locations group by kind order by sum(name) desc;
    UnsupportedFeatureException[unknown function: sum(string)]...

avg
===

The ``avg`` aggregation function returns the arithmetic mean, the *average*,
of all values in a column that are not ``NULL`` as a double value. It accepts all numeric columns
and timestamp columns as single argument. Using ``avg`` on other column types is not allowed.

Example::

    cr> select avg(position), kind from locations group by kind order by kind;
    +---------------+-------------+
    | avg(position) | kind        |
    +---------------+-------------+
    | 3.25          | Galaxy      |
    | 3.0           | Planet      |
    | 2.5           | Star System |
    +---------------+-------------+
    SELECT 3 rows in set (... sec)


any
===

The ``any`` aggregation function returns a single value of a column.
Which value it returns is not defined.
It accepts references to columns of all primitive types.
Using ``any`` on ``Object`` columns is not supported.
Its return type is the type of its parameter column and can be ``NULL``
if the column contains ``NULL`` values.

Example::

    cr> select any(position) from locations;
    +---------------+
    | any(position) |
    +---------------+
    | ...           |
    +---------------+
    SELECT 1 row in set (... sec)

    cr> select any(name), kind from locations
    ... where name != ''
    ... group by kind order by kind desc;
    +-...------------+-------------+
    | any(name)      | kind        |
    +-...------------+-------------+
    | ...            | Star System |
    | ...            | Planet      |
    | ...            | Galaxy      |
    +-...------------+-------------+
    SELECT 3 rows in set (... sec)

An example use case is to group a table with many rows per user by
``user_id`` and get the ``username`` for every group, that means every
user. This works as rows with same ``user_id`` have the same
``username``.  This method performs better than grouping on
``username`` as grouping on number types is generally faster than on
strings.  The advantage is that the ``any`` function does very little
to no computation as for example ``max`` aggregation function would
do.
