Thursday, November 18, 2010

Grouping Sets

Learnt something newly today, its about specifying the set of groups that you want to create using Grouping sets.

rajesh@10GR2> select x,y,z
  2  from t1
  3  order by 1;

         X          Y          Z
---------- ---------- ----------
         1          2          3
         2          2         10
         3          5          5

Elapsed: 00:00:00.01
rajesh@10GR2>
rajesh@10GR2> select sum(x),sum(y),sum(z)
  2  from t1
  3  order by 1;

    SUM(X)     SUM(Y)     SUM(Z)
---------- ---------- ----------
         6          9         18

Elapsed: 00:00:00.01

So one way to get the aggregated data's along with all the actual values is.

rajesh@10GR2>
rajesh@10GR2> select x,y,z
  2  from t1
  3  union all
  4  select sum(x),sum(y),sum(z)
  5  from t1
  6  order by 1
  7  /

         X          Y          Z
---------- ---------- ----------
         1          2          3
         2          2         10
         3          5          5
         6          9         18

Elapsed: 00:00:00.29
rajesh@10GR2>

The other is using super-aggregate function using CUBE and filtering out the needed aggregates like below.

rajesh@10GR2> select sum(x),sum(y),sum(z)
  2  from t1
  3  group by cube(z,y,x)
  4  having grouping(x)+grouping(z)+grouping(y) = 0 or
  5         grouping(x)+grouping(z)+grouping(y) = 3
  6  order by 1
  7  /

    SUM(X)     SUM(Y)     SUM(Z)
---------- ---------- ----------
         1          2          3
         2          2         10
         3          5          5
         6          9         18

Elapsed: 00:00:00.00
rajesh@10GR2>

But the coolest way is to use Groping sets and specify only the needed set of groups.

rajesh@10GR2> select sum(x),sum(y),sum(z)
  2  from t1
  3  group by grouping sets( (x,y,z),() )
  4  order by 1
  5  /

    SUM(X)     SUM(Y)     SUM(Z)
---------- ---------- ----------
         1          2          3
         2          2         10
         3          5          5
         6          9         18

Elapsed: 00:00:00.01
rajesh@10GR2>

Now lets benchmark this three stuffs on high data volume sets.

rajesh@10GR2> insert into t1(x,y,z)
  2  select mod(level,10),level,level
  3  from dual
  4  connect by level < = 1000000
  5  /

1000000 rows created.

Elapsed: 00:00:03.39
rajesh@10GR2> commit;

Commit complete.

Elapsed: 00:00:00.23
rajesh@10GR2>
rajesh@10GR2>
rajesh@10GR2> exec dbms_stats.gather_table_stats(user,'T1');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.82
rajesh@10GR2>
rajesh@10GR2> set autotrace traceonly explain statistics;
rajesh@10GR2>
rajesh@10GR2> select x,y,z
  2  from t1
  3  union all
  4  select sum(x),sum(y),sum(z)
  5  from t1
  6  order by 1
  7  /

1000001 rows selected.

Elapsed: 00:00:28.71

Execution Plan
----------------------------------------------------------
Plan hash value: 1154063651

-------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  1016K|    37M|       | 11365   (1)| 00:02:17 |
|   1 |  SORT ORDER BY       |      |  1016K|    37M|   101M|  1113  (52)| 00:00:14 |
|   2 |   UNION-ALL          |      |       |       |       |            |          |
|   3 |    TABLE ACCESS FULL | T1   |  1016K|    37M|       |   557   (3)| 00:00:07 |
|   4 |    SORT AGGREGATE    |      |     1 |    39 |       |            |          |
|   5 |     TABLE ACCESS FULL| T1   |  1016K|    37M|       |   557   (3)| 00:00:07 |
-------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       4966  consistent gets
          0  physical reads
          0  redo size
   15806765  bytes sent via SQL*Net to client
      73722  bytes received via SQL*Net from client
       6668  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
    1000001  rows processed

rajesh@10GR2>
rajesh@10GR2> select sum(x),sum(y),sum(z)
  2  from t1
  3  group by cube(z,y,x)
  4  having grouping(x)+grouping(z)+grouping(y) = 0 or
  5         grouping(x)+grouping(z)+grouping(y) = 3
  6  order by 1
  7  /

1000001 rows selected.

Elapsed: 00:01:46.45

Execution Plan
----------------------------------------------------------
Plan hash value: 3795960770

-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |  1016K|    37M|   695  (23)| 00:00:09 |
|   1 |  SORT ORDER BY         |      |  1016K|    37M|   695  (23)| 00:00:09 |
|*  2 |   FILTER               |      |       |       |            |          |
|   3 |    SORT GROUP BY       |      |  1016K|    37M|   695  (23)| 00:00:09 |
|   4 |     GENERATE CUBE      |      |  1016K|    37M|   695  (23)| 00:00:09 |
|   5 |      SORT GROUP BY     |      |  1016K|    37M|   695  (23)| 00:00:09 |
|   6 |       TABLE ACCESS FULL| T1   |  1016K|    37M|   557   (3)| 00:00:07 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(GROUPING("X")+GROUPING("Z")+GROUPING("Y")=0 OR
              GROUPING("X")+GROUPING("Z")+GROUPING("Y")=3)


Statistics
----------------------------------------------------------
        315  recursive calls
         39  db block gets
       2483  consistent gets
      40208  physical reads
          0  redo size
   15806780  bytes sent via SQL*Net to client
      73722  bytes received via SQL*Net from client
       6668  SQL*Net roundtrips to/from client
          1  sorts (memory)
          2  sorts (disk)
    1000001  rows processed

rajesh@10GR2>
rajesh@10GR2>
rajesh@10GR2> select sum(x),sum(y),sum(z)
  2  from t1
  3  group by grouping sets( (x,y,z),() )
  4  order by 1
  5  /

1000001 rows selected.

Elapsed: 00:00:40.87

Execution Plan
----------------------------------------------------------
Plan hash value: 2567933839

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |  1016K|    37M|   695  (23)| 00:00:09 |
|   1 |  SORT ORDER BY        |      |  1016K|    37M|   695  (23)| 00:00:09 |
|   2 |   SORT GROUP BY ROLLUP|      |  1016K|    37M|   695  (23)| 00:00:09 |
|   3 |    TABLE ACCESS FULL  | T1   |  1016K|    37M|   557   (3)| 00:00:07 |
------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
         43  recursive calls
          7  db block gets
       2483  consistent gets
       5430  physical reads
          0  redo size
   15806780  bytes sent via SQL*Net to client
      73722  bytes received via SQL*Net from client
       6668  SQL*Net roundtrips to/from client
          1  sorts (memory)
          1  sorts (disk)
    1000001  rows processed

See, Grouping set performs data aggregation upon selecting table 'T' and answers it smartly.

No comments:

Post a Comment