Thursday, July 8, 2010

Dynamic Sampling

Dynamic sampling first became available in Oracle9i Database Release 2. It is the ability of the cost-based optimizer (CBO) to sample the tables a query references during a hard parse, to determine better default statistics for unanalyzed segments, and to verify its “guesses.” This sampling takes place only at hard parse time and is used to dynamically generate better statistics for the optimizer to use, hence the name dynamic sampling.

The optimizer uses a variety of inputs to come up with a plan. It uses any and all constraints defined on the table; system statistics—information about your server’s I/O speeds, CPU speed, and the like; and statistics gathered from the segments involved in the query. The optimizer uses statistics to estimate cardinalities—the number of rows each step in a given plan is expected to return—and those cardinalities are a major variable in computing the cost of a query. When cardinalities are incorrectly estimated, the optimizer may choose an inefficient query plan. The No. 1, and some might say only, reason for an inefficient plan’s being generated by the optimizer is inaccurate cardinality estimations  “right cardinality equals right plan; wrong cardinality equals wrong plan.”


scott@10GR2> create table t
  2  as
  3  select object_name,object_id
  4  from all_objects
  5  /

Table created.

scott@10GR2> select TABLE_NAME,NUM_ROWS,BLOCKS,EMPTY_BLOCKS,AVG_SPACE
  2  from user_tab_statistics
  3  WHERE TABLE_NAME ='T'
  4  /

TABLE_NAME                       NUM_ROWS     BLOCKS EMPTY_BLOCKS  AVG_SPACE
------------------------------ ---------- ---------- ------------ ----------
T

scott@10GR2> select count(*) from T;

  COUNT(*)
----------
     55859

scott@10GR2> show parameter optimizer_dynamic;

NAME                                 TYPE        VALUE
------------------------------------ ----------- --------
optimizer_dynamic_sampling           integer     2

Disabling the Dynamic Sampling to see the default cardinality.

scott@10GR2> select /*+ dynamic_sampling(t,0) */ * from t;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 22136 |   648K|    61   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 22136 |   648K|    61   (0)| 00:00:01 |
--------------------------------------------------------------------------

The Estimated cardinality is 22136 which is very far from real cardinality 55859. If  I permit dynamic sampling then i get much more realistic cardinality.

scott@10GR2> select * from t;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 58007 |  1699K|    62   (2)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 58007 |  1699K|    62   (2)| 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement



So, where would dynamic sampling be useful? First and foremost, dynamic sampling is useful when you are accessing any table that has been created and loaded but not yet analyzed (As I did above).

In addition to providing the optimizer with necessary statistics when it is accessing unanalyzed segments, dynamic sampling may also help the optimizer to validate any of its guesses. To demonstrate this, I’ll create a table with some very specific data. Note that in this table, if the FLAG1 column has a value of Y, the FLAG2 column will have a value of N, and vice versa. All the data is either Y, N, or N, Y—there are no Y, Y records and no N, N records

scott@10GR2> create table t
  2  as
  3  select decode(mod(rownum,2),0,'N','Y') as flag1,
  4             decode(mod(rownum,2),0,'Y','N') as flag2,
  5             a.*
  6  from all_objects a
  7  /

Table created.

scott@
10GR2> create index t_ind on t(flag1,flag2);

Index created.

scott@
10GR2> begin
  2     dbms_stats.gather_table_stats(ownname=>USER,tabname=>'T',cascade=>true,method_opt=>'for all indexed columns size 254');
  3  end;
  4  /

PL/SQL procedure successfully completed.

scott@
10GR2> select count(*), count(*)/2,count(*)/2/2 from t;

  COUNT(*) COUNT(*)/2 COUNT(*)/2/2
---------- ---------- ------------
     55859    27929.5     13964.75


scott@10GR2> select * from t where flag1='N';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 27357 |  2618K|   184   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T    | 27357 |  2618K|   184   (2)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG1"='N')

Good cardinality looking at half of the data.

scott@10GR2> select * from t where flag2='N';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 28502 |  2727K|   184   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T    | 28502 |  2727K|   184   (2)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG2"='N')

Again, Good cardinality looking at half of the data.

scott@10GR2> select * from t where flag1='N' and flag2='N';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 13959 |  1335K|   184   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T    | 13959 |  1335K|   184   (2)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG1"='N' AND "FLAG2"='N')

Poor cardinality looking at  more than Quarter of the data. The optimizer is making a wrong decision about cardinality and going for a wrong Access path. The same query when used Dynamic Sampling for level 3 it goes for better cardinality and finds the Optimal execution path.

scott@10GR2> select /*+ dynamic_sampling(t,3) */ * from t where flag1='N' and flag2='N';

Execution Plan
----------------------------------------------------------
Plan hash value: 1376202287

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     5 |   490 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     5 |   490 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IND |     5 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FLAG1"='N' AND "FLAG2"='N')

Note
-----
   - dynamic sampling used for this statement


Detailed information about levels of dynamic sampling at product documentation.
http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm#i43032

1 comment:

  1. Hi Rajesh,

    I like your blog very much. You described and explained the things very clearly and well. I have one suggestion to make more presentable your blog, change your blog Template Designer – Layout-Adjust width-Entire blog. Check my blog http://deepakguptadba.blogspot.com/

    Thanks,
    Deepak

    ReplyDelete