Tuesday, June 28, 2016

Top Frequency histogram in 12c

The Top Frequency histogram, newly introduced in 12c, is a variation on the Frequency histogram that ignores statistically insignificant non-popular values. A Top Frequency histogram can produce a better quality histogram for the highly popular values.

If a small number of values occupies most of the rows, then creating a Frequency histogram on that small set of values is useful even when the NDV (Number of Distinct Values) is greater than the requested number of histogram buckets. To create a better quality histogram for popular values, the optimizer ignores the non-popular values and creates a frequency histogram.

The database creates a Top Frequency histogram when the following criteria are met:
  • NDV is greater than n, where n is the requested number of buckets (default 254).
  • The percentage of rows occupied by the top-n frequent values is greater than or equal to the threshold p, where p = (1 - 1/n) * 100. (The query after this list shows one way to check this.)
  • The estimate_percent parameter of the dbms_stats gathering procedure is left at its default, auto_sample_size.
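
A quick sanity check of the second criterion: a query along these lines (a sketch, written against the demo table T created below) measures how many rows the 10 most frequent values cover, for comparison against the (1 - 1/10) * 100 = 90% threshold.

-- what percentage of rows do the 10 most frequent values of OWNER cover?
select round(100 * sum(case when rn <= 10 then cnt else 0 end) / sum(cnt), 2) top10_pct
from (
       select count(*) cnt,
              row_number() over (order by count(*) desc) rn
       from t
       group by owner
     );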

rajesh@ORA12C> create table t as select * from all_objects;

Table created.

rajesh@ORA12C> select num_rows from user_tables where table_name ='T';

  NUM_ROWS
----------
     90215

1 row selected.

rajesh@ORA12C> select column_name,num_distinct,density,histogram
  2  from user_tab_col_statistics
  3  where table_name ='T'
  4  and column_name ='OWNER';

COLUMN_NAM NUM_DISTINCT    DENSITY HISTOGRAM
---------- ------------ ---------- ---------------
OWNER                33  .03030303 NONE

1 row selected.

rajesh@ORA12C> select owner,count(*)
2  from t
3  group by owner
4  order by 2 desc;

OWNER                  COUNT(*)
-------------------- ----------
SYS                       41882
PUBLIC                    37076
ORDSYS                     3157
APEX_040200                3025
MDSYS                      1647
SYSTEM                      589
XDB                         401
CTXSYS                      401
WMSYS                       371
SH                          300
DVSYS                       292
ORDDATA                     274
LBACSYS                     237
OE                          112
GSMADMIN_INTERNAL           104
DBSNMP                       55
IX                           48
RAJESH                       38
HR                           34
OLAPSYS                      25
DEMO                         25
OJVMSYS                      22
DVF                          19
FLOWS_FILES                  11
ORDPLUGINS                   10
PM                           10
AUDSYS                        9
OUTLN                         8
SI_INFORMTN_SCHEMA            8
BI                            8
SCOTT                         6
ORACLE_OCM                    6
APPQOSSYS                     5

33 rows selected.

rajesh@ORA12C> begin
  2     dbms_stats.gather_table_stats
  3             (ownname=>user,
  4             tabname=>'T',
  5             method_opt=>'for columns owner size 10');
  6  end;
  7  /

PL/SQL procedure successfully completed.

rajesh@ORA12C>

So during stats gathering we requested a histogram on the column OWNER with 10 buckets.

  • The OWNER column has NDV = 33 and the requested number of buckets is 10, so NDV > n (where n is the number of buckets requested).
  • The top 10 values of the column OWNER cover about 98% of the rows in the table, which is greater than the threshold (1 - 1/10) * 100 = 90%, so the rows occupied by the top-n values clearly dominate here.
  • The default sample size (auto_sample_size) is used for this histogram generation.

Since all three conditions are met, we end up with a Top-Frequency histogram.

rajesh@ORA12C> select column_name,num_distinct,density,histogram
  2  from user_tab_col_statistics
  3  where table_name ='T'
  4  and column_name ='OWNER';

COLUMN_NAM NUM_DISTINCT    DENSITY HISTOGRAM
---------- ------------ ---------- ---------------
OWNER                33 5.5423E-06 TOP-FREQUENCY

1 row selected.

rajesh@ORA12C>
rajesh@ORA12C> column endpoint_actual_value format a15
rajesh@ORA12C> select endpoint_number,endpoint_actual_value
  2  from user_tab_histograms
  3  where table_name ='T'
  4  and column_name ='OWNER';

ENDPOINT_NUMBER ENDPOINT_ACTUAL
--------------- ---------------
           3025 APEX_040200
           3426 CTXSYS
           5073 MDSYS
           8230 ORDSYS
          45306 PUBLIC
          45606 SH
          87488 SYS
          88077 SYSTEM
          88448 WMSYS
          88849 XDB

10 rows selected.

As per the documentation, the cardinality formula for popular values is:

Cardinality = (Num_of_rows * Num_of_endpoints_spanned) / Total_num_of_endpoints

rajesh@ORA12C> select count(*) from t where owner ='MDSYS';

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |   480  (13)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |  1647 |  9882 |   480  (13)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"='MDSYS')

So taking the value OWNER = 'MDSYS' into consideration and applying the above cardinality formula produces this result:

Cardinality = ((5073 - 3426) * 90215) / 88849 = 1672 (which pretty much matches the estimated cardinality)
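
The same arithmetic can be applied to every captured value at once by letting SQL compute the endpoint spans from the dictionary. A sketch (the literals 90215 and 88849 are this example's row count and total endpoint count):

-- frequency of each captured value = gap between consecutive endpoint numbers
select endpoint_actual_value,
       endpoint_number - lag(endpoint_number,1,0) over (order by endpoint_number) endpoints_spanned,
       round( (endpoint_number - lag(endpoint_number,1,0) over (order by endpoint_number))
              * 90215 / 88849 ) estimated_card
from user_tab_histograms
where table_name ='T'
and column_name ='OWNER'
order by endpoint_number;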

But in the case of non-popular values, the formula from the documentation is:

Cardinality = Num_of_rows * Density. Applying this formula to our data set produces 0.5; rounded to the nearest integer, the estimated cardinality should be one.

rajesh@ORA12C> select num_distinct,density, density*90215 the_guess
  2  from user_tab_col_statistics
  3  where table_name ='T'
  4  and column_name ='OWNER' ;

NUM_DISTINCT    DENSITY  THE_GUESS
------------ ---------- ----------
          33 5.5423E-06         .5

1 row selected.

rajesh@ORA12C>

But the explain plan for a non-popular value reports the estimated cardinality as 59, which is far away from our calculation.

rajesh@ORA12C> select count(*) from t where owner ='SCOTT';

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |   480  (13)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |    59 |   354 |   480  (13)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"='SCOTT')

Looking into the 10053 trace showed that, rather than using the density from user_tab_col_statistics, the optimizer used a different value called "NewDensity" to produce the cardinality.

=====================================
Access path analysis for T
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T[T]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Column (#1):
NewDensity:0.000658, OldDensity:0.000006 BktCnt:88849.000000, PopBktCnt:88849.000000, PopValCnt:10, NDV:33
  Column (#1): OWNER(VARCHAR2)
AvgLen: 6 NDV: 33 Nulls: 0 Density: 0.000658
    Histogram: Top-Freq  #Bkts: 88849  UncompBkts: 88849  EndPtVals: 10  ActualVal: yes
  Table: T  Alias: T
    Card: Original: 90215.000000  Rounded: 59  Computed: 59.391304  Non Adjusted: 59.391304

Applying the formula Cardinality = NewDensity * Num_of_rows = 0.000658 * 90215 = 59.3 (rounded to the nearest integer, 59) matches the estimated cardinality shown in the explain plan.

But how did the optimizer arrive at this NewDensity value shown in the 10053 trace?

NewDensity = sum_of_unpopular_value_occurrences / (unpopular NDV * Num_of_rows)

Applying this to our data set resulted in this.

NewDensity = (292 + 274 + 237 + 112 + 104 + 55 + 48 + 38 + 34 + 25 + 25 + 22 + 19 + 11 + 10 + 10 + 9 + 8 + 8 + 8 + 6 + 6 + 5 )  / ( 23*90215)  = 0.000658
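
Note that the sum of the unpopular occurrences is simply num_rows minus the highest endpoint_number (90215 - 88849 = 1366), so NewDensity can also be derived straight from the dictionary. A sketch against this example's statistics:

-- NewDensity = rows not covered by the histogram / (unpopular NDV * num_rows)
select round( (t.num_rows - h.bkt_cnt)
            / ((c.num_distinct - h.pop_val_cnt) * t.num_rows), 6) new_density
from user_tables t,
     user_tab_col_statistics c,
     (select max(endpoint_number) bkt_cnt, count(*) pop_val_cnt
      from user_tab_histograms
      where table_name ='T'
      and column_name ='OWNER') h
where t.table_name ='T'
and c.table_name ='T'
and c.column_name ='OWNER';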


Wednesday, June 8, 2016

Write Inconsistency Part III

But do we really need a trigger in place for a transaction restart to kick in? Not at all; a simple WHERE clause can do that.

Here again is a test case to demonstrate it.

Session#1>set feedback off
Session#1>drop table t purge;
Session#1> create table t as select 1 x, 0 y from dual;
Session#1> insert into t(x,y) values(2,0);
Session#1> set feedback on
Session#1>
Session#1>
Session#1> select * from t;

         X          Y
---------- ----------
         1          0
         2          0

2 rows selected.

Session#1> update t set y = 1;

2 rows updated.

Session#1> select * from t;

         X          Y
---------- ----------
         1          1
         2          1

2 rows selected.

Session#1>

Remember, no commit yet! From session#2 do this.
Session#2> select * from t;

         X          Y
---------- ----------
         1          0

1 row selected.

Session#2> update t set y = y+1 where x >0 and y is not null;

Session#2 is blocked by Session#1, whose transaction is not yet committed, so go back and commit in Session#1.
Session#1> commit;

Commit complete.

Session#1> select * from t;

         X          Y
---------- ----------
         1          1
         2          1

2 rows selected.

Session#1>

Since Session#1 committed, Session#2 got unblocked and completed its update statement; Session#2 now shows this.
Session#2>update t set y = y+1 where x >0 and y is not null;

2 rows updated.

Session#2> select * from t;

         X          Y
---------- ----------
         1          2
         2          2

2 rows selected.

Session#2>

So having that set of columns in the WHERE clause "triggers" the transaction restart, which in turn restarts the update at an SCN later than the commit SCN generated by the first transaction.

That is the whole story of getting a consistent result after the update from Session#2.

So why does the statement "update T set y = y+1" behave differently from the statement "update T set y = y+1 where x > 0 and y is not null", when both statements are supposed to update all the rows of table T? This has been identified as an internal bug and is still not fixed in Oracle 12c (12.1.0.2).


So please be aware of this, and test your DMLs properly in your applications to avoid this kind of inconsistency until it gets fixed.

Write Inconsistency Part II

I was wondering: why doesn't the transaction (from the second session) restart itself at an SCN later than the commit SCN generated by the first transaction?

Just to check whether a restart happens: with a trigger in place after the create table, you actually see different results.

Repeating the test case, now with the trigger in place.

Session#1> set feedback off
Session#1> drop table t purge;
Session#1> create table t as select 1 x,0 y from dual;
Session#1> create or replace trigger t_trig
  2  before update on t
  3  for each row
  4  begin
  5     dbms_output.put_line('updating '|| :old.x ||','|| :old.y
  6              ||' to '|| :new.x || ',' || :new.y );
  7  end;
  8  /
Session#1> insert into t(x,y) values(2,0);
Session#1> set feedback on
Session#1>
Session#1> select * from t;

         X          Y
---------- ----------
         1          0
         2          0

2 rows selected.

Session#1>update t set y = 1;
updating 1,0 to 1,1
updating 2,0 to 2,1

2 rows updated.

Session#1> select * from t;

         X          Y
---------- ----------
         1          1
         2          1

2 rows selected.

Session#1>

So the trigger reported that Session#1 updated two records. Remember, no commit yet!

Now from session#2 do this.

Session#2> select * from t;

         X          Y
---------- ----------
         1          0

1 row selected.

Session#2>update t set y = y+1;

Session#2 waits for Session#1 to complete its transaction; until then Session#2 is blocked.

Now go back to Session#1 and do commit.

Session#1> commit;

Commit complete.

Session#1> select * from t;

         X          Y
---------- ----------
         1          1
         2          1

2 rows selected.

Session#1>

Since Session#1 committed, the update statement in Session#2 got unblocked, and the query from Session#2 returned this.

Session#2> update t set y = y+1;
updating 1,0 to 1,1
updating 1,1 to 1,2
updating 2,1 to 2,2

2 rows updated.

Session#2> select * from t;

         X          Y
---------- ----------
         1          2
         2          2

2 rows selected.

Session#2>

The output from the trigger confirms that the update statement from session#2:
a)      Did a consistent read on table T.
b)      Got the first row for update with x=1 and y=0, changed y to 1, and invoked the trigger before executing the update (since it is a before update trigger).
c)      The trigger put the message "updating 1,0 to 1,1" in the buffer.
d)      Upon trying to make changes to that block, it realized the row was locked by session#1, and hence this update got blocked.
e)      When session#1 committed, the update from session#2 resumed and in turn got restarted at an SCN later than the commit SCN generated by the first transaction: it did the consistent read again, got two rows, modified them, and had those before and after values recorded by the trigger in the buffer.
f)      Session#2 got a current read on the block to make the changes for the new values of Y (since session#1 had committed, the block was available for current read to Session#2).
g)      For each row updated, the trigger put a message "updating x1,y1 to x2,y2" in the buffer.
h)      Once the update statement completed, the messages in the buffer got printed on the screen.

The presence of three messages confirms that the update got restarted and produced correct results at the end of this transaction.

The set of columns that "trigger" the restart are the ones used to locate rows (of which there are none here, since the where clause doesn't exist) plus any columns referenced in a trigger. Since the trigger refers to X and Y, they become part of the set of columns responsible for triggering the restart of this update.
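
One way to verify that claim (a sketch only; I have not run this variant, so the expected outcome is an assumption) is to recreate the trigger so that it references only X and repeat the test. Since session#1 changes only Y, the update in session#2 should then find no restart-triggering column changed, and the Part I anomaly should reappear.

-- variant trigger referencing only X (hypothetical experiment, not run here)
create or replace trigger t_trig
before update on t
for each row
begin
   dbms_output.put_line('updating row with x = '|| :old.x);
end;
/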


Tuesday, June 7, 2016

Write Inconsistency Part I

Oracle Database enforces statement-level read consistency, which guarantees that data returned by a single query is committed and consistent with respect to a single point in time.

Here is an example of write inconsistency in Oracle (reproducible in 10g, 11g and 12c (12.1.0.2)).

In Session#1 do this.

Session#1> set feedback off
Session#1> drop table t purge;
Session#1> create table t as select 1 x,0 y from dual;
Session#1> insert into t(x,y) values(2,0);
Session#1> set feedback on
Session#1> select * from t;

         X          Y
---------- ----------
         1          0
         2          0

2 rows selected.

Session#1> update t set y = 1;

2 rows updated.

Session#1> select * from t;

         X          Y
---------- ----------
         1          1
         2          1

2 rows selected.

Session#1>

Remember, no commit yet! From Session#2 do this.
Session#2> select * from t;

         X          Y
---------- ----------
         1          0

1 row selected.

Session#2> update t set y = y+1;

Session#2 waits for Session#1 to complete its transaction; until then Session#2 is blocked.
Now go back to session#1 and do commit!

Session#1> commit;

Commit complete.

Session#1> select * from t;

         X          Y
---------- ----------
         1          1
         2          1

2 rows selected.

Since Session#1 committed, the update statement in Session#2 got unblocked, and the query from Session#2 returned this.

Session#2> update t set y = y+1;

1 row updated.

Session#2> select * from t;

         X          Y
---------- ----------
         1          2
         2          1

2 rows selected.

Session#2>

This is an interesting example of something that is arguably doing the wrong thing.

According to the test result:
  • The second session has updated (1, 1) to (1, 2), which means it has seen some of the effects of the transaction executed by the first session.
  • But it has not updated (2, 1) to (2, 2), which means it hasn't seen all of the effects of the transaction executed by the first session.


In principle, the second transaction should have done one of these things:

  •  Rolled back and restarted at an SCN later than the commit SCN generated by the first transaction.
  •  Failed and rolled back (something similar to an ORA-08177 "can't serialize access" error).



We will see how a transaction restart could fix this issue. Next installment coming soon… :)