Tuesday, June 28, 2016

Top Frequency histogram in 12c

The Top Frequency histogram, newly introduced in 12c, is a variation on the Frequency histogram that ignores statistically insignificant non-popular values. A Top Frequency histogram can produce a better quality histogram for the highly popular values.

If a small number of values occupies most of the rows, then creating a Frequency histogram on that small set of values is useful even when the NDV (Number of Distinct Values) is greater than the requested number of histogram buckets. To create a better quality histogram for popular values, the optimizer ignores the non-popular values and creates a frequency histogram.

The database creates a Top Frequency histogram when the following criteria are met:
  • NDV is greater than n, where n is the requested number of buckets (default 254).
  • The percentage of rows occupied by the top-n frequent values is greater than or equal to the threshold p, where p = (1 - 1/n) * 100. (The query after this list shows one way to check this.)
  • The estimate_percent parameter of the dbms_stats gathering procedure is left at its default, auto_sample_size.
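
A quick sanity check of the second criterion: a query along these lines (a sketch, written against the demo table T created below) measures how many rows the 10 most frequent values cover, for comparison against the (1 - 1/10) * 100 = 90% threshold.

-- what percentage of rows do the 10 most frequent values of OWNER cover?
select round(100 * sum(case when rn <= 10 then cnt else 0 end) / sum(cnt), 2) top10_pct
from (
       select count(*) cnt,
              row_number() over (order by count(*) desc) rn
       from t
       group by owner
     );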

rajesh@ORA12C> create table t as select * from all_objects;

Table created.

rajesh@ORA12C> select num_rows from user_tables where table_name ='T';

  NUM_ROWS
----------
     90215

1 row selected.

rajesh@ORA12C> select column_name,num_distinct,density,histogram
  2  from user_tab_col_statistics
  3  where table_name ='T'
  4  and column_name ='OWNER';

COLUMN_NAM NUM_DISTINCT    DENSITY HISTOGRAM
---------- ------------ ---------- ---------------
OWNER                33  .03030303 NONE

1 row selected.

rajesh@ORA12C> select owner,count(*)
2  from t
3  group by owner
4  order by 2 desc;

OWNER                  COUNT(*)
-------------------- ----------
SYS                       41882
PUBLIC                    37076
ORDSYS                     3157
APEX_040200                3025
MDSYS                      1647
SYSTEM                      589
XDB                         401
CTXSYS                      401
WMSYS                       371
SH                          300
DVSYS                       292
ORDDATA                     274
LBACSYS                     237
OE                          112
GSMADMIN_INTERNAL           104
DBSNMP                       55
IX                           48
RAJESH                       38
HR                           34
OLAPSYS                      25
DEMO                         25
OJVMSYS                      22
DVF                          19
FLOWS_FILES                  11
ORDPLUGINS                   10
PM                           10
AUDSYS                        9
OUTLN                         8
SI_INFORMTN_SCHEMA            8
BI                            8
SCOTT                         6
ORACLE_OCM                    6
APPQOSSYS                     5

33 rows selected.

rajesh@ORA12C> begin
  2     dbms_stats.gather_table_stats
  3             (ownname=>user,
  4             tabname=>'T',
  5             method_opt=>'for columns owner size 10');
  6  end;
  7  /

PL/SQL procedure successfully completed.

rajesh@ORA12C>

So during stats gathering we requested a histogram on the column OWNER with 10 buckets.

  • The OWNER column has NDV = 33 and the requested number of buckets is 10, so NDV > n (where n is the number of buckets requested).
  • The top 10 values of the column OWNER cover about 98% of the rows in the table, which is greater than the threshold (1 - 1/10) * 100 = 90%, so the rows occupied by the top-n values clearly dominate here.
  • The default sample size (auto_sample_size) is used for this histogram generation.

Since all three conditions are met, we end up with a Top-Frequency histogram.

rajesh@ORA12C> select column_name,num_distinct,density,histogram
  2  from user_tab_col_statistics
  3  where table_name ='T'
  4  and column_name ='OWNER';

COLUMN_NAM NUM_DISTINCT    DENSITY HISTOGRAM
---------- ------------ ---------- ---------------
OWNER                33 5.5423E-06 TOP-FREQUENCY

1 row selected.

rajesh@ORA12C>
rajesh@ORA12C> column endpoint_actual_value format a15
rajesh@ORA12C> select endpoint_number,endpoint_actual_value
  2  from user_tab_histograms
  3  where table_name ='T'
  4  and column_name ='OWNER';

ENDPOINT_NUMBER ENDPOINT_ACTUAL
--------------- ---------------
           3025 APEX_040200
           3426 CTXSYS
           5073 MDSYS
           8230 ORDSYS
          45306 PUBLIC
          45606 SH
          87488 SYS
          88077 SYSTEM
          88448 WMSYS
          88849 XDB

10 rows selected.

As per the documentation, the cardinality formula for popular values is:

Cardinality = (Num_of_rows * Num_of_endpoints_spanned) / Total_num_of_endpoints

rajesh@ORA12C> select count(*) from t where owner ='MDSYS';

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |   480  (13)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |  1647 |  9882 |   480  (13)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"='MDSYS')

So taking the value OWNER = 'MDSYS' into consideration and applying the above cardinality formula produces this result:

Cardinality = ((5073 - 3426) * 90215) / 88849 = 1672 (which pretty much matches the estimated cardinality)
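
The same arithmetic can be applied to every captured value at once by letting SQL compute the endpoint spans from the dictionary. A sketch (the literals 90215 and 88849 are this example's row count and total endpoint count):

-- frequency of each captured value = gap between consecutive endpoint numbers
select endpoint_actual_value,
       endpoint_number - lag(endpoint_number,1,0) over (order by endpoint_number) endpoints_spanned,
       round( (endpoint_number - lag(endpoint_number,1,0) over (order by endpoint_number))
              * 90215 / 88849 ) estimated_card
from user_tab_histograms
where table_name ='T'
and column_name ='OWNER'
order by endpoint_number;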

But in the case of non-popular values, the formula from the documentation is:

Cardinality = Num_of_rows * Density. Applying this formula to our data set produces 0.5; rounded to the nearest integer, the estimated cardinality should be one.

rajesh@ORA12C> select num_distinct,density, density*90215 the_guess
  2  from user_tab_col_statistics
  3  where table_name ='T'
  4  and column_name ='OWNER' ;

NUM_DISTINCT    DENSITY  THE_GUESS
------------ ---------- ----------
          33 5.5423E-06         .5

1 row selected.

rajesh@ORA12C>

But the explain plan for a non-popular value reports the estimated cardinality as 59, which is far away from our calculation.

rajesh@ORA12C> select count(*) from t where owner ='SCOTT';

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |   480  (13)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |    59 |   354 |   480  (13)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"='SCOTT')

Looking into the 10053 trace showed that, rather than using the density from user_tab_col_statistics, the optimizer used a different value called "NewDensity" to produce the cardinality.

=====================================
Access path analysis for T
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T[T]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Column (#1):
NewDensity:0.000658, OldDensity:0.000006 BktCnt:88849.000000, PopBktCnt:88849.000000, PopValCnt:10, NDV:33
  Column (#1): OWNER(VARCHAR2)
AvgLen: 6 NDV: 33 Nulls: 0 Density: 0.000658
    Histogram: Top-Freq  #Bkts: 88849  UncompBkts: 88849  EndPtVals: 10  ActualVal: yes
  Table: T  Alias: T
    Card: Original: 90215.000000  Rounded: 59  Computed: 59.391304  Non Adjusted: 59.391304

Applying the formula Cardinality = NewDensity * Num_of_rows = 0.000658 * 90215 = 59.3 (rounded to the nearest integer, 59) matches the estimated cardinality shown in the explain plan.

But how did the optimizer arrive at this NewDensity value shown in the 10053 trace?

NewDensity = sum_of_unpopular_value_occurrences / (unpopular NDV * Num_of_rows)

Applying this to our data set resulted in this.

NewDensity = (292 + 274 + 237 + 112 + 104 + 55 + 48 + 38 + 34 + 25 + 25 + 22 + 19 + 11 + 10 + 10 + 9 + 8 + 8 + 8 + 6 + 6 + 5 )  / ( 23*90215)  = 0.000658
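
Note that the sum of the unpopular occurrences is simply num_rows minus the highest endpoint_number (90215 - 88849 = 1366), so NewDensity can also be derived straight from the dictionary. A sketch against this example's statistics:

-- NewDensity = rows not covered by the histogram / (unpopular NDV * num_rows)
select round( (t.num_rows - h.bkt_cnt)
            / ((c.num_distinct - h.pop_val_cnt) * t.num_rows), 6) new_density
from user_tables t,
     user_tab_col_statistics c,
     (select max(endpoint_number) bkt_cnt, count(*) pop_val_cnt
      from user_tab_histograms
      where table_name ='T'
      and column_name ='OWNER') h
where t.table_name ='T'
and c.table_name ='T'
and c.column_name ='OWNER';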


Wednesday, June 8, 2016

Write Inconsistency Part III

But do we really need a trigger in place for a transaction restart to kick in? Not at all; a simple WHERE clause can do that.

Here again is a test case to demonstrate it.

Session#1>set feedback off
Session#1>drop table t purge;
Session#1> create table t as select 1 x, 0 y from dual;
Session#1> insert into t(x,y) values(2,0);
Session#1> set feedback on
Session#1>
Session#1>
Session#1> select * from t;

         X          Y
---------- ----------
         1          0
         2          0

2 rows selected.

Session#1> update t set y = 1;

2 rows updated.

Session#1> select * from t;

         X          Y
---------- ----------
         1          1
         2          1

2 rows selected.

Session#1>

Remember, no commit yet! From session#2 do this.
Session#2> select * from t;

         X          Y
---------- ----------
         1          0

1 row selected.

Session#2> update t set y = y+1 where x >0 and y is not null;

Session#2 is blocked by Session#1, whose transaction is not yet committed, so go back and commit in Session#1.
Session#1> commit;

Commit complete.

Session#1> select * from t;

         X          Y
---------- ----------
         1          1
         2          1

2 rows selected.

Session#1>

Since Session#1 committed, Session#2 got unblocked and completed its update statement; Session#2 now shows this.
Session#2>update t set y = y+1 where x >0 and y is not null;

2 rows updated.

Session#2> select * from t;

         X          Y
---------- ----------
         1          2
         2          2

2 rows selected.

Session#2>

So having that set of columns in the WHERE clause "triggers" the transaction restart, which in turn restarts the update at an SCN later than the commit SCN generated by the first transaction.

That is the whole story of getting a consistent result after the update from Session#2.

So why does the statement "update T set y = y+1" behave differently from the statement "update T set y = y+1 where x > 0 and y is not null", when both statements are supposed to update all the rows of table T? This has been identified as an internal bug and is still not fixed in Oracle 12c (12.1.0.2).


So please be aware of this, and test your DMLs properly in your applications to avoid this kind of inconsistency until it gets fixed.

Write Inconsistency Part II

I was wondering: why doesn't the transaction (from the second session) restart itself at an SCN later than the commit SCN generated by the first transaction?

Just to check whether a restart happens: with a trigger in place after the create table, you actually see different results.

Repeating the test case, now with the trigger in place.

Session#1> set feedback off
Session#1> drop table t purge;
Session#1> create table t as select 1 x,0 y from dual;
Session#1> create or replace trigger t_trig
  2  before update on t
  3  for each row
  4  begin
  5     dbms_output.put_line('updating '|| :old.x ||','|| :old.y
  6              ||' to '|| :new.x || ',' || :new.y );
  7  end;
  8  /
Session#1> insert into t(x,y) values(2,0);
Session#1> set feedback on
Session#1>
Session#1> select * from t;

         X          Y
---------- ----------
         1          0
         2          0

2 rows selected.

Session#1>update t set y = 1;
updating 1,0 to 1,1
updating 2,0 to 2,1

2 rows updated.

Session#1> select * from t;

         X          Y
---------- ----------
         1          1
         2          1

2 rows selected.

Session#1>

So the trigger reported that Session#1 updated two records. Remember, no commit yet!

Now from session#2 do this.

Session#2> select * from t;

         X          Y
---------- ----------
         1          0

1 row selected.

Session#2>update t set y = y+1;

Session#2 waits for Session#1 to complete its transaction; until then Session#2 is blocked.

Now go back to Session#1 and do commit.

Session#1> commit;

Commit complete.

Session#1> select * from t;

         X          Y
---------- ----------
         1          1
         2          1

2 rows selected.

Session#1>

Since Session#1 committed, the update statement in Session#2 got unblocked, and the query from Session#2 returned this.

Session#2> update t set y = y+1;
updating 1,0 to 1,1
updating 1,1 to 1,2
updating 2,1 to 2,2

2 rows updated.

Session#2> select * from t;

         X          Y
---------- ----------
         1          2
         2          2

2 rows selected.

Session#2>

The output from the trigger confirms that the update statement from session#2:
a)      Did a consistent read on table T.
b)      Got the first row for update with x=1 and y=0, changed y to 1, and invoked the trigger before executing the update (since it is a before update trigger).
c)      The trigger put the message "updating 1,0 to 1,1" in the buffer.
d)      Upon trying to make changes to that block, it realized the row was locked by session#1, and hence this update got blocked.
e)      When session#1 committed, the update from session#2 resumed and in turn got restarted at an SCN later than the commit SCN generated by the first transaction: it did the consistent read again, got two rows, modified them, and had those before and after values recorded by the trigger in the buffer.
f)      Session#2 got a current read on the block to make the changes for the new values of Y (since session#1 had committed, the block was available for current read to Session#2).
g)      For each row updated, the trigger put a message "updating x1,y1 to x2,y2" in the buffer.
h)      Once the update statement completed, the messages in the buffer got printed on the screen.

The presence of three messages confirms that the update got restarted and produced correct results at the end of this transaction.

The set of columns that "trigger" the restart are the ones used to locate rows (of which there are none here, since the where clause doesn't exist) plus any columns referenced in a trigger. Since the trigger refers to X and Y, they become part of the set of columns responsible for triggering the restart of this update.
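
One way to verify that claim (a sketch only; I have not run this variant, so the expected outcome is an assumption) is to recreate the trigger so that it references only X and repeat the test. Since session#1 changes only Y, the update in session#2 should then find no restart-triggering column changed, and the Part I anomaly should reappear.

-- variant trigger referencing only X (hypothetical experiment, not run here)
create or replace trigger t_trig
before update on t
for each row
begin
   dbms_output.put_line('updating row with x = '|| :old.x);
end;
/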


Tuesday, June 7, 2016

Write Inconsistency Part I

Oracle Database enforces statement-level read consistency, which guarantees that data returned by a single query is committed and consistent with respect to a single point in time.

Here is an example of write inconsistency in Oracle (reproducible in 10g, 11g and 12c (12.1.0.2)).

In Session#1 do this.

Session#1> set feedback off
Session#1> drop table t purge;
Session#1> create table t as select 1 x,0 y from dual;
Session#1> insert into t(x,y) values(2,0);
Session#1> set feedback on
Session#1> select * from t;

         X          Y
---------- ----------
         1          0
         2          0

2 rows selected.

Session#1> update t set y = 1;

2 rows updated.

Session#1> select * from t;

         X          Y
---------- ----------
         1          1
         2          1

2 rows selected.

Session#1>

Remember, no commit yet! From Session#2 do this.
Session#2> select * from t;

         X          Y
---------- ----------
         1          0

1 row selected.

Session#2> update t set y = y+1;

Session#2 waits for Session#1 to complete its transaction; until then Session#2 is blocked.
Now go back to session#1 and do commit!

Session#1> commit;

Commit complete.

Session#1> select * from t;

         X          Y
---------- ----------
         1          1
         2          1

2 rows selected.

Since Session#1 committed, the update statement in Session#2 got unblocked, and the query from Session#2 returned this.

Session#2> update t set y = y+1;

1 row updated.

Session#2> select * from t;

         X          Y
---------- ----------
         1          2
         2          1

2 rows selected.

Session#2>

This is an interesting example of something that is arguably doing the wrong thing.

According to the test result:
  • The second session has updated (1, 1) to (1, 2), which means it has seen some of the effects of the transaction executed by the first session.
  • But it has not updated (2, 1) to (2, 2), which means it hasn't seen all of the effects of the transaction executed by the first session.


In principle, the second transaction should have done one of these things:

  •  Rolled back and restarted at an SCN later than the commit SCN generated by the first transaction.
  •  Failed and rolled back (something similar to an ORA-08177 "can't serialize access" error).



We will see how a transaction restart could fix this issue. Next installment coming soon… :)