Friday, March 30, 2018

SQLs not using bind variables

We all know that proper use of bind variables in SQL statements is a must to make the transaction processing application scalable. So how to find the queries that don’t use bind variable and are parsed each time they are executed? There are number of ways, but this article shows an efficient way.
Starting with Oracle 10gR2, two interesting columns – FORCE_MATCHING_SIGNATURE and EXACT_MATCHING_SIGNATURE – were added in a number of tables and views. We could see them in V$SQL, V$SQLAREA, DBA_HIST_SQLSTAT and DBA_HIST_ACTIVE_SESS_HISTORY data dictionaries.  As you see they are present all over the place – shared pool, ASH, AWR and Statspack. That means we have good choice of source to look for problematic sql’s.
EXACT_MATCHING_SIGNATURE means, a signature is calculated on the normalized sql text. The normalization includes the removal of white space and uppercasing of all the non-literal strings.
FORCE_MATCHING_SIGNATURE means, a signature used as if CURSOR_SHARING set to FORCE. It seems to be another hash value calculated from SQL statement.
demo@ORA11G> select * from dual where dummy='A';
 
no rows selected
 
demo@ORA11G> select * from dual where dummy= 'A';
 
no rows selected
 
demo@ORA11G> select * from dual where DUMMY='A';
 
no rows selected
 
demo@ORA11G> select * from dual where dummy='B';
 
no rows selected
 
demo@ORA11G> column sql_text format a40
demo@ORA11G> col exact_matching_signature format 999999999999999999999
demo@ORA11G> select sql_id,exact_matching_signature,sql_text
  2  from v$sql
  3  where lower(sql_text) like 'select * from dual where dummy%';
 
SQL_ID        EXACT_MATCHING_SIGNATURE SQL_TEXT
------------- ------------------------ ----------------------------------------
2sx3rsjythh1y      4901325341701120494 select * from dual where dummy='B'
4bushd7kct5t7     17152174882085893964 select * from dual where DUMMY='A'
0kuuh73za9ka5     17152174882085893964 select * from dual where dummy='A'
1husj2y65zyyd     17152174882085893964 select * from dual where dummy= 'A'
 
demo@ORA11G>
 
It is easy to notice that all the sql_id’s are different. But the EXACT_MATCHING_SIGNATURE is same for all three statement because of normalization.
 
As noted before FORCE_MATCHING_SIGNATURE is calculated from sql text as if CURSOR_SHARING set to FORCE – that forces sql statements as to share cursors by replacing constants with bind variables, so all statements that differ only by the constants share the same cursor.  Let’s have a look at the FORCE_MATCHING_SIGNATURE value for the same sql’s.
 
demo@ORA11G> col force_matching_signature format 999999999999999999999
demo@ORA11G> select sql_id,force_matching_signature,sql_text
  2  from v$sql
  3  where lower(sql_text) like 'select * from dual where dummy%';
 
SQL_ID        FORCE_MATCHING_SIGNATURE SQL_TEXT
------------- ------------------------ ----------------------------------------
2sx3rsjythh1y     10668153635715970930 select * from dual where dummy='B'
4bushd7kct5t7     10668153635715970930 select * from dual where DUMMY='A'
0kuuh73za9ka5     10668153635715970930 select * from dual where dummy='A'
1husj2y65zyyd     10668153635715970930 select * from dual where dummy= 'A'
 
demo@ORA11G>
 
Now finding similar statements becomes as easy as querying the chosen data source (Shared pool, AWR, ASH, Statspack) and grouping the statements by FORCE_MATCHING_SIGNATURE value.
 
Something like this:
 
demo@ORA11G> select *
  2  from (
  3  select force_matching_signature,count(*)
  4  from v$sql
  5  where child_number = 0
  6  group by force_matching_signature
  7  having count(*) > 1
  8  order by 2 desc
  9       )
 10  where rownum <= 5 ;
 
FORCE_MATCHING_SIGNATURE   COUNT(*)
------------------------ ----------
                       0         89
    10668153635715970930          4
      562954948158618121          3
     6261972462962078540          3
      577209413380651322          2
 
demo@ORA11G> select sql_id,child_number,sql_text
  2  from v$sql
  3  where force_matching_signature ='562954948158618121';
 
SQL_ID        CHILD_NUMBER SQL_TEXT
------------- ------------ ----------------------------------------
5qz5gdb6n09k4            0 select * from emp where deptno = 30
a4y5kdgz0nw69            0 select * from emp where deptno = 10
84fmz2fg9mdgh            0 select * from emp where deptno = 20
 
demo@ORA11G>

Thursday, March 22, 2018

Cursor invalidation (no_invalidate param in stats gathering)

One of the nice capability of Oracle database is the ability to reuse the cursor/plans across the multiple executions. However reusing the plan for each execution will reduce the total (hard) parsing done in the database that in-turn improve the scalability of the application.
However reusing the plan in the ETL or warehousing environment can be dangerous, since the data volume returned for each execution will be different. So the plan generated for the first execution will not be the optimal plan for the subsequent execution.
Even the stats gathered before the subsequent execution won’t help unless until no_invalidate option is included in the stats gathering process.
Here is the demo, Created a table with index on the ID column, then data loaded and gathered the stats on this table along with a histogram on the ID column (since that column is massively skewed)
demo@ORA12C> create table t(id number,code varchar2(60),some_other_text varchar2(60) );
 
Table created.
 
demo@ORA12C> create index t_idx on t(id);
 
Index created.
 
demo@ORA12C> insert into t(id,code,some_other_text)
  2  select decode(rownum,1,1,99), substr(object_name,1,60), rpad('*',60)
  3  from big_table;
 
1000000 rows created.
 
demo@ORA12C> begin
  2     dbms_stats.gather_table_stats(
  3             ownname=>user,
  4             tabname=>'T',
  5             method_opt=>'for columns id size 254');
  6  end;
  7  /
 
PL/SQL procedure successfully completed.
 
Let’s run the query and see what plan it picks
demo@ORA12C> set serveroutput off
demo@ORA12C> select max(code) from t where id = 1;
 
MAX(CODE)
------------------------------------------------------------
jdk/nashorn/internal/runtime/SpillProperty
 
demo@ORA12C> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------
SQL_ID  apbgnwvfasyfs, child number 0
-------------------------------------
select max(code) from t where id = 1
 
Plan hash value: 1339972470
 
--------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |      1 |00:00:00.01 |       4 |
|   1 |  SORT AGGREGATE                      |       |      1 |      1 |      1 |00:00:00.01 |       4 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T     |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX RANGE SCAN                  | T_IDX |      1 |      1 |      1 |00:00:00.01 |       3 |
--------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("ID"=1)
 
Just to return one row – matching the predicate – the optimizer picked up the index on the ID column & query completed in less than a second, by doing four logical IO per execution.
Now let us change the data distribution in this table and see what plan does it picks in the next run.
demo@ORA12C> select loads, child_number,invalidations
  2  from v$sql
  3  where sql_id ='apbgnwvfasyfs';
 
     LOADS CHILD_NUMBER INVALIDATIONS
---------- ------------ -------------
         1            0             0
 
demo@ORA12C> insert into t(id,code,some_other_text)
  2  select 1, substr(object_name,1,60), rpad('*',60)
  3  from big_table;
 
1000000 rows created.
 
demo@ORA12C> begin
  2     dbms_stats.gather_table_stats(
  3             ownname=>user,
  4             tabname=>'T',
  5             method_opt=>'for columns id size 254');
  6  end;
  7  /
 
PL/SQL procedure successfully completed.
 
demo@ORA12C> select loads, child_number,invalidations
  2  from v$sql
  3  where sql_id ='apbgnwvfasyfs';
 
     LOADS CHILD_NUMBER INVALIDATIONS
---------- ------------ -------------
         1            0             0
 
 
Added a bunch of records to the table with ID=1 and gathering the stats doesn’t invalidates the existing cursor in the library cache.
 
demo@ORA12C> select max(code) from t where id = 1;
 
MAX(CODE)
------------------------------------------------------------
sun/util/xml/PlatformXmlPropertiesProvider$Resolver
 
demo@ORA12C> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------
SQL_ID  apbgnwvfasyfs, child number 0
-------------------------------------
select max(code) from t where id = 1
 
Plan hash value: 1339972470
 
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |      1 |00:01:24.03 |   34347 |  15652 |
|   1 |  SORT AGGREGATE                      |       |      1 |      1 |      1 |00:01:24.03 |   34347 |  15652 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T     |      1 |      1 |   1000K|00:01:23.73 |   34347 |  15652 |
|*  3 |    INDEX RANGE SCAN                  | T_IDX |      1 |      1 |   1000K|00:00:08.43 |    5109 |   1067 |
-----------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("ID"=1)
 
 
20 rows selected.
 
Executing the same sql again ended up with reusing the plan and now it get to return half of the result sets from the table using the same index based access (using index based access to return half the volume of data from the table is something seriously wrong). And now it took nearly more than a min and took about 34K logical IO to complete.
 
However having the “no_invalidate” option during the stats gathering will lead the existing child cursor to be invalidated, that doesn’t mean it will be discarded entirely, it simply means the existing information about the child cursor cannot be used, instead it must be reloaded on the next invocation.
 
demo@ORA12C> select loads, child_number,invalidations
  2  from v$sql
  3  where sql_id ='apbgnwvfasyfs';
 
     LOADS CHILD_NUMBER INVALIDATIONS
---------- ------------ -------------
         1            0             0
 
demo@ORA12C> begin
  2     dbms_stats.gather_table_stats(
  3             ownname=>user,
  4             tabname=>'T',
  5             method_opt=>'for columns id size 254',
  6             no_invalidate=>false);
  7  end;
  8  /
 
PL/SQL procedure successfully completed.
 
demo@ORA12C> select loads, child_number,invalidations
  2  from v$sql
  3  where sql_id ='apbgnwvfasyfs';
 
     LOADS CHILD_NUMBER INVALIDATIONS
---------- ------------ -------------
         1            0             1
 
demo@ORA12C> select max(code) from t where id = 1;
 
MAX(CODE)
------------------------------------------------------------
sun/util/xml/PlatformXmlPropertiesProvider$Resolver
 
demo@ORA12C> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------
SQL_ID  apbgnwvfasyfs, child number 1
-------------------------------------
select max(code) from t where id = 1
 
Plan hash value: 2966233522
 
----------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:06.80 |   29483 |  29465 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:06.80 |   29483 |  29465 |
|*  2 |   TABLE ACCESS FULL| T    |      1 |   1000K|   1000K|00:00:06.78 |   29483 |  29465 |
----------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("ID"=1)
 
 
19 rows selected.
 
So instead of reusing the plan from the library cache, we invalidated the existing plan during stats gathering and upon execution we picked the optimal plan (i.e. Full Table scan rather than index based access path) based on the current data statistics present in the table.

Tuesday, March 6, 2018

One percent data retrieval from table

One of the most common myth about index is, if any SQL statement is about to retrieve one percent of data from a table, then the index based access would be the best option.
However that is not the case **always**, it depends on clustering of the data.
If the data is very well clustered according to the index, then the cost of reading the table via the index would be the cheap.
Created a table with unique index on ID column and a very good appealing clustering factor.
demo@ORA12C> create table t (id number NOT NULL, some_other_datas varchar2(300) );
 
Table created.
 
demo@ORA12C> create unique index t_idx on t(id);
 
Index created.
 
demo@ORA12C> insert into t (id,some_other_datas)
  2  select rownum, rpad('*',200,'*')
  3  from all_objects, all_users
  4  where rownum <=2e6;
 
2000000 rows created.
 
demo@ORA12C> exec dbms_stats.gather_table_stats(user,'T');
 
PL/SQL procedure successfully completed.
 
demo@ORA12C> select i.index_name, i.clustering_factor, t.num_rows, t.blocks
  2  from user_indexes i ,
  3    user_tables t
  4  where i.table_name = t.table_name
  5  and i.index_name ='T_IDX';
 
INDEX_NAME CLUSTERING_FACTOR   NUM_ROWS     BLOCKS
---------- ----------------- ---------- ----------
T_IDX                  58824    2000000      59057
 
demo@ORA12C>
 
 
To retrieve one percent of data from this table, the optimizer picks the index as best access path.
 
 
demo@ORA12C> alter session set statistics_level=all;
 
Session altered.
 
demo@ORA12C> set serveroutput off
demo@ORA12C> set feedback only
demo@ORA12C> select *
  2  from t
  3  where id between 20000 and 40000;
 
20001 rows selected.
 
demo@ORA12C> set feedback 6
demo@ORA12C> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------
SQL_ID  b7jpkag179cf6, child number 0
-------------------------------------
select * from t where id between 20000 and 40000
 
Plan hash value: 767293772
 
-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |  20001 |00:00:00.01 |     889 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T     |      1 |  20002 |  20001 |00:00:00.01 |     889 |
|*  2 |   INDEX RANGE SCAN                  | T_IDX |      1 |  20002 |  20001 |00:00:00.01 |     174 |
-------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("ID">=20000 AND "ID"<=40000)
 
 
19 rows selected.
 
demo@ORA12C>
 
The query completed in less than a second returning about 20K rows doing 890 Logical IOs.
 
Now let’s truncate and reload this table with a different set of data, having a poor clustering factor.
 
demo@ORA12C> truncate table t;
 
Table truncated.
 
demo@ORA12C> insert into t (id,some_other_datas)
  2  select dbms_random.value(1,2000000), rpad('*',200,'*')
  3  from all_objects, all_users
  4  where rownum <=2e6;
 
2000000 rows created.
 
demo@ORA12C> exec dbms_stats.gather_table_stats(user,'T');
 
PL/SQL procedure successfully completed.
 
demo@ORA12C> select i.index_name, i.clustering_factor, t.num_rows, t.blocks
  2  from user_indexes i ,
  3    user_tables t
  4  where i.table_name = t.table_name
  5  and i.index_name ='T_IDX';
 
INDEX_NAME CLUSTERING_FACTOR   NUM_ROWS     BLOCKS
---------- ----------------- ---------- ----------
T_IDX                2005915    2000000      65177
 
 
Now to retrieve one percent of data from this table, the optimizer picks the Full table scan as best access path.
 
demo@ORA12C> alter session set statistics_level=all;
 
Session altered.
 
demo@ORA12C> set serveroutput off
demo@ORA12C> set feedback only
demo@ORA12C> select *
  2  from t
  3  where id between 20000 and 40000;
 
20119 rows selected.
 
demo@ORA12C> set feedback 6
demo@ORA12C> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------
SQL_ID  b7jpkag179cf6, child number 0
-------------------------------------
select * from t where id between 20000 and 40000
 
Plan hash value: 1601196873
 
---------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |  20119 |00:00:06.42 |     129K|  64669 |
|*  1 |  TABLE ACCESS FULL| T    |      1 |  20002 |  20119 |00:00:06.42 |     129K|  64669 |
---------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(("ID"<=40000 AND "ID">=20000))
 
 
18 rows selected.
 
This look around 6 sec to complete and returned 20K rows by doing 129K logical IOs. 
 
Though the optimizer doesn’t pick the index to answer this query, if we force the index via a Hint, things become worse like this.
 
demo@ORA12C> set feedback only
demo@ORA12C> select /*+ index(t,t_idx) */ *
  2  from t
  3  where id between 20000 and 40000;
 
20119 rows selected.
 
demo@ORA12C> set feedback 6
demo@ORA12C> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------
SQL_ID  478424k8ajjqh, child number 0
-------------------------------------
select /*+ index(t,t_idx) */ * from t where id between 20000 and 40000
 
Plan hash value: 767293772
 
----------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |  20119 |00:02:42.33 |   37768 |  18066 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T     |      1 |  20002 |  20119 |00:02:42.33 |   37768 |  18066 |
|*  2 |   INDEX RANGE SCAN                  | T_IDX |      1 |  20002 |  20119 |00:00:02.01 |     293 |     10 |
----------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("ID">=20000 AND "ID"<=40000)
 
 
19 rows selected.
 
demo@ORA12C>
 
Using index the sql took around 160 sec, where as a Full table scan did only 6 sec.
 
So the answer to this question – if a SQL is about to return one percent of data from a table, then the choice to use either the full table scan or the index based access purely depends on clustering of the data.