Thursday, February 11, 2010

Use the Correct Datatype

rajesh@ORA10G> create table t as
  2  select to_char(to_date('01/01/2000','mm/dd/yyyy')+rownum,'yyyymmdd') as str_date,
  3        to_number(to_char(to_date('01/01/2000','mm/dd/yyyy')+rownum,'yyyymmdd')) as num_date,
  4        to_date('01/01/2000','mm/dd/yyyy')+rownum as date_date
  5  from all_objects
  6  order by dbms_random.random ;

Table created.
rajesh@ORA10G>
rajesh@ORA10G> create index t_ind_01 on t(str_date);

Index created.
rajesh@ORA10G> create index t_ind_02 on t(num_date);

Index created.
rajesh@ORA10G> create index t_ind_03 on t(date_date);

Index created.
rajesh@ORA10G> begin
  2     dbms_stats.gather_table_stats
  3     (user,'T',
  4      method_opt=>'for all indexed columns size 254');
  5  end;
  6  /

PL/SQL procedure successfully completed.
rajesh@ORA10G>

Now, let’s see what happens when we query this table using the string date column and the real date column.
Pay close attention to the Card= component of the plan:

rajesh@ORA10G> select * from t
  2  where str_date between
  3  '20121231' and '20130101';

STR_DATE   NUM_DATE DATE_DATE
-------- ---------- -----------------------
20130101   20130101 01-JAN-2013 12:00:00 AM
20121231   20121231 31-DEC-2012 12:00:00 AM

2 rows selected.
rajesh@ORA10G> select * from table( dbms_xplan.display_cursor() );

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
SQL_ID  5qzjrcx91mvp6, child number 1
-------------------------------------
select * from t where str_date between '20121231' and '20130101'
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    55 (100)|          |
|*  1 |  TABLE ACCESS FULL| T    |   212 |  4876 |    55   (2)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("STR_DATE"<='20130101' AND "STR_DATE">='20121231'))

18 rows selected.

now querying against the num_date column

rajesh@ORA10G>
rajesh@ORA10G> select * from t
  2  where num_date
  3  between 20121231 and 20130101;

STR_DATE   NUM_DATE DATE_DATE
-------- ---------- -----------------------
20130101   20130101 01-JAN-2013 12:00:00 AM
20121231   20121231 31-DEC-2012 12:00:00 AM

2 rows selected.
rajesh@ORA10G>
rajesh@ORA10G> select * from table( dbms_xplan.display_cursor() );

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
SQL_ID  61pahf9s175bc, child number 0
-------------------------------------
select * from t where num_date between 20121231 and 20130101
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    55 (100)|          |
|*  1 |  TABLE ACCESS FULL| T    |   212 |  4876 |    55   (2)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("NUM_DATE"<=20130101 AND "NUM_DATE">=20121231))

18 rows selected.

rajesh@ORA10G> select * from t
  2  where date_date
  3  between to_date('20121231','yyyymmdd')
  4  and to_date('20130101','yyyymmdd');

STR_DATE   NUM_DATE DATE_DATE
-------- ---------- -----------------------
20121231   20121231 31-DEC-2012 12:00:00 AM
20130101   20130101 01-JAN-2013 12:00:00 AM

2 rows selected.
rajesh@ORA10G>
rajesh@ORA10G> select * from table( dbms_xplan.display_cursor() );

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
SQL_ID  2140x47gjtgmk, child number 0
-------------------------------------
select * from t where date_date between to_date('20121231','yyyymmdd') and
to_date('20130101','yyyymmdd')
Plan hash value: 1136801923
----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |       |       |     3 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T        |     1 |    23 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IND_03 |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DATE_DATE">=TO_DATE('2012-12-31 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss') AND "DATE_DATE"<=TO_DATE('2013-01-01 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss'))

So, what happened there? Well, the optimizer understands VARCHAR2 types and it understands DATE types.
The optimizer knows that between the two DATE items December 31, 2012, and January 1, 2013, there is only one day. The optimizer also thinks that between the two string items ‘20121231’ and ‘20130101’ there are a whole bunch of values.The cardinality is thrown off.


I have a very simple rule: Put dates in dates, numbers in numbers, and strings in strings.
Never use a datatype to store something other than what it was designed for, and use the most specific type possible. Furthermore, only compare dates to dates, strings to strings, and numbers to numbers. 

When dates and numbers are stored in strings, or stored using inappropriate lengths, your system suffers:
 

No comments:

Post a Comment