Not, entirely learnt newly today but revisited some good old stuffs. Its about how to translate a histogram into a report of underlying datas (sometimes to see how histograms matches real datas). Lets see how this to be handled on dates, numeric and Varchar2 columns.
rajesh@10GR2>
rajesh@10GR2> create table t
2 nologging
3 as
4 select object_id,ntile(10) over(order by object_id) as ntile
5 from all_objects;
Table created.
Elapsed: 00:00:01.90
rajesh@10GR2>
rajesh@10GR2> begin
2 dbms_stats.gather_table_stats(
3 ownname =>user,
4 tabname=>'T',
5 method_opt=>'for all columns size 10',
6 estimate_percent=>100
7 );
8 end;
9 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.64
rajesh@10GR2>
Now table 'T' will have only values from 1 to 10 in the column NTILE. so provide the values SIZE 10 in gathering histograms. Now looking at user_tab_histograms will represent below values.
rajesh@10GR2> select table_name, column_name, endpoint_number, endpoint_value
2 from user_tab_histograms
3 where table_name ='T'
4 and column_name ='NTILE'
5 order by column_name,endpoint_value;
TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
---------- ---------- --------------- -------------------------------------
T NTILE 5675 1
T NTILE 11349 2
T NTILE 17023 3
T NTILE 22697 4
T NTILE 28371 5
T NTILE 34045 6
T NTILE 39719 7
T NTILE 45393 8
T NTILE 51067 9
T NTILE 56741 10
10 rows selected.
Elapsed: 00:00:00.54
rajesh@10GR2>
The ENDPOINT_VALUE value in user_tab_histograms holds the list of values available in column NTILE. the ENDPOINT_NUMBER is cumulative frequency for appearance of each ENDPOINT_VALUE's.
rajesh@10GR2> select ntile,count(*)
2 from t
3 group by ntile;
NTILE COUNT(*)
---------- ----------
1 5675
6 5674
2 5674
4 5674
5 5674
8 5674
3 5674
7 5674
9 5674
10 5674
10 rows selected.
Elapsed: 00:00:00.28
rajesh@10GR2>
rajesh@10GR2>
This show that we have count of 5674 (approx) for each value of NTILE. lets see how we can match this using Histogram buckets (from below query).
rajesh@10GR2> select table_name, column_name, endpoint_number, endpoint_value,
2 endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_value) as actual_valu
3 from user_tab_histograms
4 where table_name ='T'
5 and column_name ='NTILE'
6 order by column_name,endpoint_value;
TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE ACTUAL_VALUES
---------- ---------- --------------- -------------- -------------
T NTILE 5675 1 5675
T NTILE 11349 2 5674
T NTILE 17023 3 5674
T NTILE 22697 4 5674
T NTILE 28371 5 5674
T NTILE 34045 6 5674
T NTILE 39719 7 5674
T NTILE 45393 8 5674
T NTILE 51067 9 5674
T NTILE 56741 10 5674
10 rows selected.
Elapsed: 00:00:00.04
rajesh@10GR2>
rajesh@10GR2>
This code is for Numeric columns, for Date columns we need to get day part and fraction of day, multiplies the fraction by the number of seconds in a day, converts both bits to character and then uses the ‘Julian’ and ‘seconds’ conversion format.
rajesh@10GR2> create table t
2 as
3 select sysdate as dt, sysdate - ntile(10) over(order by object_id) as ntile
4 from all_objects;
Table created.
Elapsed: 00:00:02.12
rajesh@10GR2>
rajesh@10GR2> begin
2 dbms_stats.gather_table_stats(
3 ownname =>user,
4 tabname=>'T',
5 method_opt=>'for all columns size 10',
6 estimate_percent=>100
7 );
8 end;
9 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.39
rajesh@10GR2>
rajesh@10GR2>
rajesh@10GR2> select ntile,count(*)
2 from t
3 group by ntile
4 order by 1;
NTILE COUNT(*)
--------- ----------
11-JAN-11 5674
12-JAN-11 5674
13-JAN-11 5674
14-JAN-11 5674
15-JAN-11 5674
16-JAN-11 5674
17-JAN-11 5674
18-JAN-11 5674
19-JAN-11 5674
20-JAN-11 5675
10 rows selected.
Elapsed: 00:00:00.07
rajesh@10GR2>
rajesh@10GR2>
Now, reconstructing and validating data's from Histogram buckets will be like below.
rajesh@10GR2> SELECT table_name,
2 column_name,
3 endpoint_number,
4 endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_value) AS COUNT,
5 to_date(floor(endpoint_value)
6 ||'.'
7 || TO_CHAR(86400 * MOD(endpoint_value, 1), 'FM999999999'),'J.sssss')endpoint_value
8 FROM user_tab_histograms
9 WHERE table_name ='T'
10 AND column_name ='NTILE'
11 ORDER BY column_name,endpoint_value;
TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER COUNT ENDPOINT_
---------- ---------- --------------- ---------- ---------
T NTILE 5674 5674 11-JAN-11
T NTILE 11348 5674 12-JAN-11
T NTILE 17022 5674 13-JAN-11
T NTILE 22696 5674 14-JAN-11
T NTILE 28370 5674 15-JAN-11
T NTILE 34044 5674 16-JAN-11
T NTILE 39718 5674 17-JAN-11
T NTILE 45392 5674 18-JAN-11
T NTILE 51066 5674 19-JAN-11
T NTILE 56741 5675 20-JAN-11
10 rows selected.
Elapsed: 00:00:00.07
rajesh@10GR2>
Now for character based columns it involves.
rajesh@10GR2> create table t
2 as
3 select empno, ename as OBJECT_NAME
4 from emp
5 where rownum <= 10;
Table created.
Elapsed: 00:00:00.39
rajesh@10GR2>
rajesh@10GR2>
rajesh@10GR2> begin
2 dbms_stats.gather_table_stats(
3 ownname =>user,
4 tabname=>'T',
5 method_opt=>'for all columns size 10',
6 estimate_percent=>100
7 );
8 end;
9 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.14
rajesh@10GR2>
rajesh@10GR2>
rajesh@10GR2> SELECT endpoint_number,
2 endpoint_number - NVL(prev_endpoint,0) frequency,
3 hex_values,
4 chr(to_number(SUBSTR(hex_values, 2,2),'XX'))
5 || chr(to_number(SUBSTR(hex_values, 4,2),'XX'))
6 || chr(to_number(SUBSTR(hex_values, 6,2),'XX'))
7 || chr(to_number(SUBSTR(hex_values, 8,2),'XX'))
8 || chr(to_number(SUBSTR(hex_values,10,2),'XX'))
9 || chr(to_number(SUBSTR(hex_values,12,2),'XX'))
10 || chr(to_number(SUBSTR(hex_values,14,2),'XX'))
11 || chr(to_number(SUBSTR(hex_values,16,2),'XX'))
12 || chr(to_number(SUBSTR(hex_values,18,2),'XX'))
13 || chr(to_number(SUBSTR(hex_values,20,2),'XX'))
14 || chr(to_number(SUBSTR(hex_values,22,2),'XX'))
15 || chr(to_number(SUBSTR(hex_values,24,2),'XX'))
16 as column_values
17 FROM
18 (SELECT endpoint_number,
19 lag(endpoint_number,1) over( order by endpoint_number ) prev_endpoint,
20 TO_CHAR(endpoint_value,'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')hex_values,
21 endpoint_actual_value
22 FROM user_tab_histograms
23 WHERE table_name = 'T'
24 AND column_name = 'OBJECT_NAME'
25 )
26 ORDER BY endpoint_number ;
ENDPOINT_NUMBER FREQUENCY HEX_VALUES COLUMN_VALUE
--------------- ---------- ------------------------------- ------------
1 1 414C4C454DFFEFD17E20CECE400000 ALLEM n-~ ++
2 1 424C414B44FFF5632009023BE00000 BLAKD )c ?;
3 1 434C41524B001321ED8AF356E00000 CLARK ?!fè=V
4 1 4A4F4E4552FFEC7EDB45D5C0800000 JONER 8~¦E++
5 1 4B494E4700001D9911EA0EFCE00000 KING ?Ö?O?n
6 1 4D415254494DFF74618D0FA4400000 MARTIM taì¤ñ
7 1 53434F54540016FED29C0B71200000 SCOTT ?¦-£?q
8 1 534D49544800059420ACC8DF600000 SMITH ?ö ¼+¯
9 1 5455524E455201F1D23C139C800000 TURNER?±-<?£
10 1 57415243FFFFF52B667FFCC9400000 WARC )+f¦n+
10 rows selected.
Elapsed: 00:00:00.75
rajesh@10GR2>
But due to some Hexadecimal value mismatch, I feel column COLUMN_VALUE from the above output provides some special characters along with actual data.
rajesh@10GR2>
rajesh@10GR2> create table t
2 nologging
3 as
4 select object_id,ntile(10) over(order by object_id) as ntile
5 from all_objects;
Table created.
Elapsed: 00:00:01.90
rajesh@10GR2>
rajesh@10GR2> begin
2 dbms_stats.gather_table_stats(
3 ownname =>user,
4 tabname=>'T',
5 method_opt=>'for all columns size 10',
6 estimate_percent=>100
7 );
8 end;
9 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.64
rajesh@10GR2>
Now table 'T' will have only values from 1 to 10 in the column NTILE. so provide the values SIZE 10 in gathering histograms. Now looking at user_tab_histograms will represent below values.
rajesh@10GR2> select table_name, column_name, endpoint_number, endpoint_value
2 from user_tab_histograms
3 where table_name ='T'
4 and column_name ='NTILE'
5 order by column_name,endpoint_value;
TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
---------- ---------- --------------- -------------------------------------
T NTILE 5675 1
T NTILE 11349 2
T NTILE 17023 3
T NTILE 22697 4
T NTILE 28371 5
T NTILE 34045 6
T NTILE 39719 7
T NTILE 45393 8
T NTILE 51067 9
T NTILE 56741 10
10 rows selected.
Elapsed: 00:00:00.54
rajesh@10GR2>
The ENDPOINT_VALUE value in user_tab_histograms holds the list of values available in column NTILE. the ENDPOINT_NUMBER is cumulative frequency for appearance of each ENDPOINT_VALUE's.
rajesh@10GR2> select ntile,count(*)
2 from t
3 group by ntile;
NTILE COUNT(*)
---------- ----------
1 5675
6 5674
2 5674
4 5674
5 5674
8 5674
3 5674
7 5674
9 5674
10 5674
10 rows selected.
Elapsed: 00:00:00.28
rajesh@10GR2>
rajesh@10GR2>
This show that we have count of 5674 (approx) for each value of NTILE. lets see how we can match this using Histogram buckets (from below query).
rajesh@10GR2> select table_name, column_name, endpoint_number, endpoint_value,
2 endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_value) as actual_valu
3 from user_tab_histograms
4 where table_name ='T'
5 and column_name ='NTILE'
6 order by column_name,endpoint_value;
TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE ACTUAL_VALUES
---------- ---------- --------------- -------------- -------------
T NTILE 5675 1 5675
T NTILE 11349 2 5674
T NTILE 17023 3 5674
T NTILE 22697 4 5674
T NTILE 28371 5 5674
T NTILE 34045 6 5674
T NTILE 39719 7 5674
T NTILE 45393 8 5674
T NTILE 51067 9 5674
T NTILE 56741 10 5674
10 rows selected.
Elapsed: 00:00:00.04
rajesh@10GR2>
rajesh@10GR2>
This code is for Numeric columns, for Date columns we need to get day part and fraction of day, multiplies the fraction by the number of seconds in a day, converts both bits to character and then uses the ‘Julian’ and ‘seconds’ conversion format.
rajesh@10GR2> create table t
2 as
3 select sysdate as dt, sysdate - ntile(10) over(order by object_id) as ntile
4 from all_objects;
Table created.
Elapsed: 00:00:02.12
rajesh@10GR2>
rajesh@10GR2> begin
2 dbms_stats.gather_table_stats(
3 ownname =>user,
4 tabname=>'T',
5 method_opt=>'for all columns size 10',
6 estimate_percent=>100
7 );
8 end;
9 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.39
rajesh@10GR2>
rajesh@10GR2>
rajesh@10GR2> select ntile,count(*)
2 from t
3 group by ntile
4 order by 1;
NTILE COUNT(*)
--------- ----------
11-JAN-11 5674
12-JAN-11 5674
13-JAN-11 5674
14-JAN-11 5674
15-JAN-11 5674
16-JAN-11 5674
17-JAN-11 5674
18-JAN-11 5674
19-JAN-11 5674
20-JAN-11 5675
10 rows selected.
Elapsed: 00:00:00.07
rajesh@10GR2>
rajesh@10GR2>
Now, reconstructing and validating data's from Histogram buckets will be like below.
rajesh@10GR2> SELECT table_name,
2 column_name,
3 endpoint_number,
4 endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_value) AS COUNT,
5 to_date(floor(endpoint_value)
6 ||'.'
7 || TO_CHAR(86400 * MOD(endpoint_value, 1), 'FM999999999'),'J.sssss')endpoint_value
8 FROM user_tab_histograms
9 WHERE table_name ='T'
10 AND column_name ='NTILE'
11 ORDER BY column_name,endpoint_value;
TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER COUNT ENDPOINT_
---------- ---------- --------------- ---------- ---------
T NTILE 5674 5674 11-JAN-11
T NTILE 11348 5674 12-JAN-11
T NTILE 17022 5674 13-JAN-11
T NTILE 22696 5674 14-JAN-11
T NTILE 28370 5674 15-JAN-11
T NTILE 34044 5674 16-JAN-11
T NTILE 39718 5674 17-JAN-11
T NTILE 45392 5674 18-JAN-11
T NTILE 51066 5674 19-JAN-11
T NTILE 56741 5675 20-JAN-11
10 rows selected.
Elapsed: 00:00:00.07
rajesh@10GR2>
Now for character based columns it involves.
- Take the first six bytes of the string
- View this as a hexadecimal number, and convert to decimal
- Round to 15 significant digits and store as the endpoint_value
- If duplicate rows appear, store the first 32 bytes of each string as the endpoint_actual_value
rajesh@10GR2> create table t
2 as
3 select empno, ename as OBJECT_NAME
4 from emp
5 where rownum <= 10;
Table created.
Elapsed: 00:00:00.39
rajesh@10GR2>
rajesh@10GR2>
rajesh@10GR2> begin
2 dbms_stats.gather_table_stats(
3 ownname =>user,
4 tabname=>'T',
5 method_opt=>'for all columns size 10',
6 estimate_percent=>100
7 );
8 end;
9 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.14
rajesh@10GR2>
rajesh@10GR2>
rajesh@10GR2> SELECT endpoint_number,
2 endpoint_number - NVL(prev_endpoint,0) frequency,
3 hex_values,
4 chr(to_number(SUBSTR(hex_values, 2,2),'XX'))
5 || chr(to_number(SUBSTR(hex_values, 4,2),'XX'))
6 || chr(to_number(SUBSTR(hex_values, 6,2),'XX'))
7 || chr(to_number(SUBSTR(hex_values, 8,2),'XX'))
8 || chr(to_number(SUBSTR(hex_values,10,2),'XX'))
9 || chr(to_number(SUBSTR(hex_values,12,2),'XX'))
10 || chr(to_number(SUBSTR(hex_values,14,2),'XX'))
11 || chr(to_number(SUBSTR(hex_values,16,2),'XX'))
12 || chr(to_number(SUBSTR(hex_values,18,2),'XX'))
13 || chr(to_number(SUBSTR(hex_values,20,2),'XX'))
14 || chr(to_number(SUBSTR(hex_values,22,2),'XX'))
15 || chr(to_number(SUBSTR(hex_values,24,2),'XX'))
16 as column_values
17 FROM
18 (SELECT endpoint_number,
19 lag(endpoint_number,1) over( order by endpoint_number ) prev_endpoint,
20 TO_CHAR(endpoint_value,'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')hex_values,
21 endpoint_actual_value
22 FROM user_tab_histograms
23 WHERE table_name = 'T'
24 AND column_name = 'OBJECT_NAME'
25 )
26 ORDER BY endpoint_number ;
ENDPOINT_NUMBER FREQUENCY HEX_VALUES COLUMN_VALUE
--------------- ---------- ------------------------------- ------------
1 1 414C4C454DFFEFD17E20CECE400000 ALLEM n-~ ++
2 1 424C414B44FFF5632009023BE00000 BLAKD )c ?;
3 1 434C41524B001321ED8AF356E00000 CLARK ?!fè=V
4 1 4A4F4E4552FFEC7EDB45D5C0800000 JONER 8~¦E++
5 1 4B494E4700001D9911EA0EFCE00000 KING ?Ö?O?n
6 1 4D415254494DFF74618D0FA4400000 MARTIM taì¤ñ
7 1 53434F54540016FED29C0B71200000 SCOTT ?¦-£?q
8 1 534D49544800059420ACC8DF600000 SMITH ?ö ¼+¯
9 1 5455524E455201F1D23C139C800000 TURNER?±-<?£
10 1 57415243FFFFF52B667FFCC9400000 WARC )+f¦n+
10 rows selected.
Elapsed: 00:00:00.75
rajesh@10GR2>
But due to some Hexadecimal value mismatch, I feel column COLUMN_VALUE from the above output provides some special characters along with actual data.
No comments:
Post a Comment