Recently, I encountered some frustration in recalling the correct syntax for the method_opt parameter in DBMS_STATS. This clause is primarily used when creating or deleting histograms.
A histogram is a special type of column statistic that provides more detailed information about the data distribution in a table column. A histogram sorts values into “buckets,” as you might sort coins into buckets.
By default the optimiser assumes a uniform distribution of rows across the distinct values in a column.
For columns that contain data skew (a nonuniform distribution of data within the column), a histogram enables the optimiser to generate accurate cardinality estimates for filter and join predicates that involve these columns.
Show me an example!
My objective was to ensure that only a single column in my example table retained a histogram. Specifically, I wanted to:
- Delete all histograms except for 1 column
- Set the histogram to be created using the AUTO clause
Here’s an example how to use the method_opt parameter correctly.
Creating the table
Consider the following table:
-- with a nod to the legend that is Jonathan Lewis
-- https://jonathanlewis.wordpress.com/2010/05/18/double-trouble/
create table t1
as
with generator as (
select --+ materialize
rownum id
from dual
connect by
rownum <= 10000 --> comment to avoid wordpress format issue
)
select
rownum id,
case when mod(rownum, 2) = 0
then 'A'
else dbms_random.string('l', 1)
end data_skew,
mod(rownum,100) mod1,
trunc(dbms_random.value(0,10000)) random1,
lpad(rownum,10,'0') small_vc,
rpad('x',60) padding
from
generator v1,
generator v2
where
rownum <= 100000; --> comment to avoid wordpress format issue
alter table t1 add constraint
pk_t1 primary key(id);
create index i_skewed_column
on t1 (data_skew);
Data is non-uniformly distributed
The above table DDL creates some wonderful data skew in column data_skew. Histograms are well suited to help the optimiser identify this kind of data distribution. See for yourself:
SQL> select
2 count(*) count,
3 data_skew,
4 ratio_to_report(count(*)) over () * 100 percent_of_total
5 from
6 t1
7 group by
8 all
9 order by
10 count desc
11* fetch first 10 rows only;
COUNT DATA_SKEW PERCENT_OF_TOTAL
________ ____________ ___________________
50000 A 50
2003 h 2,003
2001 c 2,001
1994 r 1,994
1984 a 1,984
1975 x 1,975
1965 b 1,965
1962 q 1,962
1958 y 1,958
1951 g 1,951
10 rows selected.
Note how 50% of all rows in column data_skew feature the value A, the rest is made up of randomly distributed lower case letters.
Creating the histogram
Now let’s create the histogram on data_skew only:
begin
dbms_stats.gather_table_stats(
ownname => user,
tabname => 'T1',
method_opt => 'for all columns size 1, for columns data_skew size AUTO',
cascade => true
);
end;
/
It’s admittedly simple, if you remember the syntax. And it does exactly what I had in mind:
SQL> select
2 column_name,
3 notes,
4 histogram
5 from
6 user_tab_col_statistics
7 where
8* table_name = 'T1';
Resulting in ….
COLUMN_NAME NOTES HISTOGRAM
______________ ________ ____________
ID NONE
DATA_SKEW FREQUENCY
MOD1 NONE
RANDOM1 NONE
SMALL_VC NONE
PADDING NONE
6 rows selected.
That’s it!
Summary
The optimiser is pretty good these days working out which columns require a histogram, and which ones don’t. You can read all there is to know about histograms in the SQL Tuning Guide.
Equipped with better knowledge about the column’s data distribution, the optimiser can make more informed decisions. There is so much to say about that topic that I’d like to dedicate a blog post to the matter. More on that later!