Multiple “for” clauses in dbms_stats’s method_opt

Recently, I encountered some frustration in recalling the correct syntax for the method_opt parameter in DBMS_STATS. This clause is primarily used when creating or deleting histograms.

A histogram is a special type of column statistic that provides more detailed information about the data distribution in a table column. A histogram sorts values into “buckets,” as you might sort coins into buckets.

By default the optimiser assumes a uniform distribution of rows across the distinct values in a column.

For columns that contain data skew (a nonuniform distribution of data within the column), a histogram enables the optimiser to generate accurate cardinality estimates for filter and join predicates that involve these columns.

Show me an example!

My objective was to ensure that only a single column in my example table retained a histogram. Specifically, I wanted to:

Delete all histograms except for 1 column
Set the histogram to be created using the AUTO clause

Here’s an example how to use the method_opt parameter correctly.

Creating the table

Consider the following table:

-- with a nod to the legend that is Jonathan Lewis
-- https://jonathanlewis.wordpress.com/2010/05/18/double-trouble/
create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from    dual
        connect by
                rownum <= 10000    --> comment to avoid wordpress format issue
)
select
        rownum                                  id,
        case when mod(rownum, 2) = 0
          then 'A'
          else dbms_random.string('l', 1)
        end                                     data_skew,
        mod(rownum,100)                         mod1,
        trunc(dbms_random.value(0,10000))       random1,
        lpad(rownum,10,'0')                     small_vc,
        rpad('x',60)                            padding
from
        generator       v1,
        generator       v2
where
        rownum <= 100000;    --> comment to avoid wordpress format issue

alter table t1 add constraint
pk_t1 primary key(id);

create index i_skewed_column 
on t1 (data_skew);

Data is non-uniformly distributed

The above table DDL creates some wonderful data skew in column data_skew. Histograms are well suited to help the optimiser identify this kind of data distribution. See for yourself:

SQL> select
  2          count(*) count,
  3          data_skew,
  4          ratio_to_report(count(*)) over () * 100 percent_of_total
  5  from 
  6          t1 
  7  group by 
  8          all
  9  order by
 10          count desc
 11* fetch first 10 rows only;

   COUNT DATA_SKEW       PERCENT_OF_TOTAL 
________ ____________ ___________________ 
   50000 A                             50 
    2003 h                          2,003 
    2001 c                          2,001 
    1994 r                          1,994 
    1984 a                          1,984 
    1975 x                          1,975 
    1965 b                          1,965 
    1962 q                          1,962 
    1958 y                          1,958 
    1951 g                          1,951 

10 rows selected.

Note how 50% of all rows in column data_skew feature the value A, the rest is made up of randomly distributed lower case letters.

Creating the histogram

Now let’s create the histogram on data_skew only:

begin
        dbms_stats.gather_table_stats(
                ownname => user,
                tabname => 'T1',
                method_opt => 'for all columns size 1, for columns data_skew size AUTO',
                cascade => true
        );
end;
/

It’s admittedly simple, if you remember the syntax. And it does exactly what I had in mind:

SQL> select
  2          column_name,
  3          notes,
  4          histogram 
  5   from
  6          user_tab_col_statistics 
  7   where
  8*         table_name = 'T1';

Resulting in ….

COLUMN_NAME    NOTES    HISTOGRAM    
______________ ________ ____________ 
ID                      NONE         
DATA_SKEW               FREQUENCY    
MOD1                    NONE         
RANDOM1                 NONE         
SMALL_VC                NONE         
PADDING                 NONE         

6 rows selected.

That’s it!

Summary

The optimiser is pretty good these days working out which columns require a histogram, and which ones don’t. You can read all there is to know about histograms in the SQL Tuning Guide.

Equipped with better knowledge about the column’s data distribution, the optimiser can make more informed decisions. There is so much to say about that topic that I’d like to dedicate a blog post to the matter. More on that later!

histogram, tuning