Understanding enhancements to block cleanouts in Exadata part 1

Travel time is writing time and I have the perfect setting for a techie post. Actually I got quite excited about the subject causing the article to get a bit longer than initially anticipated. In this part you can read about block cleanouts when using buffered I/O. The next part will show how this works using direct path reads and Smart Scans.

The article ultimately aims at describing the enhancements Exadata brings to the table for direct path reads and delayed block cleanouts. Delayed block cleanouts are described in Jonathan Lewis’s “Oracle Core”, and in one of his blog posts, so here’s just a summary.

The delayed block cleanout

In a nutshell Oracle’s database writer (the process persisting your transactions to disk) is free to write blocks to disk in batches when it has to be done. A commit on its own won’t trigger a write of dirty (or modified) block to disk. If it were to trigger the write, the commit-time would be proportional to the number of blocks affected by the last transaction. The commit command however completes very quickly regardless of how much data has been affected. Unlike a rollback … It is entirely possible that a block modified by a transaction is written to disk before the transaction has been completed. A little later, once the commit has been acknowledged by Oracle there is no process that would read the blocks back into the buffer cache and clear them out-this happens later. It would also be quite inefficient to do so.

Defining “clean”

Now when database writer writes to disk it is possible that the block just written has an active transaction recorded in its header. You can see this by dumping a block with an active transaction – the ITL in the header will reference the XID, UBA and the number of rows affected plus other information. The individual rows that are part of the transaction have their lock byte set as you can see in the row directory. The number in the lb field refers back to an ITL you see further up in the block dump (don’t worry I’ll show you an example shortly).

What happened in Oracle before direct path reads became a lot more common is this:

A user starts a transaction, for example by updating a large portion of the table
In the meantime the database writer flushes some blocks to disk, including some of the ones affected by the transaction.
The user commits a transaction
The next user queries the block (let’s assume it is not in the buffer cache)
The second session’s foreground process reads the block
- Realises that it has an active transaction recorded in it
- Checks if that transaction is still active
- And clears the block out if not
The block with a valid CR image is now in the buffer cache

Huh?

Sounds too abstract-I agree? Let’s have an example.

SQL> select table_name,num_rows,compression,partitioned from tabs
  2  where table_name = 'T1_100k';

TABLE_NAME                       NUM_ROWS COMPRESS PAR
------------------------------ ---------- -------- ---
T1_100K                            500000 DISABLED NO

SQL> update T1_100k set state = 'MODIFIED';

5000000 rows updated.

An easy way to simulate the flushing of blocks to disk is a slightly brute-force approach of flushing the buffer cache. Only recommended in the lab, really. Let’s have a look at what the block looks like:

SQL> select dbms_rowid.rowid_block_number(rowid), dbms_rowid.rowid_relative_fno(rowid)
  2  from t1_100k where rownum < 11;

DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID)
------------------------------------ ------------------------------------
                             3940811                                    5
                             3940811                                    5
                             3940811                                    5
                             3940811                                    5
                             3940811                                    5
                             3940811                                    5
                             3940812                                    5
                             3940812                                    5
                             3940812                                    5
                             3940812                                    5

10 rows selected.

Elapsed: 00:00:00.01
SQL> alter system dump datafile 5 block 3940811;

System altered.

Elapsed: 00:00:00.01
SQL> select value from v$diag_info where name like 'Default%';

VALUE
---------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_94351.trc

Elapsed: 00:00:00.02

The block dump for block 3940811 on datafile 5 (users tablespace) is now in that trace file:

Block header dump:  0x017c21cb
 Object id on Block? Y
 seg/obj: 0x12a13  csc: 0x00.2191ff7  itc: 3  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x17c21c8 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.02191d10
0x02   0x0009.005.00002575  0x0003688a.0ef1.1f  ----    6  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x017c21cb
data_block_dump,data header at 0x7f026361707c
===============
tsiz: 0x1f80
hsiz: 0x1e
pbl: 0x7f026361707c
     76543210
flag=--------
ntab=1
nrow=6
...
0xe:pti[0]      nrow=6  offs=0
0x12:pri[0]     offs=0x1b75
0x14:pri[1]     offs=0x176a
0x16:pri[2]     offs=0x135f
0x18:pri[3]     offs=0xf54
0x1a:pri[4]     offs=0xb49
0x1c:pri[5]     offs=0x73e
block_row_dump:
tab 0, row 0, @0x1b75
tl: 1035 fb: --H-FL-- lb: 0x2  cc: 6
col  0: [ 2]  c1 02
col  1: [999]
 31 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
...

The significant pieces of information here are in the ITL (interested transaction list) and the lock byte set, pointing to the second ITL entry. The absence of flags for ITL #2 indicates the transaction is ongoing (you can see this in v$transaction-this is a lab environment, in real databases you’d see more than 1 active transaction):

SQL> select xidusn, xidslot, xidsqn, status, start_scn from v$transaction;

    XIDUSN    XIDSLOT     XIDSQN STATUS            START_SCN
---------- ---------- ---------- ---------------- ----------
         9          5       9589 ACTIVE             35200741

This is indeed the transaction that is referenced in v$transaction and the ITL 0x02 above (9589 is decimal for 2575 in hexadecimal notation, 9 and 5 don’t need converting to decimal numbers). If you are curious now about some more internals then you might find this post by Arup Nanda interesting.

Now what happens if another session queries the table? Since the query in session 1 has not yet committed the changes in it must not be seen by any other session (that would be a dirty read otherwise. We do not use dirty reads in Oracle RDBMS). So to give the user a consistent view of the data the block must be rolled back. Before the test I made sure there wasn’t a block in the buffer cache for the object in question:

SQL> select count(*), inst_id, status from gv$bh where objd =
  2   (select data_object_id from dba_objects where object_name = 'T1_100K')
  3  group by inst_id, status;

  COUNT(*)    INST_ID STATUS
---------- ---------- ----------
    524327          1 free

Let’s test:

SQL> @mystats start

SQL> select /* test004 */ count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:13.96

@mystats stop t=1

==========================================================================================
MyStats report : 22-FEB-2015 08:42:09
==========================================================================================


------------------------------------------------------------------------------------------
1. Summary Timings
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
TIMER   snapshot interval (seconds)                                                  35.57
TIMER   CPU time used (seconds)                                                       3.21


------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    CPU used by this session                                                       327
STAT    CPU used when call started                                                     327
STAT    CR blocks created                                                           83,334
STAT    DB time                                                                      1,401
STAT    Requests to/from client                                                         16
STAT    SQL*Net roundtrips to/from client                                               16
STAT    active txn count during cleanout                                            83,334
STAT    buffer is not pinned count                                                       2
STAT    bytes received via SQL*Net from client                                      13,279
STAT    bytes sent via SQL*Net to client                                             5,106
STAT    calls to get snapshot scn: kcmgss                                              758
STAT    calls to kcmgas                                                             83,332
STAT    calls to kcmgcs                                                                 30
STAT    cell flash cache read hits                                                   8,045
STAT    cell physical IO interconnect bytes                                    748,576,768
STAT    change write time                                                               48
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts and rollbacks - consistent read gets                              83,334
STAT    consistent changes                                                         591,717
STAT    consistent gets                                                            757,980
STAT    consistent gets examination                                                674,254
STAT    consistent gets examination (fastpath)                                      83,337
STAT    consistent gets from cache                                                 757,980
STAT    consistent gets pin                                                         83,726
STAT    consistent gets pin (fastpath)                                              83,055
STAT    data blocks consistent reads - undo records applied                        590,917
STAT    db block changes                                                            84,134
STAT    db block gets                                                                2,301
STAT    db block gets from cache                                                     2,301
STAT    enqueue releases                                                                 4
STAT    enqueue requests                                                                 4
STAT    execute count                                                                   13
STAT    file io wait time                                                       11,403,214
STAT    free buffer requested                                                      174,766
STAT    gc local grants                                                            174,713
STAT    global enqueue gets sync                                                        43
STAT    global enqueue releases                                                         41
STAT    heap block compress                                                          4,549
STAT    hot buffers moved to head of LRU                                                 4
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    index fetch by key                                                               1
STAT    lob writes                                                                     375
STAT    lob writes unaligned                                                           375
STAT    logical read bytes from cache                                        6,228,221,952
STAT    messages sent                                                                    9
STAT    non-idle wait count                                                         17,446
STAT    non-idle wait time                                                           1,141
STAT    opened cursors cumulative                                                       13
STAT    parse count (hard)                                                               1
STAT    parse count (total)                                                             13
STAT    physical read IO requests                                                    8,715
STAT    physical read bytes                                                    748,576,768
STAT    physical read requests optimized                                             8,045
STAT    physical read total IO requests                                              8,715
STAT    physical read total bytes                                              748,576,768
STAT    physical read total bytes optimized                                     65,904,640
STAT    physical read total multi block requests                                       654
STAT    physical reads                                                              91,379
STAT    physical reads cache                                                        91,379
STAT    physical reads cache prefetch                                               82,664
STAT    recursive calls                                                              1,985
STAT    recursive cpu usage                                                              6
STAT    redo entries                                                                84,005
STAT    redo entries for lost write detection                                          671
STAT    redo size                                                                9,016,740
STAT    redo size for lost write detection                                       3,016,296
STAT    redo subscn max counts                                                      83,334
STAT    rows fetched via callback                                                        1
STAT    session cursor cache count                                                       4
STAT    session cursor cache hits                                                        9
STAT    session logical reads                                                      760,281
STAT    session pga memory                                                         -65,536
STAT    session uga memory                                                          65,488
STAT    session uga memory max                                                     148,312
STAT    shared hash latch upgrades - no wait                                         8,218
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan disk non-IMC rows gotten                                        500,000
STAT    table scan rows gotten                                                     500,000
STAT    table scans (short tables)                                                       1
STAT    temp space allocated (bytes)                                             2,097,152
STAT    user I/O wait time                                                           1,141
STAT    user calls                                                                      22
STAT    workarea executions - optimal                                                    1
STAT    workarea memory allocated                                                      -43


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

I am using Adrian Billington’s mystats script here which I can only recommend to you-it’s very decent (and that’s an understatement). In the example above it took two snapshots and calculated the change of the session counters in v$mystat during the query execution. Have a look at the cleanout% statistics here! These are the block cleanouts. As this generates redo you can see that recorded here too. This is a tricky interview question: can a select generate redo? Yes sure can! There is also a fair amount of physical I/O going on. After all the buffer cache was empty.

The second execution of the query completes a lot faster due to the absence of physical I/O. Nevertheless, the session has to clean out all these blocks and perform consistent read processing again (remember the transaction is still uncommitted as seen in active txn%):

SQL> select count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:01.76

SQL> @mystats stop t=1

==========================================================================================
MyStats report : 22-FEB-2015 08:42:59
==========================================================================================


------------------------------------------------------------------------------------------
1. Summary Timings
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
TIMER   snapshot interval (seconds)                                                   8.55
TIMER   CPU time used (seconds)                                                       1.75


------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    CPU used by this session                                                       181
STAT    CPU used when call started                                                     181
STAT    CR blocks created                                                           83,334
STAT    DB time                                                                        183
STAT    Requests to/from client                                                         16
STAT    SQL*Net roundtrips to/from client                                               16
STAT    active txn count during cleanout                                            83,334
STAT    buffer is not pinned count                                                       2
STAT    bytes received via SQL*Net from client                                      13,280
STAT    bytes sent via SQL*Net to client                                             5,177
STAT    calls to get snapshot scn: kcmgss                                              757
STAT    calls to kcmgas                                                             83,331
STAT    calls to kcmgcs                                                                 30
STAT    change write time                                                               28
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts and rollbacks - consistent read gets                              83,334
STAT    consistent changes                                                         591,708
STAT    consistent gets                                                            757,980
STAT    consistent gets examination                                                674,254
STAT    consistent gets examination (fastpath)                                      83,337
STAT    consistent gets from cache                                                 757,980
STAT    consistent gets pin                                                         83,726
STAT    consistent gets pin (fastpath)                                              83,726
STAT    cursor authentications                                                           1
STAT    data blocks consistent reads - undo records applied                        590,917
STAT    db block changes                                                            84,125
STAT    db block gets                                                                2,895
STAT    db block gets from cache                                                     2,895
STAT    execute count                                                                   13
STAT    free buffer requested                                                       83,382
STAT    global enqueue gets sync                                                        38
STAT    global enqueue releases                                                         36
STAT    heap block compress                                                          4,549
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    index fetch by key                                                               1
STAT    lob writes                                                                     375
STAT    lob writes unaligned                                                           375
STAT    logical read bytes from cache                                        6,233,088,000
STAT    messages sent                                                                    7
STAT    non-idle wait count                                                             16
STAT    opened cursors cumulative                                                       13
STAT    parse count (total)                                                             13
STAT    recursive calls                                                              1,982
STAT    recursive cpu usage                                                              6
STAT    redo entries                                                                83,334
STAT    redo size                                                                6,000,400
STAT    rows fetched via callback                                                        1
STAT    session cursor cache hits                                                       12
STAT    session logical reads                                                      760,875
STAT    session pga memory                                                      -1,769,472
STAT    session uga memory                                                      -1,858,472
STAT    shared hash latch upgrades - no wait                                           524
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan disk non-IMC rows gotten                                        500,000
STAT    table scan rows gotten                                                     500,000
STAT    table scans (short tables)                                                       1
STAT    user calls                                                                      22
STAT    workarea executions - optimal                                                    1
STAT    workarea memory allocated                                                      -83


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

I should have mentioned that these 2 examples were executed on an X2-2 quarter rack. To prevent Exadata from applying some clever optimisations I specifically disabled direct path reads and turned off Exadata features. Here is the view from the buffer cache again:

SQL> select count(*), inst_id, status from gv$bh where objd =
  2   (select data_object_id from dba_objects where object_name = 'T1_100K')
  3  group by inst_id, status;

  COUNT(*)    INST_ID STATUS
---------- ---------- ----------
    351704          1 free
         1          1 scur
     83334          1 xcur
     83334          1 cr

4 rows selected.

For each subsequent execution of the select statement consistent read processing has to be conducted. The number of CR blocks in the buffer cache will increase. For the third execution I can see the following:

SQL> r
  1  select count(*), inst_id, status from gv$bh where objd =
  2   (select data_object_id from dba_objects where object_name = 'T1_100K')
  3* group by inst_id, status

  COUNT(*)    INST_ID STATUS
---------- ---------- ----------
    267526          1 free
         1          1 scur
     83334          1 xcur
    166668          1 cr

4 rows selected.

Things improve when the transaction commits.

The impact of a commit

What about a commit now? Committing in session 1 and running the query again in session 2 shows a very good response time. Please remember that I forced buffered reads. The next part deals with this case when using direct path reads instead (those are not buffered). The segment size would have qualified for direct path reads otherwise.

SQL> select /* test004 */ count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:00.18

==========================================================================================
MyStats report : 22-FEB-2015 09:40:08
==========================================================================================


------------------------------------------------------------------------------------------
1. Summary Timings
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
TIMER   snapshot interval (seconds)                                                  19.73
TIMER   CPU time used (seconds)                                                       0.18


------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    CPU used by this session                                                        25
STAT    CPU used when call started                                                      25
STAT    DB time                                                                         25
STAT    Heatmap SegLevel - Full Table Scan                                               1
STAT    Requests to/from client                                                         16
STAT    SQL*Net roundtrips to/from client                                               16
STAT    buffer is not pinned count                                                       2
STAT    bytes received via SQL*Net from client                                      13,280
STAT    bytes sent via SQL*Net to client                                             5,107
STAT    calls to get snapshot scn: kcmgss                                              757
STAT    calls to kcmgcs                                                                 30
STAT    consistent changes                                                             800
STAT    consistent gets                                                             83,729
STAT    consistent gets examination                                                      3
STAT    consistent gets examination (fastpath)                                           3
STAT    consistent gets from cache                                                  83,729
STAT    consistent gets pin                                                         83,726
STAT    consistent gets pin (fastpath)                                              83,726
STAT    db block changes                                                               800
STAT    db block gets                                                                2,301
STAT    db block gets from cache                                                     2,301
STAT    enqueue releases                                                                 3
STAT    enqueue requests                                                                 3
STAT    execute count                                                                   13
STAT    free buffer requested                                                           53
STAT    global enqueue gets sync                                                        39
STAT    global enqueue releases                                                         37
STAT    index fetch by key                                                               1
STAT    lob writes                                                                     375
STAT    lob writes unaligned                                                           375
STAT    logical read bytes from cache                                          704,757,760
STAT    no work - consistent read gets                                              83,334
STAT    non-idle wait count                                                             16
STAT    opened cursors cumulative                                                       13
STAT    parse count (total)                                                             13
STAT    recursive calls                                                              1,984
STAT    recursive cpu usage                                                              7
STAT    rows fetched via callback                                                        1
STAT    session cursor cache count                                                       4
STAT    session cursor cache hits                                                       10
STAT    session logical reads                                                       86,030
STAT    session pga memory                                                          65,536
STAT    session pga memory max                                                      65,536
STAT    session uga memory                                                          65,488
STAT    session uga memory max                                                     148,312
STAT    shared hash latch upgrades - no wait                                           525
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan disk non-IMC rows gotten                                        500,000
STAT    table scan rows gotten                                                     500,000
STAT    table scans (short tables)                                                       1
STAT    temp space allocated (bytes)                                             2,097,152
STAT    user calls                                                                      22
STAT    workarea executions - optimal                                                    1
STAT    workarea memory allocated                                                      -40


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

Access to blocks in the buffer cache is quick! There was no need to clean out any blocks anymore. The fastest way to do something is indeed not having to do it at all. Mind you, the block on disk still has the active transaction recorded!

SQL> alter system dump datafile 5 block 3940811;

System altered.

Elapsed: 00:00:00.01
SQL> select value from v$diag_info where name like 'Default%';

VALUE
------------------------------------------------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_100297.trc

SQL> exit

$ vi /u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_100297.trc

Block header dump:  0x017c21cb
 Object id on Block? Y
 seg/obj: 0x12a13  csc: 0x00.2317620  itc: 3  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x17c21c8 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.02191d10
0x02   0x0009.005.00002575  0x0003688a.0ef1.1f  --U-    6  fsc 0x0000.0232cfca
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x017c21cb
data_block_dump,data header at 0x7fc9f3cb407c
...

$ grep lb -B1 /u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_100297.trc
tab 0, row 0, @0x1b75
tl: 1035 fb: --H-FL-- lb: 0x2  cc: 6
--
tab 0, row 1, @0x176a
tl: 1035 fb: --H-FL-- lb: 0x2  cc: 6
--
tab 0, row 2, @0x135f
tl: 1035 fb: --H-FL-- lb: 0x2  cc: 6
--
tab 0, row 3, @0xf54
tl: 1035 fb: --H-FL-- lb: 0x2  cc: 6
--
tab 0, row 4, @0xb49
tl: 1035 fb: --H-FL-- lb: 0x2  cc: 6
--
tab 0, row 5, @0x73e
tl: 1035 fb: --H-FL-- lb: 0x2  cc: 6
--

The transaction recorded in the ITL slot 2 is gone though:

SQL> select xidusn, xidslot, xidsqn, status, start_scn from v$transaction
  2  where xidusn = 9 and xidslot = 5 and xidsqn = 9589;

no rows selected

Did you notice the Flag in the second ITL entry? That wasn’t there before. It indicates a fast commit according to Core Oracle by Jonathan Lewis.

Summary

Block cleanouts are simple with buffered I/O. Dirty blocks can be rolled back and kept in the buffer cache and can be accessed without having to undergo block cleanout.

Response

Understanding enhancements to block cleanouts in Exadata part 2 « Martins Blog

February 25, 2015

[…] « Understanding enhancements to block cleanouts in Exadata part 1 […]