Travel time is writing time and I have the perfect setting for a techie post. Actually I got quite excited about the subject causing the article to get a bit longer than initially anticipated. In this part you can read about block cleanouts when using buffered I/O. The next part will show how this works using direct path reads and Smart Scans.
The article ultimately aims at describing the enhancements Exadata brings to the table for direct path reads and delayed block cleanouts. Delayed block cleanouts are described in Jonathan Lewis’s “Oracle Core”, and in one of his blog posts, so here’s just a summary.
The delayed block cleanout
In a nutshell Oracle’s database writer (the process persisting your transactions to disk) is free to write blocks to disk in batches when it has to be done. A commit on its own won’t trigger a write of dirty (or modified) block to disk. If it were to trigger the write, the commit-time would be proportional to the number of blocks affected by the last transaction. The commit command however completes very quickly regardless of how much data has been affected. Unlike a rollback … It is entirely possible that a block modified by a transaction is written to disk before the transaction has been completed. A little later, once the commit has been acknowledged by Oracle there is no process that would read the blocks back into the buffer cache and clear them out-this happens later. It would also be quite inefficient to do so.
Defining “clean”
Now when database writer writes to disk it is possible that the block just written has an active transaction recorded in its header. You can see this by dumping a block with an active transaction – the ITL in the header will reference the XID, UBA and the number of rows affected plus other information. The individual rows that are part of the transaction have their lock byte set as you can see in the row directory. The number in the lb field refers back to an ITL you see further up in the block dump (don’t worry I’ll show you an example shortly).
What happened in Oracle before direct path reads became a lot more common is this:
- A user starts a transaction, for example by updating a large portion of the table
- In the meantime the database writer flushes some blocks to disk, including some of the ones affected by the transaction.
- The user commits a transaction
- The next user queries the block (let’s assume it is not in the buffer cache)
- The second session’s foreground process reads the block
- Realises that it has an active transaction recorded in it
- Checks if that transaction is still active
- And clears the block out if not
- The block with a valid CR image is now in the buffer cache
Huh?
Sounds too abstract-I agree? Let’s have an example.
SQL> select table_name,num_rows,compression,partitioned from tabs 2 where table_name = 'T1_100k'; TABLE_NAME NUM_ROWS COMPRESS PAR ------------------------------ ---------- -------- --- T1_100K 500000 DISABLED NO SQL> update T1_100k set state = 'MODIFIED'; 5000000 rows updated.
An easy way to simulate the flushing of blocks to disk is a slightly brute-force approach of flushing the buffer cache. Only recommended in the lab, really. Let’s have a look at what the block looks like:
SQL> select dbms_rowid.rowid_block_number(rowid), dbms_rowid.rowid_relative_fno(rowid)
2 from t1_100k where rownum < 11;
DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID)
------------------------------------ ------------------------------------
3940811 5
3940811 5
3940811 5
3940811 5
3940811 5
3940811 5
3940812 5
3940812 5
3940812 5
3940812 5
10 rows selected.
Elapsed: 00:00:00.01
SQL> alter system dump datafile 5 block 3940811;
System altered.
Elapsed: 00:00:00.01
SQL> select value from v$diag_info where name like 'Default%';
VALUE
---------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_94351.trc
Elapsed: 00:00:00.02
The block dump for block 3940811 on datafile 5 (users tablespace) is now in that trace file:
Block header dump: 0x017c21cb
Object id on Block? Y
seg/obj: 0x12a13 csc: 0x00.2191ff7 itc: 3 flg: E typ: 1 - DATA
brn: 0 bdba: 0x17c21c8 ver: 0x01 opc: 0
inc: 0 exflg: 0
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0xffff.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.02191d10
0x02 0x0009.005.00002575 0x0003688a.0ef1.1f ---- 6 fsc 0x0000.00000000
0x03 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
bdba: 0x017c21cb
data_block_dump,data header at 0x7f026361707c
===============
tsiz: 0x1f80
hsiz: 0x1e
pbl: 0x7f026361707c
76543210
flag=--------
ntab=1
nrow=6
...
0xe:pti[0] nrow=6 offs=0
0x12:pri[0] offs=0x1b75
0x14:pri[1] offs=0x176a
0x16:pri[2] offs=0x135f
0x18:pri[3] offs=0xf54
0x1a:pri[4] offs=0xb49
0x1c:pri[5] offs=0x73e
block_row_dump:
tab 0, row 0, @0x1b75
tl: 1035 fb: --H-FL-- lb: 0x2 cc: 6
col 0: [ 2] c1 02
col 1: [999]
31 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
...
The significant pieces of information here are in the ITL (interested transaction list) and the lock byte set, pointing to the second ITL entry. The absence of flags for ITL #2 indicates the transaction is ongoing (you can see this in v$transaction-this is a lab environment, in real databases you’d see more than 1 active transaction):
SQL> select xidusn, xidslot, xidsqn, status, start_scn from v$transaction;
XIDUSN XIDSLOT XIDSQN STATUS START_SCN
---------- ---------- ---------- ---------------- ----------
9 5 9589 ACTIVE 35200741
This is indeed the transaction that is referenced in v$transaction and the ITL 0x02 above (9589 is decimal for 2575 in hexadecimal notation, 9 and 5 don’t need converting to decimal numbers). If you are curious now about some more internals then you might find this post by Arup Nanda interesting.
Now what happens if another session queries the table? Since the query in session 1 has not yet committed the changes in it must not be seen by any other session (that would be a dirty read otherwise. We do not use dirty reads in Oracle RDBMS). So to give the user a consistent view of the data the block must be rolled back. Before the test I made sure there wasn’t a block in the buffer cache for the object in question:
SQL> select count(*), inst_id, status from gv$bh where objd =
2 (select data_object_id from dba_objects where object_name = 'T1_100K')
3 group by inst_id, status;
COUNT(*) INST_ID STATUS
---------- ---------- ----------
524327 1 free
Let’s test:
SQL> @mystats start
SQL> select /* test004 */ count(*) from t1_100k;
COUNT(*)
----------
500000
Elapsed: 00:00:13.96
@mystats stop t=1
==========================================================================================
MyStats report : 22-FEB-2015 08:42:09
==========================================================================================
------------------------------------------------------------------------------------------
1. Summary Timings
------------------------------------------------------------------------------------------
Type Statistic Name Value
------ ---------------------------------------------------------------- ----------------
TIMER snapshot interval (seconds) 35.57
TIMER CPU time used (seconds) 3.21
------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------
Type Statistic Name Value
------ ---------------------------------------------------------------- ----------------
STAT CPU used by this session 327
STAT CPU used when call started 327
STAT CR blocks created 83,334
STAT DB time 1,401
STAT Requests to/from client 16
STAT SQL*Net roundtrips to/from client 16
STAT active txn count during cleanout 83,334
STAT buffer is not pinned count 2
STAT bytes received via SQL*Net from client 13,279
STAT bytes sent via SQL*Net to client 5,106
STAT calls to get snapshot scn: kcmgss 758
STAT calls to kcmgas 83,332
STAT calls to kcmgcs 30
STAT cell flash cache read hits 8,045
STAT cell physical IO interconnect bytes 748,576,768
STAT change write time 48
STAT cleanout - number of ktugct calls 83,334
STAT cleanouts and rollbacks - consistent read gets 83,334
STAT consistent changes 591,717
STAT consistent gets 757,980
STAT consistent gets examination 674,254
STAT consistent gets examination (fastpath) 83,337
STAT consistent gets from cache 757,980
STAT consistent gets pin 83,726
STAT consistent gets pin (fastpath) 83,055
STAT data blocks consistent reads - undo records applied 590,917
STAT db block changes 84,134
STAT db block gets 2,301
STAT db block gets from cache 2,301
STAT enqueue releases 4
STAT enqueue requests 4
STAT execute count 13
STAT file io wait time 11,403,214
STAT free buffer requested 174,766
STAT gc local grants 174,713
STAT global enqueue gets sync 43
STAT global enqueue releases 41
STAT heap block compress 4,549
STAT hot buffers moved to head of LRU 4
STAT immediate (CR) block cleanout applications 83,334
STAT index fetch by key 1
STAT lob writes 375
STAT lob writes unaligned 375
STAT logical read bytes from cache 6,228,221,952
STAT messages sent 9
STAT non-idle wait count 17,446
STAT non-idle wait time 1,141
STAT opened cursors cumulative 13
STAT parse count (hard) 1
STAT parse count (total) 13
STAT physical read IO requests 8,715
STAT physical read bytes 748,576,768
STAT physical read requests optimized 8,045
STAT physical read total IO requests 8,715
STAT physical read total bytes 748,576,768
STAT physical read total bytes optimized 65,904,640
STAT physical read total multi block requests 654
STAT physical reads 91,379
STAT physical reads cache 91,379
STAT physical reads cache prefetch 82,664
STAT recursive calls 1,985
STAT recursive cpu usage 6
STAT redo entries 84,005
STAT redo entries for lost write detection 671
STAT redo size 9,016,740
STAT redo size for lost write detection 3,016,296
STAT redo subscn max counts 83,334
STAT rows fetched via callback 1
STAT session cursor cache count 4
STAT session cursor cache hits 9
STAT session logical reads 760,281
STAT session pga memory -65,536
STAT session uga memory 65,488
STAT session uga memory max 148,312
STAT shared hash latch upgrades - no wait 8,218
STAT table fetch by rowid 1
STAT table scan blocks gotten 83,334
STAT table scan disk non-IMC rows gotten 500,000
STAT table scan rows gotten 500,000
STAT table scans (short tables) 1
STAT temp space allocated (bytes) 2,097,152
STAT user I/O wait time 1,141
STAT user calls 22
STAT workarea executions - optimal 1
STAT workarea memory allocated -43
------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis
==========================================================================================
End of report
==========================================================================================
I am using Adrian Billington’s mystats script here which I can only recommend to you-it’s very decent (and that’s an understatement). In the example above it took two snapshots and calculated the change of the session counters in v$mystat during the query execution. Have a look at the cleanout% statistics here! These are the block cleanouts. As this generates redo you can see that recorded here too. This is a tricky interview question: can a select generate redo? Yes sure can! There is also a fair amount of physical I/O going on. After all the buffer cache was empty.
The second execution of the query completes a lot faster due to the absence of physical I/O. Nevertheless, the session has to clean out all these blocks and perform consistent read processing again (remember the transaction is still uncommitted as seen in active txn%):
SQL> select count(*) from t1_100k;
COUNT(*)
----------
500000
Elapsed: 00:00:01.76
SQL> @mystats stop t=1
==========================================================================================
MyStats report : 22-FEB-2015 08:42:59
==========================================================================================
------------------------------------------------------------------------------------------
1. Summary Timings
------------------------------------------------------------------------------------------
Type Statistic Name Value
------ ---------------------------------------------------------------- ----------------
TIMER snapshot interval (seconds) 8.55
TIMER CPU time used (seconds) 1.75
------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------
Type Statistic Name Value
------ ---------------------------------------------------------------- ----------------
STAT CPU used by this session 181
STAT CPU used when call started 181
STAT CR blocks created 83,334
STAT DB time 183
STAT Requests to/from client 16
STAT SQL*Net roundtrips to/from client 16
STAT active txn count during cleanout 83,334
STAT buffer is not pinned count 2
STAT bytes received via SQL*Net from client 13,280
STAT bytes sent via SQL*Net to client 5,177
STAT calls to get snapshot scn: kcmgss 757
STAT calls to kcmgas 83,331
STAT calls to kcmgcs 30
STAT change write time 28
STAT cleanout - number of ktugct calls 83,334
STAT cleanouts and rollbacks - consistent read gets 83,334
STAT consistent changes 591,708
STAT consistent gets 757,980
STAT consistent gets examination 674,254
STAT consistent gets examination (fastpath) 83,337
STAT consistent gets from cache 757,980
STAT consistent gets pin 83,726
STAT consistent gets pin (fastpath) 83,726
STAT cursor authentications 1
STAT data blocks consistent reads - undo records applied 590,917
STAT db block changes 84,125
STAT db block gets 2,895
STAT db block gets from cache 2,895
STAT execute count 13
STAT free buffer requested 83,382
STAT global enqueue gets sync 38
STAT global enqueue releases 36
STAT heap block compress 4,549
STAT immediate (CR) block cleanout applications 83,334
STAT index fetch by key 1
STAT lob writes 375
STAT lob writes unaligned 375
STAT logical read bytes from cache 6,233,088,000
STAT messages sent 7
STAT non-idle wait count 16
STAT opened cursors cumulative 13
STAT parse count (total) 13
STAT recursive calls 1,982
STAT recursive cpu usage 6
STAT redo entries 83,334
STAT redo size 6,000,400
STAT rows fetched via callback 1
STAT session cursor cache hits 12
STAT session logical reads 760,875
STAT session pga memory -1,769,472
STAT session uga memory -1,858,472
STAT shared hash latch upgrades - no wait 524
STAT table fetch by rowid 1
STAT table scan blocks gotten 83,334
STAT table scan disk non-IMC rows gotten 500,000
STAT table scan rows gotten 500,000
STAT table scans (short tables) 1
STAT user calls 22
STAT workarea executions - optimal 1
STAT workarea memory allocated -83
------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis
==========================================================================================
End of report
==========================================================================================
I should have mentioned that these 2 examples were executed on an X2-2 quarter rack. To prevent Exadata from applying some clever optimisations I specifically disabled direct path reads and turned off Exadata features. Here is the view from the buffer cache again:
SQL> select count(*), inst_id, status from gv$bh where objd =
2 (select data_object_id from dba_objects where object_name = 'T1_100K')
3 group by inst_id, status;
COUNT(*) INST_ID STATUS
---------- ---------- ----------
351704 1 free
1 1 scur
83334 1 xcur
83334 1 cr
4 rows selected.
For each subsequent execution of the select statement consistent read processing has to be conducted. The number of CR blocks in the buffer cache will increase. For the third execution I can see the following:
SQL> r
1 select count(*), inst_id, status from gv$bh where objd =
2 (select data_object_id from dba_objects where object_name = 'T1_100K')
3* group by inst_id, status
COUNT(*) INST_ID STATUS
---------- ---------- ----------
267526 1 free
1 1 scur
83334 1 xcur
166668 1 cr
4 rows selected.
Things improve when the transaction commits.
The impact of a commit
What about a commit now? Committing in session 1 and running the query again in session 2 shows a very good response time. Please remember that I forced buffered reads. The next part deals with this case when using direct path reads instead (those are not buffered). The segment size would have qualified for direct path reads otherwise.
SQL> select /* test004 */ count(*) from t1_100k;
COUNT(*)
----------
500000
Elapsed: 00:00:00.18
==========================================================================================
MyStats report : 22-FEB-2015 09:40:08
==========================================================================================
------------------------------------------------------------------------------------------
1. Summary Timings
------------------------------------------------------------------------------------------
Type Statistic Name Value
------ ---------------------------------------------------------------- ----------------
TIMER snapshot interval (seconds) 19.73
TIMER CPU time used (seconds) 0.18
------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------
Type Statistic Name Value
------ ---------------------------------------------------------------- ----------------
STAT CPU used by this session 25
STAT CPU used when call started 25
STAT DB time 25
STAT Heatmap SegLevel - Full Table Scan 1
STAT Requests to/from client 16
STAT SQL*Net roundtrips to/from client 16
STAT buffer is not pinned count 2
STAT bytes received via SQL*Net from client 13,280
STAT bytes sent via SQL*Net to client 5,107
STAT calls to get snapshot scn: kcmgss 757
STAT calls to kcmgcs 30
STAT consistent changes 800
STAT consistent gets 83,729
STAT consistent gets examination 3
STAT consistent gets examination (fastpath) 3
STAT consistent gets from cache 83,729
STAT consistent gets pin 83,726
STAT consistent gets pin (fastpath) 83,726
STAT db block changes 800
STAT db block gets 2,301
STAT db block gets from cache 2,301
STAT enqueue releases 3
STAT enqueue requests 3
STAT execute count 13
STAT free buffer requested 53
STAT global enqueue gets sync 39
STAT global enqueue releases 37
STAT index fetch by key 1
STAT lob writes 375
STAT lob writes unaligned 375
STAT logical read bytes from cache 704,757,760
STAT no work - consistent read gets 83,334
STAT non-idle wait count 16
STAT opened cursors cumulative 13
STAT parse count (total) 13
STAT recursive calls 1,984
STAT recursive cpu usage 7
STAT rows fetched via callback 1
STAT session cursor cache count 4
STAT session cursor cache hits 10
STAT session logical reads 86,030
STAT session pga memory 65,536
STAT session pga memory max 65,536
STAT session uga memory 65,488
STAT session uga memory max 148,312
STAT shared hash latch upgrades - no wait 525
STAT table fetch by rowid 1
STAT table scan blocks gotten 83,334
STAT table scan disk non-IMC rows gotten 500,000
STAT table scan rows gotten 500,000
STAT table scans (short tables) 1
STAT temp space allocated (bytes) 2,097,152
STAT user calls 22
STAT workarea executions - optimal 1
STAT workarea memory allocated -40
------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis
==========================================================================================
End of report
==========================================================================================
Access to blocks in the buffer cache is quick! There was no need to clean out any blocks anymore. The fastest way to do something is indeed not having to do it at all. Mind you, the block on disk still has the active transaction recorded!
SQL> alter system dump datafile 5 block 3940811;
System altered.
Elapsed: 00:00:00.01
SQL> select value from v$diag_info where name like 'Default%';
VALUE
------------------------------------------------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_100297.trc
SQL> exit
$ vi /u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_100297.trc
Block header dump: 0x017c21cb
Object id on Block? Y
seg/obj: 0x12a13 csc: 0x00.2317620 itc: 3 flg: E typ: 1 - DATA
brn: 0 bdba: 0x17c21c8 ver: 0x01 opc: 0
inc: 0 exflg: 0
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0xffff.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.02191d10
0x02 0x0009.005.00002575 0x0003688a.0ef1.1f --U- 6 fsc 0x0000.0232cfca
0x03 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
bdba: 0x017c21cb
data_block_dump,data header at 0x7fc9f3cb407c
...
$ grep lb -B1 /u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_100297.trc
tab 0, row 0, @0x1b75
tl: 1035 fb: --H-FL-- lb: 0x2 cc: 6
--
tab 0, row 1, @0x176a
tl: 1035 fb: --H-FL-- lb: 0x2 cc: 6
--
tab 0, row 2, @0x135f
tl: 1035 fb: --H-FL-- lb: 0x2 cc: 6
--
tab 0, row 3, @0xf54
tl: 1035 fb: --H-FL-- lb: 0x2 cc: 6
--
tab 0, row 4, @0xb49
tl: 1035 fb: --H-FL-- lb: 0x2 cc: 6
--
tab 0, row 5, @0x73e
tl: 1035 fb: --H-FL-- lb: 0x2 cc: 6
--
The transaction recorded in the ITL slot 2 is gone though:
SQL> select xidusn, xidslot, xidsqn, status, start_scn from v$transaction 2 where xidusn = 9 and xidslot = 5 and xidsqn = 9589; no rows selected
Did you notice the Flag in the second ITL entry? That wasn’t there before. It indicates a fast commit according to Core Oracle by Jonathan Lewis.
Summary
Block cleanouts are simple with buffered I/O. Dirty blocks can be rolled back and kept in the buffer cache and can be accessed without having to undergo block cleanout.
Response
[…] « Understanding enhancements to block cleanouts in Exadata part 1 […]