Matching the start-with value of an identity column with table data

Identity columns are a fantastic feature. They were introduced in Oracle Database 12c to comply with the SQL Standard, and they solved the (then) age-old problem of auto-incrementing the value of a surrogate key. Here is a link to the SQL language reference for Oracle Database 12c, the release at which they initially appeared. In my experience, identity columns are easier to use provided they are defined with the initial table creation than sequences. One reason is shown in this article.

Background

You may encounter situations where you have to reset an identity column’s next value. There aren’t really many valid use cases for this feature, but if you’re facing one, you’d appreciate the ability to update the next value.

Suppose you regularly truncate a table as part of your warehouse’s ELT (extraction, load, transform) process. That won’t affect its identity column’s next value; however, that’s a requirement. How can this be done? Consider the following example:

--liquibase formatted sql
--changeset martin:identity-demo failOnError:true labels:demo0001

CREATE TABLE staging_t (
    id NUMBER
        GENERATED BY DEFAULT ON NULL AS IDENTITY ( CACHE 100 ),
    c1 VARCHAR2(100),
    c2 DATE
);

--rollback drop table staging_t purge;

After the table has been created, let’s simulate the first load operation:

insert into staging_t (
  c1,
  c2
)
with lots_of_ints as (
select
  rownum rn
from
  dual
connect by
  rownum  <= 10)
select
  rn,
  sysdate - rn / 1440
from
  lots_of_ints;

This results in the first increment of the ID column as you can see here:

SQL> select
  2    *
  3  from
  4*   staging_t;

   ID C1    C2          
_____ _____ ___________ 
    1 1     17.06.24    
    2 2     17.06.24    
    3 3     17.06.24    
    4 4     17.06.24    
    5 5     17.06.24    
    6 6     17.06.24    
    7 7     17.06.24    
    8 8     17.06.24    
    9 9     17.06.24    
   10 10    17.06.24    

10 rows selected.

As soon as the first batch of incoming data has been successfully processed, the second ELT process commences. The first step is to truncate the table, then load more data.

truncate table staging_t;

insert into staging_t (
  c1,
  c2
)
with lots_of_ints as (
select
  rownum rn
from
  dual
connect by
  rownum  <= 10)
select
  rn,
  sysdate - rn / 1440
from
  lots_of_ints;

However, as you can see, the ID values haven’t been reset as they should have:

SQL> truncate table staging_t;

Table STAGING_T truncated.

SQL> 
SQL> insert into staging_t (
  2    c1,
  3    c2
  4  )
  5  with lots_of_ints as (
  6  select
  7    rownum rn
  8  from
  9    dual
 10  connect by
 11    rownum  <= 10)
 12  select
 13    rn,
 14    sysdate - rn / 1440
 15  from
 16*   lots_of_ints;

10 rows inserted.

SQL> select
  2    *
  3  from
  4*   staging_t;

   ID C1    C2          
_____ _____ ___________ 
   11 1     17.06.24    
   12 2     17.06.24    
   13 3     17.06.24    
   14 4     17.06.24    
   15 5     17.06.24    
   16 6     17.06.24    
   17 7     17.06.24    
   18 8     17.06.24    
   19 9     17.06.24    
   20 10    17.06.24    

10 rows selected. 

To reset the identity column, you need to add another step in your process:

alter table staging_t modify
  id generated by default on null
  as identity (start with limit value);

Putting it all together, my fictional ELT process looks as follows:

-- truncate the table
truncate table staging_t;

-- sync identity column's value with table data
alter table staging_t modify
  id generated by default on null
  as identity (start with limit value);

-- load
insert into staging_t (
  c1,
  c2
)
with lots_of_ints as (
select
  rownum rn
from
  dual
connect by
  rownum  <= 10)
select
  rn,
  sysdate - rn / 1440
from
  lots_of_ints;

-- verify result
select
  *
from
  staging_t;

As you can see, this time the IDs have been reset:

SQL> @load
SQL> truncate table staging_t;

Table STAGING_T truncated.

SQL> alter table staging_t modify
  2    id generated by default on null
  3    as identity (start with limit value);

Table STAGING_T altered.

SQL> insert into staging_t (
  2    c1,
  3    c2
  4  )
  5  with lots_of_ints as (
  6  select
  7    rownum rn
  8  from
  9    dual
 10  connect by
 11    rownum  <= 10)
 12  select
 13    rn,
 14    sysdate - rn / 1440
 15  from
 16    lots_of_ints;

10 rows inserted.

SQL> select
  2    *
  3  from
  4    staging_t;

   ID C1    C2          
_____ _____ ___________ 
    1 1     17.06.24    
    2 2     17.06.24    
    3 3     17.06.24    
    4 4     17.06.24    
    5 5     17.06.24    
    6 6     17.06.24    
    7 7     17.06.24    
    8 8     17.06.24    
    9 9     17.06.24    
   10 10    17.06.24    

10 rows selected. 

Just what I needed!

Blog at WordPress.com.