Using faker-js/Faker to generate test data respecting referential integrity in Oracle Database 23c

Generating a random set of test data is easy – until it isn’t anymore because your foreign keys are violated, leading to errors when inserting rows into a table. This post describes how to use @faker-js/Faker to generate test data adhering to the data model, respecting primary and foreign keys. It applies to Oracle Database 23c on Linux x86-64.

Human Resources sample schema

The Oracle Database ships with a set of sample schemas. This comes in quite handy for this post as it makes for a realistic data model. Rather than re-inventing the wheel this article is based on tables in the human resource, aka HR schema.

For the sake of argument, let’s assume you are assigned a task to provide sample data for the HR.EMPLOYEES table. The EMPLOYEES table is defined as follows:

SQL> info+ employees
TABLE: EMPLOYEES 
	 LAST ANALYZED: 
	 ROWS         : 
	 SAMPLE SIZE  : 
	 INMEMORY     :DISABLED 
	 COMMENTS     :employees table. References with departments,
                       jobs, job_history tables. Contains a self reference. 

Columns 
NAME             DATA TYPE           NULL  DEFAULT    LOW_VALUE   HIGH_VALUE   NUM_DISTINCT   HISTOGRAM  
*EMPLOYEE_ID     NUMBER(6,0)         No                                                                      
 FIRST_NAME      VARCHAR2(20 BYTE)   Yes                                                                     
 LAST_NAME       VARCHAR2(25 BYTE)   No                                                                      
 EMAIL           VARCHAR2(25 BYTE)   No                                                                      
 PHONE_NUMBER    VARCHAR2(20 BYTE)   Yes                                                                     
 HIRE_DATE       DATE                No                   .....       .....                                  
 JOB_ID          VARCHAR2(10 BYTE)   No                                                                      
 SALARY          NUMBER(8,2)         Yes                                                                     
 COMMISSION_PCT  NUMBER(2,2)         Yes                                                                     
 MANAGER_ID      NUMBER(6,0)         Yes                                                                     
 DEPARTMENT_ID   NUMBER(4,0)         Yes                                                                     

Indexes
INDEX_NAME              UNIQUENESS    STATUS    FUNCIDX_STATUS    COLUMNS                  
_______________________ _____________ _________ _________________ ________________________ 
HR.EMP_JOB_IX           NONUNIQUE     VALID                       JOB_ID                   
HR.EMP_NAME_IX          NONUNIQUE     VALID                       LAST_NAME, FIRST_NAME    
HR.EMP_EMAIL_UK         UNIQUE        VALID                       EMAIL                    
HR.EMP_EMP_ID_PK        UNIQUE        VALID                       EMPLOYEE_ID              
HR.EMP_MANAGER_IX       NONUNIQUE     VALID                       MANAGER_ID               
HR.EMP_DEPARTMENT_IX    NONUNIQUE     VALID                       DEPARTMENT_ID            


References
TABLE_NAME     CONSTRAINT_NAME    DELETE_RULE    STATUS     DEFERRABLE        VALIDATED    GENERATED    
______________ __________________ ______________ __________ _________________ ____________ ____________ 
DEPARTMENTS    DEPT_MGR_FK        NO ACTION      ENABLED    NOT DEFERRABLE    VALIDATED    USER NAME    
EMPLOYEES      EMP_MANAGER_FK     NO ACTION      ENABLED    NOT DEFERRABLE    VALIDATED    USER NAME    
JOB_HISTORY    JHIST_EMP_FK       NO ACTION      ENABLED    NOT DEFERRABLE    VALIDATED    USER NAME

(info+ is a command available in sqlcl)

The EMPLOYEES table is a great candidate for a blog post: plenty of constraints and foreign keys to other tables add a certain degree of complexity.

Say hello to FakerJS

FakerJS is a very popular data generator; it has been introduced in the previous article. It’s again at the centre of attention and will be used to create the test data.

Please refer to the project’s GitHub project site for more details about its license and use implications. The article assumes your legal and IT Security departments (as well as any other party) agreed that using the module in your code is safe and compliant with your license. Using 3rd party code in your application typically requires specific compliance steps to be completed which are out of the scope of this article.

The article assumes that you deployed @faker-js/faker 8.3.1 as a Multilingual Engine (MLE) module in Oracle Database 23c. If you are unsure how to do this, please refer to the previous article for details. The post further assumes that you created the corresponding MLE environment FAKER_ENV.

Test data generation

The following JavaScript module extends @faker-js/faker with a few custom routines to generate the requested employee record. A few of the functions shown below have previously been introduced, they are kept to allow copy/pasting the code without creating problems at runtime.

The employee record generated by randomEmployee() is returned as JSON. Consumers of randomEmployee() can store the data as is in a column or collection. Alternatively it’s possible to use json_table() to transform the JSON into a relational model. You’ll see an example of the latter a bit later in this post.

create or replace mle module mle_faker language javascript as
import { simpleFaker, faker } from "fakerjs";

/**
 * Generate a random integer in the interval [minNumber, maxNumber]
 * @param {number} minNumber the lower bound of the interval
 * @param {number} maxNumber the upper bound of the interval
 * @returns number
 */
export function randomInt(minNumber, maxNumber) {

    const options = {
        max: maxNumber,
        min: minNumber
    }

    return simpleFaker.number.int(options);
}

/**
 * Generate a random, alpha-numeric string of a certain case (defaults
 * to mixed case), and a specific length
 * @param {string} casing the string's case, one of upper, lower, mixed
 * @param {*} minLength the string's minimum length
 * @param {*} maxLength the string's maximum length
 * @returns string
 */
export function randomString(casing, minLength, maxLength) {

    const options = {
        length: {
            max: maxLength,
            min: minLength
        }
    };

    switch (casing) {
        case 'upper':
            options.casing = 'upper';
            break;
        case 'lower':
            options.casing = 'lower';
            break;
        case 'mixed':
            options.casing = 'mixed';
            break;
        default:
            options.casing = 'mixed';
            break;
    }

    return simpleFaker.string.alpha(options);
}

/**
 * Generate a random date between in the interval [startDate, stopDate]
 * @param {Date} startDate the earliest possible date
 * @param {Date} stopDate the latest possible date
 * @returns Date
 */
export function randomDate(startDate, stopDate) {

    const options = {
        from: startDate,
        to: stopDate
    };

    return simpleFaker.date.between(options);
}

/**
 * Create a new employee record for insertion into HR.EMPLOYEES. The main
 * goal is to ensure referential integrity is preserved. Helper functions
 * select the relevant details from the schema. Consumers of this function
 * need to either store the record in JSON format, or use json_table() to
 * convert JSON into the relational format.
 * @param {string} gender the employee's gender
 * @returns {JSON} a JSON representation of the new employee
 */
export function randomEmployee(gender) {

    let employee = {};

    const extension = randomInt(1, 9999);

    // employee_id doesn't need to be created here
    employee.firstName = faker.person.firstName(gender);
    employee.lastName = faker.person.lastName();
    employee.email = employee.firstName.substr(0,1) + employee.lastName.toUpperCase();
    employee.phoneNumber = '1.650.555.' + faker.string.numeric(4);
    employee.hireDate = randomDate('2020-01-01T00:00:00.000Z', '2023-12-01T00:00:00.000Z');
    employee.jobId = randomJob();
    employee.salary = faker.number.int(randomSalary(employee.jobId));
    employee.commissionPct = null;
    employee.departmentId = randomDepartment();
    employee.managerId = randomManager(employee.departmentId);

    return employee;
}

/**
 * Get a random job from the HR.JOBS table
 * @returns a random JOB_ID
 */
function randomJob() {

    const result = session.execute(
        `SELECT
            job_id
        FROM
            jobs
        ORDER BY
            dbms_random.random
        FETCH FIRST 1 ROWS ONLY`
    );

    if (result.rows === undefined) {
        throw new Error(`database query went wrong: ${err}`);
    } else {
        return result.rows[0].JOB_ID;
    }
}

/**
 * Get a random department ID from HR.DEPARTMENTS. Makes sure the
 * department has a manager. The department's manager will later
 * be used as the new employee's manager as well.
 * @returns a random DEPARTMENT_ID
 */
function randomDepartment() {

    const result = session.execute(
        `SELECT
            department_id
        FROM
            departments
        WHERE
            manager_id is not null
        ORDER BY
            dbms_random.random
        FETCH FIRST 1 ROWS ONLY`
    );

    if (result.rows === undefined) {
        throw new Error(`database query went wrong: ${err}`);
    } else {
        return result.rows[0].DEPARTMENT_ID;
    }
}

/**
 * Get a specific department's manager from HR.DEPARTMENTS
 * @param {number} departmentId the department ID for which a manager must be found
 * @returns the corresponding MANAGER_ID
 */
function randomManager(departmentId) {

    const result = session.execute(
        `SELECT
            manager_id
        FROM
            departments
        WHERE
            department_id = :1`,
        [ departmentId ]
    );

    if (result.rows === undefined) {
        throw new Error(`database query went wrong: ${err}`);
    } else {
        return result.rows[0].MANAGER_ID;
    }
}

/**
 * Get a random salary in the range defined by HR.JOBS, depending
 * on the input parameter.
 * @param {string} jobId matching HR.JOBS.JOB_ID
 * @returns {JSON} a JSON object to be used in faker.number.int to calculate the appropriate salary
 */
function randomSalary(jobId) {

    const result = session.execute(
        `SELECT
            min_salary as min,
            max_salary as max
        FROM
            jobs
        WHERE
            job_id = :1`,
        [ jobId ]
    );

    if (result.rows === undefined) {
        throw new Error(`database query went wrong: ${err}`);
    } else {
        return {
            min: result.rows[0].MIN,
            max: result.rows[0].MAX
        }
    }
}

Call specification

A call specification makes the JavaScript code available in SQL and PL/SQL. It can be created as follows:

create or replace package mle_faker_api as

    function random_number(
        p_min_number number,
        p_max_number number
    ) return number
        as mle module mle_faker
        env faker_env
        signature 'randomInt';
    
    function random_string(
        p_casing varchar2,
        p_min_length number,
        p_max_length number
    ) return varchar2
        as mle module mle_faker
        env faker_env
        signature 'randomString';
    
    function random_date(
        p_start_date date,
        p_stop_date date
    ) return date
        as mle module mle_faker
        env faker_env
        signature 'randomDate';
    
    function random_employee(
        p_gender varchar2
    ) return JSON
        as mle module mle_faker
        env faker_env
        signature 'randomEmployee';

end mle_faker_api;
/

Testing

With the JavaScript code in place and a call specification available, it’s time for some testing.

Creating a single record

Let’s see what happens when you create a single employee:

SQL> select
  2     mle_faker_api.random_employee('female') employee;

The query generates the following output:

EMPLOYEE
---------------------------------------------------
{
  "firstName" : "Beatrice",
  "lastName" : "Lynch",
  "email" : "BLYNCH",
  "phoneNumber" : "1.650.555.2607",
  "hireDate" : "2023-01-11T05:42:26.380000+01:00",
  "jobId" : "IT_PROG",
  "salary" : 6458,
  "commissionPct" : null,
  "departmentId" : 90,
  "managerId" : 100
}  

A few things are noteworthy about this JSON:

  • Unlike truly random data generators the EMAIL column matches the rows in the table. The email address consists of the first letter of the first name + the last name in all upper-case
  • The phone number is in the same range as existing numbers
  • The salary of 6458 units is in the range defined in HR.JOBS. Programmers earn between 4000 and 10000 monetary units
  • Department 90’s manager acts as this employee’s manager as well

This looks good, all the values make sense in the context of the HR sample schema.

More data, please

What if you need to create more than a single employee record? No problem! The following example demonstrates how to create 10 new employee records, transformed to the relational format using json_table():

SQL> with lots_of_employees as (
  2      select
  3          case when mod(rownum, 2) = 0 then
  4              mle_faker_api.random_employee('female')
  5          else
  6              mle_faker_api.random_employee('male')
  7          end employee
  8      from
  9          dual
 10      connect by level
 11          <= 10
 12  )
 13  select
 14      jt.*
 15  from
 16      lots_of_employees e,
 17      json_table(
 18          e.employee,
 19          '$'
 20          columns (
 21              first_name      varchar2    path '$.firstName',
 22              last_name       varchar2    path '$.lastName',
 23              email           varchar2    path '$.email',
 24              phone_number    varchar2    path '$.phoneNumber',
 25              hire_date       date        path '$.hireDate',
 26              job_id          varchar2    path '$.jobId',
 27              salary          number      path '$.salary',
 28              commission_pct  number      path '$.commissionPct',
 29              manager_id      number      path '$.managerId',
 30              department_id   number      path '$.departmentId'
 31          )
 32*     ) jt;

FIRST_NAME    LAST_NAME          EMAIL               PHONE_NUMBER      HIRE_DATE    JOB_ID        SALARY    COMMISSION_PCT    MANAGER_ID    DEPARTMENT_ID 
_____________ __________________ ___________________ _________________ ____________ __________ _________ _________________ _____________ ________________ 
Marc          Mills              MMILLS              1.650.555.4471    18-JUN-23    PU_MAN         14241                             100               90 
Marcia        Welch              MWELCH              1.650.555.6313    18-MAR-21    PU_MAN         13593                             100               90 
Mack          Quitzon            MQUITZON            1.650.555.7048    20-DEC-22    PR_REP          5533                             200               10 
Leticia       Murphy             LMURPHY             1.650.555.6040    29-AUG-22    AD_ASST         5066                             103               60 
Laurence      Schuppe            LSCHUPPE            1.650.555.8321    02-SEP-20    SA_MAN         19420                             103               60 
Phyllis       Grant              PGRANT              1.650.555.2625    14-MAY-22    AC_MGR          9475                             100               90 
Lowell        Kilback            LKILBACK            1.650.555.2167    20-DEC-21    PR_REP          4751                             121               50 
Mindy         Upton              MUPTON              1.650.555.4995    02-APR-23    ST_MAN          8356                             200               10 
Eddie         Mitchell           EMITCHELL           1.650.555.6041    26-MAR-23    SA_REP         11012                             100               90 
Gertrude      O'Connell-Hoppe    GO'CONNELL-HOPPE    1.650.555.8125    23-JAN-21    AC_MGR          9139                             100               90

It’s very easy to add new records to the HR.EMPLOYEES table using this approach. Thankfully we don’t need to worry about the EMPLOYEE_ID, a sequence (EMPLOYEES_SEQ) helps populating this column:

SQL> insert into employees
  2  with lots_of_employees as (
  3      select
  4          case when mod(rownum, 2) = 0 then
  5              mle_faker_api.random_employee('female')
  6          else
  7              mle_faker_api.random_employee('male')
  8          end employee
  9      from
 10          dual
 11      connect by level
 12          <= 10
 13  )
 14  select
 15      employees_seq.nextval,
 16      jt.*
 17  from
 18      lots_of_employees e,
 19      json_table(
 20          e.employee,
 21          '$'
 22          columns (
 23              first_name      varchar2    path '$.firstName',
 24              last_name       varchar2    path '$.lastName',
 25              email           varchar2    path '$.email',
 26              phone_number    varchar2    path '$.phoneNumber',
 27              hire_date       date        path '$.hireDate',
 28              job_id          varchar2    path '$.jobId',
 29              salary          number      path '$.salary',
 30              commission_pct  number      path '$.commissionPct',
 31              manager_id      number      path '$.managerId',
 32              department_id   number      path '$.departmentId'
 33          )
 34*     ) jt;

10 rows inserted.

Summary

Creating random dummy data isn’t too hard, the challenge is somewhat higher in cases where data should make sense and respect primary and foreign key definitions. The previous article hopefully demonstrated that it’s not too hard to fulfill these requirements.

Blog at WordPress.com.