Encrypting, decrypting, and hashing data

Once you have set up authentication with the SecureData Appliance (SDA), call the VoltageSecureProtect and VoltageSecureAccess functions to encrypt/hash or decrypt your data, respectively.

Once you have set up authentication with the SecureData Appliance (SDA), call the VoltageSecureProtect and VoltageSecureAccess functions to encrypt/hash or decrypt your data, respectively. At a minimum, you pass these functions the value to be encrypted, hashed, or decrypted and value's FPE format as defined in the Voltage SDA. You can also capture these parameters in SQL Macros and access policies.

Connecting Vertica to the SecureData appliance

The following example demonstrates how to fetch the SDA client policy with Vertica and then how to set the session parameters for authentication to the SDA.

  1. Add the SecureData CA certificate to Vertica.

  2. Set the client policy with VoltageSecureConfigureGlobal.

    => SELECT VoltageSecureConfigureGlobal(USING PARAMETERS
        policy_url='https://voltage-pp-0000.example.com/policy/clientPolicy.xml')
        OVER ();
                               policy_url                            | allow_short_fpe | enable_file_cache
    -----------------------------------------------------------------+-----------------+-------------------
     https://voltage-pp-0000.example.com/policy/clientPolicy.xml     |                 |
    (1 row)
    
  3. (Optional) Retrieve the latest version of the client policy with VoltageSecureRefreshPolicy.

    => SELECT VoltageSecureRefreshPolicy() OVER ();
                                      PolicyRefresh
    -------------------------------------------------------------------------------------
    Successfully refreshed policy on node [v_node0001]. Policy on other nodes
    will be refreshed the next time a Voltage operation is run on them.
    (1 row)
    
  4. Set the authentication parameters to authenticate to the SDA as a Vertica user with ALTER SESSION or VoltageSecureConfigure. For details on this process, see Configuring access to SecureData.

    ALTER SESSION sets these parameters for the duration of your session and override parameters set in configuration files.

    
    => ALTER SESSION SET UDPARAMETER FOR voltagesecurelib identity='alice@example.com';
    ALTER SESSION
    => ALTER SESSION SET UDPARAMETER FOR voltagesecurelib username='alice';
    ALTER SESSION
    => ALTER SESSION SET UDPARAMETER FOR voltagesecurelib shared_secret='my_secret';
    ALTER SESSION
    

    VoltageSecureConfigure saves these parameters to the specified config_dfs_path. This configuration file can then be passed to other Voltage functions.

    => SELECT VoltageSecureConfigure(USING PARAMETERS config_dfs_path='voltage.conf',
       username='alice', identity='alice@example.com', store_password=false) OVER ();
    
     config_dfs_path | identity          | username
    -----------------+-------------------+----------
     voltage.conf    | alice@example.com | alice
    
    => SELECT VoltageSecureProtect('123-45-6789' USING PARAMETERS
        format='ssn',
        config_dfs_path='voltage.conf');
    
     VoltageSecureProtect
    ----------------------
     376-69-6789
    

Encrypting and decrypting data

VoltageSecureProtect takes a format-preserving encryption (FPE) format defined in your SDA and VARCHAR data and returns format-preserved ciphertext, which can then be decrypted with VoltageSecureAccess. That is, if the plaintext follows the form of a phone number (###-###-####), the resulting ciphertext will be in the same form. Formats can be customized in various ways, like masking certain values, leaving some part of the string unencrypted, etc.

One of the many predefined formats in the SDA is ssn. The ssn format encrypts all but the last four digits of the social security number, which can be useful for certain roles like customer support, which may need the last four digits of the SSN to authenticate individuals over the phone.


=> SELECT VoltageSecureProtect('123-45-6789' USING PARAMETERS format='ssn');
 VoltageSecureProtect
----------------------
 376-69-6789
(1 row)

=> SELECT VoltageSecureAccess('376-69-6789' USING PARAMETERS format='ssn');
 VoltageSecureAccess
---------------------
 123-45-6789
(1 row)

Another predefined format is auto, which can handle a wide variety of data, including SSNs. In contrast to ssn, auto is configured in the SDA to encrypt all characters in the plaintext.


=> SELECT VoltageSecureProtect('123-45-6789' USING PARAMETERS format='auto');
 VoltageSecureProtect
----------------------
 820-31-5110
 (1 row)

Note that you can only decrypt ciphertext with the same format that generated it. In this case, ciphertext created with ssn must be decrypted with ssn. Passing the format auto with a ciphertext generated with ssn, for instance, would simply return an incorrect plaintext.

Encrypting data during load

When handling sensitive data, you often want to encrypt it as you load it so the unencrypted values are never stored in your database.

The following example demonstrates loading data using COPY. Suppose you have a data filled with customer information with the following fields: id number, first name, last name, social security number, card verification number, and date of birth:

5345,Thane,Ross,559-32-0670,376765616314013,618,05-09-1996
5346,Talon,Wilkins,540-48-0784,4716511603424923,111,09-17-1941
5347,Daquan,Phelps,785-34-0092,342226134491834,294,05-08-1963
5348,Basia,Lopez,011-85-0705,4595818418314603,503,04-29-1940
5349,Kaseem,Hendrix,672-57-0309,4556 078 73 7944,693,03-11-1942
5350,Omar,Lott,825-45-0131,6462 0541 0799 6261,555,02-17-1956
5351,Nell,Cooke,637-50-0105,646 59756 30903 530,818,02-14-1995
5352,Illana,Middleton,831-47-0929,648 23640 86684 267,883,12-29-1949
5353,Garrett,Williamson,408-73-0207,5334 2702 1360 8370,869,11-06-1955
5354,Hanna,Ware,694-97-0394,543 38494 19219 254,586,08-08-1967

To encrypt the social security number and the credit card number columns, call VoltageSecureProtect to encrypt these columns in the COPY statement you use to load the data:

=> CREATE TABLE customers (id INTEGER, first_name VARCHAR, last_name VARCHAR,
                           ssn VARCHAR(11), cc_num VARCHAR(25), cvv VARCHAR(5), dob DATE);
CREATE TABLE
=> COPY customers (id, first_name, last_name, ssn_raw FILLER VARCHAR(11),
        cc_num_raw FILLER VARCHAR(25), cvv, dob,
        ssn AS VoltageSecureProtect(ssn_raw USING PARAMETERS format='ssn',
                                    config_dfs_path='voltage.conf'),
     cc_num AS VoltageSecureProtect(cc_num_raw USING PARAMETERS format='cc',
                                       config_dfs_path='voltage.conf'))
     FROM '/home/dbadmin/customer_data.csv' DELIMITER ',';
 Rows Loaded
-------------
         100
(1 row)

=> SELECT * FROM customers ORDER BY id ASC LIMIT 10;
  id  | first_name | last_name  |     ssn     |       cc_num        | cvv |    dob
------+------------+------------+-------------+---------------------+-----+------------
 5345 | Thane      | Ross       | 072-52-0670 | 405939553794013     | 618 | 1996-05-09
 5346 | Talon      | Wilkins    | 348-30-0784 | 5350908688294923    | 111 | 1941-09-17
 5347 | Daquan     | Phelps     | 983-53-0092 | 133383311411834     | 294 | 1963-05-08
 5348 | Basia      | Lopez      | 490-63-0705 | 7979155436134603    | 503 | 1940-04-29
 5349 | Kaseem     | Hendrix    | 268-74-0309 | 3212 314 45 7944    | 693 | 1942-03-11
 5350 | Omar       | Lott       | 872-03-0131 | 4914 1839 6801 6261 | 555 | 1956-02-17
 5351 | Nell       | Cooke      | 785-90-0105 | 332 34312 95233 530 | 818 | 1995-02-14
 5352 | Illana     | Middleton  | 947-60-0929 | 219 06376 36044 267 | 883 | 1949-12-29
 5353 | Garrett    | Williamson | 333-23-0207 | 1126 1022 5922 8370 | 869 | 1955-11-06
 5354 | Hanna      | Ware       | 661-57-0394 | 106 09915 59049 254 | 586 | 1967-08-08
(10 rows)

Encrypting Non-VARCHAR data

VoltageSecureProtect function only encrypts VARCHAR values, so if you need to encrypt other data types, such as DATE or INTEGER, you must cast these values to VARCHAR in your function call and then cast them back to DATE or INTEGER when storing them.

Building off the previous example, the following encrypts the date of birth (dob) column, which holds date type DATE.

=> CREATE TABLE customers2 (id INTEGER, first_name VARCHAR, last_name VARCHAR, ssn VARCHAR(11),
                            cc_num VARCHAR(25), cvv VARCHAR(5), dob DATE);
CREATE TABLE
=> COPY customers2 (id, first_name, last_name, ssn, cc_num, cvv, dob_raw FILLER DATE,
                    dob AS VoltageSecureProtect(dob_raw::VARCHAR USING PARAMETERS
                                                format='birthday',
                                                config_dfs_path='voltage.conf')::DATE)
        FROM '/home/dbadmin/customer_data.csv' DELIMITER ',';
 Rows Loaded
-------------
         100
(1 row)

=> SELECT * FROM customers2 ORDER BY id ASC LIMIT 10;
  id  | first_name | last_name  |     ssn     |       cc_num        | cvv |    dob
------+------------+------------+-------------+---------------------+-----+------------
 5345 | Thane      | Ross       | 559-32-0670 | 376765616314013     | 618 | 1902-03-09
 5346 | Talon      | Wilkins    | 540-48-0784 | 4716511603424923    | 111 | 2023-07-22
 5347 | Daquan     | Phelps     | 785-34-0092 | 342226134491834     | 294 | 2091-01-18
 5348 | Basia      | Lopez      | 011-85-0705 | 4595818418314603    | 503 | 1921-08-17
 5349 | Kaseem     | Hendrix    | 672-57-0309 | 4556 078 73 7944    | 693 | 1962-08-23
 5350 | Omar       | Lott       | 825-45-0131 | 6462 0541 0799 6261 | 555 | 1930-01-12
 5351 | Nell       | Cooke      | 637-50-0105 | 646 59756 30903 530 | 818 | 2098-01-01
 5352 | Illana     | Middleton  | 831-47-0929 | 648 23640 86684 267 | 883 | 1956-09-07
 5353 | Garrett    | Williamson | 408-73-0207 | 5334 2702 1360 8370 | 869 | 2079-03-25
 5354 | Hanna      | Ware       | 694-97-0394 | 543 38494 19219 254 | 586 | 1903-07-16
(10 rows)

Note that you cannot use the auto format for DATE types; ciphertext generated by auto uses the full range of numbers and letters and only preserves the number of characters and any separators, so the resulting ciphertext cannot typically be cast to a date.

=> SELECT VoltageSecureProtect('07-16-1969' USING PARAMETERS format='auto',
                               config_dfs_path='/voltagesecure/conf');
 VoltageSecureProtect
----------------------
 45-86-8651
(1 row)

=> SELECT VoltageSecureProtect('07-16-1969' USING PARAMETERS format='auto',
                               config_dfs_path='/voltagesecure/conf')::DATE;
ERROR 2992:  Date/time field value out of range: "45-86-8651"
HINT:  Perhaps you need a different "datestyle" setting

Decrypting values in queries

To decrypt ciphertext stored in tables, call VoltageSecureAccess on the encrypted column in your query.

The following example queries a table and decrypts the SSN column:


=> SELECT id,
          first_name,
          last_name,
          VoltageSecureAccess(ssn USING PARAMETERS format='ssn',
                              config_dfs_path='/voltagesecure/conf') AS ssn_plaintext,
          dob
      FROM customers
      WHERE dob < '1970-1-1'
      ORDER BY id ASC
      LIMIT 10;

  id  | first_name | last_name  |     ssn_plaintext     |    dob
------+------------+------------+-----------------------+------------
 5346 | Talon      | Wilkins    | 540-48-0784           | 1941-09-17
 5347 | Daquan     | Phelps     | 785-34-0092           | 1963-05-08
 5348 | Basia      | Lopez      | 011-85-0705           | 1940-04-29
 5349 | Kaseem     | Hendrix    | 672-57-0309           | 1942-03-11
 5350 | Omar       | Lott       | 825-45-0131           | 1956-02-17
 5352 | Illana     | Middleton  | 831-47-0929           | 1949-12-29
 5353 | Garrett    | Williamson | 408-73-0207           | 1955-11-06
 5354 | Hanna      | Ware       | 694-97-0394           | 1967-08-08
 5355 | Quinn      | Pruitt     | 818-91-0359           | 1965-11-14
 5356 | Clayton    | Santiago   | 102-56-0010           | 1958-02-02
(10 rows)

Decrypting Non-VARCHAR columns

Like VoltageSecureProtect, VoltageSecureAccess can only handle VARCHAR values. When you decrypt ciphertext from a non-VARCHAR column, you must cast it to a VARCHAR before passing it to VoltageSecureAccess. If your query or client application depends on the original data type of the column, you must then cast the returned plaintext back to the column's original data type.

The following example decrypts the dob column by casting the column data from DATE to VARCHAR, passing it to VoltageSecureAccess, and then casting the plaintext back to DATE.


=> SELECT id, first_name, last_name,
          VoltageSecureAccess(dob::VARCHAR USING PARAMETERS format='birthday',
                              config_dfs_path='/voltagesecure/conf')::DATE AS dob
          FROM customers2 ORDER BY id LIMIT 10;
  id  | first_name | last_name  |    dob
------+------------+------------+------------
 5345 | Thane      | Ross       | 1996-05-09
 5346 | Talon      | Wilkins    | 1941-09-17
 5347 | Daquan     | Phelps     | 1963-05-08
 5348 | Basia      | Lopez      | 1940-04-29
 5349 | Kaseem     | Hendrix    | 1942-03-11
 5350 | Omar       | Lott       | 1956-02-17
 5351 | Nell       | Cooke      | 1995-02-14
 5352 | Illana     | Middleton  | 1949-12-29
 5353 | Garrett    | Williamson | 1955-11-06
 5354 | Hanna      | Ware       | 1967-08-08
(10 rows)

Masking decrypted values

If the ciphertext was created with a format that specifies a form of "masking," plaintexts returned by VoltageSecureAccess can be masked by using the mask=true parameter. If you don't specify this parameter, the full plaintext will be returned regardless of the format.

This format, as defined in the Voltage SDA, obscures all but the last two characters of the decrypted plaintext.


=> SELECT VoltageSecureAccess('g3kbx6ru19', USING PARAMETERS
                                  format='maskedFormat',
                                  config_dfs_path='voltage.conf');
 VoltageSecureAccess
----------------------
 1234567890

=> SELECT VoltageSecureAccess('g3kbx6ru19', USING PARAMETERS
                                  format='maskedFormat',
                                  config_dfs_path='voltage.conf',
                                  mask=true);
 VoltageSecureAccess
----------------------
 XXXXXXXX90

Tweaking ciphertext

Analogous to a salt, you can pass an additional "tweak" value to add to the key used by VoltageSecureProtect to create unique ciphertexts from the same plaintext. This "tweaked" ciphertext can then only be decrypted by passing the same format and "tweak" value to VoltageSecureAccess. Support for this feature is set in the SDA.

The following example uses the custom format ssn-tweak and the value tweakvalue123 to tweak the ciphertext.


=> SELECT VoltageSecureProtect('681-09-2913', 'tweakvalue123' USING PARAMETERS
    format='ssn-tweak');

 VoltageSecureProtect
----------------------
 721-21-2913

=> SELECT VoltageSecureAccess('721-21-2913', 'tweakvalue123' USING PARAMETERS
    format='ssn-tweak');

 VoltageSecureProtect
----------------------
 681-09-2913

Tweaking an entire column

A common use case is to use the values from one column to tweak the ciphertext in another.

The following example loads the data in a similar manner to the previous section, but uses the values from the last_name column to tweak the dates of birth of each customer.


=> CREATE TABLE customers (id INTEGER, first_name VARCHAR, last_name VARCHAR, ssn VARCHAR(11),
    cc_num VARCHAR(25), cvv VARCHAR(5), dob DATE);
CREATE TABLE

=> COPY customers (id, first_name, last_name, ssn_raw FILLER VARCHAR(11),
           cc_num_raw FILLER VARCHAR(25), cvv,
           dob_raw FILLER DATE, ssn AS VoltageSecureProtect(ssn_raw USING PARAMETERS format='ssn',
                                                            config_dfs_path='voltage.conf'),
           /* cast the dob_raw to VARCHAR and tweak with last_name, then cast back to DATE */
           dob AS VoltageSecureProtect(dob_raw::VARCHAR , last_name USING PARAMETERS format='date-tweak',
                                       config_dfs_path='voltage.conf')::DATE,
           cc_num AS VoltageSecureProtect(cc_num_raw USING PARAMETERS format='cc', config_dfs_path='voltage.conf'))
           FROM 'customers.csv' DELIMITER ',';

 Rows Loaded
-------------
         100
(1 row)

=> SELECT id, first_name, last_name, dob FROM customers ORDER BY id;
  id  | first_name | last_name  |    dob
------+------------+------------+------------
 5345 | Thane      | Ross       | 2086-03-07
 5346 | Talon      | Wilkins    | 1936-06-09
 5347 | Daquan     | Phelps     | 1953-12-01
 5348 | Basia      | Lopez      | 1979-11-30
 5349 | Kaseem     | Hendrix    | 2090-01-07
 5350 | Omar       | Lott       | 1984-07-27
 5351 | Nell       | Cooke      | 2053-03-20
 5352 | Illana     | Middleton  | 1988-03-06
 5353 | Garrett    | Williamson | 1927-03-03
 5354 | Hanna      | Ware       | 1998-08-28
 5355 | James      | May        | 1941-10-24
(11 rows)

To decrypt the column, include the tweak column in the call to VoltageSecureAccess. The tweak column must contain the same data it did when it was used to encrypt the column.


=> SELECT id, first_name, last_name,
   VoltageSecureAccess(dob::VARCHAR , last_name USING PARAMETERS format='date-tweak',
   config_dfs_path='voltage.conf')::DATE AS dob
   FROM customers ORDER BY id;
 id  | first_name | last_name  |    dob
------+------------+------------+------------
 5345 | Thane      | Ross       | 1996-05-09
 5346 | Talon      | Wilkins    | 1941-09-17
 5347 | Daquan     | Phelps     | 1963-05-08
 5348 | Basia      | Lopez      | 1940-04-29
 5349 | Kaseem     | Hendrix    | 1942-03-11
 5350 | Omar       | Lott       | 1956-02-17
 5351 | Nell       | Cooke      | 1995-02-14
 5352 | Illana     | Middleton  | 1949-12-29
 5353 | Garrett    | Williamson | 1955-11-06
 5354 | Hanna      | Ware       | 1967-08-08
 5355 | James      | May        | 1967-08-08
(11 rows)

Tweaking columns with stored procedures

Since tweak values become part of the encryption key, a tweak column's contents must be the same at the time of both encryption and decryption. To avoid losing access to the plaintext, when modifying a tweak column, you must:

  1. decrypt the ciphertext

  2. modify the tweak column

  3. re-encrypt the plaintext data with the contents of the modified tweak column

You can automate this process by using stored procedures. For example, a stored procedure to update a customer's date of birth when tweaked with their last name might be:

=> CREATE PROCEDURE update_dob (employee_id INT, tweak_value VARCHAR) AS $$
  BEGIN
    PERFORM UPDATE customers
      SET dob = (
        WITH dob_ AS (SELECT VoltageSecureAccess(dob::VARCHAR , last_name
          USING PARAMETERS format='date-tweak', config_dfs_path='voltage.conf')::DATE AS dob
            FROM customers
            WHERE id = employee_id)
        SELECT VoltageSecureProtect(dob::VARCHAR, tweak_value
          USING PARAMETERS format='date-tweak', config_dfs_path='voltage.conf')::DATE
            FROM dob_)
      WHERE ID = employee_id;

    PERFORM UPDATE customers SET last_name = tweak_value WHERE ID = employee_id;

    PERFORM COMMIT;
  END;
$$;

You can then call the procedure to update the encrypted value and the tweak value:

=> CALL update_dob(5349, 'Henderson');

Hashing data

VoltageSecureProtect can also hash your data while preserving its format, which can be useful in contexts where you need to anonymize data to comply with privacy standards while retaining accessibility for client code.

You should keep in mind the following:

  • hashing is a one way operation, so the original plaintext is unrecoverable

  • the nature of format-preserving hash functions and plaintexts with low entropy (names, dates, SSNs, etc.) mean that collisions are much more likely than with standard hashing functions

The following example encrypts an SSN using a customFPH format, ssnHash.


=> SELECT VoltageSecureProtect('681-09-2913' USING PARAMETERS
                                  format='ssnHash',
                                  config_dfs_path='voltage.conf');
 VoltageSecureProtect
----------------------
 841-68-2913