This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Stored procedures
You can condense complex database tasks and routines into stored procedures.
You can condense complex database tasks and routines into stored procedures. Unlike external procedures, stored procedures live and can be executed from inside your database; this lets them communicate and interact with your database directly to perform maintenance, execute queries, and update tables.
Best practices
Many other databases are optimized for online transaction processing (OLTP), which focuses on frequent transactions. In contrast, Vertica is optimized for online analytical processing (OLAP), which instead focuses on storing and analyzing large amounts of data and delivering the fastest responses to the most complex queries on that data.
This architecture difference means that the recommended use cases and best practices for stored procedures in Vertica differ slightly from stored procedures in other databases.
While stored procedures in OLTP-oriented databases are often used to perform small transactions, stored procedures in OLAP-oriented databases like Vertica should instead be used to enhance analytical workloads. Vertica can handle isolated transactions, but frequent small transactions can potentially hinder performance.
Some recommended use cases for stored procedures in Vertica include information lifecycle management (ILM) activities such as extract, transform, and load (ETL), and data preparation for tasks like machine learning. For example:
-
Swapping partitions according to age
-
Exporting data at end-of-life and dropping the partitions
-
Saving inputs, outputs, and metadata from a machine learning model—who ran the model, the version of the model, how many times the model was run, and who received the results
Stored procedures in Vertica can also operate on objects that require higher privileges than that of the caller. An optional parameter allows procedures to run using the privileges of the definer, allowing callers to perform sensitive operations in a controlled way.
Viewing stored procedures
To view existing stored procedures, see USER_PROCEDURES.
=> SELECT * FROM USER_PROCEDURES;
procedure_name | owner | language | security | procedure_arguments | schema_name
------------------+---------+----------+----------+------------------------------+-------------
simple_procedure | dbadmin | PL/vSQL | INVOKER | INOUT x int, INOUT y varchar | public
raiseXY | dbadmin | PL/vSQL | INVOKER | INOUT x int, INOUT y int | public
(2 rows)
To view the the source code for stored procedures, export them with EXPORT_OBJECTS.
To export a particular implementation of a stored procedure, specify either the types or both the names and types of the procedure's formal parameters. The following example specifies the types:
=> SELECT EXPORT_OBJECTS('','raiseXY(int, int)');
EXPORT_OBJECTS
----------------------
CREATE PROCEDURE public.raiseXY(x int, y int)
LANGUAGE 'PL/vSQL'
SECURITY INVOKER
AS '
BEGIN
RAISE NOTICE ''x = %'', x;
RAISE NOTICE ''y = %'', y;
-- some processing statements
END
';
SELECT MARK_DESIGN_KSAFE(0);
(1 row)
To export all implementations of an overloaded stored procedure, export its parent schema:
=> SELECT EXPORT_OBJECTS('','public');
EXPORT_OBJECTS
----------------------
...
CREATE PROCEDURE public.raiseXY(x int, y varchar)
LANGUAGE 'PL/vSQL'
SECURITY INVOKER
AS '
BEGIN
RAISE NOTICE ''x = %'', x;
RAISE NOTICE ''y = %'', y;
-- some processing statements
END
';
CREATE PROCEDURE public.raiseXY(x int, y int)
LANGUAGE 'PL/vSQL'
SECURITY INVOKER
AS '
BEGIN
RAISE NOTICE ''x = %'', x;
RAISE NOTICE ''y = %'', y;
-- some processing statements
END
';
SELECT MARK_DESIGN_KSAFE(0);
(1 row)
Known issues and workarounds
-
You cannot use PERFORM CREATE FUNCTION to create a SQL macro.
Workaround
Use EXECUTE to make SQL macros inside a stored procedure:
CREATE PROCEDURE procedure_name()
LANGUAGE PLvSQL AS $$
BEGIN
EXECUTE 'macro';
end;
$$;
where macro
is the creation statement for a SQL macro. For example, this procedure creates the argmax macro:
=> CREATE PROCEDURE make_argmax() LANGUAGE PLvSQL AS $$
BEGIN
EXECUTE
'CREATE FUNCTION
argmax(x int) RETURN int AS
BEGIN
RETURN (CASE WHEN (x IS NOT NULL) THEN x ELSE 0 END);
END';
END;
$$;
-
Non-error exceptions in embedded SQL statements are not reported.
-
DECIMAL, NUMERIC, NUMBER, MONEY, and UUID data types cannot yet be used for arguments.
-
Cursors should capture the variable context at declaration time, but they currently capture the variable context at open time.
-
DML queries on tables with key constraints cannot yet return a value.
Workaround
Rather than:
DO $$
DECLARE
y int;
BEGIN
y := UPDATE tbl WHERE col1 = 3 SET col2 = 4;
END;
$$
Check the result of the DML query with SELECT:
DO $$
DECLARE
y int;
BEGIN
y := SELECT COUNT(*) FROM tbl WHERE col1 = 3;
PERFORM UPDATE tbl SET col2 = 4 WHERE col1 = 3;
END;
$$;
1 - PL/vSQL
PL/vSQL is a powerful and expressive procedural language for creating reusable procedures, manipulating data, and simplifying otherwise complex database routines.
PL/vSQL is a powerful and expressive procedural language for creating reusable procedures, manipulating data, and simplifying otherwise complex database routines.
Vertica PL/vSQL is largely compatible with PostgreSQL PL/pgSQL, with minor semantic differences. For details on migrating your PostgreSQL PL/pgSQL stored procedures to Vertica, see the PL/pgSQL to PL/vSQL migration guide.
For real-world, practical examples of PL/vSQL usage, see Stored procedures: use cases and examples.
1.1 - Supported types
Vertica PL/vSQL supports non-complex data types.
Vertica PL/vSQL supports non-complex data types. The following types are supported as variables only and not as arguments:
-
DECIMAL
-
NUMERIC
-
NUMBER
-
MONEY
-
UUID
1.2 - Scope and structure
PL/vSQL uses block scope, where a block has the following structure:.
PL/vSQL uses block scope, where a block has the following structure:
[ <<label>> ]
[ DECLARE
declarations ]
BEGIN
statements
...
END [ label ];
Declarations
Variable declarations
in the DECLARE block are structured as:
variable_name [ CONSTANT ] data_type [ NOT NULL ] [:= { expression | statement } ];
variable_name |
Variable names must meet the following requirements:
|
CONSTANT |
Defines the variable as a constant (immutable). You can only set a constant variable's value during initialization. |
data_type |
The variable data type. PL/vSQL supports non-complex data types, with the following exceptions:
You can optionally reference a particular column's data type:
variable_name table_name.column_name%TYPE ;
|
NOT NULL |
Specifies that the variable cannot hold a NULL value. If declared with NOT NULL, the variable must be initialized (otherwise, throws ERRCODE_SYNTAX_ERROR) and cannot be assigned NULL (otherwise, throws ERRCODE_WRONG_OBJECT_TYPE). |
:= expression |
Initializes a variable with expression or statement.
If the variable is declared with NOT NULL , expression is required.
Variable declarations in a given block execute sequentially, so old declarations can be referenced by newer ones. For example:
DECLARE
x int := 3;
y int := x;
Default (uninitialized): NULL
|
Aliases
Aliases are alternate names for the same variable. An alias of a variable is not a copy, and changes made to either reference affect the same underlying variable.
new_name ALIAS FOR variable;
Here, the identifier y
is now an alias for variable x
, and changes to y
are reflected in x
.
DO $$
DECLARE
x int := 3;
y ALIAS FOR x;
BEGIN
y := 5; -- since y refers to x, x = 5
RAISE INFO 'x = %, y = %', x, y;
END;
$$;
INFO 2005: x = 5, y = 5
BEGIN and nested blocks
BEGIN contains statements
. A statement is defined as a line or block of PL/vSQL.
Variables declared in inner blocks shadow those declared in outer blocks. To unambiguously specify a variable in a particular block, you can name the block with a label
(case-insensitive), and then reference the variable declared in that block with:
label.variable_name
For example, specifying the variable x
from inside the inner block implicitly refers to inner_block.x
rather than outer_block.x
because of shadowing:
<<outer_block>>
DECLARE
x int;
BEGIN
<<inner_block>>
DECLARE
x int;
BEGIN
x := 1000; -- implicitly specifies x in inner_block because of shadowing
OUTER_BLOCK.x := 0; -- specifies x in outer_block; labels are case-insensitive
END inner_block;
END outer_block;
NULL statement
The NULL statement does nothing. This can be useful as a placeholder statement or a way to show that a code block is intentionally empty. For example:
DO $$
BEGIN
NULL;
END;
$$
Comments have the following syntax. You cannot nest comments.
-- single-line comment
/* multi-line
comment
*/
Nested stored procedures
Stored procedures that call other stored procedures, also called nested stored procedures, can be useful for simplifying complex functions and reusing code.
You can enable nested stored procedures by setting the EnableNestedStoredProcedures configuration parameter (disabled by default):
--Enable nested calls
=> ALTER DATABASE DEFAULT SET EnableNestedStoredProcedures = 1;
--Disable nested calls
=> ALTER DATABASE DEFAULT SET EnableNestedStoredProcedures = 0;
In the following example, proc2()
calls proc1()
to insert values into a table:
CREATE PROCEDURE proc1() AS
$$
BEGIN PERFORM INSERT INTO t_int VALUES(2023);
END;
$$;
CREATE PROCEDURE proc2() AS $$
BEGIN
PERFORM CREATE TABLE IF NOT EXISTS t_int(x int);
PERFORM CALL proc1();
END;
$$;
You can also use this feature to call meta-functions:
CREATE PROCEDURE RUN_ANALYZE_STATS() AS
$$
BEGIN PERFORM SELECT analyze_statistics('');
END;
$$;
CALL run_analyze_stats();
Depth limits
Stored procedures can only be nested up to a depth of 50. If a stored procedure exceeds the call depth, the entire operation is terminated and rolled back.
The stored procedure recursive_proc()
calls itself to insert sequential values into a table, but it has no condition to stop before the depth limit. Calling the procedure causes a rollback and no changes are made to the table:
=> CREATE TABLE numbers (n INT);
=> SELECT * FROM numbers;
n
---
(0 rows)
=> CREATE OR REPLACE PROCEDURE recursive_proc(x int) AS
$$
BEGIN
PERFORM INSERT INTO numbers VALUES(x + 1);
PERFORM CALL recursive_proc(x + 1);
END;
$$;
=> CALL recursive_proc(0);
ERROR 0: Nested stored procedure call exceeds call depth limit
CONTEXT: PL/vSQL procedure recursive_proc line 4 at static SQL
PL/vSQL procedure recursive_proc line 4 at static SQL
PL/vSQL procedure recursive_proc line 4 at static SQL
...
=> SELECT * FROM numbers;
n
---
(0 rows)
1.3 - Embedded SQL
You can embed and execute SQL and from within stored procedures.
You can embed and execute SQL statements and expressions from within stored procedures.
Assignment
To save the value of an expression or returned value, you can assign it to a variable:
variable_name := expression;
variable_name := statement;
For example, this procedure assigns 3 into i
and 'message' into v
.
=> CREATE PROCEDURE performless_assignment() LANGUAGE PLvSQL AS $$
DECLARE
i int;
v varchar;
BEGIN
i := SELECT 3;
v := 'message';
END;
$$;
This type of assignment will fail if the query returns no rows or more than one row. For returns of multiple rows, use LIMIT or truncating assignment:
=> SELECT * FROM t1;
b
---
t
f
f
(3 rows)
=> CREATE PROCEDURE more_than_one_row() LANGUAGE PLvSQL as $$
DECLARE
x boolean;
BEGIN
x := SELECT * FROM t1;
END;
$$;
CREATE PROCEDURE
=> CALL more_than_one_row();
ERROR 10332: Query returned multiple rows where 1 was expected
Truncating assignment
Truncating assignment stores in a variable the first row returned by a query. Row order is nondeterministic unless you specify an ORDER BY clause:
variable_name <- expression;
variable_name <- statement;
The following procedure takes the first row of the results returned by the specified query and assigns it to x
:
=> CREATE PROCEDURE truncating_assignment() LANGUAGE PLvSQL AS $$
DECLARE
x boolean;
BEGIN
x <- SELECT * FROM t1 ORDER BY b DESC; -- x is now assigned the first row returned by the SELECT query
END;
$$;
The PERFORM keyword runs a SQL statement or expression and discards the returned result.
PERFORM statement;
PERFORM expression;
For example, this procedure inserts a value into a table. INSERT returns the number of rows inserted, so you must pair it with PERFORM.
=> DO $$
BEGIN
PERFORM INSERT INTO coordinates VALUES(1,2,3);
END;
$$;
Note
If a SQL statement has no return value or you don't assign the return value to a variable, you must use PERFORM.
EXECUTE
EXECUTE allows you to dynamically construct a SQL query during execution:
EXECUTE command_expression [ USING expression [, ... ] ];
command_expression
is a SQL expression that can reference PL/vSQL variables and evaluates to a string literal. The string literal is executed as a SQL statement, and $1, $2, ... are substituted with the corresponding *expression
*s.
Constructing your query with PL/vSQL variables can be dangerous and expose your system to SQL injection, so wrap them with QUOTE_IDENT, QUOTE_LITERAL, and QUOTE_NULLABLE.
The following procedure constructs a query with a WHERE clause:
DO $$
BEGIN
EXECUTE 'SELECT * FROM t1 WHERE x = $1' USING 10; -- becomes WHERE x = 10
END;
$$;
The following procedure creates a user with a password from the username
and password
arguments. Because the constructed CREATE USER statement uses variables, use the functions QUOTE_IDENT and QUOTE_LITERAL, concatenating them with ||.
=> CREATE PROCEDURE create_user(username varchar, password varchar) LANGUAGE PLvSQL AS $$
BEGIN
EXECUTE 'CREATE USER ' || QUOTE_IDENT(username) || ' IDENTIFIED BY ' || QUOTE_LITERAL(password);
END;
$$;
EXECUTE is a SQL statement, so you can assign it to a variable or pair it with PERFORM:
variable_name:= EXECUTE command_expression;
PERFORM EXECUTE command_expression;
FOUND (special variable)
The special boolean variable FOUND is initialized as false and assigned true or false based on whether:
-
A statement (but not expression) returns results with non-zero number of rows, or
-
A FOR loop iterates at least once
You can use FOUND to distinguish between a NULL and 0-row return.
Special variables exist between the scope of a procedure's argument and the outermost block of its definition. This means that:
The following procedure demonstrates how FOUND changes. Before the SELECT statement, FOUND is false; after the SELECT statement, FOUND is true.
=> DO $$
BEGIN
RAISE NOTICE 'Before SELECT, FOUND = %', FOUND;
PERFORM SELECT 1; -- SELECT returns 1
RAISE NOTICE 'After SELECT, FOUND = %', FOUND;
END;
$$;
NOTICE 2005: Before SELECT, FOUND = f
NOTICE 2005: After SELECT, FOUND = t
Similarly, UPDATE, DELETE, and INSERT return the number of rows affected. In the next example, UPDATE doesn't change any rows, but returns the value 0 to indicate that no rows were affected, so FOUND is set to true:
=> SELECT * t1;
a | b
-----+-----
100 | abc
(1 row)
DO $$
BEGIN
PERFORM UPDATE t1 SET a=200 WHERE b='efg'; -- no rows affected since b doesn't contain 'efg'
RAISE INFO 'FOUND = %', FOUND;
END;
$$;
INFO 2005: FOUND = t
FOUND starts as false and is set to true if the loop iterates at least once:
=> DO $$
BEGIN
RAISE NOTICE 'FOUND = %', FOUND;
FOR i IN RANGE 1..1 LOOP -- RANGE is inclusive, so iterates once
RAISE NOTICE 'i = %', i;
END LOOP;
RAISE NOTICE 'FOUND = %', FOUND;
END;
$$;
NOTICE 2005: FOUND = f
NOTICE 2005: FOUND = t
DO $$
BEGIN
RAISE NOTICE 'FOUND = %', FOUND;
FOR i IN RANGE 1..0 LOOP
RAISE NOTICE 'i = %', i;
END LOOP;
RAISE NOTICE 'FOUND = %', FOUND;
END;
$$;
NOTICE 2005: FOUND = f
NOTICE 2005: FOUND = f
1.4 - Control flow
Control flow constructs give you control over how many times and under what conditions a block of statements should run.
Control flow constructs give you control over how many times and under what conditions a block of statements should run.
Conditionals
IF/ELSIF/ELSE
IF/ELSIF/ELSE statements let you perform different actions based on a specified condition.
IF condition_1 THEN
statement_1;
[ ELSIF condition_2 THEN
statement_2 ]
...
[ ELSE
statement_n; ]
END IF;
Vertica successively evaluates each condition as a boolean until it finds one that's true, then executes the block of statements and exits the IF statement. If no conditions are true, it executes the ELSE block, if one exists.
IF i = 3 THEN...
ELSIF 0 THEN...
ELSIF true THEN...
ELSIF x <= 4 OR x >= 10 THEN...
ELSIF y = 'this' AND z = 'THAT' THEN...
For example, this procedure demonstrates a simple IF...ELSE branch. Because b
is declared to be true, Vertica executes the first branch.
=> DO LANGUAGE PLvSQL $$
DECLARE
b bool := true;
BEGIN
IF b THEN
RAISE NOTICE 'true branch';
ELSE
RAISE NOTICE 'false branch';
END IF;
END;
$$;
NOTICE 2005: true branch
CASE
CASE expressions are often more readable than IF...ELSE chains. After executing a CASE expression's branch, control jumps to the statement after the enclosing END CASE.
PL/vSQL CASE expressions are more flexible and powerful than SQL case expressions, but the latter are more efficient; you should favor SQL case expressions when possible.
CASE [ search_expression ]
WHEN expression_1 [, expression_2, ...] THEN
when_statements
[ ... ]
[ ELSE
else_statements ]
END CASE;
search_expression
is evaluated once and then compared with expression_n
in each branch from top to bottom. If search_expression
and a given expression_n
are equal, then Vertica executes the WHEN block for expression_n
and exits the CASE block. If no matching expression is found, the ELSE branch is executed, if one exists.
Case expressions must have either a matching case or an ELSE branch, otherwise Vertica throws a CASE_NOT_FOUND error.
If you omit search_expression
, its value defaults to true
.
For example, this procedure plays the game FizzBuzz, printing Fizz if the argument is divisible by 3, Buzz if the argument is divisible by 5, FizzBuzz if the if the argument is divisible by 3 and 5.
=> CREATE PROCEDURE fizzbuzz(IN x int) LANGUAGE PLvSQL AS $$
DECLARE
fizz int := x % 3;
buzz int := x % 5;
BEGIN
CASE fizz
WHEN 0 THEN -- if fizz = 0, execute WHEN block
CASE buzz
WHEN 0 THEN -- if buzz = 0, execute WHEN block
RAISE INFO 'FizzBuzz';
ELSE -- if buzz != 0, execute WHEN block
RAISE INFO 'Fizz';
END CASE;
ELSE -- if fizz != 0, execute ELSE block
CASE buzz
WHEN 0 THEN
RAISE INFO 'Buzz';
ELSE
RAISE INFO '';
END CASE;
END CASE;
END;
$$;
=> CALL fizzbuzz(3);
INFO 2005: Fizz
=> CALL fizzbuzz(5);
INFO 2005: Buzz
=> CALL fizzbuzz(15);
INFO 2005: FizzBuzz
Loops
Loops repeatedly execute a block of code until a given condition is satisfied.
WHILE
A WHILE loop checks a given condition and, if the condition is true, it executes the loop body, after which the condition is checked again: if true, the loop body executes again; if false, control jumps to the end of the loop body.
[ <<label>> ]
WHILE condition LOOP
statements;
END LOOP;
For example, this procedure computes the factorial of the argument:
=> CREATE PROCEDURE factorialSP(input int) LANGUAGE PLvSQL AS $$
DECLARE
i int := 1;
output int := 1;
BEGIN
WHILE i <= input loop
output := output * i;
i := i + 1;
END LOOP;
RAISE INFO '%! = %', input, output;
END;
$$;
=> CALL factorialSP(5);
INFO 2005: 5! = 120
LOOP
This type of loop is equivalent to WHILE true
and only terminates if it encounters a RETURN or EXIT statement, or if an exception is thrown.
[ <<label>> ]
LOOP
statements;
END LOOP;
For example, this procedure prints the integers from counter
up to upper_bound
, inclusive:
DO $$
DECLARE
counter int := 1;
upper_bound int := 3;
BEGIN
LOOP
RAISE INFO '%', counter;
IF counter >= upper_bound THEN
RETURN;
END IF;
counter := counter + 1;
END LOOP;
END;
$$;
INFO 2005: 1
INFO 2005: 2
INFO 2005: 3
FOR
FOR loops iterate over a collection, which can be an integral range, query, or cursor.
If a FOR loop iterates at least once, the special FOUND variable is set to true after the loop ends. Otherwise, FOUND is set to false.
The FOUND variable can be useful for distinguishing between a NULL and 0-row return, or creating an IF branch if a LOOP didn't run.
FOR (RANGE)
A FOR (RANGE) loop iterates over a range of integers specified by the expressions left
and right
.
[ <<label>> ]
FOR loop_counter IN RANGE [ REVERSE ] left..right [ BY step ] LOOP
statements
END LOOP [ label ];
loop_counter
:
loop_counter
iterates from left
to right
(inclusive), incrementing by step
at the end of each iteration.
The REVERSE
option instead iterates from right
to left
(inclusive), decrementing by step
.
For example, here is a standard ascending FOR loop with step
= 1:
=> DO $$
BEGIN
FOR i IN RANGE 1..4 LOOP -- loop_counter i does not have to be declared
RAISE NOTICE 'i = %', i;
END LOOP;
RAISE NOTICE 'after loop: i = %', i; -- fails
END;
$$;
NOTICE 2005: i = 1
NOTICE 2005: i = 2
NOTICE 2005: i = 3
NOTICE 2005: i = 4
ERROR 2624: Column "i" does not exist -- loop_counter i is only available inside the FOR loop
Here, the loop_counter
i
starts at 4 and decrements by 2 at the end of each iteration:
=> DO $$
BEGIN
FOR i IN RANGE REVERSE 4..0 BY 2 LOOP
RAISE NOTICE 'i = %', i;
END LOOP;
END;
$$;
NOTICE 2005: i = 4
NOTICE 2005: i = 2
NOTICE 2005: i = 0
FOR (query)
A FOR (QUERY) loop iterates over the results of a query.
[ <<label>> ]
FOR target IN QUERY statement LOOP
statements
END LOOP [ label ];
You can include an ORDER BY clause in the query to make the ordering deterministic.
Unlike FOR (RANGE) loops, you must declare the target
variables. The values of these variables persist after the loop ends.
For example, suppose given the table tuple
:
=> SELECT * FROM tuples ORDER BY x ASC;
x | y | z
---+---+---
1 | 2 | 3
4 | 5 | 6
7 | 8 | 9
(3 rows)
This procedure retrieves the tuples in each row and stores them in the variables a
, b
, and c
, and prints them after each iteration:
=>
=> DO $$
DECLARE
a int; -- target variables must be declared
b int;
c int;
i int := 1;
BEGIN
FOR a,b,c IN QUERY SELECT * FROM tuples ORDER BY x ASC LOOP
RAISE NOTICE 'iteration %: a = %, b = %, c = %', i,a,b,c;
i := i + 1;
END LOOP;
RAISE NOTICE 'after loop: a = %, b = %, c = %', a,b,c;
END;
$$;
NOTICE 2005: iteration 1: a = 1, b = 2, c = 3
NOTICE 2005: iteration 2: a = 4, b = 5, c = 6
NOTICE 2005: iteration 3: a = 7, b = 8, c = 9
NOTICE 2005: after loop: a = 7, b = 8, c = 9
You can also use a query constructed dynamically with EXECUTE:
[ <<label>> ]
FOR target IN EXECUTE 'statement' [ USING expression [, ... ] ] LOOP
statements
END LOOP [ label ];
The following procedure uses EXECUTE to construct a FOR (QUERY) loop and stores the results of that SELECT statement in the variables x
and y
. The result set of a statement like this has only one row, so it only iterates once.
=> SELECT 'first string', 'second string';
?column? | ?column?
--------------+---------------
first string | second string
(1 row)
=> DO $$
DECLARE
x varchar; -- target variables must be declared
y varchar;
BEGIN
-- substitute the placeholders $1 and $2 with the strings
FOR x, y IN EXECUTE 'SELECT $1, $2' USING 'first string', 'second string' LOOP
RAISE NOTICE '%', x;
RAISE NOTICE '%', y;
END LOOP;
END;
$$;
NOTICE 2005: first string
NOTICE 2005: second string
FOR (cursor)
A FOR (CURSOR) loop iterates over a bound, unopened cursor, executing some set of statements
for each iteration.
[ <<label>> ]
FOR loop_variable [, ...] IN CURSOR bound_unopened_cursor [ ( [ arg_name := ] arg_value [, ...] ) ] LOOP
statements
END LOOP [ label ];
This type of FOR loop opens the cursor at start of the loop and closes at the end.
For example, this procedure creates a cursor c
. The procedure passes 6
as an argument to the cursor, so the cursor only retrieves rows where the y-coordinate is 6, storing the coordinates in the variables x_
, y_
, and z_
and printing them at the end of each iteration:
=> SELECT * FROM coordinates;
x | y | z
----+---+----
14 | 6 | 19
1 | 6 | 2
10 | 6 | 39
10 | 2 | 1
7 | 1 | 10
67 | 1 | 77
(6 rows)
DO $$
DECLARE
c CURSOR (key int) FOR SELECT * FROM coordinates WHERE y=key;
x_ int;
y_ int;
z_ int;
BEGIN
FOR x_,y_,z_ IN CURSOR c(6) LOOP
RAISE NOTICE 'cursor returned %,%,% FOUND=%', x_,y_,z_,FOUND;
END LOOP;
RAISE NOTICE 'after loop: %,%,% FOUND=%', x_,y_,z_,FOUND;
END;
$$;
NOTICE 2005: cursor returned 14,6,19 FOUND=f -- FOUND is only set after the loop ends
NOTICE 2005: cursor returned 1,6,2 FOUND=f
NOTICE 2005: after loop: 10,6,39 FOUND=t -- x_, y_, and z_ retain their values, FOUND is now true because the FOR loop iterated at least once
Manipulating loops
RETURN
You can exit the entire procedure (and therefore the loop) with RETURN. RETURN is an optional statement and can be added to signal to readers the end of a procedure.
RETURN;
EXIT
Similar to a break
or labeled break
in other programming languages, EXIT statements let you exit a loop early, optionally specifying:
-
loop_label
: the name of the loop to exit from
-
condition
: if the condition
is true
, execute the EXIT statement
EXIT [ loop_label ] [ WHEN condition ];
CONTINUE
CONTINUE skips to the next iteration of the loop without executing statements that follow the CONTINUE itself. You can specify a particular loop with loop_label
:
CONTINUE [loop_label] [ WHEN condition ];
For example, this procedure doesn't print during its first two iterations because the CONTINUE statement executes and moves on to the next iteration of the loop before control reaches the RAISE NOTICE statement:
=> DO $$
BEGIN
FOR i IN RANGE 1..5 LOOP
IF i < 3 THEN
CONTINUE;
END IF;
RAISE NOTICE 'i = %', i;
END LOOP;
END;
$$;
NOTICE 2005: i = 3
NOTICE 2005: i = 4
NOTICE 2005: i = 5
1.5 - Errors and diagnostics
ASSERT is a debugging feature that checks whether a condition is true.
ASSERT
ASSERT is a debugging feature that checks whether a condition is true
. If the condition is false
, ASSERT raises an ASSERT_FAILURE
exception with an optional error message.
To escape a '
(single quote) character, use ''
. Similarly, to escape a "
(double quote) character, use ""
.
ASSERT condition [ , message ];
For example, this procedure checks the number of rows in the products
table and uses ASSERT to check that the table is populated. If the table is empty, Vertica raises an error:
=> CREATE TABLE products(id UUID, name VARCHARE, price MONEY);
CREATE TABLE
=> SELECT * FROM products;
id | name | price
----+------+-------
(0 rows)
DO $$
DECLARE
prod_count INT;
BEGIN
prod_count := SELECT count(*) FROM products;
ASSERT prod_count > 0, 'products table is empty';
END;
$$;
ERROR 2005: products table is empty
To stop Vertica from checking ASSERT statements, you can set the boolean session-level parameter PLpgSQLCheckAsserts
.
RAISE
RAISE can throw errors or print a user-specified error message, one of the following:
RAISE [ level ] 'format' [, arg_expression [, ... ]] [ USING option = expression [, ... ] ];
RAISE [ level ] condition_name [ USING option = expression [, ... ] ];
RAISE [ level ] SQLSTATE 'sql-state' [ USING option = expression [, ... ] ];
RAISE [ level ] USING option = expression [, ... ];
level |
VARCHAR, one of the following:
-
LOG: Sends the format to vertica.log
-
INFO: Prints an INFO message in VSQL
-
NOTICE: Prints a NOTICE in VSQL
-
WARNING: Prints a WARNING in VSQL
-
EXCEPTION: Throws catchable exception
Default: EXCEPTION
|
format |
VARCHAR, a string literal error message where the percent character % is substituted with the *arg_expression *s. %% escapes the substitution and results in a single % in plaintext.
If the number of % characters doesn't equal the number of arguments, Vertica throws an error.
To escape a ' (single quote) character, use '' . Similarly, to escape a " (double quote) character, use "" .
|
arg_expression |
An expression that substitutes for the percent character (% ) in the format string. |
option = expression |
option must be one of the following and paired with an expression that elaborates on the option :
option |
expression content |
MESSAGE |
An error message.
Default: the ERRCODE associated with the exception
|
DETAIL |
Details about the error. |
HINT |
A hint message. |
ERRCODE |
The error code to report, one of the following:
-
A condition name specified in the description column of the SQL state list (with optional ERRCODE_ prefix)
-
A code that satisfies the SQLSTATE formatting: 5-character sequence of numbers and capital letters (not necessarily on the SQL State List)
Default: ERRCODE_RAISE_EXCEPTION (V0002)
|
COLUMN |
A column name relevant to the error |
CONSTRAINT |
A constraint relevant to the error |
DATATYPE |
A data type relevant to the error |
TABLE |
A table name relevant to the error |
SCHEMA |
A schema name relevant to the error |
|
This procedure demonstrates various RAISE levels:
=> DO $$
DECLARE
logfile varchar := 'vertica.log';
BEGIN
RAISE LOG 'this message was sent to %', logfile;
RAISE INFO 'info';
RAISE NOTICE 'notice';
RAISE WARNING 'warning';
RAISE EXCEPTION 'exception';
RAISE NOTICE 'exception changes control flow; this is not printed';
END;
$$;
INFO 2005: info
NOTICE 2005: notice
WARNING 2005: warning
ERROR 2005: exception
$ grep 'this message was sent to vertica.log' v_vmart_node0001_catalog/vertica.log
<LOG> @v_vmart_node0001: V0002/2005: this message is sent to vertica.log
Exceptions
EXCEPTION blocks let you catch and handle exceptions that might get thrown from statements
:
[ <<label>> ]
[ DECLARE
declarations ]
BEGIN
statements
EXCEPTION
WHEN exception_condition [ OR exception_condition ... ] THEN
handler_statements
[ WHEN exception_condition [ OR exception_condition ... ] THEN
handler_statements
... ]
END [ label ];
exception_condition
has one of the following forms:
WHEN errcode_division_by_zero THEN ...
WHEN division_by_zero THEN ...
WHEN SQLSTATE '22012' THEN ...
WHEN OTHERS THEN ...
OTHERS is a special condition that catches all exceptions except QUERY_CANCELLED
, ASSERT_FAILURE
, and FEATURE_NOT_SUPPORTED
.
When an exception is thrown, Vertica checks the list of exceptions for a matching exception_condition
from top to bottom. If it finds a match, it executes the handler_statements
and then leaves the exception block's scope.
If Vertica can't find a match, it propagates the exception up to the next enclosing block. You can do this manually within an exception handler with RAISE:
RAISE;
For example, the following procedure divides 3 by 0 in the inner_block
, which is an illegal operation that throws the exception division_by_zero
with SQL state 22012. Vertica checks the inner EXCEPTION block for a matching condition:
-
The first condition checks for SQL state 42501, so Vertica, so Vertica moves to the next condition.
-
WHEN OTHERS THEN catches all exceptions, so it executes that block.
-
The bare RAISE then propagates the exception to the outer_block
.
-
The outer EXCEPTION block successfully catches the exception and prints a message.
=> DO $$
<<outer_block>>
BEGIN
<<inner_block>>
DECLARE
x int;
BEGIN
x := 3 / 0; -- throws exception division_by_zero, SQLSTATE 22012
EXCEPTION -- this block is checked first for matching exceptions
WHEN SQLSTATE '42501' THEN
RAISE NOTICE 'caught insufficient_privilege exception';
WHEN OTHERS THEN -- catches all exceptions
RAISE; -- manually propagate the exception to the next enclosing block
END inner_block;
EXCEPTION -- exception is propagated to this block
WHEN division_by_zero THEN
RAISE NOTICE 'caught division_by_zero exception';
END outer_block;
$$;
NOTICE 2005: caught division_by_zero exception
SQLSTATE and SQLERRM variables
When handling an exception, you can use the following variables to retrieve error information:
For details, see SQL state list.
This procedure catches the exception thrown by attempting to assign NULL to a NOT NULL variable and prints the SQL state and error message:
DO $$
DECLARE
i int NOT NULL := 1;
BEGIN
i := NULL; -- illegal, i was declared with NOT NULL
EXCEPTION
WHEN OTHERS THEN
RAISE WARNING 'SQL State: %', SQLSTATE;
RAISE WARNING 'Error message: %', SQLERRM;
END;
$$;
WARNING 2005: SQLSTATE: 42809
WARNING 2005: SQLERRM: Cannot assign null into NOT NULL variable
You can retrieve information about exceptions inside exception handlers with GET STACKED DIAGNOSTICS:
GET STACKED DIAGNOSTICS variable_name { = | := } item [, ... ];
Where item
can be any of the following:
item |
Description |
RETRUNED_SQLSTATE |
SQLSTATE error code of the exception |
COLUMN_NAME |
Name of the column related to exception |
CONSTRAINT_NAME |
Name of the constraint related to exception |
DATATYPE_NAME |
Name of the data type related to exception |
MESSAGE_TEXT |
Text of the exception's primary message |
TABLE_NAME |
Name of the table related to exception |
SCHEMA_NAME |
Name of the schema related to exception |
DETAIL_TEXT |
Text of the exception's detail message, if any |
HINT_TEXT |
Text of the exception's hint message, if any |
EXCEPTION_CONTEXT |
Description of the call stack at the time of the exception |
For example, this procedure has an EXCEPTION block that catches the division_by_zero
error and prints SQL state, error message, and the exception context:
=> DO $$
DECLARE
message_1 varchar;
message_2 varchar;
message_3 varchar;
x int;
BEGIN
x := 5 / 0;
EXCEPTION
WHEN OTHERS THEN -- OTHERS catches all exceptions
GET STACKED DIAGNOSTICS message_1 = RETURNED_SQLSTATE,
message_2 = MESSAGE_TEXT,
message_3 = EXCEPTION_CONTEXT;
RAISE INFO 'SQLSTATE: %', message_1;
RAISE INFO 'MESSAGE: %', message_2;
RAISE INFO 'EXCEPTION_CONTEXT: %', message_3;
END;
$$;
INFO 2005: SQLSTATE: 22012
INFO 2005: MESSAGE: Division by zero
INFO 2005: EXCEPTION_CONTEXT: PL/vSQL procedure inline_code_block line 8 at static SQL
1.6 - Cursors
A cursor is a reference to the result set of a query and allows you to view the results one row at a time.
A cursor is a reference to the result set of a query and allows you to view the results one row at a time. Cursors remember their positions in result sets, which can be one of the following:
-
a result row
-
before the first row
-
after the last row
You can also iterate over unopened, bound cursors with a FOR loop. See Control Flow for more information.
Declaring cursors
Bound cursors
To bind a cursor to a statement
on declaration, use the FOR keyword:
cursor_name CURSOR [ ( arg_name arg_type [, ...] ) ] FOR statement;
The arguments to a cursor give you more control over which rows to process. For example, suppose you have the following table:
=> SELECT * FROM coordinates_xy;
x | y
---+----
1 | 2
9 | 5
7 | 13
...
(100000 rows)
If you're only interested in the rows where y
is 6, you might declare the following cursor and then provide the argument 6
when you OPEN the cursor:
c CURSOR (key int) FOR SELECT * FROM coordinates_xy WHERE y=key;
Unbound cursors
To declare a cursor without binding it to a particular query, use the refcursor
type:
cursor_name refcursor;
You can bind an unbound cursor at any time with OPEN.
For example, to declare the cursor my_unbound_cursor
:
my_unbound_cursor refcursor;
Opening and closing cursors
OPEN
Opening a cursor executes the query with the given arguments, and puts the cursor before the first row of the result set. The ordering of query results (and therefore, the start of the result set) is non-deterministic, unless you specify an ORDER BY clause.
OPEN a bound cursor
To open a cursor that was bound during declaration:
OPEN bound_cursor [ ( [ arg_name := ] arg_value [, ...] ) ];
For example, given the following declaration:
c CURSOR (key int) FOR SELECT * FROM t1 WHERE y=key;
You can open the cursor with one of the following:
OPEN c(5);
OPEN c(key := 5);
CLOSE
Open cursors are automatically closed when the cursor leaves scope, but you can close the cursor preemptively with CLOSE. Closed cursors can be reopened later, which re-executes the query and prepares a new result set.
CLOSE cursor;
OPEN an unbound cursor
To bind an unbound cursor and then open it:
OPEN unbound_cursor FOR statement;
You can also use EXECUTE because it's a statement:
OPEN unbound_cursor FOR EXECUTE statement_string [ USING expression [, ... ] ];
For example, to bind the cursor c
to a query to a table product_data
:
OPEN c for SELECT * FROM product_data;
FETCH rows
FETCH statements:
-
Retrieve the row that the specified cursor currently points to and stores it in some variable.
-
Advance the cursor to the next position.
variable [, ...] := FETCH opened_cursor;
The retrieved value is stored in variable
. Rows typically have more than one value, so you can use one variable for each.
If FETCH successfully retrieves a value, the special variable FOUND is set to true
. Otherwise, if you call FETCH when the cursor is past the final row of the result set, it returns NULL and the special variable FOUND is set to false
.
The following procedure creates a cursor c
, binding it to a SELECT query on the coordinates
table. The procedures passes the argument 1 to the cursor, so the cursor only retrieves rows where the y-coordinate is 1, storing the coordinates in the variables x_
, y_
, and z_
.
Only two rows have a y-coordinate of 1, so after using FETCH twice, the third FETCH starts to return NULL values and FOUND is set to false
:
=> SELECT * FROM coordinates;
x | y | z
----+---+----
14 | 6 | 19
1 | 6 | 2
10 | 6 | 39
10 | 2 | 1
7 | 1 | 10
67 | 1 | 77
(6 rows)
DO $$
DECLARE
c CURSOR (key int) FOR SELECT * FROM coordinates WHERE y=key;
x_ int;
y_ int;
z_ int;
BEGIN
OPEN c(1); -- only retrieve rows where y=1
x_,y_,z_ := FETCH c;
RAISE NOTICE 'cursor returned %, %, %, FOUND=%',x_, y_, z_, FOUND;
x_,y_,z_ := FETCH c; -- fetches the last set of results and moves to the end of the result set
RAISE NOTICE 'cursor returned %, %, %, FOUND=%',x_, y_, z_, FOUND;
x_,y_,z_ := FETCH c; -- cursor has advanced past the final row
RAISE NOTICE 'cursor returned %, %, %, FOUND=%',x_, y_, z_, FOUND;
END;
$$;
NOTICE 2005: cursor returned 7, 1, 10, FOUND=t
NOTICE 2005: cursor returned 67, 1, 77, FOUND=t
NOTICE 2005: cursor returned <NULL>, <NULL>, <NULL>, FOUND=f
MOVE cursors
MOVE advances an open cursor to the next position without retrieving the row. The special FOUND variable is set to true
if the cursor's position (before MOVE) was not past the final row—that is, if calling FETCH instead of MOVE would have retrieved the row.
MOVE bound_cursor;
For example, this cursor only retrieves rows where the y-coordinate is 2. The result set is only one row, so using MOVE twice advances past the first (and last) row, setting FOUND to false:
=> SELECT * FROM coordinates WHERE y=2;
x | y | z
----+---+---
10 | 2 | 1
(1 row)
DO $$
DECLARE
c CURSOR (key int) FOR SELECT * FROM coordinates WHERE y=key;
BEGIN
OPEN c(2); -- only retrieve rows where y=2, cursor starts before the first row
MOVE c; -- cursor advances to the first (and last) row
RAISE NOTICE 'FOUND=%', FOUND; -- FOUND is true because the cursor points to a row in the result set
MOVE c; -- cursor advances past the final row
RAISE NOTICE 'FOUND=%', FOUND; -- FOUND is false because the cursor is past the final row
END;
$$;
NOTICE 2005: FOUND=t
NOTICE 2005: FOUND=f
1.7 - PL/pgSQL to PL/vSQL migration guide
While Vertica PL/vSQL is largely compatible with PostgreSQL PL/pgSQL, there are some easily-resolved semantic and SQL-level differences when migrating from PostgreSQL PL/pgSQL.
While Vertica PL/vSQL is largely compatible with PostgreSQL PL/pgSQL, there are some easily-resolved semantic and SQL-level differences when migrating from PostgreSQL PL/pgSQL.
Language-level differences
PL/vSQL differs from PL/pgSQL in the following ways:
-
You must use the PERFORM statement for SQL statements that return no value.
-
UPDATE/DELETE WHERE CURRENT OF is not currently supported.
-
FOR loops have additional keywords:
-
FOR (RANGE) loops: RANGE keyword
-
FOR (QUERY) loops: QUERY keyword
-
FOR (CURSOR) loops: CURSOR keyword
-
By default, NULL cannot be coerced to FALSE.
Coercing NULL to FALSE
NULL is not coercible to FALSE by default, and expressions that expect a BOOLEAN value throw an exception when given a NULL:
=> DO $$
BEGIN
IF NULL THEN -- BOOLEAN value expected for IF
END IF;
END;
$$;
ERROR 10268: Query returned null where a value was expected
To enable NULL-to-FALSE coercion, set the configuration parameter PLvSQLCoerceNull:
=> ALTER DATABASE DEFAULT SET PLvSQLCoerceNull = 1;
Planned features
Support for the following features is planned for a future release:
- Full transaction semantics: Currently, stored procedures only COMMIT changes after successful execution. This means that you cannot manually ROLLBACK. However, changes made by nested stored procedures are automatically rolled back if they reach the depth limit. You can also use PERFORM COMMIT to commit changes during execution.
- FOREACH (ARRAY) loops
- Using the following types as arguments:
- DECIMAL
- NUMERIC
- NUMBER
- MONEY
- UUID
- Non-forward moving cursors
- CONTEXT/EXCEPTION_CONTEXT for diagnostics
- The special variable ROW_COUNT: To work around this, you can rely on INSERT, UPDATE, and DELETE to return the number of rows affected:
=> CREATE TABLE t1(i int);
CREATE TABLE
=> DO $$
DECLARE
x int;
BEGIN
x := INSERT INTO t1 VALUES (200);
RAISE INFO 'rows inserted: %', x;
END;
$$;
INFO 2005: rows inserted: 1
SQL-level differences
Vertica differs from PostgreSQL in the following ways:
-
Some data types are different sizes—for example, the standard INTEGER type in Vertica is 8 bytes, but 4 bytes in PostgreSQL.
-
In Vertica, INSERT, UPDATE, and DELETE return the number of rows affected.
-
Certain SQLSTATE codes are different, which affects exception handling.
2 - Parameter modes
Each formal parameter of a stored procedure can be set to one of the following parameter modes. If unspecified, the parameter is set to IN:
Each formal parameter of a stored procedure can be set to one of the following parameter modes. If unspecified, the parameter is set to IN:
- IN: An input parameter. The caller uses this to pass arguments to the stored procedure.
- OUT: An output parameter. The stored procedure returns values as a result set, the columns of which are identified by the name of the output parameter.
- INOUT: An input and output parameter. The caller can both pass an argument to the stored procedure and retrieve it from the result set.
Function signatures and overloading
A function's signature is defined by the function's name and input parameter (IN and INOUT) types. You cannot create two stored procedures with the same function signature.
An overloaded function is a set of functions that share the same name but have different input parameters. This can be useful in cases where a function needs to operate on multiple types. For example, the two functions find_average(int, int, int)
and find_average(float, float, float)
describe the overloaded function find_average()
that operates on integers and floats.
When an overloaded procedure is called, Vertica runs the procedure whose signature matches the types of the arguments passed in the invocation.
The example procedures below use RAISE NOTICE for informational messages. For details, see Errors and diagnostics.
IN
IN parameters specify the name and type of an argument. For example, the caller of this procedure must pass in an INT and a VARCHAR value:
=> CREATE PROCEDURE raiseXY(IN x INT, y VARCHAR) LANGUAGE PLvSQL AS $$
BEGIN
RAISE NOTICE 'x = %', x;
RAISE NOTICE 'y = %', y;
END
$$;
CALL raiseXY(3, 'some string');
NOTICE 2005: x = 3
NOTICE 2005: y = some string
OUT
OUT parameters specify the name and type of a return value. When the procedure runs, the OUT parameter is added to the scope of the stored procedure as a variable. This variable's value is initialized as NULL and is returned in a result set after execution:
CREATE PROCEDURE sum_procedure(IN x INT, IN y INT, OUT z INT) LANGUAGE PLvSQL AS $$
BEGIN
RAISE NOTICE 'This procedure returns the sum of x and y as z:';
z := x + y;
END
$$;
=> CALL sum_procedure(38,19);
NOTICE 2005: This procedure returns the sum of x and y as z:
z
----
57
(1 row)
INOUT
INOUT parameters specify the name and type of both an input value and return value. When the procedure runs, the IN/OUT parameter is initialized to the value passed in by the caller and is returned in a result set after execution:
=> CREATE PROCEDURE echo(INOUT x INT) LANGUAGE PLvSQL AS $$
BEGIN
RAISE NOTICE 'This procedure returns its input:';
END
$$;
=> CALL echo(19);
NOTICE 2005: This procedure returns its input:
x
----
19
(1 row)
3 - Executing stored procedures
If you have EXECUTE privileges on a stored procedure, you can execute it with a CALL statement that specifies the procedure and its IN arguments.
If you have EXECUTE privileges on a stored procedure, you can execute it with a CALL statement that specifies the procedure and its IN arguments.
Syntax
CALL stored_procedure_name();
For example, the stored procedure raiseXY()
is defined as:
The following procedure echoes its inputs as a result set:
=> CREATE PROCEDURE echo_int_varchar(INOUT x INT, INOUT y VARCHAR) LANGUAGE PLvSQL AS $$
BEGIN
RAISE NOTICE 'This procedure outputs a result set of its inputs:';
END
$$;
=> CALL echo_int_varchar(3, 'a string');
NOTICE 2005: This procedure outputs a result set of its inputs:
x | y
---+----------
3 | a string
(1 row)
You can execute an anonymous (unnamed) procedure with DO. This requires no privileges:
=> DO $$
BEGIN
RAISE NOTICE '% ran an anonymous procedure', current_user();
END;
$$;
NOTICE 2005: Bob ran an anonymous procedure
Transaction semantics
Changes made by stored procedures are automatically committed after successful execution and are rolled back otherwise.
You can also manually commit changes in the middle of a stored procedure. If execution fails after the commit, the committed changes persist even after the automatic rollback.
In this example, manual_commit()
inserts two values into a table and then commits. The third insert attempts to insert a CHAR into an INT column, which causes the stored procedure to fail and triggers an automatic rollback. This rollback does not affect the first two inserts because they were manually committed:
=> CREATE TABLE numbers (n INT);
=> CREATE PROCEDURE manualcommit() AS
$$
BEGIN
PERFORM INSERT INTO numbers VALUES(1);
PERFORM INSERT INTO numbers VALUES(2);
PERFORM COMMIT;
PERFORM INSERT INTO numbers VALUES('a');
END;
$$;
=> CALL manualcommit();
ERROR 3681: Invalid input syntax for integer: "a"
CONTEXT: PL/vSQL procedure manualcommit line 6 at static SQL
=> SELECT * FROM numbers;
n
---
1
2
(2 rows)
Session semantics
Operations that modify the session persist after execution of the stored procedure, including ALTER SESSION and SET statements.
Limiting runtime
You can set the maximum runtime of a procedure with session parameter RUNTIMECAP.
This example sets the runtime of all stored procedures to one second for duration of session and runs an anonymous procedure with an infinite loop. Vertica terminates the procedure after it runs for more than one second:
=> SET SESSION RUNTIMECAP '1 SECOND';
=> DO $$
BEGIN
LOOP
END LOOP;
END;
$$;
ERROR 0: Query exceeded maximum runtime
HINT: Change the maximum runtime using SET SESSION RUNTIMECAP
Execution security and privileges
By default, stored procedures execute with the privileges of the caller (invoker), so callers must have the necessary privileges on the catalog objects accessed by the stored procedure. You can allow callers to execute the procedure with the privileges, default roles, user parameters, and user attributes (RESOURCE_POOL, MEMORY_CAP_KB, TEMP_SPACE_CAP_KB, RUNTIMECAP) of the definer by specifying DEFINER for the SECURITY option.
For example, the following procedure inserts a value into table s1.t1
. If the DEFINER has the required privileges (USAGE on the schema and INSERT on table), this requirement is waived for callers.
=> CREATE PROCEDURE insert_into_s1_t1(IN x int, IN y int)
LANGUAGE PLvSQL
SECURITY DEFINER AS $$
BEGIN
PERFORM INSERT INTO s1.t1 VALUES(x,y);
END;
$$;
A procedure with SECURITY DEFINER effectively executes the procedure as that user, so changes to the database appear to be performed by the procedure's definer rather than its caller.
Caution
Improper use of SECURITY DEFINER can lead to the
confused deputy problem and introduce vulnerabilities into your system like SQL injection.
Execution privileges for nested stored procedures
A stored procedure cannot call stored procedures that require additional privileges. For example, if a stored procedure executes with privileges A, B, and C, it cannot call a stored procedure that requires privileges C, D, and E.
For details on nested stored procedures, see Scope and structure.
Examples
In this example, this table:
records(i INT, updated_date TIMESTAMP DEFAULT sysdate, updated_by VARCHAR(128) DEFAULT current_user())
Contains the following content:
=> SELECT * FROM records;
i | updated_date | updated_by
---+----------------------------+------------
1 | 2021-08-27 15:54:05.709044 | Bob
2 | 2021-08-27 15:54:07.051154 | Bob
3 | 2021-08-27 15:54:08.301704 | Bob
(3 rows)
Bob creates a procedure to update the table and uses the SECURITY DEFINER option and grants EXECUTE on the procedure to Alice. Alice can now use the procedure to update the table without any additional privileges:
=> GRANT EXECUTE ON PROCEDURE update_records(int,int) to Alice;
GRANT PRIVILEGE
=> \c - Alice
You are now connected as user "Alice".
=> CALL update_records(99,1);
update_records
---------------
0
(1 row)
Because calls to update_records()
effectively run the procedure as Bob, Bob is listed as the updater of the table rather than Alice:
=> SELECT * FROM records;
i | updated_date | updated_by
----+----------------------------+------------
99 | 2021-08-27 15:55:42.936404 | Bob
2 | 2021-08-27 15:54:07.051154 | Bob
3 | 2021-08-27 15:54:08.301704 | Bob
(3 rows)
3.1 - Triggers
You can automate the execution of stored procedures with triggers.
You can automate the execution of stored procedures with triggers. A trigger listens to database events and executes its associated stored procedure when the events occur. You can use triggers with CREATE SCHEDULE to implement Scheduled execution.
Individual triggers can be enabled and disabled with ENABLE_TRIGGER, and can be manually executed with EXECUTE_TRIGGER.
3.1.1 - Scheduled execution
Stored procedures can be scheduled to execute automatically with the privileges of the trigger definer.
Stored procedures can be scheduled to execute automatically with the privileges of the trigger definer. You can use this to automate various tasks like logging database activity, revoking privileges, creating roles, or loading data.
Enabling and disabling scheduling
Scheduling can be toggled at the database level with the EnableStoredProcedureScheduler configuration parameter:
-- Enable scheduler
=> SELECT SET_CONFIG_PARAMETER('EnableStoredProcedureScheduler', 1);
-- Disable scheduler
=> SELECT SET_CONFIG_PARAMETER('EnableStoredProcedureScheduler', 0);
You can toggle an individual schedule ENABLE_TRIGGER, or disable it by dropping the schedule's associated trigger.
Scheduling a stored procedure
The general workflow for implementing scheduled execution for a single stored procedure is as follows:
-
Create a stored procedure.
-
Create a schedule. A schedule can either use a list of timestamps for one-off triggers or a cron
expression for recurring events.
-
Create a trigger, associating it with the stored procedure and trigger.
-
(Optional) Manually execute the trigger to test it.
One-off triggers
One-off triggers run a finite number of times.
The following example creates a trigger that revokes privileges on the customer_dimension table from the user Bob after 24 hours:
-
Create a stored procedure to revoke privileges from Bob:
=> CREATE OR REPLACE PROCEDURE revoke_all_on_table(table_name VARCHAR, user_name VARCHAR)
LANGUAGE PLvSQL
AS $$
BEGIN
EXECUTE 'REVOKE ALL ON ' || QUOTE_IDENT(table_name) || ' FROM ' || QUOTE_IDENT(user_name);
END;
$$;
-
Create a schedule with a timestamp for 24 hours later:
=> CREATE SCHEDULE 24_hours_later USING DATETIMES('2022-12-16 12:00:00');
-
Create a trigger with the stored procedure and schedule:
=> CREATE TRIGGER revoke_trigger ON SCHEDULE 24_hours_later EXECUTE PROCEDURE revoke_all_on_table('customer_dimension', 'Bob') AS DEFINER;
Recurring triggers
Recurring triggers run at a recurring date or time.
The following example creates a weekly trigger that logs to the USER_COUNT table the number of users in the database:
=> SELECT * FROM USER_COUNT;
total | timestamp
---------+-------------------------------
293 | 2022-12-04 00:00:00.346664-00
302 | 2022-12-11 00:00:00.782242-00
301 | 2022-12-18 00:00:00.144633-00
301 | 2022-12-25 00:00:00.548832-00
(4 rows)
-
Create the table to log the user counts:
=> CREATE TABLE USER_COUNT(total INT, timestamp TIMESTAMPTZ)
-
Create the stored procedure to log to the table:
=> CREATE OR REPLACE PROCEDURE log_user_count()
LANGUAGE PLvSQL
AS $$
DECLARE
num_users int := SELECT count (user_id) FROM users;
timestamp datetime := SELECT NOW();
BEGIN
PERFORM INSERT INTO USER_COUNT VALUES(num_users, timestamp);
END;
$$;
-
Create the schedule for 12:00 AM on Sunday:
=> CREATE SCHEDULE weekly_sunday USING CRON '0 0 * * 0';
-
Create the trigger with the stored procedure and schedule:
=> CREATE TRIGGER user_log_trigger ON SCHEDULE weekly_sunday EXECUTE PROCEDURE log_user_count() AS DEFINER;
Viewing upcoming schedules
Schedules are managed and coordinated by the Active Scheduler Node (ASN). If the ASN goes down, a different node is automatically designated as the new ASN. To view scheduled tasks, query SCHEDULER_TIME_TABLE on the ASN.
-
Determine the ASN with ACTIVE_SCHEDULER_NODE:
=> SELECT active_scheduler_node();
active_scheduler_node
-----------------------
initiator
(1 row)
-
On the ASN, query SCHEDULER_TIME_TABLE:
=> SELECT * FROM scheduler_time_table;
schedule_name | attached_trigger | scheduled_execution_time
-----------------+------------------+--------------------------
daily_1am_gmt | log_user_actions | 2022-12-15 01:00:00-00
24_hours_later | revoke_trigger | 2022-12-16 12:00:00-00
4 - Altering stored procedures
You can alter a stored procedure and retain its grants with ALTER PROCEDURE.
You can alter a stored procedure and retain its grants with ALTER PROCEDURE.
Examples
The examples below use the following procedure:
=> CREATE PROCEDURE echo_integer(IN x int) LANGUAGE PLvSQL AS $$
BEGIN
RAISE INFO 'x is %', x;
END;
$$;
By default, stored procedures execute with the privileges of the caller (invoker), so callers must have the necessary privileges on the catalog objects accessed by the stored procedure. You can allow callers to execute the procedure with the privileges, default roles, user parameters, and user attributes (RESOURCE_POOL, MEMORY_CAP_KB, TEMP_SPACE_CAP_KB, RUNTIMECAP) of the definer by specifying DEFINER for the SECURITY option.
To execute the procedure with privileges of the...
To change a procedure's source code:
=> ALTER PROCEDURE echo_integer(int) SOURCE TO $$
BEGIN
RAISE INFO 'the integer is: %', x;
END;
$$;
To change a procedure's owner (definer):
=> ALTER PROCEDURE echo_integer(int) OWNER TO u1;
To change a procedure's schema:
=> ALTER PROCEDURE echo_integer(int) SET SCHEMA s1;
To rename a procedure:
=> ALTER PROCEDURE echo_integer(int) RENAME TO echo_int;
5 - Stored procedures: use cases and examples
Stored procedures in Vertica are best suited for complex, analytical workflows rather than small, transaction-heavy ones.
Stored procedures in Vertica are best suited for complex, analytical workflows rather than small, transaction-heavy ones. Some recommended use cases include information lifecycle management (ILM) activities like extract, transform, and load (ETL), and data preparation for more complex analytical tasks like machine learning. For example:
-
Swapping partitions according to age
-
Exporting data at end-of-life and dropping the partitions
-
Saving inputs, outputs, and metadata from a machine learning model (e.g. who ran the model, the version of the model, how many times the model was run, who received the results, etc.) for auditing purposes
Searching for a value
The find_my_value()
procedure searches for a user-specified value in any table column in a given schema and stores the locations of instances of the value in a user-specified table:
=> CREATE PROCEDURE find_my_value(p_table_schema VARCHAR(128), p_search_value VARCHAR(1000), p_results_schema VARCHAR(128), p_results_table VARCHAR(128)) AS $$
DECLARE
sql_cmd VARCHAR(65000);
sql_cmd_result VARCHAR(65000);
results VARCHAR(65000);
BEGIN
IF p_table_schema IS NULL OR p_table_schema = '' OR
p_search_value IS NULL OR p_search_value = '' OR
p_results_schema IS NULL OR p_results_schema = '' OR
p_results_table IS NULL OR p_results_table = '' THEN
RAISE EXCEPTION 'Please provide a schema to search, a search value, a results table schema, and a results table name.';
RETURN;
END IF;
sql_cmd := 'CREATE TABLE IF NOT EXISTS ' || QUOTE_IDENT(p_results_schema) || '.' || QUOTE_IDENT(p_results_table) ||
'(found_timestamp TIMESTAMP, found_value VARCHAR(1000), table_name VARCHAR(128), column_name VARCHAR(128));';
sql_cmd_result := EXECUTE 'SELECT LISTAGG(c USING PARAMETERS max_length=1000000, separator='' '')
FROM (SELECT ''
(SELECT '''''' || NOW() || ''''''::TIMESTAMP , ''''' || QUOTE_IDENT(p_search_value) || ''''','''''' || table_name || '''''', '''''' || column_name || ''''''
FROM '' || table_schema || ''.'' || table_name || ''
WHERE '' || column_name || ''::'' ||
CASE
WHEN data_type_id IN (17, 115, 116, 117) THEN data_type
ELSE ''VARCHAR('' || LENGTH(''' || QUOTE_IDENT(p_search_value)|| ''') || '')'' END || '' = ''''' || QUOTE_IDENT(p_search_value) || ''''''' || DECODE(LEAD(column_name) OVER(ORDER BY table_schema, table_name, ordinal_position), NULL, '' LIMIT 1);'', '' LIMIT 1)
UNION ALL '') c
FROM (SELECT table_schema, table_name, column_name, ordinal_position, data_type_id, data_type
FROM columns WHERE NOT is_system_table AND table_schema ILIKE ''' || QUOTE_IDENT(p_table_schema) || ''' AND data_type_id < 1000
ORDER BY table_schema, table_name, ordinal_position) foo) foo;';
results := EXECUTE 'INSERT INTO ' || QUOTE_IDENT(p_results_schema) || '.' || QUOTE_IDENT(p_results_table) || ' ' || sql_cmd_result;
RAISE INFO 'Matches Found: %', results;
END;
$$;
For example, to search the public
schema for instances of the string 'dog
' and then store the results in public.table_list
:
=> CALL find_my_value('public', 'dog', 'public', 'table_list');
find_my_value
---------------
0
(1 row)
=> SELECT * FROM public.table_list;
found_timestamp | found_value | table_name | column_name
----------------------------+-------------+---------------+-------------
2021-08-25 22:13:20.147889 | dog | another_table | b
2021-08-25 22:13:20.147889 | dog | some_table | c
(2 rows)
Optimizing tables
You can automate loading data from Parquet files and optimizing your queries with the create_optimized_table()
procedure. This procedure:
-
Creates an external table whose structure is built from Parquet files using the Vertica INFER_TABLE_DDL function.
-
Creates a native Vertica table, like the external table, resizing all VARCHAR columns to the MAX length of the data to be loaded.
-
Creates a super projection using the optional segmentation/order by columns passed in as a parameter.
-
Adds an optional primary key to the native table passed in as a parameter.
-
Loads a sample data set (1 million rows) from the external table into the native table.
-
Drops the external table.
-
Runs the ANALYZE_STATISTICS function on the native table.
-
Runs the DESIGNER_DESIGN_PROJECTION_ENCODINGS function to get a properly encoded super projection for the native table.
-
Truncates the now-optimized native table (we will load the entire data set in a separate script/stored procedure).
=> CREATE OR REPLACE PROCEDURE create_optimized_table(p_file_path VARCHAR(1000), p_table_schema VARCHAR(128), p_table_name VARCHAR(128), p_seg_columns VARCHAR(1000), p_pk_columns VARCHAR(1000)) LANGUAGE PLvSQL AS $$
DECLARE
command_sql VARCHAR(1000);
seg_columns VARCHAR(1000);
BEGIN
-- First 3 parms are required.
-- Segmented and PK columns names, if present, must be Unquoted Identifiers
IF p_file_path IS NULL OR p_file_path = '' THEN
RAISE EXCEPTION 'Please provide a file path.';
ELSEIF p_table_schema IS NULL OR p_table_schema = '' THEN
RAISE EXCEPTION 'Please provide a table schema.';
ELSEIF p_table_name IS NULL OR p_table_name = '' THEN
RAISE EXCEPTION 'Please provide a table name.';
END IF;
-- Pass optional segmented columns parameter as null or empty string if not used
IF p_seg_columns IS NULL OR p_seg_columns = '' THEN
seg_columns := '';
ELSE
seg_columns := 'ORDER BY ' || p_seg_columns || ' SEGMENTED BY HASH(' || p_seg_columns || ') ALL NODES';
END IF;
-- Add '_external' to end of p_table_name for the external table and drop it if it already exists
EXECUTE 'DROP TABLE IF EXISTS ' || QUOTE_IDENT(p_table_schema) || '.' || QUOTE_IDENT(p_table_name) || '_external CASCADE;';
-- Execute INFER_TABLE_DDL to generate CREATE EXTERNAL TABLE from the Parquet files
command_sql := EXECUTE 'SELECT infer_table_ddl(' || QUOTE_LITERAL(p_file_path) || ' USING PARAMETERS format = ''parquet'', table_schema = ''' || QUOTE_IDENT(p_table_schema) || ''', table_name = ''' || QUOTE_IDENT(p_table_name) || '_external'', table_type = ''external'');';
-- Run the CREATE EXTERNAL TABLE DDL
EXECUTE command_sql;
-- Generate the Internal/ROS Table DDL and generate column lengths based on maximum column lengths found in external table
command_sql := EXECUTE 'SELECT LISTAGG(y USING PARAMETERS separator='' '')
FROM ((SELECT 0 x, ''SELECT ''''CREATE TABLE ' || QUOTE_IDENT(p_table_schema) || '.' || QUOTE_IDENT(p_table_name) || '('' y
UNION ALL SELECT ordinal_position, column_name || '' '' ||
CASE WHEN data_type LIKE ''varchar%''
THEN ''varchar('''' || (SELECT MAX(LENGTH('' || column_name || ''))
FROM '' || table_schema || ''.'' || table_name || '') || '''')'' ELSE data_type END || NVL2(LEAD('' || column_name || '', 1) OVER (ORDER BY ordinal_position), '','', '')'')
FROM columns WHERE table_schema = ''' || QUOTE_IDENT(p_table_schema) || ''' AND table_name = ''' || QUOTE_IDENT(p_table_name) || '_external''
UNION ALL SELECT 10000, ''' || seg_columns || ''' UNION ALL SELECT 10001, '';'''''') ORDER BY x) foo WHERE y <> '''';';
command_sql := EXECUTE command_sql;
EXECUTE command_sql;
-- Alter the Internal/ROS Table if primary key columns were passed as a parameter
IF p_pk_columns IS NOT NULL AND p_pk_columns <> '' THEN
EXECUTE 'ALTER TABLE ' || QUOTE_IDENT(p_table_schema) || '.' || QUOTE_IDENT(p_table_name) || ' ADD CONSTRAINT ' || QUOTE_IDENT(p_table_name) || '_pk PRIMARY KEY (' || p_pk_columns || ') ENABLED;';
END IF;
-- Insert 1M rows into the Internal/ROS Table, analyze stats, and generate encodings
EXECUTE 'INSERT INTO ' || QUOTE_IDENT(p_table_schema) || '.' || QUOTE_IDENT(p_table_name) || ' SELECT * FROM ' || QUOTE_IDENT(p_table_schema) || '.' || QUOTE_IDENT(p_table_name) || '_external LIMIT 1000000;';
EXECUTE 'SELECT analyze_statistics(''' || QUOTE_IDENT(p_table_schema) || '.' || QUOTE_IDENT(p_table_name) || ''');';
EXECUTE 'SELECT designer_design_projection_encodings(''' || QUOTE_IDENT(p_table_schema) || '.' || QUOTE_IDENT(p_table_name) || ''', ''/tmp/toss.sql'', TRUE, TRUE);';
-- Truncate the Internal/ROS Table and you are now ready to load all rows
-- Drop the external table
EXECUTE 'TRUNCATE TABLE ' || QUOTE_IDENT(p_table_schema) || '.' || QUOTE_IDENT(p_table_name) || ';';
EXECUTE 'DROP TABLE IF EXISTS ' || QUOTE_IDENT(p_table_schema) || '.' || QUOTE_IDENT(p_table_name) || '_external CASCADE;';
END;
$$;
=> call create_optimized_table('/home/dbadmin/parquet_example/*','public','parquet_table','c1,c2','c1');
create_optimized_table
------------------------
0
(1 row)
=> select export_objects('', 'public.parquet_table');
export_objects
------------------------------------------
CREATE TABLE public.parquet_table
(
c1 int NOT NULL,
c2 varchar(36),
c3 date,
CONSTRAINT parquet_table_pk PRIMARY KEY (c1) ENABLED
);
CREATE PROJECTION public.parquet_table_super /*+createtype(D)*/
(
c1 ENCODING COMMONDELTA_COMP,
c2 ENCODING ZSTD_FAST_COMP,
c3 ENCODING COMMONDELTA_COMP
)
AS
SELECT parquet_table.c1,
parquet_table.c2,
parquet_table.c3
FROM public.parquet_table
ORDER BY parquet_table.c1,
parquet_table.c2
SEGMENTED BY hash(parquet_table.c1, parquet_table.c2) ALL NODES OFFSET 0;
SELECT MARK_DESIGN_KSAFE(0);
(1 row)
Pivoting tables dynamically
The stored procedure unpivot()
takes as input a source table and target table. It unpivots the source table and outputs it into a target table.
This example uses the following table:
=> SELECT * FROM make_the_columns_into_rows;
c1 | c2 | c3 | c4 | c5 | c6
-----+-----+--------------------------------------+----------------------------+----------+----
123 | ABC | cf470c5b-50e3-492a-8483-b9e4f20d195a | 2021-08-24 18:49:40.835802 | 1.72964 | t
567 | EFG | 25ea7636-d924-4b4f-81b5-1e1c884b06e3 | 2021-08-04 18:49:40.835802 | 41.46100 | f
890 | XYZ | f588935a-35a4-4275-9e7f-ebb3986390e3 | 2021-08-29 19:53:39.465778 | 8.58207 | t
(3 rows)
This table contains the following columns:
=> \d make_the_columns_into_rows
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
--------+----------------------------+--------+---------------+------+---------+----------+-------------+-------------
public | make_the_columns_into_rows | c1 | int | 8 | | f | f |
public | make_the_columns_into_rows | c2 | varchar(80) | 80 | | f | f |
public | make_the_columns_into_rows | c3 | uuid | 16 | | f | f |
public | make_the_columns_into_rows | c4 | timestamp | 8 | | f | f |
public | make_the_columns_into_rows | c5 | numeric(10,5) | 8 | | f | f |
public | make_the_columns_into_rows | c6 | boolean | 1 | | f | f |
(6 rows)
The target table has columns from the source table pivoted into rows as key/value pairs. It also has a ROWID
column to tie the key/value pairs back to their original row from the source table:
=> CREATE PROCEDURE unpivot(p_source_table_schema VARCHAR(128), p_source_table_name VARCHAR(128), p_target_table_schema VARCHAR(128), p_target_table_name VARCHAR(128)) AS $$
DECLARE
explode_command VARCHAR(10000);
BEGIN
explode_command := EXECUTE 'SELECT ''explode(string_to_array(''''['''' || '' || LISTAGG(''NVL('' || column_name || ''::VARCHAR, '''''''')'' USING PARAMETERS separator='' || '''','''' || '') || '' || '''']'''')) OVER (PARTITION BY rn)'' explode_command FROM (SELECT table_schema, table_name, column_name, ordinal_position FROM columns ORDER BY table_schema, table_name, ordinal_position LIMIT 10000000) foo WHERE table_schema = ''' || QUOTE_IDENT(p_source_table_schema) || ''' AND table_name = ''' || QUOTE_IDENT(p_source_table_name) || ''';';
EXECUTE 'CREATE TABLE ' || QUOTE_IDENT(p_target_table_schema) || '.' || QUOTE_IDENT(p_target_table_name) || '
AS SELECT rn rowid, column_name key, value FROM (SELECT (ordinal_position - 1) op, column_name
FROM columns WHERE table_schema = ''' || QUOTE_IDENT(p_source_table_schema) || '''
AND table_name = ''' || QUOTE_IDENT(p_source_table_name) || ''') a
JOIN (SELECT rn, ' || explode_command || '
FROM (SELECT ROW_NUMBER() OVER() rn, *
FROM ' || QUOTE_IDENT(p_source_table_schema) || '.' || QUOTE_IDENT(p_source_table_name) || ') foo) b ON b.position = a.op';
END;
$$;
Call the procedure:
=> CALL unpivot('public', 'make_the_columns_into_rows', 'public', 'columns_into_rows');
unpivot
---------
0
(1 row)
=> SELECT * FROM columns_into_rows ORDER BY rowid, key;
rowid | key | value
-------+-----+--------------------------------------
1 | c1 | 123
1 | c2 | ABC
1 | c3 | cf470c5b-50e3-492a-8483-b9e4f20d195a
1 | c4 | 2021-08-24 18:49:40.835802
1 | c5 | 1.72964
1 | c6 | t
2 | c1 | 890
2 | c2 | XYZ
2 | c3 | f588935a-35a4-4275-9e7f-ebb3986390e3
2 | c4 | 2021-08-29 19:53:39.465778
2 | c5 | 8.58207
2 | c6 | t
3 | c1 | 567
3 | c2 | EFG
3 | c3 | 25ea7636-d924-4b4f-81b5-1e1c884b06e3
3 | c4 | 2021-08-04 18:49:40.835802
3 | c5 | 41.46100
3 | c6 | f
(18 rows)
The unpivot()
procedure can handle new columns in the source table as well.
Add a new column z
to the source table, and then unpivot the table with the same procedure:
=> ALTER TABLE make_the_columns_into_rows ADD COLUMN z VARCHAR;
ALTER TABLE
=> UPDATE make_the_columns_into_rows SET z = 'ZZZ' WHERE c1 IN (123, 890);
OUTPUT
--------
2
(1 row)
=> CALL unpivot('public', 'make_the_columns_into_rows', 'public', 'columns_into_rows');
unpivot
---------
0
(1 row)
=> SELECT * FROM columns_into_rows;
rowid | key | value
-------+-----+--------------------------------------
1 | c1 | 567
1 | c2 | EFG
1 | c3 | 25ea7636-d924-4b4f-81b5-1e1c884b06e3
1 | c4 | 2021-08-04 18:49:40.835802
1 | c5 | 41.46100
1 | c6 | f
1 | z | -- new column
2 | c1 | 123
2 | c2 | ABC
2 | c3 | cf470c5b-50e3-492a-8483-b9e4f20d195a
2 | c4 | 2021-08-24 18:49:40.835802
2 | c5 | 1.72964
2 | c6 | t
2 | z | ZZZ -- new column
3 | c1 | 890
3 | c2 | XYZ
3 | c3 | f588935a-35a4-4275-9e7f-ebb3986390e3
3 | c4 | 2021-08-29 19:53:39.465778
3 | c5 | 8.58207
3 | c6 | t
3 | z | ZZZ -- new column
(21 rows)
Machine learning: optimizing AUC estimation
The ROC function can approximate the AUC (area under the curve), the accuracy of which depends on the num_bins
parameter; greater values of num_bins
give you more precise approximations, but may impact performance.
You can use the stored procedure accurate_auc()
to approximate the AUC, which automatically determines the optimal num_bins
value for a given epsilon (error term):
=> CREATE PROCEDURE accurate_auc(relation VARCHAR, observation_col VARCHAR, probability_col VARCHAR, epsilon FLOAT) AS $$
DECLARE
auc_value FLOAT;
previous_auc FLOAT;
nbins INT;
BEGIN
IF epsilon > 0.25 THEN
RAISE EXCEPTION 'epsilon must not be bigger than 0.25';
END IF;
IF epsilon < 1e-12 THEN
RAISE EXCEPTION 'epsilon must be bigger than 1e-12';
END IF;
auc_value := 0.5;
previous_auc := 0; -- epsilon and auc should be always less than 1
nbins := 100;
WHILE abs(auc_value - previous_auc) > epsilon and nbins < 1000000 LOOP
RAISE INFO 'auc_value: %', auc_value;
RAISE INFO 'previous_auc: %', previous_auc;
RAISE INFO 'nbins: %', nbins;
previous_auc := auc_value;
auc_value := EXECUTE 'SELECT auc FROM (select roc(' || QUOTE_IDENT(observation_col) || ',' || QUOTE_IDENT(probability_col) || ' USING parameters num_bins=$1, auc=true) over() FROM ' || QUOTE_IDENT(relation) || ') subq WHERE auc IS NOT NULL' USING nbins;
nbins := nbins * 2;
END LOOP;
RAISE INFO 'Result_auc_value: %', auc_value;
END;
$$;
For example, given the following data in test_data.csv
:
1,0,0.186
1,1,0.993
1,1,0.9
1,1,0.839
1,0,0.367
1,0,0.362
0,1,0.6
1,1,0.726
...
(see test_data.csv for the complete set of data)
You can load the data into table categorical_test_data
as follows:
=> \set datafile '\'/data/test_data.csv\''
=> CREATE TABLE categorical_test_data(obs INT, pred INT, prob FLOAT);
CREATE TABLE
=> COPY categorical_test_data FROM :datafile DELIMITER ',';
Call accurate_auc()
. For this example, the approximated AUC will be within the an epsilon of 0.01:
=> CALL accurate_auc('categorical_test_data', 'obs', 'prob', 0.01);
INFO 2005: auc_value: 0.5
INFO 2005: previous_auc: 0
INFO 2005: nbins: 100
INFO 2005: auc_value: 0.749597423510467
INFO 2005: previous_auc: 0.5
INFO 2005: nbins: 200
INFO 2005: Result_auc_value: 0.750402576489533
test_data.csv
1,0,0.186
1,1,0.993
1,1,0.9
1,1,0.839
1,0,0.367
1,0,0.362
0,1,0.6
1,1,0.726
0,0,0.087
0,0,0.004
0,1,0.562
1,0,0.477
0,0,0.258
1,0,0.143
0,0,0.403
1,1,0.978
1,1,0.58
1,1,0.51
0,0,0.424
0,1,0.546
0,1,0.639
0,1,0.676
0,1,0.639
1,1,0.757
1,1,0.883
1,0,0.301
1,1,0.846
1,0,0.129
1,1,0.76
1,0,0.351
1,1,0.803
1,1,0.527
1,1,0.836
1,0,0.417
1,1,0.656
1,1,0.977
1,1,0.815
1,1,0.869
0,0,0.474
0,0,0.346
1,0,0.188
0,1,0.805
1,1,0.872
1,0,0.466
1,1,0.72
0,0,0.163
0,0,0.085
0,0,0.124
1,1,0.876
0,0,0.451
0,0,0.185
1,1,0.937
1,1,0.615
0,0,0.312
1,1,0.924
1,1,0.638
1,1,0.891
0,1,0.621
1,0,0.421
0,0,0.254
0,0,0.225
1,1,0.577
0,1,0.579
0,1,0.628
0,1,0.855
1,1,0.955
0,0,0.331
1,0,0.298
0,0,0.047
0,0,0.173
1,1,0.96
0,0,0.481
0,0,0.39
0,0,0.088
1,0,0.417
0,0,0.12
1,1,0.871
0,1,0.522
0,0,0.312
1,1,0.695
0,0,0.155
0,0,0.352
1,1,0.561
0,0,0.076
0,1,0.923
1,0,0.169
0,0,0.032
1,1,0.63
0,0,0.126
0,0,0.15
1,0,0.348
0,0,0.188
0,1,0.755
1,1,0.813
0,0,0.418
1,0,0.161
1,0,0.316
0,1,0.558
1,1,0.641
1,0,0.305