Detecting constraint violations
ANALYZE_CONSTRAINTS
analyzes and reports on table constraint violations within a given schema. You can use ANALYZE_CONSTRAINTS
to analyze an individual table, specific columns within a table, or all tables within a schema. You typically use this function on tables where primary key, unique, or check constraints are not enforced. You can also use ANALYZE_CONSTRAINTS
to check the referential integrity of foreign keys.
In the simplest use case, ANALYZE_CONSTRAINTS
is a two-step process:
-
Run
ANALYZE_CONSTRAINTS
on the desired table.ANALYZE_CONSTRAINTS
reports all rows that violate constraints. -
Use the report to fix violations.
You can also use ANALYZE_CONSTRAINTS
in the following cases:
-
Analyze tables with enforced constraints.
-
Detect constraint violations introduced by a
COPY
operation and address them before the copy transaction is committed.
Analyzing tables with enforced constraints
If constraints are enforced on a table and a DML operation returns constraint violations, Vertica reports on a limited number of constraint violations before it rolls back the operation. This can be problematic when you try to load a large amount of data that includes many constraint violations—for example, duplicate key values. In this case, use ANALYZE_CONSTRAINTS
as follows:
-
Temporarily disable enforcement of all constraints on the target table.
-
Run the DML operation.
-
After the operation returns, run
ANALYZE_CONSTRAINTS
on the table.ANALYZE_CONSTRAINTS
reports all rows that violate constraints. -
Use the report to fix the violations.
-
Re-enable constraint enforcement on the table.
Using ANALYZE_CONSTRAINTS in a COPY transaction
Use ANALYZE_CONSTRAINTS
to detect and address constraint violations introduced by a
COPY
operation as follows:
-
Copy the source data into the target table with
COPY...NO COMMIT
. -
Call
ANALYZE_CONSTRAINTS
to check the target table with its uncommitted updates. -
If
ANALYZE_CONSTRAINTS
reports constraint violations, roll back the copy transaction. -
Use the report to fix the violations, and then re-execute the copy operation.
For details about using COPY...NO COMMIT
, see Using transactions to stage a load.
Distributing constraint analysis
ANALYZE_CONSTRAINTS
runs as an atomic operation—that is, it does not return until it evaluates all constraints within the specified scope. For example, if you run ANALYZE_CONSTRAINTS
against a table, the function returns only after it evaluates all column constraints against column data. If the table has a large number of columns with constraints, and contains a very large data set, ANALYZE_CONSTRAINTS
is liable to exhaust all available memory and return with an out-of-memory error. This risk is increased by running ANALYZE_CONSTRAINTS
against multiple tables simultaneously, or against the entire database.
You can minimize the risk of out-of-memory errors by setting configuration parameter MaxConstraintChecksPerQuery (by default set to -1) to a positive integer. For example, if this parameter is set to 20, and you run ANALYZE_CONSTRAINTS
on a table that contains 38 column constraints, the function divides its work into two separate queries. ANALYZE_CONSTRAINTS
creates a temporary table for loading and compiling results from the two queries, and then returns the composite result set.
MaxConstraintChecksPerQuery can only be set at the database level, and can incur a certain amount of overhead. When set, commits to the temporary table created by ANALYZE_CONSTRAINTS
cause all pending database transactions to auto-commit. Setting this parameter to a reasonable number such as 20 should minimize its performance impact.