POISSON_REG

Executes Poisson regression on an input relation, and returns a Poisson regression model.

You can export the resulting Poisson regression model in VERTICA_MODELS or PMML format to apply it on data outside Vertica. You can also train a Poisson regression model elsewhere, then import it to Vertica in PMML format to apply it on data inside Vertica.
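As a rough sketch of the export and import workflow described above, the following calls use the EXPORT_MODELS and IMPORT_MODELS meta-functions. The directory /home/dbadmin/exported_models is a hypothetical path, and myModel is the model name used in the Examples section; adjust both to your environment.

```sql
-- Export the trained model in PMML format.
-- The output directory is a hypothetical path chosen for illustration.
SELECT EXPORT_MODELS('/home/dbadmin/exported_models', 'myModel'
       USING PARAMETERS category='PMML');

-- Import a PMML model into Vertica. Run this where the model name is not
-- already taken, for example in another database or schema.
SELECT IMPORT_MODELS('/home/dbadmin/exported_models/myModel'
       USING PARAMETERS category='PMML');
```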

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

POISSON_REG ( 'model-name', 'input-table', 'response-column', 'predictor-columns'
        [ USING PARAMETERS
              [exclude_columns = 'excluded-columns']
              [, optimizer = 'optimizer-method']
              [, regularization = 'regularization-method']
              [, epsilon = epsilon-value]
              [, max_iterations = iterations]
              [, lambda = lambda-value]
              [, fit_intercept = boolean-value] ] )

Arguments

model-name
Identifies the model to create, where model-name conforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.
input-table
Table or view that contains the training data for building the model. If the input relation is defined in Hive, use SYNC_WITH_HCATALOG_SCHEMA to sync the hcatalog schema, and then run the machine learning function.
response-column
Name of the input column that represents the dependent variable or outcome. All values in this column must be numeric; otherwise the model is invalid.
predictor-columns

Comma-separated list of columns in the input relation that represent independent variables for the model, or asterisk (*) to select all columns. If you select all columns, the argument list for parameter exclude_columns must include response-column, and any columns that are invalid as predictor columns.

All predictor columns must be of type numeric or BOOLEAN; otherwise the model is invalid.
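A minimal sketch of selecting all columns with an asterisk follows. It assumes that the numericFaithful table from the Examples section also contains a non-predictor column named id; that column name and the model name myModelAll are assumptions made for illustration. The response column is listed in exclude_columns, as required when you select all columns.

```sql
-- Use every column as a predictor except the response column (eruptions)
-- and an assumed identifier column (id) that is not a valid predictor.
SELECT POISSON_REG('myModelAll', 'numericFaithful', 'eruptions', '*'
       USING PARAMETERS exclude_columns='eruptions, id');
```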

Parameters

exclude_columns
Comma-separated list of columns from predictor-columns to exclude from processing.
optimizer
Optimizer method used to train the model. The currently supported method is Newton.
regularization
Method of regularization, one of the following:
  • None (default)

  • L2

epsilon
FLOAT in the range (0.0, 1.0), the error value at which to stop training. Training stops if either the relative change in Poisson deviance is less than or equal to epsilon or if the number of iterations exceeds max_iterations.

Default: 1e-6

max_iterations
INTEGER in the range (0, 1000000), the maximum number of training iterations. Training stops if either the number of iterations exceeds max_iterations or the relative change in Poisson deviance is less than or equal to epsilon.
lambda
FLOAT ≥ 0, specifies the regularization strength.

Default: 1.0

fit_intercept
Boolean, specifies whether the model includes an intercept. If set to false, the model is trained without an intercept.

Default: True
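The parameters above can be combined in a single call. The following sketch trains a model with L2 regularization; the model name myModelL2 and the specific values for lambda, epsilon, and max_iterations are illustrative choices within the documented ranges, not recommendations.

```sql
-- Train with L2 regularization and explicit stopping criteria.
-- All parameter values here are illustrative.
SELECT POISSON_REG('myModelL2', 'numericFaithful', 'eruptions', 'waiting'
       USING PARAMETERS optimizer='Newton',
                        regularization='L2',
                        lambda=0.5,
                        epsilon=1e-8,
                        max_iterations=200,
                        fit_intercept=true);
```

Training stops at whichever limit is reached first: the relative change in Poisson deviance falling to epsilon or below, or the iteration count reaching max_iterations.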

Model attributes

data
Data for the function, including:
  • coeffNames: Names of the coefficients, starting with the intercept, followed by the names of the predictors in the order specified in the call.

  • coeff: Vector of estimated coefficients, with the same order as coeffNames

  • stdErr: Vector of the standard error of the coefficients, with the same order as coeffNames

  • zValue (for logistic and Poisson regression): Vector of z-values of the coefficients, in the same order as coeffNames

  • tValue (for linear regression): Vector of t-values of the coefficients, in the same order as coeffNames

  • pValue: Vector of p-values of the coefficients, in the same order as coeffNames

regularization
Type of regularization used when training the model.
lambda
Regularization parameter. Higher values enforce stronger regularization. This value must be nonnegative.
iterations
Number of iterations that training actually ran before it converged or reached max_iterations.
skippedRows
Number of rows of the input relation that were skipped because they contained an invalid value.
processedRows
Total number of input relation rows minus skippedRows.
callStr
Value of all input arguments specified when the function was called.
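One way to inspect these attributes after training is with the GET_MODEL_SUMMARY and GET_MODEL_ATTRIBUTE meta-functions, sketched below for the model myModel from the Examples section. Calling GET_MODEL_ATTRIBUTE without attr_name lists the attributes available for the model, which avoids assuming their exact names.

```sql
-- Print a readable summary of the trained model.
SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='myModel');

-- List the attributes stored for the model. Pass attr_name with one of
-- the listed names to retrieve a single attribute.
SELECT GET_MODEL_ATTRIBUTE(USING PARAMETERS model_name='myModel');
```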

Examples

=> SELECT POISSON_REG('myModel', 'numericFaithful', 'eruptions', 'waiting' USING PARAMETERS epsilon=1e-8);
poisson_reg
---------------------------
Finished in 7 iterations

(1 row)
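After training, the model can be applied to data with PREDICT_POISSON_REG. The following sketch scores the same table that was used for training, purely for illustration; the column alias predicted_eruptions is an arbitrary name.

```sql
-- Apply the trained model to the predictor column and compare the
-- prediction with the observed response.
SELECT waiting,
       eruptions,
       PREDICT_POISSON_REG(waiting USING PARAMETERS model_name='myModel')
           AS predicted_eruptions
FROM numericFaithful
LIMIT 10;
```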

See also