PLS_REG

Executes PLS regression on an input relation, and returns a PLS regression model.

Executes the Partial Least Squares (PLS) regression algorithm on an input relation, and returns a PLS regression model.

Combining aspects of PCA (principal component analysis) and linear regression, the PLS regression algorithm extracts a set of latent components that explain as much covariance as possible between the predictor and response variables, and then performs a regression that predicts response values using the extracted components.

This technique is particularly useful when the number of predictor variables is greater than the number of observations or the predictor variables are highly collinear. If either of these conditions is true of the input relation, ordinary linear regression fails to converge to an accurate model.

The PLS_REG function supports PLS regression with only one response column, often referred to as PLS1. PLS regression with multiple response columns, known as PLS2, is not currently supported.

To make predictions with a PLS model, use the PREDICT_PLS_REG function.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Syntax

PLS_REG ( 'model-name', 'input-relation', 'response-column', 'predictor-columns'
        [ USING PARAMETERS param=value[,...] ] )

Arguments

model-name
Model to create, where model-name conforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.
input-relation
Table or view that contains the training data.
response-column
Name of the input column that represents the dependent variable or outcome. All values in this column must be numeric.
predictor-columns

Comma-separated list of columns in the input relation that represent independent variables for the model, or asterisk (*) to select all columns. If you select all columns, the argument list for parameter exclude_columns must include response-column, and any columns that are invalid as predictor columns.

All predictor columns must be of type numeric or BOOLEAN; otherwise the model is invalid.

Parameters

exclude_columns
Comma-separated list of columns from predictor-columns to exclude from processing.
num_components
Number of components in the model. The value must be an integer in the range [1, min(N, P)], where N is the number of rows in input-relation and P is the number of columns in predictor-columns.

Default: 2

scale
Boolean, whether to standardize the response and predictor columns.

Default: True

Model attributes

details
Information about the model coefficients, including:
  • coeffNames: Name of the coefficients, starting with the intercept and following with the names of the predictors in the same order specified in the call.
  • coeff: Vector of estimated coefficients, with the same order as coeffNames.
responses
Name of the response column.
call_string
SQL statement used to train the model.
Additional Info
Additional information about the model, including:
  • is_scaled: Whether the input columns were scaled before model training.
  • n_components: Number of components in the model.
  • rejected_row_count: Number of rows in input-relation that were rejected because they contained an invalid value.
  • accepted_row_count: Number of rows in input-relation accepted for training the model.

Privileges

Non-superusers:

  • CREATE privileges on the schema where the model is created

  • SELECT privileges on the input relation

Examples

The following example trains a PLS regression model with the default number of components:

=> CREATE TABLE pls_data (y float, x1 float, x2 float, x3 float, x4 float, x5 float);

=> COPY pls_data FROM STDIN;
1|2|3|0|2|5
2|3|4|0|4|4
3|4|5|0|8|3
\.

=> SELECT PLS_REG('pls_model', 'pls_data', 'y', 'x1,x2');
WARNING 0:  There are not more than 1 meaningful component(s)
                        PLS_REG                           
------------------------------------------------------------
 Number of components 1.
Accepted Rows: 3  Rejected Rows: 0
(1 row)

For an in-depth example that trains and makes predictions with a PLS model, see PLS regression.

See also