PLS_REG
Executes the Partial Least Squares (PLS) regression algorithm on an input relation, and returns a PLS regression model.
Combining aspects of PCA (principal component analysis) and linear regression, the PLS regression algorithm extracts a set of latent components that explain as much covariance as possible between the predictor and response variables, and then performs a regression that predicts response values using the extracted components.
This technique is particularly useful when the number of predictor variables is greater than the number of observations or the predictor variables are highly collinear. If either of these conditions is true of the input relation, ordinary linear regression fails to converge to an accurate model.
The PLS_REG function supports PLS regression with only one response column, often referred to as PLS1. PLS regression with multiple response columns, known as PLS2, is not currently supported.
To make predictions with a PLS model, use the PREDICT_PLS_REG function.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Syntax
PLS_REG ( 'model-name', 'input-relation', 'response-column', 'predictor-columns'
[ USING PARAMETERS param=value[,...] ] )
Arguments
model-name
- Model to create, where
model-name
conforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema. input-relation
- Table or view that contains the training data.
response-column
- Name of the input column that represents the dependent variable or outcome. All values in this column must be numeric.
predictor-columns
Comma-separated list of columns in the input relation that represent independent variables for the model, or asterisk (*) to select all columns. If you select all columns, the argument list for parameter
exclude_columns
must includeresponse-column
, and any columns that are invalid as predictor columns.All predictor columns must be of type numeric or BOOLEAN; otherwise the model is invalid.
Note
All BOOLEAN predictor values are converted to FLOAT values before training: 0 for false, 1 for true. No type checking occurs during prediction, so you can use a BOOLEAN predictor column in training, and during prediction provide a FLOAT column of the same name. In this case, all FLOAT values must be either 0 or 1.
Parameters
exclude_columns
- Comma-separated list of columns from
predictor-columns
to exclude from processing. num_components
- Number of components in the model. The value must be an integer in the range [1, min(N, P)], where N is the number of rows in
input-relation
and P is the number of columns inpredictor-columns
.Default: 2
scale
- Boolean, whether to standardize the response and predictor columns.
Default: True
Model attributes
details
- Information about the model coefficients, including:
coeffNames
: Name of the coefficients, starting with the intercept and following with the names of the predictors in the same order specified in the call.coeff
: Vector of estimated coefficients, with the same order ascoeffNames
.
responses
- Name of the response column.
call_string
- SQL statement used to train the model.
Additional Info
- Additional information about the model, including:
is_scaled
: Whether the input columns were scaled before model training.n_components
: Number of components in the model.rejected_row_count
: Number of rows ininput-relation
that were rejected because they contained an invalid value.accepted_row_count
: Number of rows ininput-relation
accepted for training the model.
Privileges
Non-superusers:
-
CREATE privileges on the schema where the model is created
-
SELECT privileges on the input relation
Examples
The following example trains a PLS regression model with the default number of components:
=> CREATE TABLE pls_data (y float, x1 float, x2 float, x3 float, x4 float, x5 float);
=> COPY pls_data FROM STDIN;
1|2|3|0|2|5
2|3|4|0|4|4
3|4|5|0|8|3
\.
=> SELECT PLS_REG('pls_model', 'pls_data', 'y', 'x1,x2');
WARNING 0: There are not more than 1 meaningful component(s)
PLS_REG
------------------------------------------------------------
Number of components 1.
Accepted Rows: 3 Rejected Rows: 0
(1 row)
For an in-depth example that trains and makes predictions with a PLS model, see PLS regression.