PLS_REG
Executes the Partial Least Squares (PLS) regression algorithm on an input relation, and returns a PLS regression model.
Combining aspects of PCA (principal component analysis) and linear regression, the PLS regression algorithm extracts a set of latent components that explain as much covariance as possible between the predictor and response variables, and then performs a regression that predicts response values using the extracted components.
This technique is particularly useful when the number of predictor variables is greater than the number of observations or the predictor variables are highly collinear. If either of these conditions is true of the input relation, ordinary linear regression fails to converge to an accurate model.
The PLS_REG function supports PLS regression with only one response column, often referred to as PLS1. PLS regression with multiple response columns, known as PLS2, is not currently supported.
To make predictions with a PLS model, use the PREDICT_PLS_REG function.
This is a metafunction. You must call metafunctions in a toplevel SELECT statement.
Syntax
PLS_REG ( 'modelname', 'inputrelation', 'responsecolumn', 'predictorcolumns'
[ USING PARAMETERS param=value[,...] ] )
Arguments
modelname
 Model to create, where
modelname
conforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema. inputrelation
 Table or view that contains the training data.
responsecolumn
 Name of the input column that represents the dependent variable or outcome. All values in this column must be numeric.
predictorcolumns
Commaseparated list of columns in the input relation that represent independent variables for the model, or asterisk (*) to select all columns. If you select all columns, the argument list for parameter
exclude_columns
must includeresponsecolumn
, and any columns that are invalid as predictor columns.All predictor columns must be of type numeric or BOOLEAN; otherwise the model is invalid.
Note
All BOOLEAN predictor values are converted to FLOAT values before training: 0 for false, 1 for true. No type checking occurs during prediction, so you can use a BOOLEAN predictor column in training, and during prediction provide a FLOAT column of the same name. In this case, all FLOAT values must be either 0 or 1.
Parameters
exclude_columns
 Commaseparated list of columns from
predictorcolumns
to exclude from processing. num_components
 Number of components in the model. The value must be an integer in the range [1, min(N, P)], where N is the number of rows in
inputrelation
and P is the number of columns inpredictorcolumns
.Default: 2
scale
 Boolean, whether to standardize the response and predictor columns.
Default: True
Model attributes
details
 Information about the model coefficients, including:
coeffNames
: Name of the coefficients, starting with the intercept and following with the names of the predictors in the same order specified in the call.coeff
: Vector of estimated coefficients, with the same order ascoeffNames
.
responses
 Name of the response column.
call_string
 SQL statement used to train the model.
Additional Info
 Additional information about the model, including:
is_scaled
: Whether the input columns were scaled before model training.n_components
: Number of components in the model.rejected_row_count
: Number of rows ininputrelation
that were rejected because they contained an invalid value.accepted_row_count
: Number of rows ininputrelation
accepted for training the model.
Privileges
Nonsuperusers:

CREATE privileges on the schema where the model is created

SELECT privileges on the input relation
Examples
The following example trains a PLS regression model with the default number of components:
=> CREATE TABLE pls_data (y float, x1 float, x2 float, x3 float, x4 float, x5 float);
=> COPY pls_data FROM STDIN;
123025
234044
345083
\.
=> SELECT PLS_REG('pls_model', 'pls_data', 'y', 'x1,x2');
WARNING 0: There are not more than 1 meaningful component(s)
PLS_REG

Number of components 1.
Accepted Rows: 3 Rejected Rows: 0
(1 row)
For an indepth example that trains and makes predictions with a PLS model, see PLS regression.