以 PMML 格式导出 Vertica 模型
您可以利用 Vertica 中的内置分布式算法来训练机器学习模型。在某些情况下,您可能希望在 Vertica 之外(例如在边缘节点上)使用这些模型进行预测。现在,您可以将模型导出为 PMML 格式,并使用导出后的模型通过一个支持读取和评估 PMML 模型的库或平台进行预测。
这里有一个在 Vertica 中训练模型后以 PMML 格式导出的示例。下图显示了示例的工作流程。我们使用 vsql 来运行该示例。
假设您想在名为“患者”的关系中训练数据的逻辑回归模型,以预测患者在接受治疗后的第二次发作和特质焦虑。
训练后,模型显示在名为 V_CATALOG.MODELS 的系统表中,其中列出了 Vertica 中存档的 ML 模型。
=> -- Training a logistic regression model on a training_data
=> SELECT logistic_reg('myModel', 'patients', 'second_attack', 'treatment, trait_anxiety');
logistic_reg
---------------------------
Finished in 5 iterations
(1 row)
=> -- Looking at the models table
=> SELECT model_name, schema_name, category, model_type, create_time, size FROM models;
model_name | schema_name | category | model_type | create_time | size
------------+-------------+----------------+---------------------+-------------------------------+------
myModel | public | VERTICA_MODELS | LOGISTIC_REGRESSION | 2020-07-28 00:05:18.441958-04 | 1845
(1 row)
可以使用 GET_MODEL_SUMMARY 函数来查看模型摘要。
=> -- Looking at the summary of the model
=> \t
Showing only tuples.
=> SELECT get_model_summary(USING PARAMETERS model_name='myModel');
=======
details
=======
predictor |coefficient|std_err |z_value |p_value
-------------+-----------+--------+--------+--------
Intercept | -6.36347 | 3.21390|-1.97998| 0.04771
treatment | -1.02411 | 1.17108|-0.87450| 0.38185
trait_anxiety| 0.11904 | 0.05498| 2.16527| 0.03037
==============
regularization
==============
type| lambda
----+--------
none| 1.00000
===========
call_string
===========
logistic_reg('public.myModel', 'patients', '"second_attack"', 'treatment, trait_anxiety'
USING PARAMETERS optimizer='newton', epsilon=1e-06, max_iterations=100, regularization='none', lambda=1, alpha=0.5)
===============
Additional Info
===============
Name |Value
------------------+-----
iteration_count | 5
rejected_row_count| 0
accepted_row_count| 20
您还可以使用 GET_MODEL_ATTRIBUTE 函数检索模型的属性。
=> \t
Tuples only is off.
=> -- The list of the attributes of the model
=> SELECT get_model_attribute(USING PARAMETERS model_name='myModel');
attr_name | attr_fields | #_of_rows
--------------------+---------------------------------------------------+-----------
details | predictor, coefficient, std_err, z_value, p_value | 3
regularization | type, lambda | 1
iteration_count | iteration_count | 1
rejected_row_count | rejected_row_count | 1
accepted_row_count | accepted_row_count | 1
call_string | call_string | 1
(6 rows)
=> -- Returning the coefficients of the model in a tabular format
=> SELECT get_model_attribute(USING PARAMETERS model_name='myModel', attr_name='details');
predictor | coefficient | std_err | z_value | p_value
---------------+-------------------+--------------------+--------------------+--------------------
Intercept | -6.36346994178182 | 3.21390452471434 | -1.97998101463435 | 0.0477056620380991
treatment | -1.02410605239327 | 1.1710801464903 | -0.874496980810833 | 0.381847663704613
trait_anxiety | 0.119044916668605 | 0.0549791755747139 | 2.16527285875412 | 0.0303667955962211
(3 rows)
您可以在一个简单的语句中使用 EXPORT_MODELS 函数将模型导出为 PMML 格式,如下所示。
=> -- Exporting the model as PMML
=> SELECT export_models('/data/username/temp', 'myModel' USING PARAMETERS category='PMML');
export_models
---------------
Success
(1 row)