tf_model_desc.json overview
Before importing your externally trained TensorFlow models, you must:
- save the model in frozen graph (.pb) format
- create tf_model_desc.json, which describes to your Vertica database how to map the model's inputs and outputs to input/output tables
Conveniently, the script freeze_tf2_model.py, included in the TensorFlow directory of the Machine-Learning-Examples repository (and in opt/vertica/packages/TFIntegration/examples), does both of these automatically. In most cases, the generated tf_model_desc.json can be used as-is, but for more complex datasets and use cases, you might need to edit it.
tf_model_desc.json for an example dataset
The following tf_model_desc.json is generated from the MNIST handwriting dataset used by the TensorFlow example.
{
    "frozen_graph": "mnist_keras.pb",
    "input_desc": [
        {
            "op_name": "image_input",
            "tensor_map": [
                {
                    "idx": 0,
                    "dim": [
                        1,
                        28,
                        28,
                        1
                    ],
                    "col_start": 0
                }
            ]
        }
    ],
    "output_desc": [
        {
            "op_name": "OUTPUT/Softmax",
            "tensor_map": [
                {
                    "idx": 0,
                    "dim": [
                        1,
                        10
                    ],
                    "col_start": 0
                }
            ]
        }
    ]
}
This file describes the structure of the model's inputs and outputs. It must contain a frozen_graph field that matches the filename of the .pb model, an input_desc field, and an output_desc field.
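The requirements above can be checked mechanically before importing a model. The following is a minimal sketch, not part of Vertica or of freeze_tf2_model.py: the helper name check_descriptor and its messages are hypothetical, and it only tests for the three top-level fields and the filename match described above.

```python
import json

# Top-level fields that the text above says the file must contain.
REQUIRED_FIELDS = ("frozen_graph", "input_desc", "output_desc")


def check_descriptor(desc, pb_filename):
    """Return a list of problems found in a parsed tf_model_desc.json.

    check_descriptor is a hypothetical helper written for illustration.
    """
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in desc
    ]
    # frozen_graph must match the filename of the .pb model.
    if "frozen_graph" in desc and desc["frozen_graph"] != pb_filename:
        problems.append(
            f"frozen_graph is {desc['frozen_graph']!r}, expected {pb_filename!r}"
        )
    return problems


# The MNIST descriptor shown above, written compactly.
desc = json.loads("""
{
    "frozen_graph": "mnist_keras.pb",
    "input_desc": [
        {"op_name": "image_input",
         "tensor_map": [{"idx": 0, "dim": [1, 28, 28, 1], "col_start": 0}]}
    ],
    "output_desc": [
        {"op_name": "OUTPUT/Softmax",
         "tensor_map": [{"idx": 0, "dim": [1, 10], "col_start": 0}]}
    ]
}
""")

print(check_descriptor(desc, "mnist_keras.pb"))
```

An empty list means that none of the checked problems were found; it does not guarantee that Vertica will accept the file.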
- input_desc and output_desc: the descriptions of the input and output nodes in the TensorFlow graph. Each of these includes the following fields:
  - op_name: the name of the operation node, which is set when creating and training the model. You can typically retrieve the names of these parameters from tfmodel.inputs and tfmodel.outputs. For example:
      $ print({t.name:t for t in tfmodel.inputs})
      {'image_input:0': <tf.Tensor 'image_input:0' shape=(?, 28, 28, 1) dtype=float32>}
      $ print({t.name:t for t in tfmodel.outputs})
      {'OUTPUT/Softmax:0': <tf.Tensor 'OUTPUT/Softmax:0' shape=(?, 10) dtype=float32>}
    In this case, the respective values for op_name would be the following:
    - input_desc: image_input
    - output_desc: OUTPUT/Softmax
    For a more detailed example of this process, review the code for freeze_tf2_model.py.
  - tensor_map: how to map the tensor to Vertica columns, which can be specified with the following:
    - idx: the index of the tensor (should be 0 for the first input/output, 1 for the second input/output, etc.)
    - dim: the vector holding the dimensions of the tensor; it provides the number of columns
    - col_start (only used if col_idx is not specified): the starting column index. When used with dim, it specifies a range of indices of Vertica columns starting at col_start and ending at col_start+dim; Vertica starts at the column specified by the index col_start and gets the next dim columns
    - col_idx: the indices in the Vertica columns corresponding to the flattened tensors. This allows you to explicitly specify the indices of Vertica columns that couldn't otherwise be specified as a simple range with col_start and dim (e.g. 1, 3, 5, 7)
    - data_type (not shown): the data type of the input or output, one of the following:
      - TF_FLOAT (default)
      - TF_DOUBLE
      - TF_INT8
      - TF_INT16
      - TF_INT32
      - TF_INT64
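The op_name values listed above are the tensor names with the trailing output index removed (image_input:0 becomes image_input). The following is a minimal sketch of that step, assuming the usual TensorFlow "op_name:output_index" naming for tensors; the helper op_name_from_tensor_name is hypothetical and is not taken from freeze_tf2_model.py.

```python
def op_name_from_tensor_name(tensor_name):
    """Strip a trailing ":<index>" from a tensor name to get the op name.

    Hypothetical helper; names without a numeric suffix are returned unchanged.
    """
    op_name, sep, index = tensor_name.rpartition(":")
    if sep and index.isdigit():
        return op_name
    return tensor_name


# Tensor names taken from the example output above.
for name in ("image_input:0", "OUTPUT/Softmax:0"):
    print(name, "->", op_name_from_tensor_name(name))
```

With a real model, the names would come from t.name for each tensor in tfmodel.inputs and tfmodel.outputs, as in the example above.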
Complex tf_model_desc.json example
Below is a more complex example that includes multiple inputs and outputs:
{
    "input_desc": [
        {
            "op_name": "input1",
            "tensor_map": [
                {
                    "idx": 0,
                    "dim": [
                        4
                    ],
                    "col_idx": [
                        0,
                        1,
                        2,
                        3
                    ]
                },
                {
                    "idx": 1,
                    "dim": [
                        2,
                        2
                    ],
                    "col_start": 4
                }
            ]
        },
        {
            "op_name": "input2",
            "tensor_map": [
                {
                    "idx": 0,
                    "dim": [],
                    "col_idx": [
                        8
                    ]
                },
                {
                    "idx": 1,
                    "dim": [
                        2
                    ],
                    "col_start": 9
                }
            ]
        }
    ],
    "output_desc": [
        {
            "op_name": "output",
            "tensor_map": [
                {
                    "idx": 0,
                    "dim": [
                        2
                    ],
                    "col_start": 0
                }
            ]
        }
    ]
}
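To see which Vertica columns each tensor in the example above occupies, the tensor_map entries can be expanded into explicit column indices. The following is a minimal sketch with a hypothetical helper, resolve_columns. It assumes that a tensor is flattened, so the number of columns equals the product of the entries in dim (with an empty dim meaning a single column). That assumption is inferred from the example, where the 2x2 tensor starting at column 4 is followed by a tensor at column 8; it is not a statement of Vertica's implementation.

```python
from math import prod


def resolve_columns(tensor_entry):
    """Return the column indices covered by one tensor_map entry.

    Hypothetical helper. Assumes flattened tensors: column count = product of dim.
    """
    # col_idx lists the columns explicitly and takes precedence over col_start.
    if "col_idx" in tensor_entry:
        return list(tensor_entry["col_idx"])
    size = prod(tensor_entry["dim"])  # prod([]) is 1, i.e. a scalar
    start = tensor_entry["col_start"]
    return list(range(start, start + size))


# input_desc from the complex example above, written compactly.
input_desc = [
    {"op_name": "input1", "tensor_map": [
        {"idx": 0, "dim": [4], "col_idx": [0, 1, 2, 3]},
        {"idx": 1, "dim": [2, 2], "col_start": 4},
    ]},
    {"op_name": "input2", "tensor_map": [
        {"idx": 0, "dim": [], "col_idx": [8]},
        {"idx": 1, "dim": [2], "col_start": 9},
    ]},
]

for node in input_desc:
    for entry in node["tensor_map"]:
        print(node["op_name"], entry["idx"], resolve_columns(entry))
```

Under that assumption, the four input tensors cover columns 0 through 10 of the input table without gaps or overlaps.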