tf_model_desc.json overview
Before importing your externally trained TensorFlow models, you must:
- save the model in frozen graph (.pb) format
- create tf_model_desc.json, which describes to your Vertica database how to map the model's inputs and outputs to input/output tables
Conveniently, the script freeze_tf2_model.py, included in the TensorFlow directory of the Machine-Learning-Examples repository (and in opt/vertica/packages/TFIntegration/examples), does both of these automatically. In most cases, the generated tf_model_desc.json can be used as-is, but for more complex datasets and use cases, you might need to edit it.
The contents of the tf_model_desc.json file depend on whether you provide a column-type of 0 or 1 when calling the freeze_tf2_model.py script. If column-type is 0, the imported model accepts primitive input and output columns. If it is 1, the model accepts complex input and output columns.
Models that accept primitive types
The following tf_model_desc.json is generated from the MNIST handwriting dataset used by the TensorFlow example.
{
  "frozen_graph": "mnist_keras.pb",
  "input_desc": [
    {
      "op_name": "image_input",
      "tensor_map": [
        {
          "idx": 0,
          "dim": [1, 28, 28, 1],
          "col_start": 0
        }
      ]
    }
  ],
  "output_desc": [
    {
      "op_name": "OUTPUT/Softmax",
      "tensor_map": [
        {
          "idx": 0,
          "dim": [1, 10],
          "col_start": 0
        }
      ]
    }
  ]
}
This file describes the structure of the model's inputs and outputs. It must contain a frozen_graph field that matches the filename of the .pb model, an input_desc field, and an output_desc field.
- input_desc and output_desc: the descriptions of the input and output nodes in the TensorFlow graph. Each of these includes the following fields:
  - op_name: the name of the operation node, which is set when creating and training the model. You can typically retrieve these names from tfmodel.inputs and tfmodel.outputs. For example:
    >>> print({t.name:t for t in tfmodel.inputs})
    {'image_input:0': <tf.Tensor 'image_input:0' shape=(?, 28, 28, 1) dtype=float32>}
    >>> print({t.name:t for t in tfmodel.outputs})
    {'OUTPUT/Softmax:0': <tf.Tensor 'OUTPUT/Softmax:0' shape=(?, 10) dtype=float32>}
    In this case, the respective values for op_name would be the following:
    - input_desc: image_input
    - output_desc: OUTPUT/Softmax
    For a more detailed example of this process, review the code for freeze_tf2_model.py.
  - tensor_map: how to map the tensor to Vertica columns, which can be specified with the following:
    - idx: the index of the output tensor under the given operation (0 for the first output, 1 for the second output, and so on).
    - dim: the vector holding the dimensions of the tensor; it provides the number of columns.
    - col_start (only used if col_idx is not specified): the starting column index. When used with dim, it specifies a range of Vertica column indices starting at col_start and ending at col_start + flattened_tensor_dimension (exclusive). That is, Vertica starts at the column specified by the index col_start and gets the next flattened_tensor_dimension columns.
    - col_idx: the indices of the Vertica columns corresponding to the flattened tensors. This allows you to explicitly specify the indices of Vertica columns that couldn't otherwise be specified as a simple range with col_start and dim (for example, 1, 3, 5, 7).
    - data_type (not shown): the data type of the input or output, one of the following:
      - TF_FLOAT (default)
      - TF_DOUBLE
      - TF_INT8
      - TF_INT16
      - TF_INT32
      - TF_INT64
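As a small illustration of how op_name and idx relate to the tensor names printed by tfmodel.inputs and tfmodel.outputs, the following sketch splits a name such as image_input:0 at its final colon. The helper split_tensor_name is hypothetical and is not part of Vertica or TensorFlow; it assumes that names follow the operation:index form seen in the printed output.

```python
def split_tensor_name(tensor_name: str) -> tuple[str, int]:
    """Split 'operation:index' into (op_name, idx)."""
    # rpartition splits at the LAST colon, so slashes in the operation
    # name (for example 'OUTPUT/Softmax') are left untouched.
    op_name, sep, index = tensor_name.rpartition(":")
    if not sep or not op_name or not index.isdigit():
        raise ValueError(f"expected 'operation:index', got {tensor_name!r}")
    return op_name, int(index)


print(split_tensor_name("image_input:0"))     # ('image_input', 0)
print(split_tensor_name("OUTPUT/Softmax:0"))  # ('OUTPUT/Softmax', 0)
```

The first element is the value to use for op_name, and the second corresponds to idx in the matching tensor_map entry.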
Below is a more complex example that includes multiple inputs and outputs:
{
  "input_desc": [
    {
      "op_name": "input1",
      "tensor_map": [
        {
          "idx": 0,
          "dim": [4],
          "col_idx": [0, 1, 2, 3]
        },
        {
          "idx": 1,
          "dim": [2, 2],
          "col_start": 4
        }
      ]
    },
    {
      "op_name": "input2",
      "tensor_map": [
        {
          "idx": 0,
          "dim": [],
          "col_idx": [8]
        },
        {
          "idx": 1,
          "dim": [2],
          "col_start": 9
        }
      ]
    }
  ],
  "output_desc": [
    {
      "op_name": "output",
      "tensor_map": [
        {
          "idx": 0,
          "dim": [2],
          "col_start": 0
        }
      ]
    }
  ]
}
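To see which Vertica columns each tensor in a descriptor like this one occupies, the following sketch resolves every tensor_map entry to a list of column indices. It is an illustration only, not Vertica's implementation: the helper tensor_columns is hypothetical, and it assumes that flattened_tensor_dimension is the product of the values in dim (so an empty dim, a scalar, occupies one column).

```python
import json
from math import prod


def tensor_columns(entry: dict) -> list[int]:
    """Return the Vertica column indices covered by one tensor_map entry."""
    if "col_idx" in entry:
        # Explicit indices win; col_start is only used when col_idx is absent.
        return list(entry["col_idx"])
    # Assumption: the flattened size is the product of dim; prod([]) is 1.
    size = prod(entry["dim"])
    start = entry["col_start"]
    return list(range(start, start + size))


# The input_desc section of the example above, in compact form.
input_desc = json.loads("""
[
  {"op_name": "input1", "tensor_map": [
    {"idx": 0, "dim": [4], "col_idx": [0, 1, 2, 3]},
    {"idx": 1, "dim": [2, 2], "col_start": 4}]},
  {"op_name": "input2", "tensor_map": [
    {"idx": 0, "dim": [], "col_idx": [8]},
    {"idx": 1, "dim": [2], "col_start": 9}]}
]
""")

for op in input_desc:
    for entry in op["tensor_map"]:
        print(op["op_name"], entry["idx"], tensor_columns(entry))
# input1 0 [0, 1, 2, 3]
# input1 1 [4, 5, 6, 7]
# input2 0 [8]
# input2 1 [9, 10]
```

Under the same assumption, the MNIST input tensor with dim [1, 28, 28, 1] and col_start 0 would span 784 columns, indices 0 through 783.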
Models that accept complex types
The following tf_model_desc.json is generated from a model that inputs and outputs complex type columns:
{
  "column_type": "complex",
  "frozen_graph": "frozen_graph.pb",
  "input_tensors": [
    {
      "name": "x:0",
      "data_type": "int32",
      "dims": [-1, 1]
    },
    {
      "name": "x_1:0",
      "data_type": "int32",
      "dims": [-1, 2]
    }
  ],
  "output_tensors": [
    {
      "name": "Identity:0",
      "data_type": "float32",
      "dims": [-1, 1]
    },
    {
      "name": "Identity_1:0",
      "data_type": "float32",
      "dims": [-1, 2]
    }
  ]
}
As with models that accept primitive types, this file describes the structure of the model's inputs and outputs and contains a frozen_graph field that matches the filename of the .pb model. However, instead of an input_desc field and an output_desc field, models with complex types have an input_tensors field and an output_tensors field, as well as a column_type field.
- column_type: specifies that the model accepts input and output columns of complex types. When imported into Vertica, the model must make predictions using the PREDICT_TENSORFLOW_SCALAR function.
- input_tensors and output_tensors: the descriptions of the input and output tensors in the TensorFlow graph. Each of these fields includes the following sub-fields:
  - name: the name of the tensor for which information is listed. The name is in the format operation:tensor-number, where operation is the operation that contains the tensor and tensor-number is the index of the tensor under the given operation.
  - data_type: the data type of the elements in the input or output tensor, one of the following:
    - TF_FLOAT (default)
    - TF_DOUBLE
    - TF_INT8
    - TF_INT16
    - TF_INT32
    - TF_INT64
  - dims: the dimensions of the tensor. Each input/output tensor is contained in a 1D ARRAY in the input/output ROW column.
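If you edit tf_model_desc.json by hand, a quick check of the top-level fields can catch a descriptor that mixes the two layouts. The following is a minimal sketch, not how Vertica validates the file: the helper missing_fields is hypothetical, and it assumes that the presence of a column_type field is what marks a descriptor as the complex-type layout.

```python
# Top-level fields named in this page for each descriptor layout.
PRIMITIVE_FIELDS = {"frozen_graph", "input_desc", "output_desc"}
COMPLEX_FIELDS = {"frozen_graph", "column_type", "input_tensors", "output_tensors"}


def missing_fields(desc: dict) -> set[str]:
    """Return the expected top-level fields that the descriptor lacks."""
    # Assumption: a column_type field signals the complex-type layout.
    expected = COMPLEX_FIELDS if "column_type" in desc else PRIMITIVE_FIELDS
    return expected - set(desc)


complex_desc = {
    "column_type": "complex",
    "frozen_graph": "frozen_graph.pb",
    "input_tensors": [],
    "output_tensors": [],
}
incomplete_desc = {"frozen_graph": "mnist_keras.pb", "input_desc": []}

print(sorted(missing_fields(complex_desc)))     # []
print(sorted(missing_fields(incomplete_desc)))  # ['output_desc']
```

In practice you would load the descriptor with json.load from the file on disk before passing it to the check.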