tf_model_desc.json overview

Before importing your externally trained TensorFlow models, you must:

  • save the model in frozen graph (.pb) format

  • create tf_model_desc.json, which describes to your Vertica database how to map its inputs and outputs to input/output tables

Conveniently, the script freeze_tf2_model.py, included in the TensorFlow directory of the Machine-Learning-Examples repository (and in /opt/vertica/packages/TFIntegration/examples), does both automatically. In most cases, you can use the generated tf_model_desc.json as-is, but for more complex datasets and use cases, you might need to edit it.

The contents of the tf_model_desc.json file depend on whether you provide a column-type of 0 or 1 when calling the freeze_tf2_model.py script. If column-type is 0, the imported model accepts primitive input and output columns. If it is 1, the model accepts complex input and output columns.

Models that accept primitive types

The following tf_model_desc.json is generated from the MNIST handwriting dataset used by the TensorFlow example.

{
    "frozen_graph": "mnist_keras.pb",
    "input_desc": [
        {
            "op_name": "image_input",
            "tensor_map": [
                {
                    "idx": 0,
                    "dim": [
                        1,
                        28,
                        28,
                        1
                    ],
                    "col_start": 0
                }
            ]
        }
    ],
    "output_desc": [
        {
            "op_name": "OUTPUT/Softmax",
            "tensor_map": [
                {
                    "idx": 0,
                    "dim": [
                        1,
                        10
                    ],
                    "col_start": 0
                }
            ]
        }
    ]
}

This file describes the structure of the model's inputs and outputs. It must contain a frozen_graph field that matches the filename of the .pb model, an input_desc field, and an output_desc field.
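As a quick local sanity check before importing, the required top-level fields can be verified with a few lines of Python. This is only a sketch: the helper name missing_fields is hypothetical, the descriptor is a compacted copy of the MNIST example above, and the check is not the validation that Vertica itself performs.

```python
import json

# Top-level fields that the text above says a primitive-type descriptor must contain.
REQUIRED_FIELDS = ("frozen_graph", "input_desc", "output_desc")


def missing_fields(desc):
    """Return the required top-level fields that are absent from desc."""
    return [name for name in REQUIRED_FIELDS if name not in desc]


# Compacted copy of the MNIST descriptor shown above (normally read from
# tf_model_desc.json with json.load).
text = """
{
  "frozen_graph": "mnist_keras.pb",
  "input_desc": [
    {"op_name": "image_input",
     "tensor_map": [{"idx": 0, "dim": [1, 28, 28, 1], "col_start": 0}]}
  ],
  "output_desc": [
    {"op_name": "OUTPUT/Softmax",
     "tensor_map": [{"idx": 0, "dim": [1, 10], "col_start": 0}]}
  ]
}
"""

desc = json.loads(text)
print(missing_fields(desc))  # prints [] when nothing is missing
```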

  • input_desc and output_desc: the descriptions of the input and output nodes in the TensorFlow graph. Each of these includes the following fields:
    • op_name: the name of the operation node, which is set when creating and training the model. You can typically retrieve these names from tfmodel.inputs and tfmodel.outputs. For example:

      
      >>> print({t.name:t for t in tfmodel.inputs})
      {'image_input:0': <tf.Tensor 'image_input:0' shape=(?, 28, 28, 1) dtype=float32>}
      
      >>> print({t.name:t for t in tfmodel.outputs})
      {'OUTPUT/Softmax:0': <tf.Tensor 'OUTPUT/Softmax:0' shape=(?, 10) dtype=float32>}
      

      In this case, the respective values for op_name would be the following.

      • input_desc: image_input

      • output_desc: OUTPUT/Softmax

      For a more detailed example of this process, review the code for freeze_tf2_model.py.

    • tensor_map: how to map the tensor to Vertica columns, which can be specified with the following:

      • idx: the index of the output tensor under the given operation (should be 0 for the first output, 1 for the second output, etc.).

      • dim: the vector holding the dimensions of the tensor. The product of the dimensions is the size of the flattened tensor, which is the number of Vertica columns the tensor maps to.

      • col_start (only used if col_idx is not specified): the starting column index. When used with dim, it specifies a contiguous range of Vertica columns: Vertica starts at the column with index col_start and uses the next flattened_tensor_dimension columns, so the range ends at col_start + flattened_tensor_dimension - 1.

      • col_idx: the indices of the Vertica columns corresponding to the flattened tensor. This allows you to explicitly specify Vertica column indices that cannot be expressed as a simple range with col_start and dim (for example, 1, 3, 5, 7).

      • data_type (not shown): the data type of the input or output, one of the following:

        • TF_FLOAT (default)

        • TF_DOUBLE

        • TF_INT8

        • TF_INT16

        • TF_INT32

        • TF_INT64
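The col_start and col_idx rules above can be sketched in Python to see which Vertica columns a single tensor_map entry covers. The function name columns_for_tensor is hypothetical and the code is an illustration of the rules as described here, not Vertica's implementation. It assumes that an empty dim list denotes a scalar that occupies one column.

```python
from math import prod


def columns_for_tensor(entry):
    """Return the Vertica column indices covered by one tensor_map entry."""
    # Size of the flattened tensor. prod([]) is 1, so an empty dim list
    # (assumed here to be a scalar) occupies a single column.
    size = prod(entry["dim"])

    # col_idx takes precedence; col_start is only used when col_idx is absent.
    if "col_idx" in entry:
        cols = list(entry["col_idx"])
        if len(cols) != size:
            raise ValueError(f"col_idx lists {len(cols)} columns, expected {size}")
        return cols

    start = entry["col_start"]
    return list(range(start, start + size))


# The MNIST input tensor from the example above: 1 * 28 * 28 * 1 values.
mnist_input = {"idx": 0, "dim": [1, 28, 28, 1], "col_start": 0}
cols = columns_for_tensor(mnist_input)
print(cols[0], cols[-1], len(cols))  # 0 783 784
```

With col_start, the last column used is col_start plus the flattened size minus one, which is why the 784-value MNIST tensor ends at column 783.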

Below is a more complex example that includes multiple inputs and outputs:

{
    "input_desc": [
        {
            "op_name": "input1",
            "tensor_map": [
                {
                    "idx": 0,
                    "dim": [
                        4
                    ],
                    "col_idx": [
                        0,
                        1,
                        2,
                        3
                    ]
                },
                {
                    "idx": 1,
                    "dim": [
                        2,
                        2
                    ],
                    "col_start": 4
                }
            ]
        },
        {
            "op_name": "input2",
            "tensor_map": [
                {
                    "idx": 0,
                    "dim": [],
                    "col_idx": [
                        8
                    ]
                },
                {
                    "idx": 1,
                    "dim": [
                        2
                    ],
                    "col_start": 9
                }
            ]
        }
    ],
    "output_desc": [
        {
            "op_name": "output",
            "tensor_map": [
                {
                    "idx": 0,
                    "dim": [
                        2
                    ],
                    "col_start": 0
                }
            ]
        }
    ]
}

Models that accept complex types

The following tf_model_desc.json is generated from a model that inputs and outputs complex type columns:

{
    "column_type": "complex",
    "frozen_graph": "frozen_graph.pb",
    "input_tensors": [
        {
            "name": "x:0",
            "data_type": "int32",
            "dims": [
                -1,
                1
            ]
        },
        {
            "name": "x_1:0",
            "data_type": "int32",
            "dims": [
                -1,
                2
            ]
        }
    ],
    "output_tensors": [
        {
            "name": "Identity:0",
            "data_type": "float32",
            "dims": [
                -1,
                1
            ]
        },
        {
            "name": "Identity_1:0",
            "data_type": "float32",
            "dims": [
                -1,
                2
            ]
        }
    ]
}

As with models that accept primitive types, this file describes the structure of the model's inputs and outputs and contains a frozen_graph field that matches the filename of the .pb model. However, instead of an input_desc field and an output_desc field, models with complex types have an input_tensors field and an output_tensors field, as well as a column_type field.

  • column_type: specifies that the model accepts input and output columns of complex types. After the model is imported into Vertica, you must use the PREDICT_TENSORFLOW_SCALAR function to make predictions with it.

  • input_tensors and output_tensors: the descriptions of the input and output tensors in the TensorFlow graph. Each of these fields includes the following sub-fields:

    • name: the name of the tensor for which information is listed. The name is in the format of operation:tensor-number, where operation is the operation that contains the tensor and tensor-number is the index of the tensor under the given operation.

    • data_type: the data type of the elements in the input or output tensor, one of the following:

      • TF_FLOAT (default)

      • TF_DOUBLE

      • TF_INT8

      • TF_INT16

      • TF_INT32

      • TF_INT64

    • dims: the dimensions of the tensor. Each input/output tensor is contained in a 1D ARRAY in the input/output ROW column.
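The operation:tensor-number naming format can be illustrated with a small Python helper that splits a tensor name into its two parts. The helper split_tensor_name is hypothetical and is shown only as a sketch. The names it processes are the ones from the example above.

```python
def split_tensor_name(name):
    """Split 'operation:tensor-number' into (operation, tensor_number)."""
    # Split on the last colon so that the operation name is kept whole.
    operation, sep, number = name.rpartition(":")
    if not sep or not operation or not number.isdigit():
        raise ValueError(f"not in operation:tensor-number form: {name!r}")
    return operation, int(number)


# Tensor names taken from the complex-type example above.
for name in ("x:0", "x_1:0", "Identity:0", "Identity_1:0"):
    print(split_tensor_name(name))
```

The same split relates the two file layouts: for the primitive-type MNIST model, the tensor name image_input:0 corresponds to op_name image_input with idx 0.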

See also