Computing flex table keys

After loading data into a flex table, you can determine the set of keys that exist in the raw column (the map data).

After loading data into a flex table, you can determine the set of keys that exist in the __raw__ column (the map data). Two helper functions compute keys from flex table map data:

Using COMPUTE_FLEXTABLE_KEYS

During execution, this function calculates the following information for the flex keys table columns:

Column Description
key_name The name of the virtual column (key).
frequency The number of times the key occurs in the map.
data_type_guess The type guess that the helper functions determine for each distinct key in the map data. The function determines the type of each non-string value, depending on the length of the key, and whether the key includes nested maps. Changing the default value of the EnableBetterFlexTypeGuessing configuration parameter to 0 (OFF) results in the functions determining all flex table keys as string types ([LONG]VARCHAR) or ([LONG] VARBINARY).

Determining key data types

By default, using COMPUTE_FLEXTABLE_KEYS determines non-string key values from the __raw__ column LONG VARBINARY type. The non-string keys include these data types (and others listed in Data types):

  • BOOLEAN

  • INTEGER

  • FLOAT

  • TIMESTAMP

  • DATE

Assigning flex key data types

Use the sample CSV data in this section to compare the results of using or not using the EnableBetterFlexTypeGuessing configuration parameter. When the parameter is ON, the function determines key non-string data types in your map data more accurately. The default for the parameter is 1 (ON).

Year,Quarter,Region,Species,Grade,Pond Value,Number of Quotes,Available
2015,1,2 - Northwest Oregon & Willamette,Douglas-fir,1P,$615.12 ,12,No
2015,1,2 - Northwest Oregon & Willamette,Douglas-fir,SM,$610.78 ,12,Yes
2015,1,2 - Northwest Oregon & Willamette,Douglas-fir,2S,$596.00 ,20,Yes
2015,1,2 - Northwest Oregon & Willamette,Hemlock,P,$520.00 ,6,Yes
2015,1,2 - Northwest Oregon & Willamette,Hemlock,SM,$510.00 ,6,No
2015,1,2 - Northwest Oregon & Willamette,Hemlock,2S,$490.00 ,14,No

To compare the data type assignment results, complete the following steps:

  1. Save the CSV data file (here, as trees.csv).

  2. Create a flex table (trees) and load trees.csv using the fcsvparser:

    => CREATE FLEX TABLE trees();
    => COPY trees FROM '/home/dbadmin/tempdat/trees.csv' PARSER fcsvparser();
    
  3. Use COMPUTE_FLEXTABLE_KEYS with the trees flex table.

    => SELECT COMPUTE_FLEXTABLE_KEYS('trees');
                COMPUTE_FLEXTABLE_KEYS
    -----------------------------------------------
     Please see public.trees_keys for updated keys
    (1 row)
    
  4. Query the trees_keys table output.:

    => SELECT * FROM trees_keys;
         key_name     | frequency | data_type_guess
    ------------------+-----------+-----------------
     Year             |         6 | Integer
     Quarter          |         6 | Integer
     Region           |         6 | Varchar(66)
     Available        |         6 | Boolean
     Number of Quotes |         6 | Integer
     Grade            |         6 | Varchar(20)
     Species          |         6 | Varchar(22)
     Pond Value       |         6 | Numeric(8,3)
    (8 rows)
    
  5. Set the EnableBetterFlexTypeGuessing parameter to 0 (OFF).

  6. Call COMPUTE_FLEXTABLE_KEYS with the trees flex table again.

  7. Query the trees_keys table to compare the data_type_guess values with the previous results. Without the configuration parameter set, all of the non-string data types are VARCHARS of various lengths:

    
    => SELECT * FROM trees_keys;
        key_name     | frequency | data_type_guess
    ------------------+-----------+-----------------
     Year             |         6 | varchar(20)
     Quarter          |         6 | varchar(20)
     Region           |         6 | varchar(66)
     Available        |         6 | varchar(20)
     Grade            |         6 | varchar(20)
     Number of Quotes |         6 | varchar(20)
     Pond Value       |         6 | varchar(20)
     Species          |         6 | varchar(22)
    (8 rows)
    
  8. To maintain accurate results for non-string data types, set the EnableBetterFlexTypeGuessing parameter back to 1 (ON).

For more information about the EnableBetterFlexTypeGuessing configuration parameter, see EnableBetterFlexTypeGuessing.

Calculating key value column widths

The COMPUTE_FLEXTABLE_KEYS function determines the column width for keys by determining the length of the largest value for each key, multiplied by the FlexTableDataTypeGuessMultiplier factor.

The next example shows the results of populating the _keys table after creating a flex table (darkdata1) and loading data. The column widths are shown in parentheses, where applicable, after the value of the data_type_guess column:


=> SELECT compute_flextable_keys('darkdata1');
            compute_flextable_keys
--------------------------------------------------
 Please see public.darkdata1_keys for updated keys
(1 row)

=> SELECT * from darkdata1_keys;
                         key_name                         | frequency |   data_type_guess
----------------------------------------------------------+-----------+----------------------
 created_at                                               |         8 | TimestampTz
 delete.status.id_str                                     |         4 | Integer
 delete.status.user_id                                    |         4 | Integer
 entities.hashtags                                        |         8 | long varbinary(186)
 favorited                                                |         8 | Boolean
 id_str                                                   |         8 | Integer
 in_reply_to_screen_name                                  |         8 | Varchar(24)
 retweeted_status.contributors                            |         1 | Varchar(20)
 retweeted_status.coordinates                             |         1 | Varchar(20)
 retweeted_status.created_at                              |         1 | TimestampTz
.
.
.
(125 rows)