Python example: string tokenizer
The following example shows a transform function that breaks an input string into tokens (based on whitespace). It is similar to the tokenizer examples for C++ and Java.
Loading and using the example
Create the library and function:
=> CREATE LIBRARY pyudtf AS '/home/dbadmin/udx/tokenize.py' LANGUAGE 'Python';
CREATE LIBRARY
=> CREATE TRANSFORM FUNCTION tokenize AS NAME 'StringTokenizerFactory' LIBRARY pyudtf;
CREATE TRANSFORM FUNCTION
You can then use the function in SQL statements, for example:
=> CREATE TABLE words (w VARCHAR);
CREATE TABLE
=> COPY words FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> this is a test of the python udtf
>> \.
=> SELECT tokenize(w) OVER () FROM words;
token
----------
this
is
a
test
of
the
python
udtf
(8 rows)
Setup
All Python UDxs must import the Vertica SDK, which is provided as the vertica_sdk module.
UDTF Python code
The following code defines the tokenizer and its factory.
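The listing itself does not appear in this copy of the page, so what follows is a minimal sketch reconstructed from the statements above, not necessarily the original code. It assumes the Vertica Python SDK interfaces (TransformFunction.processPartition, the factory's getPrototype, getReturnType, and createTransformFunction, and the block reader and writer methods getString, setString, and next). The class name StringTokenizer and the output column name "token" are assumptions chosen to match the factory name and the query output shown earlier. The ImportError fallback is not part of the SDK; it only lets the file be imported outside the database for a quick check.

```python
try:
    import vertica_sdk  # available only inside Vertica's Python UDx environment
except ImportError:
    # Assumption: stand-in base classes so this sketch can be exercised
    # outside the database. They are NOT part of the Vertica SDK.
    import types
    vertica_sdk = types.SimpleNamespace(
        TransformFunction=object,
        TransformFunctionFactory=object,
    )


class StringTokenizer(vertica_sdk.TransformFunction):
    """Emits one output row per whitespace-separated token of each input row."""

    def processPartition(self, server_interface, input, output):
        while True:
            # str.split() with no arguments splits on runs of whitespace
            # and drops leading and trailing whitespace.
            for token in input.getString(0).split():
                output.setString(0, token)
                output.next()      # commit the current output row
            if not input.next():   # advance to the next input row, if any
                break


class StringTokenizerFactory(vertica_sdk.TransformFunctionFactory):
    """Factory named in the CREATE TRANSFORM FUNCTION statement."""

    def getPrototype(self, server_interface, arg_types, return_type):
        # One VARCHAR argument in, one VARCHAR column out.
        arg_types.addVarchar()
        return_type.addVarchar()

    def getReturnType(self, server_interface, arg_types, return_type):
        # Reuse the input column's type (including its length) for the output.
        # The column name "token" is an assumption based on the sample output.
        return_type.addColumn(arg_types.getColumnType(0), "token")

    def createTransformFunction(self, server_interface):
        return StringTokenizer()
```

Because the tokenizer relies on str.split() with no separator, consecutive spaces, tabs, and newlines are treated as a single delimiter, and an empty or all-whitespace input row produces no output rows.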