Java example: FileSource
The example shown in this section is a simple UDL Source function named FileSource. This function loads data from files stored on the host's file system (similar to the standard COPY statement). To call FileSource, you must supply a parameter named file that contains the absolute path to one or more files on the host file system. You can specify multiple files as a comma-separated list.
The FileSource function also accepts an optional parameter named nodes that indicates which nodes should load the files. If you do not supply this parameter, the function defaults to loading data on the initiator node only. Because this example is simple, each node loads only the files from its own file system. Any files in the file parameter must exist on all of the hosts in the nodes parameter. The FileSource UDSource attempts to load all of the files in the file parameter on all of the hosts in the nodes parameter.
Generating files
You can use the following Python script to generate files and distribute them to hosts in your Vertica cluster. With these files, you can experiment with the example UDSource function. Running the script requires passwordless SSH logins to copy the files to the other hosts. Therefore, you must run the script using the database administrator account on one of your database hosts.
You call this script by giving it a comma-separated list of hosts to receive the files and a comma-separated list of absolute paths of files to generate. For example:
This script generates files that contain a thousand rows of columns delimited with the pipe character (|). These columns contain an index value, a set of random words, and the node for which the file was generated, as shown in the following output sample:
0|megabits embanks|v_vmart_node0001
1|unneatly|v_vmart_node0001
2|self-precipitation|v_vmart_node0001
3|antihistamine scalados Vatter|v_vmart_node0001
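If you want to produce rows in the same layout from Java rather than Python, the format described above can be sketched as follows. This is not the script referred to in this section. The RowGeneratorSketch class, its method, the word list, and the choice of one to three words per row are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical helper: builds rows shaped like index|random words|node name.
class RowGeneratorSketch {

    static List<String> generateRows(String nodeName, int rowCount,
                                     List<String> words, long seed) {
        Random rng = new Random(seed);
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            // Assumption: each row carries between one and three words.
            int wordCount = 1 + rng.nextInt(3);
            List<String> chosen = new ArrayList<>();
            for (int w = 0; w < wordCount; w++) {
                chosen.add(words.get(rng.nextInt(words.size())));
            }
            // Columns are joined with the pipe character.
            rows.add(i + "|" + String.join(" ", chosen) + "|" + nodeName);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Illustrative word list; a real generator might read a dictionary file.
        List<String> words = Arrays.asList("megabits", "embanks", "unneatly", "antihistamine");
        List<String> rows = generateRows("v_vmart_node0001", 1000, words, 42L);
        // Print only the first few rows.
        for (String row : rows.subList(0, 4)) {
            System.out.println(row);
        }
    }
}
```

Passing a fixed seed keeps the generated rows reproducible between runs, which can make it easier to compare load results across nodes.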
Loading and using the example
Load and use the FileSource UDSource as follows:
Source implementation
The following code shows the source of the FileSource class that reads a file from the host file system. The constructor, which is called by FileSourceFactory.prepareUDSources(), receives the absolute path of the file containing the data to be read. The setup() method opens the file and the destroy() method closes it. The process() method reads from the file into a buffer provided by the instance of the DataBuffer class passed to it as a parameter. If the read operation fills the output buffer, the method returns OUTPUT_NEEDED. This value tells Vertica to call the method again after the next stage of the load has processed the output buffer. If the read does not fill the output buffer, process() returns DONE to indicate that it has finished processing the data source.
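The control flow described above can be tried out without the Vertica SDK. The following is a minimal sketch, not the SDK example itself: StreamState, OutputBuffer, FileSourceSketch, and the readAll() driver are hypothetical stand-ins, and only the setup/process/destroy flow and the OUTPUT_NEEDED and DONE return values follow the description.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-ins for the SDK types; names and fields are assumptions.
enum StreamState { OUTPUT_NEEDED, DONE }

class OutputBuffer {
    final byte[] buf;
    int offset = 0; // number of bytes written so far
    OutputBuffer(int size) { buf = new byte[size]; }
}

class FileSourceSketch {
    private final String filename;
    private RandomAccessFile reader;

    FileSourceSketch(String filename) { this.filename = filename; }

    // Open the file before the first call to process().
    void setup() throws IOException { reader = new RandomAccessFile(filename, "r"); }

    // Close the file when the load finishes.
    void destroy() throws IOException { if (reader != null) reader.close(); }

    // Read into the free part of the buffer and report whether to call again.
    StreamState process(OutputBuffer output) throws IOException {
        int free = output.buf.length - output.offset;
        int read = reader.read(output.buf, output.offset, free);
        if (read > 0) output.offset += read;
        // A filled buffer means more data may remain; a short read or
        // end of file (-1) is treated as the end of the source.
        return (read == free) ? StreamState.OUTPUT_NEEDED : StreamState.DONE;
    }

    // Hypothetical driver: hands the source a fresh buffer until it reports DONE.
    static String readAll(String path, int bufferSize) throws IOException {
        FileSourceSketch source = new FileSourceSketch(path);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        source.setup();
        try {
            StreamState state;
            do {
                OutputBuffer buffer = new OutputBuffer(bufferSize);
                state = source.process(buffer);
                out.write(buffer.buf, 0, buffer.offset);
            } while (state == StreamState.OUTPUT_NEEDED);
        } finally {
            source.destroy();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("filesource", ".txt");
        Files.write(tmp, "0|megabits embanks|v_vmart_node0001\n".getBytes(StandardCharsets.UTF_8));
        // A deliberately small buffer forces several process() calls.
        System.out.print(readAll(tmp.toString(), 8));
        Files.delete(tmp);
    }
}
```

When the file size is an exact multiple of the buffer size, the last full read still returns OUTPUT_NEEDED, and the following call hits end of file and returns DONE without adding data.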
Factory implementation
The following code is a modified version of the example Java UDSource function provided in the Java UDx support package. You can find the full example in /opt/vertica/sdk/examples/JavaUDx/UDLFuctions/com/vertica/JavaLibs/FileSourceFactory.java. Its override of the plan() method verifies that the user supplied the required file parameter. If the user also supplied the optional nodes parameter, this method verifies that the nodes exist in the Vertica cluster. If there is a problem with either parameter, the method throws an exception to return an error to the user. If there are no issues with the parameters, the plan() method stores their values in the plan context object.
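The validation steps described above can be illustrated with plain Java collections. This is a minimal sketch under stated assumptions, not the factory from the support package: PlanSketch, its method signatures, the use of IllegalArgumentException, the Map that stands in for the plan context, and the treatment of nodes as a comma-separated list are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the parameter checks performed during planning.
class PlanSketch {

    // Split a comma-separated value into trimmed, non-empty entries.
    static List<String> splitList(String value) {
        List<String> items = new ArrayList<>();
        for (String item : value.split(",")) {
            String trimmed = item.trim();
            if (!trimmed.isEmpty()) items.add(trimmed);
        }
        return items;
    }

    // Validate the parameters and return a map that plays the role of the plan context.
    static Map<String, List<String>> plan(Map<String, String> params,
                                          List<String> clusterNodes,
                                          String initiatorNode) {
        // The file parameter is required.
        String fileParam = params.get("file");
        if (fileParam == null || splitList(fileParam).isEmpty()) {
            throw new IllegalArgumentException("You must supply the 'file' parameter");
        }

        // The nodes parameter is optional and defaults to the initiator node.
        // Assumption: an empty nodes value is treated as not supplied.
        String nodesParam = params.get("nodes");
        List<String> nodes = (nodesParam == null) ? new ArrayList<>() : splitList(nodesParam);
        if (nodes.isEmpty()) {
            nodes = Collections.singletonList(initiatorNode);
        } else {
            // Every requested node must exist in the cluster.
            for (String node : nodes) {
                if (!clusterNodes.contains(node)) {
                    throw new IllegalArgumentException("Unknown node: " + node);
                }
            }
        }

        // Store the validated values for later stages.
        Map<String, List<String>> context = new HashMap<>();
        context.put("files", splitList(fileParam));
        context.put("nodes", nodes);
        return context;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("file", "/tmp/file1.txt,/tmp/file2.txt");
        params.put("nodes", "v_vmart_node0001");
        List<String> cluster = new ArrayList<>();
        cluster.add("v_vmart_node0001");
        cluster.add("v_vmart_node0002");
        System.out.println(plan(params, cluster, "v_vmart_node0001"));
    }
}
```

Failing during planning, before any file is opened, lets the user see a parameter error immediately instead of partway through a load.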