<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>OpenText Analytics Database 26.2.x – Configuring a tokenizer</title>
    <link>/en/admin/using-text-search/stemmers-and-tokenizers/configuring-tokenizer/</link>
    <description>Recent content in Configuring a tokenizer on OpenText Analytics Database 26.2.x</description>
    <generator>Hugo -- gohugo.io</generator>
    
	  <atom:link href="/en/admin/using-text-search/stemmers-and-tokenizers/configuring-tokenizer/index.xml" rel="self" type="application/rss+xml" />
    
    
      
        
      
    
    
    <item>
      <title>Admin: Tokenizer base configuration</title>
      <link>/en/admin/using-text-search/stemmers-and-tokenizers/configuring-tokenizer/tokenizer-base-config/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/admin/using-text-search/stemmers-and-tokenizers/configuring-tokenizer/tokenizer-base-config/</guid>
      <description>
        
        
        &lt;p&gt;You can choose between two tokenizer base configurations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ngram with position: &lt;code&gt;logNgramTokenizerPositionFactory&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Ngram without position: &lt;code&gt;logNgramTokenizerFactory&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following example creates an Ngram tokenizer without positional relevance:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;o&#34;&gt;=&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;CREATE&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;TRANSFORM&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FUNCTION&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;v_txtindex&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;myNgramTokenizer&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;   &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;AS&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;LANGUAGE&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;C++&amp;#39;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;   &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;NAME&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;logNgramTokenizerFactory&amp;#39;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;   &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LIBRARY&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;v_txtindex&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;logSearchLib&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;NOT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;FENCED&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
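&lt;p&gt;For comparison, a tokenizer with positional relevance can be sketched the same way by swapping in the other factory name from the list above. The function name &lt;code&gt;myNgramPositionTokenizer&lt;/code&gt; is a hypothetical placeholder, and the statement assumes the same library and fencing options as the previous example:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; CREATE TRANSFORM FUNCTION v_txtindex.myNgramPositionTokenizer  -- hypothetical name
   AS LANGUAGE &amp;#39;C++&amp;#39;
   NAME &amp;#39;logNgramTokenizerPositionFactory&amp;#39;  -- Ngram with position
   LIBRARY v_txtindex.logSearchLib NOT FENCED;
&lt;/code&gt;&lt;/pre&gt;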
      </description>
    </item>
    
    <item>
      <title>Admin: Retrieve tokenizer proc_oid</title>
      <link>/en/admin/using-text-search/stemmers-and-tokenizers/configuring-tokenizer/retrievetokenizerproc-oid/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/admin/using-text-search/stemmers-and-tokenizers/configuring-tokenizer/retrievetokenizerproc-oid/</guid>
      <description>
        
        
        &lt;p&gt;After you create the tokenizer, OpenText™ Analytics Database writes the name and proc_oid to the system table vs_procedures. You must retrieve the tokenizer&#39;s proc_oid to perform additional configuration.&lt;/p&gt;
&lt;p&gt;Enter the following query, substituting your own tokenizer name:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT proc_oid FROM vs_procedures WHERE procedure_name = &amp;#39;fooTokenizer&amp;#39;;
&lt;/code&gt;&lt;/pre&gt;
      </description>
    </item>
    
    <item>
      <title>Admin: Set tokenizer parameters</title>
      <link>/en/admin/using-text-search/stemmers-and-tokenizers/configuring-tokenizer/set-tokenizer-parameters/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/admin/using-text-search/stemmers-and-tokenizers/configuring-tokenizer/set-tokenizer-parameters/</guid>
      <description>
        
        
        &lt;p&gt;Use the tokenizer&#39;s proc_oid to configure the tokenizer. See &lt;a href=&#34;../../../../../en/admin/using-text-search/stemmers-and-tokenizers/configuring-tokenizer/#&#34;&gt;Configuring a tokenizer&lt;/a&gt; for more information about getting the proc_oid of your tokenizer. The following examples show how you can configure each of the tokenizer parameters:&lt;/p&gt;
&lt;p&gt;Configure stop words:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT v_txtindex.SET_TOKENIZER_PARAMETER(&amp;#39;stopwordscaseinsensitive&amp;#39;,&amp;#39;for,the&amp;#39; USING PARAMETERS proc_oid=&amp;#39;45035996274128376&amp;#39;);
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Configure major separators:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT v_txtindex.SET_TOKENIZER_PARAMETER(&amp;#39;majorseparators&amp;#39;, E&amp;#39;{}()&amp;amp;[]&amp;#39; USING PARAMETERS proc_oid=&amp;#39;45035996274128376&amp;#39;);
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Configure minor separators:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT v_txtindex.SET_TOKENIZER_PARAMETER(&amp;#39;minorseparators&amp;#39;, &amp;#39;-,$&amp;#39; USING PARAMETERS proc_oid=&amp;#39;45035996274128376&amp;#39;);
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Configure minimum length:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT v_txtindex.SET_TOKENIZER_PARAMETER(&amp;#39;minlength&amp;#39;, &amp;#39;1&amp;#39; USING PARAMETERS proc_oid=&amp;#39;45035996274128376&amp;#39;);
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Configure maximum length:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT v_txtindex.SET_TOKENIZER_PARAMETER(&amp;#39;maxlength&amp;#39;, &amp;#39;140&amp;#39; USING PARAMETERS proc_oid=&amp;#39;45035996274128376&amp;#39;);
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Configure Ngram size:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT v_txtindex.SET_TOKENIZER_PARAMETER(&amp;#39;ngramssize&amp;#39;, &amp;#39;2&amp;#39; USING PARAMETERS proc_oid=&amp;#39;45035996274128376&amp;#39;);
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;lock-tokenizer-parameters&#34;&gt;Lock tokenizer parameters&lt;/h2&gt;
&lt;p&gt;When you finish configuring the tokenizer, set the &lt;code&gt;used&lt;/code&gt; parameter to &lt;code&gt;True&lt;/code&gt;. After you change this setting, you can no longer alter the tokenizer&#39;s parameters. The tokenizer is then ready to use for creating a text index.&lt;/p&gt;
&lt;p&gt;Configure the &lt;code&gt;used&lt;/code&gt; parameter:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT v_txtindex.SET_TOKENIZER_PARAMETER(&amp;#39;used&amp;#39;, &amp;#39;True&amp;#39; USING PARAMETERS proc_oid=&amp;#39;45035996274128376&amp;#39;);
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;see-also&#34;&gt;See also&lt;/h2&gt;
&lt;a href=&#34;../../../../../en/sql-reference/functions/match-and-search-functions/text-search-functions/set-tokenizer-parameter/#&#34;&gt;SET_TOKENIZER_PARAMETER&lt;/a&gt;

      </description>
    </item>
    
    <item>
      <title>Admin: View tokenizer parameters</title>
      <link>/en/admin/using-text-search/stemmers-and-tokenizers/configuring-tokenizer/view-tokenizer-parameters/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/admin/using-text-search/stemmers-and-tokenizers/configuring-tokenizer/view-tokenizer-parameters/</guid>
      <description>
        
        
        &lt;p&gt;After creating a custom tokenizer, you can view the tokenizer&#39;s parameter settings in either of two ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Use the GET_TOKENIZER_PARAMETER function: &lt;a href=&#34;#ViewIndividualTokenizerParameterSettings&#34;&gt;View individual tokenizer parameter settings&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use the READ_CONFIG_FILE function: &lt;a href=&#34;#ViewAllTokenizerParameterSettings&#34;&gt;View all tokenizer parameter settings&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a name=&#34;ViewIndividualTokenizerParameterSettings&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;view-individual-tokenizer-parameter-settings&#34;&gt;View individual tokenizer parameter settings&lt;/h2&gt;
&lt;p&gt;To view an individual parameter setting for a tokenizer, use GET_TOKENIZER_PARAMETER:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT v_txtindex.GET_TOKENIZER_PARAMETER(&amp;#39;majorseparators&amp;#39; USING PARAMETERS proc_oid=&amp;#39;45035996274126984&amp;#39;);
 getTokenizerParameter
-----------------------
 {}()&amp;amp;[]
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For more information, see &lt;a href=&#34;../../../../../en/sql-reference/functions/match-and-search-functions/text-search-functions/get-tokenizer-parameter/#&#34;&gt;GET_TOKENIZER_PARAMETER&lt;/a&gt;.&lt;/p&gt;
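&lt;p&gt;As a sketch, the same call can check whether a tokenizer has already been locked before you try to change its parameters. This assumes that &lt;code&gt;used&lt;/code&gt; can be read by name like the other parameters; the proc_oid is the sample value from the example above:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT v_txtindex.GET_TOKENIZER_PARAMETER(&amp;#39;used&amp;#39; USING PARAMETERS proc_oid=&amp;#39;45035996274126984&amp;#39;);  -- assumption: used is readable by name
&lt;/code&gt;&lt;/pre&gt;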
&lt;p&gt;&lt;a name=&#34;ViewAllTokenizerParameterSettings&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;view-all-tokenizer-parameter-settings&#34;&gt;View all tokenizer parameter settings&lt;/h2&gt;
&lt;p&gt;To view all of the parameter settings for a tokenizer, use READ_CONFIG_FILE:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT v_txtindex.READ_CONFIG_FILE( USING PARAMETERS proc_oid=&amp;#39;45035996274126984&amp;#39;) OVER();
               config_key | config_value
--------------------------+---------------
          majorseparators | {}()&amp;amp;[]
                maxlength | 140
                minlength | 1
          minorseparators | -,$
 stopwordscaseinsensitive | for,the
                     type | 1
                     used | true
(7 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If the &lt;code&gt;used&lt;/code&gt; parameter is set to &lt;code&gt;False&lt;/code&gt;, you can view only the parameters that have been applied to the tokenizer.

&lt;div class=&#34;alert admonition note&#34; role=&#34;alert&#34;&gt;
&lt;h4 class=&#34;admonition-head&#34;&gt;Note&lt;/h4&gt;

The database automatically supplies the value for &lt;code&gt;type&lt;/code&gt;, unless you are using an Ngram tokenizer, in which case you can set it yourself.

&lt;/div&gt;&lt;/p&gt;
&lt;p&gt;For more information, see &lt;a href=&#34;../../../../../en/sql-reference/functions/match-and-search-functions/text-search-functions/read-config-file/#&#34;&gt;READ_CONFIG_FILE&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Admin: Delete tokenizer config file</title>
      <link>/en/admin/using-text-search/stemmers-and-tokenizers/configuring-tokenizer/delete-tokenizer-config-file/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/admin/using-text-search/stemmers-and-tokenizers/configuring-tokenizer/delete-tokenizer-config-file/</guid>
      <description>
        
        
        &lt;p&gt;Use the DELETE_TOKENIZER_CONFIG_FILE function to delete a tokenizer configuration file. This function does not delete the User-Defined Transform Function (UDTF); it deletes only the configuration file associated with the UDTF.&lt;/p&gt;
&lt;p&gt;Delete the tokenizer configuration file when the &lt;code&gt;used&lt;/code&gt; parameter is set to &lt;code&gt;False&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT v_txtindex.DELETE_TOKENIZER_CONFIG_FILE(USING PARAMETERS proc_oid=&amp;#39;45035996274127086&amp;#39;);
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Delete the tokenizer configuration file with the &lt;code&gt;confirm&lt;/code&gt; parameter set to &lt;code&gt;True&lt;/code&gt;. This setting forces deletion of the configuration file even if the &lt;code&gt;used&lt;/code&gt; parameter is also set to &lt;code&gt;True&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT v_txtindex.DELETE_TOKENIZER_CONFIG_FILE(USING PARAMETERS proc_oid=&amp;#39;45035996274126984&amp;#39;, confirm=&amp;#39;true&amp;#39;);
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For more information, see &lt;a href=&#34;../../../../../en/sql-reference/functions/match-and-search-functions/text-search-functions/delete-tokenizer-config-file/#&#34;&gt;DELETE_TOKENIZER_CONFIG_FILE&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
  </channel>
</rss>
