<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>OpenText Analytics Database 26.2.x – Data preparation</title>
    <link>/en/sql-reference/functions/ml-functions/data-preparation/</link>
    <description>Recent content in Data preparation on OpenText Analytics Database 26.2.x</description>
    <generator>Hugo -- gohugo.io</generator>
    
	  <atom:link href="/en/sql-reference/functions/ml-functions/data-preparation/index.xml" rel="self" type="application/rss+xml" />
    
    
      
        
      
    
    
    <item>
      <title>Sql-Reference: BALANCE</title>
      <link>/en/sql-reference/functions/ml-functions/data-preparation/balance/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/sql-reference/functions/ml-functions/data-preparation/balance/</guid>
      <description>
        
        
        &lt;p&gt;Returns a view with an equal distribution of the input data based on the response_column.&lt;/p&gt;
&lt;p&gt;This is a meta-function. You must call meta-functions in a top-level &lt;a href=&#34;../../../../../en/sql-reference/statements/select/#&#34;&gt;SELECT&lt;/a&gt; statement.&lt;/p&gt;

&lt;h2 id=&#34;behavior-type&#34;&gt;Behavior type&lt;/h2&gt;
&lt;a class=&#34;glosslink&#34; href=&#34;../../../../../en/glossary/volatile-functions/&#34; title=&#34;&#34;&gt;Volatile&lt;/a&gt;
&lt;h2 id=&#34;syntax&#34;&gt;Syntax&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;BALANCE ( &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;output-view&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;input-relation&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;response-column&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;balance-method&lt;/span&gt;&amp;#39;
       [ USING PARAMETERS sampling_ratio=&lt;span class=&#34;code-variable&#34;&gt;ratio&lt;/span&gt; ] )
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;arguments&#34;&gt;Arguments&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;output-view&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The name of the view where OpenText™ Analytics Database saves the balanced data from the input relation.

&lt;div class=&#34;alert admonition note&#34; role=&#34;alert&#34;&gt;
&lt;h4 class=&#34;admonition-head&#34;&gt;Note&lt;/h4&gt;

The view that results from this function employs a random function, so its contents can differ each time it is used in a query. To make operations on the view predictable, store it in a regular table.

&lt;/div&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-relation&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The table or view that contains the data the function uses to create a more balanced data set. If the input relation is defined in Hive, use 
&lt;code&gt;&lt;a href=&#34;../../../../../en/sql-reference/functions/hadoop-functions/sync-with-hcatalog-schema/#&#34;&gt;SYNC_WITH_HCATALOG_SCHEMA&lt;/a&gt;&lt;/code&gt; to sync the &lt;code&gt;hcatalog&lt;/code&gt; schema, and then run the machine learning function.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;response-column&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Name of the input column that represents the dependent variable, of type VARCHAR or INTEGER.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;balance-method&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Specifies a method to select data from the minority and majority classes, one of the following:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;hybrid_sampling&lt;/code&gt;: Performs over-sampling and under-sampling on different classes so each class is equally represented.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;over_sampling&lt;/code&gt;: Over-samples all classes except the largest class, up to the cardinality of the largest class.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;under_sampling&lt;/code&gt;: Under-samples all classes except the smallest class, down to the cardinality of the smallest class.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;weighted_sampling&lt;/code&gt;: An alias of &lt;code&gt;under_sampling&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;parameters&#34;&gt;Parameters&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;ratio&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The desired ratio between the majority class and the minority class. This value has no effect when used with balance method &lt;code&gt;hybrid_sampling&lt;/code&gt;.
&lt;p&gt;&lt;strong&gt;Default:&lt;/strong&gt; 1.0&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
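&lt;p&gt;For example, to keep the majority class at roughly twice the size of the minority class, pass a ratio of 2 (a sketch; the output view name &lt;code&gt;bugs_2to1&lt;/code&gt; is illustrative, and the input table is the one created in the Examples section below):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT BALANCE(&amp;#39;bugs_2to1&amp;#39;, &amp;#39;backyard_bugs&amp;#39;, &amp;#39;bug_type&amp;#39;, &amp;#39;under_sampling&amp;#39;
       USING PARAMETERS sampling_ratio=2.0);
&lt;/code&gt;&lt;/pre&gt;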
&lt;h2 id=&#34;privileges&#34;&gt;Privileges&lt;/h2&gt;
&lt;p&gt;Non-superusers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;SELECT privileges on the input relation&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CREATE privileges on the output view schema&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;examples&#34;&gt;Examples&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; CREATE TABLE backyard_bugs (id identity, bug_type int, finder varchar(20));
CREATE TABLE

=&amp;gt; COPY backyard_bugs FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
&amp;gt;&amp;gt; 1|Ants
&amp;gt;&amp;gt; 1|Beetles
&amp;gt;&amp;gt; 3|Ladybugs
&amp;gt;&amp;gt; 3|Ants
&amp;gt;&amp;gt; 3|Beetles
&amp;gt;&amp;gt; 3|Caterpillars
&amp;gt;&amp;gt; 2|Ladybugs
&amp;gt;&amp;gt; 3|Ants
&amp;gt;&amp;gt; 3|Beetles
&amp;gt;&amp;gt; 1|Ladybugs
&amp;gt;&amp;gt; 3|Ladybugs
&amp;gt;&amp;gt; \.

=&amp;gt; SELECT bug_type, COUNT(bug_type) FROM backyard_bugs GROUP BY bug_type;
 bug_type | COUNT
----------+-------
        2 |     1
        1 |     3
        3 |     7
(3 rows)

=&amp;gt; SELECT BALANCE(&amp;#39;backyard_bugs_balanced&amp;#39;, &amp;#39;backyard_bugs&amp;#39;, &amp;#39;bug_type&amp;#39;, &amp;#39;under_sampling&amp;#39;);
         BALANCE
--------------------------
 Finished in 1 iteration

(1 row)

=&amp;gt; SELECT bug_type, COUNT(bug_type) FROM backyard_bugs_balanced GROUP BY bug_type;
 bug_type | COUNT
----------+-------
        2 |     1
        1 |     2
        3 |     1
(3 rows)
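
-- BALANCE uses a random function, so the contents of the view can change
-- from query to query. To freeze the sample, store it in a regular table
-- (the table name below is illustrative):
=&amp;gt; CREATE TABLE backyard_bugs_fixed AS SELECT * FROM backyard_bugs_balanced;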
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;see-also&#34;&gt;See also&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;../../../../../en/data-analysis/ml-predictive-analytics/data-preparation/balancing-imbalanced-data/#&#34;&gt;Balancing imbalanced data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Sql-Reference: CHI_SQUARED</title>
      <link>/en/sql-reference/functions/ml-functions/data-preparation/chi-squared/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/sql-reference/functions/ml-functions/data-preparation/chi-squared/</guid>
      <description>
        
        
        &lt;p&gt;Computes the conditional chi-square independence test on two categorical variables to find the likelihood that the two variables are independent. To condition the independence test on another set of variables, you can partition the data on these variables using a &lt;a href=&#34;../../../../../en/sql-reference/language-elements/window-clauses/window-partition-clause/&#34;&gt;PARTITION BY&lt;/a&gt; clause.&lt;/p&gt;

&lt;div class=&#34;alert admonition tip&#34; role=&#34;alert&#34;&gt;
&lt;h4 class=&#34;admonition-head&#34;&gt;Tip&lt;/h4&gt;

If a categorical column is not of a &lt;a href=&#34;../../../../../en/sql-reference/data-types/numeric-data-types/&#34;&gt;numeric data type&lt;/a&gt;, you can use the &lt;a href=&#34;../../../../../en/sql-reference/functions/mathematical-functions/hash/#&#34;&gt;HASH&lt;/a&gt; function to convert it into a column of type INT, where each category is mapped to a unique integer. However, note that NULL values are hashed to zero, so they will be included in the test instead of skipped by the function.

&lt;/div&gt;
&lt;p&gt;This function is a &lt;a href=&#34;../../../../../en/extending/developing-udxs/transform-functions-udtfs/multiphasetransformfunctionfactory-class/&#34;&gt;multi-phase transform function&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;syntax&#34;&gt;Syntax&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;CHI_SQUARED( &lt;span class=&#34;code-variable&#34;&gt;&#39;x-column&#39;&lt;/span&gt;, &lt;span class=&#34;code-variable&#34;&gt;&#39;y-column&#39;&lt;/span&gt; 
    [ USING PARAMETERS &lt;span class=&#34;code-variable&#34;&gt;param&lt;/span&gt;=&lt;span class=&#34;code-variable&#34;&gt;value&lt;/span&gt;[,...] ] )
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;arguments&#34;&gt;Arguments&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;x-column&lt;/code&gt;&lt;/em&gt;, &lt;em&gt;&lt;code&gt;y-column&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Columns in the input relation to be tested for dependency with each other. These columns must contain categorical data in numeric format.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;parameters&#34;&gt;Parameters&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;x_cardinality&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Integer in the range [1, 20], the cardinality of &lt;em&gt;x-column&lt;/em&gt;. If the cardinality of &lt;em&gt;x-column&lt;/em&gt; is less than the default value of 20, setting this parameter can decrease the amount of memory used by the function.
&lt;p&gt;&lt;strong&gt;Default:&lt;/strong&gt; 20&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;y_cardinality&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Integer in the range [1, 20], the cardinality of &lt;em&gt;y-column&lt;/em&gt;. If the cardinality of &lt;em&gt;y-column&lt;/em&gt; is less than the default value of 20, setting this parameter can decrease the amount of memory used by the function.
&lt;p&gt;&lt;strong&gt;Default:&lt;/strong&gt; 20&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;alpha&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Float in the range (0.0, 1.0), the significance level. If the returned &lt;code&gt;pvalue&lt;/code&gt; is less than this value, the null hypothesis, which assumes the variables are independent, is rejected.
&lt;p&gt;&lt;strong&gt;Default:&lt;/strong&gt; 0.05&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;returns&#34;&gt;Returns&lt;/h2&gt;
&lt;p&gt;The function returns two values:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pvalue&lt;/code&gt; (float): the probability of observing the data under the null hypothesis that the two variables are independent. If this value is greater than the &lt;code&gt;alpha&lt;/code&gt; parameter value, the null hypothesis is not rejected and the variables are considered independent.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;independent&lt;/code&gt; (boolean): true if the variables are independent; otherwise, false.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;privileges&#34;&gt;Privileges&lt;/h2&gt;
&lt;p&gt;SELECT privileges on the input relation&lt;/p&gt;
&lt;h2 id=&#34;examples&#34;&gt;Examples&lt;/h2&gt;
&lt;p&gt;The following examples use the &lt;code&gt;titanic&lt;/code&gt; dataset from the machine learning example data. If you have not downloaded these datasets, see &lt;a href=&#34;../../../../../en/data-analysis/ml-predictive-analytics/download-ml-example-data/#&#34;&gt;Download the machine learning example data&lt;/a&gt; for instructions.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;titanic_training&lt;/code&gt; table contains data related to passengers on the Titanic, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pclass&lt;/code&gt;: the ticket class of the passenger, ranging from 1st class to 3rd class&lt;/li&gt;
&lt;li&gt;&lt;code&gt;survived&lt;/code&gt;: whether the passenger survived, where 1 is yes and 0 is no&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gender&lt;/code&gt;: gender of the passenger&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sibling_and_spouse_count&lt;/code&gt;: number of siblings and spouses aboard the Titanic&lt;/li&gt;
&lt;li&gt;&lt;code&gt;embarkation_point&lt;/code&gt;: port of embarkation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To test whether the survival of a passenger is dependent on their ticket class, run the following chi-square test:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;o&#34;&gt;=&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;SELECT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CHI_SQUARED&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;pclass&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;survived&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;USING&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;PARAMETERS&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;x_cardinality&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;y_cardinality&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;alpha&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;05&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;OVER&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FROM&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span 
class=&#34;n&#34;&gt;titanic_training&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;pvalue&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;|&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;independent&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;--------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;|&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;row&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With a returned &lt;code&gt;pvalue&lt;/code&gt; of zero, the null hypothesis is rejected and you can conclude that the &lt;code&gt;survived&lt;/code&gt; and &lt;code&gt;pclass&lt;/code&gt; variables are dependent. To test whether this outcome is conditional on the gender of the passenger, partition by the &lt;code&gt;gender&lt;/code&gt; column in the OVER clause:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;o&#34;&gt;=&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;SELECT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CHI_SQUARED&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;pclass&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;survived&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;USING&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;PARAMETERS&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;x_cardinality&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;y_cardinality&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;OVER&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;PARTITION&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;BY&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;gender&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FROM&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;titanic&lt;/span&gt;&lt;span 
class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;pvalue&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;|&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;independent&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;--------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;|&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;row&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As the &lt;code&gt;pvalue&lt;/code&gt; is still zero, it is clear that the dependence of the &lt;code&gt;pclass&lt;/code&gt; and &lt;code&gt;survived&lt;/code&gt; variables is not conditional on the gender of the passenger.&lt;/p&gt;
&lt;p&gt;If one of the categorical columns that you want to test is not a numeric type, use the &lt;a href=&#34;../../../../../en/sql-reference/functions/mathematical-functions/hash/#&#34;&gt;HASH&lt;/a&gt; function to convert it into type INT:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;o&#34;&gt;=&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;SELECT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CHI_SQUARED&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;sibling_and_spouse_count&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;HASH&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;embarkation_point&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;USING&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;PARAMETERS&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;alpha&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;05&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;OVER&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FROM&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;titanic_training&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;       &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;pvalue&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;       &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;|&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;independent&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;--------------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0753039994044853&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;|&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;t&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;row&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The returned &lt;code&gt;pvalue&lt;/code&gt; is greater than &lt;code&gt;alpha&lt;/code&gt;, meaning the null hypothesis is not rejected and the &lt;code&gt;sibling_and_spouse_count&lt;/code&gt; and &lt;code&gt;embarkation_point&lt;/code&gt; variables are considered independent.&lt;/p&gt;
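&lt;p&gt;Because &lt;code&gt;HASH&lt;/code&gt; maps NULL to zero, NULL categories are counted as a real category in the test. If you want NULL rows skipped instead, you can filter them out before hashing (a sketch of the same query with a simple WHERE clause added):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT CHI_SQUARED(sibling_and_spouse_count, HASH(embarkation_point) USING PARAMETERS alpha=0.05)
       OVER() FROM titanic_training WHERE embarkation_point IS NOT NULL;
&lt;/code&gt;&lt;/pre&gt;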

      </description>
    </item>
    
    <item>
      <title>Sql-Reference: CORR_MATRIX</title>
      <link>/en/sql-reference/functions/ml-functions/data-preparation/corr-matrix/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/sql-reference/functions/ml-functions/data-preparation/corr-matrix/</guid>
      <description>
        
        
&lt;p&gt;Takes an input relation with numeric columns and calculates the Pearson correlation coefficient between each pair of its input columns. The function is implemented as a multi-phase transform function.&lt;/p&gt;
&lt;h2 id=&#34;syntax&#34;&gt;Syntax&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;CORR_MATRIX ( &lt;span class=&#34;code-variable&#34;&gt;input-columns&lt;/span&gt; ) OVER()
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;arguments&#34;&gt;Arguments&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;A comma-separated list of the columns in the input table. The input columns can be of any numeric type or BOOL, but they will be converted internally to FLOAT. The number of input columns must be more than 1 and not more than 1600.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;returns&#34;&gt;Returns&lt;/h2&gt;
&lt;p&gt;CORR_MATRIX returns the correlation matrix in triplet format. That is, each pair-wise correlation is identified by three returned columns: the name of the first variable, the name of the second variable, and the correlation value of the pair. The function also returns two extra columns, &lt;code&gt;number_of_ignored_input_rows&lt;/code&gt; and &lt;code&gt;number_of_processed_input_rows&lt;/code&gt;, which indicate, respectively, the number of input rows ignored and the number used to calculate the corresponding correlation value. Any input pair containing NULL, Inf, or NaN is ignored.&lt;/p&gt;
&lt;p&gt;The correlation matrix is symmetric, with a value of 1 on all diagonal elements; in principle, the function could return only the elements above the diagonal (the upper triangle). Nevertheless, it returns the entire matrix to simplify later operations. The number of output rows is therefore:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;(&lt;span class=&#34;code-variable&#34;&gt;#input-columns&lt;/span&gt;)^2
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The first two output columns are of type VARCHAR(128), the third one is of type FLOAT, and the last two are of type INT.&lt;/p&gt;
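&lt;p&gt;Because the matrix is symmetric, a query on the output can keep just the upper triangle by comparing the variable names (a sketch; &lt;code&gt;corr_results&lt;/code&gt; is an illustrative table created from the function&#39;s output, using the iris columns from the example below):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; CREATE TABLE corr_results AS
       SELECT CORR_MATRIX(&amp;#34;Sepal.Length&amp;#34;, &amp;#34;Sepal.Width&amp;#34;, &amp;#34;Petal.Length&amp;#34;, &amp;#34;Petal.Width&amp;#34;) OVER() FROM iris;
=&amp;gt; SELECT * FROM corr_results WHERE variable_name_1 &amp;lt; variable_name_2;
&lt;/code&gt;&lt;/pre&gt;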
&lt;h2 id=&#34;notes&#34;&gt;Notes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The contents of the OVER clause must be empty.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The function returns no rows when the input table is empty.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When any of X_i and Y_i is NULL, Inf, or NaN, the pair will not be included in the calculation of CORR(X, Y). That is, any input pair with NULL, Inf, or NaN is ignored.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For the pair of (X,X), regardless of the contents of X: CORR(X,X) = 1, number_of_ignored_input_rows = 0, and number_of_processed_input_rows = #input_rows.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When (N * SUMX2 == SUMX * SUMX) or (N * SUMY2 == SUMY * SUMY), the value of CORR(X, Y) is NULL. In theory, this can happen for a column with constant values; nevertheless, it may not always be observed because of rounding error.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In the special case where all pair values of (X_i,Y_i) contain NULL, inf, or NaN, and X != Y: CORR(X,Y)=NULL.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;examples&#34;&gt;Examples&lt;/h2&gt;
&lt;p&gt;The following example uses the &lt;a href=&#34;http://archive.ics.uci.edu/ml/datasets/Iris&#34;&gt;iris&lt;/a&gt; dataset.

&lt;table class=&#34;table table-bordered&#34; &gt;
&lt;tr&gt;
&lt;td &gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT CORR_MATRIX(&amp;#34;Sepal.Length&amp;#34;, &amp;#34;Sepal.Width&amp;#34;, &amp;#34;Petal.Length&amp;#34;, &amp;#34;Petal.Width&amp;#34;) OVER() FROM iris;
variable_name_1 | variable_name_2 | corr_value        | number_of_ignored_input_rows | number_of_processed_input_rows
----------------+-----------------+-------------------+------------------------------+--------------------------------
Sepal.Length    | Sepal.Width     |-0.117569784133002 | 0                            | 150
Sepal.Width     | Sepal.Length    |-0.117569784133002 | 0                            | 150
Sepal.Length    | Petal.Length    |0.871753775886583  | 0                            | 150
Petal.Length    | Sepal.Length    |0.871753775886583  | 0                            | 150
Sepal.Length    | Petal.Width     |0.817941126271577  | 0                            | 150
Petal.Width     | Sepal.Length    |0.817941126271577  | 0                            | 150
Sepal.Width     | Petal.Length    |-0.42844010433054  | 0                            | 150
Petal.Length    | Sepal.Width     |-0.42844010433054  | 0                            | 150
Sepal.Width     | Petal.Width     |-0.366125932536439 | 0                            | 150
Petal.Width     | Sepal.Width     |-0.366125932536439 | 0                            | 150
Petal.Length    | Petal.Width     |0.962865431402796  | 0                            | 150
Petal.Width     | Petal.Length    |0.962865431402796  | 0                            | 150
Sepal.Length    | Sepal.Length    |1                  | 0                            | 150
Sepal.Width     | Sepal.Width     |1                  | 0                            | 150
Petal.Length    | Petal.Length    |1                  | 0                            | 150
Petal.Width     | Petal.Width     |1                  | 0                            | 150
(16 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Sql-Reference: DETECT_OUTLIERS</title>
      <link>/en/sql-reference/functions/ml-functions/data-preparation/detect-outliers/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/sql-reference/functions/ml-functions/data-preparation/detect-outliers/</guid>
      <description>
        
        
&lt;p&gt;Returns the outliers in a data set based on the outlier threshold. The output is a table that contains the outliers. &lt;code&gt;DETECT_OUTLIERS&lt;/code&gt; uses the detection method &lt;code&gt;robust_zscore&lt;/code&gt; to normalize each input column. The function then identifies as outliers all rows that contain a normalized value greater than the default or specified threshold.

&lt;div class=&#34;alert admonition note&#34; role=&#34;alert&#34;&gt;
&lt;h4 class=&#34;admonition-head&#34;&gt;Note&lt;/h4&gt;

You can calculate normalized column values with OpenText™ Analytics Database functions
&lt;code&gt;&lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/data-preparation/normalize/#&#34;&gt;NORMALIZE&lt;/a&gt;&lt;/code&gt; and
&lt;code&gt;&lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/data-preparation/normalize-fit/#&#34;&gt;NORMALIZE_FIT&lt;/a&gt;&lt;/code&gt;.

&lt;/div&gt;&lt;/p&gt;
&lt;p&gt;This is a meta-function. You must call meta-functions in a top-level &lt;a href=&#34;../../../../../en/sql-reference/statements/select/#&#34;&gt;SELECT&lt;/a&gt; statement.&lt;/p&gt;

&lt;h2 id=&#34;behavior-type&#34;&gt;Behavior type&lt;/h2&gt;
&lt;a class=&#34;glosslink&#34; href=&#34;../../../../../en/glossary/volatile-functions/&#34; title=&#34;&#34;&gt;Volatile&lt;/a&gt;
&lt;h2 id=&#34;syntax&#34;&gt;Syntax&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;DETECT_OUTLIERS ( &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;output-table&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;input-relation&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;input-columns&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;detection-method&lt;/span&gt;&amp;#39;
        [ USING PARAMETERS
              [outlier_threshold =&lt;span class=&#34;code-variable&#34;&gt; threshold&lt;/span&gt;]
              [, exclude_columns = &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;excluded-columns&lt;/span&gt;&amp;#39;]
              [, partition_columns = &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;partition-columns&lt;/span&gt;&amp;#39;] ] )
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;arguments&#34;&gt;Arguments&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;output-table&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The name of the table where the database saves rows identified as outliers along the chosen &lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt;. All columns of the input relation are present in this table.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-relation&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The table or view that contains outlier data. If the input relation is defined in Hive, use 
&lt;code&gt;&lt;a href=&#34;../../../../../en/sql-reference/functions/hadoop-functions/sync-with-hcatalog-schema/#&#34;&gt;SYNC_WITH_HCATALOG_SCHEMA&lt;/a&gt;&lt;/code&gt; to sync the &lt;code&gt;hcatalog&lt;/code&gt; schema, and then run the machine learning function.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns. Input columns must be of type &lt;a href=&#34;../../../../../en/sql-reference/data-types/numeric-data-types/&#34;&gt;numeric&lt;/a&gt;.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;detection-method&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The outlier detection method to use, set to &lt;code&gt;robust_zscore&lt;/code&gt;.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;parameters&#34;&gt;Parameters&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;outlier_threshold&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The threshold used to determine outliers: a row is identified as an outlier if it contains at least one normalized value greater than this threshold.
&lt;p&gt;&lt;strong&gt;Default:&lt;/strong&gt; 3.0&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;exclude_columns&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;Comma-separated list of column names from &lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt; to exclude from processing.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;partition_columns&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Comma-separated list of column names from the input table or view that define the partitions. &lt;code&gt;DETECT_OUTLIERS&lt;/code&gt; detects outliers within each partition separately.
&lt;p&gt;&lt;strong&gt;Default:&lt;/strong&gt; empty list&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;privileges&#34;&gt;Privileges&lt;/h2&gt;
&lt;p&gt;Non-superusers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;SELECT privileges on the input relation&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CREATE privileges on the output table&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;examples&#34;&gt;Examples&lt;/h2&gt;
&lt;p&gt;The following example shows how to use &lt;code&gt;DETECT_OUTLIERS&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; CREATE TABLE baseball_roster (id identity, last_name varchar(30), hr int, avg float);
CREATE TABLE

=&amp;gt; COPY baseball_roster FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
&amp;gt;&amp;gt; Polo|7|.233
&amp;gt;&amp;gt; Gloss|45|.170
&amp;gt;&amp;gt; Gus|12|.345
&amp;gt;&amp;gt; Gee|1|.125
&amp;gt;&amp;gt; Laus|3|.095
&amp;gt;&amp;gt; Hilltop|16|.222
&amp;gt;&amp;gt; Wicker|78|.333
&amp;gt;&amp;gt; Scooter|0|.121
&amp;gt;&amp;gt; Hank|999999|.8888
&amp;gt;&amp;gt; Popup|35|.378
&amp;gt;&amp;gt; \.


=&amp;gt; SELECT * FROM baseball_roster;
 id | last_name |   hr   |  avg
----+-----------+--------+--------
  3 | Gus       |     12 |  0.345
  4 | Gee       |      1 |  0.125
  6 | Hilltop   |     16 |  0.222
 10 | Popup     |     35 |  0.378
  1 | Polo      |      7 |  0.233
  7 | Wicker    |     78 |  0.333
  9 | Hank      | 999999 | 0.8888
  2 | Gloss     |     45 |   0.17
  5 | Laus      |      3 |  0.095
  8 | Scooter   |      0 |  0.121
(10 rows)

=&amp;gt; SELECT DETECT_OUTLIERS(&amp;#39;baseball_outliers&amp;#39;, &amp;#39;baseball_roster&amp;#39;, &amp;#39;id, hr, avg&amp;#39;, &amp;#39;robust_zscore&amp;#39; USING PARAMETERS
outlier_threshold=3.0);

     DETECT_OUTLIERS
--------------------------
 Detected 2 outliers

(1 row)

=&amp;gt; SELECT * FROM baseball_outliers;
 id | last_name | hr         | avg
----+-----------+------------+-------------
  7 | Wicker    |         78 |       0.333
  9 | Hank      |     999999 |      0.8888
(2 rows)
&lt;/code&gt;&lt;/pre&gt;
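The robust z-score method described above can be sketched in a few lines of Python. This is an illustrative re-implementation, not the database's internal code: it assumes the conventional 1.4826 MAD consistency constant, and the row representation (plain dictionaries) is hypothetical.

```python
from statistics import median

def robust_zscores(values):
    # Robust z-score of each value: |x - median| / (1.4826 * MAD),
    # where MAD is the median absolute deviation from the median.
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0] * len(values)
    return [abs(v - med) / (1.4826 * mad) for v in values]

def detect_outliers(rows, columns, threshold=3.0):
    # A row is an outlier if any of its normalized column values
    # exceeds the threshold (default 3.0, as in the parameter above).
    scores = {c: robust_zscores([r[c] for r in rows]) for c in columns}
    return [row for i, row in enumerate(rows)
            if any(scores[c][i] > threshold for c in columns)]
```

Applied to the `baseball_roster` data with the default threshold, this sketch flags the same two rows (Wicker and Hank) as the example output.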
      </description>
    </item>
    
    <item>
      <title>Sql-Reference: IFOREST</title>
      <link>/en/sql-reference/functions/ml-functions/data-preparation/iforest/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/sql-reference/functions/ml-functions/data-preparation/iforest/</guid>
      <description>
        
        
        &lt;p&gt;Trains and returns an isolation forest (iForest) model. After you train the model, you can use the &lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/transformation-functions/apply-iforest/#&#34;&gt;APPLY_IFOREST&lt;/a&gt; function to predict outliers in an input relation.&lt;/p&gt;
&lt;p&gt;For more information about how the iForest algorithm works, see &lt;a href=&#34;../../../../../en/data-analysis/ml-predictive-analytics/data-preparation/detect-outliers/#iForest&#34;&gt;Isolation Forest&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is a meta-function. You must call meta-functions in a top-level &lt;a href=&#34;../../../../../en/sql-reference/statements/select/#&#34;&gt;SELECT&lt;/a&gt; statement.&lt;/p&gt;

&lt;h2 id=&#34;behavior-type&#34;&gt;Behavior type&lt;/h2&gt;
&lt;a class=&#34;glosslink&#34; href=&#34;../../../../../en/glossary/volatile-functions/&#34; title=&#34;&#34;&gt;Volatile&lt;/a&gt;
&lt;h2 id=&#34;syntax&#34;&gt;Syntax&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;IFOREST( &lt;span class=&#34;code-variable&#34;&gt;&#39;model-name&#39;&lt;/span&gt;, &lt;span class=&#34;code-variable&#34;&gt;&#39;input-relation&#39;&lt;/span&gt;, &lt;span class=&#34;code-variable&#34;&gt;&#39;input-columns&#39;&lt;/span&gt; [ USING PARAMETERS &lt;span class=&#34;code-variable&#34;&gt;param&lt;/span&gt;=&lt;span class=&#34;code-variable&#34;&gt;value&lt;/span&gt;[,...] ] )
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;arguments&#34;&gt;Arguments&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;model-name&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Identifies the model to create, where &lt;em&gt;&lt;code&gt;model-name&lt;/code&gt;&lt;/em&gt; conforms to conventions described in &lt;a href=&#34;../../../../../en/sql-reference/language-elements/identifiers/#&#34;&gt;Identifiers&lt;/a&gt;. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-relation&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The table or view that contains the input data for IFOREST.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns. Columns must be of types CHAR, VARCHAR, BOOL, INT, or FLOAT.
&lt;p&gt;Columns of types CHAR, VARCHAR, and BOOL are treated as categorical features; all others are treated as numeric features.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;parameters&#34;&gt;Parameters&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;exclude_columns&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Comma-separated list of column names from &lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt; to exclude from processing.
&lt;p&gt;&lt;strong&gt;Default:&lt;/strong&gt; Empty string (&#39;&#39;)&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;ntree&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Integer in the range [1, 1000], specifies the number of trees in the forest.
&lt;p&gt;&lt;strong&gt;Default:&lt;/strong&gt; 100&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;sampling_size&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Float in the range (0.0, 1.0], specifies the portion of the input data set that is randomly picked, without replacement, for training each tree.
&lt;p&gt;&lt;strong&gt;Default:&lt;/strong&gt; 0.632&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;col_sample_by_tree&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Float in the range (0.0, 1.0], specifies the fraction of columns that are randomly picked for training each tree.
&lt;p&gt;&lt;strong&gt;Default:&lt;/strong&gt; 1.0&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;max_depth&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Integer in the range [1, 100], specifies the maximum depth for growing each tree.
&lt;p&gt;&lt;strong&gt;Default:&lt;/strong&gt; 10&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;nbins&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Integer in the range [2, 1000], specifies the number of bins used to discretize continuous features.
&lt;p&gt;&lt;strong&gt;Default:&lt;/strong&gt; 32&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
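The parameters above can be made concrete with a toy training loop: each of `ntree` trees is grown on a random `sampling_size` fraction of the rows (picked without replacement), a random `col_sample_by_tree` fraction of the columns, and random splits down to `max_depth`. This is an illustrative sketch of the algorithm's structure under those stated assumptions, not the database's implementation; all helper names are hypothetical.

```python
import random

def grow_tree(rows, cols, depth, max_depth, rng):
    # Split on a random column at a random value until a row is
    # isolated or the depth budget is spent.
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    col = rng.choice(cols)
    lo, hi = min(r[col] for r in rows), max(r[col] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    return {"col": col, "split": split,
            "left": grow_tree([r for r in rows if r[col] < split],
                              cols, depth + 1, max_depth, rng),
            "right": grow_tree([r for r in rows if r[col] >= split],
                               cols, depth + 1, max_depth, rng)}

def iforest(rows, cols, ntree=100, sampling_size=0.632,
            col_sample_by_tree=1.0, max_depth=10, seed=0):
    rng = random.Random(seed)
    n_rows = max(1, round(sampling_size * len(rows)))       # rows per tree
    n_cols = max(1, round(col_sample_by_tree * len(cols)))  # columns per tree
    return [grow_tree(rng.sample(rows, n_rows),
                      rng.sample(cols, n_cols), 0, max_depth, rng)
            for _ in range(ntree)]
```

Outliers are then scored by how quickly random splits isolate them, which is what APPLY_IFOREST computes from the stored trees.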
&lt;h2 id=&#34;model-attributesbr-&#34;&gt;Model attributes&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;details&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Details about the function&#39;s predictor columns, including:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;predictor&lt;/code&gt;: Names of the predictors in the same order specified when training the model.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;type&lt;/code&gt;: Types of the predictors in the same order as their names in &lt;code&gt;predictor&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;tree_count&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Number of trees in the model.&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;rejected_row_count&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Number of rows in &lt;em&gt;&lt;code&gt;input-relation&lt;/code&gt;&lt;/em&gt; that were skipped because they contained an invalid value.&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;accepted_row_count&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Total number of rows in &lt;em&gt;&lt;code&gt;input-relation&lt;/code&gt;&lt;/em&gt; minus &lt;code&gt;rejected_row_count&lt;/code&gt;.&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;call_string&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The values of all input arguments that were specified when the function was called.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;privileges&#34;&gt;Privileges&lt;/h2&gt;
&lt;p&gt;Non-superusers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;CREATE privileges on the schema where the model is created&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;SELECT privileges on the input relation&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;examples&#34;&gt;Examples&lt;/h2&gt;
&lt;p&gt;In the following example, the input data to the function contains columns of type INT, VARCHAR, and FLOAT:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT IFOREST(&amp;#39;baseball_anomalies&amp;#39;,&amp;#39;baseball&amp;#39;,&amp;#39;team, hr, hits, avg, salary&amp;#39; USING PARAMETERS ntree=75, sampling_size=0.7,
max_depth=15);
IFOREST
----------
Finished
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can verify that all the input columns were read in correctly by calling &lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/model-management/get-model-summary/#&#34;&gt;GET_MODEL_SUMMARY&lt;/a&gt; and checking the details section:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name=&amp;#39;baseball_anomalies&amp;#39;);
GET_MODEL_SUMMARY
-------------------------------------------------------------------------------------------------------------------------------------

===========
call_string
===========
SELECT iforest(&amp;#39;public.baseball_anomalies&amp;#39;, &amp;#39;baseball&amp;#39;, &amp;#39;team, hr, hits, avg, salary&amp;#39; USING PARAMETERS exclude_columns=&amp;#39;&amp;#39;, ntree=75,
sampling_size=0.7, col_sample_by_tree=1, max_depth=15, nbins=32);

=======
details
=======
predictor|      type
---------+----------------
  team   |char or varchar
   hr    |      int
  hits   |      int
   avg   |float or numeric
 salary  |float or numeric


===============
Additional Info
===============
       Name       |Value
------------------+-----
    tree_count    | 75
rejected_row_count|  0
accepted_row_count|1000

(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;see-also&#34;&gt;See also&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../../en/data-analysis/ml-predictive-analytics/data-preparation/detect-outliers/#&#34;&gt;Detect outliers&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/transformation-functions/apply-iforest/#&#34;&gt;APPLY_IFOREST&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/model-evaluation/read-tree/#&#34;&gt;READ_TREE&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Sql-Reference: IMPUTE</title>
      <link>/en/sql-reference/functions/ml-functions/data-preparation/impute/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/sql-reference/functions/ml-functions/data-preparation/impute/</guid>
      <description>
        
        
        &lt;p&gt;Imputes missing values in a data set with either the mean or the mode, based on observed values for a variable in each column. This function supports &lt;a href=&#34;../../../../../en/sql-reference/data-types/numeric-data-types/&#34;&gt;numeric&lt;/a&gt; and categorical data types.&lt;/p&gt;
&lt;p&gt;This is a meta-function. You must call meta-functions in a top-level &lt;a href=&#34;../../../../../en/sql-reference/statements/select/#&#34;&gt;SELECT&lt;/a&gt; statement.&lt;/p&gt;

&lt;h2 id=&#34;behavior-type&#34;&gt;Behavior type&lt;/h2&gt;
&lt;a class=&#34;glosslink&#34; href=&#34;../../../../../en/glossary/volatile-functions/&#34; title=&#34;&#34;&gt;Volatile&lt;/a&gt;
&lt;h2 id=&#34;syntax&#34;&gt;Syntax&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;IMPUTE( &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;output-view&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;input-relation&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;input-columns&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;method&lt;/span&gt;&amp;#39;
        [ USING PARAMETERS [exclude_columns = &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;excluded-columns&lt;/span&gt;&amp;#39;] [, partition_columns = &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;partition-columns&lt;/span&gt;&amp;#39;] ] )
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&#34;arguments&#34;&gt;Arguments&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;output-view&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Name of the view that shows the input table with imputed values in place of missing values. In this view, rows without missing values are kept intact while the rows with missing values are modified according to the specified method.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-relation&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The table or view that contains the data for missing-value imputation. If the input relation is defined in Hive, use 
&lt;code&gt;&lt;a href=&#34;../../../../../en/sql-reference/functions/hadoop-functions/sync-with-hcatalog-schema/#&#34;&gt;SYNC_WITH_HCATALOG_SCHEMA&lt;/a&gt;&lt;/code&gt; to sync the &lt;code&gt;hcatalog&lt;/code&gt; schema, and then run the machine learning function.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Comma-separated list of input columns where missing values will be replaced, or asterisk (*) to specify all columns. All columns must be of type &lt;a href=&#34;../../../../../en/sql-reference/data-types/numeric-data-types/&#34;&gt;numeric&lt;/a&gt; or BOOLEAN.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;method&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The method to compute the missing value replacements, one of the following:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;mean&lt;/code&gt;: The missing values in each column will be replaced by the mean of that column. This method can be used for &lt;a href=&#34;../../../../../en/sql-reference/data-types/numeric-data-types/&#34;&gt;numeric&lt;/a&gt; data only.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;mode&lt;/code&gt;: The missing values in each column will be replaced by the most frequent value in that column. This method can be used for categorical data only.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
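The two methods amount to a simple per-column transform; a minimal sketch, with `None` standing in for a missing value (that representation is an assumption for illustration, not the database's internal form):

```python
from statistics import mean, mode

def impute_column(values, method):
    # Replace missing entries (None) with the mean of the observed
    # values ('mean', numeric data) or the most frequent observed
    # value ('mode', categorical data).
    observed = [v for v in values if v is not None]
    fill = mean(observed) if method == "mean" else mode(observed)
    return [fill if v is None else v for v in values]
```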
&lt;h2 id=&#34;parameters&#34;&gt;Parameters&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;exclude_columns&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;Comma-separated list of column names from &lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt; to exclude from processing.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;partition_columns&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Comma-separated list of column names from the input relation that defines the partitions.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;privileges&#34;&gt;Privileges&lt;/h2&gt;
&lt;p&gt;Non-superusers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;SELECT privileges on the input relation&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CREATE privileges on the output view schema&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;examples&#34;&gt;Examples&lt;/h2&gt;
&lt;p&gt;Execute &lt;code&gt;IMPUTE&lt;/code&gt; on the &lt;code&gt;small_input_impute&lt;/code&gt; table, specifying the mean method:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT impute(&amp;#39;output_view&amp;#39;,&amp;#39;small_input_impute&amp;#39;, &amp;#39;pid, x1,x2,x3,x4&amp;#39;,&amp;#39;mean&amp;#39;
USING PARAMETERS exclude_columns=&amp;#39;pid&amp;#39;);
impute
--------------------------
Finished in 1 iteration
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Execute &lt;code&gt;IMPUTE&lt;/code&gt;, specifying the mode method:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT impute(&amp;#39;output_view3&amp;#39;,&amp;#39;small_input_impute&amp;#39;, &amp;#39;pid, x5,x6&amp;#39;,&amp;#39;mode&amp;#39; USING PARAMETERS exclude_columns=&amp;#39;pid&amp;#39;);
impute
--------------------------
Finished in 1 iteration
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;see-also&#34;&gt;See also&lt;/h2&gt;
&lt;a href=&#34;../../../../../en/data-analysis/ml-predictive-analytics/data-preparation/imputing-missing-values/#&#34;&gt;Imputing missing values&lt;/a&gt;

      </description>
    </item>
    
    <item>
      <title>Sql-Reference: NORMALIZE</title>
      <link>/en/sql-reference/functions/ml-functions/data-preparation/normalize/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/sql-reference/functions/ml-functions/data-preparation/normalize/</guid>
      <description>
        
        
        &lt;p&gt;Runs a normalization algorithm on an input relation. The output is a view with the normalized data.

&lt;div class=&#34;alert admonition note&#34; role=&#34;alert&#34;&gt;
&lt;h4 class=&#34;admonition-head&#34;&gt;Note&lt;/h4&gt;

This function differs from &lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/data-preparation/normalize-fit/#&#34;&gt;NORMALIZE_FIT&lt;/a&gt;, which creates and stores a model rather than creating a view definition. This can lead to different performance characteristics between the two functions.

&lt;/div&gt;&lt;/p&gt;
&lt;p&gt;This is a meta-function. You must call meta-functions in a top-level &lt;a href=&#34;../../../../../en/sql-reference/statements/select/#&#34;&gt;SELECT&lt;/a&gt; statement.&lt;/p&gt;

&lt;h2 id=&#34;behavior-type&#34;&gt;Behavior type&lt;/h2&gt;
&lt;a class=&#34;glosslink&#34; href=&#34;../../../../../en/glossary/volatile-functions/&#34; title=&#34;&#34;&gt;Volatile&lt;/a&gt;
&lt;h2 id=&#34;syntax&#34;&gt;Syntax&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;NORMALIZE ( &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;output-view&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;input-relation&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;input-columns&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;normalization-method&lt;/span&gt;&amp;#39;
           [ USING PARAMETERS exclude_columns = &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;excluded-columns&lt;/span&gt;&amp;#39; ] )
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;arguments&#34;&gt;Arguments&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;output-view&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The name of the view showing the input relation with normalized data replacing the specified input columns.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-relation&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The table or view that contains the data to normalize. If the input relation is defined in Hive, use 
&lt;code&gt;&lt;a href=&#34;../../../../../en/sql-reference/functions/hadoop-functions/sync-with-hcatalog-schema/#&#34;&gt;SYNC_WITH_HCATALOG_SCHEMA&lt;/a&gt;&lt;/code&gt; to sync the &lt;code&gt;hcatalog&lt;/code&gt; schema, and then run the machine learning function.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Comma-separated list of &lt;a href=&#34;../../../../../en/sql-reference/data-types/numeric-data-types/&#34;&gt;numeric&lt;/a&gt; input columns that contain the values to normalize, or asterisk (*) to select all columns.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;normalization-method&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The normalization method to use, one of the following:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;minmax&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;zscore&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;robust_zscore&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If infinity values appear in the table, the method ignores those values.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
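The three methods correspond to standard column transforms; a minimal sketch, assuming the conventional formulas (the 1.4826 MAD scale factor for `robust_zscore` is a common convention, assumed here rather than taken from this reference):

```python
from statistics import mean, median, pstdev

def normalize_column(values, method):
    if method == "minmax":
        # Rescale to [0, 1] using the column's minimum and maximum.
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    if method == "zscore":
        # Center on the mean and scale by the standard deviation.
        mu, sigma = mean(values), pstdev(values)
        return [(v - mu) / sigma for v in values]
    if method == "robust_zscore":
        # Center on the median and scale by the median absolute deviation.
        med = median(values)
        mad = median(abs(v - med) for v in values)
        return [(v - med) / (1.4826 * mad) for v in values]
    raise ValueError(f"unknown method: {method}")
```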
&lt;h2 id=&#34;parameters&#34;&gt;Parameters&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;exclude_columns&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;Comma-separated list of column names from &lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt; to exclude from processing.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;privileges&#34;&gt;Privileges&lt;/h2&gt;
&lt;p&gt;Non-superusers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;SELECT privileges on the input relation&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CREATE privileges on the output view schema&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;examples&#34;&gt;Examples&lt;/h2&gt;
&lt;p&gt;These examples show how you can use the NORMALIZE function on the &lt;code&gt;wt&lt;/code&gt; and &lt;code&gt;hp&lt;/code&gt; columns in the mtcars table.&lt;/p&gt;
&lt;p&gt;Execute the NORMALIZE function, and specify the &lt;code&gt;minmax&lt;/code&gt; method:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT NORMALIZE(&amp;#39;mtcars_norm&amp;#39;, &amp;#39;mtcars&amp;#39;,
                    &amp;#39;wt, hp&amp;#39;, &amp;#39;minmax&amp;#39;);
        NORMALIZE
--------------------------
 Finished in 1 iteration

(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Execute the NORMALIZE function, and specify the &lt;code&gt;zscore&lt;/code&gt; method:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT NORMALIZE(&amp;#39;mtcars_normz&amp;#39;,&amp;#39;mtcars&amp;#39;,
                    &amp;#39;wt, hp&amp;#39;, &amp;#39;zscore&amp;#39;);
        NORMALIZE
--------------------------
 Finished in 1 iteration

(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Execute the NORMALIZE function, and specify the &lt;code&gt;robust_zscore&lt;/code&gt; method:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT NORMALIZE(&amp;#39;mtcars_normz&amp;#39;, &amp;#39;mtcars&amp;#39;,
                    &amp;#39;wt, hp&amp;#39;, &amp;#39;robust_zscore&amp;#39;);
        NORMALIZE
--------------------------
 Finished in 1 iteration

(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;see-also&#34;&gt;See also&lt;/h2&gt;
&lt;a href=&#34;../../../../../en/data-analysis/ml-predictive-analytics/data-preparation/normalizing-data/#&#34;&gt;Normalizing data&lt;/a&gt;

      </description>
    </item>
    
    <item>
      <title>Sql-Reference: NORMALIZE_FIT</title>
      <link>/en/sql-reference/functions/ml-functions/data-preparation/normalize-fit/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/sql-reference/functions/ml-functions/data-preparation/normalize-fit/</guid>
      <description>
        
        
        
&lt;div class=&#34;alert admonition note&#34; role=&#34;alert&#34;&gt;
&lt;h4 class=&#34;admonition-head&#34;&gt;Note&lt;/h4&gt;

This function differs from &lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/data-preparation/normalize/#&#34;&gt;NORMALIZE&lt;/a&gt;, which directly outputs a view with normalized results rather than storing normalization parameters in a model for later use.

&lt;/div&gt;
&lt;p&gt;&lt;code&gt;NORMALIZE_FIT&lt;/code&gt; computes normalization parameters for each of the specified columns in an input relation. The resulting model stores the normalization parameters. For example, for &lt;code&gt;MinMax&lt;/code&gt; normalization, the minimum and maximum value of each column are stored in the model. The generated model serves as input to functions &lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/transformation-functions/apply-normalize/#&#34;&gt;APPLY_NORMALIZE&lt;/a&gt; and &lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/transformation-functions/reverse-normalize/#&#34;&gt;REVERSE_NORMALIZE&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is a meta-function. You must call meta-functions in a top-level &lt;a href=&#34;../../../../../en/sql-reference/statements/select/#&#34;&gt;SELECT&lt;/a&gt; statement.&lt;/p&gt;

&lt;h2 id=&#34;behavior-type&#34;&gt;Behavior type&lt;/h2&gt;
&lt;a class=&#34;glosslink&#34; href=&#34;../../../../../en/glossary/volatile-functions/&#34; title=&#34;&#34;&gt;Volatile&lt;/a&gt;
&lt;h2 id=&#34;syntax&#34;&gt;Syntax&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;NORMALIZE_FIT ( &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;model-name&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;input-relation&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;input-columns&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;normalization-method&lt;/span&gt;&amp;#39;
        [ USING PARAMETERS  [exclude_columns = &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;excluded-columns&lt;/span&gt;&amp;#39;] [, output_view = &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;output-view&lt;/span&gt;&amp;#39;] ] )
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;arguments&#34;&gt;Arguments&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;model-name&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Identifies the model to create, where &lt;em&gt;&lt;code&gt;model-name&lt;/code&gt;&lt;/em&gt; conforms to conventions described in &lt;a href=&#34;../../../../../en/sql-reference/language-elements/identifiers/#&#34;&gt;Identifiers&lt;/a&gt;. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-relation&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The table or view that contains the data to normalize. If the input relation is defined in Hive, use 
&lt;code&gt;&lt;a href=&#34;../../../../../en/sql-reference/functions/hadoop-functions/sync-with-hcatalog-schema/#&#34;&gt;SYNC_WITH_HCATALOG_SCHEMA&lt;/a&gt;&lt;/code&gt; to sync the &lt;code&gt;hcatalog&lt;/code&gt; schema, and then run the machine learning function.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns. Input columns must be of data type &lt;a href=&#34;../../../../../en/sql-reference/data-types/numeric-data-types/&#34;&gt;numeric&lt;/a&gt;.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;normalization-method&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The normalization method to use, one of the following:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;minmax&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;zscore&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;robust_zscore&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you specify &lt;code&gt;robust_zscore&lt;/code&gt;, &lt;code&gt;NORMALIZE_FIT&lt;/code&gt; uses the function &lt;a href=&#34;../../../../../en/sql-reference/functions/aggregate-functions/approximate-median-aggregate/#&#34;&gt;APPROXIMATE_MEDIAN [aggregate]&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;All normalization methods ignore infinity, negative infinity, and NULL values in the input relation.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;parameters&#34;&gt;Parameters&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;exclude_columns&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;Comma-separated list of column names from &lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt; to exclude from processing.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;output_view&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Name of the view that contains all columns from the input relation, with the specified input columns normalized.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;&lt;a name=&#34;ModelAttributes&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;model-attributes&#34;&gt;Model attributes&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;data&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Normalization method set to &lt;code&gt;minmax&lt;/code&gt;:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;colNames&lt;/code&gt;: Model column names&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;mins&lt;/code&gt;: Minimum value of each column&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;maxes&lt;/code&gt;: Maximum value of each column&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
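For `minmax`, the stored `mins` and `maxes` are all that APPLY_NORMALIZE and REVERSE_NORMALIZE need; a minimal sketch with hypothetical helper names:

```python
def fit_minmax(columns):
    # columns: mapping of column name -> list of values.
    # The "model" is just each column's (min, max) pair.
    return {name: (min(vals), max(vals)) for name, vals in columns.items()}

def apply_minmax(model, name, v):
    # Rescale v into [0, 1] using the stored parameters.
    lo, hi = model[name]
    return (v - lo) / (hi - lo)

def reverse_minmax(model, name, v):
    # Exact inverse of apply_minmax for the same model.
    lo, hi = model[name]
    return v * (hi - lo) + lo
```

With the `hp` column's minimum of 52 and maximum of 335 from the example below, `hp` = 175 maps to 123/283 ≈ 0.4346, matching the APPLY_NORMALIZE output, and the reverse transform recovers 175 exactly.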
&lt;h2 id=&#34;privileges&#34;&gt;Privileges&lt;/h2&gt;
&lt;p&gt;Non-superusers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;CREATE privileges on the schema where the model is created&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;SELECT privileges on the input relation&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CREATE privileges on the output view schema&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;examples&#34;&gt;Examples&lt;/h2&gt;
&lt;p&gt;The following example creates a model with &lt;code&gt;NORMALIZE_FIT&lt;/code&gt; using the &lt;code&gt;wt&lt;/code&gt; and &lt;code&gt;hp&lt;/code&gt; columns in table &lt;code&gt;mtcars&lt;/code&gt;, and then uses this model in successive calls to &lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/transformation-functions/apply-normalize/#&#34;&gt;APPLY_NORMALIZE&lt;/a&gt; and &lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/transformation-functions/reverse-normalize/#&#34;&gt;REVERSE_NORMALIZE&lt;/a&gt;.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT NORMALIZE_FIT(&amp;#39;mtcars_normfit&amp;#39;, &amp;#39;mtcars&amp;#39;, &amp;#39;wt,hp&amp;#39;, &amp;#39;minmax&amp;#39;);
NORMALIZE_FIT
---------------
Success
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The following call to &lt;code&gt;APPLY_NORMALIZE&lt;/code&gt; specifies the &lt;code&gt;hp&lt;/code&gt; and &lt;code&gt;cyl&lt;/code&gt; columns in table &lt;code&gt;mtcars&lt;/code&gt;, where &lt;code&gt;hp&lt;/code&gt; is in the normalization model and &lt;code&gt;cyl&lt;/code&gt; is not in the normalization model:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; CREATE TABLE mtcars_normalized AS SELECT APPLY_NORMALIZE (hp, cyl USING PARAMETERS model_name = &amp;#39;mtcars_normfit&amp;#39;) FROM mtcars;
CREATE TABLE
=&amp;gt; SELECT * FROM mtcars_normalized;
          hp        | cyl
--------------------+-----
  0.434628975265018 | 8
  0.681978798586572 | 8
  0.434628975265018 | 6
                  1 | 8
  0.540636042402827 | 8
                  0 | 4
  0.681978798586572 | 8
 0.0459363957597173 | 4
  0.434628975265018 | 8
  0.204946996466431 | 6
  0.250883392226148 | 6
  0.049469964664311 | 4
  0.204946996466431 | 6
  0.201413427561837 | 4
  0.204946996466431 | 6
  0.250883392226148 | 6
  0.049469964664311 | 4
  0.215547703180212 | 4
 0.0353356890459364 | 4
  0.187279151943463 | 6
  0.452296819787986 | 8
  0.628975265017668 | 8
  0.346289752650177 | 8
  0.137809187279152 | 4
  0.749116607773852 | 8
  0.144876325088339 | 4
  0.151943462897526 | 4
  0.452296819787986 | 8
  0.452296819787986 | 8
  0.575971731448763 | 8
  0.159010600706714 | 4
  0.346289752650177 | 8
(32 rows)

=&amp;gt; SELECT REVERSE_NORMALIZE (hp, cyl USING PARAMETERS model_name=&amp;#39;mtcars_normfit&amp;#39;) FROM mtcars_normalized;
  hp | cyl
-----+-----
 175 | 8
 245 | 8
 175 | 6
 335 | 8
 205 | 8
  52 | 4
 245 | 8
  65 | 4
 175 | 8
 110 | 6
 123 | 6
  66 | 4
 110 | 6
 109 | 4
 110 | 6
 123 | 6
  66 | 4
 113 | 4
  62 | 4
 105 | 6
 180 | 8
 230 | 8
 150 | 8
  91 | 4
 264 | 8
  93 | 4
  95 | 4
 180 | 8
 180 | 8
 215 | 8
  97 | 4
 150 | 8
(32 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The following call to &lt;code&gt;REVERSE_NORMALIZE&lt;/code&gt; also specifies the &lt;code&gt;hp&lt;/code&gt; and &lt;code&gt;cyl&lt;/code&gt; columns, this time in table &lt;code&gt;mtcars_normalized&lt;/code&gt;, where &lt;code&gt;hp&lt;/code&gt; is in normalization model &lt;code&gt;mtcars_normfit&lt;/code&gt;, and &lt;code&gt;cyl&lt;/code&gt; is not in the normalization model.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT REVERSE_NORMALIZE (hp, cyl USING PARAMETERS model_name=&amp;#39;mtcars_normfit&amp;#39;) FROM mtcars_normalized;
       hp        | cyl
-----------------+-----
205.000005722046 |   8
150.000000357628 |   8
150.000000357628 |   8
93.0000016987324 |   4
 174.99999666214 |   8
94.9999992102385 |   4
214.999997496605 |   8
97.0000009387732 |   4
245.000006556511 |   8
 174.99999666214 |   6
             335 |   8
245.000006556511 |   8
62.0000002086163 |   4
 174.99999666214 |   8
230.000002026558 |   8
              52 |   4
263.999997675419 |   8
109.999999523163 |   6
123.000002324581 |   6
64.9999996386468 |   4
66.0000005029142 |   4
112.999997898936 |   4
109.999999523163 |   6
180.000000983477 |   8
180.000000983477 |   8
108.999998658895 |   4
109.999999523163 |   6
104.999999418855 |   6
123.000002324581 |   6
180.000000983477 |   8
66.0000005029142 |   4
90.9999999701977 |   4
(32 rows)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;see-also&#34;&gt;See also&lt;/h2&gt;
&lt;a href=&#34;../../../../../en/data-analysis/ml-predictive-analytics/data-preparation/normalizing-data/#&#34;&gt;Normalizing data&lt;/a&gt;

      </description>
    </item>
    
    <item>
      <title>Sql-Reference: ONE_HOT_ENCODER_FIT</title>
      <link>/en/sql-reference/functions/ml-functions/data-preparation/one-hot-encoder-fit/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/sql-reference/functions/ml-functions/data-preparation/one-hot-encoder-fit/</guid>
      <description>
        
        
        &lt;p&gt;Generates a sorted list of each of the category levels for each feature to be encoded, and stores the model.&lt;/p&gt;
&lt;p&gt;This is a meta-function. You must call meta-functions in a top-level &lt;a href=&#34;../../../../../en/sql-reference/statements/select/#&#34;&gt;SELECT&lt;/a&gt; statement.&lt;/p&gt;

&lt;h2 id=&#34;behavior-type&#34;&gt;Behavior type&lt;/h2&gt;
&lt;a class=&#34;glosslink&#34; href=&#34;../../../../../en/glossary/volatile-functions/&#34; title=&#34;&#34;&gt;Volatile&lt;/a&gt;
&lt;h2 id=&#34;syntax&#34;&gt;Syntax&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;ONE_HOT_ENCODER_FIT ( &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;model-name&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;input-relation&lt;/span&gt;&amp;#39;,&amp;#39;&lt;span class=&#34;code-variable&#34;&gt;input-columns&lt;/span&gt;&amp;#39;
        [ USING PARAMETERS
              [exclude_columns = &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;excluded-columns&lt;/span&gt;&amp;#39;]
              [, output_view = &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;output-view&lt;/span&gt;&amp;#39;]
              [, extra_levels = &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;category-levels&lt;/span&gt;&amp;#39;] ] )
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;arguments&#34;&gt;Arguments&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;model-name&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Identifies the model to create, where &lt;em&gt;&lt;code&gt;model-name&lt;/code&gt;&lt;/em&gt; conforms to conventions described in &lt;a href=&#34;../../../../../en/sql-reference/language-elements/identifiers/#&#34;&gt;Identifiers&lt;/a&gt;. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-relation&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The table or view that contains the data for one-hot encoding. If the input relation is defined in Hive, use 
&lt;code&gt;&lt;a href=&#34;../../../../../en/sql-reference/functions/hadoop-functions/sync-with-hcatalog-schema/#&#34;&gt;SYNC_WITH_HCATALOG_SCHEMA&lt;/a&gt;&lt;/code&gt; to sync the &lt;code&gt;hcatalog&lt;/code&gt; schema, and then run the machine learning function.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns. Input columns must be of type INTEGER, BOOLEAN, VARCHAR, or DATE.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;parameters&#34;&gt;Parameters&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;exclude_columns&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;Comma-separated list of column names from &lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt; to exclude from processing.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;output_view&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The name of the view that stores the input relation and the one-hot encodings. Columns are returned in the order they appear in the input relation, with the one-hot encoded columns appended after the original columns.&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;extra_levels&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Additional levels in each category that are not in the input relation. This parameter should be passed as a string that conforms with the JSON standard, with category names as keys, and lists of extra levels in each category as values.&lt;/dd&gt;
&lt;/dl&gt;
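&lt;p&gt;Conceptually, the fit step collects the distinct levels of each categorical column, merges in any &lt;code&gt;extra_levels&lt;/code&gt;, and sorts them so that each level gets a stable index. The following Python sketch illustrates that bookkeeping only; the column and level values are made up:&lt;/p&gt;

```python
import json

def one_hot_encoder_fit(rows, columns, extra_levels_json="{}"):
    """Build the sorted list of category levels per column that a model stores."""
    extra = json.loads(extra_levels_json)  # e.g. '{"cyl": [10]}'
    model = {}
    for col in columns:
        levels = {row[col] for row in rows}
        levels.update(extra.get(col, []))
        # Each level's position in this sorted list is its category_level_index.
        model[col] = sorted(levels)
    return model

rows = [{"cyl": 4}, {"cyl": 8}, {"cyl": 6}, {"cyl": 4}]
model = one_hot_encoder_fit(rows, ["cyl"], extra_levels_json='{"cyl": [10]}')
# model["cyl"] -> [4, 6, 8, 10]; encoding cyl = 6 later means index 1.
```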
&lt;p&gt;&lt;a name=&#34;ModelAttributes&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;model-attributes&#34;&gt;Model attributes&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;call_string&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The value of all input arguments that were specified at the time the function was called.&lt;/dd&gt;
&lt;dt&gt;
&lt;code&gt;varchar_categories&lt;/code&gt;, &lt;code&gt;integer_categories&lt;/code&gt;, &lt;code&gt;boolean_categories&lt;/code&gt;, &lt;code&gt;date_categories&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Settings for all:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;category_name&lt;/code&gt;: Column name&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;category_level&lt;/code&gt;: Levels of the category, sorted for each category&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;category_level_index&lt;/code&gt;: Index of this categorical level in the sorted list of levels for the category.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;privileges&#34;&gt;Privileges&lt;/h2&gt;
&lt;p&gt;Non-superusers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;CREATE privileges on the schema where the model is created&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;SELECT privileges on the input relation&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CREATE privileges on the output view schema&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;examples&#34;&gt;Examples&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT ONE_HOT_ENCODER_FIT (&amp;#39;one_hot_encoder_model&amp;#39;,&amp;#39;mtcars&amp;#39;,&amp;#39;*&amp;#39;
USING PARAMETERS exclude_columns=&amp;#39;mpg,disp,drat,wt,qsec,vs,am&amp;#39;);
ONE_HOT_ENCODER_FIT
--------------------
Success
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;see-also&#34;&gt;See also&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/transformation-functions/apply-one-hot-encoder/#&#34;&gt;APPLY_ONE_HOT_ENCODER&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../../en/data-analysis/ml-predictive-analytics/data-preparation/encoding-categorical-columns/#&#34;&gt;Encoding categorical columns&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Sql-Reference: PCA</title>
      <link>/en/sql-reference/functions/ml-functions/data-preparation/pca/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/sql-reference/functions/ml-functions/data-preparation/pca/</guid>
      <description>
        
        
&lt;p&gt;Computes principal components from the input table/view. The results are saved in a PCA model. Internally, PCA finds the components by using SVD on the covariance matrix built from the input data. The singular values of this decomposition are also saved as part of the PCA model. The signs of all elements of a principal component may be flipped together on different runs.&lt;/p&gt;
&lt;p&gt;This is a meta-function. You must call meta-functions in a top-level &lt;a href=&#34;../../../../../en/sql-reference/statements/select/#&#34;&gt;SELECT&lt;/a&gt; statement.&lt;/p&gt;

&lt;h2 id=&#34;behavior-type&#34;&gt;Behavior type&lt;/h2&gt;
&lt;a class=&#34;glosslink&#34; href=&#34;../../../../../en/glossary/volatile-functions/&#34; title=&#34;&#34;&gt;Volatile&lt;/a&gt;
&lt;h2 id=&#34;syntax&#34;&gt;Syntax&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PCA ( &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;model-name&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;input-relation&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;input-columns&lt;/span&gt;&amp;#39;
        [ USING PARAMETERS
              [exclude_columns = &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;excluded-columns&lt;/span&gt;&amp;#39;]
              [, num_components = &lt;span class=&#34;code-variable&#34;&gt;num-components&lt;/span&gt;]
              [, scale =&lt;span class=&#34;code-variable&#34;&gt; is-scaled&lt;/span&gt;]
              [, method = &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;method&lt;/span&gt;&amp;#39;] ] )
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;arguments&#34;&gt;Arguments&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;model-name&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Identifies the model to create, where &lt;em&gt;&lt;code&gt;model-name&lt;/code&gt;&lt;/em&gt; conforms to conventions described in &lt;a href=&#34;../../../../../en/sql-reference/language-elements/identifiers/#&#34;&gt;Identifiers&lt;/a&gt;. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-relation&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The table or view that contains the input data for PCA.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns. All input columns must be a &lt;a href=&#34;../../../../../en/sql-reference/data-types/numeric-data-types/&#34;&gt;numeric&lt;/a&gt; data type.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;parameters&#34;&gt;Parameters&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;exclude_columns&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;Comma-separated list of column names from &lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt; to exclude from processing.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;num_components&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The number of components to keep in the model. If this value is not provided, all components are kept. The maximum number of components is the number of non-zero singular values returned by the internal call to SVD. This number is less than or equal to min(number of columns, number of rows).&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;scale&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;A Boolean value that specifies whether to standardize the columns during the preparation step:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;True&lt;/code&gt;: Use a correlation matrix instead of a covariance matrix.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;False&lt;/code&gt; (default)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;method&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The method used to calculate PCA. Can be set to &lt;code&gt;LAPACK&lt;/code&gt;.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;&lt;a name=&#34;ModelAttributes&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;model-attributes&#34;&gt;Model attributes&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;columns&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The information about columns from the input relation used for creating the PCA model:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;index&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;name&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;singular_values&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The information about singular values found. They are sorted in descending order:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;index&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;value&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;explained_variance: percentage of the variance in the data that can be attributed to this singular value&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;accumulated_explained_variance: percentage of the variance in the data that is retained if all singular values after the current one are dropped&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;principal_components&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The principal components corresponding to the singular values mentioned above:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;index: indices of the elements in each component&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;PC1&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;PC2&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;...&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;counters&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Information collected while training the model, stored as name-value pairs:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;counter_name&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;accepted_row_count: number of valid rows in the data&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;rejected_row_count: number of invalid rows (having NULL, INF or NaN) in the data&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;iteration_count: number of iterations, always 1 for the current implementation of PCA&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;counter_value&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;call_string&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The function call that created the model.&lt;/dd&gt;
&lt;/dl&gt;
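&lt;p&gt;Because the decomposition is of the covariance matrix, each singular value is itself a variance, so &lt;code&gt;explained_variance&lt;/code&gt; and &lt;code&gt;accumulated_explained_variance&lt;/code&gt; follow directly from the singular values. A Python sketch of that relationship, using made-up sample values:&lt;/p&gt;

```python
def explained_variance(singular_values):
    """Per-component and cumulative explained-variance fractions.

    Each singular value of the covariance matrix is a variance; its share
    of the total is the fraction of variance explained by the corresponding
    principal component. Multiply by 100 for the percentages the model reports.
    """
    total = sum(singular_values)
    explained = [s / total for s in singular_values]
    accumulated = []
    running = 0.0
    for e in explained:
        running += e
        accumulated.append(running)
    return explained, accumulated

# Hypothetical singular values, sorted in descending order as in the model.
explained, accumulated = explained_variance([8.0, 1.5, 0.5])
# explained   -> [0.8, 0.15, 0.05]
# accumulated -> [0.8, 0.95, 1.0]
```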
&lt;h2 id=&#34;privileges&#34;&gt;Privileges&lt;/h2&gt;
&lt;p&gt;Non-superusers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;CREATE privileges on the schema where the model is created&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;SELECT privileges on the input relation&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;examples&#34;&gt;Examples&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;
=&amp;gt; SELECT PCA (&amp;#39;pcamodel&amp;#39;, &amp;#39;world&amp;#39;,&amp;#39;country,HDI,em1970,em1971,em1972,em1973,em1974,em1975,em1976,em1977,
em1978,em1979,em1980,em1981,em1982,em1983,em1984 ,em1985,em1986,em1987,em1988,em1989,em1990,em1991,em1992,
em1993,em1994,em1995,em1996,em1997,em1998,em1999,em2000,em2001,em2002,em2003,em2004,em2005,em2006,em2007,
em2008,em2009,em2010,gdp1970,gdp1971,gdp1972,gdp1973,gdp1974,gdp1975,gdp1976,gdp1977,gdp1978,gdp1979,gdp1980,
gdp1981,gdp1982,gdp1983,gdp1984,gdp1985,gdp1986,gdp1987,gdp1988,gdp1989,gdp1990,gdp1991,gdp1992,gdp1993,
gdp1994,gdp1995,gdp1996,gdp1997,gdp1998,gdp1999,gdp2000,gdp2001,gdp2002,gdp2003,gdp2004,gdp2005,gdp2006,
gdp2007,gdp2008,gdp2009,gdp2010&amp;#39; USING PARAMETERS exclude_columns=&amp;#39;HDI,country&amp;#39;);
PCA
---------------------------------------------------------------
Finished in 1 iterations.
Accepted Rows: 96  Rejected Rows: 0
(1 row)
=&amp;gt; CREATE TABLE worldPCA AS SELECT
APPLY_PCA (HDI,country,em1970,em1971,em1972,em1973,em1974,em1975,em1976,em1977,em1978,em1979,
em1980,em1981,em1982,em1983,em1984 ,em1985,em1986,em1987,em1988,em1989,em1990,em1991,em1992,em1993,em1994,
em1995,em1996,em1997,em1998,em1999,em2000,em2001,em2002,em2003,em2004,em2005,em2006,em2007,em2008,em2009,
em2010,gdp1970,gdp1971,gdp1972,gdp1973,gdp1974,gdp1975,gdp1976,gdp1977,gdp1978,gdp1979,gdp1980,gdp1981,gdp1982,
gdp1983,gdp1984,gdp1985,gdp1986,gdp1987,gdp1988,gdp1989,gdp1990,gdp1991,gdp1992,gdp1993,gdp1994,gdp1995,
gdp1996,gdp1997,gdp1998,gdp1999,gdp2000,gdp2001,gdp2002,gdp2003,gdp2004,gdp2005,gdp2006,gdp2007,gdp2008,
gdp2009,gdp2010 USING PARAMETERS model_name=&amp;#39;pcamodel&amp;#39;, exclude_columns=&amp;#39;HDI, country&amp;#39;, key_columns=&amp;#39;HDI,
country&amp;#39;,cutoff=.3)OVER () FROM world;
CREATE TABLE

=&amp;gt; SELECT * FROM worldPCA;
HDI   |       country       |       col1
------+---------------------+-------------------
0.886 | Belgium             |  79002.2946705704
0.699 | Belize              | -25631.6670012556
0.427 | Benin               | -40373.4104598122
0.805 | Chile               | -16805.7940082156
0.687 | China               | -37279.2893141103
0.744 | Costa Rica          | -19505.5631231635
0.4   | Cote d&amp;#39;Ivoire       | -38058.2060339272
0.776 | Cuba                | -23724.5779612041
0.895 | Denmark             |  117325.594028813
0.644 | Egypt               | -34609.9941604549
...
(96 rows)

=&amp;gt; SELECT APPLY_INVERSE_PCA (HDI, country, col1
    USING PARAMETERS model_name = &amp;#39;pcamodel&amp;#39;, exclude_columns=&amp;#39;HDI,country&amp;#39;,
    key_columns = &amp;#39;HDI, country&amp;#39;) OVER () FROM worldPCA;
HDI  |       country       |      em1970       |      em1971       |      em1972      |      em1973      |
      em1974      |      em1975       |      em1976|      em1977      |      em1978       |      em1979
   |      em1980       |      em1981      |      em1982       |      em1983       |      em1984       |em1985
|      em1986       |      em1987       |      em1988       |      em1989      |      em1990      |      em1991
|      em1992       |      em1993|      em1994      |      em1995       |      em1996       |      em1997
    |      em1998       |      em1999       |      em2000       |      em2001       |em2002       |
em2003      |      em2004       |      em2005      |      em2006       |      em2007       |      em2008
|      em2009      |      em2010       |     gdp1970      |     gdp1971      |     gdp1972      |     gdp1973
|     gdp1974      |     gdp1975      |     gdp1976      |     gdp1977      |gdp1978      |     gdp1979
 |     gdp1980      |     gdp1981      |     gdp1982      |     gdp1983      |     gdp1984      |     gdp1985
      |     gdp1986|    gdp1987      |     gdp1988      |     gdp1989      |     gdp1990      |     gdp1991
     |     gdp1992      |     gdp1993      |     gdp1994      |     gdp1995      |     gdp1996      |
gdp1997      |     gdp1998      |     gdp1999      |     gdp2000      |     gdp2001      |     gdp2002
|     gdp2003      |gdp2004      |     gdp2005      |     gdp2006      |     gdp2007      |     gdp2008
  |     gdp2009      |     gdp2010
-------+---------------------+-------------------+-------------------+------------------+------------------
+------------------+-------------------+------------------+------------------+-------------------+---------
----------+-------------------+------------------+-------------------+-------------------+-----------------
--+------------------+-------------------+-------------------+-------------------+------------------+-------
-----------+------------------+-------------------+-------------------+------------------+------------------
-+-------------------+------------------+-------------------+-------------------+-------------------+-------
------------+--------------------+------------------+-------------------+------------------+----------------
---+-------------------+-------------------+------------------+-------------------+------------------+------
------------+------------------+------------------+------------------+------------------+------------------+
------------------+------------------+------------------+------------------+------------------+-------------
-----+------------------+------------------+------------------+------------------+------------------+-------
-----------+------------------+------------------+------------------+------------------+------------------+-
-----------------+------------------+------------------+------------------+------------------+--------------
----+------------------+------------------+------------------+------------------+------------------+--------
----------+------------------+------------------+------------------+------------------+------------------
0.886 | Belgium             |  18585.6613572407 | -16145.6374560074 |  26938.956253415 | 8094.30475779595 |
 12073.5461203817 | -11069.0567600181 | 19133.8584911727|   5500.312894949 | -4227.94863799987 |  6265.77925410752
|  -10884.749295608 | 30929.4669575201 | -7831.49439429977 |  3235.81760508742 | -22765.9285442662 | 27200
.6767714485 | -10554.9550160917 |   1169.4144482273 | -16783.7961289161 | 27932.2660829329 | 17227.9083196848
| 13956.0524012749 | -40175.6286481088 | -10889.4785920499 | 22703.6576872859 | -14635.5832197402 |
2857.12270512168 | 20473.5044214494 | -52199.4895696423 | -11038.7346460738 |  18466.7298633088 | -17410.4225137703 |
-3475.63826305462 | 29305.6753822341 |   1242.5724942049 | 17491.0096310849 | -12609.9984515902 | -17909.3603476248
|  6276.58431412381 | 21851.9475485178 | -2614.33738160397 | 3777.74134131349 | 4522.08854282736 | 4251.90446379366
| 4512.15101396876 | 4265.49424538129 | 5190.06845330997 | 4543.80444817989 | 5639.81122679089 | 4420.44705213467
|  5658.8820279283 | 5172.69025294376 | 5019.63640408663 | 5938.84979495903 | 4976.57073629812 | 4710.49525137591
| 6523.65700286465 | 5067.82520773578 | 6789.13070219317 | 5525.94643553563 | 6894.68336419297 | 5961.58442474331
| 5661.21093840818 | 7721.56088518218 |  5959.7301109143 | 6453.43604137202 | 6739.39384033096 | 7517.97645468455
| 6907.49136910647 | 7049.03921764209 | 7726.49091035527 | 8552.65909911844 | 7963.94487647115 | 7187.45827585515
| 7994.02955410523 | 9532.89844418041 | 7962.25713582666 | 7846.68238907624 | 10230.9878908643 | 8642.76044946519
| 8886.79860331866 |  8718.3731386891
...
(96 rows)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;see-also&#34;&gt;See also&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/transformation-functions/apply-inverse-pca/#&#34;&gt;APPLY_INVERSE_PCA&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/transformation-functions/apply-pca/#&#34;&gt;APPLY_PCA&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Sql-Reference: SUMMARIZE_CATCOL</title>
      <link>/en/sql-reference/functions/ml-functions/data-preparation/summarize-catcol/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/sql-reference/functions/ml-functions/data-preparation/summarize-catcol/</guid>
      <description>
        
        
        &lt;p&gt;Returns a statistical summary of categorical data input, in three columns:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;CATEGORY: Categorical levels, of the same SQL data type as the summarized column&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;COUNT: The number of occurrences of each category level, of type INTEGER&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;PERCENT: Represents category percentage, of type FLOAT&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;syntax&#34;&gt;Syntax&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SUMMARIZE_CATCOL (&lt;span class=&#34;code-variable&#34;&gt;target-column&lt;/span&gt;
        [ USING PARAMETERS TOPK = &lt;span class=&#34;code-variable&#34;&gt;topk-value&lt;/span&gt; [, WITH_TOTALCOUNT = &lt;span class=&#34;code-variable&#34;&gt;show-total&lt;/span&gt;] ] )
OVER()
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;arguments&#34;&gt;Arguments&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;target-column&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The name of the input column to summarize, one of the following data types:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;BOOLEAN&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;FLOAT&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;INTEGER&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;DATE&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CHAR/VARCHAR&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;parameters&#34;&gt;Parameters&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;TOPK&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Integer, specifies how many of the most frequent rows to include in the output.&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;WITH_TOTALCOUNT&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;A Boolean value that specifies whether the output includes a heading row that displays the total number of rows in the target column, with a percent equal to 100.
&lt;p&gt;&lt;strong&gt;Default:&lt;/strong&gt;&lt;code&gt;true&lt;/code&gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
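&lt;p&gt;The computation behind the three output columns is a frequency count over the target column. The following Python sketch illustrates it; this is not the actual implementation, and the sample values are made up:&lt;/p&gt;

```python
from collections import Counter

def summarize_catcol(values, topk, with_totalcount=True):
    """Return (category, count, percent) rows for the topk most frequent levels."""
    n = len(values)
    rows = []
    if with_totalcount:
        # Heading row: total row count, with percent 100.
        rows.append((None, n, 100.0))
    for category, count in Counter(values).most_common(topk):
        rows.append((category, count, 100.0 * count / n))
    return rows

rows = summarize_catcol(["a", "b", "a", "c", "a"], topk=2)
# First row is the total, (None, 5, 100.0); then ('a', 3, 60.0) and one of
# the singleton levels at 20.0 percent.
```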
&lt;h2 id=&#34;examples&#34;&gt;Examples&lt;/h2&gt;
&lt;p&gt;This example shows the categorical summary for the &lt;code&gt;current_salary&lt;/code&gt; column in the &lt;code&gt;salary_data&lt;/code&gt; table. The first output column gives the categorical levels, with the same SQL data type as the input column; the second gives the count of each value; and the third gives its percentage.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT SUMMARIZE_CATCOL (current_salary USING PARAMETERS TOPK = 5) OVER() FROM salary_data;
CATEGORY | COUNT | PERCENT
---------+-------+---------
         |  1000 |     100
   39004 |     2 |     0.2
   35321 |     1 |     0.1
   36313 |     1 |     0.1
   36538 |     1 |     0.1
   36562 |     1 |     0.1
(6 rows)
&lt;/code&gt;&lt;/pre&gt;
      </description>
    </item>
    
    <item>
      <title>Sql-Reference: SUMMARIZE_NUMCOL</title>
      <link>/en/sql-reference/functions/ml-functions/data-preparation/summarize-numcol/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/sql-reference/functions/ml-functions/data-preparation/summarize-numcol/</guid>
      <description>
        
        
        &lt;p&gt;Returns a statistical summary of columns in an OpenText™ Analytics Database table:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Count&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Mean&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Standard deviation&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Min/max values&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Approximate percentile&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Median&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All summary values are of type FLOAT, except the count, which is INTEGER.&lt;/p&gt;
&lt;h2 id=&#34;syntax&#34;&gt;Syntax&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SUMMARIZE_NUMCOL (&lt;span class=&#34;code-variable&#34;&gt;input-columns&lt;/span&gt; [ USING PARAMETERS exclude_columns = &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;excluded-columns&lt;/span&gt;&amp;#39;] ) OVER()
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;arguments&#34;&gt;Arguments&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns. All columns must be a &lt;a href=&#34;../../../../../en/sql-reference/data-types/numeric-data-types/&#34;&gt;numeric&lt;/a&gt; data type. If you select all columns, &lt;code&gt;SUMMARIZE_NUMCOL&lt;/code&gt; summarizes all of them.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;parameters&#34;&gt;Parameters&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;exclude_columns&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;Comma-separated list of column names from &lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt; to exclude from processing.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
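&lt;p&gt;For a single column, the summary statistics can be sketched with the Python standard library. Note one deliberate difference: the database computes approximate percentiles, while &lt;code&gt;statistics.quantiles&lt;/code&gt; computes exact ones, so the percentile columns need not match exactly:&lt;/p&gt;

```python
import statistics

def summarize_numcol(values):
    """Count, mean, stddev, min, 25th/50th/75th percentiles, and max."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # exact, not approximate
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "perc25": q1,
        "median": median,
        "perc75": q3,
        "max": max(values),
    }

summary = summarize_numcol([44, 45, 67, 71, 90])
# count 5, mean 63.4, stddev ~19.321, min 44, median 67, max 90 -- the
# count, mean, stddev, min, median, and max match the age row above.
```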
&lt;h2 id=&#34;examples&#34;&gt;Examples&lt;/h2&gt;
&lt;p&gt;Show the statistical summary for the &lt;code&gt;age&lt;/code&gt; and &lt;code&gt;salary&lt;/code&gt; columns in the &lt;code&gt;employee&lt;/code&gt; table:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT SUMMARIZE_NUMCOL(* USING PARAMETERS exclude_columns=&amp;#39;id,name,gender,title&amp;#39;) OVER() FROM employee;
COLUMN         | COUNT |    MEAN    |      STDDEV      |  MIN    | PERC25  | MEDIAN  |  PERC75   |  MAX
---------------+-------+------------+------------------+---------+---------+---------+-----------+--------
age            |     5 |    63.4    | 19.3209730603818 |      44 |      45 |      67 |      71   |     90
salary         |     5 | 3456.76    | 1756.78754300285 | 1234.56 | 2345.67 | 3456.78 | 4567.89   | 5678.9
(2 rows)
&lt;/code&gt;&lt;/pre&gt;
      </description>
    </item>
    
    <item>
      <title>Sql-Reference: SVD</title>
      <link>/en/sql-reference/functions/ml-functions/data-preparation/svd/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/sql-reference/functions/ml-functions/data-preparation/svd/</guid>
      <description>
        
        
&lt;p&gt;Computes singular values (the diagonal of the S matrix) and right singular vectors (the V matrix) of an SVD decomposition of the input relation. The results are saved as an SVD model. The signs of all elements of a singular vector may be flipped together on different runs.&lt;/p&gt;
&lt;p&gt;This is a meta-function. You must call meta-functions in a top-level &lt;a href=&#34;../../../../../en/sql-reference/statements/select/#&#34;&gt;SELECT&lt;/a&gt; statement.&lt;/p&gt;

&lt;h2 id=&#34;behavior-type&#34;&gt;Behavior type&lt;/h2&gt;
&lt;a class=&#34;glosslink&#34; href=&#34;../../../../../en/glossary/volatile-functions/&#34; title=&#34;&#34;&gt;Volatile&lt;/a&gt;
&lt;h2 id=&#34;syntax&#34;&gt;Syntax&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SVD ( &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;model-name&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;input-relation&lt;/span&gt;&amp;#39;, &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;input-columns&lt;/span&gt;&amp;#39;
     [ USING PARAMETERS
              [exclude_columns = &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;excluded-columns&lt;/span&gt;&amp;#39;]
              [, num_components = &lt;span class=&#34;code-variable&#34;&gt;num-components&lt;/span&gt;]
              [, method = &amp;#39;&lt;span class=&#34;code-variable&#34;&gt;method&lt;/span&gt;&amp;#39;] ] )
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;arguments&#34;&gt;Arguments&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;model-name&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Identifies the model to create, where &lt;em&gt;&lt;code&gt;model-name&lt;/code&gt;&lt;/em&gt; conforms to conventions described in &lt;a href=&#34;../../../../../en/sql-reference/language-elements/identifiers/#&#34;&gt;Identifiers&lt;/a&gt;. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-relation&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;The table or view that contains the input data for SVD.&lt;/dd&gt;
&lt;dt&gt;&lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt;&lt;/dt&gt;
&lt;dd&gt;Comma-separated list of columns to use from the input relation, or an asterisk (*) to select all columns. All input columns must be of a &lt;a href=&#34;../../../../../en/sql-reference/data-types/numeric-data-types/&#34;&gt;numeric&lt;/a&gt; data type.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&#34;parameters&#34;&gt;Parameters&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;exclude_columns&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;Comma-separated list of column names from &lt;em&gt;&lt;code&gt;input-columns&lt;/code&gt;&lt;/em&gt; to exclude from processing.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;num_components&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The number of components to keep in the model. The maximum number of components is the number of non-zero singular values computed, which is less than or equal to min(number of columns, number of rows). If you omit this parameter, all components are kept.&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;method&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The method used to calculate SVD. Can be set to &lt;code&gt;LAPACK&lt;/code&gt;.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;&lt;a name=&#34;ModelAttributes&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;model-attributes&#34;&gt;Model attributes&lt;/h2&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;columns&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Information about the columns of the input relation used to create the SVD model:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;index&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;name&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;singular_values&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Information about the singular values found, sorted in descending order:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;index&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;value&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;explained_variance: percentage of the variance in the data that is attributable to this singular value&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;accumulated_explained_variance: percentage of the variance in the data that is retained if all singular values after this one are dropped&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;right_singular_vectors&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The right singular vectors corresponding to the singular values mentioned above:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;index: indices of the elements in each vector&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;vector1&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;vector2&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;...&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;counters&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Information collected while training the model, stored as name-value pairs:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;counter_name&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;accepted_row_count: number of valid rows in the data&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;rejected_row_count: number of invalid rows (containing NULL, INF, or NaN values) in the data&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;iteration_count: number of iterations, always 1 for the current implementation of SVD&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;counter_value&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;call_string&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The function call that created the model.&lt;/dd&gt;
&lt;/dl&gt;
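&lt;p&gt;For intuition, the two variance attributes can be sketched from the singular values alone. The NumPy example below assumes the conventional definition, in which each component&#39;s explained variance is its squared singular value divided by the sum of all squared singular values; it is an illustration, not the database&#39;s code:&lt;/p&gt;

```python
# Illustrative sketch (assumption: explained_variance is the share
# s_i^2 / sum_j s_j^2 of each squared singular value, the conventional
# definition; accumulated_explained_variance is its running sum).
import numpy as np

s = np.array([10.0, 5.0, 1.0])        # singular values, descending
explained = s**2 / np.sum(s**2)       # per-component share of variance
accumulated = np.cumsum(explained)    # variance kept if later values are dropped

print(explained)      # the first component dominates
print(accumulated)    # reaches 1.0 at the last component
```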
&lt;h2 id=&#34;privileges&#34;&gt;Privileges&lt;/h2&gt;
&lt;p&gt;Non-superusers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;CREATE privileges on the schema where the model is created&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;SELECT privileges on the input relation&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;examples&#34;&gt;Examples&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT SVD (&amp;#39;svdmodel&amp;#39;, &amp;#39;small_svd&amp;#39;, &amp;#39;x1,x2,x3,x4&amp;#39;);
SVD
--------------------------------------------------------------
Finished in 1 iterations.
Accepted Rows: 8  Rejected Rows: 0
(1 row)

=&amp;gt; CREATE TABLE transform_svd AS SELECT
     APPLY_SVD (id, x1, x2, x3, x4 USING PARAMETERS model_name=&amp;#39;svdmodel&amp;#39;, exclude_columns=&amp;#39;id&amp;#39;, key_columns=&amp;#39;id&amp;#39;)
     OVER () FROM small_svd;
CREATE TABLE

=&amp;gt; SELECT * FROM transform_svd;
id  |       col1        |        col2         |        col3         |        col4
----+-------------------+---------------------+---------------------+--------------------
4   |  0.44849499240202 |  -0.347260956311326 |   0.186958376368345 |  0.378561270493651
6   |  0.17652411036246 | -0.0753183783382909 |  -0.678196192333598 | 0.0567124770173372
1   | 0.494871802886819 |   0.161721379259287 |  0.0712816417153664 | -0.473145877877408
2   |  0.17652411036246 | -0.0753183783382909 |  -0.678196192333598 | 0.0567124770173372
3   | 0.150974762654569 |   0.589561842046029 | 0.00392654610109522 |  0.360011163271921
5   | 0.494871802886819 |   0.161721379259287 |  0.0712816417153664 | -0.473145877877408
8   |  0.44849499240202 |  -0.347260956311326 |   0.186958376368345 |  0.378561270493651
7   | 0.150974762654569 |   0.589561842046029 | 0.00392654610109522 |  0.360011163271921
(8 rows)

=&amp;gt; SELECT APPLY_INVERSE_SVD (* USING PARAMETERS model_name=&amp;#39;svdmodel&amp;#39;, exclude_columns=&amp;#39;id&amp;#39;,
key_columns=&amp;#39;id&amp;#39;) OVER () FROM transform_svd;
id  |        x1        |        x2        |        x3        |        x4
----+------------------+------------------+------------------+------------------
4   | 91.4056627665577 | 44.7629617207482 | 83.1704961993117 | 38.9274292265543
6   | 20.6468626294368 | 9.30974906868751 | 8.71006863405534 |  6.5855928603967
7   | 31.2494347777156 | 20.6336519003026 | 27.5668287751507 | 5.84427645886865
1   |  107.93376580719 | 51.6980548011917 | 97.9665796560552 | 40.4918236881051
2   | 20.6468626294368 | 9.30974906868751 | 8.71006863405534 |  6.5855928603967
3   | 31.2494347777156 | 20.6336519003026 | 27.5668287751507 | 5.84427645886865
5   |  107.93376580719 | 51.6980548011917 | 97.9665796560552 | 40.4918236881051
8   | 91.4056627665577 | 44.7629617207482 | 83.1704961993117 | 38.9274292265543
(8 rows)
&lt;/code&gt;&lt;/pre&gt;
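&lt;p&gt;The round trip in this example can be mimicked with NumPy. The sketch below assumes the forward transform yields the left singular vectors U and the inverse rebuilds the rows as U * S * V^T; the names are illustrative analogues, not the database&#39;s implementation:&lt;/p&gt;

```python
# Illustrative NumPy analogue of the APPLY_SVD / APPLY_INVERSE_SVD round
# trip above (an assumption about the transform, not the database code):
# project the rows onto the singular vectors, then multiply back.
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((8, 4))                    # stand-in for the small_svd table

U, s, Vt = np.linalg.svd(A, full_matrices=False)

transformed = U                           # analogue of the APPLY_SVD output
restored = transformed @ np.diag(s) @ Vt  # analogue of APPLY_INVERSE_SVD

assert np.allclose(restored, A)           # the original rows are recovered
```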
&lt;h2 id=&#34;see-also&#34;&gt;See also&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/transformation-functions/apply-inverse-svd/#&#34;&gt;APPLY_INVERSE_SVD&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/transformation-functions/apply-svd/#&#34;&gt;APPLY_SVD&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
  </channel>
</rss>
