<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>OpenText Analytics Database 26.2.x – K-means</title>
    <link>/en/data-analysis/ml-predictive-analytics/clustering-algorithms/k-means/</link>
    <description>Recent content in K-means on OpenText Analytics Database 26.2.x</description>
    <generator>Hugo -- gohugo.io</generator>
    
	  <atom:link href="/en/data-analysis/ml-predictive-analytics/clustering-algorithms/k-means/index.xml" rel="self" type="application/rss+xml" />
    
    
      
        
      
    
    
    <item>
      <title>Data-Analysis: Clustering data using k-means</title>
      <link>/en/data-analysis/ml-predictive-analytics/clustering-algorithms/k-means/clustering-data-using-k-means/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/data-analysis/ml-predictive-analytics/clustering-algorithms/k-means/clustering-data-using-k-means/</guid>
      <description>
        
        
        &lt;p&gt;This k-means example uses two small data sets: &lt;code&gt;agar_dish_1&lt;/code&gt; and &lt;code&gt;agar_dish_2&lt;/code&gt;. Using the numeric data in the &lt;code&gt;agar_dish_1&lt;/code&gt; data set, you can cluster the data into &lt;em&gt;k&lt;/em&gt; clusters. Then, using the created k-means model, you can run &lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/transformation-functions/apply-kmeans/#&#34;&gt;APPLY_KMEANS&lt;/a&gt; on &lt;code&gt;agar_dish_2&lt;/code&gt; to assign its data points to the clusters created in your original model.&lt;/p&gt;
&lt;p&gt;Before you begin the example, &lt;a href=&#34;../../../../../en/data-analysis/ml-predictive-analytics/download-ml-example-data/&#34;&gt;load the Machine Learning sample data&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;clustering-training-data-into-k-clusters&#34;&gt;Clustering training data into &lt;em&gt;k&lt;/em&gt; clusters&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Create the k-means model, named &lt;code&gt;agar_dish_kmeans&lt;/code&gt;, using the &lt;code&gt;agar_dish_1&lt;/code&gt; table data.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT KMEANS(&amp;#39;agar_dish_kmeans&amp;#39;, &amp;#39;agar_dish_1&amp;#39;, &amp;#39;*&amp;#39;, 5
                  USING PARAMETERS exclude_columns =&amp;#39;id&amp;#39;, max_iterations=20, output_view=&amp;#39;agar_1_view&amp;#39;,
                  key_columns=&amp;#39;id&amp;#39;);
           KMEANS
---------------------------
 Finished in 7 iterations

(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The example creates a model named &lt;code&gt;agar_dish_kmeans&lt;/code&gt; and a view named &lt;code&gt;agar_1_view&lt;/code&gt; that contains the results of the model. You might get different results when you run the clustering algorithm, because KMEANS picks the initial centers randomly by default.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;View the output of &lt;code&gt;agar_1_view&lt;/code&gt;.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT * FROM agar_1_view;
 id  | cluster_id
-----+------------
   2 |          4
   5 |          4
   7 |          4
   9 |          4
  13 |          4
.
.
.
(375 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Because you specified the number of clusters as 5, verify that the function created five clusters. Count the number of data points within each cluster.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT cluster_id, COUNT(cluster_id) as Total_count
   FROM agar_1_view
   GROUP BY cluster_id;
 cluster_id | Total_count
------------+-------------
          0 |          76
          2 |          80
          1 |          74
          3 |          73
          4 |          72
(5 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;From the output, you can see that five clusters were created: &lt;code&gt;0&lt;/code&gt;, &lt;code&gt;1&lt;/code&gt;, &lt;code&gt;2&lt;/code&gt;, &lt;code&gt;3&lt;/code&gt;, and &lt;code&gt;4&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;You have now successfully clustered the data from &lt;code&gt;agar_dish_1&lt;/code&gt; into five distinct clusters.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
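&lt;p&gt;One way to sanity-check the clusters from the steps above is to join the output view back to the training table and average the coordinates in each cluster. The following is a sketch rather than part of the original example: it assumes that &lt;code&gt;agar_dish_1&lt;/code&gt; has the columns &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;x&lt;/code&gt;, and &lt;code&gt;y&lt;/code&gt;, as the key column and feature names used elsewhere in this example suggest. The output is omitted because it depends on your run.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT v.cluster_id,
          COUNT(*) AS total_count,
          AVG(d.x) AS mean_x,   -- mean x-coordinate of the points in the cluster
          AVG(d.y) AS mean_y    -- mean y-coordinate of the points in the cluster
   FROM agar_1_view v
   JOIN agar_dish_1 d ON d.id = v.id   -- id is the key column passed to KMEANS
   GROUP BY v.cluster_id
   ORDER BY v.cluster_id;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If the model converged, each pair of means should lie close to one of the centers stored in the model.&lt;/p&gt;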
&lt;h2 id=&#34;summarizing-your-model&#34;&gt;Summarizing your model&lt;/h2&gt;
&lt;p&gt;View the summary output of &lt;code&gt;agar_dish_kmeans&lt;/code&gt; using the &lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/model-management/get-model-summary/#&#34;&gt;GET_MODEL_SUMMARY&lt;/a&gt; function.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name=&amp;#39;agar_dish_kmeans&amp;#39;);
----------------------------------------------------------------------------------
=======
centers
=======
x       |   y
--------+--------
0.49708 | 0.51116
-7.48119|-7.52577
-1.56238|-1.50561
-3.50616|-3.55703
-5.52057|-5.49197

=======
metrics
=======
Evaluation metrics:
  Total Sum of Squares: 6008.4619
  Within-Cluster Sum of Squares:
      Cluster 0: 12.083548
      Cluster 1: 12.389038
      Cluster 2: 12.639238
      Cluster 3: 11.210146
      Cluster 4: 12.994356
  Total Within-Cluster Sum of Squares: 61.316326
  Between-Cluster Sum of Squares: 5947.1456
  Between-Cluster SS / Total SS: 98.98%
Number of iterations performed: 2
Converged: True
Call:
kmeans(&amp;#39;public.agar_dish_kmeans&amp;#39;, &amp;#39;agar_dish_1&amp;#39;, &amp;#39;*&amp;#39;, 5
USING PARAMETERS exclude_columns=&amp;#39;id&amp;#39;, max_iterations=20, epsilon=0.0001, init_method=&amp;#39;kmeanspp&amp;#39;,
distance_method=&amp;#39;euclidean&amp;#39;, output_view=&amp;#39;agar_1_view&amp;#39;, key_columns=&amp;#39;id&amp;#39;)
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;clustering-data-using-a-k-means-model&#34;&gt;Clustering data using a k-means model&lt;/h2&gt;
&lt;p&gt;Using &lt;code&gt;agar_dish_kmeans&lt;/code&gt;, the k-means model you just created, you can assign the points in &lt;code&gt;agar_dish_2&lt;/code&gt; to cluster centers.&lt;/p&gt;
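&lt;p&gt;To see the centers that the points will be assigned to, one option is to read them directly from the model. The following is a minimal sketch that assumes the model exposes a &lt;code&gt;centers&lt;/code&gt; attribute through the GET_MODEL_ATTRIBUTE function, matching the &lt;code&gt;centers&lt;/code&gt; section of the model summary. The output is omitted because it depends on your run.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;-- Return the cluster centers stored in the trained model
-- (assumption: the attribute is named centers, as in the model summary).
=&amp;gt; SELECT GET_MODEL_ATTRIBUTE(USING PARAMETERS model_name=&amp;#39;agar_dish_kmeans&amp;#39;,
                                               attr_name=&amp;#39;centers&amp;#39;);
&lt;/code&gt;&lt;/pre&gt;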
&lt;p&gt;Create a table named &lt;code&gt;kmeans_results&lt;/code&gt;, using the &lt;code&gt;agar_dish_2&lt;/code&gt; table as your input table and the &lt;code&gt;agar_dish_kmeans&lt;/code&gt; model for the cluster centers.&lt;/p&gt;
&lt;p&gt;Pass only the relevant feature columns as arguments to the &lt;code&gt;APPLY_KMEANS&lt;/code&gt; function.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; CREATE TABLE kmeans_results AS
        (SELECT id,
                APPLY_KMEANS(x, y
                             USING PARAMETERS
                                              model_name=&amp;#39;agar_dish_kmeans&amp;#39;) AS cluster_id
         FROM agar_dish_2);
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;code&gt;kmeans_results&lt;/code&gt; table now contains the cluster that the &lt;code&gt;agar_dish_kmeans&lt;/code&gt; model assigned to each row of the &lt;code&gt;agar_dish_2&lt;/code&gt; data.&lt;/p&gt;
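&lt;p&gt;To inspect the assignments, you can count the rows in each cluster, mirroring the check that was run on &lt;code&gt;agar_1_view&lt;/code&gt; earlier. This is a minimal sketch; the counts depend on your model, so no output is shown.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;-- Count how many agar_dish_2 points were assigned to each cluster.
=&amp;gt; SELECT cluster_id, COUNT(*) AS total_count
   FROM kmeans_results
   GROUP BY cluster_id
   ORDER BY cluster_id;
&lt;/code&gt;&lt;/pre&gt;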
&lt;h2 id=&#34;see-also&#34;&gt;See also&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/transformation-functions/apply-kmeans/#&#34;&gt;APPLY_KMEANS&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/ml-algorithms/kmeans/#&#34;&gt;KMEANS&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../../en/sql-reference/functions/ml-functions/model-management/get-model-summary/#&#34;&gt;GET_MODEL_SUMMARY&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
  </channel>
</rss>
