<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>OpenText Analytics Database 26.2.x – DISTINCT in a SELECT query list</title>
    <link>/en/data-analysis/query-optimization/distinct-select-query-list/</link>
    <description>Recent content in DISTINCT in a SELECT query list on OpenText Analytics Database 26.2.x</description>
    <generator>Hugo -- gohugo.io</generator>
    
    <atom:link href="/en/data-analysis/query-optimization/distinct-select-query-list/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Data-Analysis: Query has no aggregates in SELECT list</title>
      <link>/en/data-analysis/query-optimization/distinct-select-query-list/query-has-no-aggregates-select-list/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/data-analysis/query-optimization/distinct-select-query-list/query-has-no-aggregates-select-list/</guid>
      <description>
        
        
        &lt;p&gt;If a &lt;code&gt;SELECT DISTINCT&lt;/code&gt; query has no aggregates in its &lt;code&gt;SELECT&lt;/code&gt; list, the database internally treats it as a &lt;code&gt;GROUP BY&lt;/code&gt; query.&lt;/p&gt;
&lt;p&gt;For example, you can rewrite the following query:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT DISTINCT a, b, c FROM table1;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;as:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT a, b, c FROM table1 GROUP BY a, b, c;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For fastest execution, apply the optimization techniques for &lt;code&gt;GROUP BY&lt;/code&gt; queries described in &lt;a href=&#34;../../../../en/data-analysis/query-optimization/group-by-queries/#&#34;&gt;GROUP BY queries&lt;/a&gt;.
&lt;/p&gt;
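&lt;p&gt;The equivalence of the two forms can be checked on a small data set. The following is a minimal sketch, not the database&#39;s own tooling: it assumes Python&#39;s built-in &lt;code&gt;sqlite3&lt;/code&gt; module as a stand-in engine, and the rows in &lt;code&gt;table1&lt;/code&gt; are hypothetical sample data. An &lt;code&gt;ORDER BY&lt;/code&gt; clause is added to both queries only so that the results can be compared row by row.&lt;/p&gt;

```python
import sqlite3

# SQLite is used here only as a convenient stand-in engine (assumption);
# the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 1), (1, 2, 3), (2, 2, 3), (1, 2, 3), (2, 2, 4)],
)

# Form 1: SELECT DISTINCT over the three columns.
distinct_rows = conn.execute(
    "SELECT DISTINCT a, b, c FROM table1 ORDER BY a, b, c"
).fetchall()

# Form 2: GROUP BY over the same three columns, with no aggregates.
group_by_rows = conn.execute(
    "SELECT a, b, c FROM table1 GROUP BY a, b, c ORDER BY a, b, c"
).fetchall()

print(distinct_rows)
print(group_by_rows)
print(distinct_rows == group_by_rows)
```

&lt;p&gt;Both queries return the same four rows for this sample, which is why tuning advice for &lt;code&gt;GROUP BY&lt;/code&gt; queries carries over to &lt;code&gt;SELECT DISTINCT&lt;/code&gt;.&lt;/p&gt;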

      </description>
    </item>
    
    <item>
      <title>Data-Analysis: COUNT (DISTINCT) and other DISTINCT aggregates</title>
      <link>/en/data-analysis/query-optimization/distinct-select-query-list/count-distinct-and-other-distinct-aggregates/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/data-analysis/query-optimization/distinct-select-query-list/count-distinct-and-other-distinct-aggregates/</guid>
      <description>
        
        
        &lt;p&gt;Computing a &lt;span class=&#34;sql&#34;&gt;DISTINCT&lt;/span&gt; aggregate generally requires more work than computing other aggregates. Also, a query that uses a single &lt;span class=&#34;sql&#34;&gt;DISTINCT&lt;/span&gt; aggregate consumes fewer resources than a query with multiple &lt;span class=&#34;sql&#34;&gt;DISTINCT&lt;/span&gt; aggregates.&lt;/p&gt;

&lt;div class=&#34;alert admonition tip&#34; role=&#34;alert&#34;&gt;
&lt;h4 class=&#34;admonition-head&#34;&gt;Tip&lt;/h4&gt;

The database executes queries with multiple distinct aggregates more efficiently when all distinct aggregate columns have a similar number of distinct values.

&lt;/div&gt;
&lt;h2 id=&#34;examples&#34;&gt;Examples&lt;/h2&gt;
&lt;p&gt;The following query returns the number of distinct values in a column:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT COUNT (DISTINCT date_key) FROM date_dimension;

 COUNT
-------
  1826
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This example returns the number of distinct return values from an expression:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT COUNT (DISTINCT date_key + product_key) FROM inventory_fact;

 COUNT
-------
 21560
(1 row)
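&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The following query omits &lt;span class=&#34;sql&#34;&gt;DISTINCT&lt;/span&gt;, so it is not equivalent to the previous one: it counts all non-null values of the expression within each &lt;code&gt;date_key&lt;/code&gt; group, and uses the &lt;span class=&#34;sql&#34;&gt;LIMIT&lt;/span&gt; keyword to restrict the number of rows returned:&lt;/p&gt;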
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT COUNT(date_key + product_key) FROM inventory_fact GROUP BY date_key LIMIT 10;

 COUNT
-------
   173
    31
   321
   113
   286
    84
   244
   238
   145
   202
(10 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The following query uses &lt;span class=&#34;sql&#34;&gt;GROUP BY&lt;/span&gt; to count distinct values within groups:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT product_key, COUNT (DISTINCT date_key) FROM inventory_fact
   GROUP BY product_key LIMIT 10;

 product_key | count
-------------+-------
           1 |    12
           2 |    18
           3 |    13
           4 |    17
           5 |    11
           6 |    14
           7 |    13
           8 |    17
           9 |    15
          10 |    12
(10 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The following query returns the number of distinct products and the total inventory within each date key:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT date_key, COUNT (DISTINCT product_key), SUM(qty_in_stock) FROM inventory_fact
   GROUP BY date_key LIMIT 10;

 date_key | count |  sum
----------+-------+--------
        1 |   173 |  88953
        2 |    31 |  16315
        3 |   318 | 156003
        4 |   113 |  53341
        5 |   285 | 148380
        6 |    84 |  42421
        7 |   241 | 119315
        8 |   238 | 122380
        9 |   142 |  70151
       10 |   202 |  95274
(10 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This query selects each distinct &lt;code&gt;product_key&lt;/code&gt; value and then counts the number of distinct &lt;code&gt;date_key&lt;/code&gt; values for all records with the specific &lt;code&gt;product_key&lt;/code&gt; value. It also counts the number of distinct &lt;code&gt;warehouse_key&lt;/code&gt; values in all records with the specific &lt;code&gt;product_key&lt;/code&gt; value:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT product_key, COUNT (DISTINCT date_key), COUNT (DISTINCT warehouse_key) FROM inventory_fact
   GROUP BY product_key LIMIT 15;

 product_key | count | count
-------------+-------+-------
           1 |    12 |    12
           2 |    18 |    18
           3 |    13 |    12
           4 |    17 |    18
           5 |    11 |     9
           6 |    14 |    13
           7 |    13 |    13
           8 |    17 |    15
           9 |    15 |    14
          10 |    12 |    12
          11 |    11 |    11
          12 |    13 |    12
          13 |     9 |     7
          14 |    13 |    13
          15 |    18 |    17
(15 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This query selects each distinct &lt;code&gt;product_key&lt;/code&gt; value, counts the number of distinct &lt;code&gt;date_key&lt;/code&gt; and &lt;code&gt;warehouse_key&lt;/code&gt; values for all records with the specific &lt;code&gt;product_key&lt;/code&gt; value, and then sums all &lt;code&gt;qty_in_stock&lt;/code&gt; values in records with the specific &lt;code&gt;product_key&lt;/code&gt; value. It then returns the number of &lt;code&gt;product_version&lt;/code&gt; values in records with the specific &lt;code&gt;product_key&lt;/code&gt; value:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT product_key, COUNT (DISTINCT date_key),
      COUNT (DISTINCT warehouse_key),
      SUM (qty_in_stock),
      COUNT (product_version)
      FROM inventory_fact GROUP BY product_key LIMIT 15;

 product_key | count | count |  sum  | count
-------------+-------+-------+-------+-------
           1 |    12 |    12 |  5530 |    12
           2 |    18 |    18 |  9605 |    18
           3 |    13 |    12 |  8404 |    13
           4 |    17 |    18 | 10006 |    18
           5 |    11 |     9 |  4794 |    11
           6 |    14 |    13 |  7359 |    14
           7 |    13 |    13 |  7828 |    13
           8 |    17 |    15 |  9074 |    17
           9 |    15 |    14 |  7032 |    15
          10 |    12 |    12 |  5359 |    12
          11 |    11 |    11 |  6049 |    11
          12 |    13 |    12 |  6075 |    13
          13 |     9 |     7 |  3470 |     9
          14 |    13 |    13 |  5125 |    13
          15 |    18 |    17 |  9277 |    18
(15 rows)
&lt;/code&gt;&lt;/pre&gt;
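&lt;p&gt;The differences between counting a distinct expression, counting an expression without &lt;code&gt;DISTINCT&lt;/code&gt;, and counting distinct values within groups can be seen on a handful of rows. This is a hedged sketch only: it assumes Python&#39;s built-in &lt;code&gt;sqlite3&lt;/code&gt; module as a stand-in engine, and the five &lt;code&gt;inventory_fact&lt;/code&gt; rows are hypothetical, not taken from the sample database used above.&lt;/p&gt;

```python
import sqlite3

# SQLite stands in for the database (assumption); rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory_fact (date_key INTEGER, product_key INTEGER)")
conn.executemany(
    "INSERT INTO inventory_fact VALUES (?, ?)",
    [(1, 2), (2, 1), (1, 2), (1, 3), (2, 3)],
)

# Distinct results of the expression: the sums are 3, 3, 3, 4, 5.
distinct_sums = conn.execute(
    "SELECT COUNT(DISTINCT date_key + product_key) FROM inventory_fact"
).fetchone()[0]

# Distinct (date_key, product_key) pairs, which is a different question.
distinct_pairs = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT DISTINCT date_key, product_key FROM inventory_fact) AS pairs"
).fetchone()[0]

# Per group: a plain COUNT of the expression next to a DISTINCT count.
per_date = conn.execute(
    "SELECT date_key, COUNT(date_key + product_key), COUNT(DISTINCT product_key) "
    "FROM inventory_fact GROUP BY date_key ORDER BY date_key"
).fetchall()

print(distinct_sums)
print(distinct_pairs)
print(per_date)
```

&lt;p&gt;In this sample, the pairs (1, 2) and (2, 1) produce the same sum, so the count of distinct expression values is lower than the count of distinct pairs. Within the first date key, the plain count is higher than the distinct count because one product appears twice.&lt;/p&gt;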

      </description>
    </item>
    
    <item>
      <title>Data-Analysis: Approximate count distinct functions</title>
      <link>/en/data-analysis/query-optimization/distinct-select-query-list/approximate-count-distinct-functions/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/data-analysis/query-optimization/distinct-select-query-list/approximate-count-distinct-functions/</guid>
      <description>
        
        
        &lt;p&gt;The aggregate function &lt;a href=&#34;../../../../en/sql-reference/functions/aggregate-functions/count-aggregate/&#34;&gt;COUNT(DISTINCT)&lt;/a&gt; computes the exact number of distinct values in a data set. COUNT(DISTINCT) performs well when it executes with the &lt;a href=&#34;../../../../en/data-analysis/query-optimization/group-by-queries/group-by-implementation-options/&#34;&gt;GROUPBY PIPELINED&lt;/a&gt; algorithm.&lt;/p&gt;
&lt;p&gt;An aggregate &lt;a href=&#34;../../../../en/sql-reference/functions/aggregate-functions/count-aggregate/&#34;&gt;COUNT&lt;/a&gt; operation performs well on a data set when the following conditions are true:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;One of the target table&#39;s projections has an ORDER BY clause that facilitates sorted aggregation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The number of distinct values is fairly small.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Hashed aggregation is required to execute the query.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Alternatively, consider using the &lt;a href=&#34;../../../../en/sql-reference/functions/aggregate-functions/approximate-count-distinct/#&#34;&gt;APPROXIMATE_COUNT_DISTINCT&lt;/a&gt; function instead of COUNT(DISTINCT) when the following conditions are true:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;You have a large data set and you do not require an exact count of distinct values.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The performance of COUNT(DISTINCT) on a given data set is insufficient.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You calculate several distinct counts in the same query.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The plan for COUNT(DISTINCT) uses hashed aggregation.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The expected value that APPROXIMATE_COUNT_DISTINCT returns is equal to COUNT(DISTINCT), with an error that is lognormally distributed with standard deviation &lt;em&gt;s&lt;/em&gt;. You can control the standard deviation by setting the function&#39;s optional error tolerance argument—by default, 1.25 percent.&lt;/p&gt;
&lt;h2 id=&#34;other-approximate_count_distinct-functions&#34;&gt;Other APPROXIMATE_COUNT_DISTINCT functions&lt;/h2&gt;
&lt;p&gt;OpenText™ Analytics Database supports two other functions that you can use together, instead of APPROXIMATE_COUNT_DISTINCT: &lt;a href=&#34;../../../../en/sql-reference/functions/aggregate-functions/approximate-count-distinct-synopsis/#&#34;&gt;APPROXIMATE_COUNT_DISTINCT_SYNOPSIS&lt;/a&gt; and &lt;a href=&#34;../../../../en/sql-reference/functions/aggregate-functions/approximate-count-distinct-of-synopsis/#&#34;&gt;APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS&lt;/a&gt;. Use these functions when the following conditions are true:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;You have a large data set and you don&#39;t require an exact count of distinct values.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The performance of COUNT(DISTINCT) on a given data set is insufficient.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You want to pre-compute the distinct counts and later combine them in different ways.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Use the two functions together as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Pass APPROXIMATE_COUNT_DISTINCT_SYNOPSIS the data set and a normally distributed confidence interval. The function returns a subset of the data, as a binary &lt;em&gt;synopsis&lt;/em&gt; object.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Pass the synopsis to the APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS function, which then performs an approximate count distinct on the synopsis.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;You can also use &lt;a href=&#34;../../../../en/sql-reference/functions/aggregate-functions/approximate-count-distinct-synopsis-merge/#&#34;&gt;APPROXIMATE_COUNT_DISTINCT_SYNOPSIS_MERGE&lt;/a&gt;, which merges multiple synopses into one synopsis. With this function, you can continually update a &amp;quot;master&amp;quot; synopsis by merging in one or more synopses that cover more recent, shorter periods of time.&lt;/p&gt;
&lt;h2 id=&#34;example&#34;&gt;Example&lt;/h2&gt;
&lt;p&gt;The following example shows how to use APPROXIMATE_COUNT_DISTINCT functions to keep an approximate running count of users who click on a given web page within a given time span.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Create the &lt;code&gt;pviews&lt;/code&gt; table to store data about website visits—time of visit, web page visited, and visitor:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;
=&amp;gt; CREATE TABLE pviews(
visit_time TIMESTAMP NOT NULL,
page_id INTEGER NOT NULL,
user_id INTEGER NOT NULL)
ORDER BY page_id, visit_time
SEGMENTED BY HASH(user_id) ALL NODES KSAFE
PARTITION BY visit_time::DATE GROUP BY CALENDAR_HIERARCHY_DAY(visit_time::DATE, 2, 2);
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;code&gt;pviews&lt;/code&gt; is segmented by hashing &lt;code&gt;user_id&lt;/code&gt; data, so all visits by a given user are stored on the same segment, on the same node. This prevents inefficient cross-node data transfer when we later run COUNT (DISTINCT user_id).&lt;/p&gt;
&lt;p&gt;The table also uses &lt;a href=&#34;../../../../en/admin/partitioning-tables/hierarchical-partitioning/&#34;&gt;hierarchical partitioning&lt;/a&gt; on time of visit to optimize the ROS storage. Doing so improves performance when filtering data by time.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Load data into &lt;code&gt;pviews&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; INSERT INTO pviews VALUES
     (&amp;#39;2022-02-01 10:00:02&amp;#39;,1002,1),
     (&amp;#39;2022-02-01 10:00:03&amp;#39;,1002,2),
     (&amp;#39;2022-02-01 10:00:04&amp;#39;,1002,1),
     (&amp;#39;2022-02-01 10:00:05&amp;#39;,1002,3),
     (&amp;#39;2022-02-01 10:00:01&amp;#39;,1000,1),
     (&amp;#39;2022-02-01 10:00:06&amp;#39;,1002,1),
     (&amp;#39;2022-02-01 10:00:07&amp;#39;,1002,3),
     (&amp;#39;2022-02-01 10:00:08&amp;#39;,1002,1),
     (&amp;#39;2022-02-01 10:00:09&amp;#39;,1002,3),
     (&amp;#39;2022-02-01 10:00:12&amp;#39;,1002,2),
     (&amp;#39;2022-02-02 10:00:01&amp;#39;,1000,1),
     (&amp;#39;2022-02-02 10:00:02&amp;#39;,1002,4),
     (&amp;#39;2022-02-02 10:00:03&amp;#39;,1002,2),
     (&amp;#39;2022-02-02 10:00:04&amp;#39;,1002,1),
     (&amp;#39;2022-02-02 10:00:05&amp;#39;,1002,3),
     (&amp;#39;2022-02-02 10:00:06&amp;#39;,1002,4),
     (&amp;#39;2022-02-02 10:00:07&amp;#39;,1002,3),
     (&amp;#39;2022-02-02 10:00:08&amp;#39;,1002,4),
     (&amp;#39;2022-02-02 10:00:09&amp;#39;,1002,3),
     (&amp;#39;2022-02-02 10:00:12&amp;#39;,1002,2),
     (&amp;#39;2022-03-02 10:00:01&amp;#39;,1000,1),
     (&amp;#39;2022-03-02 10:00:02&amp;#39;,1002,1),
     (&amp;#39;2022-03-02 10:00:03&amp;#39;,1002,2),
     (&amp;#39;2022-03-02 10:00:04&amp;#39;,1002,1),
     (&amp;#39;2022-03-02 10:00:05&amp;#39;,1002,3),
     (&amp;#39;2022-03-02 10:00:06&amp;#39;,1002,4),
     (&amp;#39;2022-03-02 10:00:07&amp;#39;,1002,3),
     (&amp;#39;2022-03-02 10:00:08&amp;#39;,1002,6),
     (&amp;#39;2022-03-02 10:00:09&amp;#39;,1002,5),
     (&amp;#39;2022-03-02 10:00:12&amp;#39;,1002,2),
     (&amp;#39;2022-03-02 11:00:01&amp;#39;,1000,5),
     (&amp;#39;2022-03-02 11:00:02&amp;#39;,1002,6),
     (&amp;#39;2022-03-02 11:00:03&amp;#39;,1002,7),
     (&amp;#39;2022-03-02 11:00:04&amp;#39;,1002,4),
     (&amp;#39;2022-03-02 11:00:05&amp;#39;,1002,1),
     (&amp;#39;2022-03-02 11:00:06&amp;#39;,1002,6),
     (&amp;#39;2022-03-02 11:00:07&amp;#39;,1002,8),
     (&amp;#39;2022-03-02 11:00:08&amp;#39;,1002,6),
     (&amp;#39;2022-03-02 11:00:09&amp;#39;,1002,7),
     (&amp;#39;2022-03-02 11:00:12&amp;#39;,1002,1),
     (&amp;#39;2022-03-03 10:00:01&amp;#39;,1000,1),
     (&amp;#39;2022-03-03 10:00:02&amp;#39;,1002,2),
     (&amp;#39;2022-03-03 10:00:03&amp;#39;,1002,4),
     (&amp;#39;2022-03-03 10:00:04&amp;#39;,1002,1),
     (&amp;#39;2022-03-03 10:00:05&amp;#39;,1002,2),
     (&amp;#39;2022-03-03 10:00:06&amp;#39;,1002,6),
     (&amp;#39;2022-03-03 10:00:07&amp;#39;,1002,9),
     (&amp;#39;2022-03-03 10:00:08&amp;#39;,1002,10),
     (&amp;#39;2022-03-03 10:00:09&amp;#39;,1002,7),
     (&amp;#39;2022-03-03 10:00:12&amp;#39;,1002,1);
 OUTPUT
--------
     50
(1 row)

=&amp;gt; COMMIT;
COMMIT
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create the &lt;code&gt;pview_summary&lt;/code&gt; table by querying &lt;code&gt;pviews&lt;/code&gt; with &lt;a href=&#34;../../../../en/admin/working-with-native-tables/creating-table-from-other-tables/creating-table-from-query/&#34;&gt;CREATE TABLE...AS SELECT&lt;/a&gt;. Each row of this table summarizes data selected from &lt;code&gt;pviews&lt;/code&gt; for a given date:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;partial_visit_count&lt;/code&gt; stores the number of rows (website visits) in &lt;code&gt;pviews&lt;/code&gt; with that date.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;daily_users_acdp&lt;/code&gt; uses APPROXIMATE_COUNT_DISTINCT_SYNOPSIS to construct a synopsis that approximates the number of distinct users (&lt;code&gt;user_id&lt;/code&gt;) who visited that website on that date.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;
=&amp;gt; CREATE TABLE pview_summary AS SELECT
      visit_time::DATE &amp;#34;date&amp;#34;,
      COUNT(*) partial_visit_count,
      APPROXIMATE_COUNT_DISTINCT_SYNOPSIS(user_id) AS daily_users_acdp
   FROM pviews GROUP BY 1;
CREATE TABLE
=&amp;gt; ALTER TABLE pview_summary ALTER COLUMN &amp;#34;date&amp;#34; SET NOT NULL;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Update the &lt;code&gt;pview_summary&lt;/code&gt; table so it is partitioned like &lt;code&gt;pviews&lt;/code&gt;. The REORGANIZE keyword forces immediate repartitioning of the table data:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; ALTER TABLE pview_summary
     PARTITION BY &amp;#34;date&amp;#34;
     GROUP BY CALENDAR_HIERARCHY_DAY(&amp;#34;date&amp;#34;, 2, 2) REORGANIZE;
NOTICE 8364:  The new partitioning scheme will produce partitions in 2 physical storage containers per projection
NOTICE 4785:  Started background repartition table task
ALTER TABLE
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use &lt;a href=&#34;../../../../en/admin/working-with-native-tables/creating-table-from-other-tables/replicating-table/&#34;&gt;CREATE TABLE...LIKE&lt;/a&gt; to create two ETL tables, &lt;code&gt;pviews_etl&lt;/code&gt; and &lt;code&gt;pview_summary_etl&lt;/code&gt;, with the same DDL as &lt;code&gt;pviews&lt;/code&gt; and &lt;code&gt;pview_summary&lt;/code&gt;, respectively. These tables serve to process incoming data:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; CREATE TABLE pviews_etl LIKE pviews INCLUDING PROJECTIONS;
CREATE TABLE
=&amp;gt; CREATE TABLE pview_summary_etl LIKE pview_summary INCLUDING PROJECTIONS;
CREATE TABLE
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Load new data into &lt;code&gt;pviews_etl&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; INSERT INTO pviews_etl VALUES
     (&amp;#39;2022-03-03 11:00:01&amp;#39;,1000,8),
     (&amp;#39;2022-03-03 11:00:02&amp;#39;,1002,9),
     (&amp;#39;2022-03-03 11:00:03&amp;#39;,1002,1),
     (&amp;#39;2022-03-03 11:00:04&amp;#39;,1002,11),
     (&amp;#39;2022-03-03 11:00:05&amp;#39;,1002,10),
     (&amp;#39;2022-03-03 11:00:06&amp;#39;,1002,12),
     (&amp;#39;2022-03-03 11:00:07&amp;#39;,1002,3),
     (&amp;#39;2022-03-03 11:00:08&amp;#39;,1002,10),
     (&amp;#39;2022-03-03 11:00:09&amp;#39;,1002,1),
     (&amp;#39;2022-03-03 11:00:12&amp;#39;,1002,1);
 OUTPUT
--------
     10
(1 row)

=&amp;gt; COMMIT;
COMMIT
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Summarize the new data in &lt;code&gt;pview_summary_etl&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; INSERT INTO pview_summary_etl SELECT
      visit_time::DATE visit_date,
      COUNT(*) partial_visit_count,
      APPROXIMATE_COUNT_DISTINCT_SYNOPSIS(user_id) AS daily_users_acdp
    FROM pviews_etl GROUP BY visit_date;
 OUTPUT
--------
      1
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Append the &lt;code&gt;pviews_etl&lt;/code&gt; and &lt;code&gt;pview_summary_etl&lt;/code&gt; data to &lt;code&gt;pviews&lt;/code&gt; and &lt;code&gt;pview_summary&lt;/code&gt;, respectively, with &lt;a href=&#34;../../../../en/sql-reference/functions/management-functions/partition-functions/copy-partitions-to-table/#&#34;&gt;COPY_PARTITIONS_TO_TABLE&lt;/a&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT COPY_PARTITIONS_TO_TABLE(&amp;#39;pviews_etl&amp;#39;, &amp;#39;01-01-0000&amp;#39;::DATE, &amp;#39;01-01-9999&amp;#39;::DATE, &amp;#39;pviews&amp;#39;);
              COPY_PARTITIONS_TO_TABLE
----------------------------------------------------
 1 distinct partition values copied at epoch 1403.

(1 row)

=&amp;gt; SELECT COPY_PARTITIONS_TO_TABLE(&amp;#39;pview_summary_etl&amp;#39;, &amp;#39;01-01-0000&amp;#39;::DATE, &amp;#39;01-01-9999&amp;#39;::DATE, &amp;#39;pview_summary&amp;#39;);
              COPY_PARTITIONS_TO_TABLE
----------------------------------------------------
 1 distinct partition values copied at epoch 1404.

(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Compute the number of views and the approximate number of distinct users by day for all data, including the partitions that were just copied from the ETL tables:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT
     &amp;#34;date&amp;#34; visit_date,
     SUM(partial_visit_count) visit_count,
     APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS(daily_users_acdp) AS daily_users_acd
   FROM pview_summary GROUP BY visit_date ORDER BY visit_date;
 visit_date | visit_count | daily_users_acd
------------+-------------+-----------------
 2022-02-01 |          10 |               3
 2022-02-02 |          10 |               4
 2022-03-02 |          20 |               8
 2022-03-03 |          20 |              11
(4 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Compute the number of views and the approximate number of distinct users by month:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT
     DATE_TRUNC(&amp;#39;MONTH&amp;#39;, &amp;#34;date&amp;#34;)::DATE &amp;#34;month&amp;#34;,
     SUM(partial_visit_count) visit_count,
     APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS(daily_users_acdp) AS monthly_users_acd
   FROM pview_summary GROUP BY month ORDER BY month;
   month    | visit_count | monthly_users_acd
------------+-------------+-------------------
 2022-02-01 |          20 |                 4
 2022-03-01 |          40 |                12
(2 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Merge daily synopses into monthly synopses:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; CREATE TABLE pview_monthly_summary AS SELECT
     DATE_TRUNC(&amp;#39;MONTH&amp;#39;, &amp;#34;date&amp;#34;)::DATE &amp;#34;month&amp;#34;,
     SUM(partial_visit_count) partial_visit_count,
     APPROXIMATE_COUNT_DISTINCT_SYNOPSIS_MERGE(daily_users_acdp) AS monthly_users_acdp
   FROM pview_summary GROUP BY month ORDER BY month;
CREATE TABLE
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Compute the number of views and the approximate number of distinct users by month, generated from the merged synopses:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT
     month,
     SUM(partial_visit_count) monthly_visit_count,
     APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS(monthly_users_acdp) AS monthly_users_acd
   FROM pview_monthly_summary GROUP BY month ORDER BY month;
   month    | monthly_visit_count | monthly_users_acd
------------+---------------------+-------------------
 2022-02-01 |                  20 |                 4
 2022-03-01 |                  40 |                12
(2 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You can use the monthly summary to produce a yearly summary. This approach is likely to be faster than using a daily summary if a lot of data needs to be processed:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT
     DATE_TRUNC(&amp;#39;YEAR&amp;#39;, &amp;#34;month&amp;#34;)::DATE &amp;#34;year&amp;#34;,
     SUM(partial_visit_count) yearly_visit_count,
     APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS(monthly_users_acdp) AS yearly_users_acd
   FROM pview_monthly_summary GROUP BY year ORDER BY year;
    year    | yearly_visit_count | yearly_users_acd
------------+--------------------+------------------
 2022-01-01 |                 60 |               12
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Drop the ETL tables:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; DROP TABLE IF EXISTS pviews_etl, pview_summary_etl;
DROP TABLE
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;/ol&gt;
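&lt;p&gt;The reason synopses can be rolled up from days to months to a year, while plain distinct counts cannot simply be added, can be sketched in Python. This is an illustration under stated assumptions, not how the database implements these functions: exact Python sets stand in for the binary synopsis objects (a real synopsis is an approximate summary), and the &lt;code&gt;merge&lt;/code&gt; helper is hypothetical. The &lt;code&gt;user_id&lt;/code&gt; values are copied from the &lt;code&gt;INSERT&lt;/code&gt; statements in the steps above.&lt;/p&gt;

```python
from collections import defaultdict

# user_id values per visit date, copied from the pviews and pviews_etl
# INSERT statements in the example above.
visits = {
    "2022-02-01": [1, 2, 1, 3, 1, 1, 3, 1, 3, 2],
    "2022-02-02": [1, 4, 2, 1, 3, 4, 3, 4, 3, 2],
    "2022-03-02": [1, 1, 2, 1, 3, 4, 3, 6, 5, 2, 5, 6, 7, 4, 1, 6, 8, 6, 7, 1],
    "2022-03-03": [1, 2, 4, 1, 2, 6, 9, 10, 7, 1, 8, 9, 1, 11, 10, 12, 3, 10, 1, 1],
}

# Stand-in for APPROXIMATE_COUNT_DISTINCT_SYNOPSIS: an exact set per day.
# (Assumption for illustration only; a real synopsis is approximate.)
daily = {day: set(users) for day, users in visits.items()}


def merge(synopses):
    """Hypothetical stand-in for APPROXIMATE_COUNT_DISTINCT_SYNOPSIS_MERGE."""
    merged = set()
    for synopsis in synopses:
        merged |= synopsis
    return merged


# Roll daily synopses up to months (key "YYYY-MM"), then months up to the year.
by_month = defaultdict(list)
for day, synopsis in daily.items():
    by_month[day[:7]].append(synopsis)
monthly = {month: merge(parts) for month, parts in by_month.items()}
yearly = merge(monthly.values())

# Stand-in for APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS: the size of the set.
daily_counts = {day: len(s) for day, s in daily.items()}
monthly_counts = {month: len(s) for month, s in monthly.items()}

print(daily_counts)
print(monthly_counts)
print(len(yearly))
# Adding the daily counts overcounts users who visit on several days.
print(sum(daily_counts.values()))
```

&lt;p&gt;In this sketch the daily counts add up to 26, while the merged yearly set contains 12 users. A count alone loses the information about which users were seen, whereas a mergeable summary keeps enough of it to combine periods without double counting.&lt;/p&gt;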
&lt;h2 id=&#34;see-also&#34;&gt;See also&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../en/sql-reference/functions/aggregate-functions/approximate-count-distinct/#&#34;&gt;APPROXIMATE_COUNT_DISTINCT&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../en/sql-reference/functions/aggregate-functions/approximate-count-distinct-synopsis/#&#34;&gt;APPROXIMATE_COUNT_DISTINCT_SYNOPSIS&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../en/sql-reference/functions/aggregate-functions/approximate-count-distinct-of-synopsis/#&#34;&gt;APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../en/sql-reference/functions/aggregate-functions/approximate-count-distinct-synopsis-merge/#&#34;&gt;APPROXIMATE_COUNT_DISTINCT_SYNOPSIS_MERGE&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../en/sql-reference/functions/aggregate-functions/count-aggregate/#&#34;&gt;COUNT [aggregate]&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Data-Analysis: Single DISTINCT aggregates</title>
      <link>/en/data-analysis/query-optimization/distinct-select-query-list/single-distinct-aggregates/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/data-analysis/query-optimization/distinct-select-query-list/single-distinct-aggregates/</guid>
      <description>
        
        
        &lt;p&gt;The database computes a &lt;code&gt;DISTINCT&lt;/code&gt; aggregate by first removing all duplicate values of the aggregate&#39;s argument to find the distinct values. Then it computes the aggregate.&lt;/p&gt;
&lt;p&gt;For example, you can rewrite the following query:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT a, b, COUNT(DISTINCT c) AS dcnt FROM table1 GROUP BY a, b;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;as:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT a, b, COUNT(dcnt) FROM
  (SELECT a, b, c AS dcnt FROM table1 GROUP BY a, b, c)
GROUP BY a, b;
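&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For fastest execution, apply the optimization techniques for &lt;code&gt;GROUP BY&lt;/code&gt; queries described in &lt;a href=&#34;../../../../en/data-analysis/query-optimization/group-by-queries/#&#34;&gt;GROUP BY queries&lt;/a&gt;.&lt;/p&gt;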
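&lt;p&gt;The two-step evaluation described above (remove duplicates, then aggregate) can be checked on sample data. The following is a minimal sketch that assumes Python&#39;s built-in &lt;code&gt;sqlite3&lt;/code&gt; module as a stand-in engine; the rows are hypothetical, and the subquery alias &lt;code&gt;t&lt;/code&gt; and the &lt;code&gt;ORDER BY&lt;/code&gt; clauses are added only for portability and for a stable comparison.&lt;/p&gt;

```python
import sqlite3

# SQLite stands in for the database (assumption); rows are hypothetical.
# One row has a NULL in column c to show that both forms skip it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 1, 10), (1, 1, 20), (1, 2, 10),
     (2, 1, 30), (2, 1, 30), (2, 1, None)],
)

# Direct form: a single DISTINCT aggregate per (a, b) group.
direct = conn.execute(
    "SELECT a, b, COUNT(DISTINCT c) AS dcnt FROM table1 "
    "GROUP BY a, b ORDER BY a, b"
).fetchall()

# Rewritten form: the inner GROUP BY removes duplicate (a, b, c) rows,
# then the outer query counts the remaining non-null values per (a, b).
rewritten = conn.execute(
    "SELECT a, b, COUNT(dcnt) FROM "
    "(SELECT a, b, c AS dcnt FROM table1 GROUP BY a, b, c) AS t "
    "GROUP BY a, b ORDER BY a, b"
).fetchall()

print(direct)
print(rewritten)
print(direct == rewritten)
```

&lt;p&gt;Both forms return the same counts for this sample, which is why the inner &lt;code&gt;GROUP BY&lt;/code&gt; is the part worth optimizing.&lt;/p&gt;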

      </description>
    </item>
    
    <item>
      <title>Data-Analysis: Multiple DISTINCT aggregates</title>
      <link>/en/data-analysis/query-optimization/distinct-select-query-list/multiple-distinct-aggregates/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/data-analysis/query-optimization/distinct-select-query-list/multiple-distinct-aggregates/</guid>
      <description>
        
        
        &lt;p&gt;If your query has multiple &lt;code&gt;DISTINCT&lt;/code&gt; aggregates, there is no straightforward SQL rewrite that can compute them. The following query cannot easily be rewritten for improved performance:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT a, COUNT(DISTINCT b), COUNT(DISTINCT c) AS dcnt FROM table1 GROUP BY a;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For a query with multiple &lt;code&gt;DISTINCT&lt;/code&gt; aggregates, there is no projection design that can avoid using &lt;code&gt;GROUPBY HASH&lt;/code&gt; and resegmenting the data. To improve the performance of this query, make sure that ample memory is available to it. For more information about memory allocation for queries, see &lt;a href=&#34;../../../../en/admin/managing-db/managing-workloads/resource-manager/#&#34;&gt;Resource manager&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
  </channel>
</rss>
