<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Vertica Documentation – Configuring HDFS access</title>
    <link>/en/hadoop-integration/configuring-hdfs-access/</link>
    <description>Recent content in Configuring HDFS access on Vertica Documentation</description>
    <generator>Hugo -- gohugo.io</generator>
    
	  <atom:link href="/en/hadoop-integration/configuring-hdfs-access/index.xml" rel="self" type="application/rss+xml" />
    
    
      
        
      
    
    
    <item>
      <title>Hadoop-Integration: Verifying HDFS configuration</title>
      <link>/en/hadoop-integration/configuring-hdfs-access/verifying-hdfs-config/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/hadoop-integration/configuring-hdfs-access/verifying-hdfs-config/</guid>
      <description>
        
        
        &lt;p&gt;Use the &lt;a href=&#34;../../../en/sql-reference/functions/hadoop-functions/external-config-check/&#34;&gt;EXTERNAL_CONFIG_CHECK&lt;/a&gt; function to test access to HDFS. This function calls several others. If you prefer to test individual components, or if some tests do not apply to your configuration, you can instead call the functions individually. For example, if you are not using the HCatalog Connector, you do not need to call HCATALOGCONNECTOR_CONFIG_CHECK. The functions are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;../../../en/sql-reference/functions/management-functions/db-functions/kerberos-config-check/&#34;&gt;KERBEROS_CONFIG_CHECK&lt;/a&gt;: tests the Vertica keytab and the user&#39;s Kerberos credential.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;../../../en/sql-reference/functions/hadoop-functions/hadoop-impersonation-config-check/&#34;&gt;HADOOP_IMPERSONATION_CONFIG_CHECK&lt;/a&gt;: shows the delegation tokens that are in use. This function does not test them.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;../../../en/sql-reference/functions/hadoop-functions/hdfs-cluster-config-check/&#34;&gt;HDFS_CLUSTER_CONFIG_CHECK&lt;/a&gt;: tests access to the HDFS clusters found in HadoopConfDir, including using Kerberos and impersonation (delegation tokens).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;../../../en/sql-reference/functions/hadoop-functions/hcatalogconnector-config-check/&#34;&gt;HCATALOGCONNECTOR_CONFIG_CHECK&lt;/a&gt;: tests HCatalog Connector access to HiveServer2.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
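&lt;p&gt;As a minimal sketch, the individual checks can be called one at a time. The calls below assume the no-argument form of each function; skip any check that does not apply to your configuration:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;-- Keytab and Kerberos credential
=&amp;gt; SELECT KERBEROS_CONFIG_CHECK();
-- Delegation tokens in use (displayed, not tested)
=&amp;gt; SELECT HADOOP_IMPERSONATION_CONFIG_CHECK();
-- Access to the HDFS clusters found in HadoopConfDir
=&amp;gt; SELECT HDFS_CLUSTER_CONFIG_CHECK();
-- HCatalog Connector access to HiveServer2
=&amp;gt; SELECT HCATALOGCONNECTOR_CONFIG_CHECK();
&lt;/code&gt;&lt;/pre&gt;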
&lt;p&gt;To run all tests, call &lt;code&gt;EXTERNAL_CONFIG_CHECK&lt;/code&gt; with no arguments:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT EXTERNAL_CONFIG_CHECK();
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;To test only some authorities, nameservices, or Hive schemas, pass a single string argument. The format is a comma-separated list of &amp;quot;key=value&amp;quot; pairs, where keys are &amp;quot;authority&amp;quot;, &amp;quot;nameservice&amp;quot;, and &amp;quot;schema&amp;quot;. The value is passed to all of the sub-functions; see those reference pages for details on how values are interpreted.&lt;/p&gt;
&lt;p&gt;The following example tests the configuration of only the nameservice named &amp;quot;ns1&amp;quot;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT EXTERNAL_CONFIG_CHECK(&amp;#39;nameservice=ns1&amp;#39;);
&lt;/code&gt;&lt;/pre&gt;
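&lt;p&gt;Because the argument is a comma-separated list, several keys can be combined in one call. The following sketch assumes a nameservice named &amp;quot;ns1&amp;quot; and a Hive schema named &amp;quot;hcat&amp;quot;; the schema name is a placeholder, not a value from your configuration:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;-- Restrict the checks to one nameservice and one Hive schema (names are hypothetical)
=&amp;gt; SELECT EXTERNAL_CONFIG_CHECK(&amp;#39;nameservice=ns1,schema=hcat&amp;#39;);
&lt;/code&gt;&lt;/pre&gt;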
      </description>
    </item>
    
    <item>
      <title>Hadoop-Integration: Troubleshooting reads from HDFS</title>
      <link>/en/hadoop-integration/configuring-hdfs-access/troubleshooting-reads-from-hdfs/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/hadoop-integration/configuring-hdfs-access/troubleshooting-reads-from-hdfs/</guid>
      <description>
        
        
        &lt;p&gt;You might encounter the following issues when accessing data in HDFS.&lt;/p&gt;
&lt;h2 id=&#34;queries-using-webhdfs-show-unexpected-results&#34;&gt;Queries using [web]hdfs:/// show unexpected results&lt;/h2&gt;
&lt;p&gt;If you are using the &lt;code&gt;///&lt;/code&gt; shorthand to query external tables and see unexpected results, such as production data in your test cluster, verify that &lt;a href=&#34;../../../en/sql-reference/config-parameters/hadoop-parameters/#HadoopConfDir&#34;&gt;HadoopConfDir&lt;/a&gt; is set to the value you expect. The HadoopConfDir configuration parameter defines a path to search for the Hadoop configuration files that Vertica needs to resolve file locations. The HadoopConfDir parameter can be set at the session level, overriding the permanent value set in the database.&lt;/p&gt;
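&lt;p&gt;One way to check the value in effect, sketched here on the assumption that your Vertica version supports &lt;code&gt;SHOW CURRENT&lt;/code&gt; for individual parameters, is to display the parameter together with the level at which it is set:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;-- Displays the active setting of HadoopConfDir and the level it comes from
=&amp;gt; SHOW CURRENT HadoopConfDir;
&lt;/code&gt;&lt;/pre&gt;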
&lt;p&gt;To debug problems with &lt;code&gt;///&lt;/code&gt; URLs, try replacing the URLs with ones that use an explicit nameservice or name node. If the explicit URL works, then the problem is with the resolution of the shorthand. If the explicit URL also does not work as expected, then the problem is elsewhere (such as your nameservice).&lt;/p&gt;
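&lt;p&gt;As an illustration of this test, the following sketch defines a throwaway external table with an explicit nameservice in place of the &lt;code&gt;///&lt;/code&gt; shorthand. The table name, columns, path, and the nameservice &amp;quot;ns1&amp;quot; are all hypothetical; substitute values from your own cluster:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;-- Explicit nameservice instead of webhdfs:///data/sales/*.parquet
=&amp;gt; CREATE EXTERNAL TABLE sales_check (id INT, amount FLOAT)
   AS COPY FROM &amp;#39;webhdfs://ns1/data/sales/*.parquet&amp;#39; PARQUET;
-- Compare the result with the same query against the shorthand-based table
=&amp;gt; SELECT COUNT(*) FROM sales_check;
&lt;/code&gt;&lt;/pre&gt;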
&lt;h2 id=&#34;queries-take-a-long-time-to-run-when-using-ha&#34;&gt;Queries take a long time to run when using HA&lt;/h2&gt;
&lt;p&gt;The High Availability Name Node feature in HDFS allows a name node to fail over to a standby name node. The &lt;code&gt;dfs.client.failover.max.attempts&lt;/code&gt; configuration parameter (in &lt;code&gt;hdfs-site.xml&lt;/code&gt;) specifies how many attempts to make when failing over. Vertica uses a default value of 4 if this parameter is not set. After reaching the maximum number of failover attempts, Vertica concludes that the HDFS cluster is unavailable and aborts the operation. Vertica uses the &lt;code&gt;dfs.client.failover.sleep.base.millis&lt;/code&gt; and &lt;code&gt;dfs.client.failover.sleep.max.millis&lt;/code&gt; parameters to decide how long to wait between retries. Typical ranges are 500 milliseconds to 15 seconds, with longer waits for successive retries.&lt;/p&gt;
&lt;p&gt;Another parameter, &lt;code&gt;ipc.client.connect.retry.interval&lt;/code&gt;, specifies the time to wait between connection attempts, with typical values being 10 to 20 seconds.&lt;/p&gt;
&lt;p&gt;Cloudera and Hortonworks both provide tools to automatically generate configuration files. These tools can set the maximum number of failover attempts to a much higher number (50 or 100). If the HDFS cluster is unavailable (all name nodes are unreachable), Vertica can appear to hang for an extended period (minutes to hours) while trying to connect.&lt;/p&gt;
&lt;p&gt;Failover attempts are logged in the &lt;a href=&#34;../../../en/sql-reference/system-tables/v-monitor-schema/query-events/&#34;&gt;QUERY_EVENTS&lt;/a&gt; system table. The following example shows how to query this table to find these events:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT event_category, event_type, event_description, operator_name,
   event_details, count(event_type) AS count
   FROM query_events
   WHERE event_type ilike &amp;#39;WEBHDFS FAILOVER RETRY&amp;#39;
   GROUP BY event_category, event_type, event_description, operator_name, event_details;
-[ RECORD 1 ]-----+---------------------------------------
event_category    | EXECUTION
event_type        | WEBHDFS FAILOVER RETRY
event_description | WebHDFS Namenode failover and retry.
operator_name     | WebHDFS FileSystem
event_details     | WebHDFS request failed on ns
count             | 4
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can either wait for Vertica to connect or to abort the attempt, or set the &lt;code&gt;dfs.client.failover.max.attempts&lt;/code&gt; parameter to a lower value so that Vertica gives up sooner.&lt;/p&gt;
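&lt;p&gt;A minimal sketch of such an entry in &lt;code&gt;hdfs-site.xml&lt;/code&gt; follows. It assumes that you edit the copy of the file that Vertica reads through HadoopConfDir, and the value 4 is only an example that matches the default Vertica uses when the parameter is unset:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt;!-- Limit the number of failover attempts (example value) --&amp;gt;
&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;dfs.client.failover.max.attempts&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;4&amp;lt;/value&amp;gt;
&amp;lt;/property&amp;gt;
&lt;/code&gt;&lt;/pre&gt;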
&lt;h2 id=&#34;webhdfs-error-when-using-libhdfs&#34;&gt;WebHDFS error when using LibHDFS++&lt;/h2&gt;
&lt;p&gt;When creating an external table or loading data and using the &lt;code&gt;hdfs&lt;/code&gt; scheme, you might see errors from WebHDFS failures. Such errors indicate that Vertica was not able to use the &lt;code&gt;hdfs&lt;/code&gt; scheme and fell back to &lt;code&gt;webhdfs&lt;/code&gt;, but that the WebHDFS configuration is incorrect.&lt;/p&gt;
&lt;p&gt;First verify the value of the &lt;a href=&#34;../../../en/sql-reference/config-parameters/hadoop-parameters/#HadoopConfDir&#34;&gt;HadoopConfDir&lt;/a&gt; configuration parameter, which can be set at the session level. Then verify that the HDFS configuration files found there have the correct WebHDFS configuration for your Hadoop cluster. See &lt;a href=&#34;../../../en/hadoop-integration/configuring-hdfs-access/&#34;&gt;Configuring HDFS access&lt;/a&gt; for information about use of these files. See your Hadoop documentation for information about WebHDFS configuration.&lt;/p&gt;
&lt;h2 id=&#34;vertica-places-too-much-load-on-the-name-node-libhdfs&#34;&gt;Vertica places too much load on the name node (LibHDFS++)&lt;/h2&gt;
&lt;p&gt;Large HDFS clusters can sometimes experience heavy load on the name node when clients, including Vertica, need to locate data. If your name node is sensitive to this load and if you are using LibHDFS++, you can instruct Vertica to distribute metadata about block locations to its nodes so that they do not have to contact the name node as often. Where the name node isn&#39;t contended, distributing this metadata can degrade database performance somewhat, because the data must be serialized and distributed to the nodes.&lt;/p&gt;
&lt;p&gt;If protecting your name node from load is more important than query performance, set the &lt;a href=&#34;../../../en/sql-reference/config-parameters/hadoop-parameters/#EnableHDFSBlockInfoCache&#34;&gt;EnableHDFSBlockInfoCache&lt;/a&gt; configuration parameter to 1 (true). Usually this applies to large HDFS clusters where name node contention is already an issue.&lt;/p&gt;
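&lt;p&gt;A sketch of setting the parameter at the database level follows. It assumes a Vertica version that accepts &lt;code&gt;DEFAULT&lt;/code&gt; as a stand-in for the current database name; otherwise, use the database name instead:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;-- Enable distribution of block-location metadata to Vertica nodes
=&amp;gt; ALTER DATABASE DEFAULT SET EnableHDFSBlockInfoCache = 1;
&lt;/code&gt;&lt;/pre&gt;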
&lt;p&gt;This setting applies to access through LibHDFS++ (&lt;code&gt;hdfs&lt;/code&gt; scheme). Sometimes LibHDFS++ falls back to WebHDFS, which does not use this setting. If you have enabled this setting and you are still seeing high traffic on your name node from Vertica, check the &lt;a href=&#34;../../../en/sql-reference/system-tables/v-monitor-schema/query-events/&#34;&gt;QUERY_EVENTS&lt;/a&gt; system table for &lt;code&gt;LibHDFS++ UNSUPPORTED OPERATION&lt;/code&gt; events.&lt;/p&gt;
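&lt;p&gt;As a sketch, a query along the following lines counts those fallback events. The choice of columns is an assumption; adjust it to the details you need:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;-- Count events where LibHDFS++ could not handle an operation
=&amp;gt; SELECT event_type, event_description, operator_name,
   event_details, count(event_type) AS count
   FROM query_events
   WHERE event_type ilike &amp;#39;LibHDFS++ UNSUPPORTED OPERATION&amp;#39;
   GROUP BY event_type, event_description, operator_name, event_details;
&lt;/code&gt;&lt;/pre&gt;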
&lt;h2 id=&#34;kerberos-authentication-errors&#34;&gt;Kerberos authentication errors&lt;/h2&gt;
&lt;p&gt;Kerberos authentication can fail, even with a valid ticket, if Hadoop expires tickets frequently. It can also fail because of clock skew between Hadoop and Vertica nodes. For details, see &lt;a href=&#34;../../../en/security-and-authentication/client-authentication/kerberos-authentication/troubleshooting-kerberos-authentication/&#34;&gt;Troubleshooting Kerberos authentication&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
  </channel>
</rss>
