<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>OpenText Analytics Database 26.2.x – Load parallelism</title>
    <link>/en/extending/developing-udxs/user-defined-load-udl/load-parallelism/</link>
    <description>Recent content in Load parallelism on OpenText Analytics Database 26.2.x</description>
    <generator>Hugo -- gohugo.io</generator>
    
	  <atom:link href="/en/extending/developing-udxs/user-defined-load-udl/load-parallelism/index.xml" rel="self" type="application/rss+xml" />
    
    
      
        
      
    
    
    <item>
      <title>Extending: Cooperative parse</title>
      <link>/en/extending/developing-udxs/user-defined-load-udl/load-parallelism/cooperative-parse/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/extending/developing-udxs/user-defined-load-udl/load-parallelism/cooperative-parse/</guid>
      <description>
        
        
        &lt;p&gt;By default, OpenText™ Analytics Database parses a data source in a single thread on one database node. You can optionally use &lt;em&gt;cooperative parse&lt;/em&gt; to parse a source using multiple threads on a node. With cooperative parse, data from a source passes through a &lt;em&gt;chunker&lt;/em&gt; that groups blocks from the source stream into logical units (chunks), each of which can be parsed individually. The parser then parses these chunks concurrently. Cooperative parse is available only for unfenced UDxs. (See &lt;a href=&#34;../../../../../en/extending/udxs/fenced-and-unfenced-modes/#&#34;&gt;Fenced and unfenced modes&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;To use cooperative parse, a chunker must be able to locate end-of-record markers in the input. Locating these markers might not be possible in all input formats.&lt;/p&gt;
&lt;p&gt;Chunkers are created by parser factories. At load time, the database first calls the &lt;code&gt;UDChunker&lt;/code&gt; to divide the input into chunks and then calls the &lt;code&gt;UDParser&lt;/code&gt; to parse each chunk.&lt;/p&gt;
&lt;p&gt;You can use cooperative parse and apportioned load independently or together. See &lt;a href=&#34;../../../../../en/extending/developing-udxs/user-defined-load-udl/load-parallelism/combining-cooperative-parse-and-apportioned-load/#&#34;&gt;Combining cooperative parse and apportioned load&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a name=&#34;How&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;how-opentexttrade-analytics-database-divides-a-load&#34;&gt;How OpenText™ Analytics Database divides a load&lt;/h2&gt;
&lt;p&gt;When the database receives data from a source, it calls the chunker&#39;s &lt;code&gt;process()&lt;/code&gt; method repeatedly. A chunker is essentially a lightweight parser: instead of parsing records, its &lt;code&gt;process()&lt;/code&gt; method only divides the input into chunks.&lt;/p&gt;
&lt;p&gt;After the chunker has finished dividing the input into chunks, the database sends those chunks to as many parsers as are available, calling the &lt;code&gt;process()&lt;/code&gt; method on the parser.&lt;/p&gt;
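&lt;p&gt;The chunking step can be illustrated with a minimal, standalone Python sketch. It is not SDK code: the function name &lt;code&gt;chunk_blocks&lt;/code&gt;, the newline terminator, and the sample blocks are assumptions made for illustration. It shows the core idea only, which is cutting the incoming stream at the last record terminator seen so that every chunk holds whole records.&lt;/p&gt;

```python
def chunk_blocks(blocks, terminator=b"\n"):
    """Yield chunks that each end on a record terminator.

    blocks: iterable of bytes objects, standing in for the stream that
    arrives at the chunker one block at a time (hypothetical input).
    """
    pending = b""  # tail of earlier input that ended mid-record
    for block in blocks:
        data = pending + block
        cut = data.rfind(terminator)  # end of the last complete record
        if cut == -1:
            pending = data  # no record end yet; wait for more input
            continue
        end = cut + len(terminator)
        yield data[:end]      # whole records only
        pending = data[end:]  # partial record carried to the next block
    if pending:
        yield pending  # final record without a trailing terminator


chunks = list(chunk_blocks([b"a|1\nb|", b"2\nc|3\nd", b"|4\n"]))
print(chunks)  # [b'a|1\n', b'b|2\nc|3\n', b'd|4\n']
```

&lt;p&gt;Because each chunk ends on a record boundary, the chunks can be handed to separate parser threads without any thread seeing a partial record.&lt;/p&gt;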
&lt;h2 id=&#34;implementing-cooperative-parse&#34;&gt;Implementing cooperative parse&lt;/h2&gt;
&lt;p&gt;To implement cooperative parse, perform the following actions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Subclass &lt;code&gt;UDChunker&lt;/code&gt; and implement &lt;code&gt;process()&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In your &lt;code&gt;ParserFactory&lt;/code&gt;, implement &lt;code&gt;prepareChunker()&lt;/code&gt; to return a &lt;code&gt;UDChunker&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See &lt;a href=&#34;../../../../../en/extending/developing-udxs/user-defined-load-udl/user-defined-parser/cpp-example-delimited-parser-and-chunker/#&#34;&gt;C&amp;#43;&amp;#43; example: delimited parser and chunker&lt;/a&gt; for a &lt;code&gt;UDChunker&lt;/code&gt; that also supports apportioned load.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Extending: Apportioned load</title>
      <link>/en/extending/developing-udxs/user-defined-load-udl/load-parallelism/apportioned-load/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/extending/developing-udxs/user-defined-load-udl/load-parallelism/apportioned-load/</guid>
      <description>
        
        
        &lt;p&gt;A parser can use more than one database node to load a single input source in parallel. This approach is referred to as &lt;em&gt;apportioned load&lt;/em&gt;. Among the parsers built into OpenText™ Analytics Database, the default (delimited) parser supports apportioned load.&lt;/p&gt;
&lt;p&gt;Apportioned load, like cooperative parse, requires an input that can be divided at record boundaries. The difference is that cooperative parse does a sequential scan to find record boundaries, while apportioned load first jumps (seeks) to a given position and then scans. Some formats, like generic XML, do not support seeking.&lt;/p&gt;
&lt;p&gt;To use apportioned load, you must ensure that the source is reachable by all participating database nodes. You typically use apportioned load with distributed file systems.&lt;/p&gt;
&lt;p&gt;A parser that does not support apportioned load directly can still have a chunker that supports apportioning.&lt;/p&gt;
&lt;p&gt;You can use apportioned load and cooperative parse independently or together. See &lt;a href=&#34;../../../../../en/extending/developing-udxs/user-defined-load-udl/load-parallelism/combining-cooperative-parse-and-apportioned-load/#&#34;&gt;Combining cooperative parse and apportioned load&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a name=&#34;How&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;how-opentexttrade-analytics-database-apportions-a-load&#34;&gt;How OpenText™ Analytics Database apportions a load&lt;/h2&gt;
&lt;p&gt;If both the parser and its source support apportioning, then you can specify that a single input is to be distributed to multiple database nodes for loading. The &lt;code&gt;SourceFactory&lt;/code&gt; breaks the input into portions and assigns them to execution nodes. Each &lt;code&gt;Portion&lt;/code&gt; consists of an offset into the input and a size. OpenText™ Analytics Database distributes the portions and their parameters to the execution nodes. A source factory running on each node produces a &lt;code&gt;UDSource&lt;/code&gt; for the given portion.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;UDParser&lt;/code&gt; first determines where to start parsing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If the portion is the first one in the input, the parser advances to the offset and begins parsing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If the portion is not the first, the parser advances to the offset and then scans until it finds the end of a record. Because records can break across portions, parsing begins after the first record-end encountered.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The parser must complete a record, which might require it to read past the end of the portion. The parser is responsible for parsing all records that &lt;em&gt;begin&lt;/em&gt; in the assigned portion, regardless of where they end. Most of this work occurs within the &lt;code&gt;process()&lt;/code&gt; method of the parser.&lt;/p&gt;
&lt;p&gt;Sometimes, a portion contains nothing to be parsed by its assigned node. For example, suppose you have a record that begins in portion 1, runs through all of portion 2, and ends in portion 3. The parser assigned to portion 1 parses the record, and the parser assigned to portion 3 starts after that record. The parser assigned to portion 2, however, has no record starting within its portion.&lt;/p&gt;
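&lt;p&gt;The start and end rules above can be sketched in standalone Python. This is a simulation under stated assumptions, not SDK code: the function name &lt;code&gt;parse_portion&lt;/code&gt;, the in-memory &lt;code&gt;data&lt;/code&gt; buffer, and the &lt;code&gt;~&lt;/code&gt; terminator are hypothetical. One further assumption is that the scan for the first record end starts one terminator length before the offset, so that a record beginning exactly at the offset is not skipped.&lt;/p&gt;

```python
def parse_portion(data, offset, size, terminator=b"~"):
    """Return the records that begin inside [offset, offset + size)."""
    end = offset + size
    pos = offset
    if offset != 0:
        # Not the first portion: the record in progress belongs to an
        # earlier portion, so skip to just past the first record end.
        # Starting the scan slightly before the offset (an assumption of
        # this sketch) keeps a record that begins exactly at the offset.
        hit = data.find(terminator, max(offset - len(terminator), 0))
        if hit == -1:
            return []
        pos = hit + len(terminator)
    records = []
    while end > pos and len(data) > pos:
        hit = data.find(terminator, pos)
        stop = len(data) if hit == -1 else hit
        records.append(data[pos:stop])  # may read past the portion end
        pos = stop + len(terminator)
    return records


data = b"aa~bbbbbbbbbbbb~cc~"             # 19 bytes, three records
portions = [(0, 7), (7, 7), (14, 5)]      # (offset, size) pairs
results = [parse_portion(data, off, size) for off, size in portions]
print(results)  # [[b'aa', b'bbbbbbbbbbbb'], [], [b'cc']]
```

&lt;p&gt;In this sample the long record begins in the first portion and ends in the third, so the second portion yields nothing, and every record is parsed exactly once.&lt;/p&gt;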
&lt;p&gt;If the load also uses &lt;a href=&#34;../../../../../en/extending/developing-udxs/user-defined-load-udl/load-parallelism/cooperative-parse/#&#34;&gt;Cooperative parse&lt;/a&gt;, then after apportioning the load and before parsing, the database divides portions into chunks for parallel loading.&lt;/p&gt;
&lt;h2 id=&#34;implementing-apportioned-load&#34;&gt;Implementing apportioned load&lt;/h2&gt;
&lt;p&gt;To implement apportioned load, perform the following actions in the source, the parser, and their factories.&lt;/p&gt;
&lt;p&gt;In your &lt;code&gt;SourceFactory&lt;/code&gt; subclass:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Implement &lt;code&gt;isSourceApportionable()&lt;/code&gt; and return &lt;code&gt;true&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Implement &lt;code&gt;plan()&lt;/code&gt; to determine portion size, designate portions, and assign portions to execution nodes. To assign portions to particular executors, pass the information using the parameter writer on the plan context (&lt;code&gt;PlanContext::getWriter()&lt;/code&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Implement &lt;code&gt;prepareUDSources()&lt;/code&gt;. OpenText™ Analytics Database calls this method on each execution node with the plan context created by the factory. This method returns the &lt;code&gt;UDSource&lt;/code&gt; instances to be used for this node&#39;s assigned portions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If sources can take advantage of parallelism, you can implement &lt;code&gt;getDesiredThreads()&lt;/code&gt; to request a number of threads for each source. See &lt;a href=&#34;../../../../../en/extending/developing-udxs/user-defined-load-udl/user-defined-source/sourcefactory-class/#&#34;&gt;SourceFactory class&lt;/a&gt; for more information about this method.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
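&lt;p&gt;As a rough illustration of the planning step, the following standalone Python sketch divides an input of known size into (offset, size) portions and assigns them to nodes. It is not SDK code: the function name &lt;code&gt;plan_portions&lt;/code&gt;, the fixed portion size, the node names, and the round-robin assignment are all assumptions. A real &lt;code&gt;plan()&lt;/code&gt; implementation chooses its own policy and passes the result through the plan context.&lt;/p&gt;

```python
def plan_portions(total_size, portion_size, nodes):
    """Map each node name to a list of (offset, size) portions."""
    if not portion_size > 0:
        raise ValueError("portion_size must be positive")
    assignments = {node: [] for node in nodes}
    offset = 0
    index = 0
    while total_size > offset:
        size = min(portion_size, total_size - offset)  # last one may be short
        node = nodes[index % len(nodes)]               # round-robin (assumed)
        assignments[node].append((offset, size))
        offset += size
        index += 1
    return assignments


plan = plan_portions(total_size=19, portion_size=7, nodes=["node1", "node2"])
print(plan)  # {'node1': [(0, 7), (14, 5)], 'node2': [(7, 7)]}
```

&lt;p&gt;The portions cover the input without gaps or overlap. Record boundaries are not considered here, because aligning to records is the job of the parser or chunker on each node.&lt;/p&gt;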
&lt;p&gt;In your &lt;code&gt;UDSource&lt;/code&gt; subclass, implement &lt;code&gt;process()&lt;/code&gt; as you would for any other source, using the assigned portion. You can retrieve this portion with &lt;code&gt;getPortion()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In your &lt;code&gt;ParserFactory&lt;/code&gt; subclass:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Implement &lt;code&gt;isParserApportionable()&lt;/code&gt; and return &lt;code&gt;true&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If your parser uses a &lt;code&gt;UDChunker&lt;/code&gt; that supports apportioned load, implement &lt;code&gt;isChunkerApportionable()&lt;/code&gt; and return &lt;code&gt;true&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In your &lt;code&gt;UDParser&lt;/code&gt; subclass:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Write your &lt;code&gt;UDParser&lt;/code&gt; subclass to operate on portions rather than whole sources. You can do so by handling the stream states &lt;code&gt;PORTION_START&lt;/code&gt; and &lt;code&gt;PORTION_END&lt;/code&gt;, or by using the &lt;code&gt;ContinuousUDParser&lt;/code&gt; API. Your parser must scan for the beginning of the portion, find the first record boundary after that position, and parse to the end of the last record beginning in that portion. Be aware that this behavior might require that the parser read beyond the end of the portion.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Handle the special case of a portion containing no record start by returning without writing any output.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In your &lt;code&gt;UDChunker&lt;/code&gt; subclass, implement &lt;code&gt;alignPortion()&lt;/code&gt;. See &lt;a href=&#34;../../../../../en/extending/developing-udxs/user-defined-load-udl/user-defined-parser/udchunker-class/#AL&#34;&gt;Aligning Portions&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;example&#34;&gt;Example&lt;/h2&gt;
&lt;p&gt;The SDK provides a C++ example of apportioned load in the &lt;code&gt;ApportionLoadFunctions&lt;/code&gt; directory:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;FilePortionSource&lt;/code&gt; is a subclass of &lt;code&gt;UDSource&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;DelimFilePortionParser&lt;/code&gt; is a subclass of &lt;code&gt;ContinuousUDParser&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Use these classes together. You could also use &lt;code&gt;FilePortionSource&lt;/code&gt; with the built-in delimited parser.&lt;/p&gt;
&lt;p&gt;The following example shows how you can load the libraries and create the functions in the database:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;
=&amp;gt; CREATE LIBRARY FilePortionSourceLib AS &amp;#39;/home/dbadmin/FP.so&amp;#39;;

=&amp;gt; CREATE LIBRARY DelimFilePortionParserLib AS &amp;#39;/home/dbadmin/Delim.so&amp;#39;;

=&amp;gt; CREATE SOURCE FilePortionSource AS
LANGUAGE &amp;#39;C++&amp;#39; NAME &amp;#39;FilePortionSourceFactory&amp;#39; LIBRARY FilePortionSourceLib;

=&amp;gt; CREATE PARSER DelimFilePortionParser AS
LANGUAGE &amp;#39;C++&amp;#39; NAME &amp;#39;DelimFilePortionParserFactory&amp;#39; LIBRARY DelimFilePortionParserLib;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The following example shows how you can use the source and parser to load data:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;
=&amp;gt; COPY t WITH SOURCE FilePortionSource(file=&amp;#39;g1/*.dat&amp;#39;) PARSER DelimFilePortionParser(delimiter = &amp;#39;|&amp;#39;,
    record_terminator = &amp;#39;~&amp;#39;);
&lt;/code&gt;&lt;/pre&gt;
      </description>
    </item>
    
    <item>
      <title>Extending: Combining cooperative parse and apportioned load</title>
      <link>/en/extending/developing-udxs/user-defined-load-udl/load-parallelism/combining-cooperative-parse-and-apportioned-load/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/extending/developing-udxs/user-defined-load-udl/load-parallelism/combining-cooperative-parse-and-apportioned-load/</guid>
      <description>
        
        
        &lt;p&gt;You can enable both &lt;a href=&#34;../../../../../en/extending/developing-udxs/user-defined-load-udl/load-parallelism/cooperative-parse/#&#34;&gt;Cooperative parse&lt;/a&gt; and &lt;a href=&#34;../../../../../en/extending/developing-udxs/user-defined-load-udl/load-parallelism/apportioned-load/#&#34;&gt;Apportioned load&lt;/a&gt; in the same parser, allowing OpenText™ Analytics Database to decide how to load data.&lt;/p&gt;
&lt;h2 id=&#34;deciding-how-to-divide-a-load&#34;&gt;Deciding how to divide a load&lt;/h2&gt;
&lt;p&gt;OpenText™ Analytics Database uses apportioned load, where possible, at query-planning time. It decides whether to also use cooperative parse at execution time.&lt;/p&gt;
&lt;p&gt;Apportioned load requires &lt;code&gt;SourceFactory&lt;/code&gt; support. Given a suitable &lt;code&gt;UDSource&lt;/code&gt;, at planning time the database calls the &lt;code&gt;isParserApportionable()&lt;/code&gt; method on the &lt;code&gt;ParserFactory&lt;/code&gt;. If this method returns &lt;code&gt;true&lt;/code&gt;, the database apportions the load.&lt;/p&gt;
&lt;p&gt;If &lt;code&gt;isParserApportionable()&lt;/code&gt; returns &lt;code&gt;false&lt;/code&gt; but &lt;code&gt;isChunkerApportionable()&lt;/code&gt; returns &lt;code&gt;true&lt;/code&gt;, then a chunker is available for cooperative parse and that chunker supports apportioned load. The database apportions the load.&lt;/p&gt;
&lt;p&gt;If neither of these methods returns &lt;code&gt;true&lt;/code&gt;, the database does not apportion the load.&lt;/p&gt;
&lt;p&gt;At execution time, the database first checks whether the load is running in unfenced mode and proceeds only if it is. Cooperative parse is not supported in fenced mode.&lt;/p&gt;
&lt;p&gt;If the load is not apportioned, and more than one thread is available, the database uses cooperative parse.&lt;/p&gt;
&lt;p&gt;If the load is apportioned, and exactly one thread is available, the database uses cooperative parse if and only if the parser is not apportionable. In this case, the chunker is apportionable but the parser is not.&lt;/p&gt;
&lt;p&gt;If the load is apportioned, and more than one thread is available, and the chunker is apportionable, the database uses cooperative parse.&lt;/p&gt;
&lt;p&gt;If the database uses cooperative parse but &lt;code&gt;prepareChunker()&lt;/code&gt; does not return a &lt;code&gt;UDChunker&lt;/code&gt; instance, the database reports an error.&lt;/p&gt;
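&lt;p&gt;The rules in this section can be summarized as a small decision function. The following standalone Python sketch is one reading of those rules, not the database&#39;s actual logic: the function name &lt;code&gt;load_strategy&lt;/code&gt; and its parameters are hypothetical, and the case of an apportioned, multithreaded load whose chunker is not apportionable is assumed to run without cooperative parse. The sketch does not model whether &lt;code&gt;prepareChunker()&lt;/code&gt; actually returns a chunker.&lt;/p&gt;

```python
def load_strategy(source_apportionable, parser_apportionable,
                  chunker_apportionable, unfenced, threads):
    """Return (apportioned, cooperative) for one load."""
    # Planning time: apportion if the source supports it and either the
    # parser or its chunker is apportionable.
    apportioned = source_apportionable and (
        parser_apportionable or chunker_apportionable)

    # Execution time: cooperative parse requires unfenced mode.
    if not unfenced:
        cooperative = False
    elif not apportioned:
        cooperative = threads > 1
    elif threads == 1:
        # Only the chunker is apportionable, so it must do the work.
        cooperative = not parser_apportionable
    else:
        cooperative = chunker_apportionable  # assumed False otherwise
    return apportioned, cooperative


print(load_strategy(True, True, False, True, 1))   # (True, False)
print(load_strategy(True, False, True, True, 1))   # (True, True)
print(load_strategy(False, True, True, True, 4))   # (False, True)
```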
&lt;h2 id=&#34;executing-apportioned-cooperative-loads&#34;&gt;Executing apportioned, cooperative loads&lt;/h2&gt;
&lt;p&gt;If a load uses both apportioned load and cooperative parse, the database uses the &lt;code&gt;SourceFactory&lt;/code&gt; to break the input into portions. It then assigns the portions to execution nodes. See &lt;a href=&#34;../../../../../en/extending/developing-udxs/user-defined-load-udl/load-parallelism/apportioned-load/#How&#34;&gt;How OpenText™ Analytics Database apportions a load&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;On the execution node, the database calls the chunker&#39;s &lt;code&gt;alignPortion()&lt;/code&gt; method to align the input with portion boundaries. (This step is skipped for the first portion, which by definition is already aligned at the beginning.) This step is necessary because a parser using apportioned load sometimes has to read beyond the end of the portion, so a chunker needs to find the end point.&lt;/p&gt;
&lt;p&gt;After aligning the portion, the database calls the chunker&#39;s &lt;code&gt;process()&lt;/code&gt; method repeatedly. See &lt;a href=&#34;../../../../../en/extending/developing-udxs/user-defined-load-udl/load-parallelism/cooperative-parse/#How&#34;&gt;How OpenText™ Analytics Database divides a load&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The chunks found by the chunker are then sent to the parser&#39;s &lt;code&gt;process()&lt;/code&gt; method for processing in the usual way.&lt;/p&gt;

      </description>
    </item>
    
  </channel>
</rss>
