Difference between revisions of "Datasource Adapters"

From 3forge Documentation
Jump to navigation Jump to search
Tag: visualeditor-switched
Tag: visualeditor
 
(29 intermediate revisions by 2 users not shown)
Line 69: Line 69:
  
 
=== File name Directive (Required) ===
 
=== File name Directive (Required) ===
 +
'''Syntax'''
  
==== Syntax ====
+
<span style="font-family: Courier New; color: blue;">_file=''path/to/file''</span>
_file=''path/to/file''
+
 
 +
'''Overview'''
  
==== Overview ====
 
 
This directive controls the location of the file to read, relative to the datasource's url. Use the forward slash (/) to indicate directories (standard UNIX convention)                                                             
 
This directive controls the location of the file to read, relative to the datasource's url. Use the forward slash (/) to indicate directories (standard UNIX convention)                                                             
  
==== Examples ====
+
'''Examples'''
_file="data.txt" (Read the ''data.txt'' file, located at the root of the datasource's url)
 
  
_file="subdir/data.txt" (Read the ''data.txt'' file, found under the ''subdir'' directory)
+
<span style="font-family: Courier New; color: blue;">_file="data.txt"</span> (Read the ''data.txt'' file, located at the root of the datasource's url)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_file="subdir/data.txt"</span> (Read the ''data.txt'' file, found under the ''subdir'' directory)
  
 
=== Field definitions Directive (Required) ===
 
=== Field definitions Directive (Required) ===
 +
'''Syntax'''
 +
 +
<span style="font-family: Courier New; color: blue;">_fields=''col1_type col_name, col2_type col2_name, ...''</span>
  
==== Syntax ====
+
'''Overview'''
_fields=''col1_type col_name, col2_type col2_name, ...''
 
  
==== Overview ====
 
 
This directive controls the Column names that will be returned, along with their types.  The order in which they are defined is the same as the order in which they are returned.  If the column type is not supplied, the default is String. Special note on additional columns: If the line number (see _''linenum'' directive) column is not supplied in the list, it will default to type integer and be added to the end of the table schema.  Columns defined in the Pattern (see _''pattern'' directive) but not defined in _''fields'' will be added to the end of the table schema.
 
This directive controls the Column names that will be returned, along with their types.  The order in which they are defined is the same as the order in which they are returned.  If the column type is not supplied, the default is String. Special note on additional columns: If the line number (see _''linenum'' directive) column is not supplied in the list, it will default to type integer and be added to the end of the table schema.  Columns defined in the Pattern (see _''pattern'' directive) but not defined in _''fields'' will be added to the end of the table schema.
  
Types should be one of: ''String, Long, Integer, Boolean, Double, Float, UTC''
+
Types should be one of: <span style="font-family: Courier New; color: blue;">''String, Long, Integer, Boolean, Double, Float, UTC''</span>
  
 
Column names must be valid variable names.
 
Column names must be valid variable names.
  
==== Examples ====
+
'''Examples'''
_fields="String account,Long quantity" (define two columns)
+
 
 +
<span style="font-family: Courier New; color: blue;">_fields="String account,Long quantity"</span> (define two columns)
  
_fields ="fname,lname,int age" (define 3 columns, ''fname'' and ''lname'' default to String)
+
<span style="font-family: Courier New; color: blue;">_fields ="fname,lname,int age"</span> (define 3 columns, ''fname'' and ''lname'' default to String)
  
 
== Directives for parsing Delimited list of ordered Fields ==
 
== Directives for parsing Delimited list of ordered Fields ==
_file=''file_name'' ('''Required''', see general directives)  
+
<span style="font-family: Courier New; color: blue;">_file=''file_name''</span> ('''Required''', see general directives)  
  
_fields=''col1_type col1_name, ...'' ('''Required''', see general directives)  
+
<span style="font-family: Courier New; color: blue;">_fields=''col1_type col1_name, ...''</span> ('''Required''', see general directives)  
  
_delim=delim_string ('''Required''')  
+
<span style="font-family: Courier New; color: blue;">_delim=delim_string</span> ('''Required''')  
  
_conflateDelim=true|false (Optional. Default is false)  
+
<span style="font-family: Courier New; color: blue;">_conflateDelim=true|false</span> (Optional. Default is false)  
  
''_quote=single_quote_char'' (Optional)  
+
<span style="font-family: Courier New; color: blue;">''_quote=single_quote_char''</span> (Optional)  
  
''_escape=single_escape_char'' (Optional)  
+
<span style="font-family: Courier New; color: blue;">''_escape=single_escape_char''</span> (Optional)  
  
 
The _''delim'' indicates the char (or chars) used to separate each field (If _''conflateDelim'' is true, then 1 or more consecutive delimiters are treated as a single delimiter). The _''fields'' is an ordered list of types and field names for each of the delimited fields. If the _''quote'' is supplied, then a field value starting with ''quote'' will be read until another ''quote'' char is found, meaning ''delim''s within ''quotes'' will not be treated as delims.  If the _e''scape'' char is supplied then when an ''escape'' char is read, it is skipped and the following char is read as a literal.
 
The _''delim'' indicates the char (or chars) used to separate each field (If _''conflateDelim'' is true, then 1 or more consecutive delimiters are treated as a single delimiter). The _''fields'' is an ordered list of types and field names for each of the delimited fields. If the _''quote'' is supplied, then a field value starting with ''quote'' will be read until another ''quote'' char is found, meaning ''delim''s within ''quotes'' will not be treated as delims.  If the _e''scape'' char is supplied then when an ''escape'' char is read, it is skipped and the following char is read as a literal.
  
=== Examples ===
+
'''Examples'''
_delim="|"
 
  
_fields="code,lname,int age"
+
<span style="font-family: Courier New; color: blue;">_delim="|"</span>
  
_quote="'"
+
<span style="font-family: Courier New; color: blue;">_fields="code,lname,int age"</span>
  
_escape="\\"
+
<span style="font-family: Courier New; color: blue;">_quote="'"</span>
 +
 
 +
<span style="font-family: Courier New; color: blue;">_escape="\\"</span>
  
 
This defines a pattern such that:
 
This defines a pattern such that:
Line 150: Line 155:
  
 
== Directives for parsing Key Value Pairs ==
 
== Directives for parsing Key Value Pairs ==
_file=''file_name'' ('''Required''', see general directives)  
+
<span style="font-family: Courier New; color: blue;">_file=''file_name''</span> ('''Required''', see general directives)  
  
_fields=''col1_type col1_name, ...'' ('''Required''', see general directives)  
+
<span style="font-family: Courier New; color: blue;">_fields=''col1_type col1_name, ...''</span> ('''Required''', see general directives)  
  
_delim=delim_string ('''Required''')  
+
<span style="font-family: Courier New; color: blue;">_delim=delim_string</span> ('''Required''')  
  
_conflateDelim=true|false (Optional. Default is false)  
+
<span style="font-family: Courier New; color: blue;">_conflateDelim=true|false</span> (Optional. Default is false)  
  
''_equals=single_equals_char'' ('''Required''')
+
<span style="font-family: Courier New; color: blue;">''_equals=single_equals_char''</span> ('''Required''')
  
_mappings=from1=to1,from2=to2,... (Optional)  
+
<span style="font-family: Courier New; color: blue;">_mappings=from1=to1,from2=to2,...</span> (Optional)  
  
''_quote=single_quote_char'' (Optional)  
+
<span style="font-family: Courier New; color: blue;">''_quote=single_quote_char''</span> (Optional)  
  
''_escape=single_escape_char'' (Optional)  
+
<span style="font-family: Courier New; color: blue;">''_escape=single_escape_char''</span> (Optional)  
  
 
The _''delim'' indicates the char (or chars) used to separate each field (If _''conflateDelim'' is true, then 1 or more consecutive delimiters are treated as a single delimiter). The _equals char is used to indicate the key/value separator. The _''fields'' is an ordered list of types and field names for each of the delimited fields. If the _''quote'' is supplied, then a field value starting with ''quote'' will be read until another ''quote'' char is found, meaning ''delim''s within ''quotes'' will not be treated as delims.  If the _''escape'' char is supplied then when an ''escape'' char is read, it is skipped and the following char is read as a literal.
 
The _''delim'' indicates the char (or chars) used to separate each field (If _''conflateDelim'' is true, then 1 or more consecutive delimiters are treated as a single delimiter). The _equals char is used to indicate the key/value separator. The _''fields'' is an ordered list of types and field names for each of the delimited fields. If the _''quote'' is supplied, then a field value starting with ''quote'' will be read until another ''quote'' char is found, meaning ''delim''s within ''quotes'' will not be treated as delims.  If the _''escape'' char is supplied then when an ''escape'' char is read, it is skipped and the following char is read as a literal.
Line 170: Line 175:
 
The optional _m''appings'' directive allows you to map keys within the flat file to file names specified in the _''fields'' directive. This is useful when a file has key names that are not valid field names, or a file has multiple key names that should be used to populate the same column.
 
The optional _m''appings'' directive allows you to map keys within the flat file to file names specified in the _''fields'' directive. This is useful when a file has key names that are not valid field names, or a file has multiple key names that should be used to populate the same column.
  
=== Examples ===
+
'''Examples'''
_delim="|"
+
 
 +
<span style="font-family: Courier New; color: blue;">_delim="|"</span>
  
_equals="="
+
<span style="font-family: Courier New; color: blue;">_equals="="</span>
  
_fields="code,lname,int age"
+
<span style="font-family: Courier New; color: blue;">_fields="code,lname,int age"</span>
  
_mappings="20=code,21=lname,22=age"
+
<span style="font-family: Courier New; color: blue;">_mappings="20=code,21=lname,22=age"</span>
  
_quote="'"
+
<span style="font-family: Courier New; color: blue;">_quote="'"</span>
  
_escape="\\"
+
<span style="font-family: Courier New; color: blue;">_escape="\\"</span>
  
 
This defines a pattern such that:
 
This defines a pattern such that:
Line 211: Line 217:
  
 
== Directives for Pattern Capture ==
 
== Directives for Pattern Capture ==
_file=''file_name'' ('''Required''', see general directives)  
+
<span style="font-family: Courier New; color: blue;">_file=''file_name''</span> ('''Required''', see general directives)  
  
_fields=''col1_type col1_name, ...'' (Optional, see general directives)  
+
<span style="font-family: Courier New; color: blue;">_fields=''col1_type col1_name, ...''</span> (Optional, see general directives)  
  
_''pattern''=''col1_type col1_name, ...=regex_with_grouping'' ('''Required''')
+
<span style="font-family: Courier New; color: blue;">_''pattern''=''col1_type col1_name, ...=regex_with_grouping''</span> ('''Required''')
  
 
The ''_pattern'' must start with a list of column names, followed by an equal sign (=) and then a regular expression with grouping (this is dubbed a column-to-pattern mapping).  The regular expression's first grouping value will be mapped to the first column, 2<sup>nd</sup> grouping to the second and so on.
 
The ''_pattern'' must start with a list of column names, followed by an equal sign (=) and then a regular expression with grouping (this is dubbed a column-to-pattern mapping).  The regular expression's first grouping value will be mapped to the first column, 2<sup>nd</sup> grouping to the second and so on.
Line 223: Line 229:
 
For multiple column-to-pattern mappings, use the \n (new line) to separate each one.
 
For multiple column-to-pattern mappings, use the \n (new line) to separate each one.
  
=== Examples ===
+
'''Examples'''
 +
 
 
<span style="font-family: Courier New; color: blue;">_pattern="fname,lname,int age=User (.*) (.*) is (.*) years old"</span>
 
<span style="font-family: Courier New; color: blue;">_pattern="fname,lname,int age=User (.*) (.*) is (.*) years old"</span>
  
Line 275: Line 282:
  
 
=== Skipping Lines Directive (optional) ===
 
=== Skipping Lines Directive (optional) ===
 +
'''Syntax'''
  
==== Syntax ====
+
<span style="font-family: Courier New; color: blue;">_skipLines=''number_of_lines''</span>
_skipLines=''number_of_lines''
+
 
 +
'''Overview'''
  
==== Overview ====
 
 
This directive controls the number of lines to skip from the top of the file.  This is useful for ignoring "junk" at the top of a file. If not supplied, then no lines are skipped. From a performance standpoint, skipping lines is highly efficient.
 
This directive controls the number of lines to skip from the top of the file.  This is useful for ignoring "junk" at the top of a file. If not supplied, then no lines are skipped. From a performance standpoint, skipping lines is highly efficient.
  
==== Examples ====
+
'''Examples'''
_skipLines="0"     (this is the default, don't skip any lines)
+
 
 +
<span style="font-family: Courier New; color: blue;">_skipLines="0"</span>     (this is the default, don't skip any lines)
  
_skipLines="1"      (skip the first line, for example if there is a header)
+
<span style="font-family: Courier New; color: blue;">_skipLines="1"</span>      (skip the first line, for example if there is a header)
  
 
=== Line Number Column Directive (optional) ===
 
=== Line Number Column Directive (optional) ===
 +
'''Syntax'''
 +
 +
<span style="font-family: Courier New; color: blue;">_linenum=''column_name''</span>
  
==== Syntax ====
+
'''Overview'''
_linenum=''column_name''
 
  
==== Overview ====
 
 
This directive controls the name of the column that will contain the line number. If not supplied, the default is "linenum". Notes about the line number: The first line is line number 1, and skipped/filtered out lines are still considered in numbering. For example, if the _''skipLines''=2 , then the first line will have a line number of 3.
 
This directive controls the name of the column that will contain the line number. If not supplied, the default is "linenum". Notes about the line number: The first line is line number 1, and skipped/filtered out lines are still considered in numbering. For example, if the _''skipLines''=2 , then the first line will have a line number of 3.
  
==== Examples ====
+
'''Examples'''
_linenum="" (A line number column is not included in the table)
 
  
_linenum="linenum" (The column ''linenum'' will contain line numbers, this is the default)
+
<span style="font-family: Courier New; color: blue;">_linenum=""</span> (A line number column is not included in the table)
  
_linenum="rownum" (The column ''rownum'' will contain line numbers)
+
<span style="font-family: Courier New; color: blue;">_linenum="linenum"</span> (The column ''linenum'' will contain line numbers, this is the default)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_linenum="rownum"</span> (The column ''rownum'' will contain line numbers)
  
 
== Optional Line Filtering Directives ==
 
== Optional Line Filtering Directives ==
  
 
=== Filtering Out Lines Directive (optional) ===
 
=== Filtering Out Lines Directive (optional) ===
 +
'''Syntax'''
  
==== Syntax ====
+
<span style="font-family: Courier New; color: blue;">_filterOut=''regex''</span>
_filterOut=''regex''
+
 
 +
'''Overview'''
  
==== Overview ====
 
 
Any line that matches the supplied regular expression will be ignored. If not supplied, then no lines are filtered out. From a Performance standpoint, this is applied before other parsing is considered, so ignoring lines using a filter out directive is faster, as opposed to using a WHERE clause, for example.
 
Any line that matches the supplied regular expression will be ignored. If not supplied, then no lines are filtered out. From a Performance standpoint, this is applied before other parsing is considered, so ignoring lines using a filter out directive is faster, as opposed to using a WHERE clause, for example.
  
==== Examples ====
+
'''Examples'''
_filterOut="Test" (ignore any lines containing the text ''Test'')
 
  
_filterOut="^Comment" (ignore any lines starting with ''Comment'')
+
<span style="font-family: Courier New; color: blue;">_filterOut="Test"</span> (ignore any lines containing the text ''Test'')
  
_filterOut="This|That" (ignore any lines containing the text ''This'' or ''That'')
+
<span style="font-family: Courier New; color: blue;">_filterOut="^Comment"</span> (ignore any lines starting with ''Comment'')
 +
 
 +
<span style="font-family: Courier New; color: blue;">_filterOut="This|That"</span> (ignore any lines containing the text ''This'' or ''That'')
  
 
=== Filtering In Lines Directive (optional) ===
 
=== Filtering In Lines Directive (optional) ===
 +
'''Syntax'''
  
==== Syntax ====
+
<span style="font-family: Courier New; color: blue;">_filterIn=''regex''</span>
_filterIn=''regex''
+
 
 +
'''Overview'''
  
==== Overview ====
 
 
Only lines that match the supplied regular expression will be considered. If not supplied, then all lines are considered. From a Performance standpoint, this is applied before other parsing is considered, so narrowing down the lines considered using a filter in directive is faster, as opposed to using a WHERE clause, for example. If you use a grouping (..) inside the regular expression, then only the contents of the first grouping will be considered for parsing
 
Only lines that match the supplied regular expression will be considered. If not supplied, then all lines are considered. From a Performance standpoint, this is applied before other parsing is considered, so narrowing down the lines considered using a filter in directive is faster, as opposed to using a WHERE clause, for example. If you use a grouping (..) inside the regular expression, then only the contents of the first grouping will be considered for parsing
  
==== Examples ====
+
'''Examples'''
_filterIn="3Forge" (ignore any lines that don't contain the word ''3Forge'')
+
 
 +
<span style="font-family: Courier New; color: blue;">_filterIn="3Forge"</span> (ignore any lines that don't contain the word ''3Forge'')
  
_filterIn="^Outgoing" (ignore any lines that don't start with ''Outgoing'')
+
<span style="font-family: Courier New; color: blue;">_filterIn="^Outgoing"</span> (ignore any lines that don't start with ''Outgoing'')
  
_filterIn="Data(.*)" (ignore any lines that don't start with ''Data,'' and only consider  the text after the word ''Data'' for processing)
+
<span style="font-family: Courier New; color: blue;">_filterIn="Data(.*)"</span> (ignore any lines that don't start with ''Data,'' and only consider  the text after the word ''Data'' for processing)
  
 
= Shell Command Reader =
 
= Shell Command Reader =
Line 399: Line 414:
 
4. In the Add datasource dialog:
 
4. In the Add datasource dialog:
  
Name: Supply a user defined Name, ex: ''MyShell''
+
'''Name''': Supply a user defined Name, ex: ''MyShell''
  
URL: /full/path/to/path/of/working/directory (ex: ''/home/myuser/files'' )
+
'''URL''': /full/path/to/path/of/working/directory (ex: ''/home/myuser/files'' )
  
 
(Keep in mind that the path is on the machine running AMI, not necessarily your local desktop)
 
(Keep in mind that the path is on the machine running AMI, not necessarily your local desktop)
Line 407: Line 422:
 
5. Click "Add Datasource" Button
 
5. Click "Add Datasource" Button
  
'''Running Commands Remotely''': You can execute commands on remote machines as well using an ''AMI Relay''. First install an AMI relay on the machine that the command should be executed on ( See ''AMI for the Enterpise'' documentation for details on how to install an AMI relay).  Then in the Add Datasource wizard select the relay in the "Relay To Run On" dropdown.
+
'''Running Commands Remotely''': You can execute commands on remote machines as well using an ''AMI Relay''. First install an AMI relay on the machine that the command should be executed on ( See ''AMI for the Enterprise'' documentation for details on how to install an AMI relay).  Then in the Add Datasource wizard select the relay in the "Relay To Run On" dropdown.
  
 
== General Directives ==
 
== General Directives ==
  
 
=== Command Directive (Required) ===
 
=== Command Directive (Required) ===
 +
'''Syntax'''
 +
 +
<span style="font-family: Courier New; color: blue;">_cmd="''command to run"''</span>
  
==== Syntax ====
+
'''Overview'''
_cmd="''command to run"''
 
  
==== Overview ====
 
 
This directive controls the command to execute.
 
This directive controls the command to execute.
  
==== Examples ====
+
'''Examples'''
_cmd="ls -lrt" (execute ls -lrt)
 
  
_cmd="ls | sort" (execute ls and pipe that into sort)
+
<span style="font-family: Courier New; color: blue;">_cmd="ls -lrt"</span> (execute ls -lrt)
  
_cmd="dir /s" (execute dir on a windows system)
+
<span style="font-family: Courier New; color: blue;">_cmd="ls | sort"</span> (execute ls and pipe that into sort)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_cmd="dir /s"</span> (execute dir on a windows system)
  
 
=== Supplying Standard Input (Optional) ===
 
=== Supplying Standard Input (Optional) ===
 +
'''Syntax'''
  
==== Syntax ====
+
<span style="font-family: Courier New; color: blue;">_stdin="''text to pipe into stdin"''</span>
_stdin="''text to pipe into stdin"''
+
 
 +
'''Overview'''
  
==== Overview ====
 
 
This directive is used to control what data is piped into the standard in (stdin) of the process to run.                                                                
 
This directive is used to control what data is piped into the standard in (stdin) of the process to run.                                                                
  
==== Examples ====
+
'''Example'''
_cmd="cat > out.txt" _stdin="hello world" (will pipe "hello world" into out.txt)
+
 
 +
<span style="font-family: Courier New; color: blue;">_cmd="cat > out.txt" _stdin="hello world"</span> (will pipe "hello world" into out.txt)
  
 
=== Controlling what is captured from the Process (Optional) ===
 
=== Controlling what is captured from the Process (Optional) ===
 +
'''Syntax'''
  
==== Syntax ====
+
<span style="font-family: Courier New; color: blue;">_capture="comma_delimited_list"</span> (''default is stdout,stderr,exitCode'')  
_capture="comma_delimited_list" (''default is stdout,stderr,exitCode'')  
+
 
 +
'''Overview'''
  
==== Overview                                                         ====
 
 
This directive is used to control what output data from running the command is captured. It is a comma delimited list and the order determines what order the tables are returned in.  An empty list ("") indicates that nothing will be captured (the command is executed, and all output is ignored). Options include the following:
 
This directive is used to control what output data from running the command is captured. It is a comma delimited list and the order determines what order the tables are returned in.  An empty list ("") indicates that nothing will be captured (the command is executed, and all output is ignored). Options include the following:
  
* stdout - Capture standard out from the process
+
* <span style="font-family: Courier New; color: blue;">stdout</span> - Capture standard out from the process
* stderr - Capture standard error from the process
+
* <span style="font-family: Courier New; color: blue;">stderr</span> - Capture standard error from the process
* exitCOde - Capture the exit code from the process                                                                                                    
+
* <span style="font-family: Courier New; color: blue;">exitCode</span> - Capture the exit code from the process                                                                                                    
 +
'''Examples'''
  
==== Examples ====
+
<span style="font-family: Courier New; color: blue;">_capture="exitCode,stdout"</span> (the 1<sup>st</sup> table will contain the exit code, 2<sup>nd</sup> will contain stdout)
_capture="exitCode,stdout" (the 1<sup>st</sup> table will contain the exit code, 2<sup>nd</sup> will contain stdout)
 
  
 
=== Field Definitions Directive (Required) ===
 
=== Field Definitions Directive (Required) ===
 +
'''Syntax'''
  
==== Syntax ====
+
<span style="font-family: Courier New; color: blue;">_fields=''col1_type col_name, col2_type col2_name, ...''</span>
_fields=''col1_type col_name, col2_type col2_name, ...''
+
 
 +
'''Overview'''
  
==== Overview ====
 
 
This directive controls the Column names that will be returned, along with their types.  The order in which they are defined is the same as the order in which they are returned.  If the column type is not supplied, the default is String. Special note on additional columns: If the line number (see _''linenum'' directive) column is not supplied in the list, it will default to type integer and be added to the end of the table schema.  Columns defined in the Pattern (see _''pattern'' directive) but not defined in _''fields'' will be added to the end of the table schema.
 
This directive controls the Column names that will be returned, along with their types.  The order in which they are defined is the same as the order in which they are returned.  If the column type is not supplied, the default is String. Special note on additional columns: If the line number (see _''linenum'' directive) column is not supplied in the list, it will default to type integer and be added to the end of the table schema.  Columns defined in the Pattern (see _''pattern'' directive) but not defined in _''fields'' will be added to the end of the table schema.
  
Types should be one of: ''String, Long, Integer, Boolean, Double, Float, UTC''
+
Types should be one of: <span style="font-family: Courier New; color: blue;">''String, Long, Integer, Boolean, Double, Float, UTC''</span>
  
 
Column names must be valid variable names.
 
Column names must be valid variable names.
  
==== Examples ====
+
'''Examples'''
_fields="String account,Long quantity" (define two columns)
 
  
_fields ="fname,lname,int age" (define 3 columns, ''fname'' and ''lname'' default to String)
+
<span style="font-family: Courier New; color: blue;">_fields="String account,Long quantity"</span> (define two columns)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_fields ="fname,lname,int age"</span> (define 3 columns, ''fname'' and ''lname'' default to String)
  
 
=== Environment Directive (Optional) ===
 
=== Environment Directive (Optional) ===
 +
'''Syntax'''
  
==== Syntax ====
+
<span style="font-family: Courier New; color: blue;">_env="''key=value,key=value,...''"</span> (Optional. Default is false)  
_env="''key=value,key=value,...''" (Optional. Default is false)  
+
 
 +
'''Overview'''
  
==== Overview ====
 
 
This directive controls what environment variables are set when running a command
 
This directive controls what environment variables are set when running a command
  
==== Examples ====
+
'''Example'''
_env="name=Rob,Location=NY"
+
 
 +
<span style="font-family: Courier New; color: blue;">_env="name=Rob,Location=NY"</span>
  
 
=== Use Host Environment Directive (Optional) ===
 
=== Use Host Environment Directive (Optional) ===
 +
'''Syntax'''
  
==== Syntax ====
+
<span style="font-family: Courier New; color: blue;">_useHostEnv=true|false</span> (Optional. Default is false)  
_useHostEnv=true|false (Optional. Default is false)  
+
 
 +
'''Overview'''
  
==== Overview ====
 
 
If true, then the environment properties of the Ami process executing the command are passed to the shell.  Please note, that _''env'' values can be used to override specific environment variables.
 
If true, then the environment properties of the Ami process executing the command are passed to the shell.  Please note, that _''env'' values can be used to override specific environment variables.
  
==== Examples ====
+
'''Example'''
_useHostEnv="true"
+
 
 +
<span style="font-family: Courier New; color: blue;">_useHostEnv="true"</span>
  
 
== Directives for parsing Delimited list of ordered Fields ==
 
== Directives for parsing Delimited list of ordered Fields ==
_cmd=''command_to_execute'' ('''Required''', see general directives)  
+
<span style="font-family: Courier New; color: blue;">_cmd=''command_to_execute''</span> ('''Required''', see general directives)  
  
_fields=''col1_type col1_name, ...'' ('''Required''', see general directives)  
+
<span style="font-family: Courier New; color: blue;">_fields=''col1_type col1_name, ...''</span> ('''Required''', see general directives)  
  
_delim=delim_string ('''Required''')  
+
<span style="font-family: Courier New; color: blue;">_delim=delim_string</span> ('''Required''')  
  
_conflateDelim=true|false (Optional. Default is false)  
+
<span style="font-family: Courier New; color: blue;">_conflateDelim=true|false</span> (Optional. Default is false)  
  
''_quote=single_quote_char'' (Optional)  
+
<span style="font-family: Courier New; color: blue;">''_quote=single_quote_char''</span> (Optional)  
  
''_escape=single_escape_char'' (Optional)  
+
<span style="font-family: Courier New; color: blue;">''_escape=single_escape_char''</span> (Optional)  
  
 
The _''delim'' indicates the char (or chars) used to separate each field (If _''conflateDelim'' is true, then 1 or more consecutive delimiters are treated as a single delimiter). The _''fields'' is an ordered list of types and field names for each of the delimited fields. If the _''quote'' is supplied, then a field value starting with ''quote'' will be read until another ''quote'' char is found, meaning ''delim''s within ''quotes'' will not be treated as delims.  If the _e''scape'' char is supplied then when an ''escape'' char is read, it is skipped and the following char is read as a literal.
 
The _''delim'' indicates the char (or chars) used to separate each field (If _''conflateDelim'' is true, then 1 or more consecutive delimiters are treated as a single delimiter). The _''fields'' is an ordered list of types and field names for each of the delimited fields. If the _''quote'' is supplied, then a field value starting with ''quote'' will be read until another ''quote'' char is found, meaning ''delim''s within ''quotes'' will not be treated as delims.  If the _e''scape'' char is supplied then when an ''escape'' char is read, it is skipped and the following char is read as a literal.
  
=== Examples ===
+
'''Examples'''
_delim="|"
+
 
 +
<span style="font-family: Courier New; color: blue;">_delim="|"</span>
  
_fields="code,lname,int age"
+
<span style="font-family: Courier New; color: blue;">_fields="code,lname,int age"</span>
  
_quote="'"
+
<span style="font-family: Courier New; color: blue;">_quote="'"</span>
  
_escape="\\"
+
<span style="font-family: Courier New; color: blue;">_escape="\\"</span>
  
 
This defines a pattern such that:
 
This defines a pattern such that:
Line 545: Line 572:
  
 
== Directives for parsing Key Value Pairs ==
 
== Directives for parsing Key Value Pairs ==
_cmd=''command_to_execute'' ('''Required''', see general directives)  
+
<span style="font-family: Courier New; color: blue;">_cmd=''command_to_execute''</span> ('''Required''', see general directives)  
  
_fields=''col1_type col1_name, ...'' ('''Required''', see general directives)  
+
<span style="font-family: Courier New; color: blue;">_fields=''col1_type col1_name, ...''</span> ('''Required''', see general directives)  
  
_delim=delim_string ('''Required''')  
+
<span style="font-family: Courier New; color: blue;">_delim=delim_string</span> ('''Required''')  
  
_conflateDelim=true|false (Optional. Default is false)  
+
<span style="font-family: Courier New; color: blue;">_conflateDelim=true|false</span> (Optional. Default is false)  
  
''_equals=single_equals_char'' ('''Required''')
+
<span style="font-family: Courier New; color: blue;">''_equals=single_equals_char''</span> ('''Required''')
  
_mappings=from1=to1,from2=to2,... (Optional)  
+
<span style="font-family: Courier New; color: blue;">_mappings=from1=to1,from2=to2,...</span> (Optional)  
  
''_quote=single_quote_char'' (Optional)  
+
<span style="font-family: Courier New; color: blue;">''_quote=single_quote_char''</span> (Optional)  
  
''_escape=single_escape_char'' (Optional)  
+
<span style="font-family: Courier New; color: blue;">''_escape=single_escape_char''</span> (Optional)  
  
 
The _''delim'' indicates the char (or chars) used to separate each field (If _''conflateDelim'' is true, then 1 or more consecutive delimiters are treated as a single delimiter). The _equals char is used to indicate the key/value separator. The _''fields'' is an ordered list of types and field names for each of the delimited fields. If the _''quote'' is supplied, then a field value starting with ''quote'' will be read until another ''quote'' char is found, meaning ''delim''s within ''quotes'' will not be treated as delims.  If the _''escape'' char is supplied then when an ''escape'' char is read, it is skipped and the following char is read as a literal.
 
The _''delim'' indicates the char (or chars) used to separate each field (If _''conflateDelim'' is true, then 1 or more consecutive delimiters are treated as a single delimiter). The _equals char is used to indicate the key/value separator. The _''fields'' is an ordered list of types and field names for each of the delimited fields. If the _''quote'' is supplied, then a field value starting with ''quote'' will be read until another ''quote'' char is found, meaning ''delim''s within ''quotes'' will not be treated as delims.  If the _''escape'' char is supplied then when an ''escape'' char is read, it is skipped and the following char is read as a literal.
Line 565: Line 592:
 
The optional _m''appings'' directive allows you to map keys within the output to field names specified in the _''fields'' directive. This is useful when the output has key names that are not valid field names, or the output has multiple key names that should be used to populate the same column.
 
The optional _m''appings'' directive allows you to map keys within the output to field names specified in the _''fields'' directive. This is useful when the output has key names that are not valid field names, or the output has multiple key names that should be used to populate the same column.
  
=== Examples ===
+
'''Examples'''
_delim="|"
 
  
_equals="="
+
<span style="font-family: Courier New; color: blue;">_delim="|"</span>
  
_fields="code,lname,int age"
+
<span style="font-family: Courier New; color: blue;">_equals="="</span>
  
_mappings="20=code,21=lname,22=age"
+
<span style="font-family: Courier New; color: blue;">_fields="code,lname,int age"</span>
  
_quote="'"
+
<span style="font-family: Courier New; color: blue;">_mappings="20=code,21=lname,22=age"</span>
  
_escape="\\"
+
<span style="font-family: Courier New; color: blue;">_quote="'"</span>
 +
 
 +
<span style="font-family: Courier New; color: blue;">_escape="\\"</span>
  
 
This defines a pattern such that:
 
This defines a pattern such that:
Line 606: Line 634:
  
 
== Directives for Pattern Capture ==
 
== Directives for Pattern Capture ==
_cmd=''command_to_execute'' ('''Required''', see general directives)  
+
<span style="font-family: Courier New; color: blue;">_cmd=''command_to_execute''</span> ('''Required''', see general directives)  
  
_fields=''col1_type col1_name, ...'' (Optional, see general directives)  
+
<span style="font-family: Courier New; color: blue;">_fields=''col1_type col1_name, ...''</span> (Optional, see general directives)  
  
_''pattern''=''col1_type col1_name, ...=regex_with_grouping'' ('''Required''')
+
<span style="font-family: Courier New; color: blue;">_''pattern''=''col1_type col1_name, ...=regex_with_grouping''</span> ('''Required''')
  
 
The ''_pattern'' must start with a list of column names, followed by an equal sign (=) and then a regular expression with grouping (this is dubbed a column-to-pattern mapping).  The regular expression's first grouping value will be mapped to the first column, 2<sup>nd</sup> grouping to the second and so on.
 
The ''_pattern'' must start with a list of column names, followed by an equal sign (=) and then a regular expression with grouping (this is dubbed a column-to-pattern mapping).  The regular expression's first grouping value will be mapped to the first column, 2<sup>nd</sup> grouping to the second and so on.
Line 618: Line 646:
 
For multiple column-to-pattern mappings, use the \n (new line) to separate each one.
 
For multiple column-to-pattern mappings, use the \n (new line) to separate each one.
  
=== Examples ===
+
'''Examples'''
 +
 
 
<span style="font-family: Courier New;color: blue;">_pattern="fname,lname,int age=User (.*) (.*) is (.*) years old"</span>
 
<span style="font-family: Courier New;color: blue;">_pattern="fname,lname,int age=User (.*) (.*) is (.*) years old"</span>
  
Line 670: Line 699:
  
 
=== Skipping Lines Directive (optional) ===
 
=== Skipping Lines Directive (optional) ===
 +
'''Syntax'''
  
==== Syntax ====
+
<span style="font-family: Courier New; color: blue;">_skipLines=''number_of_lines''</span>
_skipLines=''number_of_lines''
+
 
 +
'''Overview'''
  
==== Overview ====
 
 
This directive controls the number of lines to skip from the top of the output. This is useful for ignoring a head at the top of output. If not supplied, then no lines are skipped. From a performance standpoint, skipping lines is highly efficient.
 
This directive controls the number of lines to skip from the top of the output. This is useful for ignoring a head at the top of output. If not supplied, then no lines are skipped. From a performance standpoint, skipping lines is highly efficient.
  
==== Examples ====
+
'''Examples'''
_skipLines="0"     (this is the default, don't skip any lines)
 
  
_skipLines="1"      (skip the first line, for example if there is a header)
+
<span style="font-family: Courier New; color: blue;">_skipLines="0"</span>     (this is the default, don't skip any lines)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_skipLines="1"</span>      (skip the first line, for example if there is a header)
  
 
=== Line Number Column Directive (optional) ===
 
=== Line Number Column Directive (optional) ===
 +
'''Syntax'''
  
==== Syntax ====
+
<span style="font-family: Courier New; color: blue;">_linenum=''column_name''</span>
_linenum=''column_name''
+
 
 +
'''Overview'''
  
==== Overview ====
 
 
This directive controls the name of the column that will contain the line number. If not supplied, the default is "linenum". Notes about the line number: The first line is line number 1, and skipped/filtered out lines are still considered in numbering. For example, if the _''skipLines''=2 , then the first line will have a line number of 3.
 
This directive controls the name of the column that will contain the line number. If not supplied, the default is "linenum". Notes about the line number: The first line is line number 1, and skipped/filtered out lines are still considered in numbering. For example, if the _''skipLines''=2 , then the first line will have a line number of 3.
  
==== Examples ====
+
'''Examples'''
_linenum="" (A line number column is not included in the table)
 
  
_linenum="linenum" (The column ''linenum'' will contain line numbers, this is the default)
+
<span style="font-family: Courier New; color: blue;">_linenum=""</span> (A line number column is not included in the table)
  
_linenum="rownum" (The column ''rownum'' will contain line numbers)
+
<span style="font-family: Courier New; color: blue;">_linenum="linenum"</span> (The column ''linenum'' will contain line numbers, this is the default)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_linenum="rownum"</span> (The column ''rownum'' will contain line numbers)
  
 
== Optional Line Filtering Directives ==
 
== Optional Line Filtering Directives ==
  
 
=== Filtering Out Lines Directive (optional) ===
 
=== Filtering Out Lines Directive (optional) ===
 +
'''Syntax'''
 +
 +
<span style="font-family: Courier New; color: blue;">_filterOut=''regex''</span>
  
==== Syntax ====
+
'''Overview'''
_filterOut=''regex''
 
  
==== Overview ====
 
 
Any line that matches the supplied regular expression will be ignored. If not supplied, then no lines are filtered out. From a Performance standpoint, this is applied before other parsing is considered, so ignoring lines using a filter out directive is faster, as opposed to using a WHERE clause, for example.
 
Any line that matches the supplied regular expression will be ignored. If not supplied, then no lines are filtered out. From a Performance standpoint, this is applied before other parsing is considered, so ignoring lines using a filter out directive is faster, as opposed to using a WHERE clause, for example.
  
==== Examples ====
+
'''Examples'''
_filterOut="Test" (ignore any lines containing the text ''Test'')
+
 
 +
<span style="font-family: Courier New; color: blue;">_filterOut="Test"</span> (ignore any lines containing the text ''Test'')
  
_filterOut="^Comment" (ignore any lines starting with ''Comment'')
+
<span style="font-family: Courier New; color: blue;">_filterOut="^Comment"</span> (ignore any lines starting with ''Comment'')
  
_filterOut="This|That" (ignore any lines containing the text ''This'' or ''That'')
+
<span style="font-family: Courier New; color: blue;">_filterOut="This|That"</span> (ignore any lines containing the text ''This'' or ''That'')
  
 
=== Filtering In Lines Directive (optional) ===
 
=== Filtering In Lines Directive (optional) ===
 +
'''Syntax'''
  
==== Syntax ====
+
<span style="font-family: Courier New; color: blue;">_filterIn=''regex''</span>
_filterIn=''regex''
+
 
 +
'''Overview'''
  
==== Overview ====
 
 
Only lines that match the supplied regular expression will be considered. If not supplied, then all lines are considered. From a Performance standpoint, this is applied before other parsing is considered, so narrowing down the lines considered using a filter in directive is faster, as opposed to using a WHERE clause, for example. If you use a grouping (..) inside the regular expression, then only the contents of the first grouping will be considered for parsing
 
Only lines that match the supplied regular expression will be considered. If not supplied, then all lines are considered. From a Performance standpoint, this is applied before other parsing is considered, so narrowing down the lines considered using a filter in directive is faster, as opposed to using a WHERE clause, for example. If you use a grouping (..) inside the regular expression, then only the contents of the first grouping will be considered for parsing
  
==== Examples ====
+
'''Examples'''
_filterIn="3Forge" (ignore any lines that don't contain the word ''3Forge'')
+
 
 +
<span style="font-family: Courier New; color: blue;">_filterIn="3Forge"</span> (ignore any lines that don't contain the word ''3Forge'')
 +
 
 +
<span style="font-family: Courier New; color: blue;">_filterIn="^Outgoing"</span> (ignore any lines that don't start with ''Outgoing'')
 +
 
 +
<span style="font-family: Courier New; color: blue;">_filterIn="Data(.*)"</span> (ignore any lines that don't start with ''Data,'' and only consider  the text after the word ''Data'' for processing)
 +
= SSH Adapter =
 +
 
 +
== Overview ==
 +
The AMI SSH Datasource Adapter is a highly configurable adapter designed to execute shell commands and capture the stdout, stderr and exitcode on remote hosts via the secure ssh protocol. There are a number of directives which can be used to control how the command is executed, including supplying data to be passed to stdin. The adapter processes the output from the command. Each line (delineated by a Line feed) is considered independently for parsing. Note the EXECUTE <sql> clause supports the full AMI sql language.
 +
 
 +
Please note, that running the command will produce 3 tables:
 +
 
 +
* Stdout - Contains the contents of standard out
 +
* Stderr - Contains the contents from standard err
 +
* exitCode - Contains the executed code of the process
 +
 
 +
(You can limit which tables are returned using the ''_include'' directive)
 +
 
 +
Generally speaking, the parser can handle '''four''' ('''4''') different methods of parsing:
 +
 
 +
=== Delimited list or ordered fields ===
 +
Example data and query:  
 +
 
 +
''11232''|''1000''|''123.20''
 +
 
 +
''12412''|''8900''|''430.90''<syntaxhighlight lang="sql">
 +
CREATE TABLE mytable AS USE _cmd="my_cmd" _delim="|"
 +
_fields="String account, Integer qty, Double px"
 +
EXECUTE SELECT * FROM cmd
 +
 
 +
 
 +
</syntaxhighlight>
 +
 
 +
=== Key value pairs ===
 +
Example data and query:
 +
 
 +
account=''11232''|quantity=''1000''|price=''123.20''
 +
 
 +
account=''12412''|quantity=''8900''|price=''430.90''<syntaxhighlight lang="sql">
 +
CREATE TABLE mytable AS USE _cmd="my_cmd" _delim="|" _equals="="
 +
_fields="String account, Integer qty, Double px"
 +
EXECUTE SELECT * FROM cmd
 +
</syntaxhighlight>
 +
 
 +
=== Pattern Capture ===
 +
Example data and query:
 +
 
 +
Account ''11232'' has ''1000'' shares at $''123.20'' px
 +
 
 +
Account ''12412'' has ''8900'' shares at $''430.90'' px<syntaxhighlight lang="sql">
 +
CREATE TABLE mytable AS USE _cmd="my_cmd"
 +
_fields="String account, Integer qty, Double px"
 +
_pattern="account,qty,px=Account (.*) has (.*) shares at \\$(.*) px"
 +
EXECUTE SELECT * FROM cmd
 +
</syntaxhighlight>
 +
 
 +
=== Raw Line ===
 +
If you do not specify a _''fields'', _''mapping'' nor _''pattern'' directive then the parser defaults to a simple row-per-line parser. A "line" column is generated containing the entire contents of each line from the command's output<syntaxhighlight lang="sql">
 +
CREATE TABLE mytable AS USE _cmd="my_cmd" EXECUTE SELECT * FROM cmd
 +
</syntaxhighlight>
 +
 
 +
== Configuring the Adapter for first use ==
 +
1. Open the '''datamodeler''' (In Developer Mode -> Menu Bar -> Dashboard -> Datamodel)
 +
 
 +
2. Choose the "'''Add Datasource'''" button
 +
 
 +
3. Choose '''SSH Command''' adapter
 +
 
 +
4. In the Add datasource dialog:
 +
 
 +
'''Name''': Supply a user defined Name, ex: ''MyShell''
 +
 
 +
'''URL''': ''hostname'' or ''hostname:port''
 +
 
 +
'''Username''': the name of the ssh user to login as
 +
 
 +
'''Password''': the password of the ssh user to login as
 +
 
 +
Options: See below, note when using multiple options they should be comma delimited
 +
 
 +
* For servers requiring keyboard interactive authentication:  ''authMode=keyboardInteractive''
 +
* To use a public/private key for authentication: ''publicKeyFile=/path/to/key/file''  ''(Note this is often /your_home_dir/.ssh/id_rsa)''
 +
* To request a dumb pty connection: ''useDumbPty=true''     
 +
 
 +
5. Click "Add Datasource" Button
 +
 
 +
'''Running Commands Remotely''': You can execute commands on remote machines as well using an ''AMI Relay''. First install an AMI relay on the machine that the command should be executed on ( See ''AMI for the Enterprise'' documentation for details on how to install an AMI relay).  Then in the Add Datasource wizard select the relay in the "Relay To Run On" dropdown.
 +
 
 +
== General Directives ==
 +
 
 +
=== Command Directive (Required) ===
 +
'''Syntax'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_cmd="''command to run"''</span>
 +
 
 +
'''Overview'''
 +
 
 +
This directive controls the command to execute.
 +
 
 +
'''Examples'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_cmd="ls -lrt"</span> (execute ls -lrt)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_cmd="ls | sort"</span> (execute ls and pipe that into sort)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_cmd="dir /s"</span> (execute dir on a windows system)
 +
 
 +
=== Supplying Standard Input (Optional) ===
 +
'''Syntax'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_stdin="''text to pipe into stdin"''</span>
 +
 
 +
'''Overview'''
 +
 
 +
This directive is used to control what data is piped into the standard in (stdin) of the process to run.                                                                
 +
 
 +
'''Example'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_cmd="cat > out.txt" _stdin="hello world"</span> (will pipe "hello world" into out.txt)
 +
 
 +
=== Controlling what is captured from the Process (Optional) ===
 +
'''Syntax'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_capture="comma_delimited_list"</span> (''default is stdout,stderr,exitCode'')
 +
 
 +
'''Overview'''
 +
 
 +
This directive is used to control what output data from running the command is captured. It is a comma delimited list and the order determines what order the tables are returned in.  An empty list ("") indicates that nothing will be captured (the command is executed, and all output is ignored). Options include the following:
 +
 
 +
*<span style="font-family: Courier New; color: blue;">stdout</span> - Capture standard out from the process
 +
*<span style="font-family: Courier New; color: blue;">stderr</span> - Capture standard error from the process
 +
*<span style="font-family: Courier New; color: blue;">exitCode</span> - Capture the exit code from the process                                                                                                    
 +
'''Example'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_capture="exitCode,stdout"</span> (the 1<sup>st</sup> table will contain the exit code, 2<sup>nd</sup> will contain stdout)
 +
 
 +
=== Field Definitions Directive (Required) ===
 +
'''Syntax'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_fields=''col1_type col_name, col2_type col2_name, ...''</span>
 +
 
 +
'''Overview'''
 +
 
 +
This directive controls the Column names that will be returned, along with their types.  The order in which they are defined is the same as the order in which they are returned.  If the column type is not supplied, the default is String. Special note on additional columns: If the line number (see _''linenum'' directive) column is not supplied in the list, it will default to type integer and be added to the end of the table schema.  Columns defined in the Pattern (see _''pattern'' directive) but not defined in _''fields'' will be added to the end of the table schema.
 +
 
 +
Types should be one of: <span style="font-family: Courier New; color: blue;">''String, Long, Integer, Boolean, Double, Float, UTC''</span>
 +
 
 +
Column names must be valid variable names.
 +
 
 +
'''Examples'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_fields="String account,Long quantity"</span> (define two columns)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_fields ="fname,lname,int age"</span> (define 3 columns, ''fname'' and ''lname'' default to String)
 +
 
 +
=== Environment Directive (Optional) ===
 +
'''Syntax'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_env="''key=value,key=value,...''"</span> (Optional. Default is false)
 +
 
 +
'''Overview'''
 +
 
 +
This directive controls what environment variables are set when running a command
 +
 
 +
'''Example'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_env="name=Rob,Location=NY"</span>
 +
 
 +
=== Use Host Environment Directive (Optional) ===
 +
'''Syntax'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_useHostEnv=true|false</span> (Optional. Default is false)
 +
 
 +
'''Overview'''
 +
 
 +
If true, then the environment properties of the Ami process executing the command are passed to the shell.  Please note, that _''env'' values can be used to override specific environment variables.
 +
 
 +
'''Example'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_useHostEnv="true"</span>
 +
 
 +
== Directives for parsing Delimited list of ordered Fields ==
 +
<span style="font-family: Courier New; color: blue;">_cmd=''command_to_execute''</span> ('''Required''', see general directives)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_fields=''col1_type col1_name, ...''</span> ('''Required''', see general directives)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_delim=delim_string</span> ('''Required''')
 +
 
 +
<span style="font-family: Courier New; color: blue;">_conflateDelim=true|false</span> (Optional. Default is false)
 +
 
 +
<span style="font-family: Courier New; color: blue;">''_quote=single_quote_char''</span> (Optional)
 +
 
 +
<span style="font-family: Courier New; color: blue;">''_escape=single_escape_char''</span> (Optional)
 +
 
 +
The _''delim'' indicates the char (or chars) used to separate each field (If _''conflateDelim'' is true, then 1 or more consecutive delimiters are treated as a single delimiter). The _''fields'' is an ordered list of types and field names for each of the delimited fields. If the _''quote'' is supplied, then a field value starting with ''quote'' will be read until another ''quote'' char is found, meaning ''delim''s within ''quotes'' will not be treated as delims.  If the _e''scape'' char is supplied then when an ''escape'' char is read, it is skipped and the following char is read as a literal.
 +
 
 +
'''Examples'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_delim="|"</span>
 +
 
 +
<span style="font-family: Courier New; color: blue;">_fields="code,lname,int age"</span>
 +
 
 +
<span style="font-family: Courier New; color: blue;">_quote="'"</span>
 +
 
 +
<span style="font-family: Courier New; color: blue;">_escape="\\"</span>
 +
 
 +
This defines a pattern such that:
 +
 
 +
''11232-33|Smith|20''
 +
 
 +
''<nowiki/>'1332|ABC'||30''
 +
 
 +
''<nowiki/>''
 +
 
 +
''Account\|112|Jones|18''
 +
 
 +
Maps to:
 +
{| class="wikitable"
 +
!code
 +
!lname
 +
!age
 +
|-
 +
|11232-33
 +
|Smith
 +
|20
 +
|-
 +
|<nowiki>1332|ABC</nowiki>
 +
|
 +
|30
 +
|-
 +
|<nowiki>Account|112</nowiki>
 +
|Jones
 +
|18
 +
|}
 +
 
 +
== Directives for parsing Key Value Pairs ==
 +
<span style="font-family: Courier New; color: blue;">_cmd=''command_to_execute''</span> ('''Required''', see general directives)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_fields=''col1_type col1_name, ...''</span> ('''Required''', see general directives)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_delim=delim_string</span> ('''Required''')
 +
 
 +
<span style="font-family: Courier New; color: blue;">_conflateDelim=true|false</span> (Optional. Default is false)
 +
 
 +
<span style="font-family: Courier New; color: blue;">''_equals=single_equals_char''</span> ('''Required''')
 +
 
 +
<span style="font-family: Courier New; color: blue;">_mappings=from1=to1,from2=to2,...</span> (Optional)
 +
 
 +
<span style="font-family: Courier New; color: blue;">''_quote=single_quote_char''</span> (Optional)
 +
 
 +
<span style="font-family: Courier New; color: blue;">''_escape=single_escape_char''</span> (Optional)
 +
 
 +
The _''delim'' indicates the char (or chars) used to separate each field (If _''conflateDelim'' is true, then 1 or more consecutive delimiters are treated as a single delimiter). The _equals char is used to indicate the key/value separator. The _''fields'' is an ordered list of types and field names for each of the delimited fields. If the _''quote'' is supplied, then a field value starting with ''quote'' will be read until another ''quote'' char is found, meaning ''delim''s within ''quotes'' will not be treated as delims.  If the _''escape'' char is supplied then when an ''escape'' char is read, it is skipped and the following char is read as a literal.
 +
 
 +
The optional _m''appings'' directive allows you to map keys within the output to field names specified in the _''fields'' directive. This is useful when the output has key names that are not valid field names, or the output has multiple key names that should be used to populate the same column.
 +
 
 +
'''Examples'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_delim="|"</span>
 +
 
 +
<span style="font-family: Courier New; color: blue;">_equals="="</span>
 +
 
 +
<span style="font-family: Courier New; color: blue;">_fields="code,lname,int age"</span>
 +
 
 +
<span style="font-family: Courier New; color: blue;">_mappings="20=code,21=lname,22=age"</span>
 +
 
 +
<span style="font-family: Courier New; color: blue;">_quote="'"</span>
 +
 
 +
<span style="font-family: Courier New; color: blue;">_escape="\\"</span>
 +
 
 +
This defines a pattern such that:
 +
 
 +
''code=11232-33|lname=Smith|age=20''
 +
 
 +
''code='1332|ABC'|age=30''
 +
 
 +
''20=Act\|112|21=J|22=18 (Note: this row will work due to the _mappings directive)''
 +
 
 +
Maps to:
 +
{| class="wikitable"
 +
!code
 +
!lname
 +
!age
 +
|-
 +
|11232-33
 +
|Smith
 +
|20
 +
|-
 +
|<nowiki>1332|ABC</nowiki>
 +
|
 +
|30
 +
|-
 +
|<nowiki>Act|112</nowiki>
 +
|J
 +
|18
 +
|}
 +
 
 +
== Directives for Pattern Capture ==
 +
<span style="font-family: Courier New; color: blue;">_cmd=''command_to_execute''</span> ('''Required''', see general directives)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_fields=''col1_type col1_name, ...''</span> (Optional, see general directives)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_''pattern''=''col1_type col1_name, ...=regex_with_grouping''</span> ('''Required''')
 +
 
 +
The ''_pattern'' must start with a list of column names, followed by an equal sign (=) and then a regular expression with grouping (this is dubbed a column-to-pattern mapping).  The regular expression's first grouping value will be mapped to the first column, 2<sup>nd</sup> grouping to the second and so on.
 +
 
 +
If a column is already defined in the _fields directive, then it's preferred to not include the column type in the ''_pattern'' definition.
 +
 
 +
For multiple column-to-pattern mappings, use the \n (new line) to separate each one.
 +
 
 +
'''Examples'''
 +
 
 +
<span style="font-family: Courier New;color: blue;">_pattern="fname,lname,int age=User (.*) (.*) is (.*) years old"</span>
 +
 
 +
This defines a pattern such that:
 +
 
 +
''User John Smith is 20 years old''
 +
 
 +
''User Bobby Boy is 30 years old''
 +
 
 +
Maps to:
 +
{| class="wikitable"
 +
!fname
 +
!lname
 +
!age
 +
|-
 +
|John
 +
|Smith
 +
|20
 +
|-
 +
|Bobby
 +
|Boy
 +
|30
 +
|}
 +
<span style="font-family: Courier New; color: blue;">_pattern="fname,lname,int age=User (.*) (.*) is (.*) years old\n lname,fname,int weight=Customer (.*),(.*) weighs (.*) pounds"</span>
 +
 
 +
This defines two patterns such that:
 +
 
 +
''User John Smith is 20 years old''
 +
 
 +
''User Boy,Bobby weighs 130 pounds'''
 +
 
 +
Maps to:
 +
{| class="wikitable"
 +
!fname
 +
!lname
 +
!age
 +
!weight
 +
|-
 +
|John
 +
|Smith
 +
|20
 +
|
 +
|-
 +
|Bobby
 +
|Boy
 +
|
 +
|130
 +
|}
 +
 
 +
== Optional Line Number Directives ==
 +
 
 +
=== Skipping Lines Directive (optional) ===
 +
'''Syntax'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_skipLines=''number_of_lines''</span>
 +
 
 +
'''Overview'''
 +
 
 +
This directive controls the number of lines to skip from the top of the output. This is useful for ignoring a head at the top of output. If not supplied, then no lines are skipped. From a performance standpoint, skipping lines is highly efficient.
 +
 
 +
'''Examples'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_skipLines="0"</span>     (this is the default, don't skip any lines)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_skipLines="1"</span>      (skip the first line, for example if there is a header)
 +
 
 +
=== Line Number Column Directive (optional) ===
 +
'''Syntax'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_linenum=''column_name''</span>
 +
 
 +
'''Overview'''
 +
 
 +
This directive controls the name of the column that will contain the line number. If not supplied, the default is "linenum". Notes about the line number: The first line is line number 1, and skipped/filtered out lines are still considered in numbering. For example, if the _''skipLines''=2 , then the first line will have a line number of 3.
 +
 
 +
'''Examples'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_linenum=""</span> (A line number column is not included in the table)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_linenum="linenum"</span> (The column ''linenum'' will contain line numbers, this is the default)
 +
 
 +
<span style="font-family: Courier New; color: blue;">_linenum="rownum"</span> (The column ''rownum'' will contain line numbers)
 +
 
 +
== Optional Line Filtering Directives ==
 +
 
 +
=== Filtering Out Lines Directive (optional) ===
 +
'''Syntax'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_filterOut=''regex''</span>
 +
 
 +
'''Overview'''
 +
 
 +
Any line that matches the supplied regular expression will be ignored. If not supplied, then no lines are filtered out. From a Performance standpoint, this is applied before other parsing is considered, so ignoring lines using a filter out directive is faster, as opposed to using a WHERE clause, for example.
 +
 
 +
'''Examples'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_filterOut="Test"</span> (ignore any lines containing the text ''Test'')
 +
 
 +
<span style="font-family: Courier New; color: blue;">_filterOut="^Comment"</span> (ignore any lines starting with ''Comment'')
 +
 
 +
<span style="font-family: Courier New; color: blue;">_filterOut="This|That"</span> (ignore any lines containing the text ''This'' or ''That'')
 +
 
 +
=== Filtering In Lines Directive (optional) ===
 +
'''Syntax'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_filterIn=''regex''</span>
 +
 
 +
'''Overview'''
 +
 
 +
Only lines that match the supplied regular expression will be considered. If not supplied, then all lines are considered. From a Performance standpoint, this is applied before other parsing is considered, so narrowing down the lines considered using a filter in directive is faster, as opposed to using a WHERE clause, for example. If you use a grouping (..) inside the regular expression, then only the contents of the first grouping will be considered for parsing
 +
 
 +
'''Examples'''
 +
 
 +
<span style="font-family: Courier New; color: blue;">_filterIn="3Forge"</span> (ignore any lines that don't contain the word ''3Forge'')
  
_filterIn="^Outgoing" (ignore any lines that don't start with ''Outgoing'')
+
<span style="font-family: Courier New; color: blue;">_filterIn="^Outgoing"</span> (ignore any lines that don't start with ''Outgoing'')
  
_filterIn="Data(.*)" (ignore any lines that don't start with ''Data,'' and only consider  the text after the word ''Data'' for processing)
+
<span style="font-family: Courier New; color: blue;">_filterIn="Data(.*)"</span> (ignore any lines that don't start with ''Data,'' and only consider  the text after the word ''Data'' for processing)

Latest revision as of 21:48, 22 March 2021

Flat File Reader

Overview

The AMI Flat File Reader Datasource Adapter is a highly configurable adapter designed to process extremely large flat files into tables at rates exceeding 100mb per second*. There are a number of directives which can be used to control how the flat file reader processes a file. Each line (delineated by a Line feed) is considered independently for parsing. Note the EXECUTE <sql> clause supports the full AMI sql language.

*Using Pattern Capture technique (_pattern) to extract 3 fields across a 4.080 gb text file containing 11,999,504 records. This generated a table of 11,999,504 records x 4 columns in 37,364 milliseconds (additional column is the default linenum). Tested on raid-2 7200rpm 2TB drive

Generally speaking, the parser can handle three (4) different methods of parsing files:

Delimited list or ordered fields

Example data and query:  

11232|1000|123.20

12412|8900|430.90

CREATE TABLE mytable AS USE _file="myfile.txt" _delim="|"
_fields="String account, Integer qty, Double px"
EXECUTE SELECT * FROM file

Key value pairs

Example data and query:

account=11232|quantity=1000|price=123.20

account=12412|quantity=8900|price=430.90

CREATE TABLE mytable AS USE _file="myfile.txt" _delim="|" _equals="="
_fields="String account, Integer qty, Double px"
EXECUTE SELECT * FROM file

Pattern Capture

Example data and query:

Account 11232 has 1000 shares at $123.20 px

Account 12412 has 8900 shares at $430.90 px

CREATE TABLE mytable AS USE _file="myfile.txt"
_fields="String account, Integer qty, Double px"
_pattern="account,qty,px=Account (.*) has (.*) shares at \\$(.*) px"
EXECUTE SELECT * FROM fil

Raw Line

If you do not specify a _fields, _mapping nor _pattern directive then the parser defaults to a simple row-per-line parser. A "line" column is generated containing the entire contents of each line from the file

CREATE TABLE mytable AS USE _file="f.txt" EXECUTE SELECT * FROM FILE

Configuring the Adapter for first use

1. Open the datamodeler (In Developer Mode -> Menu Bar -> Dashboard -> Datamodel)

2. Choose the "Add Datasource" button

3. Choose Flat File Reader

4. In the Add datasource dialog:

Name: Supply a user defined Name, ex: MyFiles

URL: /full/path/to/directory/containing/files  (ex: /home/myuser/files )

               (Keep in mind that the path is on the machine running AMI, not necessarily your local desktop)

5. Click "Add Datasource" Button

Accessing Files Remotely: You can also access files on remote machines as well using an AMI Relay. First install an AMI relay on the machine that contains the files, or at least has access to the files, you wish to read ( See AMI for the Enterprise documentation for details on how to install an AMI relay).  Then in the Add Datasource wizard select the relay in the "Relay To Run On" dropdown.

General Directives

File name Directive (Required)

Syntax

_file=path/to/file

Overview

This directive controls the location of the file to read, relative to the datasource's url. Use the forward slash (/) to indicate directories (standard UNIX convention)                                                           

Examples

_file="data.txt" (Read the data.txt file, located at the root of the datasource's url)

_file="subdir/data.txt" (Read the data.txt file, found under the subdir directory)

Field definitions Directive (Required)

Syntax

_fields=col1_type col_name, col2_type col2_name, ...

Overview

This directive controls the Column names that will be returned, along with their types.  The order in which they are defined is the same as the order in which they are returned.  If the column type is not supplied, the default is String. Special note on additional columns: If the line number (see _linenum directive) column is not supplied in the list, it will default to type integer and be added to the end of the table schema.  Columns defined in the Pattern (see _pattern directive) but not defined in _fields will be added to the end of the table schema.

Types should be one of: String, Long, Integer, Boolean, Double, Float, UTC

Column names must be valid variable names.

Examples

_fields="String account,Long quantity" (define two columns)

_fields ="fname,lname,int age" (define 3 columns, fname and lname default to String)

Directives for parsing Delimited list of ordered Fields

_file=file_name (Required, see general directives)

_fields=col1_type col1_name, ... (Required, see general directives)

_delim=delim_string (Required)

_conflateDelim=true|false (Optional. Default is false)

_quote=single_quote_char (Optional)

_escape=single_escape_char (Optional)

The _delim indicates the char (or chars) used to separate each field (If _conflateDelim is true, then 1 or more consecutive delimiters are treated as a single delimiter). The _fields is an ordered list of types and field names for each of the delimited fields. If the _quote is supplied, then a field value starting with quote will be read until another quote char is found, meaning delims within quotes will not be treated as delims.  If the _escape char is supplied then when an escape char is read, it is skipped and the following char is read as a literal.

Examples

_delim="|"

_fields="code,lname,int age"

_quote="'"

_escape="\\"

This defines a pattern such that:

11232-33|Smith|20

'1332|ABC'||30

Account\|112|Jones|18

Maps to:

code lname age
11232-33 Smith 20
1332|ABC 30
Account|112 Jones 18

Directives for parsing Key Value Pairs

_file=file_name (Required, see general directives)

_fields=col1_type col1_name, ... (Required, see general directives)

_delim=delim_string (Required)

_conflateDelim=true|false (Optional. Default is false)

_equals=single_equals_char (Required)

_mappings=from1=to1,from2=to2,... (Optional)

_quote=single_quote_char (Optional)

_escape=single_escape_char (Optional)

The _delim indicates the char (or chars) used to separate each field (If _conflateDelim is true, then 1 or more consecutive delimiters are treated as a single delimiter). The _equals char is used to indicate the key/value separator. The _fields is an ordered list of types and field names for each of the delimited fields. If the _quote is supplied, then a field value starting with quote will be read until another quote char is found, meaning delims within quotes will not be treated as delims.  If the _escape char is supplied then when an escape char is read, it is skipped and the following char is read as a literal.

The optional _mappings directive allows you to map keys within the flat file to file names specified in the _fields directive. This is useful when a file has key names that are not valid field names, or a file has multiple key names that should be used to populate the same column.

Examples

_delim="|"

_equals="="

_fields="code,lname,int age"

_mappings="20=code,21=lname,22=age"

_quote="'"

_escape="\\"

This defines a pattern such that:

code=11232-33|lname=Smith|age=20

code='1332|ABC'|age=30

20=Act\|112|21=J|22=18 (Note: this row will work due to the _mappings directive)

Maps to:

code lname age
11232-33 Smith 20
1332|ABC 30
Act|112 J 18

Directives for Pattern Capture

_file=file_name (Required, see general directives)

_fields=col1_type col1_name, ... (Optional, see general directives)

_pattern=col1_type col1_name, ...=regex_with_grouping (Required)

The _pattern must start with a list of column names, followed by an equal sign (=) and then a regular expression with grouping (this is dubbed a column-to-pattern mapping).  The regular expression's first grouping value will be mapped to the first column, 2nd grouping to the second and so on.

If a column is already defined in the _fields directive, then it's preferred to not include the column type in the _pattern definition.

For multiple column-to-pattern mappings, use the \n (new line) to separate each one.

Examples

_pattern="fname,lname,int age=User (.*) (.*) is (.*) years old"

This defines a pattern such that:

User John Smith is 20 years old

User Bobby Boy is 30 years old

Maps to:

fname lname age
John Smith 20
Bobby Boy 30

_pattern="fname,lname,int age=User (.*) (.*) is (.*) years old\n lname,fname,int weight=Customer (.*),(.*) weighs (.*) pounds"

This defines two patterns such that:

User John Smith is 20 years old

User Boy,Bobby weighs 130 pounds'

Maps to:

fname lname age weight
John Smith 20
Bobby Boy 130

Optional Line Number Directives

Skipping Lines Directive (optional)

Syntax

_skipLines=number_of_lines

Overview

This directive controls the number of lines to skip from the top of the file.  This is useful for ignoring "junk" at the top of a file. If not supplied, then no lines are skipped. From a performance standpoint, skipping lines is highly efficient.

Examples

_skipLines="0"     (this is the default, don't skip any lines)

_skipLines="1"      (skip the first line, for example if there is a header)

Line Number Column Directive (optional)

Syntax

_linenum=column_name

Overview

This directive controls the name of the column that will contain the line number. If not supplied, the default is "linenum". Notes about the line number: The first line is line number 1, and skipped/filtered out lines are still considered in numbering. For example, if the _skipLines=2 , then the first line will have a line number of 3.

Examples

_linenum="" (A line number column is not included in the table)

_linenum="linenum" (The column linenum will contain line numbers, this is the default)

_linenum="rownum" (The column rownum will contain line numbers)

Optional Line Filtering Directives

Filtering Out Lines Directive (optional)

Syntax

_filterOut=regex

Overview

Any line that matches the supplied regular expression will be ignored. If not supplied, then no lines are filtered out. From a Performance standpoint, this is applied before other parsing is considered, so ignoring lines using a filter out directive is faster, as opposed to using a WHERE clause, for example.

Examples

_filterOut="Test" (ignore any lines containing the text Test)

_filterOut="^Comment" (ignore any lines starting with Comment)

_filterOut="This|That" (ignore any lines containing the text This or That)

Filtering In Lines Directive (optional)

Syntax

_filterIn=regex

Overview

Only lines that match the supplied regular expression will be considered. If not supplied, then all lines are considered. From a Performance standpoint, this is applied before other parsing is considered, so narrowing down the lines considered using a filter in directive is faster, as opposed to using a WHERE clause, for example. If you use a grouping (..) inside the regular expression, then only the contents of the first grouping will be considered for parsing

Examples

_filterIn="3Forge" (ignore any lines that don't contain the word 3Forge)

_filterIn="^Outgoing" (ignore any lines that don't start with Outgoing)

_filterIn="Data(.*)" (ignore any lines that don't start with Data, and only consider  the text after the word Data for processing)

Shell Command Reader

Overview

The AMI Shell Command Datasource Adapter is a highly configurable adapter designed to execute shell commands and capture the stdout, stderr and exitcode. There are a number of directives which can be used to control how the command is executed, including setting environment variables and supplying data to be passed to stdin. The adapter processes the output from the command. Each line (delineated by a Line feed) is considered independently for parsing. Note the EXECUTE <sql> clause supports the full AMI sql language.

Please note, that running the command will produce 3 tables:

  • Stdout - Contains the contents of standard out
  • Stderr - Contains the contents from standard err
  • exitCode - Contains the executed code of the process

(You can limit which tables are returned using the _include directive)

Generally speaking, the parser can handle four (4) different methods of parsing:

Delimited list or ordered fields

Example data and query:  

11232|1000|123.20

12412|8900|430.90

CREATE TABLE mytable AS USE _cmd="my_cmd" _delim="|"
_fields="String account, Integer qty, Double px"
EXECUTE SELECT * FROM cmd

Key value pairs

Example data and query:

account=11232|quantity=1000|price=123.20

account=12412|quantity=8900|price=430.90

CREATE TABLE mytable AS USE _cmd="my_cmd" _delim="|" _equals="="
_fields="String account, Integer qty, Double px"
EXECUTE SELECT * FROM cmd

Pattern Capture

Example data and query:

Account 11232 has 1000 shares at $123.20 px

Account 12412 has 8900 shares at $430.90 px

CREATE TABLE mytable AS USE _cmd="my_cmd"
_fields="String account, Integer qty, Double px"
_pattern="account,qty,px=Account (.*) has (.*) shares at \\$(.*) px"
EXECUTE SELECT * FROM cmd

Raw Line

If you do not specify a _fields, _mapping nor _pattern directive then the parser defaults to a simple row-per-line parser. A "line" column is generated containing the entire contents of each line from the command's output

CREATE TABLE mytable AS USE _cmd="my_cmd" EXECUTE SELECT * FROM cmd

Configuring the Adapter for first use

1. Open the datamodeler (In Developer Mode -> Menu Bar -> Dashboard -> Datamodel)

2. Choose the "Add Datasource" button

3. Choose Shell Command Reader

4. In the Add datasource dialog:

Name: Supply a user defined Name, ex: MyShell

URL: /full/path/to/path/of/working/directory (ex: /home/myuser/files )

(Keep in mind that the path is on the machine running AMI, not necessarily your local desktop)

5. Click "Add Datasource" Button

Running Commands Remotely: You can execute commands on remote machines as well using an AMI Relay. First install an AMI relay on the machine that the command should be executed on ( See AMI for the Enterprise documentation for details on how to install an AMI relay).  Then in the Add Datasource wizard select the relay in the "Relay To Run On" dropdown.

General Directives

Command Directive (Required)

Syntax

_cmd="command to run"

Overview

This directive controls the command to execute.

Examples

_cmd="ls -lrt" (execute ls -lrt)

_cmd="ls | sort" (execute ls and pipe that into sort)

_cmd="dir /s" (execute dir on a windows system)

Supplying Standard Input (Optional)

Syntax

_stdin="text to pipe into stdin"

Overview

This directive is used to control what data is piped into the standard in (stdin) of the process to run.                                                                

Example

_cmd="cat > out.txt" _stdin="hello world" (will pipe "hello world" into out.txt)

Controlling what is captured from the Process (Optional)

Syntax

_capture="comma_delimited_list" (default is stdout,stderr,exitCode)

Overview

This directive is used to control what output data from running the command is captured. It is a comma delimited list and the order determines what order the tables are returned in.  An empty list ("") indicates that nothing will be captured (the command is executed, and all output is ignored). Options include the following:

  • stdout - Capture standard out from the process
  • stderr - Capture standard error from the process
  • exitCode - Capture the exit code from the process                                                                                                    

Examples

_capture="exitCode,stdout" (the 1st table will contain the exit code, 2nd will contain stdout)

Field Definitions Directive (Required)

Syntax

_fields=col1_type col_name, col2_type col2_name, ...

Overview

This directive controls the Column names that will be returned, along with their types.  The order in which they are defined is the same as the order in which they are returned.  If the column type is not supplied, the default is String. Special note on additional columns: If the line number (see _linenum directive) column is not supplied in the list, it will default to type integer and be added to the end of the table schema.  Columns defined in the Pattern (see _pattern directive) but not defined in _fields will be added to the end of the table schema.

Types should be one of: String, Long, Integer, Boolean, Double, Float, UTC

Column names must be valid variable names.

Examples

_fields="String account,Long quantity" (define two columns)

_fields ="fname,lname,int age" (define 3 columns, fname and lname default to String)

Environment Directive (Optional)

Syntax

_env="key=value,key=value,..." (Optional. Default is false)

Overview

This directive controls what environment variables are set when running a command

Example

_env="name=Rob,Location=NY"

Use Host Environment Directive (Optional)

Syntax

_useHostEnv=true|false (Optional. Default is false)

Overview

If true, then the environment properties of the Ami process executing the command are passed to the shell.  Please note, that _env values can be used to override specific environment variables.

Example

_useHostEnv="true"

Directives for parsing Delimited list of ordered Fields

_cmd=command_to_execute (Required, see general directives)

_fields=col1_type col1_name, ... (Required, see general directives)

_delim=delim_string (Required)

_conflateDelim=true|false (Optional. Default is false)

_quote=single_quote_char (Optional)

_escape=single_escape_char (Optional)

The _delim indicates the char (or chars) used to separate each field (If _conflateDelim is true, then 1 or more consecutive delimiters are treated as a single delimiter). The _fields is an ordered list of types and field names for each of the delimited fields. If the _quote is supplied, then a field value starting with quote will be read until another quote char is found, meaning delims within quotes will not be treated as delims.  If the _escape char is supplied then when an escape char is read, it is skipped and the following char is read as a literal.

Examples

_delim="|"

_fields="code,lname,int age"

_quote="'"

_escape="\\"

This defines a pattern such that:

11232-33|Smith|20

'1332|ABC'||30

Account\|112|Jones|18

Maps to:

code lname age
11232-33 Smith 20
1332|ABC 30
Account|112 Jones 18

Directives for parsing Key Value Pairs

_cmd=command_to_execute (Required, see general directives)

_fields=col1_type col1_name, ... (Required, see general directives)

_delim=delim_string (Required)

_conflateDelim=true|false (Optional. Default is false)

_equals=single_equals_char (Required)

_mappings=from1=to1,from2=to2,... (Optional)

_quote=single_quote_char (Optional)

_escape=single_escape_char (Optional)

The _delim indicates the char (or chars) used to separate each field (If _conflateDelim is true, then 1 or more consecutive delimiters are treated as a single delimiter). The _equals char is used to indicate the key/value separator. The _fields is an ordered list of types and field names for each of the delimited fields. If the _quote is supplied, then a field value starting with quote will be read until another quote char is found, meaning delims within quotes will not be treated as delims.  If the _escape char is supplied then when an escape char is read, it is skipped and the following char is read as a literal.

The optional _mappings directive allows you to map keys within the output to field names specified in the _fields directive. This is useful when the output has key names that are not valid field names, or the output has multiple key names that should be used to populate the same column.

Examples

_delim="|"

_equals="="

_fields="code,lname,int age"

_mappings="20=code,21=lname,22=age"

_quote="'"

_escape="\\"

This defines a pattern such that:

code=11232-33|lname=Smith|age=20

code='1332|ABC'|age=30

20=Act\|112|21=J|22=18 (Note: this row will work due to the _mappings directive)

Maps to:

code lname age
11232-33 Smith 20
1332|ABC 30
Act|112 J 18

Directives for Pattern Capture

_cmd=command_to_execute (Required, see general directives)

_fields=col1_type col1_name, ... (Optional, see general directives)

_pattern=col1_type col1_name, ...=regex_with_grouping (Required)

The _pattern must start with a list of column names, followed by an equal sign (=) and then a regular expression with grouping (this is dubbed a column-to-pattern mapping).  The regular expression's first grouping value will be mapped to the first column, 2nd grouping to the second and so on.

If a column is already defined in the _fields directive, then it's preferred to not include the column type in the _pattern definition.

For multiple column-to-pattern mappings, use the \n (new line) to separate each one.

Examples

_pattern="fname,lname,int age=User (.*) (.*) is (.*) years old"

This defines a pattern such that:

User John Smith is 20 years old

User Bobby Boy is 30 years old

Maps to:

fname lname age
John Smith 20
Bobby Boy 30

_pattern="fname,lname,int age=User (.*) (.*) is (.*) years old\n lname,fname,int weight=Customer (.*),(.*) weighs (.*) pounds"

This defines two patterns such that:

User John Smith is 20 years old

User Boy,Bobby weighs 130 pounds'

Maps to:

fname lname age weight
John Smith 20
Bobby Boy 130

Optional Line Number Directives

Skipping Lines Directive (optional)

Syntax

_skipLines=number_of_lines

Overview

This directive controls the number of lines to skip from the top of the output. This is useful for ignoring a head at the top of output. If not supplied, then no lines are skipped. From a performance standpoint, skipping lines is highly efficient.

Examples

_skipLines="0"     (this is the default, don't skip any lines)

_skipLines="1"      (skip the first line, for example if there is a header)

Line Number Column Directive (optional)

Syntax

_linenum=column_name

Overview

This directive controls the name of the column that will contain the line number. If not supplied, the default is "linenum". Notes about the line number: The first line is line number 1, and skipped/filtered out lines are still considered in numbering. For example, if the _skipLines=2 , then the first line will have a line number of 3.

Examples

_linenum="" (A line number column is not included in the table)

_linenum="linenum" (The column linenum will contain line numbers, this is the default)

_linenum="rownum" (The column rownum will contain line numbers)

Optional Line Filtering Directives

Filtering Out Lines Directive (optional)

Syntax

_filterOut=regex

Overview

Any line that matches the supplied regular expression will be ignored. If not supplied, then no lines are filtered out. From a Performance standpoint, this is applied before other parsing is considered, so ignoring lines using a filter out directive is faster, as opposed to using a WHERE clause, for example.

Examples

_filterOut="Test" (ignore any lines containing the text Test)

_filterOut="^Comment" (ignore any lines starting with Comment)

_filterOut="This|That" (ignore any lines containing the text This or That)

Filtering In Lines Directive (optional)

Syntax

_filterIn=regex

Overview

Only lines that match the supplied regular expression will be considered. If not supplied, then all lines are considered. From a Performance standpoint, this is applied before other parsing is considered, so narrowing down the lines considered using a filter in directive is faster, as opposed to using a WHERE clause, for example. If you use a grouping (..) inside the regular expression, then only the contents of the first grouping will be considered for parsing

Examples

_filterIn="3Forge" (ignore any lines that don't contain the word 3Forge)

_filterIn="^Outgoing" (ignore any lines that don't start with Outgoing)

_filterIn="Data(.*)" (ignore any lines that don't start with Data, and only consider  the text after the word Data for processing)

SSH Adapter

Overview

The AMI SSH Datasource Adapter is a highly configurable adapter designed to execute shell commands and capture the stdout, stderr and exitcode on remote hosts via the secure ssh protocol. There are a number of directives which can be used to control how the command is executed, including supplying data to be passed to stdin. The adapter processes the output from the command. Each line (delineated by a Line feed) is considered independently for parsing. Note the EXECUTE <sql> clause supports the full AMI sql language.

Please note, that running the command will produce 3 tables:

  • Stdout - Contains the contents of standard out
  • Stderr - Contains the contents from standard err
  • exitCode - Contains the executed code of the process

(You can limit which tables are returned using the _include directive)

Generally speaking, the parser can handle four (4) different methods of parsing:

Delimited list or ordered fields

Example data and query:  

11232|1000|123.20

12412|8900|430.90

CREATE TABLE mytable AS USE _cmd="my_cmd" _delim="|"
_fields="String account, Integer qty, Double px"
EXECUTE SELECT * FROM cmd

Key value pairs

Example data and query:

account=11232|quantity=1000|price=123.20

account=12412|quantity=8900|price=430.90

CREATE TABLE mytable AS USE _cmd="my_cmd" _delim="|" _equals="="
_fields="String account, Integer qty, Double px"
EXECUTE SELECT * FROM cmd

Pattern Capture

Example data and query:

Account 11232 has 1000 shares at $123.20 px

Account 12412 has 8900 shares at $430.90 px

CREATE TABLE mytable AS USE _cmd="my_cmd"
_fields="String account, Integer qty, Double px"
_pattern="account,qty,px=Account (.*) has (.*) shares at \\$(.*) px"
EXECUTE SELECT * FROM cmd

Raw Line

If you do not specify a _fields, _mapping nor _pattern directive then the parser defaults to a simple row-per-line parser. A "line" column is generated containing the entire contents of each line from the command's output

CREATE TABLE mytable AS USE _cmd="my_cmd" EXECUTE SELECT * FROM cmd

Configuring the Adapter for first use

1. Open the datamodeler (In Developer Mode -> Menu Bar -> Dashboard -> Datamodel)

2. Choose the "Add Datasource" button

3. Choose SSH Command adapter

4. In the Add datasource dialog:

Name: Supply a user defined Name, ex: MyShell

URL: hostname or hostname:port

Username: the name of the ssh user to login as

Password: the password of the ssh user to login as

Options: See below, note when using multiple options they should be comma delimited

  • For servers requiring keyboard interactive authentication:  authMode=keyboardInteractive
  • To use a public/private key for authentication: publicKeyFile=/path/to/key/file (Note this is often /your_home_dir/.ssh/id_rsa)
  • To request a dumb pty connection: useDumbPty=true     

5. Click "Add Datasource" Button

Running Commands Remotely: You can execute commands on remote machines as well using an AMI Relay. First install an AMI relay on the machine that the command should be executed on ( See AMI for the Enterprise documentation for details on how to install an AMI relay).  Then in the Add Datasource wizard select the relay in the "Relay To Run On" dropdown.

General Directives

Command Directive (Required)

Syntax

_cmd="command to run"

Overview

This directive controls the command to execute.

Examples

_cmd="ls -lrt" (execute ls -lrt)

_cmd="ls | sort" (execute ls and pipe that into sort)

_cmd="dir /s" (execute dir on a windows system)

Supplying Standard Input (Optional)

Syntax

_stdin="text to pipe into stdin"

Overview

This directive is used to control what data is piped into the standard in (stdin) of the process to run.                                                                

Example

_cmd="cat > out.txt" _stdin="hello world" (will pipe "hello world" into out.txt)

Controlling what is captured from the Process (Optional)

Syntax

_capture="comma_delimited_list" (default is stdout,stderr,exitCode)

Overview

This directive is used to control what output data from running the command is captured. It is a comma delimited list and the order determines what order the tables are returned in.  An empty list ("") indicates that nothing will be captured (the command is executed, and all output is ignored). Options include the following:

  • stdout - Capture standard out from the process
  • stderr - Capture standard error from the process
  • exitCode - Capture the exit code from the process                                                                                                    

Example

_capture="exitCode,stdout" (the 1st table will contain the exit code, 2nd will contain stdout)

Field Definitions Directive (Required)

Syntax

_fields=col1_type col_name, col2_type col2_name, ...

Overview

This directive controls the Column names that will be returned, along with their types.  The order in which they are defined is the same as the order in which they are returned.  If the column type is not supplied, the default is String. Special note on additional columns: If the line number (see _linenum directive) column is not supplied in the list, it will default to type integer and be added to the end of the table schema.  Columns defined in the Pattern (see _pattern directive) but not defined in _fields will be added to the end of the table schema.

Types should be one of: String, Long, Integer, Boolean, Double, Float, UTC

Column names must be valid variable names.

Examples

_fields="String account,Long quantity" (define two columns)

_fields ="fname,lname,int age" (define 3 columns, fname and lname default to String)

Environment Directive (Optional)

Syntax

_env="key=value,key=value,..." (Optional. Default is false)

Overview

This directive controls what environment variables are set when running a command

Example

_env="name=Rob,Location=NY"

Use Host Environment Directive (Optional)

Syntax

_useHostEnv=true|false (Optional. Default is false)

Overview

If true, then the environment properties of the Ami process executing the command are passed to the shell.  Please note, that _env values can be used to override specific environment variables.

Example

_useHostEnv="true"

Directives for parsing Delimited list of ordered Fields

_cmd=command_to_execute (Required, see general directives)

_fields=col1_type col1_name, ... (Required, see general directives)

_delim=delim_string (Required)

_conflateDelim=true|false (Optional. Default is false)

_quote=single_quote_char (Optional)

_escape=single_escape_char (Optional)

The _delim indicates the char (or chars) used to separate each field (If _conflateDelim is true, then 1 or more consecutive delimiters are treated as a single delimiter). The _fields is an ordered list of types and field names for each of the delimited fields. If the _quote is supplied, then a field value starting with quote will be read until another quote char is found, meaning delims within quotes will not be treated as delims.  If the _escape char is supplied then when an escape char is read, it is skipped and the following char is read as a literal.

Examples

_delim="|"

_fields="code,lname,int age"

_quote="'"

_escape="\\"

This defines a pattern such that:

11232-33|Smith|20

'1332|ABC'||30

Account\|112|Jones|18

Maps to:

code lname age
11232-33 Smith 20
1332|ABC 30
Account|112 Jones 18

Directives for parsing Key Value Pairs

_cmd=command_to_execute (Required, see general directives)

_fields=col1_type col1_name, ... (Required, see general directives)

_delim=delim_string (Required)

_conflateDelim=true|false (Optional. Default is false)

_equals=single_equals_char (Required)

_mappings=from1=to1,from2=to2,... (Optional)

_quote=single_quote_char (Optional)

_escape=single_escape_char (Optional)

The _delim indicates the char (or chars) used to separate each field (If _conflateDelim is true, then 1 or more consecutive delimiters are treated as a single delimiter). The _equals char is used to indicate the key/value separator. The _fields is an ordered list of types and field names for each of the delimited fields. If the _quote is supplied, then a field value starting with quote will be read until another quote char is found, meaning delims within quotes will not be treated as delims.  If the _escape char is supplied then when an escape char is read, it is skipped and the following char is read as a literal.

The optional _mappings directive allows you to map keys within the output to field names specified in the _fields directive. This is useful when the output has key names that are not valid field names, or the output has multiple key names that should be used to populate the same column.

Examples

_delim="|"

_equals="="

_fields="code,lname,int age"

_mappings="20=code,21=lname,22=age"

_quote="'"

_escape="\\"

This defines a pattern such that:

code=11232-33|lname=Smith|age=20

code='1332|ABC'|age=30

20=Act\|112|21=J|22=18 (Note: this row will work due to the _mappings directive)

Maps to:

code lname age
11232-33 Smith 20
1332|ABC 30
Act|112 J 18

Directives for Pattern Capture

_cmd=command_to_execute (Required, see general directives)

_fields=col1_type col1_name, ... (Optional, see general directives)

_pattern=col1_type col1_name, ...=regex_with_grouping (Required)

The _pattern must start with a list of column names, followed by an equal sign (=) and then a regular expression with grouping (this is dubbed a column-to-pattern mapping).  The regular expression's first grouping value will be mapped to the first column, 2nd grouping to the second and so on.

If a column is already defined in the _fields directive, then it's preferred to not include the column type in the _pattern definition.

For multiple column-to-pattern mappings, use the \n (new line) to separate each one.

Examples

_pattern="fname,lname,int age=User (.*) (.*) is (.*) years old"

This defines a pattern such that:

User John Smith is 20 years old

User Bobby Boy is 30 years old

Maps to:

fname lname age
John Smith 20
Bobby Boy 30

_pattern="fname,lname,int age=User (.*) (.*) is (.*) years old\n lname,fname,int weight=Customer (.*),(.*) weighs (.*) pounds"

This defines two patterns such that:

User John Smith is 20 years old

User Boy,Bobby weighs 130 pounds'

Maps to:

fname lname age weight
John Smith 20
Bobby Boy 130

Optional Line Number Directives

Skipping Lines Directive (optional)

Syntax

_skipLines=number_of_lines

Overview

This directive controls the number of lines to skip from the top of the output. This is useful for ignoring a head at the top of output. If not supplied, then no lines are skipped. From a performance standpoint, skipping lines is highly efficient.

Examples

_skipLines="0"     (this is the default, don't skip any lines)

_skipLines="1"      (skip the first line, for example if there is a header)

Line Number Column Directive (optional)

Syntax

_linenum=column_name

Overview

This directive controls the name of the column that will contain the line number. If not supplied, the default is "linenum". Notes about the line number: The first line is line number 1, and skipped/filtered out lines are still considered in numbering. For example, if the _skipLines=2 , then the first line will have a line number of 3.

Examples

_linenum="" (A line number column is not included in the table)

_linenum="linenum" (The column linenum will contain line numbers, this is the default)

_linenum="rownum" (The column rownum will contain line numbers)

Optional Line Filtering Directives

Filtering Out Lines Directive (optional)

Syntax

_filterOut=regex

Overview

Any line that matches the supplied regular expression will be ignored. If not supplied, then no lines are filtered out. From a Performance standpoint, this is applied before other parsing is considered, so ignoring lines using a filter out directive is faster, as opposed to using a WHERE clause, for example.

Examples

_filterOut="Test" (ignore any lines containing the text Test)

_filterOut="^Comment" (ignore any lines starting with Comment)

_filterOut="This|That" (ignore any lines containing the text This or That)

Filtering In Lines Directive (optional)

Syntax

_filterIn=regex

Overview

Only lines that match the supplied regular expression will be considered. If not supplied, then all lines are considered. From a Performance standpoint, this is applied before other parsing is considered, so narrowing down the lines considered using a filter in directive is faster, as opposed to using a WHERE clause, for example. If you use a grouping (..) inside the regular expression, then only the contents of the first grouping will be considered for parsing

Examples

_filterIn="3Forge" (ignore any lines that don't contain the word 3Forge)

_filterIn="^Outgoing" (ignore any lines that don't start with Outgoing)

_filterIn="Data(.*)" (ignore any lines that don't start with Data, and only consider  the text after the word Data for processing)