Datasource Adapters
Flat File Reader
Overview
The AMI Flat File Reader Datasource Adapter is a highly configurable adapter designed to process extremely large flat files into tables at rates exceeding 100mb per second*. There are a number of directives which can be used to control how the flat file reader processes a file. Each line (delineated by a Line feed) is considered independently for parsing. Note the EXECUTE <sql> clause supports the full AMI sql language.
*Using Pattern Capture technique (_pattern) to extract 3 fields across a 4.080 gb text file containing 11,999,504 records. This generated a table of 11,999,504 records x 4 columns in 37,364 milliseconds (additional column is the default linenum). Tested on raid-2 7200rpm 2TB drive
Generally speaking, the parser can handle three (4) different methods of parsing files:
Delimited list or ordered fields
Example data and query:
11232|1000|123.20
12412|8900|430.90
CREATE TABLE mytable AS USE _file="myfile.txt" _delim="|"
_fields="String account, Integer qty, Double px"
EXECUTE SELECT * FROM file
Key value pairs
Example data and query:
account=11232|quantity=1000|price=123.20
account=12412|quantity=8900|price=430.90
CREATE TABLE mytable AS USE _file="myfile.txt" _delim="|" _equals="="
_fields="String account, Integer qty, Double px"
EXECUTE SELECT * FROM file
Pattern Capture
Example data and query:
Account 11232 has 1000 shares at $123.20 px
Account 12412 has 8900 shares at $430.90 px
CREATE TABLE mytable AS USE _file="myfile.txt"
_fields="String account, Integer qty, Double px"
_pattern="account,qty,px=Account (.*) has (.*) shares at \\$(.*) px"
EXECUTE SELECT * FROM fil
Raw Line
If you do not specify a _fields, _mapping nor _pattern directive then the parser defaults to a simple row-per-line parser. A "line" column is generated containing the entire contents of each line from the file
CREATE TABLE mytable AS USE _file="f.txt" EXECUTE SELECT * FROM FILE
Configuring the Adapter for first use
1. Open the datamodeler (In Developer Mode -> Menu Bar -> Dashboard -> Datamodel)
2. Choose the "Add Datasource" button
3. Choose Flat File Reader
4. In the Add datasource dialog:
Name: Supply a user defined Name, ex: MyFiles
URL: /full/path/to/directory/containing/files (ex: /home/myuser/files )
(Keep in mind that the path is on the machine running AMI, not necessarily your local desktop)
5. Click "Add Datasource" Button
Accessing Files Remotely: You can also access files on remote machines as well using an AMI Relay. First install an AMI relay on the machine that contains the files, or at least has access to the files, you wish to read ( See AMI for the Enterprise documentation for details on how to install an AMI relay). Then in the Add Datasource wizard select the relay in the "Relay To Run On" dropdown.