Datasource Adapters
Flat File Reader
Overview
The AMI Flat File Reader Datasource Adapter is a highly configurable adapter designed to process extremely large flat files into tables at rates exceeding 100mb per second*. There are a number of directives which can be used to control how the flat file reader processes a file. Each line (delineated by a Line feed) is considered independently for parsing. Note the EXECUTE <sql> clause supports the full AMI sql language.
*Using Pattern Capture technique (_pattern) to extract 3 fields across a 4.080 gb text file containing 11,999,504 records. This generated a table of 11,999,504 records x 4 columns in 37,364 milliseconds (additional column is the default linenum). Tested on raid-2 7200rpm 2TB drive
Generally speaking, the parser can handle three (4) different methods of parsing files:
Delimited list or ordered fields
Example data and query:
11232|1000|123.20
12412|8900|430.90
CREATE TABLE mytable AS USE _file="myfile.txt" _delim="|"
_fields="String account, Integer qty, Double px"
EXECUTE SELECT * FROM file
Key value pairs
Example data and query:
account=11232|quantity=1000|price=123.20
account=12412|quantity=8900|price=430.90
CREATE TABLE mytable AS USE _file="myfile.txt" _delim="|" _equals="="
_fields="String account, Integer qty, Double px"
EXECUTE SELECT * FROM file
Pattern Capture
Example data and query:
Account 11232 has 1000 shares at $123.20 px
Account 12412 has 8900 shares at $430.90 px
CREATE TABLE mytable AS USE _file="myfile.txt"
_fields="String account, Integer qty, Double px"
_pattern="account,qty,px=Account (.*) has (.*) shares at \\$(.*) px"
EXECUTE SELECT * FROM file
Raw Line
If you do not specify a _fields, _mapping nor _pattern directive then the parser defaults to a simple row-per-line parser. A "line" column is generated containing the entire contents of each line from the file
CREATE TABLE mytable AS USE _file="f.txt" EXECUTE SELECT * FROM FILE