AMI SQL
Goals
The 3Forge SQL language provides a comprehensive instruction set for working with table(s) of data.
- Retrieval - Accessing and Normalizing data from external sources
- Blending - Perform Joins and Unions on tables
- Modifying - Inserting, Updating, and Deleting rows
- Analyzing - Grouping, Sorting, and Filtering
- Schema Definition - Creating, Updating, and Dropping tables
Broadly speaking, the logic steps involved in data-visualization are:
1. Running a query, or set of queries, on external datasources and/or the AMI-Realtime database.
- Each datasource is predefined and uniquely named.
- The results of queries on datasources will result in temporary, in-memory tables
- The queries can be constructed using user-input/user-selected/user-associated data
- In more advanced cases, the queries can be sequential, meaning the results from one query are then fed into a subsequent query (allowing for reactionary data blending)
2. Blending the results from Step 1, resulting in new temporary in-memory tables 3. Building visualizations on the tables from Step 1 and/or Step 2
AMI SQL and Standard SQL Differences
For those familiar with standard SQL, these are the differences:
1. Equality is done with ==
- AMISQL: SELECT * FROM Users WHERE userid == 15;
- MYSQL: SELECT * FROM Users WHERE userid = 15;
2. Checking for null is done with equality, ex. == null
- AMISQL: SELECT * FROM Users WHERE status == null;
- MYSQL: SELECT * FROM Users WHERE status IS NULL;
3. Strings are surrounded in double quotes
- AMISQL: SELECT * FROM Users WHERE name == "David";
- MYSQL: SELECT * FROM Users WHERE name = 'David';
4. "Matches" (~~) instead of LIKE
- AMISQL: SELECT * FROM Users WHERE name ~~ "^Dav";
- MYSQL: SELECT * FROM Users WHERE name LIKE 'Dav%';
(See Simplified Text Matching for pattern matching rules/expressions)
5. Boolean expressions && works same as AND. || works same as OR
- AMISQL: SELECT * FROM Users WHERE userid == 15 AND status == null && name == "David";
- MYSQL: SELECT * FROM Users WHERE userid = 15 AND status IS NULL && name = 'David';
6. By default all tables are TEMPORARY, use the PUBLIC key word to make non-temporary tables:
- AMISQL: CREATE TABLE MyTemporaryTable(id String);
- AMISQL: CREATE PUBLIC TABLE MyPermanantTable(id String);
Data Source Access Clauses
use...execute_clause
Overview
This command enables you to "execute" code on an external datasource and process/store the results locally
- USE: While datamodels have default options, you can override them for a particular query. Valid options include
- ds = "datasource_name"
- The name of the datasource to execute this script on
- timeout = timeout_in_millis
- The number of milliseconds before the execute command times out
- limit = row_limit_to_return
- The max number of records returned
- ds_url = "url"
- Overrides the url of the datasource
- When used in conjunction with ds="datasource_name", supplied datasource must contain URL in the PermittedOverrides field
- ds_username = "username"
- Overrides the username of the datasource
- When used in conjunction with ds="datasource_name", supplied datasource must contain USERNAME in the PermittedOverrides field
- ds_password = "plain_text_password"
- Overrides the password of the datasource using a plain text password
- When used in conjunction with ds="datasource_name", supplied datasource must contain PASSWORD in the PermittedOverrides field
- Mutually exclusive with the ds_password_enc option
- ds_password_enc = "aes_encrypted_password"
- Overrides the password of the datasource, using an encrypted password
- Use strEncrypt(...) to create an encrypted password
- When used in conjunction with ds="datasource_name", supplied datasource must contain PASSWORD in the PermittedOverrides field
- Mutually exclusive with the ds_password option
- ds_options = "some options"
- Override the options of the datasource
- When used in conjunction with ds="datasource_name", supplied datasource must contain OPTIONS in the PermittedOverrides field
- ds_relay = "relayid"
- Override the relay of the datasource
- Run SHOW RELAYS command and inspect RelayId for available relays
- When used in conjunction with ds="datasource_name", supplied datasource must contain RELAY in the PermittedOverrides field
- ds_adapter = "SOME_ADAPTER"
- Use an "anonymous" datasource of the specified adapter type
- Run SHOW DATASOURCE_TYPES command and inspect the I column for available adapters
- Mutually exclusive with ds="datasource_name" option
- See ami.db.anonymous.datasources.enabled option for permission details
- _custom=some_directive
- Prefixing with an underbar (_) passes the option through to the datasource's adapter
- ds = "datasource_name"
- EXECUTE: The literal text to execute. The text will be sent directly to the datasource for execution. Therefore, variables cannot be referenced. Instead, substitution can be used. See dollarsign_substitution
Syntax
[USE option = value [ option = value ...]] EXECUTE code_to_execute
Example
CREATE TABLE
final_table AS
USE
ds="mydb"
timeout=10000
limit=100
EXECUTE
select * from mytable;
use...show tables clause
Overview
This command enables you to see what tables are available on a given datasource (similar to what the AMI gui wizard shows you when selecting a table).
- USE: While datamodels have default options, you can override them for a particular query. Valid options include
- ds = "datasource_name" The name of the datasource to execute the script on
The resulting table will be called TABLE and contain two String columns: TableName and Schema
Syntax
USE ds="mydb" SHOW TABLES;
Example
This example will return a table displaying all tables available in the mydb datasource. If the datasource supports schema names, that will be populated as well.
USE ds="mydb" SHOW TABLES;
//returns:
use...insert clause
Overview
This command uploads data from AMI into a datasource.
- USE: While datamodels have default options, you can override them for a particular query. Valid options include
- ds = "datasource_name" The name of the datasource to execute the script on
The resulting table will be called TABLE and contain two String columns: TableName and Schema
Syntax
USE ds="datasource_name" INSERT [INTO] datasource_table [(col1[, col2 ...])] select_clause
USE ds="datasource_name" INSERT [INTO] datasource_table [(col1[, col2 ...])] analyze_clause
USE ds="datasource_name" INSERT [INTO] datasource_table [(col1[, col2 ...])] prepare_clause
USE ds="datasource_name" INSERT [INTO] datasource_table [(col1[, col2 ...])] execute_clause
USE ds="datasource_name" INSERT [INTO] datasource_table [(col1[, col2 ...])] VALUES values_expr[, values_expr ...]
USE ds="datasource_name" INSERT [INTO] datasource_table [(col1[, col2 ...])] Use_execute_clause
datasource_name - string name of the datasource containing the table to insert values into
datasource_table - name of the table to insert values into
Examples
This example will insert two rows into the mydb datasource's users table:
USE ds="mydb" INSERT INTO users(name,city,state) values ("Rob","NYC","NY"),("Sara","OC","CA");
This example copy all rows from AMI's my_users table into the "mydb" datasource's users table:
USE ds="mydb" INSERT INTO users(name,city,state) SELECT Name,Local,Region FROM my_users;
This example copy all rows from the old_users table in the otherds datasource to the users table in the mydb datasource.
USE ds="mydb" INSERT INTO users(name,city,state) USE ds="otherds" EXECUTE SELECT * FROM old_users
concurrent_block (advanced)
Overview
When running analysis that needs to query many datasources, there can be large performance gains by executing these queries simultaneously. The CONCURRENT{...} key word causes all EXECUTE clauses within the block to be run in parallel (instead of sequentially).
In both of the examples below, we are going to execute three queries on three separate datasources (mydb1, mydb2, mydb3). The first example executes the queries sequentially and the second example executes the queries concurrently.
Assuming each query takes 2 seconds for a database to process, the first example will complete in about 6 seconds (2+2+2), while the second, concurrent, example will complete in only 2 seconds:
Example of Sequential Execution (Default)
{
CREATE TABLE mytable1 AS USE ds="mydb1" EXECUTE select * from mytable;
CREATE TABLE mytable2 AS USE ds="mydb2" EXECUTE select * from mytable;
CREATE TABLE mytable3 AS USE ds="mydb3" EXECUTE select * from mytable;
}
Here is the sequence of events:
00:00:00 - EXECUTE clause is sent to mydb1
00:00:00 - AMI Script sleeps until there is a response from mydb1...
00:00:02 - mytable1 is created (from result of query to mydb1)
00:00:02 - EXECUTE clause is sent to mydb2
00:00:02 - AMI Script sleeps until there is a response from mydb2...
00:00:04 - mytable2 is created (from result of query to mydb2)
00:00:04 - EXECUTE clause is sent to mydb3
00:00:04 - AMI Script sleeps until there is a response from mydb3...
00:00:06 - mytable3 is created (from result of query to mydb3)
00:00:06 - Block is complete after 6 seconds (code after the block will be executed)
Example of Concurrent Execution
So now let's add the CONCURRENT keyword before the block. In this case, all 3 executes will be initiated at the same time.
CONCURRENT{
CREATE TABLE mytable1 AS USE ds="mydb1" EXECUTE select * from mytable;
CREATE TABLE mytable2 AS USE ds="mydb2" EXECUTE select * from mytable;
CREATE TABLE mytable3 AS USE ds="mydb3" EXECUTE select * from mytable;
}
Here is the sequence of events:
00:00:00 - EXECUTE clause is sent to mydb1
00:00:00 - EXECUTE clause is sent to mydb2
00:00:00 - EXECUTE clause is sent to mydb3
00:00:00 - AMI Script sleeps until there is a response from all 3 queries...
00:00:02 - mytable1 is created (from result of query to mydb1)
00:00:02 - mytable2 is created (from result of query to mydb2)
00:00:02 - mytable3 is created (from result of query to mydb3)
00:00:02 - Block is complete after 2 seconds (code after the block will be executed)
Query Clauses
select_clause
Overview
This clause allows the combination of Filtering (WHERE), Sorting (ORDER BY), Aggregating (GROUP BY), Projecting (SELECT …) and Cropping (LIMIT) of individual tables. Next, the results may be combined via Unions (UNION) and/or Joins (FROM … JOIN ON). Complex Strings containing delimited data can be split into distinct rows via the UNPACK operation. Finally, see the create_table_as_clause for storing the final result of a select clause into another table.
- To aggregate data, use the GROUP BY clause to identify any number of columns or derived values that should be used to group rows. Provide an alias (AS) for derived values to be used within the select clause. The select columns can only use aggregate functions (min, max, count, sum) on underlying rows and columns declared in the group-by clause
- To filter data, provide a WHERE clause which evaluates to a Boolean. True indicates don't filter. When filtering in conjunction with aggregating (GROUP BY), those records filtered out will not be considered for aggregation. To filter rows after aggregation, use the HAVING clause.
- For projecting, supply columns to include in the SELECT clause. Derived values may be provided. Provide an alias (AS) for derived values or to rename a column. Use star (*) to include all columns or tblname.* to include only those columns from a particular table provided in the FROM clause.
- To crop out rows, use the LIMIT clause. You may provide a number of rows to skip (offset) and a number of maximum rows to return (rowcount).
- To sort, use the ORDER BY clause. If multiple expressions are given the left-most takes priority.
- To union (concatenate) the results from multiple select clauses, use the UNION clause. The column names and types from the first select clause will provide the result set schema definition. If BYNAME is specified, then target columns and source columns are mapped by column nume. If BYNAME is not specified all select clauses participating in a union must define the same number of columns. Note: this operates like a union all in many databases.
- To join (blend) tables, supply all tables together in the FROM clause and a Cartesian product will be provided. For more advanced Joins, you can use a combination of the LEFT, RIGHT, OUTER, JOIN, ON and NEAREST clauses. See JOINING DATA section for details
- To process String columns containing multi-value delimited data, use the UNPACK col ON delimeter expression. This will split the string such that each entry will result in a new row.
Syntax
SELECT column_expr|lambda_expr [, column_expr|lambda_expr ...] FROM table_expr|nested_select_expr [, table_expr|nested_select_expr ...]|
[[LEFT|RIGHT|OUTER] [ONLY] JOIN [table_expr|nested_select_expr] [ON expr] [NEAREST nearest_expr]]
[WHERE where_expr ]
[GROUP BY column_expr [, column_expr ...] [HAVING expr ]]
[ORDER BY expr [ASC|DESC] [, expr [ASC|DESC] ...]]
[UNPACK col_to_unpack ON unpack_delimiter [,col_to_unpack ON unpack_delimiter]]
[LIMIT [offset,] rowcount]
[UNION [BYNAME] select_clause]
offset - integer number of rows to skip
rowcount - integer number of maximum rows that will be returned
Joining Data
Simple Cartesian Joins
Any number of tables can be joined by simply listing them after the FROM clause which will produce a Cartesian product, such that a row will be returned for each possible combination of rows from the source table. For example, the Cartesian product of a table with 5 rows and another table with 3 rows will produce 15 rows (5x3=15). Of course, the WHERE clause can be used to limit results to only the interesting combinations.
Handling Rows that don't match rows in other tables
For more granular control when joining two tables, use the JOIN syntax. There are 7 types of statements, let's assume two tables a and b:
- JOIN: FROM a JOIN b ON expr - This is the same as FROM a,b WHERE expr
- LEFT JOIN: FROM a LEFT JOIN b ON expr - A JOIN plus all records from table a that would otherwise be omitted (due to the expr).
- LEFT ONLY JOIN: FROM a LEFT ONLY JOIN b ON expr - Includes only records from table a that would otherwise be omitted (due to the expr).
- RIGHT JOIN: FROM a RIGHT JOIN b ON expr - A JOIN plus all records from table b that would otherwise be omitted (due to the expr). (similar to LEFT JOIN, swapping a and b)
- RIGHT ONLY JOIN: FROM a RIGHT ONLY JOIN b ON expr - Includes only records from table b that would otherwise be omitted (due to the expr). (similar to LEFT ONLY JOIN, swapping a and b)
- OUTER JOIN: A JOIN plus all records from table a and from table b that would otherwise be omitted (due to the expr)
- OUTER ONLY JOIN: Includes only records from table a and from table b that would otherwise be omitted (due to the expr).
Joining to a nearest match
Use the NEAREST clause in conjunction with one of the above 7 JOIN clauses to match two records where there may not be an exact match, such as doing an "as-of" join with timestamps. After the ON clause, include a NEAREST fuzzy_clause.
Consider: a JOIN b ON on_expr NEAREST fuzzy_clause, where the fuzzy clause is one of the 5 syntaxes:
- a.cola == b.colb - For each row in table a, join with the row in table b that satisfies on_expr and has the smallest delta between cola and colb.
- a.cola <= b.colb - For each row in table a, join with the row in table b that satisfies on_expr and cola is less than or equal to colb and has the smallest delta between cola and colb.
- a.cola >= b.colb - For each row in table a, join with the row in table b that satisfies on_expr and cola is greater than or equal to colb and has the smallest delta between cola and colb.
- a.cola < b.colb - For each row in table a, join with the row in table b that satisfies on_expr and cola is less than colb and has the smallest delta between cola and colb.
- a.cola > b.colb - For each row in table a, join with the row in table b that satisfies on_expr and cola is greater than colb and has the smallest delta between cola and colb.
Note the ordering of table a vs b in the fuzzy_clause does not matter. For example b.colb >= a.cola is the same as a.cola <= b.colb
General Examples
SELECT age, dob, weight, name FROM Patients
SELECT age, dob, weight, name FROM Patients WHERE name =~ "mr"
SELECT count(*) as cnt, age, avg(dob), stdev(weight) FROM Patients GROUP BY age ORDER BY age
SELECT * FROM Patients, Accounts WHERE Patients.id == Accounts.patient_id
SELECT * FROM Orders JOIN MarketData ON Orders.symbol==MarketData.symbol NEAREST marketData.time<=Orders.time
SELECT * EXCEPT (OrderID, Quantity) from Orders; //return all Orders data except the OrderID and Quantity Columns
SELECT * FROM Orders where OrderId in ("a1","a2","b1");
SELECT * FROM Orders where OrderId in (select origOrderId from AnotherTable);
SELECT * FROM Orders where (OrderId, account) in (("a1","account1"),("a2","account1"),("b1","account2"));
SELECT * FROM Orders where (OrderId, account) in (select origOrderId, origAccount from AnotherTable);
SELECT DISTINCT EXAMPLE
SELECT symbol FROM MarketData GROUP BY symbol; //Get unique symbols from the MarketData table.
Nearest Join Examples
//Assume two existing tables: Orders and MarketData
AMIDB> SELECT * FROM Orders;
AMIDB> SELECT * FROM MarketData;
//A JOIN will only include orders that have a satisfactory market data event. Note, we’re also calculating the price delta from the order vs the market
AMIDB> select *,Orders.Price-MarketData.Price as delta from Orders JOIN MarketData ON Orders.sym==MarketData.sym NEAREST Orders.time>=MarketData.time ORDER by Orders.time;
//A LEFT JOIN will include all orders, event those without a market data event (using null as a place holder).
AMIDB> select *,Orders.Price-MarketData.Price as delta from Orders LEFT JOIN MarketData ON Orders.sym==MarketData.sym NEAREST Orders.time>=MarketData.time ORDER by Orders.time;
Join Examples
Let's assume we have two tables A and B, each with 1 column:
We can see that the value 1 only exists in the A table, the value 3 exists only in the B table and the value 2 existing in both tables. When comparing the data between them there are, exhaustively speaking, 7 different ways you could want to see what matches and what doesn't.
JOIN | OUTER JOIN | OUTER ONLY JOIN | LEFT JOIN | LEFT ONLY JOIN | RIGHT JOIN | RIGHT ONLY JOIN | |
Include Matches? | YES | YES | YES | ||||
Includes Left W/O a Match on Right | YES | YES | YES | YES | |||
Includes Right W/O a Match on Left | YES | YES | YES | YES |
JOIN Syntax - Return only those rows with values that coexist in both tables. (Note, this is the same as the natural join select * from a,b where a.aid==b.bid)
SELECT * FROM a JOIN b on a.aid==b.bid
OUTER JOIN Syntax - Return all rows from both tables.
SELECT * FROM a OUTER JOIN b on a.aid==b.bid
OUTER ONLY JOIN Syntax - Return rows in the left table that don't have matching rows in the right table and rows in the right table that don't have matching rows in the left table.
SELECT * FROM a OUTER ONLY JOIN b on a.aid==b.bid
LEFT JOIN Syntax - Return all rows from the left table and only the rows in the right table matching rows in the left table.
SELECT * FROM a LEFT JOIN b on a.aid==b.bid
LEFT ONLY JOIN Syntax - Return only rows in the left table that don't have matching rows in the right table.
SELECT * FROM a LEFT ONLY JOIN b on a.aid==b.bid
RIGHT JOIN Syntax - Return all rows from the right table and only the rows in the left table matching rows in the right table. (Note, this is equivalent to … FROM b LEFT JOIN a …)
SELECT * FROM a RIGHT JOIN b on a.aid==b.bid
RIGHT ONLY JOIN Syntax - Return only rows in the right table that don't have matching rows in the left table. (Note, this is equivalent to … FROM b LEFT ONLY JOIN a …)
SELECT * FROM a RIGHT ONLY JOIN b on a.aid==b.bid
MULTIPLE TABLES INNER JOIN Syntax - Return rows that are joined based on columns from multiple tables. (Note, we only support inner join for mutliple table joins)
CREATE PUBLIC TABLE Orders(Id Int,Symbol String,SideId Int, StatusId Int);
INSERT INTO Orders VALUES(1,"AAPL",1,3),(2,"IBM",2,1),(3,"MSFT",2,2),(4,"TSLA",1,4);
CREATE PUBLIC TABLE Side(SideId Int, Side String);
INSERT INTO Side VALUES(1,"Buy"),(2,"Sell");
CREATE PUBLIC TABLE Status(StatusId Int, Status String);
INSERT INTO Status VALUES(1,"New"),(2,"Partial"),(3,"Filled"),(4,"Cancelled");
//Join Three Tables
SELECT o.Id,o.Symbol,a.Side,b.Status FROM Orders AS o, Side AS a, Status AS b WHERE a.SideId==o.SideId AND b.StatusId==o.StatusId;
TABLE UNPACKING Syntax - Return the result unpacked by a specific delimiter based on a specific column
CREATE TABLE Orders (orderId String,accounts String);
INSERT INTO Orders VALUES("ord1","a1,a2,a3"),("ord2","b4,b5"),("ord3","");
SELECT * FROM Orders UNPACK accounts ON ",";
analyze_clause
Overview
This command runs analysis on a table. Analysis is performed on rows contained in a defined window, similar to a self-join. Each window can define an order-by to allow for finer control over the order in which an aggregate is applied. Multiple windows may be defined. See window_expr in nested expressions for details.
When adding columns, the new cells will default to null. The column will be added to the end (right-most) of the table.
ANALYZE agg_expr [, agg_expr ...] FROM tblname window_expr [, window_expr ...]
tblname - string name of the table to alter the schema of
Example
CREATE TABLE
final_table AS
ANALYZE
a_date AS date,
BETA(win.a_open, win.b_open) AS beta,
COR(win.a_open, win.b_open) AS cor
FROM source_table
WINDOW
win ON win.date <= date + __date_range && date <= win.date
prepare_clause
Overview
This command allows the user to "prepare" additional columns whose values are calculated as aggregates across rows. For example, a running sum needs to consider not just the current row, but all rows before the current row. This can be accomplished using the STACK() method inside a PREPARE clause (see example below).
- ORDER BY: In many cases, the ordering of rows in the input table is important, especially for running sums. The ORDER BY clause is useful to guarantee the correct ordering.
- PARTITION BY: Used to perform many aggregates on a single table, each aggregate considering only a subset of the table's rows based on some grouping criteria.
PREPARE prep_expr [, prep_expr ...] FROM tblname
[ORDER BY expr [ASC|DESC] [, expr [ASC|DESC] ...]]
[PARTITION BY column_expr [, column_expr ...]
tblname - string name of the table to alter the schema of
Example
This example will produce a running sum of the qty column order by extime on a per-account basis
CREATE TABLE
final_table AS
PREPARE account, qty, extime, STACK(qty) as acc_qty FROM table ORDER BY extime
PARTITION BY account
Data Modification
insert_values_clause
Overview
This clause enables the insertion of raw values into an existing table. Many rows may be inserted in a single call. Each values_expr represents a single row of data to be inserted. To insert multiple rows append additional values_expr
In order to specify columns in which to insert values, provide an ordered list of column names after the table name. The same number of values must be supplied in the values_expr and in the same order. All other columns will default to null.
If a list of column names is not provided, all columns will be populated. Each values_expr should contain one value for each column in the target table. Values should be in the same natural order as the columns in the target table.
The forloop_expr allows for the insertion of values_expr to be repeated for each cycle of the for loop. The values can reference the for loop's declared variable. Multiple for loops can be nested inside one another.
Syntax
INSERT [INTO] tblname [(col1[, col2 ...])] [forloop_expr ...] VALUES values_expr[, values_expr ...] [LIMIT maxinsert]
tblname - string name of the table to insert into
colN - string name of the column to contain the Nth value from values_expr
Examples
INSERT INTO rectangles (area, perimeter) VALUES (20,18);
int length=10,width=5;
INSERT INTO rectangles (area, perimeter) VALUES ((length * width), (2 * length + 2 * width));
insert_from_clause
Overview
This clause causes the insertion of the result of a projecting-clause (select_clause, analyze_clause, prepare_clause, execute_clause) into an existing table. Each row from the result set of the select_clause will result in the insertion of a new record into the target table. Please note, multiple tables may be inserted in a single call using the UNION clause.
In order to specify columns in which to insert values, provide an ordered list of column names after the table name. The result set from the preceding select_ must have the same number of columns. The columns should be in the same order. All other columns will default to null.
If a list of column names is not provided, all columns will be populated. Each values_expr should contain one value for each column in the target table. Values should be in the same natural order as the columns in the target table.
If the BYNAME keyword is included, then source and target columns are aligned based on column name. Column Names in the target table that do not existing in the query clause will be populated with null
Syntax
INSERT [INTO] tblname [(col1[, col2 ...])] [BYNAME] [FROM] select_clause
INSERT [INTO] tblname [(col1[, col2 ...])] [BYNAME] [FROM] analyze_clause
INSERT [INTO] tblname [(col1[, col2 ...])] [BYNAME] [FROM] prepare_clause
INSERT [INTO] tblname [(col1[, col2 ...])] [BYNAME] [FROM] execute_clause
INSERT [INTO] tblname [(col1[, col2 ...])] [BYNAME] [FROM] describe_clause
INSERT [INTO] tblname [(col1[, col2 ...])] [BYNAME] [FROM] show_clause
INSERT [INTO] tblname [(col1[, col2 ...])] [BYNAME] [FROM] call_clause
tblname - string name of the table to insert into
colN - string name of the column to contain the Nth value from results of select clause
Example
INSERT INTO employee_summary (age, birthplace) FROM SELECT age, birthplace FROM employee
sync_clause
Overview
This clause causes the target table to reflect the result of a projecting-clause (select_clause, analyze_clause, prepare_clause, execute_clause), meaning that rows will be inserted, deleted and updated accordingly on the target table such that the target table will properly mirror the result of the projecting-clause.
In otherwords, the two examples will result in the same final state for the my_target table, but from a trigger stand point they are significantly different:
//Example 1: using a traditional clear-and-rebuild technique:
TRUNCATE my_target; INSERT INTO my_target FROM SELECT * from my_source;
//Example 2: using the sync method:
SYNC my_target FROM SELECT * from my_source;
In the first example, the truncate instruction will cause a delete trigger to be fired for every row originally in the my_target table and then the subsequent insert instruction will cause an insert trigger to be fired for every row inserted into the my_target table.
The following rules apply for how the target table is mutated:
- Records in the projecting-clause that do not have a matching record in the target table are inserted into the target table
- Records in the target table that do not have a matching record in the projecting-clause are deleted from the target table
- Records in the target table that have a "perfect" matching record in the projecting-clause remain untouched. (no triggers fired
- Records in the target table that have a "non-perfect" matching record in the projecting-clause are updated.
An ON clause can be used to tell the sync tool how to determine if a source and target row are considered equivalent. When an equivalent row between the target table and projecting-clause is identified, this will result in an update. Rows that don't have a match will result in a delete and/or insert.
Matching Rules:
- If all cells between two rows are equivalent, that is considered a "Perfect match"
- If an ON clause is specified: Two row are considered a "non-perfect" match if all of the cells specified in the ON clause are a match, but other cells are not. (If an ON clause is not specified then there can not be non-perfect matches)
In order to specify columns in which to sync values, provide an ordered list of column names after the table name (note that only these columns will be considered for the perfect-matching criteria). The result set from the preceding select_clause must have the same number of columns. The columns should be in the same order. All other columns will default to null for inserts and will be left unaltered for updates.
If a list of column names is not provided, all columns will be populated. Each values_expr should contain one value for each column in the target table. Values should be in the same natural order as the columns in the target table.
Syntax
SYNC [INTO] tblname [(col1[, col2 ...])] [ON(onCol1,[,onCol2])] [FROM] select_clause
SYNC [INTO] tblname [(col1[, col2 ...])] [ON(onCol1,[,onCol2])] [FROM] analyze_clause
SYNC [INTO] tblname [(col1[, col2 ...])] [ON(onCol1,[,onCol2])] [FROM] prepare_clause
SYNC [INTO] tblname [(col1[, col2 ...])] [ON(onCol1,[,onCol2])] [FROM] execute_clause
tblname - string name of the table to insert into
colN - string name of the column to contain the Nth value from results of select clause
onColN - string name of the column to be used in determining if two records are equivalent
Examples
SYNC accounts USE ds=my_datasource EXECUTE select * from Accounts;
SYNC new_employees SELECT * FROM old_employees;
delete_from_clause
Overview
This clause enables the deletion of specific rows or all rows from a table and also calls the onDelete triggers if applicable.
When it comes to deleting all rows from the table, this is similar to using truncate clause but consumes more memory.
To delete only specific rows, provide a WHERE clause which must evaluate to a Boolean value where true indicates that the row should be deleted. Otherwise, to delete all rows, do not provide a WHERE clause.
You can specify the maximum number of rows to delete using the LIMIT clause (maxdelete). Additionally, you may provide a number of rows to skip when deleting (skipdelete). Note the limit is applied after the where clause, meaning that the skipdelete indicates how many records to skip deleting that would otherwise be deleted based on the where clause.
The JOIN .. ON clause allows you to use the values from another table to determine which rows to delete from the target table. The JOIN operates similar to the SELECT's JOIN clause. Note that LEFT JOIN and LEFT ONLY are also supported. Keep in mind that you can only delete rows from the left table.
Syntax
DELETE FROM tblname [WHERE where_expr] [LIMIT [skipdelete,] maxdelete]
DELETE FROM tblname [LEFT [ONLY] JOIN table_expr ON expr ] [WHERE where_expr] [LIMIT [skipdelete,] maxdelete]
DELETE FROM tblname [LEFT [ONLY] JOIN table_expr ON expr ] [WHERE where_expr] [LIMIT [skipdelete,] maxdelete]
tblname - string name of the table to delete from
Examples
DELETE FROM names; //deletes all records from names table
DELETE FROM names WHERE first_name == "Dave"; //deletes all records where name is Dave
DELETE FROM names WHERE first_name == "Dave" LIMIT 2; //delete 1st & 2nd record where name is Dave
DELETE FROM names WHERE first_name == "Dave" LIMIT 2,1; //deletes 3rd record where name is Dave
DELETE FROM names JOIN users ON first_name == users.name; //deletes all
records from names where the first_name has a match with the user's table
DELETE FROM names LEFT ONLY JOIN users ON first_name == users; //deletes all
records from names where the first_name does NOT have a match with the
user's table
truncate_clause
Overview
This command deletes all rows from a table. You can specify multiple tables, separated by commas.
Syntax
TRUNCATE [TABLE] tblname [, tblname ...]
Examples
TRUNCATE TABLE Accounts;
TRUNCATE TABLE Orders, Executions, Alerts;
update_clause
Overview
This clause can be used to update specific rows or all rows in a table. Following the SET statement, provide an assignment (target=expr) clause for each column to be updated, where target is the name of an existing column.
To update only specific rows, provide a WHERE clause which evaluates to a Boolean value, where true indicates that the row should be updated. Otherwise, to update all rows, do not provide a WHERE clause.
You can specify the maximum number of rows to update using the LIMIT clause (maxupdate). Additionally, you may provide a number of rows to skip when updating (skipupdate). Note the limit is applied after the where clause, meaning that the skipupdate indicates how many records to skip updating that would otherwise be updated based on the where clause.
The JOIN .. ON clause allows you to use the values from another table to update values in the target table. The JOIN operates similar to the SELECT's JOIN clause. Note that LEFT JOIN and LEFT ONLY are also supported. Keep in mind that you can only update values on the left table.
Syntax
UPDATE tblname SET col1=expr [, col2=expr...] [WHERE where_expr] [LIMIT [skipupdate,] maxupdate]
UPDATE tblname [LEFT [ONLY] JOIN table_expr ON expr ] SET col1=expr [, col2=expr...] [WHERE where_expr] [LIMIT [skipupdate,] maxupdate]
UPDATE tblname [LEFT [ONLY] JOIN table_expr ON expr ] SET col1=expr [, col2=expr...][WHERE where_expr] [LIMIT [skipupdate,] maxupdate]
tblname - string name of the table to update
colN - string name of the column to assign value from expression
Examples
UPDATE groceries SET price = 10, status=null; //update all records setting price to 10 and status to null
UPDATE groceries SET price = price + 1 WHERE type == "dairy" //increment price for all records where type is dairy
UPDATE groceries SET price = price + 1 WHERE type == "dairy" LIMIT 10; //increments the price for the first 10 records where type is dairy
UPDATE groceries SET price = price + 1 WHERE type == "dairy" LIMIT 3,2 ; //increments the price for the 4th and 5th record where type is dairy
UPDATE flags set a1=a2,a2=a1 where a1<a2; //Swap the values in the a1 and a2 columns where a1 is less than a2
UPDATE groceries JOIN products ON groceries.id==products.grocery_id SET
price=products.price WHERE type == "dairy"; // Sets the price from the
products table onto the groceries table where the ids match.
Schema Modification
create_table_clause
Overview
This command is used to create a new, empty (no rows) table with a specified set of columns (names and types). If another table already exists with the tblname, the existing table will be implicitly dropped and replaced with the newly created table.
At least one column must be specified, and each column must have a unique name.
Syntax
CREATE TABLE [IF NOT EXISTS] tblname (col1 type_expr [, col2 type_expr ...])
tblname - string name of the table to create
colN - string name of the Nth column to create
Example
CREATE TABLE countries (country String, language String, GDP Double)
create_table_as_clause
Overview
This command is used to create a new table which has the same schema definition as the result set of a select_clause. If another table already exists with the tblname, the existing table will be implicitly dropped and replaced with the newly created table.
All rows from the select_clause's result set will be inserted into the table.
Syntax
CREATE TABLE [IF NOT EXISTS] tblname[(col1 column_type [, col2 column_type ...])] AS select_clause
CREATE TABLE [IF NOT EXISTS] tblname[(col1 column_type [, col2 column_type ...])] AS analyze_clause
CREATE TABLE [IF NOT EXISTS] tblname[(col1 column_type [, col2 column_type ...])] AS prepare_clause
CREATE TABLE [IF NOT EXISTS] tblname[, tblname ...] AS execute_clause
CREATE TABLE [IF NOT EXISTS] tblname[(col1 column_type [, col2 column_type ...])] AS describe_clause
CREATE TABLE [IF NOT EXISTS] tblname[(col1 column_type [, col2 column_type ...])] AS show_clause
CREATE TABLE [IF NOT EXISTS] tblname[(col1 column_type [, col2 column_type ...])] AS call_clause
tblname - string name of the table to create
Example
CREATE TABLE names AS SELECT first_name, last_name FROM employees
alter_table_clause
Overview
This command alters the schema (column names and types) of an existing table. You can add columns (ADD), rename columns (RENAME ... TO), delete columns (DROP), and change the type of columns (MODIFY). Note, when multiple alterations are supplied, they are evaluated left to right. When changing types, a best effort cast will be applied.
When adding columns, the new cells will default to null. The column will be added to the end (right-most) of the table.
Syntax
ALTER TABLE tblname alter_col_expr [, alter_col_expr ...]
tblname - string name of the table to alter
Examples
ALTER TABLE AccountHoldings ADD UserId long, RENAME AccountId TO Account_Id, DROP UserInfo
ALTER TABLE AccountHoldings MODIFY UserId AS UserId INT
ALTER TABLE AccountHoldings RENAME UserId TO UserIds, ADD UserId STRING
rename_table_clause
Overview
This command renames an existing table. If another table already exists with the newname, the existing table will be implicitly dropped and replaced with the renamed table.
Syntax
RENAME TABLE tblname TO newname
tblname - string name of the table to rename
newname - string name of the new name for the table
Example
RENAME TABLE AccountHoldings TO Accounts
drop_table_clause
Overview
This command drops an existing table, along with its schema and all rows. You can specify multiple tables, separated by commas.
Syntax
DROP TABLE [IF EXISTS] tblname [, tblname ...]
tblname - string name of the table to drop
Example
DROP TABLE Accounts
AMIDB Scope Options
TEMPORARY
A table visible only to the user. It exists as long as the user session is active.
VARIABLE
A table created as an AMI Script Table object. It has the same scope and lifetime as a TEMPORARY table.
Table vt = SELECT * FROM Tablename;
PUBLIC
A table which is visible to all users. Its definition (but not the content) persists.
For persistent content see PersistEngine options.
If the scope is not specified, AMI will first check TEMPORARY tables, then VARIABLE tables, and lastly PUBLIC tables. Tables with different scopes but the same name can exist. However, if a PUBLIC table already exists, then it cannot be created again.
You may look at the DefinedBy column to see if a column is a variable or temporary.
Note: Argument TEMP can also be used instead of TEMPORARY with the same effect
Display
SHOW [TEMPORARY|PUBLIC|VARIABLE] TABLES;
shows scoped tables
SHOW [TEMPORARY|PUBLIC|VARIABLE] COLUMNS;
shows columns of scoped tables
SHOW [TEMPORARY|PUBLIC|VARIABLE] TABLE TableName;
shows column details of a scoped table
DESCRIBE [TEMPORARY|PUBLIC|VARIABLE] Tablename;
provides the command which can be used to create the scoped table
Schema Modification
CREATE [TEMPORARY|PUBLIC] TABLE Tablename ...;
creates a scoped table
Examples
CREATE TABLE TempTable1 (a int);
CREATE TEMP TABLE TempTable2 (a int);
CREATE TEMPORARY TABLE TempTable3 (a int);
CREATE PUBLIC TABLE PubTable1 (x int);
CREATE PUBLIC TABLE PubTable2 (x int, y double);
—
ALTER [TEMPORARY|PUBLIC|VARIABLE] TABLE Tablename ...;
modifies a scoped table
Examples
ALTER TEMPORARY TABLE TempTable1 ADD id String;
ALTER TEMPORARY TABLE TempTable2 MODIFY a as a BigInteger;
ALTER TEMPORARY TABLE TempTable3 RENAME a to b;
ALTER PUBLIC TABLE PubTable2 DROP y;
—
DROP [TEMPORARY|PUBLIC|VARIABLE] TABLE Tablename;
deletes a scoped table
Examples
DROP TEMPORARY TABLE TempTable3;
DROP PUBLIC TABLE PubTable2;
—
RENAME [TEMPORARY|PUBLIC|VARIABLE] TABLE Tablename TO NewTablename;
renames a scoped table
Examples
RENAME TEMPORARY TABLE TempTable2 TO TempTableTwo;
RENAME PUBLIC TABLE PubTable1 TO PubTable;
(All schema modification syntax options apply as in Schema Modification)
Data Modification
INSERT INTO [TEMPORARY|PUBLIC|VARIABLE] Tablename ...;
inserts row into scoped table
Examples
INSERT INTO TEMPORARY TempTable1 VALUES (1, "ID1"), (2, "ID2"), (3, "ID3");
INSERT INTO PUBLIC PubTable VALUES (4), (5);
—
PREPARE ... FROM [TEMPORARY|PUBLIC|VARIABLE] ... Tablename ...;
prepares additional columns on scoped table —
ANALYZE ... FROM [TEMPORARY|PUBLIC|VARIABLE] ... Tablename;
runs analysis on scoped table
Examples
ANALYZE COUNT(win.a) as Count FROM TEMPORARY TempTable1 WINDOW win ON true;
ANALYZE SUM(win.x) as Sum FROM PUBLIC PubTable WINDOW win ON true;
—
SYNC [TEMPORARY|PUBLIC|VARIABLE] Tablename FROM ... ProjectingClause;
syncs scoped table to ProjectingClause result
Example
SYNC TEMPORARY TempTableTwo FROM select * from PubTable;
—
TRUNCATE[TEMPORARY|PUBLIC|VARIABLE][TABLE] Tablename;
deletes all rows from scoped table
(All data modification syntax options apply as in Data Modification)
Query Clauses
SELECT ... FROM [TEMPORARY|PUBLIC|VARIABLE] Tablename ...[AS Asname] ...
queries scoped table
Examples
SELECT * FROM TEMPORARY TempTable1 WHERE id == "ID2";
SELECT * FROM PUBLIC PubTable;
—
UPDATE [TEMPORARY|PUBLIC|VARIABLE] Tablename [AS Asname] ...[JOIN_CLAUSE [TEMPORARY|PUBLIC|VARIABLE] Tablename 2 [AS Asname2] ] ...
updates scoped table
Examples
UPDATE TEMPORARY TempTable1 set id = "NewID";
UPDATE PUBLIC PubTable set x = -1;
—
DELETE FROM [TEMPORARY|PUBLIC|VARIABLE] Tablename [AS Asname] [JOIN_CLAUSE [TEMPORARY|PUBLIC|VARIABLE] Tablename2 [AS Asname2] ] ...
deletes data from scoped table
Examples
DELETE FROM TEMPORARY TempTable1 WHERE a == 3;
DELETE FROM PUBLIC PubTable;
(All query clauses syntax options apply as in Query Clauses)
Assigning query results to local variables
AMI Script supports the ability to execute queries and store the results into local variables. There are various ways the query results can be stored, beyond just Table objects, see type Coercion rules below.
This is the general form:
type var_name = query;
type: the type of local variable to store the query result.
Var_name: Name of the local
Query: a valid sql query. Either select_clause, analyze_clause, or prepare_clause
Valid Type Coercion rules are listed below. Please note that an attempt to coerce the query result to an invalid type, will result in null.
Table mytable = SELECT * from SampleTable;
List: Query results that have a single column can be stored to a List. Each value in the query result will be inserted into the list in order.
List myvals = select SampleColumn from SampleTable;
Set: Query results that have a single column can be stored to a Set. Each value in the query result will be inserted into the set in order, ignoring duplicate values (as per set's contract).
Set myvals = select SampleColumn from SampleTable;
Map: Query results that have a single row can be stored to a map. The map's keys will be the columns' names, and the map's values will be the values of the first (and only) row.
Map myvals = select * from SampleTable limit 1;
Primitives: Queries that result in a single value (one row and one column) can be stored to a primitive variable. The value of the single cell is what will be stored.
Long myCount = select count(*) from SampleTable;
String myName = select max(name) from SampleTable;
AMI-SQL Glossary of Nested Expressions
alter_col_expr
MODIFY colname AS newname type_expr|ADD newname type_expr [BEFORE beforecol]|DROP colname|RENAME colname TO newname
colname: String name of the column to alter
newname: String name of the column to create
beforecol: String name of the existing column to add the new column before
column_expr
expr [AS colname]|[tablename.]* [EXCEPT (colname [,colname,...])]
colname: name of the column that will contain results of expression
*: expands to available columns for all tables or specified table
expr
See "Standard Operators" section.
forloop_expr
FOR varname = start_const TO end_const [STEP step]
varname: variable that can be referenced by expresions within the for loop
start_const: initial value of varname, inclusive
end_const: last number to loop over, inclusive unless the step would cause it to be skipped.
step: positive number dictating size to increment or decrement varname by in each loop cycle.
A negative or zero step will cause an error. If start_const <= end_const then step is used to increment, otherwise decrement.
in_expr
When only a single column will participate in the in clause:
expr IN (select_clause)
//(select must return a single column)
expr IN values_expr
//(each entry must be a single value)
When two or more columns will participate in the in clause:
(expr,[,expr2 ...]) [NOT] IN (select_clause)
//(select must return same number of columns)
(expr,[,expr2 ...]) [NOT] IN (values_expr[,values_expr...])
//(values_expr entry count must match expr count)
The IN clause evaluates to true for a given row if the result of expr from the left hand expression exists in the right hand expression. If using multiple columns, the combination of the expr must match the combination of the values_expr or select_clause for a given row.
lambda_expr
{statements} [AS colname]
The lambda expression allows for contained AMI Script to be executed once per row returned by the containing select statement. colname is the name of the column that the return value of this lambda will be placed in.
nested_select_expr
(select_clause) [AS aliasname]
Select_clause: inner query to execute
aliasname: name of alias for use in column_expr and expr
table_expr
tlbname [AS aliasname]
tblname: name of table to include in select
aliasname: name of alias for use in column_expr and expr
type_expr
LONG|INT|CHAR|BYTE|SHORT|DOUBLE|FLOAT|UTC|UTCN|BOOLEAN|STRING
values_expr
(val [,val2...])
where_expr
expr|in_expr [AND|OR where_expr]
Expressions with the added IN clause (in_expr) funtionality
window_expr
WINDOW window_name ON windowing_expr [ORDER BY order_expr] [PARTITION BY partitioning_expr]
window_name: The name of this window which is used in the preceding col_expr (in the analyze clause). For example, if the name of a window is "dates" then max(dates.open) would give you the max value of the open column for that window.
windowing_expr: The boolean expression that defines (for each row) the other rows that belong to the same window.
order_expr: Optional expression which determines the natural order of rows for the window. This is important for order dependent aggregates such as first(...) and last(...)
partitioning_expr: Optional expression which deterines how the table is broken up before windowing is considered. In other words, two rows that evaluate to a different partitioning_expr will never belong to the same window.
prep_expr
stack(expr) | running sum of expr from first to row n, inclusive |
count(expr) | running count where expr!=null to the current row, inclusive |
norm(expr) | similar to stack, but scaled so that the max value is one. Short hand for: stack(expr)/sum(expr). See dnorm(...) |
dnorm(expr) | similar to stack, but scaled so that the min value is zero. Short hand for: (stack(expr) - expr)/sum(expr). This, in conjunction with norm(...) can used for preparing area charts that should fill an area, such as a pie chart. |
rank(expr) | will evaluate expr for all rows and the lowest value will get 1, 2nd will get 2, etc. If multiple rows expr evaluate to the same value, they will get the same ranking. |
urank(expr) | will evaluate expr for all rows and the lowest value will get 1, 2nd will get 2, etc. If multiple rows expr evaluate to the same value, the first occurrence gets the lower value. |
unique(expr) | each unique evaluation of expr will return a unique integer, starting at 1. Repeat occurrences of expr will return the same integer value. |
offset(expr,offset) | Will grab the value offset from the current row. If offset==0 use current row, negative number is nth prior row, positive number is nth future row. Evaluates to null if offset is out of bounds for current table.
Ex: offset(price,-1) for first row returns null, for second row returns first row's price and for nth row returns (n-1)ths row's price |
kmeansCluster(expr1,expr2,...,nClusters,seed) | Clusters data using the k-means clustering algorithm. Takes one or more expressions, followed by int nClusters specifying the number of clusters and long seed specifying the random seed for initializing the model parameters. The seed argument may be set to null if no specific seed is desired. Returns a column of integers labeling each record according to its learned cluster. Input expressions must evaluate to numbers. |
emGmmCluster(expr1,expr2,...,nClusters,seed) | Same functionality as kmeansCluster(), but instead uses the expectation-maximization algorithm to train a Gaussian mixture model. |
viGmmCluster(expr1,expr2,...,nClusters,seed) | Same functionality as kmeansCluster(), but instead uses the variational inference algorithm to train a Gaussian mixture model. |
prep_expr
sum(expr) | Summation of all values, skips nulls |
min(expr) | Minimum value, skips nulls |
max(expr) | Maximum value, skips nulls |
count(expr) | Count of non-null values |
countUnique(expr) | Count of unique non-null values |
cat(expr, delim, limit) | Concatenates the first limit non-null values to a string with delim. Ex: cat(a,'/',2) selects the first 2 non-null values from column a and appends '/' at the end of each value. |
avg(expr) | Average of all non-null values |
var(expr) | Population Variance of all non-null values |
varS(expr) | Sample Variance of all non-null values |
stdev(expr) | Population Standard Deviation of all non-null values |
stdevS(expr) | Sample Standard Deviation of all non-null values |
first(expr) | First value, may be null if first value is null |
last(expr) | Last value, may be null if last value is null |
covar(expr1, expr2) | Population Covariance of 1st expr and 2nd expr where both aren't null. |
covarS(expr1, expr2) | Sample Covariance of 1st expr and 2nd expr where both aren't null. |
cor(expr1, expr2) | Correlation of 1st expr and 2nd expr where both aren't null. Returns 0 - 1 |
beta(expr1, expr2) | Beta of 1st expr and 2nd expr where both are not null. The beta function is defined as beta(x, y) = covar(x, y) / var(y). If stdev(y) = 0, the expression returns NaN. |
avgExp(expr, decay, desc) | The exponentially weighted average of non-null values with a specified decay rate. Desc of true/false means first/last row are highest weighted. |
avgGauss(expr, variance) | The Gaussian weighted average of non-null values, such that the middle value has highest weight. |
median(expr) | The median of all non-null values. |
percentileCont(expr) | eturns the value at the percentile provided. Is interpolated and might not be equal to any value within the supplied dataset |
percentileDisc(expr) | Returns the value nearest to the percentile provided |
AMI-SQL Escaping Rules
Use AMI reserve keyword as a table name
If you want to use a reserve keyword as your table name, such as order,limit, you need to use backtick ` to escape it.
CREATE PUBLIC TABLE order(id int, sym string, price double,qty int);
INSERT INTO `order` VALUES(123,"aapl",20.5,100);
SELECT * FROM `order`;