AMI SQL

From 3forge Documentation
Revision as of 14:52, 5 July 2022 by Tommy Hu (talk | contribs) (clarification on concat)
Jump to navigation Jump to search

Goals

The 3Forge SQL language provides a comprehensive instruction set for working with table(s) of data.

  • Retrieval - Accessing and Normalizing data from external sources
  • Blending - Perform Joins and Unions on tables
  • Modifying - Inserting, Updating, and Deleting rows
  • Analyzing - Grouping, Sorting, and Filtering
  • Schema Definition - Creating, Updating, and Dropping tables

Broadly speaking, the logic steps involved in data-visualization are:

1. Running a query, or set of queries, on external datasources and/or the AMI-Realtime database.

  • Each datasource is predefined and uniquely named.
  • The results of queries on datasources will result in temporary, in-memory tables
  • The queries can be constructed using user-input/user-selected/user-associated data
  • In more advanced cases, the queries can be sequential, meaning the results from one query are then fed into a subsequent query (allowing for reactionary data blending)

2. Blending the results from Step 1, resulting in new temporary in-memory tables 3. Building visualizations on the tables from Step 1 and/or Step 2

AMI SQL and Standard SQL Differences

For those familiar with standard SQL, these are the differences:

1. Equality is done with ==

  • AMISQL: SELECT * FROM Users WHERE userid == 15;
  • MYSQL: SELECT * FROM Users WHERE userid = 15;

2. Checking for null is done with equality, ex. == null

  • AMISQL: SELECT * FROM Users WHERE status == null;
  • MYSQL: SELECT * FROM Users WHERE status IS NULL;

3. Strings are surrounded in double quotes

  • AMISQL: SELECT * FROM Users WHERE name == "David";
  • MYSQL: SELECT * FROM Users WHERE name = 'David';

4. "Matches" (~~) instead of LIKE

  • AMISQL: SELECT * FROM Users WHERE name ~~ "^Dav";
  •  MYSQL: SELECT * FROM Users WHERE name LIKE 'Dav%';

 (See Simplified Text Matching for pattern matching rules/expressions)

5. Boolean expressions && works same as AND.  || works same as OR

  • AMISQL: SELECT * FROM Users WHERE userid == 15 AND status == null && name == "David";
  • MYSQL: SELECT * FROM Users WHERE userid = 15 AND status IS NULL && name = 'David';

6. By default all tables are TEMPORARY, use the PUBLIC key word to make non-temporary tables:

  • AMISQL: CREATE TABLE MyTemporaryTable(id String);
  • AMISQL: CREATE PUBLIC TABLE MyPermanantTable(id String);

Data Source Access Clauses

use...execute_clause

Overview

This command enables you to "execute" code on an external datasource and process/store the results locally

  • USE: While datamodels have default options, you can override them for a particular query.  Valid options include
    • ds = "datasource_name"
      • The name of the datasource to execute this script on
    • timeout = timeout_in_millis
      • The number of milliseconds before the execute command times out
    • limit = row_limit_to_return
      • The max number of records returned
    • ds_url = "url"
      • Overrides the url of the datasource
      • When used in conjunction with ds="datasource_name", supplied datasource must contain URL in the PermittedOverrides field
    • ds_username = "username"
      • Overrides the username of the datasource
      • When used in conjunction with ds="datasource_name", supplied datasource must contain USERNAME in the PermittedOverrides field
    • ds_password = "plain_text_password"
      • Overrides the password of the datasource using a plain text password
      • When used in conjunction with ds="datasource_name", supplied datasource must contain PASSWORD in the PermittedOverrides field
      • Mutually exclusive with the ds_password_enc option
    • ds_password_enc = "aes_encrypted_password"
      • Overrides the password of the datasource, using an encrypted password
      • Use strEncrypt(...) to create an encrypted password
      • When used in conjunction with ds="datasource_name", supplied datasource must contain PASSWORD in the PermittedOverrides field
      • Mutually exclusive with the ds_password option
    • ds_options = "some options"
      • Override the options of the datasource
      • When used in conjunction with ds="datasource_name", supplied datasource must contain OPTIONS in the PermittedOverrides field
    • ds_relay = "relayid"
      • Override the relay of the datasource
      • Run SHOW RELAYS command and inspect RelayId for available relays
      • When used in conjunction with ds="datasource_name", supplied datasource must contain RELAY in the PermittedOverrides field
    • ds_adapter = "SOME_ADAPTER"
      • Use an "anonymous" datasource of the specified adapter type
      • Run SHOW DATASOURCE_TYPES command and inspect the I column for available adapters
      • Mutually exclusive with ds="datasource_name" option
      • See ami.db.anonymous.datasources.enabled option for permission details
    • _custom=some_directive
      • Prefixing with an underbar (_) passes the option through to the datasource's adapter
  • EXECUTE: The literal text to execute. The text will be sent directly to the datasource for execution. Therefore, variables cannot be referenced. Instead, substitution can be used. See dollarsign_substitution

Syntax

[USE option = value  [  option = value  ...]]  EXECUTE code_to_execute

Example

CREATE TABLE

 final_table AS

USE

 ds="mydb"

 timeout=10000

 limit=100

EXECUTE

 select * from mytable;

use...show tables clause

Overview

This command enables you to see what tables are available on a given datasource (similar to what the AMI gui wizard shows you when selecting a table).

  • USE: While datamodels have default options, you can override them for a particular query.  Valid options include
    • ds = "datasource_name"             The name of the datasource to execute the script on

The resulting table will be called TABLE and contain two String columns: TableName and Schema

Syntax                                                                                                  

USE ds="mydb" SHOW TABLES;

Example

This example will return a table displaying all tables available in the mydb datasource. If the datasource supports schema names, that will be populated as well.

USE ds="mydb" SHOW TABLES;

//returns:

AMISQL.DSAccessTable.jpg

use...insert clause

Overview

This command uploads data from AMI into a datasource.

  • USE: While datamodels have default options, you can override them for a particular query.  Valid options include
    • ds = "datasource_name"             The name of the datasource to execute the script on

The resulting table will be called TABLE and contain two String columns: TableName and Schema

Syntax

USE ds="datasource_name"  INSERT [INTO] datasource_table  [(col1[, col2 ...])] select_clause
USE ds="datasource_name"  INSERT [INTO] datasource_table  [(col1[, col2 ...])] analyze_clause 
USE ds="datasource_name"  INSERT [INTO] datasource_table  [(col1[, col2 ...])] prepare_clause 
USE ds="datasource_name"  INSERT [INTO] datasource_table  [(col1[, col2 ...])] execute_clause
USE ds="datasource_name"  INSERT [INTO] datasource_table  [(col1[, col2 ...])] VALUES values_expr[, values_expr ...]
USE ds="datasource_name"  INSERT [INTO] datasource_table  [(col1[, col2 ...])] Use_execute_clause

datasource_name - string name of the datasource containing the table to insert values into

datasource_table - name of the table to insert values into

Examples

This example will insert two rows into the mydb datasource's users table:

USE ds="mydb" INSERT INTO users(name,city,state) values ("Rob","NYC","NY"),("Sara","OC","CA");

This example copy all rows from AMI's my_users table into the "mydb" datasource's users table:

USE ds="mydb" INSERT INTO users(name,city,state) SELECT Name,Local,Region FROM my_users;

This example copy all rows from the old_users table in the otherds datasource to the users table in the mydb datasource.

USE ds="mydb" INSERT INTO users(name,city,state) USE ds="otherds" EXECUTE SELECT * FROM old_users

concurrent_block (advanced)

Overview

When running analysis that needs to query many datasources, there can be large performance gains by executing these queries simultaneously. The CONCURRENT{...} key word causes all EXECUTE clauses within the block to be run in parallel (instead of sequentially).

In both of the examples below, we are going to execute three queries on three separate datasources (mydb1, mydb2, mydb3). The first example executes the queries sequentially and the second example executes the queries concurrently.

Assuming each query takes 2 seconds for a database to process, the first example will complete in about 6 seconds (2+2+2), while the second, concurrent, example will complete in only 2 seconds:

Example of Sequential Execution (Default)

{
  CREATE TABLE mytable1 AS USE ds="mydb1" EXECUTE select * from mytable;
  CREATE TABLE mytable2 AS USE ds="mydb2" EXECUTE select * from mytable;
  CREATE TABLE mytable3 AS USE ds="mydb3" EXECUTE select * from mytable;
}

Here is the sequence of events:

00:00:00 -  EXECUTE clause is sent to mydb1

00:00:00 -  AMI Script sleeps until there is a response from mydb1...


00:00:02 - mytable1 is created (from result of query to mydb1)

00:00:02 -  EXECUTE clause is sent to mydb2

00:00:02 -  AMI Script sleeps until there is a response from mydb2...


00:00:04 - mytable2 is created (from result of query to mydb2)

00:00:04 -  EXECUTE clause is sent to mydb3

00:00:04 -  AMI Script sleeps until there is a response from mydb3...


00:00:06 - mytable3 is created (from result of query to mydb3)

00:00:06 -  Block is complete after 6 seconds (code after the block will be executed)

Example of Concurrent Execution

So now let's add the CONCURRENT keyword before the block.  In this case, all 3 executes will be initiated at the same time.

CONCURRENT{
  CREATE TABLE mytable1 AS USE ds="mydb1" EXECUTE select * from mytable;
  CREATE TABLE mytable2 AS USE ds="mydb2" EXECUTE select * from mytable;
  CREATE TABLE mytable3 AS USE ds="mydb3" EXECUTE select * from mytable;
}

Here is the sequence of events:

00:00:00 - EXECUTE clause is sent to mydb1

00:00:00 - EXECUTE clause is sent to mydb2

00:00:00 - EXECUTE clause is sent to mydb3

00:00:00 - AMI Script sleeps until there is a response from all 3 queries...


00:00:02 - mytable1 is created (from result of query to mydb1)

00:00:02 - mytable2 is created (from result of query to mydb2)

00:00:02 - mytable3 is created (from result of query to mydb3)

00:00:02 -  Block is complete after 2 seconds (code after the block will be executed)

Query Clauses

select_clause

Overview

This clause allows the combination of Filtering (WHERE), Sorting (ORDER BY), Aggregating (GROUP BY), Projecting (SELECT …) and Cropping (LIMIT) of individual tables.  Next, the results may be combined via Unions (UNION) and/or Joins (FROM … JOIN ON). Complex Strings containing delimited data can be split into distinct rows via the UNPACK operation. Finally, see the create_table_as_clause for storing the final result of a select clause into another table.

  • To aggregate data, use the GROUP BY clause to identify any number of columns or derived values that should be used to group rows. Provide an alias (AS) for derived values to be used within the select clause. The select columns can only use aggregate functions (min, max, count, sum) on underlying rows and columns declared in the group-by clause
  • To filter data, provide a WHERE clause which evaluates to a Boolean.  True indicates don't filter. When filtering in conjunction with aggregating (GROUP BY), those records filtered out will not be considered for aggregation.  To filter rows after aggregation, use the HAVING clause.
  • For projecting, supply columns to include in the SELECT clause.  Derived values may be provided. Provide an alias (AS) for derived values or to rename a column.  Use star (*) to include all columns or tblname.* to include only those columns from a particular table provided in the FROM clause.
  • To crop out rows, use the LIMIT clause. You may provide a number of rows to skip (offset) and a number of maximum rows to return (rowcount).
  • To sort, use the ORDER BY clause. If multiple expressions are given the left-most takes priority.
  • To union (concatenate) the results from multiple select clauses, use the UNION clause. The column names and types from the first select clause will provide the result set schema definition.  If BYNAME is specified, then target columns and source columns are mapped by column nume. If BYNAME is not specified all select clauses participating in a union must define the same number of columns. Note: this operates like a union all in many databases.
  • To join (blend) tables, supply all tables together in the FROM clause and a Cartesian product will be provided. For more advanced Joins, you can use a combination of the LEFT, RIGHT, OUTER, JOIN, ON and NEAREST clauses. See JOINING DATA section for details
  • To process String columns containing multi-value delimited data, use the UNPACK col ON delimeter expression. This will split the string such that each entry will result in a new row.

Syntax

SELECT column_expr|lambda_expr  [,  column_expr|lambda_expr  ...] FROM table_expr|nested_select_expr  [, table_expr|nested_select_expr  ...]|
    [[LEFT|RIGHT|OUTER] [ONLY] JOIN [table_expr|nested_select_expr] [ON expr] [NEAREST nearest_expr]]
    [WHERE where_expr ]
    [GROUP BY column_expr [, column_expr ...] [HAVING expr ]]
    [ORDER BY expr [ASC|DESC] [, expr [ASC|DESC] ...]]
    [UNPACK col_to_unpack ON unpack_delimiter [,col_to_unpack ON unpack_delimiter]]
    [LIMIT [offset,] rowcount]
[UNION [BYNAME] select_clause] 

offset - integer number of rows to skip

rowcount - integer number of maximum rows that will be returned

Joining Data

Simple Cartesian Joins

Any number of tables can be joined by simply listing them after the FROM clause which will produce a Cartesian product, such that a row will be returned for each possible combination of rows from the source table.  For example, the Cartesian product of a table with 5 rows and another table with 3 rows will produce 15 rows (5x3=15). Of course, the WHERE clause can be used to limit results to only the interesting combinations.

Handling Rows that don't match rows in other tables

For more granular control when joining two tables, use the JOIN syntax.  There are 7 types of statements, let's assume two tables a and b:

  • FULL JOIN: FROM a JOIN b ON expr - This is the same as FROM a,b WHERE expr
  • LEFT JOIN: FROM a LEFT JOIN b ON expr - A FULL JOIN plus all records from table a that would otherwise be omitted (due to the expr).
  • LEFT ONLY JOIN: FROM a LEFT ONLY JOIN b ON expr - Includes only records from table a that would otherwise be omitted (due to the expr).
  • RIGHT JOIN: FROM a RIGHT JOIN b ON expr - A FULL JOIN plus all records from table b that would otherwise be omitted (due to the expr). (similar to LEFT JOIN, swapping a and b)
  • RIGHT ONLY JOIN: FROM a RIGHT ONLY JOIN b ON expr - Includes only records from table b that would otherwise be omitted (due to the expr). (similar to LEFT ONLY JOIN, swapping a and b)
  • OUTER JOIN: A FULL JOIN plus all records from table a and from table b that would otherwise be omitted (due to the expr
  • OUTER ONLY JOIN: Includes only records from table a and from table b that would otherwise be omitted (due to the expr).

Joining to a nearest match

Use the NEAREST clause in conjunction with one of the above 7 JOIN clauses to match two records where there may not be an exact match, such as doing an "as-of" join with timestamps. After the ON clause, include a NEAREST fuzzy_clause.

Consider: a JOIN b ON on_expr NEAREST fuzzy_clause, where the fuzzy clause is one of the 5 syntaxes:

  • a.cola == b.colb - For each row in table a, join with the row in table b that satisfies on_expr and has the smallest delta between cola and colb.
  • a.cola <= b.colb - For each row in table a, join with the row in table b that satisfies on_expr and cola is less than or equal to colb and has the smallest delta between cola and colb.
  • a.cola >= b.colb - For each row in table a, join with the row in table b that satisfies on_expr and cola is greater than or equal to colb and has the smallest delta between cola and colb.
  • a.cola < b.colb - For each row in table a, join with the row in table b that satisfies on_expr and cola is less than colb and has the smallest delta between cola and colb.
  • a.cola > b.colb - For each row in table a, join with the row in table b that satisfies on_expr and cola is greater than colb and has the smallest delta between cola and colb.

Note the ordering of table a vs b in the fuzzy_clause does not matter. For example  b.colb >= a.cola is the same as a.cola <= b.colb

General Examples

SELECT age, dob, weight, name FROM Patients
SELECT age, dob, weight, name FROM Patients WHERE name =~ "mr"
SELECT count(*) as cnt, age, avg(dob), stdev(weight) FROM Patients GROUP BY age ORDER BY age
SELECT * FROM Patients, Accounts WHERE Patients.id == Accounts.patient_id
SELECT * FROM Orders JOIN MarketData ON Orders.symbol==MarketData.symbol NEAREST marketData.time<=Orders.time
SELECT * EXCEPT (OrderID, Quantity) from Orders; //return all Orders data except the OrderID and Quantity Columns
SELECT * FROM Orders where OrderId in ("a1","a2","b1");
SELECT * FROM Orders where OrderId in (select origOrderId from AnotherTable);
SELECT * FROM Orders where (OrderId, account) in (("a1","account1"),("a2","account1"),("b1","account2"));
SELECT * FROM Orders where (OrderId, account) in (select origOrderId, origAccount from AnotherTable);
SELECT DISTINCT EXAMPLE
SELECT symbol FROM MarketData GROUP BY symbol; //Get unique symbols from the MarketData table.

Nearest Join Examples

//Assume two existing tables: Orders and MarketData

AMIDB> SELECT * FROM Orders;

AMISQL.NearestJoinExample01.jpg

AMIDB> SELECT * FROM MarketData;

AMISQL.NearestJoinExample02.jpg

//A JOIN will only include orders that have a satisfactory market data event. Note, we’re also calculating the price delta from the order vs the market

AMIDB> select *,Orders.Price-MarketData.Price as delta from Orders JOIN MarketData ON Orders.sym==MarketData.sym NEAREST Orders.time>=MarketData.time ORDER by Orders.time;

AMISQL.NearestJoinExample03.jpg

//A LEFT JOIN will include all orders, event those without a market data event (using null as a place holder).

AMIDB> select *,Orders.Price-MarketData.Price as delta from Orders LEFT JOIN MarketData ON Orders.sym==MarketData.sym NEAREST Orders.time>=MarketData.time ORDER by Orders.time;

AMISQL.NearestJoinExample04.jpg

Join Examples

Let's assume we have two tables A and B, each with 1 column:

AMISQL.NearestJoinExample05.jpg

We can see that the value 1 only exists in the A table, the value 3 exists only in the B table and the value 2 existing in both tables.  When comparing the data between them there are, exhaustively speaking, 7 different ways you could want to see what matches and what doesn't.

JOIN

AMISQL.QueryClauses.JOIN.png

OUTER JOIN

AMISQL.QueryClauses.OUTERJOIN.png

OUTER ONLY JOIN

AMISQL.QueryClauses.OUTERJOINONLY.png

LEFT JOIN

AMISQL.QueryClauses.LEFTJOIN.png

LEFT ONLY JOIN

AMISQL.QueryClauses.LEFTONLYJOIN.png

RIGHT JOIN

AMISQL.QueryClauses.RIGHTJOIN.png

RIGHT ONLY JOIN

AMISQL.QueryClauses.RIGHTONLYJOIN.png

Include Matches? YES YES YES
Includes Left W/O a Match on Right YES YES YES YES
Includes Right W/O a Match on Left YES YES YES YES

AMISQL.QueryClauses.JOIN.pngJOIN Syntax - Return only those rows with values that coexist in both tables.  (Note, this is the same as the natural join select * from a,b where a.aid==b.bid)

SELECT * FROM a JOIN b on a.aid==b.bid

AMISQL.JoinExample01.Fixed.jpg


AMISQL.QueryClauses.OUTERJOIN.png OUTER JOIN Syntax - Return all rows from both tables.

SELECT * FROM a OUTER JOIN b on a.aid==b.bid

AMISQL.JoinExample02.jpg


AMISQL.QueryClauses.OUTERJOINONLY.png OUTER ONLY JOIN Syntax - Return rows in the left table that don't have matching rows in the right table and rows in the right table that don't have matching rows in the left table.

SELECT * FROM a OUTER ONLY JOIN b on a.aid==b.bid

AMISQL.JoinExample03.jpg


AMISQL.QueryClauses.LEFTJOIN.png LEFT JOIN Syntax - Return all rows from the left table and only the rows in the right table matching rows in the left table.

SELECT * FROM a LEFT JOIN b on a.aid==b.bid 

AMISQL.JoinExample04.jpg


AMISQL.QueryClauses.LEFTONLYJOIN.png LEFT ONLY JOIN Syntax - Return only rows in the left table that don't have matching rows in the right table.

SELECT * FROM a LEFT ONLY JOIN b on a.aid==b.bid 

AMISQL.JoinExample05.jpg


AMISQL.QueryClauses.RIGHTJOIN.png RIGHT JOIN Syntax - Return all rows from the right table and only the rows in the left table matching rows in the right table.  (Note, this is equivalent to … FROM b LEFT JOIN a …)

SELECT * FROM a RIGHT JOIN b on a.aid==b.bid 

AMISQL.JoinExample06.jpg


AMISQL.QueryClauses.RIGHTONLYJOIN.png RIGHT ONLY JOIN Syntax - Return only rows in the right table that don't have matching rows in the left table.  (Note, this is equivalent to … FROM b LEFT ONLY JOIN a …)

SELECT * FROM a RIGHT ONLY JOIN b on a.aid==b.bid 

AMISQL.JoinExample07.jpg

Unpack example

CREATE TABLE Orders (orderId String,accounts String);
INSERT INTO Orders VALUES("ord1","a1,a2,a3"),("ord2","b4,b5"),("ord3","");
SELECT * FROM Orders UNPACK accounts ON ",";

AMISQL.UnpackExample.jpg

analyze_clause

Overview

This command runs analysis on a table. Analysis is performed on rows contained in a defined window, similar to a self-join. Each window can define an order-by to allow for finer control over the order in which an aggregate is applied.  Multiple windows may be defined. See window_expr in nested expressions for details.

When adding columns, the new cells will default to null. The column will be added to the end (right-most) of the table.

ANALYZE agg_expr [, agg_expr ...]  FROM tblname  window_expr  [,  window_expr  ...]

tblname - string name of the table to alter the schema of

Example

CREATE TABLE
  final_table AS
ANALYZE
  a_date AS date,
  BETA(win.a_open, win.b_open) AS beta,
  COR(win.a_open, win.b_open) AS cor
FROM source_table
WINDOW
  win ON win.date <= date + __date_range && date <= win.date

prepare_clause

Overview

This command allows the user to "prepare" additional columns whose values are calculated as aggregates across rows. For example, a running sum needs to consider not just the current row, but all rows before the current row. This can be accomplished using the STACK() method inside a PREPARE clause (see example below).

  • ORDER BY: In many cases, the ordering of rows in the input table is important, especially for running sums. The ORDER BY clause is useful to guarantee the correct ordering.
  • PARTITION BY: Used to perform many aggregates on a single table, each aggregate considering only a subset of the table's rows based on some grouping criteria.
PREPARE prep_expr [, prep_expr ...]  FROM tblname   
  [ORDER BY expr [ASC|DESC] [, expr [ASC|DESC] ...]]
  [PARTITION BY column_expr [, column_expr ...]

tblname - string name of the table to alter the schema of

Example

This example will produce a running sum of the qty column order by extime on a per-account basis

CREATE TABLE
  final_table AS
PREPARE account, qty, extime, STACK(qty) as acc_qty FROM table ORDER BY extime
PARTITION BY account

Data Modification

insert_values_clause

Overview

This clause enables the insertion of raw values into an existing table.  Many rows may be inserted in a single call. Each values_expr represents a single row of data to be inserted. To insert multiple rows append additional values_expr

In order to specify columns in which to insert values, provide an ordered list of column names after the table name.  The same number of values must be supplied in the values_expr and in the same order.  All other columns will default to null.

If a list of column names is not provided, all columns will be populated. Each values_expr should contain one value for each column in the target table. Values should be in the same natural order as the columns in the target table.

The forloop_expr allows for the insertion of values_expr to be repeated for each cycle of the for loop. The values can reference the for loop's declared variable. Multiple for loops can be nested inside one another.

Syntax

INSERT [INTO]  tblname [(col1[, col2 ...])] [forloop_expr ...] VALUES values_expr[, values_expr ...] [LIMIT maxinsert]

tblname - string name of the table to insert into

colN - string name of the column to contain the Nth value from values_expr

Examples

INSERT INTO rectangles (area, perimeter) VALUES (20,18);
int length=10,width=5;
INSERT INTO rectangles (area, perimeter) VALUES ((length * width), (2 * length + 2 * width));

insert_from_clause

Overview

This clause causes the insertion of the result of a projecting-clause (select_clause, analyze_clause, prepare_clause, execute_clause) into an existing table.  Each row from the result set of the select_clause will result in the insertion of a new record into the target table. Please note, multiple tables may be inserted in a single call using the UNION clause.

In order to specify columns in which to insert values, provide an ordered list of column names after the table name.  The result set from the preceding select_ must have the same number of columns. The columns should be in the same order.  All other columns will default to null.

If a list of column names is not provided, all columns will be populated. Each values_expr should contain one value for each column in the target table. Values should be in the same natural order as the columns in the target table.

If the BYNAME keyword is included, then source and target columns are aligned based on column name. Column Names in the target table that do not existing in the query clause will be populated with null

Syntax

INSERT [INTO]  tblname [(col1[, col2 ...])] [BYNAME] [FROM]  select_clause
INSERT [INTO]  tblname [(col1[, col2 ...])] [BYNAME] [FROM]  analyze_clause
INSERT [INTO]  tblname [(col1[, col2 ...])] [BYNAME] [FROM]  prepare_clause
INSERT [INTO]  tblname [(col1[, col2 ...])] [BYNAME] [FROM]  execute_clause
INSERT [INTO]  tblname [(col1[, col2 ...])] [BYNAME] [FROM]  describe_clause
INSERT [INTO]  tblname [(col1[, col2 ...])] [BYNAME] [FROM]  show_clause
INSERT [INTO]  tblname [(col1[, col2 ...])] [BYNAME] [FROM]  call_clause

tblname - string name of the table to insert into

colN - string name of the column to contain the Nth value from results of select clause

Example

INSERT INTO employee_summary (age, birthplace) FROM SELECT age, birthplace FROM employee

sync_clause

Overview

This clause causes the target table to reflect the result of a projecting-clause (select_clause, analyze_clause, prepare_clause, execute_clause), meaning that rows will be inserted, deleted and updated accordingly on the target table such that the target table will properly mirror the result of the projecting-clause.

In otherwords, the two examples will result in the same final state for the my_target table, but from a trigger stand point they are significantly different:

//Example 1: using a traditional clear-and-rebuild technique:
 TRUNCATE my_target; INSERT INTO my_target FROM SELECT * from my_source;
//Example 2: using the sync method:
 SYNC my_target FROM SELECT * from my_source;

In the first example, the truncate instruction will cause a delete trigger to be fired for every row originally in the my_target table and then the subsequent insert instruction will cause an insert trigger to be fired for every row inserted into the my_target table.  

The following rules apply for how the target table is mutated:

  • Records in the projecting-clause that do not have a matching record in the target table are inserted into the target table
  • Records in the target table that do not have a matching record in the projecting-clause are deleted from the target table
  • Records in the target table that have a "perfect" matching record in the projecting-clause remain untouched. (no triggers fired
  • Records in the target table that have a "non-perfect" matching record in the projecting-clause are updated.

An ON clause can be used to tell the sync tool how to determine if a source and target row are considered equivalent.  When an equivalent row between the target table and projecting-clause is identified, this will result in an update.  Rows that don't have a match will result in a delete and/or insert.

Matching Rules:

  • If  all cells between two rows are equivalent, that is considered a "Perfect match"
  • If an ON clause is specified: Two row are considered a "non-perfect" match if all of the cells specified in the ON clause are a match, but other cells are not.  (If an ON clause is not specified then there can not be non-perfect matches)

In order to specify columns in which to sync values, provide an ordered list of column names after the table name (note that only these columns will be considered for the perfect-matching criteria).  The result set from the preceding select_clause must have the same number of columns. The columns should be in the same order.  All other columns will default to null for inserts and will be left unaltered for updates.

If a list of column names is not provided, all columns will be populated. Each values_expr should contain one value for each column in the target table. Values should be in the same natural order as the columns in the target table.

Syntax

SYNC [INTO]  tblname [(col1[, col2 ...])] [ON(onCol1,[,onCol2])] [FROM]  select_clause
SYNC [INTO]  tblname [(col1[, col2 ...])] [ON(onCol1,[,onCol2])] [FROM]  analyze_clause
SYNC [INTO]  tblname [(col1[, col2 ...])] [ON(onCol1,[,onCol2])] [FROM]  prepare_clause
SYNC [INTO]  tblname [(col1[, col2 ...])] [ON(onCol1,[,onCol2])] [FROM]  execute_clause

tblname - string name of the table to insert into

colN - string name of the column to contain the Nth value from results of select clause

onColN - string name of the column to be used in determining if two records are equivalent

Examples

SYNC accounts USE ds=my_datasource EXECUTE select * from Accounts;
SYNC new_employees SELECT * FROM old_employees;

delete_from_clause

Overview

This clause enables the deletion of specific rows or all rows from a table.

To delete only specific rows, provide a WHERE clause which must evaluate to a Boolean value where true indicates that the row should be deleted. Otherwise, to delete all rows, do not provide a WHERE clause.

You can specify the maximum number of rows to delete using the LIMIT clause (maxdelete). Additionally, you may provide a number of rows to skip when deleting (skipdelete). Note the limit is applied after the  where clause, meaning that the skipdelete indicates how many records to skip deleting that would otherwise be deleted based on the where clause.

The JOIN .. ON clause allows you to use the values from another table to determine which rows to delete from the target table. The JOIN operates similar to the SELECT's JOIN clause. Note that LEFT JOIN and LEFT ONLY are also supported. Keep in mind that you can only delete rows from the left table.

Syntax

DELETE FROM tblname [WHERE where_expr] [LIMIT [skipdelete,] maxdelete]
DELETE FROM tblname [LEFT [ONLY] JOIN table_expr ON expr ]  [WHERE where_expr] [LIMIT [skipdelete,] maxdelete]
DELETE FROM tblname [LEFT [ONLY] JOIN table_expr ON expr ]  [WHERE where_expr] [LIMIT [skipdelete,] maxdelete]

tblname - string name of the table to delete from

Examples

DELETE FROM names; //deletes all records from names table                                                             
DELETE FROM names WHERE first_name == "Dave"; //deletes all records where name is Dave
DELETE FROM names WHERE first_name == "Dave" LIMIT 2; //delete 1st & 2nd record where name is Dave
DELETE FROM names WHERE first_name == "Dave" LIMIT 2,1; //deletes 3rd record where name is Dave
DELETE FROM names JOIN users ON first_name == users.name; //deletes all
records from names where the first_name has a match with the user's table
DELETE FROM names LEFT ONLY JOIN users ON first_name == users; //deletes all
records from names where the first_name does NOT have a match with the
user's table

truncate_clause

Overview

This command deletes all rows from a table. You can specify multiple tables, separated by commas.

Syntax

TRUNCATE [TABLE]  tblname [, tblname ...] 

Examples

TRUNCATE TABLE Accounts;
TRUNCATE TABLE Orders, Executions, Alerts;

update_clause

Overview

This clause can be used to update specific rows or all rows in a table. Following the SET statement, provide an assignment (target=expr) clause for each column to be updated, where target is the name of an existing column.

To update only specific rows, provide a WHERE clause which evaluates to a Boolean value, where true indicates that the row should be updated. Otherwise, to update all rows, do not provide a WHERE clause.

You can specify the maximum number of rows to update using the LIMIT clause (maxupdate). Additionally, you may provide a number of rows to skip when updating (skipupdate). Note the limit is applied after the  where clause, meaning that the skipupdate indicates how many records to skip updating that would otherwise be updated based on the where clause.

The JOIN .. ON clause allows you to use the values from another table to update values in the target table. The JOIN operates similar to the SELECT's JOIN clause. Note that LEFT JOIN and LEFT ONLY are also supported. Keep in mind that you can only update values on the left table.

Syntax

UPDATE tblname  SET col1=expr [, col2=expr...] [WHERE where_expr] [LIMIT [skipupdate,] maxupdate]
UPDATE tblname [LEFT [ONLY] JOIN table_expr ON expr ] SET col1=expr [, col2=expr...] [WHERE where_expr] [LIMIT [skipupdate,] maxupdate]
UPDATE tblname [LEFT [ONLY] JOIN table_expr ON expr ] SET col1=expr [, col2=expr...][WHERE where_expr] [LIMIT [skipupdate,] maxupdate]

tblname - string name of the table to update

colN - string name of the column to assign value from expression

Examples

UPDATE groceries SET price = 10, status=null; //update all records setting price to 10 and status  to null
UPDATE groceries SET price = price + 1 WHERE type == "dairy" //increment price for all records where type is dairy
UPDATE groceries SET price = price + 1 WHERE type == "dairy" LIMIT 10; //increments the price for the first 10 records where type is dairy 
UPDATE groceries SET price = price + 1 WHERE type == "dairy" LIMIT 3,2 ; //increments the price for the 4th and 5th  record where type is dairy 
UPDATE flags set a1=a2,a2=a1 where a1<a2; //Swap the values in the a1 and a2 columns where a1 is less than a2
UPDATE groceries JOIN products ON groceries.id==products.grocery_id SET
price=products.price WHERE type == "dairy"; // Sets the price from the
products table onto the groceries table where the ids match.

Schema Modification

create_table_clause

Overview

This command is used to create a new, empty (no rows) table with a specified set of columns (names and types).  If another table already exists with the <span style="color: blue;">tblname</span>, the existing table will be implicitly dropped and replaced with the newly created table.  

At least one column must be specified, and each column must have a unique name. 

Syntax

CREATE TABLE [IF NOT EXISTS] tblname (col1 type_expr  [, col2 type_expr ...])

tblname - string name of the table to create

colN - string name of the Nth column to create

Example

CREATE TABLE countries (country String, language String, GDP Double)

create_table_as_clause

Overview

This command is used to create a new table which has the same schema definition as the result set of a select_clause.  If another table already exists with the tblname, the existing table will be implicitly dropped and replaced with the newly created table.

All rows from the select_clause's result set will be inserted into the table.     

Syntax

CREATE TABLE [IF NOT EXISTS] tblname[(col1 column_type  [, col2 column_type ...])] AS select_clause
CREATE TABLE [IF NOT EXISTS] tblname[(col1 column_type  [, col2 column_type ...])] AS analyze_clause
CREATE TABLE [IF NOT EXISTS] tblname[(col1 column_type  [, col2 column_type ...])] AS prepare_clause
CREATE TABLE [IF NOT EXISTS] tblname[, tblname ...]  AS execute_clause
CREATE TABLE [IF NOT EXISTS] tblname[(col1 column_type  [, col2 column_type ...])] AS describe_clause
CREATE TABLE [IF NOT EXISTS] tblname[(col1 column_type  [, col2 column_type ...])] AS show_clause
CREATE TABLE [IF NOT EXISTS] tblname[(col1 column_type  [, col2 column_type ...])] AS call_clause

tblname - string name of the table to create

Example

CREATE TABLE names AS SELECT first_name, last_name FROM employees

alter_table_clause

Overview

This command alters the schema (column names and types) of an existing table.  You can add columns (ADD), rename columns (RENAME ... TO), delete columns (DROP), and change the type of columns (MODIFY).  Note, when multiple alterations are supplied, they are evaluated left to right. When changing types, a best effort cast will be applied.

When adding columns, the new cells will default to null. The column will be added to the end (right-most) of the table.

Syntax

ALTER TABLE tblname  alter_col_expr  [, alter_col_expr  ...]

tblname - string name of the table to alter

Examples

ALTER TABLE AccountHoldings ADD UserId long, RENAME AccountId TO Account_Id, DROP UserInfo
ALTER TABLE AccountHoldings MODIFY UserId AS UserId INT
ALTER TABLE AccountHoldings RENAME UserId TO UserIds, ADD UserId STRING

rename_table_clause

Overview

This command renames an existing table. If another table already exists with the newname, the existing table will be implicitly dropped and replaced with the renamed table.

Syntax

RENAME TABLE tblname  TO newname

tblname - string name of the table to rename

newname - string name of the new name for the table

Example

RENAME TABLE AccountHoldings TO Accounts

drop_table_clause

Overview

This command drops an existing table, along with its schema and all rows. You can specify multiple tables, separated by commas.

Syntax

DROP TABLE [IF EXISTS] tblname [, tblname ...] 

tblname - string name of the table to drop

Example

DROP TABLE Accounts

AMIDB Scope Options

TEMPORARY

A table visible only to the user. It exists as long as the user session is active.

VARIABLE

A table created as an AMI Script Table object. It has the same scope and lifetime as a TEMPORARY table.

Table vt = SELECT * FROM Tablename;

PUBLIC

A table which is visible to all users. Its definition (but not the content) persists.

For persistent content see PersistEngine options.

If the scope is not specified, AMI will first check TEMPORARY tables, then VARIABLE tables, and lastly PUBLIC tables. Tables with different scopes but the same name can exist. However, if a PUBLIC table already exists, then it cannot be created again.

You may look at the DefinedBy column to see if a column is a variable or temporary.

Note: Argument TEMP can also be used instead of TEMPORARY with the same effect

Display

SHOW [TEMPORARY|PUBLIC|VARIABLE] TABLES;

shows scoped tables

SHOW [TEMPORARY|PUBLIC|VARIABLE] COLUMNS;

shows columns of scoped tables

SHOW [TEMPORARY|PUBLIC|VARIABLE] TABLE TableName; 

shows column details of a scoped table

DESCRIBE [TEMPORARY|PUBLIC|VARIABLE] Tablename;

provides the command which can be used to create the scoped table

Schema Modification

CREATE [TEMPORARY|PUBLIC] TABLE Tablename ...;

creates a scoped table

Examples

CREATE TABLE TempTable1 (a int);
CREATE TEMP TABLE TempTable2 (a int);
CREATE TEMPORARY TABLE TempTable3 (a int);
CREATE PUBLIC TABLE PubTable1 (x int);
CREATE PUBLIC TABLE PubTable2 (x int, y double);

ALTER [TEMPORARY|PUBLIC|VARIABLE] TABLE Tablename ...;

modifies a scoped table

Examples

ALTER TEMPORARY TABLE TempTable1 ADD id String;
ALTER TEMPORARY TABLE TempTable2 MODIFY a as a BigInteger;
ALTER TEMPORARY TABLE TempTable3 RENAME a to b;
ALTER PUBLIC TABLE PubTable2 DROP y;

DROP [TEMPORARY|PUBLIC|VARIABLE] TABLE Tablename;

deletes a scoped table

Examples

DROP TEMPORARY TABLE TempTable3;

DROP PUBLIC TABLE PubTable2;

RENAME  [TEMPORARY|PUBLIC|VARIABLE] TABLE Tablename TO NewTablename;

renames a scoped table

Examples

RENAME TEMPORARY TABLE TempTable2 TO TempTableTwo;
RENAME PUBLIC TABLE PubTable1 TO PubTable;

(All schema modification syntax options apply as in Schema Modification)

Data Modification

INSERT INTO [TEMPORARY|PUBLIC|VARIABLE] Tablename ...;

inserts row into scoped table

Examples

INSERT INTO TEMPORARY TempTable1 VALUES (1, "ID1"), (2, "ID2"), (3, "ID3");
INSERT INTO PUBLIC PubTable VALUES (4), (5);

PREPARE ... FROM  [TEMPORARY|PUBLIC|VARIABLE] ... Tablename ...;

prepares additional columns on scoped table

ANALYZE ... FROM  [TEMPORARY|PUBLIC|VARIABLE] ... Tablename;

runs analysis on scoped table

Examples

ANALYZE COUNT(win.a) as Count FROM TEMPORARY TempTable1 WINDOW win ON true;
ANALYZE SUM(win.x) as Sum FROM PUBLIC PubTable WINDOW win ON true;

SYNC [TEMPORARY|PUBLIC|VARIABLE] Tablename FROM ... ProjectingClause;

syncs scoped table to ProjectingClause result

Example

SYNC TEMPORARY TempTableTwo FROM select * from PubTable;

TRUNCATE[TEMPORARY|PUBLIC|VARIABLE][TABLE] Tablename;

deletes all rows from scoped table

(All data modification syntax options apply as in Data Modification)

Query Clauses

SELECT ... FROM  [TEMPORARY|PUBLIC|VARIABLE] Tablename ...[AS Asname]  ...

queries scoped table

Examples

SELECT * FROM TEMPORARY TempTable1 WHERE id == "ID2";
SELECT * FROM PUBLIC PubTable;

UPDATE [TEMPORARY|PUBLIC|VARIABLE] Tablename [AS Asname] ...[JOIN_CLAUSE   [TEMPORARY|PUBLIC|VARIABLE] Tablename 2 [AS Asname2] ] ...

updates scoped table

Examples

UPDATE TEMPORARY TempTable1 set id = "NewID";
UPDATE PUBLIC PubTable set x = -1;

DELETE FROM [TEMPORARY|PUBLIC|VARIABLE] Tablename [AS Asname] [JOIN_CLAUSE  [TEMPORARY|PUBLIC|VARIABLE]  Tablename2 [AS Asname2] ] ...

deletes data from scoped table

Examples

DELETE FROM TEMPORARY TempTable1 WHERE a == 3;
DELETE FROM PUBLIC PubTable;

(All query clauses syntax options apply as in Query Clauses)

Assigning query results to local variables

AMI Script supports the ability to execute queries and store the results into local variables. There are various ways the query results can be stored, beyond just Table objects, see type Coercion rules below.

This is the general form:

type var_name = query;

type: the type of local variable to store the query result.

Var_name: Name of the local

Query: a valid sql query. Either select_clause, analyze_clause, or prepare_clause

Valid Type Coercion rules are listed below.  Please note that an attempt to coerce the query result to an invalid type, will result in null.

Table mytable = SELECT * from SampleTable;

List:  Query results that have a single column can be stored to a List. Each value in the query result will be inserted into the list in order.

List myvals = select SampleColumn from SampleTable;

Set: Query results that have a single column can be stored to a Set. Each value in the query result will be inserted into the set in order, ignoring duplicate values (as per set's contract).   

Set myvals = select SampleColumn from SampleTable;

Map: Query results that have a single row can be stored to a map. The map's keys will be the columns' names, and the map's values will be the values of the first (and only) row.

Map myvals = select * from SampleTable limit 1;

Primitives: Queries that result in a single value (one row and one column) can be stored to a primitive variable.  The value of the single cell is what will be stored.

Long myCount = select count(*) from SampleTable;

String myName = select max(name) from SampleTable;

AMI-SQL Glossary of Nested Expressions

alter_col_expr

MODIFY colname AS newname type_expr|ADD newname type_expr [BEFORE beforecol]|DROP colname|RENAME colname TO newname

colname: String name of the column to alter

newname: String name of the column to create

beforecol: String name of the existing column to add the new column before

column_expr

expr [AS colname]|[tablename.]* [EXCEPT (colname [,colname,...])]

colname: name of the column that will contain results of expression

*: expands to available columns for all tables or specified table

expr

See "Standard Operators" section.

forloop_expr

FOR varname = start_const TO end_const [STEP step]

varname: variable that can be referenced by expresions within the for loop

start_const: initial value of varname, inclusive

end_const: last number to loop over, inclusive unless the step would cause it to be skipped.

step: positive number dictating size to increment or decrement varname by in each loop cycle.

A negative or zero step will cause an error. If start_const <= end_const then step is used to increment, otherwise decrement.

in_expr

When only a single column will participate in the in clause: 

expr IN  (select_clause)
//(select must return a single column)
expr IN  values_expr
//(each entry must be a single value)

When two or more columns will participate in the in clause:

(expr,[,expr2 ...]) [NOT] IN (select_clause)
//(select must return same number of columns)
(expr,[,expr2 ...]) [NOT] IN (values_expr[,values_expr...])
//(values_expr entry count must match expr count)

The IN clause evaluates to true for a given row if the result of expr from the left hand expression exists in the right hand expression.  If using multiple columns, the combination of the expr must match the combination of the values_expr or select_clause for a given row.

lambda_expr

{statements} [AS colname]

The lambda expression allows for contained AMI Script to be executed once per row returned by the containing select statement. colname is the name of the column that the return value of this lambda will be placed in.

nested_select_expr

(select_clause) [AS aliasname]

Select_clause: inner query to execute

aliasname: name of alias for use in column_expr and expr

table_expr

tlbname [AS aliasname]

tblname: name of table to include in select

aliasname: name of alias for use in column_expr and expr

type_expr

LONG|INT|CHAR|BYTE|SHORT|DOUBLE|FLOAT|UTC|UTCN|BOOLEAN|STRING

values_expr

(val [,val2...])

where_expr

expr|in_expr [AND|OR where_expr]

Expressions with the added IN clause (in_expr) funtionality

window_expr

WINDOW window_name ON windowing_expr [ORDER BY order_expr]  [PARTITION BY partitioning_expr]

window_name: The name of this window which is used in the preceding col_expr (in the analyze clause). For example, if the name of a window is "dates" then max(dates.open) would give you the max value of the open column for that window.

windowing_expr: The boolean expression that defines (for each row) the other rows that belong to the same window.

order_expr: Optional expression which determines the natural order of rows for the window.  This is important for order dependent aggregates such as first(...) and last(...)

partitioning_expr: Optional expression which deterines how the table is broken up before windowing is considered. In other words, two rows that evaluate to a different partitioning_expr will never belong to the same window.

prep_expr

stack(expr) running sum of expr from first to row n, inclusive
count(expr) running count where expr!=null to the current row, inclusive
norm(expr) similar to stack, but scaled so that the max value is one. Short hand for: stack(expr)/sum(expr). See dnorm(...)
dnorm(expr) similar to stack, but scaled so that the min value is zero. Short hand for: (stack(expr) - expr)/sum(expr). This, in conjunction with norm(...) can used for preparing area charts that should fill an area, such as a pie chart.
rank(expr) will evaluate expr for all rows and the lowest value will get 1, 2nd will get 2, etc. If multiple rows expr evaluate to the same value, they will get the same ranking.
urank(expr) will evaluate expr for all rows and the lowest value will get 1, 2nd will get 2, etc. If multiple rows expr evaluate to the same value, the first occurrence gets the lower value.
unique(expr) each unique evaluation of expr will return a unique integer, starting at 1. Repeat occurrences of expr will return the same integer value.
offset(expr,offset) Will grab the value offset from the current row. If offset==0 use current row, negative number is nth prior row, positive number is nth future row. Evaluates to null if offset is out of bounds for current table.

Ex: offset(price,-1) for first row returns null, for second row returns first row's price and for nth row returns (n-1)ths row's price

kmeansCluster(expr1,expr2,...,nClusters,seed) Clusters data using the k-means clustering algorithm. Takes one or more expressions, followed by int nClusters specifying the number of clusters and long seed specifying the random seed for initializing the model parameters. The seed argument may be set to null if no specific seed is desired. Returns a column of integers labeling each record according to its learned cluster. Input expressions must evaluate to numbers.
emGmmCluster(expr1,expr2,...,nClusters,seed) Same functionality as kmeansCluster(), but instead uses the expectation-maximization algorithm to train a Gaussian mixture model.
viGmmCluster(expr1,expr2,...,nClusters,seed) Same functionality as kmeansCluster(), but instead uses the variational inference algorithm to train a Gaussian mixture model.

prep_expr

sum(expr) Summation of all values, skips nulls
min(expr) Minimum value, skips nulls
max(expr) Maximum value, skips nulls
count(expr) Count of non-null values
countUnique(expr) Count of unique non-null values
cat(expr, delim, limit) Concatenates the first limit non-null values to a string with delim. Ex: cat(a,'/',2) selects the first 2 non-null values from column a and appends '/' at the end of each value.
avg(expr) Average of all non-null values
var(expr) Population Variance of all non-null values
varS(expr) Sample Variance of all non-null values
stdev(expr) Population Standard Deviation of all non-null values
stdevS(expr) Sample Standard Deviation of all non-null values
first(expr) First value, may be null if first value is null
last(expr) Last value, may be null if last value is null
covar(expr1, expr2) Population  Covariance of 1st expr and 2nd expr where both aren't null.
covarS(expr1, expr2) Sample  Covariance of 1st expr and 2nd expr where both aren't null.
cor(expr1, expr2) Correlation of 1st expr and 2nd expr where both aren't null. Returns 0 - 1
beta(expr1, expr2) Beta of 1st expr and 2nd expr where both are not null. The beta function is defined as beta(x, y) = covar(x, y) / var(y). If stdev(y) = 0, the expression returns NaN.
avgExp(expr, decay, desc) The exponentially weighted average of non-null values with a specified decay rate. Desc of true/false means first/last row are highest weighted.
avgGauss(expr, variance) The Gaussian weighted average of non-null values, such that the middle value has highest weight.
median(expr) The median of all non-null values.
percentileCont(expr) eturns the value at the percentile provided. Is interpolated and might not be equal to any value within the supplied dataset
percentileDisc(expr) Returns the value nearest to the percentile provided