Difference between revisions of "AMI SQL"

From 3forge Documentation
Jump to navigation Jump to search
Tag: visualeditor-switched
Line 346: Line 346:
 
[[File:AMISQL.QueryClauses.OUTERJOIN.png|30x30px]]
 
[[File:AMISQL.QueryClauses.OUTERJOIN.png|30x30px]]
 
|OUTER ONLY JOIN
 
|OUTER ONLY JOIN
 +
[[File:AMISQL.QueryClauses.OUTERJOINONLY.png|30x30px]]
 
|LEFT JOIN
 
|LEFT JOIN
 +
[[File:AMISQL.QueryClauses.LEFTJOIN.png|30x30px]]
 
|LEFT ONLY JOIN
 
|LEFT ONLY JOIN
 +
[[File:AMISQL.QueryClauses.LEFTONLYJOIN.png|30x30px]]
 
|RIGHT JOIN
 
|RIGHT JOIN
 +
[[File:AMISQL.QueryClauses.RIGHTJOIN.png|30x30px]]
 
|RIGHT ONLY JOIN
 
|RIGHT ONLY JOIN
 +
[[File:AMISQL.QueryClauses.RIGHTONLYJOIN.png|30x30px]]
 
|-
 
|-
|Include Matches?
+
|<span style="color: green;">Include Matches?</span>
|YES
+
|<span style="color: green;">YES</span>
|YES
+
|<span style="color: green;">YES</span>
 
|
 
|
|YES
+
|<span style="color: green;">YES</span>
 
|
 
|
 
|
 
|
 
|
 
|
 
|-
 
|-
|Includes Left W/O a Match on Right
+
|<span style="color: blue;">Includes Left W/O a Match on Right</span>
 
|
 
|
|YES
+
|<span style="color: blue;">YES</span>
|YES
+
|<span style="color: blue;">YES</span>
|YES
+
|<span style="color: blue;">YES</span>
|YES
+
|<span style="color: blue;">YES</span>
 
|
 
|
 
|
 
|
 
|-
 
|-
|Includes Right W/O a Match on Left
+
|<span style="color: orange;">Includes Right W/O a Match on Left</span>
 
|
 
|
|YES
+
|<span style="color: orange;">YES</span>
|YES
+
|<span style="color: orange;">YES</span>
 
|
 
|
 
|
 
|
|YES
+
|<span style="color: orange;">YES</span>
|YES
+
|<span style="color: orange;">YES</span>
 
|}
 
|}

Revision as of 18:29, 22 March 2021

Goals

The 3Forge SQL language provides a comprehensive instruction set for working with table(s) of data.

  • Retrieval - Accessing and Normalizing data from external sources
  • Blending - Perform Joins and Unions on tables
  • Modifying - Inserting, Updating, and Deleting rows
  • Analyzing - Grouping, Sorting, and Filtering
  • Schema Definition - Creating, Updating, and Dropping tables

Broadly speaking, the logic steps involved in data-visualization are:

1. Running a query, or set of queries, on external datasources and/or the AMI-Realtime database.

  • Each datasource is predefined and uniquely named.
  • The results of queries on datasources will result in temporary, in-memory tables
  • The queries can be constructed using user-input/user-selected/user-associated data
  • In more advanced cases, the queries can be sequential, meaning the results from one query are then fed into a subsequent query (allowing for reactionary data blending)

2. Blending the results from Step 1, resulting in new temporary in-memory tables 3. Building visualizations on the tables from Step 1 and/or Step 2

AMI SQL and Standard SQL Differences

For those familiar with standard SQL, these are the differences:

1. Equality is done with ==

  • AMISQL: SELECT * FROM Users WHERE userid == 15;
  • MYSQL: SELECT * FROM Users WHERE userid = 15;

2. Checking for null is done with equality, ex. == null

  • AMISQL: SELECT * FROM Users WHERE status == null;
  • MYSQL: SELECT * FROM Users WHERE status IS NULL;

3. Strings are surrounded in double quotes

  • AMISQL: SELECT * FROM Users WHERE name == "David";
  • MYSQL: SELECT * FROM Users WHERE name = "David";

4. "Matches" (~~) instead of LIKE

  • AMISQL: SELECT * FROM Users WHERE name ~~ "^Dav";
  •  MYSQL: SELECT * FROM Users WHERE name LIKE "Dav%";

 (See Simplified Text Matching for pattern matching rules/expressions)

5. Boolean expressions && works same as AND.  || works same as OR

  • AMISQL: SELECT * FROM Users WHERE userid == 15 AND status == null && name == "David";
  • MYSQL: SELECT * FROM Users WHERE userid = 15 AND status IS NULL && name = "David";

6. By default all tables are TEMPORARY, use the PUBLIC key word to make non-temporary tables:

  • AMISQL: CREATE TABLE MyTemporaryTable(id String);
  • AMISQL: CREATE PUBLIC TABLE MyPermanantTable(id String);

Data Source Access Clauses

use...execute_clause

Overview

This command enables you to "execute" code on an external datasource and process/store the results locally

  • USE: While datamodels have default options, you can override them for a particular query.  Valid options include
    • ds = "datasource_name"
      • The name of the datasource to execute this script on
    • timeout = timeout_in_millis
      • The number of milliseconds before the execute command times out
    • limit = row_limit_to_return
      • The max number of records returned
    • ds_url = "url"
      • Overrides the url of the datasource
      • When used in conjunction with ds="datasource_name", supplied datasource must contain URL in the PermittedOverrides field
    • ds_username = "username"
      • Overrides the username of the datasource
      • When used in conjunction with ds="datasource_name", supplied datasource must contain USERNAME in the PermittedOverrides field
    • ds_password = "plain_text_password"
      • Overrides the password of the datasource using a plain text password
      • When used in conjunction with ds="datasource_name", supplied datasource must contain PASSWORD in the PermittedOverrides field
      • Mutually exclusive with the ds_password_enc option
    • ds_password_enc = "aes_encrypted_password"
      • Overrides the password of the datasource, using an encrypted password
      • Use strEncrypt(...) to create an encrypted password
      • When used in conjunction with ds="datasource_name", supplied datasource must contain PASSWORD in the PermittedOverrides field
      • Mutually exclusive with the ds_password option
    • ds_options = "some options"
      • Override the options of the datasource
      • When used in conjunction with ds="datasource_name", supplied datasource must contain OPTIONS in the PermittedOverrides field
    • ds_relay = "relayid"
      • Override the relay of the datasource
      • Run SHOW RELAYS command and inspect RelayId for available relays
      • When used in conjunction with ds="datasource_name", supplied datasource must contain RELAY in the PermittedOverrides field
    • ds_adapter = "SOME_ADAPTER"
      • Use an "anonymous" datasource of the specified adapter type
      • Run SHOW DATASOURCE_TYPES command and inspect the I column for available adapters
      • Mutually exclusive with ds="datasource_name" option
      • See ami.db.anonymous.datasources.enabled option for permission details
    • _custom=some_directive
      • Prefixing with an underbar (_) passes the option through to the datasource's adapter
  • EXECUTE: The literal text to execute. The text will be sent directly to the datasource for execution. Therefore, variables cannot be referenced. Instead, substitution can be used. See dollarsign_substitution

Syntax

[USE option = value  [  option = value  ...]]  EXECUTE code_to_execute

Example

CREATE TABLE

 final_table AS

USE

 ds="mydb"

 timeout=10000

 limit=100

EXECUTE

 select * from mytable;

use...show tables clause

Overview

This command enables you to see what tables are available on a given datasource (similar to what the AMI gui wizard shows you when selecting a table).

  • USE: While datamodels have default options, you can override them for a particular query.  Valid options include
    • ds = "datasource_name"             The name of the datasource to execute the script on

The resulting table will be called TABLE and contain two String columns: TableName and Schema

Syntax                                                                                                  

USE ds="mydb" SHOW TABLES;

Example

This example will return a table displaying all tables available in the mydb datasource. If the datasource supports schema names, that will be populated as well.

USE ds="mydb" SHOW TABLES;

//returns:

AMISQL.DSAccessTable.jpg

use...insert clause

Overview

This command uploads data from AMI into a datasource.

  • USE: While datamodels have default options, you can override them for a particular query.  Valid options include
    • ds = "datasource_name"             The name of the datasource to execute the script on

The resulting table will be called TABLE and contain two String columns: TableName and Schema

Syntax

USE ds="datasource_name"  INSERT [INTO] datasource_table  [(col1[, col2 ...])] select_clause
USE ds="datasource_name"  INSERT [INTO] datasource_table  [(col1[, col2 ...])] analyze_clause 
USE ds="datasource_name"  INSERT [INTO] datasource_table  [(col1[, col2 ...])] prepare_clause 
USE ds="datasource_name"  INSERT [INTO] datasource_table  [(col1[, col2 ...])] execute_clause
USE ds="datasource_name"  INSERT [INTO] datasource_table  [(col1[, col2 ...])] VALUES values_expr[, values_expr ...]
USE ds="datasource_name"  INSERT [INTO] datasource_table  [(col1[, col2 ...])] Use_execute_clause

datasource_name - string name of the datasource containing the table to insert values into

datasource_table - name of the table to insert values into

Examples

This example will insert two rows into the mydb datasource's users table:

USE ds="mydb" INSERT INTO users(name,city,state) values ("Rob","NYC","NY"),("Sara","OC","CA");

This example copy all rows from AMI's my_users table into the "mydb" datasource's users table:

USE ds="mydb" INSERT INTO users(name,city,state) SELECT Name,Local,Region FROM my_users;

This example copy all rows from the old_users table in the otherds datasource to the users table in the mydb datasource.

USE ds="mydb" INSERT INTO users(name,city,state) USE ds="otherds" EXECUTE SELECT * FROM old_users

concurrent_block (advanced)

Overview

When running analysis that needs to query many datasources, there can be large performance gains by executing these queries simultaneously. The CONCURRENT{...} key word causes all EXECUTE clauses within the block to be run in parallel (instead of sequentially).

In both of the examples below, we are going to execute three queries on three separate datasources (mydb1, mydb2, mydb3). The first example executes the queries sequentially and the second example executes the queries concurrently.

Assuming each query takes 2 seconds for a database to process, the first example will complete in about 6 seconds (2+2+2), while the second, concurrent, example will complete in only 2 seconds:

Example of Sequential Execution (Default)

{
  CREATE TABLE mytable1 AS USE ds="mydb1" EXECUTE select * from mytable;
  CREATE TABLE mytable2 AS USE ds="mydb2" EXECUTE select * from mytable;
  CREATE TABLE mytable3 AS USE ds="mydb3" EXECUTE select * from mytable;
}

Here is the sequence of events:

00:00:00 -  EXECUTE clause is sent to mydb1

00:00:00 -  AMI Script sleeps until there is a response from mydb1...


00:00:02 - mytable1 is created (from result of query to mydb1)

00:00:02 -  EXECUTE clause is sent to mydb2

00:00:02 -  AMI Script sleeps until there is a response from mydb2...


00:00:04 - mytable2 is created (from result of query to mydb2)

00:00:04 -  EXECUTE clause is sent to mydb3

00:00:04 -  AMI Script sleeps until there is a response from mydb3...


00:00:06 - mytable3 is created (from result of query to mydb3)

00:00:06 -  Block is complete after 6 seconds (code after the block will be executed)

Example of Concurrent Execution

So now let's add the CONCURRENT keyword before the block.  In this case, all 3 executes will be initiated at the same time.

CONCURRENT{
  CREATE TABLE mytable1 AS USE ds="mydb1" EXECUTE select * from mytable;
  CREATE TABLE mytable2 AS USE ds="mydb2" EXECUTE select * from mytable;
  CREATE TABLE mytable3 AS USE ds="mydb3" EXECUTE select * from mytable;
}

Here is the sequence of events:

00:00:00 - EXECUTE clause is sent to mydb1

00:00:00 - EXECUTE clause is sent to mydb2

00:00:00 - EXECUTE clause is sent to mydb3

00:00:00 - AMI Script sleeps until there is a response from all 3 queries...


00:00:02 - mytable1 is created (from result of query to mydb1)

00:00:02 - mytable2 is created (from result of query to mydb2)

00:00:02 - mytable3 is created (from result of query to mydb3)

00:00:02 -  Block is complete after 2 seconds (code after the block will be executed)

Query Clauses

select_clause

Overview

This clause allows the combination of Filtering (WHERE), Sorting (ORDER BY), Aggregating (GROUP BY), Projecting (SELECT …) and Cropping (LIMIT) of individual tables.  Next, the results may be combined via Unions (UNION) and/or Joins (FROM … JOIN ON). Complex Strings containing delimited data can be split into distinct rows via the UNPACK operation. Finally, see the create_table_as_clause for storing the final result of a select clause into another table.

  • To aggregate data, use the GROUP BY clause to identify any number of columns or derived values that should be used to group rows. Provide an alias (AS) for derived values to be used within the select clause. The select columns can only use aggregate functions (min, max, count, sum) on underlying rows and columns declared in the group-by clause
  • To filter data, provide a WHERE clause which evaluates to a Boolean.  True indicates don't filter. When filtering in conjunction with aggregating (GROUP BY), those records filtered out will not be considered for aggregation.  To filter rows after aggregation, use the HAVING clause.
  • For projecting, supply columns to include in the SELECT clause.  Derived values may be provided. Provide an alias (AS) for derived values or to rename a column.  Use star (*) to include all columns or tblname.* to include only those columns from a particular table provided in the FROM clause.
  • To crop out rows, use the LIMIT clause. You may provide a number of rows to skip (offset) and a number of maximum rows to return (rowcount).
  • To sort, use the ORDER BY clause. If multiple expressions are given the left-most takes priority.
  • To union (concatenate) the results from multiple select clauses, use the UNION clause. The column names and types from the first select clause will provide the result set schema definition.  All select clauses participating in a union must define the same number of columns. Note: this operates like a union all in many databases.
  • To join (blend) tables, supply all tables together in the FROM clause and a Cartesian product will be provided. For more advanced Joins, you can use a combination of the LEFT, RIGHT, OUTER, JOIN, ON and NEAREST clauses. See JOINING DATA section for details
  • To process String columns containing multi-value delimited data, use the UNPACK col ON delimeter expression. This will split the string such that each entry will result in a new row.

Syntax

SELECT column_expr|lambda_expr  [,  column_expr|lambda_expr  ...] FROM table_expr|nested_select_expr  [, table_expr|nested_select_expr  ...]|
    [[LEFT|RIGHT|OUTER] [ONLY] JOIN [table_expr|nested_select_expr] [ON expr] [NEAREST nearest_expr]]
    [WHERE where_expr ]
    [GROUP BY column_expr [, column_expr ...] [HAVING expr ]]
    [ORDER BY expr [ASC|DESC] [, expr [ASC|DESC] ...]]
    [UNPACK col_to_unpack ON unpack_delimiter [,col_to_unpack ON unpack_delimiter]]
    [LIMIT [offset,] rowcount]
[UNION select_clause] 

offset - integer number of rows to skip

rowcount - integer number of maximum rows that will be returned

Joining Data

Simple Cartesian Joins

Any number of tables can be joined by simply listing them after the FROM clause which will produce a Cartesian product, such that a row will be returned for each possible combination of rows from the source table.  For example, the Cartesian product of a table with 5 rows and another table with 3 rows will produce 15 rows (5x3=15). Of course, the WHERE clause can be used to limit results to only the interesting combinations.

Handling Rows that don't match rows in other tables

For more granular control when joining two tables, use the JOIN syntax.  There are 7 types of statements, let's assume two tables a and b:

  • FULL JOIN: FROM a JOIN b ON expr - This is the same as FROM a,b WHERE expr
  • LEFT JOIN: FROM a LEFT JOIN b ON expr - A FULL JOIN plus all records from table a that would otherwise be omitted (due to the expr).
  • LEFT ONLY JOIN: FROM a LEFT ONLY JOIN b ON expr - Includes only records from table a that would otherwise be omitted (due to the expr).
  • RIGHT JOIN: FROM a RIGHT JOIN b ON expr - A FULL JOIN plus all records from table b that would otherwise be omitted (due to the expr). (similar to LEFT JOIN, swapping a and b)
  • RIGHT ONLY JOIN: FROM a RIGHT ONLY JOIN b ON expr - Includes only records from table b that would otherwise be omitted (due to the expr). (similar to LEFT ONLY JOIN, swapping a and b)
  • OUTER JOIN: A FULL JOIN plus all records from table a and from table b that would otherwise be omitted (due to the expr
  • OUTER ONLY JOIN: Includes only records from table a and from table b that would otherwise be omitted (due to the expr).

Joining to a nearest match

Use the NEAREST clause in conjunction with one of the above 7 JOIN clauses to match two records where there may not be an exact match, such as doing an "as-of" join with timestamps. After the ON clause, include a NEAREST fuzzy_clause.

Consider: a JOIN b ON on_expr NEAREST fuzzy_clause, where the fuzzy clause is one of the 5 syntaxes:

  • a.cola == b.colb - For each row in table a, join with the row in table b that satisfies on_expr and has the smallest delta between cola and colb.
  • a.cola <= b.colb - For each row in table a, join with the row in table b that satisfies on_expr and cola is less than or equal to colb and has the smallest delta between cola and colb.
  • a.cola >= b.colb - For each row in table a, join with the row in table b that satisfies on_expr and cola is greater than or equal to colb and has the smallest delta between cola and colb.
  • a.cola < b.colb - For each row in table a, join with the row in table b that satisfies on_expr and cola is less than colb and has the smallest delta between cola and colb.
  • a.cola > b.colb - For each row in table a, join with the row in table b that satisfies on_expr and cola is greater than colb and has the smallest delta between cola and colb.

Note the ordering of table a vs b in the fuzzy_clause does not matter. For example  b.colb >= a.cola is the same as a.cola <= b.colb

General Examples

SELECT age, dob, weight, name FROM Patients
SELECT age, dob, weight, name FROM Patients WHERE name =~ "mr"
SELECT count(*) as cnt, age, avg(dob), stdev(weight) FROM Patients GROUP BY age ORDER BY age
SELECT * FROM Patients, Accounts WHERE Patients.id == Accounts.patient_id
SELECT * FROM Orders JOIN MarketData ON Orders.symbol==MarketData.symbol NEAREST marketData.time<=Orders.time
SELECT * EXCEPT (OrderID, Quantity) from Orders; //return all Orders data except the OrderID and Quantity Columns
SELECT * FROM Orders where OrderId in ("a1","a2","b1");
SELECT * FROM Orders where OrderId in (select origOrderId from AnotherTable);
SELECT * FROM Orders where (OrderId, account) in (("a1","account1"),("a2","account1"),("b1","account2"));
SELECT * FROM Orders where (OrderId, account) in (select origOrderId, origAccount from AnotherTable);
SELECT DISTINCT EXAMPLE
SELECT symbol FROM MarketData GROUP BY symbol; //Get unique symbols from the MarketData table.

Nearest Join Examples

//Assume two existing tables: Orders and MarketData

AMIDB> SELECT * FROM Orders;

AMISQL.NearestJoinExample01.jpg

AMIDB> SELECT * FROM MarketData;

AMISQL.NearestJoinExample02.jpg

//A JOIN will only include orders that have a satisfactory market data event. Note, we’re also calculating the price delta from the order vs the market

AMIDB> select *,Orders.Price-MarketData.Price as delta from Orders JOIN MarketData ON Orders.sym==MarketData.sym NEAREST Orders.time>=MarketData.time ORDER by Orders.time;

AMISQL.NearestJoinExample03.jpg

//A LEFT JOIN will include all orders, event those without a market data event (using null as a place holder).

AMIDB> select *,Orders.Price-MarketData.Price as delta from Orders LEFT JOIN MarketData ON Orders.sym==MarketData.sym NEAREST Orders.time>=MarketData.time ORDER by Orders.time;

AMISQL.NearestJoinExample04.jpg

Join Examples

Let's assume we have two tables A and B, each with 1 column:

AMISQL.NearestJoinExample05.jpg

We can see that the value 1 only exists in the A table, the value 3 exists only in the B table and the value 2 existing in both tables.  When comparing the data between them there are, exhaustively speaking, 7 different ways you could want to see what matches and what doesn't.

JOIN

AMISQL.QueryClauses.JOIN.png

OUTER JOIN

AMISQL.QueryClauses.OUTERJOIN.png

OUTER ONLY JOIN

AMISQL.QueryClauses.OUTERJOINONLY.png

LEFT JOIN

AMISQL.QueryClauses.LEFTJOIN.png

LEFT ONLY JOIN

AMISQL.QueryClauses.LEFTONLYJOIN.png

RIGHT JOIN

AMISQL.QueryClauses.RIGHTJOIN.png

RIGHT ONLY JOIN

AMISQL.QueryClauses.RIGHTONLYJOIN.png

Include Matches? YES YES YES
Includes Left W/O a Match on Right YES YES YES YES
Includes Right W/O a Match on Left YES YES YES YES