Difference between revisions of "AMI Realtime Database"

From 3forge Documentation
Jump to navigation Jump to search
Tag: visualeditor-switched
Tag: visualeditor-switched
Line 204: Line 204:
 
!Usage Scenario
 
!Usage Scenario
 
|-
 
|-
|
+
|<span style="font-family: courier new; color: red;">STRING</span>
|
+
|Each cell is stored as a java.lang.String object.
|
+
|Fastest Retrieval
|
+
|High Memory Usage. Internally, each entry is stored as a java object.
|
+
|For columns that have very high cardinality and accessed often or participate in indexes. Another use would be when the total data in the column (adding up the lengths of all cells in the column) exceeds 2 billion chars.
 
|-
 
|-
|
+
|<span style="font-family: courier new; color: red;">STRING COMPACT</span>
|
+
|Each cell contains a 32-bit address into a continuous char array.15-bits are used to store string length. Each cell has a unique address.
|
+
|Low Memory/Object usage. A single large continuous array
|
+
|Slower Retrieval, as cell values are used a temporary string object may be constructed. Frequent deletes or updates that change the cell content lengths may result in array-data compaction.
|
+
|For columns that have high cardinality and whose cells are either beyond 126 chars or have extended ascii. But is not accessed frequently (ex, as part of a where clause). For example ''comment'' columns
 
|-
 
|-
|
+
|<span style="font-family: courier new; color: red;">STRING COMPACT ASCII</span>
|
+
|Each cell contains a 32-bit address into a continuous byte array. 7-bits are used to store string length. Each cell has a unique address
|
+
|1/2 the memory of a STRING COMPACT.
|
+
|Slower Retrieval, as cell values are used a temporary string object may be constructed. Frequent deletes or updates that change the cell content lengths may result in array-data compaction.
|
+
|Same as STRING COMPACT, except limited to shorter strings that are basic asci (not extended asci)
 
|-
 
|-
|
+
|<span style="font-family: courier new; color: red;">STRING BITMAP</span>
|
+
|Each cell contains a 8,16 or 32-bit address into a map of unique java.lang.String Objects local to the column.  Multiple cells containing the same value will share an address.
|
+
|Most Compact.
|
+
|Only used for repeating data (lower cardinality). Retrieval is slightly slower than a simple STRING column, but much faster than COMPACT type columns.
|
+
|Any scenario where columns have lower cardinality and fewer than approximately 32,000 unique values.
 
|-
 
|-
|
+
|<span style="font-family: courier new; color: red;">ENUM</span>
|
+
|Each cell contains a 32-bit address into a map of unique java.lang.String Objects, global to the entire database.  Multiple cells containing the same value will share an address, even if those cells are on different columns of type ENUM.
|
+
|Compact.  Note: this is generally for legacy, BITMAP is generally preferred
|
+
|Must be used only on columns with low cardinality. Note this is mostly for legacy support and STRING BITMAP is preferred.
|
+
|When several columns all use a similar set of values (ex: BUY,SELL, SHORT) then all columns can be set to use ENUM.
 
|-
 
|-
|
+
|<span style="font-family: courier new; color: red;">STRING ONDISK</span>
|
+
|Each cell contains a 64-bit address into a random-access file. And only the most recent values read/wrote are cached in memory
|
+
|Fixed memory size regardless of string length.
|
+
|Slower read times for un-cached Strings
|
+
|For large text documents or those that do not need to be searched on
 
|}
 
|}
  

Revision as of 07:26, 10 March 2021

Goals

The 3Forge Realtime database builds on all of the features available in the AMI SQL language. The real-time datamodeling

  • High Speed - 100k+ inserts per second, with 1 billion+ rows
  • Persistent - Tables can be configured for long-term persistence
  • Data Virtualization - Access external sources seamlessly
  • Realtime Streaming - Chaining streams of updates into aggregations and other projections
  • Source Synchronizing - Synchronizing in-memory tables based on external datasources
  • Extendable - Create custom triggers, procedures, etc.

Broadly speaking, the logic steps involved in data-visualization are:

  1. Data is "streamed" into the AMI real-time database through a combination of methods
    • High Speed Messaging
    • JDBC
    • Synchronizing from External Datasources
    • User updates, from the front end visualization layer
  2. Data is fed through logical components which can aggregate, etc.
  3. Visualizations are built on the real-time tables and streams.

About Realtime (PUBLIC) tables

For the AMI Center, by default tables that are created are only visible to the current session and will be cleaned up when the connection is terminated.  When a table is created with the PUBLIC directive (ex CREATE PUBLIC TABLE ...) the table will now be accessible to all sessions (external connections, triggers, etc) and front ends for real-time viewing. Public tables will continue to exist until they are explicitly dropped.  Please see the reserved columns on public tables section for details on specialized columns that can be added.

Realtime Tables

create_public_table_clause

Overview

This command is used to create a new, empty (no rows) public table with a specified set of columns (names, types, no-null restriction).  If another public table already exists with the tblname, the create table will fail.

At least one column must be specified, and each column must have a unique name. Columns with the nonull constraint cannot hold a null value. Updates and inserts in violation of the rule will fail.

The Following Use Options can be applied to a table:

  • PersistEngine -
    • "FAST": The table will be persisted to disk using a fast, binary protocol. Additional options include:
      • "persist_dir" = "/path/to/directory/for/storing/file" - this is the directory where the .fast binary file will be saved/loaded from
      • "persist_compact_percent" = ".95" - a number between 0 and 1 which says how much of the file should contain updates & deletes before it's compacted. Because the FAST table uses a journaling strategy, updates and deletes will get appended and can create large files. Setting a value of .95 would indicate that if the file becomes 95% updates and deletes then the file should be rebuilt
    • "TEXT": The table will be persisted to disk using a slow, but easy to read text file
      • "persist_dir" = "/path/to/directory/for/storing/file" - this is the directory where the .txt text file will be saved/loaded from
  • PersistEngine="Custom": A custom persist engine will be used (see AMI plugins documentation). Note, that PersistOptions="..." is available to pass options to the custom plugin
  • Broadcast="true": Front end visualizations & external listeners will be notified as data is updated in the table
  • RefreshPeriodMs="durration_millis": The period that the table will conflate and broadcast changes to front end at.  For example, if a cells' value changes 10 times in one second and the refresh period is 500ms, then only ~2 updates will be broadcast out (the other 8 will be conflated)
  • OnUndefColumn - Behaviour to take when a realtime Object record contains an undefined column
    • "REJECT" - The record will be rejected.  This is the default
    • "IGNORE" - The  record will be inserted, but the undefined values will be ignored
    • "ADD" -  The table will automatically have the column added
  • InitialCapacity="number_of_rows" - the number of rows to allocate memory for when the table is created. Must be at least 1. The default initial capacity is 1,000 rows

Syntax

CREATE PUBLIC TABLE [IF NOT EXISTS] tblname (col1 public_column_type [, col2 public_column_type ...])

[USE  [PersistEngine="[FAST|TEXT|custom]"]

[PersistOptions="custom_options"]

[Broadcast="[true|false]"]

[RefreshPeriodMs="duration_millis"]

[OnUndefColumn="REJECT|IGNORE|ADD"]

tblname - string name of the table to create

colN - string name of Nth column to create

Examples

CREATE PUBLIC TABLE countries (country String NoNull, language String, GDP Double) USE PersistEngine="TEXT" Broadcast="false" InitialCapacity="100"

create_public_table_as_clause

Overview

This command is used to create a new table which has the same schema definition as the result set of a select_clause.  If another table already exists with the tblname, the create table will fail.

All rows from the select_clause's result set will be inserted into the table. For valid USE options, see create_public_table_clause Options.

Syntax

CREATE PUBLIC  TABLE [IF NOT EXISTS] tblname [(col1 public_column_type [, col2 public_column_type ...])] [USE ... ] AS select_clause

CREATE PUBLIC  TABLE [IF NOT EXISTS] tblname [(col1 public_column_type [, col2 public_column_type ...])] [USE ... ] AS analyze_clause

CREATE PUBLIC  TABLE [IF NOT EXISTS] tblname [(col1 public_column_type [, col2 public_column_type ...])] [USE ... ] AS prepare_clause

CREATE PUBLIC  TABLE [IF NOT EXISTS] tblname [USE ... ]  [, tblname [USE ... ] ...] AS execute_clause

tblname - string name of the table to create

Examples

CREATE PUBLIC TABLE names USE PersistEngine="FAST" AS SELECT first_name, last_name FROM employees

alter_public_table_clause

OVERVIEW

This command alters the schema (column names and types) of an existing table.  You can add columns (ADD), rename columns (RENAME … TO), delete columns (DROP), and change the type of columns (MODIFY).  Note, when multiple alterations are supplied, they are evaluated left to right. When changing types, a best effort cast will be applied.

When adding columns, the new cells will default to null. The column will be added to the end (right-most) of the table.

ALTER PUBLIC TABLE tblname alter_public_col_expr [, alter_public_col_expr ...]

tblname - string name of the table to alter

Examples

ALTER PUBLIC TABLE AccountHoldings ADD UserId long, RENAME AccountId TO Account_Id, DROP UserInfo

ALTER PUBLIC TABLE AccountHoldings MODIF UserId AS UserId INT

ALTER PUBLIC TABLE AccountHoldings RENAME UserId TO UserIds, ADD UserId STRING AS

alter_public_column_expr

MODIFY colname AS newname public_column_type |ADD newname public_column_type [BEFORE beforecol]|DROP colname|RENAME colname TO newname

colname - String name of the column to alter

newname - String name of the column to create

beforecol - String name of the existing column to add the new column before

rename_public_table_clause

Overview

This command renames an existing table. If another table already exists with the newname, the existing table will be implicitly dropped and replaced with the renamed table.

Syntax

RENAME PUBLIC TABLE tblname TO newname

tblname - string name of the table to rename

newname - string name of the new name for the table

Examples

RENAME PUBLIC TABLE AccountHoldings TO Accounts

drop_public_table_clause

Overview

This command drops an existing table, along with its schema and all rows. You can specify multiple tables, separated by commas.

Syntax

DROP PUBLIC TABLE [IF EXISTS] tblname [, tblname ...]

Examples

DROP PUBLIC TABLE Accounts

public_column_type

The following types are supported for columns in public tables.

Note on Nulls: If the NONULL attribute is included, the column does not support null values, and inserts/updates of null will fail.

Note on NOBROADCAST: Columns with broadcast disabled will not be included in updates sent to front end visualizations & external listeners. This is a more granular version of the Broadcast option at the table level.

BYTE [NONULL] [NOBROADCAST] 8-bit signed number (see java.lang.Byte for details on range). If NONULL is omitted then an extra 1-bit per row is used (precisely, 64-bits per each 64 rows).

SHORT [NONULL] [NOBROADCAST] 16-bit signed number (see java.lang.Short for details on range). If NONULL is omitted then an extra 1-bit per row is used (precisely, 64-bits per each 64 rows).

INT [NONULL] [NOBROADCAST] 32-bit signed number (see java.lang.Integer for details on range). If NONULL is omitted then an extra 1-bit per row is used (precisely, 64-bits per each 64 rows).

LONG [NONULL] [NOBROADCAST] 64-bit signed number (see java.lang.Long for details on range). If NONULL is omitted then an extra 1-bit per row is used (precisely, 64-bits per each 64 rows).

DOUBLE [NONULL] [NOBROADCAST] 64-bit signed float (see java.lang.Double for details on range). If NONULL is omitted then an extra 1-bit per row is used (precisely, 64-bits per each 64 rows).

FLOAT [NONULL] [NOBROADCAST] 32-bit signed float (see java.lang.Float for details on range). If NONULL is omitted then an extra 1-bit per row is used (precisely, 64-bits per each 64 rows).

UTC [NONULL] [NOBROADCAST] 64-bit unix epoc timestamp in milliseconds. If NONULL is omitted then an extra 1-bit per row is used (precisely, 64-bits per each 64 rows).

UTCN [NONULL] [NOBROADCAST] 64-bit unix epoc timestamp in nanoseconds. If NONULL is omitted then an extra 1-bit per row is used (precisely, 64-bits per each 64 rows).

CHAR [NONULL] [NOBROADCAST] 16-bit unicode character (see.java.lang.Character for details on range). If NONULL is omitted then an extra 1-bit per row is used (precisely, 64-bits per each 64 rows).

BOOLEAN [NONULL] [NOBROADCAST] 1-bit signed Boolean (see java.lang.Boolean for details on range).

STRING [NONULL] [NOBROADCAST] variable length string. Each cell is backed by a java.lang.String instance

STRING COMPACT [NONULL] [NOBROADCAST] 4-byte index into a continuous char array for all values in the column. Values are limited to 32,766 characters in length. Max total chars per column (include 2 byte EOL) is 2^31.

STRING COMPACT ASCII [NONULL] [NOBROADCAST] 4-byte index into a continuous byte array for all values in the column. Extended ascii not supported. Values are limited to 126 chars. Max total chars per column (include 1 byte EOL) is 2^31

ENUM [NONULL] [NOBROADCAST] 32-bit index into map of global Strings (map is shared across all public tables)

STRING BITMAP [NONULL] [NOBROADCAST] 8-bit, 16-bit, or 32-bit address into an underlying map of strings (local to the column). The addressing depends on the number of unique values encountered in the column: 0 - 255 unique entries is 8-bit, 256 - 65,535 unique entries is 16-bit, otherwise 32-bit addressing is used. Note that as new unique values are added, the addressing will automatically escalate as needed. Note that entries from the underlying map are never removed, even as cells are updated deleted.

BINARY [NONULL] [NOBROADCAST]  variable length byte array, useful for storing images and other media.

STRING ONDISK [NONULL] [CACHE=nnnMB][NOBROADCAST]  Same as a String, but only a 64-bit pointer is kept in memory, which resolves to a Block on disk, meaning that strings of any length will take up the same 64-bits in memory. The optional CACHE directive specifies the maximum RAM to use for caching the most recently used values, supported units are KB,MB,GB and TB (if no unit is specified, then in bytes),  example to cache 2 gigabytes: STRING ONDISK CACHE=2GB. Cache-misses will result in reading from disk which can slow down query performance.

BINARY ONDISK [NONULL] [CACHE=nnnMB][NOBROADCAST]  Same as a Binary, but only a 64-bit pointer is kept in memory, which resolves to a Block on disk, meaning that binaries of any length will take up the same 64-bits in memory. The optional CACHE directive specifies the maximum RAM to use for caching the most recently used values, supported units are KB,MB,GB and TB (if no unit is specified, then in bytes), example to cache ~100 megs: BINARY ONDISK CACHE=100000000. Cache-misses will result in reading from disk which can slow down query performance.

BIGDECIMAL [NONULL] [NOBROADCAST] Unbounded(variable length) signed decimal (see java.lang.BigDecimal for details on range).

BIGINTEGER [NONULL] [NOBROADCAST] Unbounded(variable length) signed number (see java.lang.BigInteger for details on range).

COMPLEX [NONULL] [NOBROADCAST] A 128-bit Complex number, composed of 64-bit signed real and 64-bit signed imaginary parts.

UUID [NONULL] [NOBROADCAST] a 128-bit Universally Unique Identifier.

Choosing the Best Column Type for String

Simple rules: For data that never repeats and is queried often STRING is the best. For columns with data that does not repeat but is not queried often STRING COMPACT (or STRING COMPACT ASCII for non-extended ascii data) offer a lower memory footprint. For repeating data, generally STRING BITMAP is best. Below is a table outlining the advantages/disadvantages of each.

Type Storage Method Advantage Disadvantage Usage Scenario
STRING Each cell is stored as a java.lang.String object. Fastest Retrieval High Memory Usage. Internally, each entry is stored as a java object. For columns that have very high cardinality and accessed often or participate in indexes. Another use would be when the total data in the column (adding up the lengths of all cells in the column) exceeds 2 billion chars.
STRING COMPACT Each cell contains a 32-bit address into a continuous char array.15-bits are used to store string length. Each cell has a unique address. Low Memory/Object usage. A single large continuous array Slower Retrieval, as cell values are used a temporary string object may be constructed. Frequent deletes or updates that change the cell content lengths may result in array-data compaction. For columns that have high cardinality and whose cells are either beyond 126 chars or have extended ascii. But is not accessed frequently (ex, as part of a where clause). For example comment columns
STRING COMPACT ASCII Each cell contains a 32-bit address into a continuous byte array. 7-bits are used to store string length. Each cell has a unique address 1/2 the memory of a STRING COMPACT. Slower Retrieval, as cell values are used a temporary string object may be constructed. Frequent deletes or updates that change the cell content lengths may result in array-data compaction. Same as STRING COMPACT, except limited to shorter strings that are basic asci (not extended asci)
STRING BITMAP Each cell contains a 8,16 or 32-bit address into a map of unique java.lang.String Objects local to the column.  Multiple cells containing the same value will share an address. Most Compact. Only used for repeating data (lower cardinality). Retrieval is slightly slower than a simple STRING column, but much faster than COMPACT type columns. Any scenario where columns have lower cardinality and fewer than approximately 32,000 unique values.
ENUM Each cell contains a 32-bit address into a map of unique java.lang.String Objects, global to the entire database.  Multiple cells containing the same value will share an address, even if those cells are on different columns of type ENUM. Compact.  Note: this is generally for legacy, BITMAP is generally preferred Must be used only on columns with low cardinality. Note this is mostly for legacy support and STRING BITMAP is preferred. When several columns all use a similar set of values (ex: BUY,SELL, SHORT) then all columns can be set to use ENUM.
STRING ONDISK Each cell contains a 64-bit address into a random-access file. And only the most recent values read/wrote are cached in memory Fixed memory size regardless of string length. Slower read times for un-cached Strings For large text documents or those that do not need to be searched on

Reserved columns on public Tables

Realtime Indexes

Realtime Triggers

Realtime Timers

Realtime Procedures

Realtime Tools

Capacity Planning