AMI Historical Database
Introduction to AMI Historical Database
The AMI Historical Table is a columnar-based on-disk table engine with support for partitioning. The introduction section will briefly describe the core concepts of the AMI Historical database: Partitioning, Data storage and optimizations.
Partitioning
Partitioning is used to organize data for performance. To understand why partitioning is critical
for large datasets, let’s first consider a non-partitioned standard table: as the number or rows
increases, the performance (time to insert, update, query and alter) will decline. Partitioning
provides an elegant solution by breaking up rows into separate isolated “virtual tables” called
partitions. Once data is broken into partitions, the row count of one partition will not affect the
performance of other partitions. In order to dictate how rows are organized into partitions, the
AMI Historical Table introduces a special PARTITION column type. The value in the PARTITION
column(s) determines which partition a given row will be assigned to. In other words, all rows for
a given partition will have the same value in the PARTITION column(s).
By way of example, data is often partitioned by a date: Imagine we have the following historical
table, partitioned by the column “date”
When inserting row(s) whose date match(es) the existing partition key(s), the query engine will automatically navigate to the corresponding partition according to the partition key and append the rows accordingly.
Note that adding another day of data will automatically create a new partition without affecting
existing days’ data: For example, if we want to insert two new rows into the table whose “Date”
== 20221203
To create a historical table add USE PersistEngine="HISTORICAL" and include the PARTITION keyword after the column(s) that the rows should be partitioned by. Ex:
CREATE PUBLIC TABLE PartitionDemo(date LONG PARTITION, symbol STRING, qty INTEGER, comments STRING) use PersistEngine="HISTORICAL";
The file structure of AMIHDB is such that each table has its own directory and each partition is
assigned its own subdirectory. Each column and index has its own file(s) per partition directory.
<amiroot>/hdb/Table_Name.3fhtab/Partition_Id.dfhpar/<Files within a partition>
- ● Use ami.hdb.root.dir property to override location of hdb root
- ● Use ami.hdb.filehandles.max property to control how many files can be open at a time
- ● Table_Name - is the name of the table
- ● Partition_Id - An auto-incrementing id (starting at 0001)
- ● Files within a partition:
- ○ KEY.3fhkey - This stores the (plaintext) key for this partition
- ○ ColumnName.3fhcol - Stores data for the column
- ○ IndexName.3fhidx - Stores index data
- ○ KEY.3fhkey - This stores the (plaintext) key for this partition
This paradigm is designed for fast bulk INSERTs, fast random SELECTs and fast DELETES of
entire partitions. UPDATES, and sparse DELETES are possible but will operate slower
Optimized for Schema Alterations
Each partition is a self-contained set of data. This means that altering a schema does not
impact the way existing partitions are saved on disk. Instead, AMIHDB will map the “old”
partitions using the “old” schema to the current schema definition at query time. This applies for
ADD, DROP and MODIFY of columns.
Important Note: The PARTITION columns are the only columns that can not be altered
after the table is created.
Data storage types and optimizations
The AMI historical database has different data storage types designed for data of various
natures and characteristics, such as length, cardinality and partition keys. While continuously
inserting, the nature of the data in the table is unpredictable so a more flexible but less efficient
strategy is used for storage. Therefore, after the partition is completed, it is important to optimize
the storage strategy to adapt to the existing data to improve query performance and minimize
disk storage. Data storage types and optimizations will be further discussed in Section 2: Data
types and storage types, and Section 4.1:Optimization on existing data.
PARTITION MODES
As a historical database, it is important to have a mode for fast insert-on-demand data. In
AMIHDB, when a new partition is created it will have a status of
IN_APPEND_MODE/NOT_OPTIMIZED. Then, after that table is optimized, the partition will
become optimized for data compression and query times.
There are two ways to force optimize partitions:
(a). By calling CALL __OPTIMIZE_HISTORICIAL_TABLE(“Your_Historical_Table_Name”)
(b). By restarting AMI Center
You can use the DIAGNOSE TABLE <Your_Hitorical_Table_Name> command to view the optimized status of each partition:
Before commit:
After commit:
Data Types and Storage Type
AMIHDB supports the same data types as the AMI Real-time database:
BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, CHAR, UTC, UTCN, UUID, COMPLEX, BINARY,
STRING, BIGINT, BIGDEC
Each data type has several storage types each suited to optimize storage/retrieval depending
on the cardinality and/or length of data. In AMIHDB, there are four storage types:
FLAT, BITMAP, VARSIZE, PARTITION.
Important Note: Different partitions can have different storage types for the same column.