System-versioned tables in MariaDB

Posted by in MariaDB

This post is specific for MariaDB databases only, and it is related to a really cool feature I like a lot: system-versioned tables, available since MariaDB version 10.3.4. The concept, temporal data tables, you must not confuse with temporal tables. Keep on reading to learn more!

Temporal tables

Temporal tables are tables that they are only available during the session, while is open. When the session is over, the temporal tables will be removed. It is very useful because it allows to speed up the work during the session, and it is deleted when the session ends. But this is not what we want to see in this entry.

System-versioned tables

System-versioned tables are tables that store the history of all changes occurred in any record of that table. It does not affect only the current data, but allows to know the evolution of a record (regardless of whether the data is valid or not in time).

The use of this concept is indicated for data analysis at any point in time, allows auditing of changes over time, and can be used for:

  • Forensic and legal analysis of storage requirements for x years
  • Data analysis
  • Data recovery at a certain point in time, which even allows to know the status of the table at that moment.

The SQL syntax to introduce this feature was included in the standard version SQL:2011.

System-versioned tables allow:

  • Convert all UPDATEs into INSERTs
  • Store both current and historical data
  • The data are valid in their time (since they were inserted in the table)

Before continuing, let’s look at some previous concepts to differentiate possible conflicts:

Temporal data

It is not about time series of data, but it is about knowing the evolution of changes that a record has had. The validity or invalidity of each case depends on the time and date (may or may not be related).

Nor does it mean that the data is temporary, but that the temporary information is generated and confirms the execution of sentences

UPDATE table SET ….

And this is the way that system versioned tables are been generated.

Adding system-versioned to tables

The long syntax includes the creation of the two necessary columns, like this:

CREATE TABLE t1 (
x INT,
start_timestamp TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
end_timestamp TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
PERIOD FOR SYSTEM_TIME(start_timestamp, end_timestamp)
) WITH SYSTEM VERSIONING;

CREATE TABLE

But also, there is a short syntax:

CREATE TABLE t2 (
   x INT
) WITH SYSTEM VERSIONING;
CREATE TABLE

The difference between the first version and the second one is that, in the first one, the management of the versions is done with fields that I have defined as start_timestamp and end_timestamp, while in the second one they are called ROW_START and ROW_END.

In the first case, the SHOW TABLE of the table looks like this:

Table T1

And, in the second table:

Table T2

As you can see, by not naming the fields specifically, they do not appear (they are hidden).

First, let’s insert a data, and let’s create a SELECT sentence to both tables:

INSERTs and SELECTs

Time to make an UPDATE in the data:

UPDATEs

Finally, to check the evolution in time, we can make several queries using FOR SYSTEM_TIME, so, for a given moment, we write:

SELECT * FROM t1 FOR SYSTEM_TIME AS OF TIMESTAMP ‘2020-07-06 17:38:06’;

SELECTs

The record was changed at 17:48, but we asked what the status was at 17:38.

Another option is to know what happened in a certain period, for which we write:

SELECT *, row_start, row_end FROM t2 FOR SYSTEM_TIME FROM ‘2020-01-01 00:00:00’ TO ‘2021-01-01 00:00:00’;

Records during a period

As in table t1, the record in table t2 also changed at 17:48, but we ask the evolution of that record during 2020, and it has had two states. Specifically, FROM shows all records that at some point in that period have been visible including the START, but excluding the END.

The last option is to use BETWEEN, that is, also between two dates, but it works inclusively, a visible row exactly at the beginning or at the end will also be shown.

Some considerations about system-versioned tables

Sincerely, the fact of allowing to know the evolution of a record in time (and with time), is a magnificent characteristic that, although it is not interesting to exploit in many tables, it can be of much interest in some.

Anyway, if you already have a table, you can always add the versioning system by means of an ALTER TABLE, and having this possibility since then.

When you perform queries on a table that stores versions, it will only show the historical data if the FOR SYSTEM_TIME clause is included, since if it is not included, it will be selected over the current data (not historical).

Views can be created using FOR STSTEM_TIME, but can also be used with subqueries.

Perhaps it can be very interesting to separate the storage of historical data from the current data, because the joint storage means an increase in size, in the indexes, …. and in the case that many queries affect only the current data, in those cases it can be very attractive to have the historical data in another independent table.

And to end with, the versioning system is not something that should be generalized, but rather should be a specific feature of certain tables, because in the case you need to make a change in the table schema, you will get a “cool” error. therefore, you will need to activate permission for this with the statement.

SET @@system_versioning_alter_history = 1;

And then, you can proceed to make alterations in the structure of the table.

And, trat’s all by now. Happy coding!