The mecanism of indexing on MySQL-MariaDB

Posted by in Databases, MariaDB, MySQL

In previous entrance related to MySQL and MariaDB I wrote one entrance related to why indexing is important, when working with databases (the article is only in spanish, but I will translate into english), and later, how to force MySQL-MariaDB to use a specific index that it is not the one chosen by MySQL (also in spanish). Today’s entrance is related to explain how works the mechanism of an index on MySQL-MariaDB tables, and what important considerations you need in order to get the best performance, and optimize your queries!

Although this entrance is theoric, I hope it can get your attention with several concepts that you should take into account when working with databases, because having efficient queries depent on index design, specially if you are dealing with large tables. If that’s your case, it is not the same reading a full table that only certain parts. Partitioning is another issue that can help with faster queries, but that will be a topic for a new entrance.

Let start with some definitions.

Definition of table

The first concept for understanding what a table is, is to define a dictionary:

A dictionary is a set of (key,value) pairs.

Once you know what is a dictionary, let’s define table:

A table is a set of dictionaries that can be update or select, and where it is allowed to add new elements (insert) or remove elements (delete).

There are a several examples of dictionaries, such as B-Trees or Fractal Trees.

A B-tree is a method of placing and locating files (called records or keys) in a database. The B-tree algorithm minimizes the number of times a medium must be accessed to locate a desired record, thereby speeding up the process.

A Fractal Tree index is a tree data structure that keeps data sorted and allows searches and sequential access in the same time as a B-tree but with insertions and deletions that are asymptotically faster than a B-tree

A range query is a common database operation that retrieves all records where some value is between an upper and lower boundary. As an example, you have the BETWEEN operator.

Definition of Index

An index is an a data structure that improves the speed of data retrieval operations on a database table. It is stored out of the table for a better data organization. Another definition is:

An index in a tabla is a type of dictionary with some particularities:

  • The key is a subset of data from the main dictionary (a register)
  • The value is the primary key of the table (the index table).

The more common index types are B_Tree, Hash, Log-structured Merge, full-text, spatial, …

The sort order of a dictionary is defined by the key.

The key of an index is a subset of fields in the primary dictionary (row). The value in index is the primary key of the table (although there are other ways to define the value).

In order to get benefit of the indexes, queries must use the criterias defined in the index (in order) when a query is executed to a database, and you’ll get a fast response. The process can be totally different when the searching criteria are not defined, and the query must read the full table, that takes much time (slow response).

The reason why indexes are created in the tables is to make queries faster, but for the design of the indices is essential to know the queries that will be performed on the table. The price you pay for it can be very high if not done correctly, always with the basis of most frequent queries. The cost you must pay is that each insert, update or delete data also requires work on the part of the indexes (to recreate them), so that the insert, update or deletion will not be as fast.

You must think that if there is no optimal index when performing a query on a table, then the whole table must be read, and this operation can be a high cost in time!

The indexes should be prepared for a series of consultations, not for all, so it may not work with queries that are slow rates.

Three rules to define an index

Before indicating the rules, it is important to know that rules are applicable regardless of the algorithm used (B-Tree or Fractal Trees) and the database schema. In addition, it is very important to select the most frequently asked queries in order to design the index to solve these queries, plus the cost of maintaining indexes.

Although there are no fixed rule, the three rules are:

  • Read less data: the larger data you must retieve, the most work you need to do, more network, more resources, … based on the WHERE clause.

In order to apply this rule, you must study the WHERE clause, and the best option is to locate at the beginning of the index all equalities, so the number of rows to process will be lower. In case there is no equality, the order of the fields doesn’t assure you that you are reading as les data as it should in the process.

  • Avoid point queries: secuential access is faster than point access.

While the cost per record of reading an entire table is low, the price of the records sought (some records) is high, just because they have to read the entire table. However, it is desirable that the index also includes fields you require operations, so that the data is already in the index, and not have to return to the table to look. That is why, to avoid the cost of finding other columns in the table, the indices should include all fields used in the entire statement, not just those in the WHERE clause.

  • Avoid sorting, because the operators GROUP BY and ORDER BY are applies after the data is retrieved, so more work to do.

The best way of avoiding sorting is to use this fields in the index definition, so pre-ordering the data can save time, and queries are executed faster.

Although not mentioned yet, it is possible to design indexes for several fields, but keep in mind that in order to see the index fields for the optimal functioning of the index is to be respected, and that the order of fields the index is very important.

That’s only the beginning of working with indexes

As I already wrote at the beginning of this entrance, the process of creating indexes needs training, and only with practice and following this three rules, a better performance can be achieved. But this is only the beginning og working with indexing, a nice topic that will require a lot of time and study while you get advantage of indexing.

Index requires to analize the queries índices requieren de mucho análisis de las consultas!

Have a nice day.

PD: I encourage you to see this video Understanding Indexing, where all concepts are very well explained, even with examples!

Google+ Comments - Comentarios Google+