MySQL Schema Performance Considerations

Most optimization tricks for MySQL focus on query performance or server tuning, but optimization starts with the design of the database schema. If you neglect the foundation of your database (its structure), you will pay the price for that laxity from the very beginning of your work with it. Of course, every storage engine has its own advantages and disadvantages, but regardless of the engine you choose, you should keep the following points in mind for your database schema. In this article I describe some architectural considerations that you should keep in mind when you design a database that will run under heavy load. This article is part of the web-application performance series.

Read-optimized tables

In current web applications, reads from a database are far more frequent than writes. For this reason, the database server has to scan a lot of tables to deliver the data. Most tables that contain the (so-called) master data are very large: they have a big row size and are very costly to scan. It is therefore a well-known technique to split such a table up. You can keep the data that is most often involved in scans in a separate table, so that a scan over the whole data set is avoided. A problem occurs when you want to query the full data set. Here you have two options:

  1. execute a join between the read-optimized table and the other table (using a foreign key in the read-optimized table)
  2. denormalize the tables with an attribute that both tables share (and that is indexed in both)

The first alternative is not oriented towards performance, but it fulfills all rules of normalization. The second alternative is a denormalized design that includes some redundancy in the data. Most web applications accept small redundancies in their databases in order to gain the performance improvement of these read-optimized tables. See also the chapter “Denormalize tables”.
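As a sketch of the split described above (all table and column names here are hypothetical), a wide article table could be divided so that frequent scans only touch the small, read-optimized part:

```sql
-- Read-optimized table: small rows, scanned frequently
CREATE TABLE article_list (
    article_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title        VARCHAR(200) NOT NULL,
    published_at DATETIME NOT NULL,
    KEY idx_published (published_at)
) ENGINE=InnoDB;

-- Detail table: large columns, fetched only when the full record is needed
CREATE TABLE article_body (
    article_id INT UNSIGNED NOT NULL PRIMARY KEY,
    body       MEDIUMTEXT NOT NULL,
    CONSTRAINT fk_article FOREIGN KEY (article_id)
        REFERENCES article_list (article_id)
) ENGINE=InnoDB;

-- Option 1 from above: join only when the full data set is required
SELECT l.title, b.body
FROM article_list AS l
JOIN article_body AS b ON b.article_id = l.article_id
WHERE l.article_id = 42;
```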

Denormalize tables

As I said in the chapter above, you should denormalize tables when your joins between two or more tables are too slow. Denormalization should be the last step, after the rest of your architectural repertoire is exhausted, because denormalized tables mean that some data no longer lives in only one place. Inconsistencies can occur, and the database becomes harder to maintain. The administrative overhead will increase if you denormalize too much of your data.

Denormalization is useful if you have split one table into two and you do not want to join those tables together in a query. In many cases the database would otherwise have to execute a join over two or more tables to get the information, which is really costly for current web applications. But be aware: think about which denormalization is necessary and which is not.
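Continuing the hypothetical split-table example, a denormalized variant copies a frequently read attribute into the second table so the join can be skipped, at the cost of keeping both copies in sync:

```sql
-- Redundant copy of the title in the detail table avoids the join,
-- but every UPDATE of the title must now touch both tables.
ALTER TABLE article_body
    ADD COLUMN title VARCHAR(200) NOT NULL,
    ADD KEY idx_title (title);

-- No join needed any more:
SELECT title, body
FROM article_body
WHERE article_id = 42;
```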

Artificial key columns

If you have a fast natural key (one that results from the data analysis), do not add an artificial key on top of it. Remember that every key needs disk and/or memory space. You do have to weigh up whether your natural key is fast enough, though. Maybe your key is alphanumeric and too large to be efficient under high load; in that case you should use a numeric, artificial primary key instead.
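A short sketch of both cases (hypothetical tables): a compact natural key can serve directly as the primary key, while a long alphanumeric candidate key is better demoted to a unique index beside an artificial numeric key:

```sql
-- Natural key: a two-letter ISO country code is short and stable,
-- so an extra artificial key would only waste space here.
CREATE TABLE country (
    iso_code CHAR(2) NOT NULL PRIMARY KEY,
    name     VARCHAR(100) NOT NULL
) ENGINE=InnoDB;

-- Long alphanumeric natural key (e.g. an e-mail address): under high
-- load a compact artificial key is usually the better primary key.
CREATE TABLE account (
    account_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    email      VARCHAR(255) NOT NULL,
    UNIQUE KEY uniq_email (email)
) ENGINE=InnoDB;
```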

Bigint vs. unsigned

It is important for performance to choose the right data type for the primary key. Consider whether you really need a BIGINT or an INT; maybe the MEDIUMINT data type fits your needs. The difference between INT and MEDIUMINT is that you save 25% of the space by using MEDIUMINT (3 bytes instead of 4). You should use the smallest sufficient data type for two reasons: the row will be smaller, and the index on the primary key will be smaller and faster. Another idea is to use the UNSIGNED variant of a data type; in most cases the range of UNSIGNED INT will suffice. Of course, you can apply this consideration to all numerical fields, not just to key columns.

Ranges of the integer data types

Type       Storage (Bytes)  Signed range                                 Unsigned range
TINYINT    1                -128 to 127                                  0 to 255
SMALLINT   2                -32768 to 32767                              0 to 65535
MEDIUMINT  3                -8388608 to 8388607                          0 to 16777215
INT        4                -2147483648 to 2147483647                    0 to 4294967295
BIGINT     8                -9223372036854775808 to 9223372036854775807  0 to 18446744073709551615

source: mysql.com
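Following the table above, picking the smallest sufficient type saves space per row and per index entry. A sketch with hypothetical column names:

```sql
-- 3 bytes instead of 4: MEDIUMINT UNSIGNED covers ids up to 16,777,215,
-- which is plenty for a tag table that will never grow into the millions.
CREATE TABLE tag (
    tag_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    label  VARCHAR(50) NOT NULL
) ENGINE=InnoDB;

-- UNSIGNED doubles the usable positive range of any integer type:
-- INT UNSIGNED reaches 4,294,967,295 instead of 2,147,483,647.
CREATE TABLE page_view (
    view_count INT UNSIGNED NOT NULL DEFAULT 0
) ENGINE=InnoDB;
```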

Text vs. varchar

This is one of the most common rookie mistakes with MySQL. Some people think that it is better to use TEXT instead of VARCHAR. One difference is that VARCHAR has a defined maximum length and TEXT does not, but that is not the only one: TEXT also needs more memory for sorting. You should use TEXT only as a “storage” data type and VARCHAR for small fields of textual information (or as a key).
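A small sketch of this rule of thumb (hypothetical table): short, sortable, indexable fields get VARCHAR, while large payloads get TEXT.

```sql
CREATE TABLE comment (
    comment_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    author     VARCHAR(60)  NOT NULL,  -- small textual field: VARCHAR
    body       TEXT         NOT NULL,  -- large "storage" payload: TEXT
    KEY idx_author (author)            -- a VARCHAR is easy to index; an index
                                       -- on a TEXT column needs a prefix length
) ENGINE=InnoDB;
```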

NOT NULL

If possible, declare all columns as NOT NULL. Every nullable column needs a bit of extra space per row, and often this is not necessary. If you really need NULL in a column, use it; but if you can avoid it, you should. MySQL has to perform extra work for every column that allows NULL.
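As a sketch (hypothetical table), a sensible DEFAULT often lets you keep a column NOT NULL even when a value may be missing:

```sql
CREATE TABLE visit (
    visit_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    page       VARCHAR(255) NOT NULL,
    referrer   VARCHAR(255) NOT NULL DEFAULT '',  -- empty string instead of NULL
    visited_at DATETIME     NOT NULL
) ENGINE=InnoDB;
```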

Other considerations

  • Do not index columns that you do not need in a SELECT
  • Use clever refactoring to allow changes to your schema
  • Choose the minimal character set that fits your needs
  • Use triggers only when you really need them
