Apr
20

The scope of MySQL optimization is very wide. One aspect of this complex field is the query performance and this article deals with it. MySQL provides powerful tools, that allow you to have a look inside the query parser & optimizer. To understand this article some MySQL server & SQL knowledge is required. Please be aware, that this article is just an overview about query performance. It can’t cover all aspects of it. This article is a part of the web-application performance series.

A simple calculation

On MySQL.com the company behind MySQL (MySQL AB) published a simple calculation to find out the estimated query performance:

log(row_count)/log(index_block_length/3*2/(index_length+data_pointer_length))+1
  • row_count: number of rows in your table
  • index_block_length: normally 1024 bytes
  • index_length: key value length (usually 3 bytes)
  • data_pointer_length: data pointer (usually 4 bytes)

The result of this formula represents the number of disk seeks for a special query. This formula is not valid for all queries.

Using EXPLAIN

EXPLAIN is a powerful tool to analyze the query performance. You can use EXPLAIN for every query.

EXPLAIN SELECT * FROM employee WHERE id = 18273;

After sending this query to the server, the result set contains information of the MySQL Optimizer. The optimizer analyzes the query and decides which execution plan would be the fastest and/or cost-effective. Check approximatly all queries to make sure, that you don’t overlook a unindexed or slow query.

The output of the optimizer contains the following columns:

  • table: involved tables (if you do a join, then this table contains more than one entry)
  • type: type of query (e.g. SYSTEM | CONST | EQ_REF| REF | RANGE | INDEX | ALL, ordered by the fastest in a descending order)
  • possible_keys: all possible keys for this query
  • key: finally used key
  • key_len: the length of the key (the length correlates with the performance)
  • ref: shows which column or constant would used to compare the value of key
  • rows: estimated number of rows to check
  • Extra: additional information about how MySQL executes the query

If you want to use an index that MySQL don’t use in the execution plan, then you can force MySQL to use this index with:

SELCT lname, fname, birthday FROM employee FORCE INDEX(lname)...

LIMIT

You should use LIMIT, if you have a lot of rows in your result set. It saves a lot of time, if you’re using LIMIT. So, you can build your query, after your pagination variable is evaluated.

SELECT lname,fname FROM employee LIMIT 0, 30

or with the application of a pagination:

SELECT lname, fname FROM employee LIMIT ($page * $rows_per_page), $rows_per_page

Caching queries effectively

MySQL implements a query cache. This cache saves the execution plan for your queries and the results. Until the table is not changed, the query cache holds the queries in the cache (depends on the possible memory).  The most common fault with the query cache is the innocence of how the query cache works. It works case-sensitive! This can be tricky if you use the same query more than one times. Then, you have to establish query coding-standards in your application, to keep the cache smart, efficently and economical.

SELECT * FROM employee WHERE id > 10 AND id < 10000

is not the same like:

select * from employee where id > 10 AND id < 10000

Split up complex queries

At a certain level, some queries can perform really slow. This will happen especially by joins, correlated queries, large tables and such complex constructions. Then you achieve a better performance with splitting this one query into more queries. Test this queries against the EXPLAIN command to check which query performs well and which not. But, be aware that before splitting a query up , the first option should be to check the indexes and the query cache.

SQL_CACHE vs SQL_NO_CACHE

If you are performing statistical queries to update user statistics and so one you should use the SQL_NO_CACHE directive. This directive is a part of the query cache. If you set the value of query_cache to  2, then you can use this directive. Use SQL_NO_CACHE for all queries, that you don’t need in your application. Because the space of your query cache is limited and shouln’t be wasted.

LIKE with %

‘%’ is known as the wildcard operator in MySQL. If you have an index on a column it can perform fast.

SELECT * FROM employee WHERE lname LIKE('New%');

This works fast, if you have an index on lname. The example below don’t works fast, because the index couldn’t be used:

SELECT * FROM employee WHERE lname LIKE('%man');

So, try to avoid the wildcard at the beginning of the like-clause. An considerable option can be to reverse the values of this column.

Other considerations

  • avoid calculated comparisons between values
  • don’t use DISTINCT and GROUP BY together
  • avoid correlated subqueries
  • use the FULLTEXT search and avoid LIKE
Apr
17

This short article i have focused on a special field of PHP optimization. It deals with the php.ini.

The php.ini file is the configuration file of PHP. It contains directives and settings, that affect the execution of the PHP script. The possible modifcations, that you can do to increase the speed of execution, are very special. You have to decide which of the settings, that are listed below, seems wise to you and your application. This article is a part of the web-application performance series.

realpath_cache_size & realpath_cache_ttl

PHP uses a realpath-cache to cache the path of include/required-files. This cache was introduced in PHP 5.10 and brought a high improvement of speed into PHP. The default value should be 16K, which is a really good ratio.

register_globals

The default value since PHP 4.2.0 is set to Off, nevertheless you should convince yourself, that this value isn’t  set to On. On the one hand, register_globals are well known as a big security issue and on the other hand PHP creates all variables in $_GET, $_POST, $_COOKIE, $_REQUEST, $_SERVER and $_SESSION in register_globals variables, too. This costs a unnecessary amount of time!

always_populate_raw_post_data

Is always_populate_raw_post_data set to On, PHP fills $HTTP_RAW_POST_DATA with raw data, that comes in with POST. Turn this directive to Off,  if you’re not planning something special with it. It saves time and memory.

register_long_arrays

This directive copies all data into the outdated $HTTP_*_VARS. You can access this data for instance with $_GET. You should turn register_long_arrays to Off to save time and memory.

expose_php

To tell the webserver which version of PHP is installed, PHP sends an information string to the webserver. Set this directive to Off, to avoid an unnecessary string copy. In addition to that, it is safer to tell just the necessary information and not more.

register_argc_argv

If you are using your PHP scripts with a webserver, then you don’t need argc (number of command line arguments) and argv (array that contains the command line arguments). Is register_argc_argv set to On, then PHP tries to parse and copy this variables into the symbol table of the script. This takes unnecessary time. Turn it to Off to save time.

asp_tags

The asp_tags directive points out, that the PHP detects also ASP tags like <% and %>. It’s not best practice to use ASP tags in PHP scripts. If you don’t use these tags, you should turn Off the asp_tags directive to save time.

Jul
09

Today we want to introduce you to the Static File Optimization Script called lightscript. We use some techniques of the script in our web-based projects. The handling is very easy and the effect is huge. Lightscript allows the user to gzip, merge and cache the merged files on the fly. Now, we have to explain some aspects of lightscript.

$cache_dir and $script_dir are the variables that point out the place where the scripts are located and where lightscript can store cached files. Make sure that these directories are writeable (777). In the downloadable archive we have created 2 folders (scripts & cache). You can use this folders by default.

Lightscript also implements some cache directives.

header("Expires: ".date('r',mktime(0,0,0,12,12,date('Y')+5)));
header("Pragma: public");
header("Cache-control: public");
You can perform a request on lightscript in the following way:
http://example.com/scripts/lightscript.php?q=prototype.js|scriptaculous.js|main.js&c=1
With q you can list all files you need and with c you decide whether cache the file or not (c=1 || c=0).

Here you can download lightscript in version one.
Jun
27

Hello World,

first of all we want to introduce to you the cip-labs. CIP is an acronym for „Code Is Poetry“. And that it’s all about. „Code“ is the keyword for our new site. At the cip-labs we will run some performance tests, introduce serveral software and own projects to the public. At the moment our team contains only one engineer. It’s Alex.

Alex studies computer science at the HTW Dresden. He works for some companies to improve his knowledge and to get some feedback from these clients. His subjects are networks, administration/clustering, C/C++, C#, Java, PHP (favourite), Python and databases. Alex is specialised in optimizing webapplications.

This was our short introduction to cip-labs.

Stay tuned!