mysql> select * from log_event where id >= ‘latest id’ and datasource_id = ‘datasource_id’ and level >= ‘log level’ order by id desc;In effect this query asks: “give me the latest logs for datasource with id X having having at least a certain log level”. Partitioning basically divides a table into different portions that are stored and can be queried separately. The benefit is that if a query only has to hit a small portion instead of the whole table, it can be answered faster. There are different ways that you can partition tables in MySQL, and you can read about them all in the MySQL reference manual. I first experimented using Key partitioning using the table ID. Unfortunately, because different logs for a datasource could be spread across different partitions, the tail query would have to hit all partitions. To check how many partitions the query hits, I used the following query:
mysql> explain partitions select * from … ;This resulted in an even slower response than without partitioning, so Tim thought about it from a different angle. He discovered a nice solution using Range partitioning by datasource ID instead. This way the table would get divided into ranges of datasources that are contiguous but not overlapping. A range size of 1000 was used, so the 1st partition would contain all logs for datasources with IDs between 0 – 999, the 2nd partition would contain all logs for datasources with IDs between 1000 – 1999 and so on. Part of the command used to apply Tim’s partitioning strategy (having 36 partitions) is displayed below:
ALTER TABLE log_event ADD PRIMARY KEY(id,bio_datasource_fk) PARTITION BY RANGE (bio_datasource_fk) ( PARTITION p0 VALUES LESS THAN (1000), PARTITION p1 VALUES LESS THAN (2000), . . PARTITION p36 VALUES LESS THAN MAXVALUE );Checking how many partitions the tail query would hit, I confirmed that it only ever uses a single partition. The result was impressive, and initial tests resulted in a speed-up of over 9000 times! Important to note is that the primary key must include all fields in the partition. Therefore because we were partitioning using the datasource id, this field had to be included in the primary key before partitioning would work. Also, an index on the id was also added to further optimize the query - why not right? The speed-up might be dramatic now, but as more log messages get written to a partition and it starts to swell, I envisage having to either delete old logs or repartition the table again using smaller range sizes in order to sustain good performance. There is a trade-off between the number of partitions and performance, so some tweaking is needed in every case I guess. Lastly, I’ll reiterate that improper partitioning can actually make things worse. Perhaps it could work for you too, but please apply with caution.