Sunday, March 11, 2007

Reworking a message table

In one of the databases I manage, there is currently two large tables containing Messages and
Guestbook-entries. Both of these tables are huge, and is starting to become the primary db-servers' main concern.

Create definition of the current (old) table:


CREATE TABLE `message` (
`id` int(10) unsigned NOT NULL auto_increment,
`sender_id` int(11) unsigned NOT NULL default '0',
`receiver_id` int(11) unsigned NOT NULL default '0',
`sent` datetime NOT NULL default '0000-00-00 00:00:00',
`status` enum('unread','read','archived','replied','deleted') NOT NULL default 'unread',
`massmess_id` int(10) unsigned default NULL,
`title` varchar(127) NOT NULL default '',
`message` text NOT NULL,
`show_out` enum('Y','N') NOT NULL default 'Y',
PRIMARY KEY (`id`),
KEY `massmess_id_idx` (`massmess_id`),
KEY `receiver_id_sent_idx` (`receiver_id`,`sent`),
KEY `receiver_id_status_idx` (`receiver_id`,`status`),
KEY `sender_id_sent_idx` (`sender_id`,`sent`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

(The guestbook table looks very similar.)

Needless to say, holding 7 million rows, it was time to re-design and also consider joining the two big tables into a single database structure that could scale this data more efficiently.

Considering that I run these tables in InnoDB, I would ofcourse want to take advantage of it's
index clustering feature, and make sure we don't duplicate data too much.

For any efficient design to work, I needed to look at the current queries executed in the messaging system.

There are 4 major query types (% are guestimated):
Count new messages
SELECT COUNT(*) FROM message WHERE receiver_id=X AND status='unread' (90%)
Read inbox messages:
SELECT basic_fields FROM message [JOINS] WHERE receiver_id=X; (7%)
Read outbox messages:
SELECT basic_fields FROM message [JOINS] WHERE sender_id=X; (2%)
Read full message:
SELECT most_fields FROM message WHERE id=X; (1%)

Mainly considering these queries, I ended up with the following new tables (no FOREIGN KEYS in this example):

CREATE TABLE message_receiver (
receiver_id int unsigned not null,
message_id int unsigned not null,
type enum('message', 'guestbook') NOT NULL DEFAULT 'message',
status enum('unread','read','archived','replied','deleted') NOT NULL default 'unread',
PRIMARY KEY(receiver_id, message_id, type, status)
) ENGINE=InnoDB;

CREATE TABLE message_sender (
sender_id int unsigned not null,
message_id int unsigned not null,
type enum('message', 'guestbook') NOT NULL DEFAULT 'message',
status enum('unread','read','archived','replied','deleted') NOT NULL default 'unread',
PRIMARY KEY(sender_id, message_id, type, status)
) ENGINE=InnoDB;

CREATE TABLE message_detail (
`message_id` int unsigned NOT NULL default '0',
`sender_id` int(11) unsigned NOT NULL default '0',
`receiver_id` int(11) unsigned NOT NULL default '0',
`sent` datetime NOT NULL default '0000-00-00 00:00:00',
`massmess_id` int(10) unsigned default NULL,
PRIMARY KEY (message_id, receiver_id),
KEY `massmess_id_idx` (`massmess_id`),
KEY sender_idx (sender_id)
) ENGINE=InnoDB;

CREATE TABLE message_data (
`message_id` int unsigned NOT NULL default '0',
`title` varchar(127) NOT NULL default '',
`message` text NOT NULL,
PRIMARY KEY (`message_id`)
) ENGINE=InnoDB;



This structure provides for extremely high-speed access for counting the unread messages, and also for listing the inbox and outbox, due to InnoDB clustering of the primary keys in message_receiver and message_sender tables. The small size of these tables also make them fit better in the innodb data buffer.

Now for the last trick, which saves me having to rewrite some of the code, but still benefit from the optimizations of the new structure; A view. MySQL supports it, it works, so let's use it.

Defining the view is simple, and I create it to mimic the definition of the old "message" table:
CREATE VIEW message AS   SELECT MR.message_id, MR.receiver_id, MR.type, MR.status,
MD.sent, MD.massmess_id,
DA.title, DA.message
FROM message_receiver MR
INNER JOIN message_detail MD ON MD.message_id=MR.message_id
INNER JOIN message_data DA ON DA.message_id=MR.message_id;

While the view only allows us to read data the way we used to (multi-table updates on a view is not yet possible), it means I need not rewrite all of the code in my application, but only change the methods that modify data.

The new optimized query for counting unread messages:
Count new messages:
SELECT COUNT(*) FROM message_receiver WHERE receiver_id=X AND status='unread'

From EXPLAIN, I can see that this query now "Uses index", and I can also see that this index is used even if I perform the original count-query against the view. Amazing!

Left now is populating the new tables and moving the old tabel out of the way for the view.
ALTER TABLE message RENAME message_old;

INSERT INTO message_receiver (receiver_id,message_id,status) SELECT receiver_id,id,status FROM message;
INSERT INTO message_sender (sender_id,message_id,status) SELECT sender_id,id,status FROM message;
INSERT INTO message_detail (message_id,receiver_id,sender_id,sent,massmess_id) SELECT id,receiver_id,sender_id,sent,massmess_id FROM message_old;
INSERT INTO message_data (message_id,title,message) SELECT id,title,message FROM message_old;

That's it for the database part of things. Time to dive in to the code and make this work =)

Comments appreciated!

No comments:

Post a Comment