PHP标签Tag的设计模式
2009年10月21日 星期三 发表人:4studio

原文来自http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html
翻译:EasyChen 转载请保留署名和出处,保持一致。

没有太多时间进行全文翻译,就把重点挑出来,用自己的话串起来,名曰 选择性翻译。 以后可能会比较多的采用这种方式。

社会书签的tag存储一直是一个比较麻烦的问题。

一个好的数据表设计,不但要能准确查出tag,还应该支持tag的AND/OR/NOT查询。我们来看看解决方案

“MySQLicious” solution

表结构

mysqlicious database stucture

存储实例

mysqlicious sample data

Intersection (AND)

“search+webservice+semweb”类的查询:
SELECT *
FROM `delicious`
WHERE tags LIKE "%search%"
AND tags LIKE "%webservice%"
AND tags LIKE "%semweb%"

Union (OR)

“search|webservice|semweb”类的查询:

SELECT *
FROM `delicious`
WHERE tags LIKE "%search%"
OR tags LIKE "%webservice%"
OR tags LIKE "%semweb%"

Minus

“search+webservice-semweb”类的查询
SELECT *
FROM `delicious`
WHERE tags LIKE "%search%"
AND tags LIKE "%webservice%"
AND tags NOT LIKE "%semweb%"

优点:

  1. 只有一个表
  2. SQL比较直接
  3. 可以用mysql的全文检索来做,效率更高

缺点:

  1. tag的数量受到限制,通常我们都用varchar,这种字段只256个字节长。否则,你需要用text类型,速度会变慢。(Easy注,phpmore的tag用的就是TinyText)
  2. Like ‘%things%’不精确,当然某些应用中,这反而是需要的

“Scuttle” solution

数据表

database structure of scuttle

Intersection (AND)

Query for “bookmark+webservice+semweb”:
SELECT b.*
FROM scBookmarks b, scCategories c
WHERE c.bId = b.bId
AND (c.category IN (’bookmark’, ‘webservice’, ’semweb’))
GROUP BY b.bId
HAVING COUNT( b.bId )=3

首先,所有书签-tag组合被搜出来 (c.category IN ('bookmark', 'webservice', 'semweb')), ,然后选择其中包含三个的(HAVING COUNT(b.bId)=3)

Union (OR)

Query for “bookmark|webservice|semweb”:
只需要去掉 AND查询中的HAVING子句

SELECT b.*
FROM scBookmarks b, scCategories c
WHERE c.bId = b.bId
AND (c.category IN (’bookmark’, ‘webservice’, ’semweb’))
GROUP BY b.bId

Minus (Exclusion)

Query for “bookmark+webservice-semweb”, that is: bookmark AND webservice AND NOT semweb.

SELECT b. *
FROM scBookmarks b, scCategories c
WHERE b.bId = c.bId
AND (c.category IN (’bookmark’, ‘webservice’))
AND b.bId NOT
IN (SELECT b.bId FROM scBookmarks b, scCategories c WHERE b.bId = c.bId AND c.category = ’semweb’)
GROUP BY b.bId
HAVING COUNT( b.bId ) =2

好处: 我觉得这个方案比前一个方案好的最大理由是,可以有无限个tag。

“Toxi” solution

数据表

Intersection (AND)

Query for “bookmark+webservice+semweb”
SELECT b.*
FROM tagmap bt, bookmark b, tag t
WHERE bt.tag_id = t.tag_id
AND (t.name IN (’bookmark’, ‘webservice’, ’semweb’))
AND b.id = bt.bookmark_id
GROUP BY b.id
HAVING COUNT( b.id )=3

Union (OR)

Query for “bookmark|webservice|semweb”
SELECT b.*
FROM tagmap bt, bookmark b, tag t
WHERE bt.tag_id = t.tag_id
AND (t.name IN (’bookmark’, ‘webservice’, ’semweb’))
AND b.id = bt.bookmark_id
GROUP BY b.id

Minus (Exclusion)

Query for “bookmark+webservice-semweb”, that is: bookmark AND webservice AND NOT semweb.

SELECT b. *
FROM bookmark b, tagmap bt, tag t
WHERE b.id = bt.bookmark_id
AND bt.tag_id = t.tag_id
AND (t.name IN (’Programming’, ‘Algorithms’))
AND b.id NOT IN (SELECT b.id FROM bookmark b, tagmap bt, tag t WHERE b.id = bt.bookmark_id AND bt.tag_id = t.tag_id AND t.name = ‘Python’)
GROUP BY b.id
HAVING COUNT( b.id ) =2

Leaving out theHAVING COUNTleads to the Query for “bookmark|webservice-semweb”.

好处:

  1. 你可以给每个tag添加额外的信息
  2. 这是最规范的方案,第三范式。

坏处:

  1. 删除tag时,你要从多个表中删除(Easy注,Mysql5的话,可以用trigger来做)

然后我们把视线从功能转移到性能上。

A+B

250个tag

Intersection test with 300 queries, up to three tags in query, 250 tags in small dataset

999个tag

Intersection test with 300 queries, up to three tags in query, 250 tags in small dataset

A OR B

250个tag

Union test with 250 tags in small dataset

添加速度比较

Setup database schemas with the data: 250 tags in small dataset

测试代码下载 Download the source code (PHP) LGPL协议。

抱歉,评论功能未启用。