postgresql 的全文搜索

问题

FC 的搜索功能非常慢, 因为对 text 类型的字段使用了 like 查询:

”‘sql select * from ( select * from ( select * from ( select m., c.message_id as anki, c.created_date as anki_date from ( select m., c.message_id as collect, c.created_date as collect_date from ( select * from all_message m where 1=1 ) m LEFT OUTER JOIN collect c ON m.id = c.message_id and c.user_id=491 ) m LEFT OUTER JOIN anki_save c ON m.id = c.message_id and c.user_id=491 ) s ) s where upper(s.text) like ‘%怎么快了%’ or upper(s.content::text) like ‘%怎么快了%’

) s where
        s.created_at < '2017-03-28 23:53:17.427000'

 limit 10

”’

text 上是不能建索引的:

PostgreSQL said: index row requires 10320 bytes, maximum size is 8191

解决

官方让用这个:

SELECT *
FROM message
WHERE to_tsvector(text) @@ to_tsquery('美女')
limit 10

从 20s, 缩到 8s 了

但是还是慢

中文分词

索引要建立的话要指定分词来命中

http://mysql.taobao.org/monthly/2015/12/02/

http://www.jianshu.com/p/01631cafe120

太麻烦了, 先做英文的吧

英文索引

CREATE INDEX index_message_text ON message USING gin(to_tsvector('english', text));
CREATE INDEX index_message_content ON message USING gin(to_tsvector('english', content));

出现了不少报错, 估计就是里面的中文什么的:

NOTICE:  word is too long to be indexed
DETAIL:  Words longer than 2047 characters are ignored.
NOTICE:  word is too long to be indexed
DETAIL:  Words longer than 2047 characters are ignored.
NOTICE:  word is too long to be indexed
DETAIL:  Words longer than 2047 characters are ignored.
NOTICE:  word is too long to be indexed
DETAIL:  Words longer than 2047 characters are ignored.
NOTICE:  word is too long to be indexed
DETAIL:  Words longer than 2047 characters are ignored.
CREATE INDEX

现在查就快多了, 300 ms

SELECT *
FROM message
WHERE to_tsvector(text) @@ to_tsquery('test')
limit 10

当然查中文还是要 8~10s

搞定

把 upper 去了, 能用上索引, 查英文 200-300 ms 能出来

        select * from (
        select * from ( select * from (
            select m.*, c.message_id as anki, c.created_date as anki_date
                from (
            select m.*, c.message_id as collect, c.created_date as collect_date
                from (
    select * from all_message m where 1=1
    ) m
            LEFT OUTER JOIN collect c
                ON m.id = c.message_id
                and c.user_id=491
        ) m
            LEFT OUTER JOIN anki_save c
                ON m.id = c.message_id
                and c.user_id=491
        ) s ) s where
            to_tsvector(s.text) @@ to_tsquery('test') or to_tsvector(s.content::text) @@ to_tsquery('haha')

    ) s where
            s.created_at < '2017-03-28 23:53:17.427000'

     limit 10

用 to_tsvector 查中文时候反回还没有 like 快, 所以还是换回 like 了

索引在 like 下也有用, 查速度只要几百 ms