华为云云数据库GaussDB收集文献统计_云淘科技

云数据库 GaussDB

11 月 15, 2023

142 0

函数ts_stat可用于检查配置和查找候选停用词。

1
2
3

ts_stat(sqlquery text, [ weights text, ]
        OUT word text, OUT ndoc integer,
        OUT nentry integer) returns setof record

sqlquery是一个包含SQL查询语句的文本，该SQL查询将返回一个tsvector。ts_stat执行SQL查询语句并返回一个包含tsvector中每一个不同的语素（词）的统计信息。返回信息包括：

word text：词素。
ndoc integer：词素在文档（tsvector）中的编号。
nentry integer：词素出现的频率。

如果设置了权重条件，只有标记了对应权重的词素才会统计频率。例如，在一个文档集中检索使用频率最高的十个单词：

gaussdb=# SELECT * FROM ts_stat('SELECT to_tsvector(''english'', sr_reason_sk) FROM tpcds.store_returns WHERE sr_customer_sk < 10') ORDER BY nentry DESC, ndoc DESC, word LIMIT 10;
   word | ndoc | nentry 
------+------+--------
 32   |    2 |      2
 33   |    2 |      2
 1    |    1 |      1
 10   |    1 |      1
 13   |    1 |      1
 14   |    1 |      1
 15   |    1 |      1
 17   |    1 |      1
 20   |    1 |      1
 22   |    1 |      1
(10 rows)

同样的情况，但是只计算权重为A或者B的单词使用频率：

gaussdb=# SELECT * FROM ts_stat('SELECT to_tsvector(''english'', sr_reason_sk) FROM tpcds.store_returns WHERE sr_customer_sk < 10', 'a') ORDER BY nentry DESC, ndoc DESC, word LIMIT 10;
 word | ndoc | nentry 
------+------+--------
(0 rows)

父主题： 附加功能

同意关联代理商云淘科技，购买华为云产品更优惠（QQ 78315851）

内容没看懂？不太想学习？想快速解决？有偿解决：联系专家

华为云云数据库GaussDB收集文献统计_云淘科技

分类

近期文章

近期评论

友情链接

分类目录

相关文章

分类

近期文章

近期评论

友情链接

分类目录