华为云云数据库GaussDB文本检索函数和操作符_云淘科技

云数据库 GaussDB

11 月 15, 2023

84 0

文本检索操作符

描述：tsvector类型的词汇与tsquery类型的词汇是否匹配

示例：

gaussdb=# SELECT to_tsvector('fat cats ate rats') @@ to_tsquery('cat & rat') AS RESULT;
 result 
--------
 t
(1 row)

@@@

描述：@@的同义词

示例：

gaussdb=# SELECT to_tsvector('fat cats ate rats') @@@ to_tsquery('cat & rat') AS RESULT;
 result 
--------
 t
(1 row)

描述：连接两个tsvector类型的词汇

示例：

gaussdb=# SELECT 'a:1 b:2'::tsvector || 'c:1 d:2 b:3'::tsvector AS RESULT;
          result           
---------------------------
 'a':1 'b':2,5 'c':3 'd':4
(1 row)

描述：将两个tsquery类型的词汇进行“与”操作

示例：

gaussdb=# SELECT 'fat | rat'::tsquery && 'cat'::tsquery AS RESULT;
          result           
---------------------------
 ( 'fat' | 'rat' ) & 'cat'
(1 row)

描述：将两个tsquery类型的词汇进行“或”操作

示例：

gaussdb=# SELECT 'fat | rat'::tsquery || 'cat'::tsquery AS RESULT;
          result           
---------------------------
 ( 'fat' | 'rat' ) | 'cat'
(1 row)

描述：tsquery类型词汇的非关系

示例：

gaussdb=# SELECT !! 'cat'::tsquery AS RESULT;
 result 
--------
 !'cat'
(1 row)

描述：一个tsquery类型的词汇是否包含另一个tsquery类型的词汇

示例：

gaussdb=# SELECT 'cat'::tsquery @> 'cat & rat'::tsquery AS RESULT;
 result 
--------
 f
(1 row)

描述：一个tsquery类型的词汇是否被包含另一个tsquery类型的词汇

示例：

gaussdb=# SELECT 'cat'::tsquery <@ 'cat & rat'::tsquery AS RESULT;
 result 
--------
 t
(1 row)

除了上述的操作符，还为tsvector类型和tsquery类型的数据定义了普通的B-tree比较操作符（=，<等）。

文本检索函数

get_current_ts_config()

描述：获取文本检索的默认配置。

返回类型：regconfig

示例：

gaussdb=# SELECT get_current_ts_config();
 get_current_ts_config 
-----------------------
 english
(1 row)

length(tsvector)

描述：tsvector类型词汇的单词数。

返回类型：integer

示例：

gaussdb=# SELECT length('fat:2,4 cat:3 rat:5A'::tsvector);
 length 
--------
      3
(1 row)

numnode(tsquery)

描述：tsquery类型的单词加上操作符的数量。

返回类型：integer

示例：

gaussdb=# SELECT numnode('(fat & rat) | cat'::tsquery);
 numnode 
---------
       5
(1 row)

plainto_tsquery([ config regconfig , ] query text)

描述：产生tsquery类型的词汇，并忽略标点

返回类型：tsquery

示例：

gaussdb=# SELECT plainto_tsquery('english', 'The Fat Rats');
 plainto_tsquery 
-----------------
 'fat' & 'rat'
(1 row)

querytree(query tsquery)

描述：获取tsquery类型的词汇可加索引的部分。

返回类型：text

示例：

gaussdb=# SELECT querytree('foo & ! bar'::tsquery);
 querytree 
-----------
 'foo'
(1 row)

setweight(tsvector, “char”)

描述：给tsvector类型的每个元素分配权值。

返回类型：tsvector

示例：

gaussdb=# SELECT setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A');
           setweight           
-------------------------------
 'cat':3A 'fat':2A,4A 'rat':5A
(1 row)

strip(tsvector)

描述：删除tsvector类型单词中的position和权值。

返回类型：tsvector

示例：

gaussdb=# SELECT strip('fat:2,4 cat:3 rat:5A'::tsvector);
       strip       
-------------------
 'cat' 'fat' 'rat'
(1 row)

to_tsquery([ config regconfig , ] query text)

描述：标准化单词，并转换为tsquery类型。

返回类型：tsquery

示例：

gaussdb=# SELECT to_tsquery('english', 'The & Fat & Rats');
  to_tsquery   
---------------
 'fat' & 'rat'
(1 row)

to_tsvector([ config regconfig , ] document text)

描述：去除文件信息，并转换为tsvector类型。

返回类型：tsvector

示例：

gaussdb=# SELECT to_tsvector('english', 'The Fat Rats');
   to_tsvector   
-----------------
 'fat':2 'rat':3
(1 row)

to_tsvector_for_batch([ config regconfig , ] document text)

描述：去除文件信息，并转换为tsvector类型。

返回类型：tsvector

示例：

gaussdb=# SELECT to_tsvector_for_batch('english', 'The Fat Rats');
   to_tsvector   
-----------------
 'fat':2 'rat':3
(1 row)

ts_headline([ config regconfig, ] document text, query tsquery [, options text ])

描述：高亮显示查询的匹配项。

返回类型：text

示例：

gaussdb=# SELECT ts_headline('x y z', 'z'::tsquery);
 ts_headline  
--------------
 x y z
(1 row)

ts_rank([ weights float4[], ] vector tsvector, query tsquery [, normalization integer ])

描述：文档查询排名。

返回类型：float4

示例：

gaussdb=# SELECT ts_rank('hello world'::tsvector, 'world'::tsquery);
 ts_rank  
----------
 .0607927
(1 row)

ts_rank_cd([ weights float4[], ] vector tsvector, query tsquery [, normalization integer ])

描述：排序文件查询使用覆盖密度。

返回类型：float4

示例：

gaussdb=# SELECT ts_rank_cd('hello world'::tsvector, 'world'::tsquery);
 ts_rank_cd 
------------
          .0
(1 row)

ts_rewrite(query tsquery, target tsquery, substitute tsquery)

描述：替换目标tsquery类型的单词。

返回类型：tsquery

示例：

gaussdb=# SELECT ts_rewrite('a & b'::tsquery, 'a'::tsquery, 'foo|bar'::tsquery);
       ts_rewrite        
-------------------------
 'b' & ( 'foo' | 'bar' )
(1 row)

ts_rewrite(query tsquery, select text)

描述：使用SELECT命令的结果替代目标中tsquery类型的单词。

返回类型：tsquery

示例：

gaussdb=# SELECT ts_rewrite('world'::tsquery, 'select ''world''::tsquery, ''hello''::tsquery');
 ts_rewrite 
------------
 'hello'
(1 row)

文本检索调试函数

ts_debug([ config regconfig, ] document text, OUT alias text, OUT description text, OUT token text, OUT dictionaries regdictionary[], OUT dictionary regdictionary, OUT lexemes text[])

描述：测试一个配置。

返回类型：setof record

示例：

gaussdb=# SELECT ts_debug('english', 'The Brightest supernovaes');
                                     ts_debug                                      
-----------------------------------------------------------------------------------
 (asciiword,"Word, all ASCII",The,{english_stem},english_stem,{})
 (blank,"Space symbols"," ",{},,)
 (asciiword,"Word, all ASCII",Brightest,{english_stem},english_stem,{brightest})
 (blank,"Space symbols"," ",{},,)
 (asciiword,"Word, all ASCII",supernovaes,{english_stem},english_stem,{supernova})
(5 rows)

ts_lexize(dict regdictionary, token text)

描述：测试一个数据字典。

返回类型：text[]

示例：

gaussdb=# SELECT ts_lexize('english_stem', 'stars');
 ts_lexize 
-----------
 {star}
(1 row)

ts_parse(parser_name text, document text, OUT tokid integer, OUT token text)

描述：测试一个解析。

返回类型：setof record

示例：

gaussdb=# SELECT ts_parse('default', 'foo - bar');
 ts_parse  
-----------
 (1,foo)
 (12," ")
 (12,"- ")
 (1,bar)
(4 rows)

ts_parse(parser_oid oid, document text, OUT tokid integer, OUT token text)

描述：测试一个解析。

返回类型：setof record

示例：

gaussdb=# SELECT ts_parse(3722, 'foo - bar');
 ts_parse  
-----------
 (1,foo)
 (12," ")
 (12,"- ")
 (1,bar)
(4 rows)

ts_token_type(parser_name text, OUT tokid integer, OUT alias text, OUT description text)

描述：获取分析器定义的记号类型。

返回类型：setof record

示例：

gaussdb=# SELECT ts_token_type('default');
                        ts_token_type                         
--------------------------------------------------------------
 (1,asciiword,"Word, all ASCII")
 (2,word,"Word, all letters")
 (3,numword,"Word, letters and digits")
 (4,email,"Email address")
 (5,url,URL)
 (6,host,Host)
 (7,sfloat,"Scientific notation")
 (8,version,"Version number")
 (9,hword_numpart,"Hyphenated word part, letters and digits")
 (10,hword_part,"Hyphenated word part, all letters")
 (11,hword_asciipart,"Hyphenated word part, all ASCII")
 (12,blank,"Space symbols")
 (13,tag,"XML tag")
 (14,protocol,"Protocol head")
 (15,numhword,"Hyphenated word, letters and digits")
 (16,asciihword,"Hyphenated word, all ASCII")
 (17,hword,"Hyphenated word, all letters")
 (18,url_path,"URL path")
 (19,file,"File or path name")
 (20,float,"Decimal notation")
 (21,int,"Signed integer")
 (22,uint,"Unsigned integer")
 (23,entity,"XML entity")
(23 rows)

ts_token_type(parser_oid oid, OUT tokid integer, OUT alias text, OUT description text)

描述：获取分析器定义的记号类型。

返回类型：setof record

示例：

gaussdb=# SELECT ts_token_type(3722);
                        ts_token_type                         
--------------------------------------------------------------
 (1,asciiword,"Word, all ASCII")
 (2,word,"Word, all letters")
 (3,numword,"Word, letters and digits")
 (4,email,"Email address")
 (5,url,URL)
 (6,host,Host)
 (7,sfloat,"Scientific notation")
 (8,version,"Version number")
 (9,hword_numpart,"Hyphenated word part, letters and digits")
 (10,hword_part,"Hyphenated word part, all letters")
 (11,hword_asciipart,"Hyphenated word part, all ASCII")
 (12,blank,"Space symbols")
 (13,tag,"XML tag")
 (14,protocol,"Protocol head")
 (15,numhword,"Hyphenated word, letters and digits")
 (16,asciihword,"Hyphenated word, all ASCII")
 (17,hword,"Hyphenated word, all letters")
 (18,url_path,"URL path")
 (19,file,"File or path name")
 (20,float,"Decimal notation")
 (21,int,"Signed integer")
 (22,uint,"Unsigned integer")
 (23,entity,"XML entity")
(23 rows)

ts_stat(sqlquery text, [ weights text, ] OUT word text, OUT ndoc integer, OUT nentry integer)

描述：获取tsvector列的统计数据。

返回类型：setof record

示例：

gaussdb=# SELECT ts_stat('select ''hello world''::tsvector');
   ts_stat   
-------------
 (world,1,1)
 (hello,1,1)
(2 rows)

父主题： 函数和操作符

同意关联代理商云淘科技，购买华为云产品更优惠（QQ 78315851）

内容没看懂？不太想学习？想快速解决？有偿解决：联系专家