Huawei Cloud AI Development Platform ModelArts: NGram Count
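Overview
Generates NGram phrases of N consecutive words from word-segmented sentences and counts each phrase across the whole input. A weight column is supported as input.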
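As a rough illustration of what "NGram phrases of N consecutive words" means, the following minimal sketch enumerates every phrase of length 1 up to a given order from one already-segmented sentence. The function name `ngrams` is hypothetical and is not part of ModelArts; weighting, vocabulary filtering, and boundary markers are left out here.

```python
def ngrams(tokens, order):
    """Return (n, phrase) pairs for every n-gram of length 1..order."""
    out = []
    for n in range(1, order + 1):
        # Slide a window of n consecutive tokens across the sentence.
        for i in range(len(tokens) - n + 1):
            out.append((n, " ".join(tokens[i:i + n])))
    return out


tokens = "Try to do it.".split(" ")
pairs = ngrams(tokens, 2)
for n, phrase in pairs:
    print(n, phrase)
```

With an order of 2, the sentence yields four 1-grams and three 2-grams.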
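Input
Parameter | Sub-parameter | Description |
---|---|---|
inputs | input_table | Name of the input table, a data table containing word-segmented sentences. Required. |
inputs | vocab_table | Bag-of-words vocabulary table. Optional. |
inputs | count_table | Output table of a previous ngram-count run. Optional. |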
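Input parameter description
Parameter name | Description | Requirements |
---|---|---|
input_words_col_name | Word-segmented column, i.e. the column to be processed into ngrams | string; required; only a single column is supported |
input_words_sep | Word separator used in the word-segmented column | string; required; defaults to " " |
input_weight_col_name | Column holding the weight of each word-segmented row | string; the table column must be numeric; optional |
vocab_words_col_name | Name of the word column in the vocabulary table | string; required if the vocabulary table is not empty |
count_gram_col_name | Number of words (n) in each ngram phrase, e.g. 1-gram, 2-gram…; values run from 1 to n | string; the table column must be numeric; required if the historical output table is not empty |
count_word_col_name | ngram phrase column | string; required if the historical output table is not empty |
count_count_col_name | ngram count column | string; the table column must be numeric; required if the historical output table is not empty |
order | Maximum number of words in an ngram, i.e. the n in n-gram | integer; required; must be in the range [1, 3] |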
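The requirements in the table above can be checked before a job is submitted. The sketch below is only an illustration of those rules in plain Python: `validate_params` and its `has_vocab_table` / `has_count_table` flags are hypothetical names, not a ModelArts API, and `input_words_sep` is treated as optional because the table gives it a default of a single space.

```python
HISTORY_COLS = ("count_gram_col_name", "count_word_col_name", "count_count_col_name")


def validate_params(params, has_vocab_table=False, has_count_table=False):
    """Return a list of problems found in a parameter dict (empty if none)."""
    problems = []
    for name in ("input_words_col_name", "order"):
        if name not in params:
            problems.append(f"{name} is required")
    order = params.get("order")
    if order is not None:
        is_int = isinstance(order, int) and not isinstance(order, bool)
        if not (is_int and 1 <= order <= 3):
            problems.append("order must be an integer in [1, 3]")
    if has_vocab_table and not params.get("vocab_words_col_name"):
        problems.append("vocab_words_col_name is required when a vocabulary table is supplied")
    if has_count_table:
        for name in HISTORY_COLS:
            if not params.get(name):
                problems.append(f"{name} is required when a historical output table is supplied")
    return problems


problems = validate_params(
    {"input_words_col_name": "sentence1", "order": 4},
    has_vocab_table=True,
)
for p in problems:
    print(p)
```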
Output
Parameter | Sub-parameter | Description |
---|---|---|
output | output_port_1 | Name of the output table; labeled as dataframe |
Output table description
Column name | Description | Remarks |
---|---|---|
ngram | Number of words in the ngram phrase | 1~n |
words | ngram phrase | – |
count | Count | Accumulated sum of weights |
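1. Vocabulary filtering:
Single words that are not in the vocabulary are replaced with a placeholder token.
2. Meaning of order:
For example, if order is 3, 1-grams, 2-grams, and 3-grams are all output.
3. weight column:
If there is no weight column, every weight defaults to 1.
4. How count is computed:
Weights of identical ngrams are summed;
Counts in the current ngram-count output table and the historical ngram-count output table are summed for rows with the same ngram and words;
Multiple columns share one weight column; when the ngrams are identical, their corresponding weights are summed to give the final count;
5. Other:
Rows whose count_gram_col_name value is invalid are filtered out; a marker is added at the start and at the end of each row.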
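The notes above can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the operator's actual code: the marker strings `<s>`, `</s>`, and `<unk>` are placeholders chosen for this example (the text does not show the real ones), `ngram_count` is a hypothetical name, a weight of `None` is treated as 1, and the filtering of invalid historical rows is not modeled.

```python
from collections import Counter

# Assumed marker strings; the actual tokens are not shown in the text.
START, END, UNK = "<s>", "</s>", "<unk>"


def ngram_count(rows, order, vocab=None, sep=" ", history=None):
    """Weighted ngram counts keyed by (n, phrase).

    rows:    iterable of (sentence, weight); weight None is treated as 1.
    vocab:   optional set of known words; others become UNK.
    history: optional {(n, phrase): count} from an earlier run, added in.
    """
    counts = Counter(history or {})
    for sentence, weight in rows:
        w = 1 if weight is None else weight
        tokens = [t for t in sentence.split(sep) if t]
        if vocab is not None:
            tokens = [t if t in vocab else UNK for t in tokens]
        tokens = [START] + tokens + [END]  # boundary markers on every row
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[(n, " ".join(tokens[i:i + n]))] += w
    return counts


rows = [("a b", 2), ("a c", 1)]
counts = ngram_count(rows, order=2, vocab={"a", "b"})
merged = ngram_count(rows, order=2, vocab={"a", "b"}, history={(1, "a"): 10})
print(counts[(1, "a")], counts[(2, "a b")], merged[(1, "a")])
```

In this toy input, "a" occurs in both rows, so its count is the sum of the two weights (3), and merging with a historical count of 10 for the same ngram and phrase gives 13.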
Example
Data input
input_table
sentence1 | weight |
---|---|
Try your best. | 1 |
Try to do it. | 2 |
Try to finish it tomorrow. | 2 |
You can try to do it. | 2 |
 | 1 |
Why not to have a try? | 1 |
vocab_table
word |
Try |
try |
to |
do |
your |
best |
best. |
it |
not |
it. |
tomorrow. |
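To see how the vocabulary interacts with the sample sentences, the sketch below splits each sentence on spaces and sorts the weighted tokens into in-vocabulary and out-of-vocabulary groups. It is a plain-Python illustration rather than the operator itself, and it reads the sample row with a blank cell as an empty sentence with weight 1, which is an assumption.

```python
from collections import Counter

rows = [
    ("Try your best.", 1),
    ("Try to do it.", 2),
    ("Try to finish it tomorrow.", 2),
    ("You can try to do it.", 2),
    ("", 1),  # assumed reading of the row with a blank cell
    ("Why not to have a try?", 1),
]
vocab = {"Try", "try", "to", "do", "your", "best", "best.",
         "it", "not", "it.", "tomorrow."}

in_vocab = Counter()
out_of_vocab = Counter()
for sentence, weight in rows:
    for token in sentence.split():
        # Matching is exact, so "try?" is not the same word as "try".
        target = in_vocab if token in vocab else out_of_vocab
        target[token] += weight

print(in_vocab["Try"], in_vocab["to"])   # weighted 1-gram counts
print(sorted(out_of_vocab))              # tokens the vocabulary would filter
print(sum(out_of_vocab.values()))        # total weight of filtered tokens
```

Because matching is exact, punctuation attached to a word matters: "best." and "it." are in the vocabulary, while "try?" is not. The weighted counts for "Try" (5) and "to" (7) agree with the output result table, and the total filtered weight of 10 appears consistent with the 1-gram row there that has a count of 10.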
Configure the workflow
Run the workflow
Parameter settings
Output result
ngram | words | count |
---|---|---|
1 | | 9 |
1 | | 9 |
1 | | 10 |
1 | Try | 5 |
1 | best. | 1 |
1 | do | 4 |
1 | it | 2 |
1 | it. | 4 |
1 | not | 1 |
1 | to | 7 |
1 | tomorrow. | 2 |
1 | try | 2 |
1 | your | 1 |
2 | | 1 |
2 | | 3 |
2 | | 5 |
2 | | 1 |
2 | | 4 |
2 | it | 2 |
2 | not | 1 |
2 | try | 2 |
2 | Try to | 4 |
2 | Try your | 1 |
2 | best. | 1 |
2 | do it. | 4 |
2 | it tomorrow. | 2 |
2 | it. | 4 |
2 | not to | 1 |
2 | to | 3 |
2 | to do | 4 |
2 | tomorrow. | 2 |
2 | try to | 2 |
2 | your best. | 1 |
3 | | 2 |
3 | | 1 |
3 | | 4 |
3 | | 1 |
3 | | 1 |
3 | | 1 |
3 | try | 2 |
3 | it tomorrow. | 2 |
3 | not to | 1 |
3 | try to | 2 |
3 | Try to | 2 |
3 | Try to do | 2 |
3 | Try your best. | 1 |
3 | do it. | 4 |
3 | it tomorrow. | 2 |
3 | not to | 1 |
3 | to | 1 |
3 | to it | 2 |
3 | to do it. | 4 |
3 | try to do | 2 |
3 | your best. | 1 |