Huawei Cloud AI Development Platform ModelArts: LightGBM Classification
Overview
A wrapper around LightGBM classification from the mmlspark Python package.
Input
Parameter | Sub-parameter | Description |
---|---|---|
inputs | dataframe | inputs is a dict; dataframe is a pyspark DataFrame object |
Output
A Spark pipeline model (PipelineModel).
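Parameters
Parameter | Sub-parameter | Description |
---|---|---|
input_features_str | – | Comma-separated string of input column names, for example "column_a" or "column_a,column_b" |
label_col | – | Target (label) column |
classifier_label_index_col | – | Name of the new column produced by label-encoding the target column; default "label_index" |
classifier_feature_vector_col | – | Name of the feature vector column fed to the operator; default "model_features" |
prediction_index_col | – | Name of the output column holding the label index of the prediction; default "prediction_index" |
prediction_col | – | Name of the output column holding the predicted label; default "prediction" |
probability_col | – | Name of the output column holding the probabilities; default "probability" |
is_unbalance | – | Whether the dataset is imbalanced; default False |
timeout | – | Timeout; default 1200 seconds |
objective | – | Objective function; supports binary, multiclass, and multiclassova; default "binary" |
max_depth | – | Maximum tree depth; default -1 |
num_iteration | – | Number of iterations; default 100 |
learning_rate | – | Learning rate; default 0.1 |
num_leaves | – | Number of leaves; default 31 |
max_bin | – | Maximum number of bins; default 255 |
bagging_fraction | – | Bagging fraction; default 1 |
bagging_freq | – | Bagging frequency; default 0 |
bagging_seed | – | Random seed used for bagging; default 3 |
early_stopping_round | – | Number of rounds after which iteration stops early; default 0 |
feature_fraction | – | Feature fraction; default 1.0 |
min_sum_hessian_in_leaf | – | Minimum sum of hessians in one leaf; value range [0, 1]; default 1e-3 |
boost_from_average | – | Whether to adjust the initial score to the mean of the labels to speed up convergence; default True |
boosting_type | – | Boosting type; one of gbdt, gbrt, rf, dart, goss; default "gbdt" |
lambda_l1 | – | L1 regularization coefficient; default 0.0 |
lambda_l2 | – | L2 regularization coefficient; default 0.0 |
num_batches | – | If greater than 0, the dataset is split into separate batches during training; default 0 |
parallelism | – | Parallelism method used when learning trees; supports data_parallel and voting_parallel; default "data_parallel" |
thresholds_str | – | Used for multiclass classification; the array of preset probability thresholds, one per class, written as a comma-separated string |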
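Several parameters are plain strings with a fixed format or a fixed set of values, so it can help to check them before starting a job. The following is a minimal sketch under stated assumptions: the helper names (split_csv, parse_thresholds, check_params) are hypothetical and are not part of ModelArts or mmlspark, and the allowed values are simply the ones listed in the table. How the operator itself parses or validates these strings is not documented here.

```python
# Hypothetical helpers for illustration only; not part of ModelArts or mmlspark.
# Allowed values are copied from the parameter table.
ALLOWED = {
    "objective": {"binary", "multiclass", "multiclassova"},
    "boosting_type": {"gbdt", "gbrt", "rf", "dart", "goss"},
    "parallelism": {"data_parallel", "voting_parallel"},
}


def split_csv(value):
    """Split a comma-separated string into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_thresholds(thresholds_str):
    """Convert a string such as "0.2,0.3,0.5" into a list of floats."""
    return [float(item) for item in split_csv(thresholds_str)]


def check_params(params):
    """Return a list of problems found in the string-valued parameters."""
    problems = []
    # label_col is the only parameter marked as required in the sample.
    if not params.get("label_col"):
        problems.append("label_col is required")
    for name, allowed in ALLOWED.items():
        value = params.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    try:
        parse_thresholds(params.get("thresholds_str", ""))
    except ValueError:
        problems.append("thresholds_str must contain comma-separated numbers")
    return problems


features = split_csv("column_a, column_b")
thresholds = parse_thresholds("0.2,0.3,0.5")
problems = check_params({"label_col": "", "objective": "regression"})
```

In this sketch an empty problems list means the string parameters match the formats given in the table; it says nothing about whether the named columns exist in the DataFrame.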
Example
    inputs = {
        "dataframe": None  # @input {"label":"dataframe","type":"DataFrame"}
    }
    params = {
        "inputs": inputs,
        "b_output_action": True,
        "outer_pipeline_stages": None,
        "input_features_str": "",  # @param {"label":"input_features_str","type":"string","required":"false","helpTip":""}
        "label_col": "",  # @param {"label":"label_col","type":"string","required":"true","helpTip":""}
        "classifier_label_index_col": "label_index",  # @param {"label":"classifier_label_index_col","type":"string","required":"false","helpTip":""}
        "classifier_feature_vector_col": "model_features",  # @param {"label":"classifier_feature_vector_col","type":"string","required":"false","helpTip":""}
        "prediction_index_col": "prediction_index",  # @param {"label":"prediction_index_col","type":"string","required":"false","helpTip":""}
        "prediction_col": "prediction",  # @param {"label":"prediction_col","type":"string","required":"false","helpTip":""}
        "probability_col": "probability",  # @param {"label":"probability_col","type":"string","required":"false","helpTip":""}
        "is_unbalance": False,  # @param {"label":"is_unbalance","type":"boolean","required":"false","helpTip":""}
        "timeout": 1200.0,  # @param {"label":"timeout","type":"number","required":"false","helpTip":""}
        "objective": "binary",  # @param {"label":"objective","type":"string","required":"false","helpTip":""}
        "max_depth": -1,  # @param {"label":"max_depth","type":"integer","required":"false","range":"[-1,2147483647]","helpTip":""}
        "num_iteration": 100,  # @param {"label":"num_iteration","type":"integer","required":"false","range":"(0,2147483647]","helpTip":""}
        "learning_rate": 0.1,  # @param {"label":"learning_rate","type":"number","required":"false","helpTip":""}
        "num_leaves": 31,  # @param {"label":"num_leaves","type":"integer","required":"false","range":"(0,2147483647]","helpTip":""}
        "max_bin": 255,  # @param {"label":"max_bin","type":"integer","required":"false","range":"(0,2147483647]","helpTip":""}
        "bagging_fraction": 1.0,  # @param {"label":"bagging_fraction","type":"number","required":"false","helpTip":""}
        "bagging_freq": 0,  # @param {"label":"bagging_freq","type":"integer","required":"false","range":"[0,2147483647]","helpTip":""}
        "bagging_seed": 3,  # @param {"label":"bagging_seed","type":"integer","required":"false","range":"[0,2147483647]","helpTip":""}
        "early_stopping_round": 0,  # @param {"label":"early_stopping_round","type":"integer","required":"false","range":"[0,2147483647]","helpTip":""}
        "feature_fraction": 1.0,  # @param {"label":"feature_fraction","type":"number","required":"false","helpTip":""}
        "min_sum_hessian_in_leaf": 1e-3,  # @param {"label":"min_sum_hessian_in_leaf","type":"number","required":"false","helpTip":""}
        "boost_from_average": True,  # @param {"label":"boost_from_average","type":"boolean","required":"false","helpTip":""}
        "boosting_type": "gbdt",  # @param {"label":"boosting_type","type":"string","required":"false","helpTip":""}
        "lambda_l1": 0.0,  # @param {"label":"lambda_l1","type":"number","required":"false","helpTip":""}
        "lambda_l2": 0.0,  # @param {"label":"lambda_l2","type":"number","required":"false","helpTip":""}
        "num_batches": 0,  # @param {"label":"num_batches","type":"integer","required":"false","range":"[0,2147483647]","helpTip":""}
        "parallelism": "data_parallel",  # @param {"label":"parallelism","type":"string","required":"false","helpTip":""}
        "thresholds_str": ""  # @param {"label":"thresholds_str","type":"string","required":"false","helpTip":""}
    }
    lightgbm_classifier____id___ = MLSLightGBMClassifier(**params)
    lightgbm_classifier____id___.run()
    # @output {"label":"pipeline_model","name":"lightgbm_classifier____id___.get_outputs()['output_port_1']","type":"PipelineModel"}
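Because the sample lists every parameter with its default, a small helper that starts from those defaults and overrides only what changes can keep job definitions short. The following is a sketch under stated assumptions: build_params and DEFAULTS are hypothetical names, DEFAULTS holds only a subset of the keys from the sample, and whether MLSLightGBMClassifier accepts a dict with the remaining keys left out is not confirmed by this page. Copy the missing keys from the sample if it does not.

```python
# Hypothetical helper for illustration; not part of ModelArts or mmlspark.
# Subset of the defaults shown in the sample; extend with the other keys as needed.
DEFAULTS = {
    "b_output_action": True,
    "outer_pipeline_stages": None,
    "input_features_str": "",
    "label_col": "",
    "objective": "binary",
    "max_depth": -1,
    "num_iteration": 100,
    "learning_rate": 0.1,
    "num_leaves": 31,
    "is_unbalance": False,
    "thresholds_str": "",
}


def build_params(dataframe, **overrides):
    """Merge overrides into the defaults and attach the input DataFrame."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Catch misspelled parameter names before the job is submitted.
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    params = dict(DEFAULTS)
    params.update(overrides)
    params["inputs"] = {"dataframe": dataframe}
    return params


# None stands in for a real pyspark DataFrame here.
params = build_params(
    None,
    input_features_str="column_a,column_b",
    label_col="label",
    objective="multiclass",
    num_leaves=63,
)
# Inside ModelArts the dict would then be used as in the sample:
# MLSLightGBMClassifier(**params).run()
```

Rejecting unknown keys is a design choice of this sketch: a misspelled name such as num_leafs fails immediately instead of being silently ignored or passed through.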