Huawei Cloud AI Development Platform ModelArts: Submitting a DLI Spark Job

May 5, 2023

Run the ma-cli dli-job submit command to submit a DLI Spark job.

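The ma-cli dli-job submit command takes one positional argument, YAML_FILE, which is the path to the job's configuration file. If it is omitted, the configuration file is treated as empty. The configuration file is in YAML format, and its keys are the command's options. If both a YAML_FILE and options are given on the command line, the option values given on the command line override the values of the same options in the configuration file.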
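The override rule described above can be pictured as a dictionary merge. The following is a minimal sketch of that rule only, not ma-cli's actual implementation; the function name merge_job_options and the sample values are hypothetical.

```python
def merge_job_options(yaml_options, cli_options):
    """Return the effective options: YAML values first, command-line values on top.

    Hypothetical helper that illustrates the documented precedence rule.
    """
    merged = dict(yaml_options)
    # Only options actually passed on the command line (not None) override the file.
    merged.update({key: value for key, value in cli_options.items() if value is not None})
    return merged


# Values as they might appear in a YAML_FILE.
yaml_options = {"name": "test-spark-from-sdk", "queue": "dli_notebook", "num-executors": 1}
# --queue was passed on the command line; --num-executors was not.
cli_options = {"queue": "dli_test", "num-executors": None}

effective = merge_job_options(yaml_options, cli_options)
print(effective)
```

Here the queue comes from the command line, while the name and the executor count still come from the file.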

Command parameter preview

ma-cli dli-job submit -h
Usage: ma-cli dli-job submit [OPTIONS] [YAML_FILE]...

  Submit DLI Spark job.

  Example:

  ma-cli dli-job submit  --name test-spark-from-sdk
                          --file test/sub_dli_task.py
                          --obs-bucket dli-bucket
                          --queue dli_test
                          --spark-version 2.4.5
                          --driver-cores 1
                          --driver-memory 1G
                          --executor-cores 1
                          --executor-memory 1G
                          --num-executors 1

Options:
  --file TEXT                    Python file or app jar.
  -cn, --class-name TEXT         Your application's main class (for Java / Scala apps).
  --name TEXT                    Job name.
  --image TEXT                   Full swr custom image path.
  --queue TEXT                   Execute queue name.
  -obs, --obs-bucket TEXT        DLI obs bucket to save logs.
  -sv, --spark-version TEXT      Spark version.
  -st, --sc-type [A|B|C]         Compute resource type.
  --feature [basic|custom|ai]    Type of the Spark image used by a job (default: basic).
  -ec, --executor-cores INTEGER  Executor cores.
  -em, --executor-memory TEXT    Executor memory (eg. 2G/2048MB).
  -ne, --num-executors INTEGER   Executor number.
  -dc, --driver-cores INTEGER    Driver cores.
  -dm, --driver-memory TEXT      Driver memory (eg. 2G/2048MB).
  --conf TEXT                    Arbitrary Spark configuration property (eg. ).
  --resources TEXT               Resources package path.
  --files TEXT                   Files to be placed in the working directory of each executor.
  --jars TEXT                    Jars to include on the driver and executor class paths.
  -pf, --py-files TEXT           Python files to place on the PYTHONPATH for Python apps.
  --groups TEXT                  User group resources.
  --args TEXT                    Spark batch job parameter args.
  -q, --quiet                    Exit without waiting after submit successfully.
  -C, --config-file PATH         Configure file path for authorization.
  -D, --debug                    Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT             CLI connection profile to use. The default profile is "DEFAULT".
  -H, -h, --help                 Show this message and exit.

YAML file preview

# dli-demo.yaml
name: test-spark-from-sdk
file: test/sub_dli_task.py
obs-bucket: ${your_bucket}
queue: dli_notebook 
spark-version: 2.4.5
driver-cores: 1
driver-memory: 1G
executor-cores: 1
executor-memory: 1G
num-executors: 1

## [Optional] 
jars:
  - ./test.jar
  - obs://your-bucket/jars/test.jar
  - your_group/test.jar

## [Optional] 
files:
  - ./test.csv
  - obs://your-bucket/files/test.csv
  - your_group/test.csv

## [Optional] 
python-files:
  - ./test.py
  - obs://your-bucket/files/test.py
  - your_group/test.py

## [Optional] 
resources:
  - name: your_group/test.py
    type: pyFile
  - name: your_group/test.csv
    type: file
  - name: your_group/test.jar
    type: jar
  - name: ./test.py
    type: pyFile
  - name: obs://your-bucket/files/test.py
    type: pyFile

## [Optional]
groups:
  - group1
  - group2

Example of submitting a DLI Spark job by specifying options:

$ ma-cli dli-job submit --name test-spark-from-sdk \
                        --file test/sub_dli_task.py \
                        --obs-bucket ${your_bucket} \
                        --queue dli_test \
                        --spark-version 2.4.5 \
                        --driver-cores 1 \
                        --driver-memory 1G \
                        --executor-cores 1 \
                        --executor-memory 1G \
                        --num-executors 1

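Table 1 Parameter description
Parameter	Type	Required	Description
YAML_FILE	String, local file path	No	Configuration file of the DLI Spark job. If it is omitted, the configuration file is treated as empty.
--file	String	Yes	Entry file of the program. Supports a local file path, an OBS path, or the name of a package of type jar or pyFile that the user has already uploaded to the DLI resource management system.
-cn / --class-name	String	Yes (for Java/Scala applications)	Java/Spark main class of the batch job.
--name	String	No	Job name specified by the user at creation. It cannot exceed 128 characters.
--image	String	No	Custom image path, in the format organization name/image name:image version. This parameter takes effect only when "feature" is set to "custom". Used together with "feature", it makes the job run on a custom Spark image.
-obs / --obs-bucket	String	No	OBS bucket used to save the Spark job. Configure it when the job needs to be saved. It also serves as the staging area when local files are submitted as resources.
-sv / --spark-version	String	No	Version of the Spark component used by the job. If the current Spark component version is 2.3.2, leave this parameter unset. If the current Spark component version is 2.3.3, set it when "feature" is "basic" or "ai". If it is not set, the default Spark component version 2.3.2 is used.
-st / --sc-type	String	No	Compute resource type. Accepted values are A, B, and C.
--feature	String	No	Job feature: the type of Spark image used by the job. The default is basic. basic: use the base Spark image provided by DLI. custom: use a user-defined Spark image. ai: use the AI image provided by DLI.
--queue	String	No	Queue to run the job on. Enter the name of an existing DLI queue, which must be a general-purpose queue. For how to obtain the queue name, see Table 1.
-ec / --executor-cores	Integer	No	Number of CPU cores per Executor of the Spark application. This setting overrides the corresponding default in sc_type.
-em / --executor-memory	String	No	Executor memory of the Spark application, for example 2G or 2048M. This setting overrides the corresponding default in "sc_type". The unit is required; without it the job fails to start.
-ne / --num-executors	Integer	No	Number of Executors of the Spark application. This setting overrides the corresponding default in sc_type.
-dc / --driver-cores	Integer	No	Number of CPU cores of the Spark application's Driver. This setting overrides the corresponding default in sc_type.
-dm / --driver-memory	String	No	Driver memory of the Spark application, for example 2G or 2048M. This setting overrides the corresponding default in "sc_type". The unit is required; without it the job fails to start.
--conf	Array of String	No	Batch configuration items; see Spark Configuration. To pass several values, repeat the option: --conf conf1 --conf conf2.
--resources	Array of String	No	Resource package names. Supports local files, OBS paths, and files that the user has already uploaded to the DLI resource management system. To pass several values, repeat the option: --resources resource1 --resources resource2.
--files	Array of String	No	Names of resource packages of type file that the user has already uploaded to the DLI resource management system. OBS paths (for example, obs://bucket-name/package-name) and local files are also supported. To pass several values, repeat the option: --files file1 --files file2.
--jars	Array of String	No	Names of packages of type jar that the user has already uploaded to the DLI resource management system. OBS paths (for example, obs://bucket-name/package-name) and local files are also supported. To pass several values, repeat the option: --jars jar1 --jars jar2.
-pf / --py-files	Array of String	No	Names of resource packages of type pyFile that the user has already uploaded to the DLI resource management system. OBS paths (for example, obs://bucket-name/package-name) and local files are also supported. To pass several values, repeat the option: --py-files py1 --py-files py2.
--groups	Array of String	No	Resource group names. To pass several values, repeat the option: --groups group1 --groups group2.
--args	Array of String	No	Arguments passed to the main class, that is, the application arguments. To pass several values, repeat the option: --args arg1 --args arg2.
-q / --quiet	Bool	No	Exit immediately after the DLI Spark job is submitted successfully, without continuing to print the job status.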
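Several options in the table (--conf, --resources, --files, --jars, --groups, --args) take multiple values by repeating the option. The sketch below shows one way to assemble such a command line from a Python dictionary. The helper build_submit_command is hypothetical and only builds the argument list; it does not call ma-cli, and the option names are the ones listed in the command help above.

```python
import shlex


def build_submit_command(options, yaml_file=None):
    """Build an argument list for `ma-cli dli-job submit` (hypothetical helper).

    List values are expanded into repeated options, for example
    {"jars": ["a.jar", "b.jar"]} becomes --jars a.jar --jars b.jar.
    """
    cmd = ["ma-cli", "dli-job", "submit"]
    if yaml_file:
        cmd.append(yaml_file)
    for key, value in options.items():
        flag = f"--{key}"
        if isinstance(value, bool):
            # Boolean switches such as --quiet take no value.
            if value:
                cmd.append(flag)
        elif isinstance(value, (list, tuple)):
            for item in value:
                cmd += [flag, str(item)]
        else:
            cmd += [flag, str(value)]
    return cmd


cmd = build_submit_command(
    {"name": "demo", "jars": ["a.jar", "b.jar"], "quiet": True},
    yaml_file="dli_job.yaml",
)
# shlex.join quotes arguments where needed, so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Keeping the command as a list until the last moment avoids quoting mistakes when a value contains spaces.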

Examples

Submit a DLI Spark job through a YAML_FILE file.

$ ma-cli dli-job submit dli_job.yaml

Submit a DLI Spark job by specifying options on the command line.

$ ma-cli dli-job submit --name test-spark-from-sdk \
>                         --file test/jumpstart-trainingjob-gallery-pytorch-sample.ipynb \
>                         --queue dli_ma_notebook \
>                         --spark-version 2.4.5 \
>                         --driver-cores 1 \
>                         --driver-memory 1G \
>                         --executor-cores 1 \
>                         --executor-memory 1G \
>                         --num-executors 1
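The parameter table states that driver and executor memory values must carry a unit, otherwise the job fails to start. A small pre-submission check can catch a bare number early. This is a sketch under an assumption: it accepts only the suffixes that appear on this page (G, M, MB), and ma-cli itself may accept more. The names MEMORY_PATTERN and has_memory_unit are hypothetical.

```python
import re

# Assumption: only the unit suffixes shown on this page (G, M, MB) are accepted here.
MEMORY_PATTERN = re.compile(r"^[1-9][0-9]*(G|M|MB)$")


def has_memory_unit(value):
    """Return True if value looks like '2G', '2048M' or '2048MB'."""
    return bool(MEMORY_PATTERN.match(value))


for candidate in ["1G", "2048MB", "2048"]:
    status = "ok" if has_memory_unit(candidate) else "missing or unknown unit"
    print(candidate, status)
```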
