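Overview

A repository for testing Kubeflow components built with TensorFlow Transform and Apache Beam.

Notes

Package versions are managed with pipenv. Before launching a pipeline or a component, run pip freeze > requirements.txt so that requirements.txt matches the current environment.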
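The freeze step can also be scripted so that it is never skipped before a launch. The following is a minimal sketch, not part of this repository: the function name `freeze_requirements` and its `freeze_cmd` parameter are hypothetical, and it assumes it is run with the interpreter of the pipenv environment.

```python
import subprocess
import sys
from pathlib import Path


def freeze_requirements(path="requirements.txt", freeze_cmd=None):
    """Write the output of `pip freeze` to `path` and return the Path."""
    # Default to the pip that belongs to the running interpreter
    # (assumed to be the pipenv virtualenv).
    cmd = freeze_cmd or [sys.executable, "-m", "pip", "freeze"]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    target = Path(path)
    target.write_text(result.stdout)
    return target
```

Calling `freeze_requirements()` right before starting a pipeline or component keeps the file passed to `--requirements_file` in sync with the environment.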

Launching from Kubeflow

TBD

memo
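DataflowPythonJobOp assumes that the Dataflow Python code and the requirements.txt file have been placed on GCS.
As a result, it does not appear to be usable when the code spans multiple files, nor when setup.py is used.
The workaround is as follows.

Create a custom Dataflow component (containerized like the other components).
From kfp, use it by specifying the container.
Use gcp_resources (so that the Dataflow job is also cancelled when the pipeline is cancelled).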
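As an illustration of the gcp_resources step, a custom component could record the launched Dataflow job in its gcp_resources output file. This is a hedged sketch: the helper name `write_gcp_resources` is hypothetical, and the JSON layout (`resources`, `resourceType`, `resourceUri`) and the job URI pattern are assumptions based on the convention used by Google Cloud Pipeline Components, so they should be checked against the version in use.

```python
import json
from pathlib import Path


def write_gcp_resources(path, project, location, job_id):
    """Record a Dataflow job in the gcp_resources output file."""
    # Assumed URI layout for a Dataflow job resource.
    uri = (
        "https://dataflow.googleapis.com/v1b3/"
        f"projects/{project}/locations/{location}/jobs/{job_id}"
    )
    # Assumed JSON shape of the gcp_resources payload.
    payload = {"resources": [{"resourceType": "DataflowJob", "resourceUri": uri}]}
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(payload))
    return payload
```

The file would need to be written as soon as the job ID is known, before waiting for the job to finish, since a cancel request can arrive at any point while the job is running.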

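How to launch the Dataflow components

The components directory contains code written with Apache Beam. The commands for launching that code directly are listed below. #TODO Containerize each one so that it can be used as a component.

wordcount

Change the project name and bucket name as appropriate.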

python components/wordcount/wc.py --output gs://ca-pubtex-ai-verification-dataflow/output_wordcount --runner DataflowRunner --project ca-pubtex-ai-verification --region us-central1 --staging_location gs://ca-pubtex-ai-verification-dataflow/staging --temp_location gs://ca-pubtex-ai-verification-dataflow/tmp --job_name=wordcount-job --requirements_file requirements.txt
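Because the project and bucket names appear several times in the command, it can help to derive the shared flags from two values. The sketch below is one possible way to do that; `dataflow_args`, `my-project`, and `my-bucket` are hypothetical names, while the flags themselves are the ones used in the command above.

```python
import shlex


def dataflow_args(project, bucket, region="us-central1", job_name=None):
    """Build the Dataflow flags shared by the launch commands."""
    base = f"gs://{bucket}"
    args = [
        "--runner", "DataflowRunner",
        "--project", project,
        "--region", region,
        "--staging_location", f"{base}/staging",
        "--temp_location", f"{base}/tmp",
    ]
    if job_name:
        args += ["--job_name", job_name]
    return args


if __name__ == "__main__":
    # Placeholder project and bucket; replace with real values.
    cmd = [
        "python", "components/wordcount/wc.py",
        "--output", "gs://my-bucket/output_wordcount",
        *dataflow_args("my-project", "my-bucket", job_name="wordcount-job"),
        "--requirements_file", "requirements.txt",
    ]
    print(shlex.join(cmd))
```

Running the script prints a command in the same shape as the one above, with only the two names to change.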

transform-tft

Change the project name and bucket name as appropriate.

Launching the Dataflow code from a local machine

python components/transform-tft/simple_sample.py --output_dir gs://ca-pubtex-ai-verification-dataflow/output_tft --runner DataflowRunner --project ca-pubtex-ai-verification --region us-central1 --staging_location gs://ca-pubtex-ai-verification-dataflow/staging --temp_location gs://ca-pubtex-ai-verification-dataflow/tmp --setup_file components/transform-tft/setup.py

Launching the Dataflow code through the component code that Kubeflow calls

pipenv run python src/run.py --project ca-pubtex-ai-verification --region us-central1 --temp_location gs://ca-pubtex-ai-verification-dataflow/tmp --setup_file ./setup.py --output_dir gs://ca-pubtex-ai-verification-dataflow/output_tft --gcp_resources /Path/to/gcp_resource.txt --output_dir_path /Path/to/tmp.txt

src/run.py is the entry point when the code runs as a component (see component.yaml). --gcp_resources and --output_dir_path are outputs of the component, so the files that these outputs are written to must be prepared in advance.
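For local runs, the two output files can be created with a few lines of Python before invoking src/run.py. This is a minimal sketch under the assumption that empty files are sufficient; `prepare_output_files` is a hypothetical helper, not something provided by the repository.

```python
from pathlib import Path


def prepare_output_files(*paths):
    """Create empty output files (and parent directories) if missing."""
    created = []
    for p in paths:
        target = Path(p)
        target.parent.mkdir(parents=True, exist_ok=True)
        # touch() leaves an existing file's content unchanged.
        target.touch(exist_ok=True)
        created.append(target)
    return created
```

For example, `prepare_output_files("/Path/to/gcp_resource.txt", "/Path/to/tmp.txt")` would cover the two paths passed in the command above.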

