Add GPU edge node manage docs #654

Open
wants to merge 2 commits into master from add-gpu-docs

Conversation

wbc6080
Contributor

@wbc6080 wbc6080 commented Nov 29, 2024

  • Please check if the PR fulfills these requirements
  • The commit message follows our guidelines
  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

Which issue(s) this PR fixes:

Fixes #

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

docs update

  • What is the new behavior (if this is a feature change)?

Added documentation for managing edge GPU nodes and introduced how to use GPU resources in edge applications.

@kubeedge-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign kevin-wangzefeng after the PR has been reviewed.
You can assign the PR to them by writing /assign @kevin-wangzefeng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Nov 29, 2024
@tangming1996

/lgtm

@kubeedge-bot kubeedge-bot added lgtm Indicates that a PR is ready to be merged. and removed lgtm Indicates that a PR is ready to be merged. labels Dec 2, 2024
@kubeedge-bot
Collaborator

New changes are detected. LGTM label has been removed.

@wbc6080 wbc6080 force-pushed the add-gpu-docs branch 3 times, most recently from 47188ae to 0fa121e on December 5, 2024 03:11
Signed-off-by: wbc6080 <[email protected]>
Co-authored-by: ming.tang <[email protected]>
@wbc6080
Contributor Author

wbc6080 commented Jan 14, 2025

PTAL @fisherxu @Shelley-BaoYue

Contributor

@fujitatomoya fujitatomoya left a comment


Gotta say, this is very good information for user applications at the edge. Thanks for taking care of this as documentation.

Just a few minor comments that are probably good for maintainability, I think.


## Abstract

With the development of edge AI, the demand for deploying GPU applications on edge nodes is gradually increasing. Currently, KubeEdge can manage GPU nodes through some configurations,

Suggested change
With the development of edge AI, the demand for deploying GPU applications on edge nodes is gradually increasing. Currently, KubeEdge can manage GPU nodes through some configurations,
With the development of edge AI, it is likely that applications at the edge demand and rely on GPU acceleration. Currently, KubeEdge can manage GPU nodes through some configurations,

## Abstract

With the development of edge AI, the demand for deploying GPU applications on edge nodes is gradually increasing. Currently, KubeEdge can manage GPU nodes through some configurations,
and allocate GPU resources to user edge applications through the k8s device-plugin component. If you need to use this feature, please refer to the steps below.

Suggested change
and allocate GPU resources to user edge applications through the k8s device-plugin component. If you need to use this feature, please refer to the steps below.
and allocate GPU resources to user edge applications through the [k8s device-plugin](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/) component. If you need to use this feature, please refer to the steps below.
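
(For readers skimming this review: below is a minimal, hedged sketch of what allocating GPU resources through the device plugin looks like in practice. The device-plugin release tag, container image, and node name are illustrative assumptions, not part of this PR; check the NVIDIA k8s-device-plugin README for the current manifest.)

```bash
# Deploy the NVIDIA device plugin so the edge node advertises nvidia.com/gpu
# (the release tag below is only an example; see the k8s-device-plugin README).
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml

# Hypothetical test pod requesting one GPU on an edge node.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  nodeName: edge-gpu-node             # placeholder edge node name
  containers:
  - name: cuda-vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1
    resources:
      limits:
        nvidia.com/gpu: 1             # allocated by the device plugin
EOF
```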


1. Install GPU driver

First you need to determine whether the edge node machine has GPU. You can use the `lspci | grep NVIDIA` command to check. Download the appropriate GPU driver according to the specific GPU model and complete the installation.

So it turns out this is only for NVIDIA GPUs? If that is so, we could probably change the title to something like "How to enable NVIDIA GPUs on KubeEdge"?
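
(As a concrete illustration of the driver-install step quoted above: a rough sketch for an apt-based edge node with an NVIDIA GPU. The driver package version is a placeholder and should be chosen according to the GPU model.)

```bash
# Confirm the node actually has an NVIDIA GPU.
lspci | grep -i nvidia

# Example for an Ubuntu/apt-based node; the driver version is a placeholder.
sudo apt-get update
sudo apt-get install -y nvidia-driver-535
sudo reboot

# After the reboot, verify the driver is loaded.
nvidia-smi
```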


## Getting Started

### GPU running environment construction

Can we instead just point to URLs such as https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html to make sure the NVIDIA GPU is enabled and configured for the container runtime?
This is obviously not a procedure KubeEdge provides, so avoiding the maintenance cost of keeping up with the NVIDIA-specific steps would be easier for the doc maintainers?
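
(For reference, the container-runtime part of the linked NVIDIA guide boils down to roughly the following on an apt-based node running containerd; the package repository must already be configured as described there, and the exact steps may change, so the guide remains authoritative.)

```bash
# Install the NVIDIA Container Toolkit (repository setup per the official guide).
sudo apt-get install -y nvidia-container-toolkit

# Configure containerd to use the NVIDIA runtime, then restart it.
sudo nvidia-ctk runtime configure --runtime=containerd
sudo systemctl restart containerd
```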


Hosting edge GPU nodes mainly includes the following steps:

1. Manage the node to the cluster

Same here: can we just point the reference to https://kubeedge.io/docs/setup/install-with-keadm/ to avoid the redundancy and possible maintenance problems?
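
(For readers following along, joining the GPU node with keadm per the linked guide looks roughly like this; the cloudcore address, token, node name, and KubeEdge version are placeholders.)

```bash
# On the cloud side, obtain the join token.
keadm gettoken

# On the edge GPU node, join the cluster (all values are placeholders).
keadm join \
  --cloudcore-ipport=<cloudcore-ip>:10000 \
  --token=<token-from-keadm-gettoken> \
  --kubeedge-version=v1.17.0

# Back on the cloud side, confirm the node registered and advertises GPUs.
kubectl get nodes
kubectl describe node <edge-node-name> | grep nvidia.com/gpu
```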

Labels
size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
5 participants