ssh {UserName}@{IP_address}
Then enter the password when prompted.
Sometimes a training process is not released properly and has to be killed manually. First find it with:
ps aux | grep {keyword}
Here we use 'train' as the keyword to find the training process.
Read the process ID (PID) from the second column of the output, then kill the process by ID:
kill {ProcessID}
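The lookup above can be sketched as a small helper that prints only the PIDs. This is a minimal sketch, not a definitive tool: the function name `pids_matching` and the sample lines are made up for illustration, and it assumes the standard Linux `ps aux` layout (PID in column 2, command name in column 11).

```shell
# pids_matching KEYWORD  (hypothetical helper)
# Reads `ps aux`-style lines on stdin and prints the PID (column 2) of every
# line that contains KEYWORD, skipping the line of the grep process itself.
# The keyword is passed through the environment, so it does not show up in
# awk's own command line and awk does not match itself.
pids_matching() {
    KW="$1" awk 'index($0, ENVIRON["KW"]) && $11 != "grep" { print $2 }'
}

# Typical use on the workstation:
#   ps aux | pids_matching train

# Demo on two made-up sample lines:
sample='user 4242 99.0 3.1 1000 2000 pts/0 Rl 10:00 5:00 python train.py
user 4300 0.0 0.0 900 800 pts/1 S+ 10:05 0:00 grep train'
printf '%s\n' "$sample" | pids_matching train   # prints 4242
```

Check the printed PIDs before passing any of them to `kill`, since a short keyword can match unrelated processes.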
For a process running on the GPU, use
nvidia-smi | grep 'python'
to find processes matching the keyword 'python', and to kill one, use
sudo kill -9 ${PID}
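As an alternative to grepping the `nvidia-smi` table, the query mode of `nvidia-smi` prints one CSV line per compute process, which is easier to filter. The sketch below assumes that query output format; the helper name `gpu_pids_matching` and the sample lines are illustrative only.

```shell
# gpu_pids_matching KEYWORD  (hypothetical helper)
# Reads "pid, process_name" CSV lines on stdin and prints the PID of every
# line whose process name contains KEYWORD.
gpu_pids_matching() {
    KW="$1" awk -F', ' 'index($2, ENVIRON["KW"]) { print $1 }'
}

# On a machine with an NVIDIA driver:
#   nvidia-smi --query-compute-apps=pid,process_name --format=csv,noheader \
#       | gpu_pids_matching python

# Demo on made-up sample lines:
printf '%s\n' '31337, python' '2048, /usr/bin/some_other_app' \
    | gpu_pids_matching python   # prints 31337
```

`kill -9` gives the process no chance to clean up, so trying a plain `kill` first is usually the gentler option.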
When you work on the workstation from your local terminal, there is usually no graphical UI for browsing files and folders. Everything has to be done with Linux commands. Some utilities can make your work easier and more efficient:
Function | Tool |
---|---|
Multiple windows; sessions that keep running after you disconnect | tmux |
Editing | vim |
Copying / moving files | linux_cmd |
Development environment | Docker |
Viewing hardware usage | htop |
VS Code extensions for remote development:
- Remote-SSH
- Dev Containers
watch -n 0.1 nvidia-smi
This reruns nvidia-smi and refreshes its output every 0.1 s.
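When only a few numbers matter, the query mode of `nvidia-smi` can be reduced to one line per GPU. The following is a rough sketch: the helper name `format_gpu_csv` and the sample values are invented, and the input is assumed to be the CSV produced by the query shown in the comment.

```shell
# format_gpu_csv  (hypothetical helper)
# Reads "index, utilization, memory used, memory total" CSV lines on stdin
# and prints one compact summary line per GPU.
format_gpu_csv() {
    awk -F', ' '{ printf "GPU %s: %s%% util, %s/%s MiB\n", $1, $2, $3, $4 }'
}

# On a machine with an NVIDIA driver:
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
#       --format=csv,noheader,nounits | format_gpu_csv

# Demo on a made-up sample line:
printf '0, 97, 10240, 24576\n' | format_gpu_csv
# prints: GPU 0: 97% util, 10240/24576 MiB
```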
Inside the Docker container, select the GPU to use (here only GPU 1 is made visible):
export CUDA_VISIBLE_DEVICES=1
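The variable can also be set for a single command instead of the whole shell. The sketch below only echoes the value a child process would see, so it runs without a GPU; the `sh -c 'echo ...'` child stands in for a real training command.

```shell
# Per-command form: only this one child process gets the restriction.
CUDA_VISIBLE_DEVICES=1 sh -c 'echo "child sees GPU(s): $CUDA_VISIBLE_DEVICES"'
# prints: child sees GPU(s): 1

# Exported form: every command started later from this shell inherits it.
# A comma-separated list makes several GPUs visible.
export CUDA_VISIBLE_DEVICES=0,1
sh -c 'echo "child sees GPU(s): $CUDA_VISIBLE_DEVICES"'
# prints: child sees GPU(s): 0,1
```

Inside the process, the visible GPUs are renumbered starting from 0, so with `CUDA_VISIBLE_DEVICES=1` the program addresses physical GPU 1 as device 0.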
Start TensorBoard, pointing it at the log directory:
tensorboard --logdir=/path/to/logs/dir/
Note that --logdir takes the path of the directory that contains the log file, NOT the path of the log file itself.
For example:
tensorboard --logdir=/data_hdd/jun/DockerMount/OpenPCDet_docker/output/kitti_models/pv_rcnn_relation_car_class_only/train-CarClass-k16-IP_mlp/20240310-135445/tensorboard/
This directory contains the log file events.out.tfevents.1710078885.e8c0683eb2e1.
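The directory-versus-file rule can be checked before launching TensorBoard. This is a minimal sketch with two hypothetical helpers, `logdir_for` and `has_event_files`; the demo uses a temporary directory and a dummy file name rather than real training output.

```shell
# logdir_for PATH  (hypothetical helper)
# If PATH is a file, print its directory; otherwise print PATH unchanged.
logdir_for() {
    if [ -f "$1" ]; then dirname "$1"; else printf '%s\n' "$1"; fi
}

# has_event_files DIR  (hypothetical helper)
# Succeeds if DIR, searched recursively, holds at least one event file.
has_event_files() {
    [ -n "$(find "$1" -type f -name 'events.out.tfevents.*' 2>/dev/null | head -n 1)" ]
}

# Demo in a throwaway directory with a dummy event file:
tmp=$(mktemp -d)
touch "$tmp/events.out.tfevents.0.demo"
logdir_for "$tmp/events.out.tfevents.0.demo"     # prints the temp directory
has_event_files "$tmp" && echo "event files found"
rm -rf "$tmp"
```

A possible use is `tensorboard --logdir="$(logdir_for "$path")"`, where `$path` may be either the log file or its directory.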
Running the command prints a message like:
TensorFlow installation not found - running with reduced feature set.
NOTE: Using experimental fast data loading logic. To disable, pass
"--load_fast=false" and report issues on GitHub. More details:
https://github.com/tensorflow/tensorboard/issues/4784
I0311 12:00:01.281496 140462065710848 plugin.py:429] Monitor runs begin
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.10.1 at http://localhost:6007/ (Press CTRL+C to quit)
The last line shows that TensorBoard is listening on port 6007 on the remote host. Forward that port by running this on your local machine:
ssh -L 6007:localhost:6007 ${username}@${remote_host}
Open a web browser on the local machine and go to
http://localhost:6007
TensorBoard, which is running on the remote host, is then displayed in the local browser.
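Reading the port from TensorBoard's output and building the forwarding command can be sketched as follows. The helper name `tb_port` is an assumption for illustration, and the sample line is the one from the output above. The script only prints the `ssh` command and does not open a connection; `-N` tells `ssh` not to run a remote command, which suits a session used only for forwarding.

```shell
# tb_port  (hypothetical helper)
# Reads TensorBoard's startup output on stdin and prints the port number
# from the "http://localhost:<port>/" line.
tb_port() {
    sed -n 's#.*http://localhost:\([0-9][0-9]*\)/.*#\1#p'
}

line='TensorBoard 2.10.1 at http://localhost:6007/ (Press CTRL+C to quit)'
port=$(printf '%s\n' "$line" | tb_port)

# Print the forwarding command to run on the local machine.
printf 'ssh -N -L %s:localhost:%s ${username}@${remote_host}\n' "$port" "$port"
# prints: ssh -N -L 6007:localhost:6007 ${username}@${remote_host}
```

If port 6007 is already taken locally, the first number after `-L` can be changed to any free local port, and the browser URL changes accordingly.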