---
sidebar_label: Fault Diagnosis and Analysis
sidebar_position: 9
slug: /fault_diagnosis_and_analysis
---
The JuiceFS client outputs logs for troubleshooting while it is running. The log levels, from low to high, are DEBUG, INFO, WARNING, ERROR and FATAL; by default only logs at INFO level and above are output. To get DEBUG level logs, enable them explicitly when running the JuiceFS client, for example by adding the `--debug` option.
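For example, a minimal sketch of mounting with debug logging enabled (the metadata URL and mount point below are placeholders):

```shell
# Mount with DEBUG level logs enabled; replace the metadata URL and mount point with your own
$ juicefs mount --debug redis://localhost /mnt/jfs
```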
Different JuiceFS clients obtain logs in different ways, which are described below.
When the JuiceFS file system is mounted with the `-d` option (meaning it runs in the background), logs are output to syslog and to `/var/log/juicefs.log` (requires client v0.15 and above, see the `--log` option). Depending on the operating system you are using, you can get the logs with different commands:
```shell
# macOS
$ syslog | grep 'juicefs'

# Debian based system
$ cat /var/log/syslog | grep 'juicefs'

# CentOS based system
$ cat /var/log/messages | grep 'juicefs'

# All systems (requires JuiceFS v0.15+)
$ tail -n 100 /var/log/juicefs.log
```
You can use the `grep` command to filter different levels of logs for performance analysis or troubleshooting:
```shell
$ cat /var/log/syslog | grep 'juicefs' | grep '<INFO>'
$ cat /var/log/syslog | grep 'juicefs' | grep '<WARNING>'
$ cat /var/log/syslog | grep 'juicefs' | grep '<ERROR>'
$ cat /var/log/syslog | grep 'juicefs' | grep '<FATAL>'
```
Depending on the version of the JuiceFS CSI Driver you are using, there are different ways to obtain logs. For details, please refer to the CSI Driver documentation.
The S3 gateway only supports running in the foreground, so client logs are output directly to the terminal. If you are deploying the S3 gateway in Kubernetes, you need to view the logs for the corresponding pod.
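For example, a sketch of viewing those logs with `kubectl` (the namespace and pod name are illustrative; use the ones from your own deployment):

```shell
# Print the logs of the S3 gateway pod; namespace and pod name are placeholders
$ kubectl -n kube-system logs juicefs-s3-gateway-0
```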
The logs of an application process (such as a Spark executor) using the JuiceFS Hadoop Java SDK contain the JuiceFS client logs. Because they are mixed with the logs generated by the application itself, they need to be filtered out by a specific keyword (such as `juicefs`; note that the match here is case-insensitive).
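For instance, a case-insensitive filter over an application log file (the log path is a placeholder):

```shell
# -i makes the keyword match case-insensitive, so both "juicefs" and "JuiceFS" lines are kept
$ grep -i 'juicefs' /path/to/spark-executor.log
```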
Each JuiceFS client has an access log that records every operation on the file system in detail, such as the operation type, user ID, group ID, file inode and how long the operation took. Access logs can be used for various purposes such as performance analysis, auditing and troubleshooting.
An example format of an access log is as follows:
```
2021.01.15 08:26:11.003330 [uid:0,gid:0,pid:4403] write (17669,8666,4993160): OK <0.000010>
```
The meaning of each column is:

- `2021.01.15 08:26:11.003330`: the time of the current operation
- `[uid:0,gid:0,pid:4403]`: user ID, group ID and process ID of the current operation
- `write`: operation type
- `(17669,8666,4993160)`: the input parameters of the current operation type. For example, the input parameters of the `write` operation in the example are the inode of the written file, the size of the written data, and the offset of the written file. Different operation types have different parameters. For details, please refer to the `vfs.go` file.
- `OK`: whether the current operation is successful or not; if it is unsuccessful, specific failure information is output.
- `<0.000010>`: the time (in seconds) that the current operation took
You can debug and analyze performance issues with the access log, or run `juicefs profile <mount-point>` to see real-time statistics derived from it. Run `juicefs profile -h` or refer to the documentation of this subcommand to learn more.
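For example, a quick look at real-time statistics for a mount point (the mount point path is a placeholder):

```shell
# Aggregate the access log of the given mount point into real-time operation statistics
$ juicefs profile /mnt/jfs
```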
Different JuiceFS clients obtain the access log in different ways, which are described below.
There is a virtual file named `.accesslog` in the root directory of the JuiceFS file system mount point. Its contents can be viewed with the `cat` command (the command will not exit), for example (assuming the root directory of the mount point is `/jfs`):
```shell
$ cat /jfs/.accesslog
2021.01.15 08:26:11.003330 [uid:0,gid:0,pid:4403] write (17669,8666,4993160): OK <0.000010>
2021.01.15 08:26:11.003473 [uid:0,gid:0,pid:4403] write (17675,198,997439): OK <0.000014>
2021.01.15 08:26:11.003616 [uid:0,gid:0,pid:4403] write (17666,390,951582): OK <0.000006>
```
Please refer to the CSI Driver documentation to locate the mount pod or CSI Driver pod, depending on the version of the JuiceFS CSI Driver you are using, and view the `.accesslog` file in the root directory of the JuiceFS mount point inside that pod. The mount point path in the pod is `/jfs/<pv_volumeHandle>`. For example, if the name of the mount pod is `juicefs-1.2.3.4-pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373`, then `<pv_volumeHandle>` is `pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373`, and you can use the following command to view the access log:
```shell
$ kubectl -n kube-system exec juicefs-1.2.3.4-pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373 -- cat /jfs/pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373/.accesslog
```
You need to add the `--access-log` option when starting the S3 gateway to specify the path of the access log. By default, the S3 gateway does not output an access log.
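For example, a sketch of starting the gateway with an access log enabled (the credentials, metadata URL, listen address and log path are all placeholders):

```shell
# Credentials for the gateway's S3 API (illustrative values)
$ export MINIO_ROOT_USER=admin
$ export MINIO_ROOT_PASSWORD=12345678
# Start the gateway and write the access log to a file
$ juicefs gateway --access-log /var/log/juicefs-gateway-access.log redis://localhost localhost:9000
```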
You need to add the `juicefs.access-log` configuration item to the client configuration of the JuiceFS Hadoop Java SDK to specify the path of the access log output; the access log is not output by default.
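As an illustration only, assuming the option is passed to the Hadoop client through Spark (the path and jar are placeholders; the SDK can also be configured in files such as `core-site.xml`):

```shell
# Spark forwards spark.hadoop.* properties to the underlying Hadoop configuration,
# so juicefs.access-log reaches the JuiceFS Hadoop SDK (path and jar are placeholders)
$ spark-submit --conf spark.hadoop.juicefs.access-log=/tmp/juicefs.access.log your-app.jar
```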
By default, JuiceFS clients listen on a local TCP port via pprof to expose runtime information such as Goroutine stack traces, CPU performance statistics and memory allocation statistics. You can find the specific port the current JuiceFS client is listening on with a system command such as `lsof`:
:::note
If JuiceFS is mounted by the root user, you need to add `sudo` before the `lsof` command.
:::
```shell
$ lsof -i -nP | grep LISTEN | grep juicefs
juicefs 32666 user    8u  IPv4 0x44992f0610d9870b      0t0  TCP 127.0.0.1:6061 (LISTEN)
juicefs 32666 user    9u  IPv4 0x44992f0619bf91cb      0t0  TCP 127.0.0.1:6071 (LISTEN)
juicefs 32666 user   15u  IPv4 0x44992f062886fc5b      0t0  TCP 127.0.0.1:9567 (LISTEN)
```
By default, pprof listens on port numbers starting at 6060 and ending at 6099, so the actual port number in the above example is 6061. Once you have the listening port number, you can view all the available runtime information at `http://localhost:<port>/debug/pprof`; some important runtime information is as follows:
- Goroutine stack information: `http://localhost:<port>/debug/pprof/goroutine?debug=1`
- CPU performance statistics: `http://localhost:<port>/debug/pprof/profile?seconds=30`
- Memory allocation statistics: `http://localhost:<port>/debug/pprof/heap`
To make it easier to analyze this runtime information, you can save it locally, e.g.:
```shell
$ curl 'http://localhost:<port>/debug/pprof/goroutine?debug=1' > juicefs.goroutine.txt
$ curl 'http://localhost:<port>/debug/pprof/profile?seconds=30' > juicefs.cpu.pb.gz
$ curl 'http://localhost:<port>/debug/pprof/heap' > juicefs.heap.pb.gz
```
If you have the `go` command installed, you can analyze the saved profiles directly with the `go tool pprof` command, for example to analyze the CPU performance statistics:
```shell
$ go tool pprof 'http://localhost:<port>/debug/pprof/profile?seconds=30'
Fetching profile over HTTP from http://localhost:<port>/debug/pprof/profile?seconds=30
Saved profile in /Users/xxx/pprof/pprof.samples.cpu.001.pb.gz
Type: cpu
Time: Dec 17, 2021 at 1:41pm (CST)
Duration: 30.12s, Total samples = 32.06s (106.42%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 30.57s, 95.35% of 32.06s total
Dropped 285 nodes (cum <= 0.16s)
Showing top 10 nodes out of 192
      flat  flat%   sum%        cum   cum%
    14.73s 45.95% 45.95%     14.74s 45.98%  runtime.cgocall
     7.39s 23.05% 69.00%      7.41s 23.11%  syscall.syscall
     2.92s  9.11% 78.10%      2.92s  9.11%  runtime.pthread_cond_wait
     2.35s  7.33% 85.43%      2.35s  7.33%  runtime.pthread_cond_signal
     1.13s  3.52% 88.96%      1.14s  3.56%  runtime.nanotime1
     0.77s  2.40% 91.36%      0.77s  2.40%  syscall.Syscall
     0.49s  1.53% 92.89%      0.49s  1.53%  runtime.memmove
     0.31s  0.97% 93.86%      0.31s  0.97%  runtime.kevent
     0.27s  0.84% 94.70%      0.27s  0.84%  runtime.usleep
     0.21s  0.66% 95.35%      0.21s  0.66%  runtime.madvise
```
Runtime information can also be exported as visual charts for more intuitive analysis. Charts can be exported in various formats such as HTML, PDF, SVG and PNG. For example, the command to export memory allocation statistics as a PDF file is as follows:
:::note
The export to visual chart function relies on Graphviz, so please install it first.
:::
```shell
$ go tool pprof -pdf 'http://localhost:<port>/debug/pprof/heap' > juicefs.heap.pdf
```
For more information about pprof, please see the official documentation.
Pyroscope is an open source continuous profiling platform. It will help you:
- Find performance issues and bottlenecks in your code
- Resolve issues with high CPU utilization
- Understand the call tree of your application
- Track changes over time
JuiceFS supports using the `--pyroscope` option to pass in the Pyroscope server address; metrics are then pushed to the server every 10 seconds. If permission verification is enabled on the server, the verification API Key can be passed in through the environment variable `PYROSCOPE_AUTH_TOKEN`:
```shell
$ export PYROSCOPE_AUTH_TOKEN=xxxxxxxxxxxxxxxx
$ juicefs mount --pyroscope http://localhost:4040 redis://localhost /mnt/jfs
$ juicefs dump --pyroscope http://localhost:4040 redis://localhost dump.json
```