-
Notifications
You must be signed in to change notification settings - Fork 5
Page fault test microbenchmark
License
gormanm/pft
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
The "overall operation" section is way out of date, sorry. Especially since restoration of multi-process version. Update is on the TODO list. Version history below is more or less up to date. "pft" : Page Fault Test Originally by Chirstoph Lameter. Modified by Lee Schermerhorn [v0.2+] to measure relative overhead of memory policy changes: reference counting, ... See "usage" in program source for command line options. Overall operation: 1) parse options, calculate memory sizes, create "communications" area in shared memory. 2) fork a child to run test "launch()" function. Originally, we did this so we could use RUSAGE_CHILDREN to obtain accumulated statistics -- faults, ... However, this included cpu time and faults from other than the test loops that we're interested in. We've changed how we capture the wall clock time and the rusage, but we still launch() the test workers [threads or processes] from a child process. Wait for the child to exit. 3) launch() function [in child]: a) creates test region of specified size and type [anon vs shm]; optionally "mbind()s" test region to install a vma/shared mempolicy. However, launch() does NOT touch the region, so as not to fault in any pages. Note: if supported by kernel [requires patch] and enabled at build time [e.g., NOCLEAR=-DUSE_NOCLEAR on make command line], '-Z' option will cause launch() to mark region as "noclear", eliminating the page zeroing overhead from the test. b) sets launch() process' scheduler policy to SCHED_FIFO and it's priority to 1 greater than will be used for worker threads/processes. c) creates/starts specified number of threads to run the "test()" function and "waits" for all threads to become "READY". Threads runs SCHED_FIFO on priority below the launch() thread. The objective is to allow a worker to run on each cpu. The launch() process/thread runs as "worker 0". d) gives all threads the "go ahead"; then run the test loop as worker 0. e) When the test loop completes, loops waiting waiting to reap al workers As each worker terminates, select the start time [and rusage] from the worker that started earliest, and the end time [and rusage] from the worker that finished last. Note: for kernels that support RUSAGE_THREAD, pft_mpol uses that to fetch each worker's rusage and sums the faults and cpu times for all workers. f) returns/exits 4) test threads: a) wait for "go ahead" from launch(). b) snap thread start time and start rusage into this thread's thread_info. c) The measured test loop: either bzero() the thread's respective test region or touch the specified number of cachelines per page therein. d) snap the thread's end time and end rusage into this thread's thread_info. e) sleep for specified delay -- default 2 sec. f) indicates 'DONE in the per thread info area. g) exits. 5) main program returns from wait() when child/launch() exits, and computes/emits results. ---------------------------- Helper scripts: See Scripts/README ------------------------------- Version history: 0.01 - first version of pft_mpol for testing mempolicy fault/allocation overhead. 0.02 - added support for agr [xmgrace] "tag" 0.02a - enhancement to helper scripts to support multiple runs per thread count and added usage string to pft_mpol. General "cleanup". 0.03 - use mmap(MAP_ANONYMOUS) for anon test memory and parent/child comm area. Added pft_mmap(), valloc_{private|shared}(), ... for this purpose. Add support for "affinitizing" pft to cpus in hopes of obtaining more repeatable results. Factor out test memory allocation and thread creation from pft results: snapshot launch() rusage just before giving threads the "go ahead", subtract this usage from final RUSAGE_CHILDREN. Added helper functions to compute elapses, cpu times as double to "improve readability" of final results reporting. 0.04 - In a further attempt to obtain repeatable and accurate results, changed to capture the end rusage in each of the test threads themselves--just after the test loops. Snap the start time and start rusage of each thread into thread_info after getting the "go ahead" from the parent, before the page fault test loop. Snap the end time and rusage after the test loop. In the parent, as threads join, select the start time and start rusage from the thread with the earlies start time, and the end time and end rusage from the thread with the latest end wall clock time. Compute difference between selected end and start rusage for determining the cpu time used and the number of faults in the test loop 0.05 - Add option to set scheduler policy of test threads, including the launch thread to SCHED_FIFO. Launch thread will run at one RT (SCHED_FIFO) priority higher to maintain control while starting threads, if nr_threads > nr_online_cpus. N.B., This is fragile. And, SCHED_FIFO seems to introduce a wall clock time delay into the tests. Under investigation. Run "thread 0" test in launch() thread, after giving other threads the go-ahead. 0.06 - Add '-L' [SHM_LOCK] and '-l' [mlock] options to test fault rate for noreclaim/mlock patches. ['-L' also in 0.05?] 0.07 - slight mods to emitted TAG and wrapper scripts to ease parsing for new pft_plot script. 0.08 - start adding back multi-process version to avoid mmap_sem on high cpu counts. Use RUSAGE_THREAD if available. 0.09 - added support for "cpus_allowed" constraint on where tests run. pft will read /proc/self/status, if possible, and extract Cpus_allowed. If the numbers of cpus allowed is < the number on-line, pft will use the allowed cpus. Otherwise, it will use cpus 0..nr_cpus_online-1. Allows taskset, numactl or cpuset constraint on pft cpu usage. 0.10 - added support for '-Z' flag to request kernel NOT to clear pages on allocations, as clear_page() tends to dominate the profiles and time to fault in a new page. This will, one hopes, give us better visibility into the behavior of the allocation path, separate from clear_page() which should be fairly constant release-to-release for a given platform. This feature requires a kernel patch. See ./Kernel/* Uses a mbind() flag--MPOL_MF_NOCLEAR--to request no clearing of a range of memory. Actually just sets "noclear" for the vma that intersects the mbind() 'start' address, if any. 0.11 - use 'cpus_allowed' handling from aim/multitask. Newer version. + option to use /dev/zero mapping instead of MAP_ANONYMOUS. add '-M' [multimap] support -- mmap() separate anon regions for each test to eliminate anon_vma sharing [WORK IN PROGRESS -- i.e, not quite working] 0.12 - rework cpus_allowed to use more libnuma support instead of parsing /proc/<pid>/status N.B., uses 'numa_num_task_cpus()' which is broken in libnuma-2.0.3. Requires 2.0.4 or patched libnuma. See Libnuma/* tweak various scripts: pft_plot.py [no legend option], pft_per_node, pft-task_thread => pft_task_thread.
About
Page fault test microbenchmark
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published