Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

理解 Linux 等待队列 #7

Open
JemmyH opened this issue Jun 8, 2021 · 3 comments
Open

理解 Linux 等待队列 #7

JemmyH opened this issue Jun 8, 2021 · 3 comments
Assignees
Labels
documentation Improvements or additions to documentation

Comments

@JemmyH
Copy link
Owner

JemmyH commented Jun 8, 2021

进程没活干的时候让它“睡眠”,等有活干的时候再“唤醒”它。怎么找到这个已经睡着的进程让它醒来?

这一切,都是通过一个叫 等待队列 的数据结构实现的。

@JemmyH
Copy link
Owner Author

JemmyH commented Jun 9, 2021

阻塞与非阻塞

阻塞(等待队列,中断通知)非阻塞(轮询,异步通知) 是访问设备的两种方式。

阻塞 是指进程访问某个资源时,资源没有就绪,此时进程被挂起在当前资源的等待队列里,调用者表现为“睡着”了一样;当资源就绪时,再唤醒等待队列中的所有进程,进行继续执行。

非阻塞是在资源不就绪时进程不挂起,要么返回错误,要么一直轮询等待,直到资源就绪,在这个过程中,进程一直在运行而非挂起。

要知道的是,阻塞并不是 效率低,如果进程持续轮询等待,会不断浪费 CPU 资源,如果让进程挂起,就可以将 CPU 资源让给其他进程。被挂起的进程进入休眠状态,等资源就绪时,通常会伴随一个中断,然后唤醒休眠的进程继续工作。

@JemmyH
Copy link
Owner Author

JemmyH commented Jun 9, 2021

等待队列的实现

在 Linux 内核中,等待队列通过一个双向循环链表来实现。这个双向循环链表由 链表头(wait_queue_head_t)链表结点(wait_queue_entry) 组成,看下数据结构:

// tools/include/linux/types.h
// 链表中用于链接前后结点的结构
struct list_head {
	struct list_head *next, *prev;
}

// include/linux/wait.h
struct wait_queue_head {
	spinlock_t		lock;  // 自旋锁
	struct list_head	head; // 指向链表队列中的第一个和最后一个结点
};
typedef struct wait_queue_head wait_queue_head_t;  // 链表头

// 链表中的一个结点
struct wait_queue_entry {
	unsigned int		flags; // 标志,如 WQ_FLAG_EXCLUSIVE,表示等待的进程应该独占资源(解决惊群现象)
	void			*private;  // 等待进程相关信息,如 task_struct
	wait_queue_func_t	func; // 唤醒函数
	struct list_head	entry; // 前后结点
};

他们的关系可以用下面这张图描述:
image

抛开唤醒机制,单纯看循环双链表的实现,有两个方法:

init_waitqueue_head

// kernel/sched/wait.c
void __init_waitqueue_head(struct wait_queue_head *wq_head, const char *name, struct lock_class_key *key)
{
	spin_lock_init(&wq_head->lock);
	lockdep_set_class_and_name(&wq_head->lock, key, name);
	INIT_LIST_HEAD(&wq_head->head);
}

/**
 * INIT_LIST_HEAD - Initialize a list_head structure
 * @list: list_head structure to be initialized.
 *
 * Initializes the list_head to point to itself.  If it is a list header,
 * the result is an empty list.
 */
static inline void INIT_LIST_HEAD(struct list_head *list)
{
	WRITE_ONCE(list->next, list);
	list->prev = list;
}

很简单的一个函数:初始化链表头,并将指针的 prevnext 都指向自己。

add_wait_queue

void add_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry)
{
	...
	__add_wait_queue(wq_head, wq_entry);
	...
}

static inline void __add_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry)
{
	struct list_head *head = &wq_head->head;
	struct wait_queue_entry *wq;

        // 检查 flags 相关
	list_for_each_entry(wq, &wq_head->head, entry) {
		if (!(wq->flags & WQ_FLAG_PRIORITY))
			break;
		head = &wq->entry;
	}
	__list_add(&wq_entry->entry, head, head->next);
}

static inline void __list_add(struct list_head *new,
			      struct list_head *prev,
			      struct list_head *next)
{
	if (!__list_add_valid(new, prev, next))
		return;

	next->prev = new;
	new->next = next;
	new->prev = prev;
	WRITE_ONCE(prev->next, new);
}

将链表结点元素 wq_entry 通过 头插入 的方式加到 wq_head 所连接的循环链表中。

remove_wait_queue

void remove_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry)
{
	...
	__remove_wait_queue(wq_head, wq_entry);
	...
}

static inline void __remove_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry)
{
	list_del(&wq_entry->entry);
}

static inline void list_del(struct list_head *entry)
{
	__list_del_entry(entry);
	entry->next = LIST_POISON1;  // 固定值,之后会被回收
	entry->prev = LIST_POISON2;
}

static inline void list_del(struct list_head *entry)
{
	__list_del(entry->prev, entry->next);
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

static inline void __list_del(struct list_head * prev, struct list_head * next)
{
	next->prev = prev;
	WRITE_ONCE(prev->next, next);
}

从双链表中移除一个结点,改变这个结点的 prevnext 指向的结点之间的关系,然后将要删掉的结点的指针指向一个固定值:

/*
 * These are non-NULL pointers that will result in page faults
 * under normal circumstances, used to verify that nobody uses
 * non-initialized list entries.
 */
# define POISON_POINTER_DELTA 

#define LIST_POISON1  ((void *) 0x100 + POISON_POINTER_DELTA)
#define LIST_POISON2  ((void *) 0x122 + POISON_POINTER_DELTA)

@JemmyH JemmyH self-assigned this Jun 10, 2021
@JemmyH JemmyH added the documentation Improvements or additions to documentation label Jun 10, 2021
@JemmyH
Copy link
Owner Author

JemmyH commented Jun 28, 2021

等待 与 唤醒

主要有以下相关函数:

wait_event(queue,condition);等待以queue为等待队列头等待队列被唤醒condition必须满足否则阻塞 
wait_event_interruptible(queue,condition);可被信号打断 
wait_event_timeout(queue,condition,timeout);阻塞等待的超时时间时间到了不论condition是否满足都要返回 
wait_event_interruptible_timeout(queue,condition,timeout);  有超时同时信号可被打断

我们着重关注第一个:

/**
 * wait_event - sleep until a condition gets true
 * @wq_head: the waitqueue to wait on
 * @condition: a C expression for the event to wait for
 *
 * The process is put to sleep (TASK_UNINTERRUPTIBLE) until the
 * @condition evaluates to true. The @condition is checked each time
 * the waitqueue @wq_head is woken up.
 *
 * wake_up() has to be called after changing any variable that could
 * change the result of the wait condition.
 */
#define wait_event(wq_head, condition)						\
do {										\
	might_sleep();								\
	if (condition)								\
		break;								\
	__wait_event(wq_head, condition);					\
} while (0)

我们将 __wait_event 展开:

#define ___wait_event(wq_head, condition, state, exclusive, ret, cmd)		\
({										\
	__label__ __out;							\
	struct wait_queue_entry __wq_entry;					\
	long __ret = ret;	/* explicit shadow */				\
										\
	init_wait_entry(&__wq_entry, exclusive ? WQ_FLAG_EXCLUSIVE : 0);	\
	for (;;) {								\
		long __int = prepare_to_wait_event(&wq_head, &__wq_entry, state);\
										\
		if (condition)							\
			break;							\
										\
		if (___wait_is_interruptible(state) && __int) {			\
			__ret = __int;						\
			goto __out;						\
		}								\
										\
		cmd;								\
	}									\
	finish_wait(&wq_head, &__wq_entry);					\
__out:	__ret;									\
})

看看 init_wait_entry 中干了什么:

void init_wait_entry(struct wait_queue_entry *wq_entry, int flags)
{
	wq_entry->flags = flags;
	wq_entry->private = current;
	wq_entry->func = autoremove_wake_function;
	INIT_LIST_HEAD(&wq_entry->entry);
}

int autoremove_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int sync, void *key)
{
	int ret = default_wake_function(wq_entry, mode, sync, key);

	if (ret)
		list_del_init_careful(&wq_entry->entry);

	return ret;
}


int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int wake_flags,
			  void *key)
{
	WARN_ON_ONCE(IS_ENABLED(CONFIG_SCHED_DEBUG) && wake_flags & ~WF_SYNC);
	return try_to_wake_up(curr->private, mode, wake_flags);
}

/**
 * try_to_wake_up - wake up a thread
 * @p: the thread to be awakened
 * @state: the mask of task states that can be woken
 * @wake_flags: wake modifier flags (WF_*)
 *
 * Conceptually does:
 *
 *   If (@state & @p->state) @p->state = TASK_RUNNING.
 *
 * If the task was not queued/runnable, also place it back on a runqueue.
 *
 * This function is atomic against schedule() which would dequeue the task.
 *
 * It issues a full memory barrier before accessing @p->state, see the comment
 * with set_current_state().
 *
 * Uses p->pi_lock to serialize against concurrent wake-ups.
 *
 * Relies on p->pi_lock stabilizing:
 *  - p->sched_class
 *  - p->cpus_ptr
 *  - p->sched_task_group
 * in order to do migration, see its use of select_task_rq()/set_task_cpu().
 *
 * Tries really hard to only take one task_rq(p)->lock for performance.
 * Takes rq->lock in:
 *  - ttwu_runnable()    -- old rq, unavoidable, see comment there;
 *  - ttwu_queue()       -- new rq, for enqueue of the task;
 *  - psi_ttwu_dequeue() -- much sadness :-( accounting will kill us.
 *
 * As a consequence we race really badly with just about everything. See the
 * many memory barriers and their comments for details.
 *
 * Return: %true if @p->state changes (an actual wakeup was done),
 *	   %false otherwise.
 */
static int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags);

重点在最后的 try_to_wake_up 函数,这个函数负责将进程的状态设置为 TASK_RUNNING, 然后将其放在 可执行队列 中。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

1 participant