Kernel 的魔法 —— container_of

注意：这篇文章上次更新于523天前，文章内容可能已经过时。

最近在看 Linux 中双向链表的实现，发现大部分代码都是能看懂的，和上学时学的内容基本一致。

什么头插法，尾插法啊乱七八糟的。

但我注意到有一点似乎不一样，在学校时，我们学的链表包括数据域和指针域，但在 Linux 中双向链表的节点是这样定义的：

struct list_head {
	struct list_head *next, *prev;
};

没有数据域

没有数据域的链表该怎么使用呢？

首先，在我过往的记忆中，链表的指针是为了找到下一个节点的地址，从而能够访问到下一个节点的数据。

就像是这样：

struct node{
    int id;
    void *data;
    struct node *next;
    struct node *prev;
};

画图的话就是下面这种状态。

这种方法可以方便地访问下一个节点的数据，像是这样node->next->id

但缺点就是，针对不同类型的数据，都要创建不同类型的节点。

那么 Linux 中是如何使用双向链表的呢？

我在高通 WLAN 驱动程序的代码中找到了 list.h 的使用示例

https://github.com/WANG-Guangxin/wlan-dirver/blob/master/qca-wifi-host-cmn/scheduler/inc/scheduler_api.h#L85

/**
 * struct scheduler_msg: scheduler message structure
 * @type: message type
 * @reserved: reserved field
 * @bodyval: message body val
 * @bodyptr: message body pointer based on the type either a bodyptr pointer
 *     into memory or bodyval as a 32 bit data is used. bodyptr is always a
 *     freeable pointer, one should always make sure that bodyptr is always
 *     freeable.
 * Messages should use either bodyptr or bodyval; not both !!!
 * @callback: callback to be called by scheduler thread once message is posted
 *   and scheduler thread has started processing the message.
 * @flush_callback: flush callback which will be invoked during driver unload
 *   such that component can release the ref count of common global objects
 *   like PSOC, PDEV, VDEV and PEER. A component needs to populate flush
 *   callback in message body pointer for those messages which have taken ref
 *   count for above mentioned common objects.
 * @node: list node for queue membership
 * @queue_id: Id of the queue the message was added to
 * @queue_depth: depth of the queue when the message was queued
 * @queued_at_us: timestamp when the message was queued in microseconds
 */
struct scheduler_msg {
	uint16_t type;
	uint16_t reserved;
	uint32_t bodyval;
	void *bodyptr;
	scheduler_msg_process_fn_t callback;
	scheduler_msg_process_fn_t flush_callback;
	qdf_list_node_t node; // here
#ifdef WLAN_SCHED_HISTORY_SIZE
	QDF_MODULE_ID queue_id;
	uint32_t queue_depth;
	uint64_t queued_at_us;
#endif /* WLAN_SCHED_HISTORY_SIZE */
};

scheduler_msg 这个结构体中有许多乱七八糟的数据，像是 type reserved 等等

其中一个成员叫做 node , 类型是 qdf_list_node_t, 看名字也可以猜出一半，它就是链表的节点类型。

我们追溯一下它到底是什么类型：

typedef __qdf_list_node_t qdf_list_node_t;

typedef struct list_head __qdf_list_node_t;

所以，qdf_list_node_t 就是 Linux 中的 list_head

于是，我们可以看到在 WLAN 驱动代码中它是这样使用 Linux 中的双向链表的

在自己的 msg 结构体中有一个成员是 list_node 类型，内存布局应该向下面这样

如果是这样的布局，那就引发一个新的问题

既然指针串起来的是 list_node 类型，而不是 msg 类型，那么即使通过 next 指针找到了下一个节点，找到的也是下一个msg 的 list_node ，而不是下一个 msg 啊，该怎么访问 msg 的数据呢？

先从代码中看一下我画的 msg 内存布局对不对，于是找到了 scheduler_mq_put 函数，顾名思义，这是将 msg 放到 msg queue 中的函数，所谓的 msg queue 其实就是双向链表。

https://github.com/WANG-Guangxin/wlan-dirver/blob/master/qca-wifi-host-cmn/scheduler/src/scheduler_core.c#L251

void scheduler_mq_put(struct scheduler_mq_type *msg_q,
		      struct scheduler_msg *msg)
{
	qdf_spin_lock_irqsave(&msg_q->mq_lock);
	sched_history_queue(msg_q, msg);
	qdf_list_insert_back(&msg_q->mq_list, &msg->node);
	qdf_spin_unlock_irqrestore(&msg_q->mq_lock);
}

可以看到 qdf_list_insert_back 的第二个参数就是 msg 的链表节点成员，也就是说被串起来的就是 node，而不是 msg 本身。

这种做法肯定是反数据结构教科书的，至少我没在书本上见到这样使用链表的。

于是如何根据 msg 的 node 成员访问到 msg 的其他数据成员就成了问题，我当然知道它们内存上是连续的，但是靠强制移动指针然后转换成其他成员的类型也太不优雅了。

我想知道 Linux 的方案，又想到刚刚那个函数名字叫做 put, 那我找找 get 是如何实现的就好了。

https://github.com/WANG-Guangxin/wlan-dirver/blob/master/qca-wifi-host-cmn/scheduler/src/scheduler_core.c#L269

struct scheduler_msg *scheduler_mq_get(struct scheduler_mq_type *msg_q)
{
	QDF_STATUS status;
	qdf_list_node_t *node;

	qdf_spin_lock_irqsave(&msg_q->mq_lock);
	status = qdf_list_remove_front(&msg_q->mq_list, &node);
	qdf_spin_unlock_irqrestore(&msg_q->mq_lock);

	if (QDF_IS_STATUS_ERROR(status))
		return NULL;

	return qdf_container_of(node, struct scheduler_msg, node);
}

首先，看返回值类型，是 scheduler_msg 没错。

关键代码 qdf_list_remove_front(&msg_q->mq_list, &node); 删除链表的头部，然后把被删除的节点地址给 node 变量，然后就 return qdf_container_of(node, struct scheduler_msg, node); 了。

也就是说 qdf_container_of 的返回值就是 scheduler_msg

qdf_container_of 实现了入参是结构体成员的类型，返回值是结构体类型的指针。

既然知道了这个函数的功能，我们可以大概推测一下这个函数是如何实现的。

已知一个结构体成员变量的地址，获得这个结构体首地址的必要条件是什么？

这个结构体类型的内存结构，也就是要知道这个结构体的类型

既然知道了这个成员变量的首地址，如果知道它是哪个成员，就能够根据偏移量推算出结构体类型的首地址。

以上两个必要条件就是 qdf_container_of 的第二个和第三个参数。

在 WLAN 驱动的代码中，它最终的定义是这样的。

https://github.com/WANG-Guangxin/wlan-dirver/blob/master/qcacld-3.0/core/dp/htt/htt_h2t.c#L56

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - (char *)(&((type *)0)->member)))

第一个参数：成员变量的指针

第二个参数：结构体类型

第三个参数：成员变量的名字

qdf_container_of(node, struct scheduler_msg, node) 这里第一个 node 是指针，第三个 node 表示名字。

那 container_of 做了些什么呢？

让我们有请 GPT 来给出专业解答吧：

这是一个在C语言中用于获得结构体中某个成员所在的结构体对象的宏。这个宏的功能正如它的名字"container_of"所表达的含义，其主要功能是找到某个成员（member）所在的容器（即包含它的结构体）。

在宏的参数中：

ptr 是指向目标成员的指针。

type 是包含这个成员的结构体类型。

member 是目标成员的名字。

该宏主要利用的是C语言中，对从0地址取得对应元素偏移量，并且利用这个偏移量从而找到这个元素实际可能存放的结构体的起始地址。

具体细解如下：

&((type *)0)->member：这一部分是做了一些计算得到了目标成员在结构体中的偏移量。首先，(type *)0把0转换成了一个类型为type的空指针。然后它去寻找member的地址，因为这个指针的基地址为0，所以这个地址就是member在结构体中的偏移量。

(char *)(ptr) - (char *)(&((type *)0)->member) 这个步骤算出了目标成员的地址 ptr 里面的数据减去该成员的偏移量，得到的其实就是包含此成员的结构体的起始地址。因为在C语言中的内存布局里，一个结构体中的成员是按照他们在结构体定义中的顺序一次存放在内存中的。

((type *)((char *)(ptr) - (char *)(&((type *)0)->member))) 这一步事将计算得来的包含此成员的结构体的起始地址强制转换为type类型的指针。

因此，整个宏的作用就是找到并返回给定成员所在的结构体的地址。

最后，

其实 list.h 中的 container_of 和高通这个定义在形式上还有点区别，但本质都是一样的。

#define container_of(ptr, type, member) ({                      \
	const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
	(type *)( (char *)__mptr - offsetof(type,member) );})