|
| 1 | +# CVE-2023-4015 |
| 2 | + |
| 3 | +This document briefly describes the exploit. For more technical details, see the exploit source code. |
| 4 | + |
| 5 | +Triggering the vulnerability requires `CAP_NET_ADMIN`. We obtain it by running inside a namespace sandbox. |
| 6 | +In addition, so that our kernel heap allocations do not spread across multiple per-CPU slabs, we pin our process to a single CPU. |
| 7 | + |
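The CPU pinning step can be sketched as follows. This is a minimal illustration, not the exploit's own code: the helper name `pin_to_single_cpu` and the choice of the lowest allowed CPU are assumptions made for the example. It relies on `os.sched_setaffinity`, which is available on Linux only.

```python
import os

def pin_to_single_cpu(cpu=None):
    """Pin the calling process to one CPU and return the CPU that was chosen."""
    allowed = os.sched_getaffinity(0)      # CPUs this process may currently run on
    if cpu is None:
        cpu = min(allowed)                 # assumption: any single allowed CPU will do
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})         # pid 0 means the calling process
    return cpu

if __name__ == "__main__":
    chosen = pin_to_single_cpu()
    print("pinned to CPU", chosen)
```

With the process restricted to one CPU, later allocations and frees are served from that CPU's slab caches, which is the property the text relies on.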
| 8 | +## Triggering the vulnerability |
| 9 | + |
| 10 | +We aim to free an `nft_chain` object that resides in the `kmalloc-cg-128` cache. |
| 11 | + |
| 12 | +- Batch 1 |
| 13 | + - Create a table `t` |
| 14 | + - Create a chain `c1` |
| 15 | + - Create a chain `c2` hosting a rule `r2` that has an immediate expression `e2` which binds to `c1` |
| 16 | + + `c1->use == 1` |
| 17 | +- Batch 2 |
| 18 | + - Create a chain `c3` hosting a rule `r3` that has an immediate expression `e3` which binds to `c1` |
| 19 | + + `c3` should have `NFT_CHAIN_BINDING` flag |
| 20 | + + `c1->use = 2` |
| 21 | + - Create a chain `c4` hosting a rule `r4` that has an immediate expression `e4` which binds to `c3` |
| 22 | + + However, we make the rule creation fail by adding another immediate expression that binds to a non-existent chain |
| 23 | + + At this point, `nft_rule_expr_deactivate` will be called on `r4` with `phase = NFT_TRANS_PREPARE_ERROR` |
| 24 | + + `nft_immediate_deactivate` will be called on `e4` |
| 25 | + + Since `c3` has `NFT_CHAIN_BINDING` flag, `nft_rule_expr_deactivate` will be called on `r3`, which will also deactivate `e3` |
| 26 | + + `c1->use = 1` because `e3`, which is bound to `c1`, was deactivated |
| 27 | + - Because the batch failed, transaction rollback will be executed with `phase = NFT_TRANS_ABORT` |
| 28 | + + `c3`, `r3`, `e3` will be deactivated again |
| 29 | + + `c1->use = 0` |
| 30 | +- Batch 3 |
| 31 | + - Because `c1->use = 0`, we can delete chain `c1` |
| 32 | + |
| 33 | +After this, we have a dangling reference in `e2` to the freed chain `c1`. |
| 34 | +The names used here are for demonstration purposes only. They differ in the exploit. |
| 35 | +We also create a `spray` chain that later hosts the `nft_rule` objects used to spray the heap (mostly to avoid accidentally reclaiming the freed chunk by creating a new chain). |
| 36 | + |
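The reference-count bookkeeping described above can be followed with a toy model. This is a simplified sketch under stated assumptions, not kernel code: the `Chain` and `Rule` classes and the `activate`, `deactivate` and `simulate` helpers are hypothetical stand-ins that only track the `use` counter and the `NFT_CHAIN_BINDING` recursion described in the batch list.

```python
class Chain:
    def __init__(self, name, binding=False):
        self.name = name
        self.binding = binding    # stands in for the NFT_CHAIN_BINDING flag
        self.use = 0              # stands in for chain->use
        self.rules = []

class Rule:
    def __init__(self, *targets):
        self.targets = list(targets)   # one target chain per immediate expression

def activate(rule):
    for chain in rule.targets:
        chain.use += 1

def deactivate(rule):
    for chain in rule.targets:
        chain.use -= 1
        if chain.binding:              # bound chains have their own rules deactivated too
            for inner in chain.rules:
                deactivate(inner)

def simulate():
    trace = []
    # Batch 1: c2 hosts r2, whose expression e2 binds to c1.
    c1, c2 = Chain("c1"), Chain("c2")
    r2 = Rule(c1)
    c2.rules.append(r2)
    activate(r2)
    trace.append(("batch 1", c1.use))
    # Batch 2: c3 (binding chain) hosts r3, whose expression e3 binds to c1.
    c3 = Chain("c3", binding=True)
    r3 = Rule(c1)
    c3.rules.append(r3)
    activate(r3)
    trace.append(("batch 2, r3 added", c1.use))
    # r4 binds to c3, then rule creation fails: error-path deactivation of r4.
    r4 = Rule(c3)
    activate(r4)
    deactivate(r4)
    trace.append(("batch 2, error path", c1.use))
    # Transaction abort deactivates r3 a second time.
    deactivate(r3)
    trace.append(("batch 2, abort", c1.use))
    return c1, r2, trace

if __name__ == "__main__":
    for step, use in simulate()[2]:
        print(f"{step}: c1.use = {use}")
```

In this model `c1.use` reaches zero while `r2` still lists `c1` as a target, which mirrors the dangling reference left in `e2`.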
| 37 | +## Leak kernel heap address |
| 38 | + |
| 39 | +When an immediate expression that binds to a chain is dumped, the dump includes the chain's name. |
| 40 | +When the chain is freed, the buffer containing its name is also freed. The address pointing to the name is not cleared. |
| 41 | +If we reclaim the freed name buffer, but not the freed chain, we can leak data from the start of the reclaimed object until a NULL byte. |
| 42 | +With a chunk size of 192 (`kmalloc-cg-192`), it is less likely that the address contains a NULL byte. |
| 43 | +So when creating chain `c1`, we make its name 129-192 bytes long (including the terminating NULL character). |
| 44 | + |
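The 129-192 byte range follows from how allocation sizes are rounded up to a cache size. A minimal sketch, assuming the size classes commonly seen on x86_64 builds (the `KMALLOC_SIZES` list and the `kmalloc_bucket` helper are illustrative assumptions, and the real set of caches depends on the kernel configuration):

```python
# Assumed kmalloc size classes; the actual list depends on the kernel build.
KMALLOC_SIZES = [8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192]

def kmalloc_bucket(size):
    """Return the smallest assumed size class that can hold `size` bytes."""
    for bucket in KMALLOC_SIZES:
        if size <= bucket:
            return bucket
    raise ValueError("size is larger than the largest assumed size class")

if __name__ == "__main__":
    for name_len in (128, 129, 192, 193):
        print(name_len, "->", kmalloc_bucket(name_len))
```

Under these assumptions, any name of 129 to 192 bytes lands in the 192-byte class, while 128 bytes or fewer would fall into a smaller class.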
| 45 | +We will use `nft_rule` as the spraying object to reclaim the freed name chunk because: |
| 46 | + |
| 47 | +- It is an elastic object so we can attack many caches |
| 48 | +- The elastic portion consists of a flattened expression array (up to 128 expressions) and arbitrary user data (up to 255 bytes) |
| 49 | +- The first field is a `list_head`, so we can leak the heap addresses of the next and previous rules |
| 50 | + |
| 51 | +We create a lot of rules with some user data so that the total length of the `nft_rule` struct is in range 129-192 bytes. |
| 52 | +After spraying, we request a dump of `r2`, which dumps `e2` and, if the spray reclaimed the name buffer, returns the heap address of an `nft_rule` object. |
| 53 | +If the leak fails, we will try again. |
| 54 | +We can also leak the `handle` of the rule object that reclaimed the freed name chunk. |
| 55 | +In a later stage it is used to free only the rule whose heap address we leaked. |
| 56 | + |
| 57 | +We also add an `nft_notrack` expression to each rule so that it contains a kernel pointer, which we leak in the next stage once we have the heap leak. The in-memory layout of the sprayed rules looks like this (the first 0x18 bytes are rule metadata): |
| 58 | + |
| 59 | +| Offset | Field | Value | |
| 60 | +|--------|-------|-------| |
| 61 | +| ... | | | |
| 62 | +| 0x18 | expression | `nft_notrack_ops` | |
| 63 | +| 0x20 | `nft_userdata.len` | x | |
| 64 | +| 0x21 | `nft_userdata.data` | any | |
| 65 | +| ... | | | |
| 66 | +| 0xbf | `nft_userdata.data` | any | |
| 67 | + |
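The offsets in the table above determine how much user data a sprayed rule needs in order to fall in the 129-192 byte range. A small arithmetic sketch, assuming the sizes implied by the table (0x18 bytes of rule metadata, one 8-byte expression, a 1-byte length field); the constant and function names are made up for the example:

```python
RULE_HEADER = 0x18         # rule metadata, per the table above
EXPR_SIZE = 0x08           # one expression holding only an ops pointer (assumed from the offsets)
USERDATA_LEN_FIELD = 0x01  # nft_userdata.len at offset 0x20

FIXED = RULE_HEADER + EXPR_SIZE + USERDATA_LEN_FIELD   # user data starts at 0x21

def userdata_range(lo=129, hi=192):
    """Return the (min, max) user data length that keeps the total size within [lo, hi]."""
    return lo - FIXED, hi - FIXED

if __name__ == "__main__":
    print(userdata_range())
```

The upper bound agrees with the table: user data occupying offsets 0x21 through 0xbf is 0x9f bytes long and ends exactly at 0xc0 (192).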
| 68 | +## Leak kernel base address |
| 69 | + |
| 70 | +Now that we have a heap leak and know that a kernel address lies inside that chunk, we leak it by reclaiming the freed chain with a fake chain whose name points into the leaked heap region (reminder: the freed `nft_chain` is in the `kmalloc-cg-128` cache). |
| 71 | +This time we spray using the `userdata` of `nft_table`, which can hold at most 256 bytes of arbitrary data. |
| 72 | +We create multiple `nft_table` objects with different names, each carrying 128 bytes of `userdata` laid out as follows: |
| 73 | + |
| 74 | +| Offset | `nft_chain` field | Value | Remarks | |
| 75 | +|--------|-------------------|-------|---------| |
| 76 | +| 0x0 | `list` | any | | |
| 77 | +| 0x10 | `rules.next` | heap leak | for next stage | |
| 78 | +| 0x18 | `rules.prev` | heap leak | for next stage | |
| 79 | +| ... | | | | |
| 80 | +| 0x54 | `flags` | `NFT_CHAIN_BINDING` | for next stage | |
| 81 | +| 0x58 | `name` | heap leak + `sizeof(struct nft_rule)` | where we put `nft_notrack_ops` in the sprayed rule above | |
| 82 | +| ... | | | | |
| 83 | + |
| 84 | +After spraying, we request a dump of `r2`, which dumps `e2` and, if the spray reclaimed the freed chain, returns the address of `nft_notrack_ops`. |
| 85 | + |
| 86 | +## RIP control and return to userspace |
| 87 | + |
| 88 | +Since we have the `handle` of the rule whose address was leaked, we delete that rule. |
| 89 | +Then we spray a fake `nft_rule` that also acts as a ROP chain. Remember that the deleted rule resided in the `kmalloc-cg-192` cache. |
| 90 | +We set `dlen` of the fake rule to 1 to pass the expression loop check. |
| 91 | +We craft a fake expression whose `ops` points into the leaked heap region. We need to align `ops->deactivate` with a JOP gadget. |
| 92 | +Following that, we build a ROP chain that calls `commit_creds(&init_cred)` and `switch_task_namespaces(find_task_by_vpid(getpid()), &init_nsproxy)`, then returns to userspace. |
| 93 | + |
| 94 | +After spraying, we delete the rule `r2`, which calls `nft_rule_expr_deactivate` on `e2`. Because we prepared a fake rule list for the reclaimed fake chain and set its flags to `NFT_CHAIN_BINDING`, the fake rule is deactivated and the fake expression's `deactivate` routine is called, which triggers the JOP gadget and then the ROP chain. |
| 95 | + |
| 96 | +Once back in userspace, we use `setns` to escape from the jail, then spawn a root shell using `execve`. |