
Segfault in tb_target_set_jmp_target_arm #2048

Open · futhewo opened this issue Oct 31, 2024 · 12 comments

futhewo commented Oct 31, 2024

In a very long emulation, I hit a segfault (null-pointer dereference) in qemu/tcg/aarch64/tcg-target.inc.c:tb_target_set_jmp_target.

In qemu/accel/tcg/cpu-exec.c, tb_set_jmp_target calls tb_target_set_jmp_target with tc_ptr = 0, and therefore tc_ptr + offset = 0.

tb_target_set_jmp_target then calls atomic_set((uint64_t *)jmp_addr, pair) with jmp_addr = tc_ptr + offset = 0, which segfaults.

I patched it (a dirty fix) by adding the following guard in tb_set_jmp_target:

if (TCG_TARGET_HAS_direct_jump && tb->tc.ptr) {
    […]
}

I do not know what this patch may break, but it solves the crash.
I am on commit 6ae0c97.

Feel free to ask me anything.

wtdcode (Member) commented Oct 31, 2024

Thanks for the report and investigation. Your version seems pretty old; could you try the current dev branch?

1144822034 commented:
I have encountered the same problem; this issue has been very helpful to me.

futhewo (Author) commented Nov 11, 2024

Unfortunately, I cannot try a newer unicorn version, sorry.
I am using unicorn as part of unicornafl++, which pins the version I pointed out.

wtdcode (Member) commented Nov 11, 2024

unicornafl++

What is that? Do you mean unicornafl?

futhewo (Author) commented Nov 11, 2024

Yes, the one that is part of afl++, that you may find here:
https://github.com/AFLplusplus/unicornafl

wtdcode (Member) commented Nov 11, 2024

Yes, the one that is part of afl++, that you may find here: https://github.com/AFLplusplus/unicornafl

Okay, unicornafl will bump to 2.1.2 once I fix this.

futhewo (Author) commented Nov 11, 2024

That would be great! Thanks.

sadeli413 commented:
Tagging #1923 for visibility.
My temporary workaround is to destroy and reconstruct the emulator once every few million runs.

wtdcode (Member) commented Jan 4, 2025

I assume 2c688ba should solve this. Could you give it a try, @futhewo?

samuel-beckett commented:
Hi,

Experienced the same segfault on an aarch64 target. After investigation, the null tc.ptr comes from last_tb in tb_find.

Analysis

The TCG highwater has been lowered.

void tcg_prologue_init(TCGContext *s)
{
...
    s->code_gen_highwater = (char *)s->code_gen_buffer + (total_size - TCG_HIGHWATER) - s->uc->qemu_real_host_page_size;
}

Thus tcg_gen_code, called from tb_find, can fail tb_alloc because of
the lower highwater. However, last_tb is still set within tb_find.

static inline TranslationBlock *tb_find(CPUState *cpu,
                                        TranslationBlock *last_tb,
                                        int tb_exit, uint32_t cf_mask)
{
...
        tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask);
}

Back from tb_gen_code in tb_find, the call to
tb_add_jump/tb_set_jmp_target will trigger the fault:

static inline TranslationBlock *tb_find(CPUState *cpu,
                                        TranslationBlock *last_tb,
                                        int tb_exit, uint32_t cf_mask)
{
...
    if (last_tb) {
        tb_add_jump(last_tb, tb_exit, tb);
        /* -> tb_set_jmp_target(tb, n, (uintptr_t)tb_next->tc.ptr);
         *    SEGV here */
    }
}

My baseline is d568885d64c89db5b9a722f0c1bef05aa92f84ca, with Sai Cao's
patch applied:

git show d01904d3f33bd58c54271a7e28c402dbaddda847
commit d01904d3f33bd58c54271a7e28c402dbaddda847
Author: Sai Cao <[email protected]>
Date:   Thu Dec 19 11:29:20 2024 +0800

    Fix bug that tcg_region_alloc tbs were not flush
    this cause uc_ctl_remove_cache remove a invalid tbs.

diff --git a/qemu/tcg/tcg.c b/qemu/tcg/tcg.c
index dcacec7c..2b643b49 100644
--- a/qemu/tcg/tcg.c
+++ b/qemu/tcg/tcg.c
@@ -808,6 +808,7 @@ TranslationBlock *tcg_tb_alloc(TCGContext *s)
         if (tcg_region_alloc(s)) {
             return NULL;
         }
+        tb_flush(s->uc->cpu);
         goto retry;
     }
     s->code_gen_ptr = next;

The tb_flush done during the failed alloc (caused by the lower
highwater) is responsible for invalidating last_tb in tb_find
without letting it know.

Candidate fix

We could check last_tb->tc.ptr before calling tb_set_jmp_target,
but it is safer to detect the tb_flush earlier and reset last_tb,
for example:

diff --git a/qemu/accel/tcg/cpu-exec.c b/qemu/accel/tcg/cpu-exec.c
index fc47eecb..bf9a8c3b 100644
--- a/qemu/accel/tcg/cpu-exec.c
+++ b/qemu/accel/tcg/cpu-exec.c
@@ -259,11 +259,20 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
     tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
     if (tb == NULL||tb->cflags&CF_NOCACHE) {
         mmap_lock();
+
+        unsigned tb_flush_count = uc->tcg_ctx->tb_ctx.tb_flush_count;
+
         tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask);
         mmap_unlock();
         /* We add the TB in the virtual pc hash table for the fast lookup */
         cpu->tb_jmp_cache[tb_jmp_cache_hash_func(cpu->uc, pc)] = tb;
 
+        if (uc->tcg_ctx->tb_ctx.tb_flush_count != tb_flush_count) {
+            uc->last_tb = last_tb = NULL;
+        }
+
         if (uc->last_tb) {
             UC_TB_COPY(&cur_tb, tb);
             UC_TB_COPY(&prev_tb, uc->last_tb);

wtdcode (Member) commented Mar 4, 2025

> Experienced same segfault on aarch64 target. After investigation the null tc.ptr comes from the last_tb in tb_find. […]
Could you try commit 2128e01efc81404fecfcdfe1c7bb282ebd3e87d3? It includes a relevant fix.

This ensures tb_flush is done correctly.

wtdcode (Member) commented Mar 4, 2025

Also, I would appreciate a reproduction, even a complex, non-minimal one. @samuel-beckett
