up_squared: A lot of fails in kernel and libraries components #87046

Open
arikgreen opened this issue Mar 13, 2025 · 4 comments · May be fixed by #87068
Labels
area: Kernel bug The issue is a bug, or the PR is fixing a bug platform: X86 x86 and x86-64 priority: high High impact/importance bug

Comments

@arikgreen
Collaborator

arikgreen commented Mar 13, 2025

Describe the bug

Since March 8, we have observed many new failures on the up_squared/apollo_lake platform.
List of failing tests:

kernel.common.profiling					FAILED Timeout (device: up2, 123.277s <zephyr>)
kernel.ipi_optimize.smp					FAILED Timeout (device: up2, 123.652s <zephyr>)
kernel.threads.thread_stack				FAILED Timeout (device: up2, 123.014s <zephyr>)
kernel.multiprocessing.smp.affinity			FAILED Timeout (device: up2, 242.988s <zephyr>)
kernel.memory_protection				FAILED Timeout (device: up2, 122.961s <zephyr>)
kernel.multiprocessing.smp.minimallibc			FAILED Timeout (device: up2, 242.961s <zephyr>)
kernel.multiprocessing.smp.affinity.custom_rom_offset	FAILED Timeout (device: up2, 243.041s <zephyr>)
kernel.multiprocessing.smp				FAILED Timeout (device: up2, 243.313s <zephyr>)
libraries.libc.common.picolibc				FAILED Timeout (device: up2, 123.122s <zephyr>)
libraries.libc.common.newlib_nano			FAILED Timeout (device: up2, 122.960s <zephyr>)
libraries.libc.common.picolibc.module			FAILED Timeout (device: up2, 123.041s <zephyr>)
libraries.libc.common					FAILED Timeout (device: up2, 123.146s <zephyr>)
libraries.libc.common.newlib				FAILED Timeout (device: up2, 123.424s <zephyr>)
libraries.libc.common.minimal				FAILED Timeout (device: up2, 122.981s <zephyr>)
kernel.ipi_cascade.smp					FAILED Timeout (device: up2, 123.392s <zephyr>)

twister.log

IMO this is a regression.
The problem has been present since commit 427f2c6.

To Reproduce
Run twister for the up_squared/apollo_lake platform on the tests listed above, for example:

west twister -p up_squared/apollo_lake -s kernel.multiprocessing.smp.affinity.custom_rom_offset

Expected behavior

The test suites pass for the tests listed above.

Impact

This is a showstopper.

Logs and console output

INFO - 7 of 22 executed test configurations passed (31.82%), 0 built (not run), 15 failed, 0 errored, with no warnings in 3886.15 seconds.

Environment (please complete the following information):

  • OS: Linux Ubuntu 22.04.5 LTS
  • twister - INFO - Using Ninja..
  • twister - INFO - Zephyr version: v4.1.0-227-g427f2c60da0b
  • twister - INFO - Using 'zephyr' toolchain.
  • Commit: 427f2c60da0beb8b68e85bbcf2cbbfd59ad469f8

Additional context

@arikgreen arikgreen added bug The issue is a bug, or the PR is fixing a bug area: Kernel labels Mar 13, 2025
@arikgreen arikgreen added the platform: X86 x86 and x86-64 label Mar 13, 2025
@nashif nashif added the priority: high High impact/importance bug label Mar 13, 2025
@peter-mitsis
Collaborator

I must have messed something up when adding the directed IPI support for x86. For the moment, I suggest reverting the offending commit.

@peter-mitsis
Collaborator

I did some more digging. On QEMU (where this was tested), CONFIG_X2APIC is not set. However, on up_squared/apollo_lake, CONFIG_X2APIC is enabled. This is significant because the behavior of the routine z_loapic_ipi() differs depending on whether this Kconfig option is enabled.

Given this, we may still be able to keep the directed IPI support on x86, restricted to configurations where CONFIG_X2APIC is not enabled, to get us past this showstopper, and then work on fixing the support for the case where CONFIG_X2APIC is enabled.
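To illustrate the proposed workaround, here is a minimal, self-contained sketch of the policy: use directed IPIs only when x2APIC is not in use, and fall back to a broadcast otherwise. This is not the Zephyr implementation. All names (`fake_send_ipi`, `signal_cpus`, `ipi_count`, `MAX_CPUS`) are hypothetical, the IPI delivery is simulated with counters, and the choice is modeled as a run-time flag so both paths can be exercised, whereas in Zephyr it would presumably be decided at build time through Kconfig.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 4U /* hypothetical CPU count for this sketch */

/* Simulated hardware: number of IPIs each CPU has received. */
static unsigned int ipi_count[MAX_CPUS];

static void reset_ipi_counts(void)
{
    for (unsigned int cpu = 0; cpu < MAX_CPUS; cpu++) {
        ipi_count[cpu] = 0;
    }
}

/* Hypothetical stand-in for the real per-CPU IPI delivery routine. */
static void fake_send_ipi(unsigned int cpu)
{
    assert(cpu < MAX_CPUS);
    ipi_count[cpu]++;
}

/* Broadcast: signal every CPU except the sender. */
static void sched_broadcast_ipi(unsigned int self)
{
    for (unsigned int cpu = 0; cpu < MAX_CPUS; cpu++) {
        if (cpu != self) {
            fake_send_ipi(cpu);
        }
    }
}

/* Directed: signal only the CPUs whose bit is set in cpu_bitmap. */
static void sched_directed_ipi(uint32_t cpu_bitmap)
{
    for (unsigned int cpu = 0; cpu < MAX_CPUS; cpu++) {
        if ((cpu_bitmap & (UINT32_C(1) << cpu)) != 0U) {
            fake_send_ipi(cpu);
        }
    }
}

/*
 * Workaround policy: directed IPIs are only trusted without x2APIC.
 * With x2APIC, fall back to the broadcast path.
 */
static void signal_cpus(unsigned int self, uint32_t cpu_bitmap,
                        bool x2apic_enabled)
{
    if (x2apic_enabled) {
        sched_broadcast_ipi(self);
    } else {
        sched_directed_ipi(cpu_bitmap);
    }
}
```

Calling `signal_cpus(0, 0x4, false)` signals only CPU 2, while `signal_cpus(0, 0x4, true)` signals every CPU other than CPU 0. The broadcast fallback costs some spurious interrupts but never misses a target.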

peter-mitsis added a commit to peter-mitsis/zephyr that referenced this issue Mar 13, 2025
It has been discovered that direct IPI support does not work
correctly when CONFIG_X2APIC is enabled. Until that can be
fixed, restrict this feature on x86 to platforms that do not
enable CONFIG_X2APIC.

Fixes zephyrproject-rtos#87046

Signed-off-by: Peter Mitsis <[email protected]>
@arikgreen
Collaborator Author

I have attached the twister log.

@nashif
Member

nashif commented Mar 14, 2025

Logs and console output

INFO - 7 of 22 executed test configurations passed (31.82%), 0 built (not run), 15 failed, 0 errored, with no warnings in 3886.15 seconds.

In the future, please do not put twister output in the "Logs and console output" section. The expectation is to have the console output from the device while the tests run. For example, something like this is more appropriate and can be helpful:

*** Booting Zephyr OS build 4.1.99 (delayed boot 500ms) ***
Running TESTSUITE ipi
===================================================================
START - test_arch_sched_broadcast_ipi
 PASS - test_arch_sched_broadcast_ipi in 0.038 seconds
===================================================================
START - test_arch_sched_directed_ipi

    Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/ipi_optimize/src/main.c:231: ipi_test_arch_sched_directed_ipi: (set[j] == 1 is false)
Direct-Expected 1, got 0
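The message "Direct-Expected 1, got 0" suggests that a CPU targeted by a directed IPI never recorded it. As a rough sketch of that kind of check (not the actual test source), the helper below compares per-CPU counters against a target bitmap. The function name `first_ipi_mismatch` and its parameters are hypothetical; only the `set[j]` naming is borrowed from the assertion text above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the index of the first CPU whose IPI counter disagrees with
 * the target bitmap, or -1 if every counter matches. A targeted CPU
 * is expected to have set[j] == 1, any other CPU set[j] == 0.
 */
static int first_ipi_mismatch(const int set[], unsigned int num_cpus,
                              uint32_t target_bitmap)
{
    assert(num_cpus <= 32U); /* the bitmap holds at most 32 CPUs */

    for (unsigned int j = 0; j < num_cpus; j++) {
        int expected = ((target_bitmap & (UINT32_C(1) << j)) != 0U) ? 1 : 0;

        if (set[j] != expected) {
            return (int)j;
        }
    }
    return -1;
}
```

With a target bitmap of `0xA` (CPUs 1 and 3), counters `{0, 1, 0, 0}` yield a mismatch at CPU 3, which corresponds to the "expected 1, got 0" case reported in the log.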
