
FTL crash. V6 generally unreliable. #2307

Open
dogshome opened this issue Mar 1, 2025 · 9 comments
dogshome commented Mar 1, 2025

Versions

  • Pi-hole: 604
  • AdminLTE: 603
  • FTL: 601

Platform

  • OS and version: v25.2.2 for NanoPi M4V2 running Armbian Linux 6.12.16-current-rockchip64 Armbian Noble
  • Platform: NanoPi M4

Expected behavior

Continuous running without crash. V5 was highly reliable.

Actual behavior / bug

FTL stops, log indicates time it stopped. Pi still functional via remote terminal.

Steps to reproduce


2025-03-01 03:07:00.216 INFO !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-03-01 03:07:00.216 INFO Please report a bug at https://github.com/pi-hole/FTL/issues
2025-03-01 03:07:00.216 INFO and include in your report already the following details:
2025-03-01 03:07:00.216 INFO FTL has been running for 32937 seconds
2025-03-01 03:07:00.216 INFO FTL branch: master
2025-03-01 03:07:00.216 INFO FTL version: v6.0.3
2025-03-01 03:07:00.216 INFO FTL commit: 37f9a96
2025-03-01 03:07:00.216 INFO FTL date: 2025-02-28 09:26:27 +0000
2025-03-01 03:07:00.216 INFO FTL user: started as pihole, ended as pihole
2025-03-01 03:07:00.216 INFO Compiled for linux/arm64/v8 (compiled on CI) using cc (Alpine 14.2.0) 14.2.0
2025-03-01 03:07:00.216 INFO Process details: MID: 1525
2025-03-01 03:07:00.216 INFO PID: 1525
2025-03-01 03:07:00.216 INFO TID: 2124
2025-03-01 03:07:00.216 INFO Name: database
2025-03-01 03:07:00.216 INFO Received signal: Segmentation fault
2025-03-01 03:07:00.216 INFO at address: 0xaa7104abfe6b477
2025-03-01 03:07:00.216 INFO with code: SEGV_MAPERR (Address not mapped to object)
2025-03-01 03:07:00.216 INFO !!! INFO: pihole-FTL has not been compiled with glibc/backtrace support, not generating one !!!
2025-03-01 03:07:00.217 INFO ------ Listing content of directory /dev/shm ------
2025-03-01 03:07:00.217 INFO File Mode User:Group Size Filename
2025-03-01 03:07:00.217 INFO rwxrwxrwx root:root 360 .
2025-03-01 03:07:00.217 INFO rwxr-xr-x root:root 4K ..
2025-03-01 03:07:00.217 INFO rw------- pihole:pihole 786K FTL-1525-recycler
2025-03-01 03:07:00.217 INFO rw------- pihole:pihole 20K FTL-1525-dns-cache-lookup
2025-03-01 03:07:00.217 INFO rw------- pihole:pihole 25K FTL-1525-domains-lookup
2025-03-01 03:07:00.217 INFO rw------- pihole:pihole 4K FTL-1525-clients-lookup
2025-03-01 03:07:00.217 INFO rw------- pihole:pihole 569K FTL-1525-fifo-log
2025-03-01 03:07:00.217 INFO rw------- pihole:pihole 4K FTL-1525-per-client-regex
2025-03-01 03:07:00.217 INFO rw------- pihole:pihole 102K FTL-1525-dns-cache
2025-03-01 03:07:00.217 INFO rw------- pihole:pihole 8K FTL-1525-overTime
2025-03-01 03:07:00.218 INFO rw------- pihole:pihole 2M FTL-1525-queries
2025-03-01 03:07:00.218 INFO rw------- pihole:pihole 29K FTL-1525-upstreams
2025-03-01 03:07:00.218 INFO rw------- pihole:pihole 348K FTL-1525-clients
2025-03-01 03:07:00.218 INFO rw------- pihole:pihole 90K FTL-1525-domains
2025-03-01 03:07:00.218 INFO rw------- pihole:pihole 123K FTL-1525-strings
2025-03-01 03:07:00.218 INFO rw------- pihole:pihole 144 FTL-1525-settings
2025-03-01 03:07:00.218 INFO rw------- pihole:pihole 328 FTL-1525-counters
2025-03-01 03:07:00.218 INFO rw------- pihole:pihole 88 FTL-1525-lock
2025-03-01 03:07:00.218 INFO ---------------------------------------------------
2025-03-01 03:07:00.218 INFO Please also include some lines from above the !!!!!!!!! header.
2025-03-01 03:07:00.218 INFO Thank you for helping us to improve our FTL engine!
2025-03-01 03:07:00.218 INFO Waiting for threads to join
2025-03-01 03:07:00.225 INFO Terminating timer thread
2025-03-01 03:07:00.252 INFO Terminating resolver thread
2025-03-01 03:07:00.624 INFO Terminating GC thread
2025-03-01 03:07:02.219 INFO Thread database (0) is still busy, cancelling it.
2025-03-01 03:07:02.219 ERROR Error when obtaining outer SHM lock: Previous owner died
2025-03-01 03:07:02.220 ERROR Error when obtaining inner SHM lock: Previous owner died
2025-03-01 03:58:11.462 INFO Terminating NTP thread


DL6ER (Member) commented Mar 1, 2025

Sorry for the issues you are seeing. We have recently been encountering a few bugs we had not anticipated, caused by missing kernel features on ancient kernels. Let's find out whether you are affected by the same issue. If so, we already have a fix for it.

Can you reproduce the crash? (I assume so.) If so, could you please follow the steps described at https://docs.pi-hole.net/ftldns/gdb/ and come back with the backtrace? It will probably lead straight to the solution once we know where to look.
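For convenience, the documented procedure amounts to attaching gdb to the running FTL process and waiting for the crash. A minimal sketch (guarded so it is a no-op where `pihole-FTL` is not running; see the linked docs for the authoritative steps):

```shell
# Attach gdb to the running pihole-FTL process, resume it, and print a
# backtrace once a signal (e.g. the SIGSEGV) stops it again.
# Guarded: does nothing on a machine where pihole-FTL is not running.
PID="$(pidof pihole-FTL 2>/dev/null || true)"
if [ -n "$PID" ]; then
  sudo gdb -p "$PID" \
    -ex continue \
    -ex backtrace
fi
```

`-ex continue` lets FTL keep serving queries until the crash; gdb then stops at the faulting instruction and `backtrace` prints the stack.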

DL6ER added the crash label Mar 1, 2025
streamdp commented Mar 2, 2025

#2316 is the same issue; I've added my report there.

dogshome (Author) commented Mar 2, 2025

I'm away from home for a couple of nights; Pi-hole has been running for nearly 24 hours so far.

The screen is still working. I'll see what I come back to on Tuesday night.

dogshome (Author) commented Mar 4, 2025

Hi DL6ER, info as requested.

[Detaching after vfork from child process 75319]
[Detaching after vfork from child process 75321]
[Detaching after vfork from child process 75347]
[Detaching after vfork from child process 75349]
[Detaching after vfork from child process 75375]
[Detaching after vfork from child process 75377]
[Detaching after vfork from child process 75404]
[Detaching after vfork from child process 75406]
[Detaching after vfork from child process 75432]
[Detaching after vfork from child process 75434]
[Detaching after vfork from child process 75460]
[Detaching after vfork from child process 75462]
[Detaching after vfork from child process 75488]
[Detaching after vfork from child process 75490]
[Detaching after vfork from child process 75517]
[Detaching after vfork from child process 75519]
[Detaching after vfork from child process 75545]
[Detaching after vfork from child process 75547]

Thread 14 "database" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 2832]
0x00000000007a5d80 in a_crash () at ./src/internal/atomic.h:250
warning: 250 ./src/internal/atomic.h: No such file or directory

(gdb) backtrace
#0 0x00000000007a5d80 in a_crash () at ./src/internal/atomic.h:250
#1 enframe (g=0x2d073478, idx=3, n=4376, ctr=)
at src/malloc/mallocng/meta.h:211
#2 __libc_malloc_impl (n=4376) at src/malloc/mallocng/malloc.c:379
#3 0x000000000055a1c8 in sqlite3MemMalloc (nByte=4368)
at /app/src/database/sqlite3.c:27121
#4 0x0000000000540ec8 in sqlite3Malloc (n=4368)
at /app/src/database/sqlite3.c:30894
#5 pcache1Alloc (nByte=) at /app/src/database/sqlite3.c:55746
#6 0x00000000005418c8 in pcache1AllocPage (pCache=0xffffa2921268,
benignMalloc=0) at /app/src/database/sqlite3.c:55834
#7 pcache1FetchStage2 (pCache=0xffffa2921268, iKey=51173, createFlag=2)
at /app/src/database/sqlite3.c:56305
#8 0x00000000005bcc4c in sqlite3PcacheFetch (pCache=,
pgno=51173, createFlag=3) at /app/src/database/sqlite3.c:54869
#9 getPageNormal (pPager=0xffffa29223f8, pgno=51173, ppPage=0xffff9eca65c0,
flags=2) at /app/src/database/sqlite3.c:62854
#10 0x000000000055e9e8 in sqlite3PagerGet (pPager=, pgno=51173,
ppPage=0xffff9eca65c0, flags=)
at /app/src/database/sqlite3.c:63046
#11 getAndInitPage (pBt=0xffffa291d028, pgno=51173, ppPage=0xffffa29339c8,
bReadOnly=) at /app/src/database/sqlite3.c:73165
#12 moveToChild (pCur=0xffffa2933940, newPgno=51173)
at /app/src/database/sqlite3.c:76196
#13 0x0000000000618074 in sqlite3BtreeCount (db=0xffffa2c5a018,
pCur=0xffffa2933940, pnEntry=)
at /app/src/database/sqlite3.c:81252
#14 sqlite3VdbeExec (p=p@entry=0xffffa2930e88)
at /app/src/database/sqlite3.c:97280
#15 0x000000000061c558 in sqlite3Step (p=0xffffa2930e88)
at /app/src/database/sqlite3.c:91504
#16 sqlite3_step (pStmt=0xffffa2930e88) at /app/src/database/sqlite3.c:91565
#17 0x00000000004998e0 in get_number_of_queries_in_DB (db=,
db@entry=0xffffa2c5a018,
tablename=tablename@entry=0x7d7950 "disk.query_storage")
at /app/src/database/query-table.c:488
#18 0x0000000000499f48 in export_queries_to_disk (final=final@entry=false)
at /app/src/database/query-table.c:734
#19 0x0000000000488d78 in DB_thread (val=)
at /app/src/database/database-thread.c:143
#20 0x00000000007b8b04 in start (p=0xffff9eca6b00)
at src/thread/pthread_create.c:207
#21 0x00000000007c2be0 in __clone () at src/thread/aarch64/clone.s:28
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

DL6ER (Member) commented Mar 4, 2025

Thank you for the backtrace. It is not immediately clear why this is happening. Could you please provide a few lines from before the !!!!! header in /var/log/pihole-FTL.log for (maybe) some additional context?

If there is nothing, it may be interesting to run again with debug.database = true, e.g.

sudo pihole-FTL --config debug.database true
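The same switch can also be set persistently in `/etc/pihole/pihole.toml` (a sketch of the v6 layout; restart FTL after editing):

```toml
# /etc/pihole/pihole.toml -- excerpt
[debug]
  # Log extra detail about long-term database operations to FTL.log
  database = true
```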

dogshome (Author) commented Mar 5, 2025

[Detaching after vfork from child process 33794]
[Detaching after vfork from child process 33820]
[Detaching after vfork from child process 33822]
[Detaching after vfork from child process 33849]
[Detaching after vfork from child process 33851]

Thread 14 "database" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 1940]
0x00000000007a5d80 in a_crash () at ./src/internal/atomic.h:250
warning: 250 ./src/internal/atomic.h: No such file or directory

(gdb) backtrace
#0 0x00000000007a5d80 in a_crash () at ./src/internal/atomic.h:250
#1 enframe (g=0x180255b8, idx=3, n=4376, ctr=)
at src/malloc/mallocng/meta.h:211
#2 __libc_malloc_impl (n=4376) at src/malloc/mallocng/malloc.c:379
#3 0x000000000055a1c8 in sqlite3MemMalloc (nByte=4368)
at /app/src/database/sqlite3.c:27121
#4 0x0000000000540ec8 in sqlite3Malloc (n=4368)
at /app/src/database/sqlite3.c:30894
#5 pcache1Alloc (nByte=) at /app/src/database/sqlite3.c:55746
#6 0x00000000005418c8 in pcache1AllocPage (pCache=0xffffb784c618,
benignMalloc=0) at /app/src/database/sqlite3.c:55834
#7 pcache1FetchStage2 (pCache=0xffffb784c618, iKey=44272, createFlag=2)
at /app/src/database/sqlite3.c:56305
#8 0x00000000005bcc4c in sqlite3PcacheFetch (pCache=,
pgno=44272, createFlag=3) at /app/src/database/sqlite3.c:54869
#9 getPageNormal (pPager=0xffffb7b7d858, pgno=44272, ppPage=0xffffb42d25c0,
flags=2) at /app/src/database/sqlite3.c:62854
#10 0x000000000055e9e8 in sqlite3PagerGet (pPager=, pgno=44272,
ppPage=0xffffb42d25c0, flags=)
at /app/src/database/sqlite3.c:63046
#11 getAndInitPage (pBt=0xffffb784cbb8, pgno=44272, ppPage=0xffffb785c708,
bReadOnly=) at /app/src/database/sqlite3.c:73165
#12 moveToChild (pCur=0xffffb785c680, newPgno=44272)
at /app/src/database/sqlite3.c:76196
#13 0x0000000000618074 in sqlite3BtreeCount (db=0xffffb7b84018,
pCur=0xffffb785c680, pnEntry=)
at /app/src/database/sqlite3.c:81252
#14 sqlite3VdbeExec (p=p@entry=0xffffb785cf58)
at /app/src/database/sqlite3.c:97280
#15 0x000000000061c558 in sqlite3Step (p=0xffffb785cf58)
at /app/src/database/sqlite3.c:91504
#16 sqlite3_step (pStmt=0xffffb785cf58) at /app/src/database/sqlite3.c:91565
#17 0x00000000004998e0 in get_number_of_queries_in_DB (db=,
db@entry=0xffffb7b84018,
tablename=tablename@entry=0x7d7950 "disk.query_storage")
at /app/src/database/query-table.c:488
#18 0x0000000000499f48 in export_queries_to_disk (final=final@entry=false)
at /app/src/database/query-table.c:734
#19 0x0000000000488d78 in DB_thread (val=)
at /app/src/database/database-thread.c:143
#20 0x00000000007b8b04 in start (p=0xffffb42d2b00)
at src/thread/pthread_create.c:207
#21 0x00000000007c2be0 in __clone () at src/thread/aarch64/clone.s:28
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

dogshome (Author) commented

Hiya, was this latest info of any use? I'm still seeing regular hanging of Pi-hole, and an occasional full lock-up of the SBC, though that has only happened once or twice. Note that V5 was bulletproof over the last 6 months on this platform. Keith.

DL6ER (Member) commented Mar 11, 2025

Sorry for the delay. Could you please also provide the /var/log/pihole/FTL.log snippet before/from the crash event? This crash is happening so deeply within the sqlite3 library that it will be really hard to find. The more information I have to theoretically reconstruct the exact code paths, the better.

As this may be memory corruption, it may be necessary to go one step further and use memcheck: https://docs.pi-hole.net/ftldns/valgrind/
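Roughly, the memcheck run means stopping the service and starting FTL in the foreground under valgrind. A sketch (the flags here are illustrative; follow the linked docs for the exact invocation):

```shell
# Run pihole-FTL in the foreground under valgrind's memcheck.
# Guarded: does nothing where valgrind or pihole-FTL is missing.
if command -v valgrind >/dev/null 2>&1 && command -v pihole-FTL >/dev/null 2>&1; then
  sudo systemctl stop pihole-FTL          # stop the background service first
  sudo valgrind --leak-check=full --track-origins=yes \
    "$(command -v pihole-FTL)" -f         # -f keeps FTL in the foreground
fi
```

Memcheck will slow FTL down considerably, but it reports the first invalid read/write rather than the eventual crash, which is usually far closer to the actual corruption.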

dogshome (Author) commented

Is it worth me swapping platform and OS to get reliability? I have a spare Raspberry Pi 4 and am not fussy about which OS is installed. The NanoPi with Armbian was very reliable on V5 but is a nuisance on the current V6.

I can leave the nanopi running and continue to provide debug info - I have a Guest network I can tie it to. I can't imagine many others have this same combo. If you are seeing similar issues on other (more popular) platforms and feeding back helps others though, then I'll persist.

I do appreciate the rapid response and interest in what is a free product, BTW!
