When logging from multiple threads, an appender can be broken until app restart if it rolls over #2566
Comments
Are you by any chance using the same configuration in multiple web applications? On Linux you can check the file descriptors that are open on the log file.
@ppkarwasz Thanks for your response. I would have said with certainty that there are not multiple applications referencing the configuration: we typically bring up the app server (GlassFish in this case), deploy one app, and that's it. I'm aware there are classloader considerations, but with a single deployment I don't think they come into play. We are using the log4j-jakarta-web module in our application, if it matters. We launch the application with the … How would I trace these, in your opinion?
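As an illustration of one way to trace these on Linux (a sketch added here, not something taken from the thread), the following Java snippet lists the /proc/self/fd entries of the running JVM that resolve to a given log file. The log path is a placeholder, and the class is purely a diagnostic aid; it assumes a /proc filesystem is available.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Diagnostic sketch: list the file descriptors the current JVM holds that
 * point at a given log file. Linux-only (relies on /proc/self/fd).
 * The target path below is a placeholder, not the reporter's real path.
 */
public class OpenFdCheck {
    public static void main(String[] args) throws IOException {
        Path target = Paths.get("/path/to/logs/em-error.log"); // placeholder
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc/self/fd"))) {
            for (Path fd : fds) {
                try {
                    Path resolved = Files.readSymbolicLink(fd);
                    if (resolved.getFileName() != null
                            && resolved.getFileName().equals(target.getFileName())) {
                        System.out.println("fd " + fd.getFileName() + " -> " + resolved);
                    }
                } catch (IOException ignored) {
                    // some fd entries (sockets, pipes) cannot be resolved; skip them
                }
            }
        }
    }
}
```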
I know it's not an exactly comparable scenario (no random access file here), but this seems similar.
@ppkarwasz I did a little digging and noticed that, even on a fresh deployment, there are multiple file descriptors open on these files. I found an interesting Java agent and used it to observe this pattern. First FD:
Second FD:
We always initialize loggers like this: private static final Logger log = LogManager.getLogger(<class name replaced>.class); Given this, the static final fields resolve as classes are loaded, so the timing is essentially random. This has never posed a problem before. Is it possible that something in the …? Thanks for your time and attention.
You can check which class creates a logger context by setting the … property. It appears you have at least two logger contexts in your system that use the same configuration, but this should not be a problem, since multiple appenders pointing to the same file will use the same underlying … I am not familiar with the GlassFish classloader structure. Is it possible that there are two …?
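To make the "at least two logger contexts" observation easier to verify, here is a small diagnostic sketch (an illustration added here, not something prescribed in the thread) that enumerates the contexts log4j-core is currently tracking. It assumes the default Log4jContextFactory is in use and is meant to be called from inside the running application.

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.core.LoggerContext;
import org.apache.logging.log4j.core.impl.Log4jContextFactory;
import org.apache.logging.log4j.core.selector.ContextSelector;
import org.apache.logging.log4j.spi.LoggerContextFactory;

/**
 * Diagnostic sketch: print every LoggerContext the context selector is
 * tracking, together with the source of the configuration it loaded.
 * Assumes log4j-core's default Log4jContextFactory is active.
 */
public final class LoggerContextDump {

    public static void dumpContexts() {
        LoggerContextFactory factory = LogManager.getFactory();
        if (factory instanceof Log4jContextFactory) {
            ContextSelector selector = ((Log4jContextFactory) factory).getSelector();
            for (LoggerContext context : selector.getLoggerContexts()) {
                System.out.println("context: " + context.getName()
                        + ", config: " + context.getConfiguration().getConfigurationSource());
            }
        } else {
            System.out.println("Unexpected logger context factory: " + factory.getClass());
        }
    }
}
```

Seeing the same configuration source listed under two different context names would line up with the "two contexts, one file" situation described above.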
@ppkarwasz I did a little playing around, and here's what I found.
No matter what I tried, the two spots in the stack traces would show up well before any direct configuration or reconfiguration. I can see both sets of debug output run in full from seemingly the same thread, but, I suppose, different classloaders. This makes me wonder: is this pattern fundamentally incompatible with a container that scans classes, potentially multiple times, on the way to startup and deployment?

private static final Logger log = LogManager.getLogger(<class name replaced>.class);

I'm wondering if I should switch to an approach, via dependency injection or otherwise, that does not keep these loggers in static fields, so they don't greedily initialize in a classloader before anyone is ready to use them. That said, I would not expect any application code to be actively writing logs in the "older" classloader, so even if all of this is happening, I'm not sure it matters: all of these operations (rolling, etc.) seem to be handled by the logging caller's thread, not by some log4j timer thread, except possibly for configuration file reloads.

I just want to mention that this all worked fine for years, and something seems to have changed. It may be that adding the size trigger is simply causing more rollovers, and that is what is bringing this to the forefront.
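As a rough illustration of the non-static alternative being considered above (a sketch added here, not an approach endorsed in the thread), the logger lookup can simply be deferred to call time. LogManager caches loggers per name and context, so the repeated lookup is cheap, and the logger is bound to whichever context is active when the code actually runs. The class name is a placeholder.

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

/**
 * Sketch of a non-static logger: the logger is resolved on first use rather
 * than during class loading, avoiding eager initialization in whatever
 * classloader happens to load the class first. OrderService is a placeholder.
 */
public class OrderService {

    public void placeOrder(String id) {
        // LogManager caches loggers, so a call-time lookup is inexpensive.
        Logger log = LogManager.getLogger(OrderService.class);
        log.info("Placing order {}", id);
    }
}
```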
How could we reproduce this? Are you using a simple WAR application or a complex EAR archive? In general, the solution to this problem would be to ensure that …
This was just a simple WAR deployment. I imagine it would be the same in other app servers, but I'm not 100% sure, to be honest.
It's possible that GlassFish is hitting these classes via different paths for special types of classes like EntityListeners etc., based on those traces above. I will take some time on Monday (out of time today) to try to replicate this in a Docker environment with Payara Micro (a single-jar deployment of a WAR environment), which will make it easier to experiment with the JVM options and the log4j configurations. I want to stress that this usually works well and the scenario mentioned is rare, but when it happens, that log is broken until restart. I'm still unclear why this happens, because I don't think there are competing threads writing logs in different classloaders at the same time. Thanks again for your help with this.
@ppkarwasz I have a couple of new findings, which may make sense to you; at the very least I'll try a new configuration and see how it goes. I was able to dig a bit deeper with JVisualVM.
Thoughts?
I really ought to look deeper into GlassFish. The …
It does look similar. I think with the changes I mentioned I am actually all set. I don't see the warnings on successful rollover that a file is already open, and I don't expect the stream-closed issue either, but I'll keep an eye out. The tricky part with these EE containers is that they initialize different types of components at different and sometimes unpredictable times. Servlet initialization, as used by the web artifact, is often last, or at least later than you'd want to start logging, which makes relying only on the web artifact workflow a bit underwhelming. I'll reopen if I see anything else similar, but maybe it's worth some doc touch-ups about not using the system properties in these environments, and a mention of the odd timing.
Description
In some cases (this is intermittent), it seems that when a log file (synchronous appender) is being written to by several threads at once, it can get into a state where newer messages can no longer be written and the appender fails with a stream-closed exception.
This has been observed at client sites on Linux and Windows. The attached log files are from Windows.
This seems to have started happening somewhat recently, and the changes I can track down are:
The overall execution environment has not changed, and there has always been regular traffic to the loggers from multiple threads, along with rollovers. At one point there was an issue causing Windows rollovers to not work correctly, which was fixed by upgrading from 2.11.1 to 2.13.1.
On the Windows side, we usually keep a tailing program (baretail) running around the clock. This has never caused any issues for us.
An important note: even when this works correctly on the Windows side, we get warnings about not being able to remove files due to access from another process, and the rollover succeeds via some of the fallback rename/truncation procedures. This seems to happen regardless of whether the service stays up, BareTail stays open, etc. I expect these messages will be seen as an indication that something is fundamentally wrong, but these loggers work around the clock, and higher-traffic loggers seem to have no issue, while less frequently written files (such as this em-error.log file) display the issue more often.
Configuration
Version: 2.23.1 (but observed in versions possibly from 2.17.x on)
Operating system: Windows Server 2008 R2 64-bit, CentOS 7.x amd64
JDK: Oracle JDK 21.0.x
Logs
Configuration is attached.
log4j.xml.txt
File server-broken.log shows the scenario where the stream is reported closed from one thread while another thread appears to be performing a rollover. After this point, this logger/appender can no longer be used until the app is restarted.
server-broken.log
File server-functional.log shows a functional rollover scenario. I'm not sure the threads are as busy in this case, but it does show a successful rollover without subsequent errors, and I wanted to include it because it shows what the log looks like when it tries the zip/rename/delete operations and has to fall back on other file-clearing approaches.
server-functional.log
Reproduction
I am not able to reproduce this on demand, but if needed I can try to make a test case with an extreme version of this configuration (low file-size rollover threshold, low date-change rollover) and hit it with many threads to see if it is reproducible.
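As a sketch of the extreme test case described above (an untested illustration added here, not a confirmed reproducer), something like the following could hammer a single logger from many threads. It assumes a log4j2.xml on the classpath whose appender uses a very small size-based trigger, mirroring the attached configuration; thread and message counts are arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

/**
 * Stress sketch: write to a single rolling-file-backed logger from many
 * threads to force frequent rollovers. Assumes a log4j2.xml on the classpath
 * with a very small size-based trigger; not a confirmed reproducer.
 */
public class RolloverStressTest {

    private static final Logger LOG = LogManager.getLogger(RolloverStressTest.class);

    public static void main(String[] args) throws InterruptedException {
        int threads = 16; // arbitrary
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < 100_000; i++) {
                    LOG.info("thread {} message {} with some padding to grow the file faster", id, i);
                }
            });
        }
        pool.shutdown();
        // Let several rollovers happen, then inspect the status logger output
        // for "Stream Closed" or similar appender errors.
        pool.awaitTermination(10, TimeUnit.MINUTES);
        LogManager.shutdown();
    }
}
```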