Allow to define both `[de]activation_priority` for a given type #10385

yhabteab · 2025-03-18T14:48:20Z

As the title implies, this PR allows us to optionally define both deactivation_priority and activation_priority for a given type. Currently all config objects have a sort of fixed order in which they are started and stopped, and in both cases we use the existing activation_priority field. Essentially, they're deactivated in the reverse order as they're activated. Below is a startup log snippet with the current master branch, but the output is the same with this PR as well:

[2025-03-18 15:01:48 +0100] information/ConfigItem: Triggering Start signal for config items
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'app' of type 'IcingaApplication' with priority -50
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'fooo' of type 'Zone' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'test!service' of type 'Service' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'icingaadmin' of type 'User' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'mbp-yhabteab' of type 'Endpoint' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'test_event' of type 'EventCommand' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'test' of type 'Host' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'root' of type 'ApiUser' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'send' of type 'CheckCommand' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'test!service!notify' of type 'Notification' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'send' of type 'NotificationCommand' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'api' of type 'ApiListener' with priority 50
[2025-03-18 15:01:48 +0100] information/ApiListener: 'api' started.
[2025-03-18 15:01:48 +0100] information/ApiListener: Started new listener on '[::]:5667'
[2025-03-18 15:01:48 +0100] information/IcingaDB: 'icingadb' started.
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'notification' of type 'NotificationComponent' with priority 200
[2025-03-18 15:01:48 +0100] information/NotificationComponent: 'notification' started.
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'checker' of type 'CheckerComponent' with priority 300
[2025-03-18 15:01:48 +0100] information/CheckerComponent: 'checker' started.
[2025-03-18 15:01:48 +0100] information/ConfigItem: Activated all objects.

In the current implementation, the higher priority an object has, the later it gets activated, and as you can see above, the IcingaDB feature is delayed until all the other objects have been activated, but for some reason still before the NotificationComponent and the CheckerComponent. There's an obvious reason why IcingaDB is being enabled so late, namely on Icinga 2 startup Icinga DB doesn't just hook on to the bunch of object signals, instead it waits until all of them are properly initialised and started and then starts a series of Redis connections to dump them all to Redis without any delay. Now, if we were to let IcingaDB start with an activation priority of -50 as suggested in #10191, it will pretty much be the first thing to get activated and listen and react to pretty much each and every signal from the other objects, making the parallel config dumps essentially useless, i.e. there won't be any config dump at all as the objects will be synchronised sequentially.

However, what's the reason for #10191 to set the activation priority so low, you ask? As I mentioned above, when Icinga 2 is stopped, the objects will be deactivated in reverse order, and as for IcingaDB, it will be the third object to be deactivated. However, there is a potential drawback to this, in that while Icinga DB gets deactivated so early, the other components remain active and capable of generating all kinds of relevant events, which Icinga DB will not be able to process at that point. As you can see, it is impossible to get this to work in both cases using just the activation_priority without tweaking around and introducing some hard coded checks of some specific types.

Fortunately, this PR comes to the rescue 😄! This PR now suggests introducing an additional deactivation_priority field that can optionally be defined by some types if they need to. Even with this PR, any object that doesn't explicitly define a deactivation_priority will behave the same way as with the master branch, namely automatically using its activation_priority for the deactivation process as well.

refs #10191

Al2Klimov

Really? Like, ... are we already out of ideas for this? Aren't these an option?

Stopping checker first (started last) implies we stop checking
Start API (or Endpoints) last (before checker) and stop it first, also imply stopping processing remote events

Just like with recursive mutexes, if you need a stop order different from your start order, you've probably designed something wrong...

yhabteab · 2025-03-19T09:00:33Z

Stopping checker first (started last) implies we stop checking

And what about the queued checks? The global thread pool won't even be stopped/joined before stopping the objects on daemon reload.

icinga2/lib/base/application.cpp

Lines 354 to 361 in 63926c6

    
           	Log(LogInformation, "Application", "Shutting down..."); 
        
           	ConfigObject::StopObjects(); 
        
           	Application::GetInstance()->OnShutdown(); 
        
           #ifdef I2_DEBUG 
        
           	UninitializeBase(); // Inspired from Exit() 
        
           #endif /* I2_DEBUG */

Start API (or Endpoints) last (before checker) and stop it first, also imply stopping processing remote events

Have you personally verified this, or is it merely an assumption on your part? Like did you actually check what ApiListener::Stop() does? I have even written last year what happens if the API stops here #10191 (comment) and #10179 (comment). If you still didn’t get what is #10179 all about, then you should go there and re-read al the threads.

Just like with recursive mutexes, if you need a stop order different from your start order, you've probably designed something wrong...

What am I supposed to do with this pointless analogy? Seriously, comparing mutexes to Icinga 2 objects? Yes, I know there is something wrong that this is supposed to fix, and no, I didn't design anything wrong, but it is what it is, and throwing around such useless analogies is not going to help solve any of it. The customer is suffering from #10179 for some time now and I want to help him get it fixed along with #10191 which has literally been waiting for a reaction from our side for months. I don't mind closing this PR if you have a better alternative that fixes #10179, but we won't get any further with such a vague claim.

Al2Klimov · 2025-03-19T12:20:22Z

Stopping checker first (started last) implies we stop checking

And what about the queued checks? The global thread pool won't even be stopped/joined before stopping the objects on daemon reload.

IMAO we're free to ignore them during reload.

icinga2/lib/base/application.cpp

Lines 354 to 361 in 63926c6

Log(LogInformation, "Application", "Shutting down...");

ConfigObject::StopObjects();

Application::GetInstance()->OnShutdown();

#ifdef I2_DEBUG

UninitializeBase(); // Inspired from Exit()

#endif /* I2_DEBUG */

Start API (or Endpoints) last (before checker) and stop it first, also imply stopping processing remote events

Have you personally verified this, or is it merely an assumption on your part? Like did you actually check what ApiListener::Stop() does?

My #10385 (review) is not about what Icinga already does, but what a future version IMAO shall do.

I don't mind closing this PR if you have a better alternative that fixes #10179,

better alternative that fixes #10179

These are the current APs: #10191 (review)

If we flip them, we get our deactivation prios:

checkercomponent.ti
notificationcomponent.ti
data outputs
apilistener.ti
everything else

I'd activate ApiListener after and deactivate it before anything else ex. checker (8358d2f), so that deactivation changes like this:

checkercomponent.ti, additionally wait for all Checkable#ProcessCheckResult() to finish if they handle local check results (5119115)
apilistener.ti, additionally close all connections wait for all message handlers to finish, incl. Checkable#ProcessCheckResult() for remote CRs (TODO, kinda like 5119115)
notificationcomponent.ti, additionally wait for all notifications to get sent (TODO, kinda like 5119115)
data outputs
everything else

Allow to define both [de]activation_priority for a given type

3a1ecdd

cla-bot bot added the cla/signed label Mar 18, 2025

yhabteab requested a review from julianbrost March 18, 2025 14:48

yhabteab mentioned this pull request Mar 18, 2025

Bugfix: configobject shutdown order #10191

Open

Al2Klimov requested changes Mar 18, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow to define both `[de]activation_priority` for a given type #10385

Allow to define both `[de]activation_priority` for a given type #10385

yhabteab commented Mar 18, 2025 •

edited

Loading

Al2Klimov left a comment

yhabteab commented Mar 19, 2025

Al2Klimov commented Mar 19, 2025

Allow to define both [de]activation_priority for a given type #10385

Are you sure you want to change the base?

Allow to define both [de]activation_priority for a given type #10385

Conversation

yhabteab commented Mar 18, 2025 • edited Loading

Al2Klimov left a comment

Choose a reason for hiding this comment

yhabteab commented Mar 19, 2025

Al2Klimov commented Mar 19, 2025

better alternative that fixes #10179

Allow to define both `[de]activation_priority` for a given type #10385

Allow to define both `[de]activation_priority` for a given type #10385

yhabteab commented Mar 18, 2025 •

edited

Loading