Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to define both [de]activation_priority for a given type #10385

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yhabteab
Copy link
Member

@yhabteab yhabteab commented Mar 18, 2025

As the title implies, this PR allows us to optionally define both deactivation_priority and activation_priority for a given type. Currently all config objects have a sort of fixed order in which they are started and stopped, and in both cases we use the existing activation_priority field. Essentially, they're deactivated in the reverse order as they're activated. Below is a startup log snippet with the current master branch, but the output is the same with this PR as well:

[2025-03-18 15:01:48 +0100] information/ConfigItem: Triggering Start signal for config items
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'app' of type 'IcingaApplication' with priority -50
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'fooo' of type 'Zone' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'test!service' of type 'Service' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'icingaadmin' of type 'User' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'mbp-yhabteab' of type 'Endpoint' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'test_event' of type 'EventCommand' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'test' of type 'Host' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'root' of type 'ApiUser' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'send' of type 'CheckCommand' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'test!service!notify' of type 'Notification' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'send' of type 'NotificationCommand' with priority 0
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'api' of type 'ApiListener' with priority 50
[2025-03-18 15:01:48 +0100] information/ApiListener: 'api' started.
[2025-03-18 15:01:48 +0100] information/ApiListener: Started new listener on '[::]:5667'
[2025-03-18 15:01:48 +0100] information/IcingaDB: 'icingadb' started.
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'notification' of type 'NotificationComponent' with priority 200
[2025-03-18 15:01:48 +0100] information/NotificationComponent: 'notification' started.
[2025-03-18 15:01:48 +0100] debug/ConfigItem: Activating object 'checker' of type 'CheckerComponent' with priority 300
[2025-03-18 15:01:48 +0100] information/CheckerComponent: 'checker' started.
[2025-03-18 15:01:48 +0100] information/ConfigItem: Activated all objects.

In the current implementation, the higher priority an object has, the later it gets activated, and as you can see above, the IcingaDB feature is delayed until all the other objects have been activated, but for some reason still before the NotificationComponent and the CheckerComponent. There's an obvious reason why IcingaDB is being enabled so late, namely on Icinga 2 startup Icinga DB doesn't just hook on to the bunch of object signals, instead it waits until all of them are properly initialised and started and then starts a series of Redis connections to dump them all to Redis without any delay. Now, if we were to let IcingaDB start with an activation priority of -50 as suggested in #10191, it will pretty much be the first thing to get activated and listen and react to pretty much each and every signal from the other objects, making the parallel config dumps essentially useless, i.e. there won't be any config dump at all as the objects will be synchronised sequentially.

However, what's the reason for #10191 to set the activation priority so low, you ask? As I mentioned above, when Icinga 2 is stopped, the objects will be deactivated in reverse order, and as for IcingaDB, it will be the third object to be deactivated. However, there is a potential drawback to this, in that while Icinga DB gets deactivated so early, the other components remain active and capable of generating all kinds of relevant events, which Icinga DB will not be able to process at that point. As you can see, it is impossible to get this to work in both cases using just the activation_priority without tweaking around and introducing some hard coded checks of some specific types.

Fortunately, this PR comes to the rescue 😄! This PR now suggests introducing an additional deactivation_priority field that can optionally be defined by some types if they need to. Even with this PR, any object that doesn't explicitly define a deactivation_priority will behave the same way as with the master branch, namely automatically using its activation_priority for the deactivation process as well.

refs #10191

Copy link
Member

@Al2Klimov Al2Klimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really? Like, ... are we already out of ideas for this? Aren't these an option?

  • Stopping checker first (started last) implies we stop checking
  • Start API (or Endpoints) last (before checker) and stop it first, also imply stopping processing remote events

Just like with recursive mutexes, if you need a stop order different from your start order, you've probably designed something wrong...

@yhabteab
Copy link
Member Author

  • Stopping checker first (started last) implies we stop checking

And what about the queued checks? The global thread pool won't even be stopped/joined before stopping the objects on daemon reload.

Log(LogInformation, "Application", "Shutting down...");
ConfigObject::StopObjects();
Application::GetInstance()->OnShutdown();
#ifdef I2_DEBUG
UninitializeBase(); // Inspired from Exit()
#endif /* I2_DEBUG */

  • Start API (or Endpoints) last (before checker) and stop it first, also imply stopping processing remote events

Have you personally verified this, or is it merely an assumption on your part? Like did you actually check what ApiListener::Stop() does? I have even written last year what happens if the API stops here #10191 (comment) and #10179 (comment). If you still didn’t get what is #10179 all about, then you should go there and re-read al the threads.

Just like with recursive mutexes, if you need a stop order different from your start order, you've probably designed something wrong...

What am I supposed to do with this pointless analogy? Seriously, comparing mutexes to Icinga 2 objects? Yes, I know there is something wrong that this is supposed to fix, and no, I didn't design anything wrong, but it is what it is, and throwing around such useless analogies is not going to help solve any of it. The customer is suffering from #10179 for some time now and I want to help him get it fixed along with #10191 which has literally been waiting for a reaction from our side for months. I don't mind closing this PR if you have a better alternative that fixes #10179, but we won't get any further with such a vague claim.

@Al2Klimov
Copy link
Member

  • Stopping checker first (started last) implies we stop checking

And what about the queued checks? The global thread pool won't even be stopped/joined before stopping the objects on daemon reload.

IMAO we're free to ignore them during reload.

Log(LogInformation, "Application", "Shutting down...");
ConfigObject::StopObjects();
Application::GetInstance()->OnShutdown();
#ifdef I2_DEBUG
UninitializeBase(); // Inspired from Exit()
#endif /* I2_DEBUG */

  • Start API (or Endpoints) last (before checker) and stop it first, also imply stopping processing remote events

Have you personally verified this, or is it merely an assumption on your part? Like did you actually check what ApiListener::Stop() does?

My #10385 (review) is not about what Icinga already does, but what a future version IMAO shall do.

I don't mind closing this PR if you have a better alternative that fixes #10179,

better alternative that fixes #10179

These are the current APs: #10191 (review)

If we flip them, we get our deactivation prios:

  1. checkercomponent.ti
  2. notificationcomponent.ti
  3. data outputs
  4. apilistener.ti
  5. everything else

I'd activate ApiListener after and deactivate it before anything else ex. checker (8358d2f), so that deactivation changes like this:

  1. checkercomponent.ti, additionally wait for all Checkable#ProcessCheckResult() to finish if they handle local check results (5119115)
  2. apilistener.ti, additionally close all connections wait for all message handlers to finish, incl. Checkable#ProcessCheckResult() for remote CRs (TODO, kinda like 5119115)
  3. notificationcomponent.ti, additionally wait for all notifications to get sent (TODO, kinda like 5119115)
  4. data outputs
  5. everything else

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants