C4D locks up when opening dialog from MessageData.CoreMessage

Hi Ferdinand,

very kind, thanks 🙂

The click speed, as mentioned in my first post, is not relevant. I can reproduce it here, leaned back, having a cup of coffee between the clicks.
It is actually pretty funny. In S24 I'm no longer able to reproduce it since I removed all other plugins and then re-added them one by one. Problem still gone... and this made me think. Actually you made me think, something I can't tell you how grateful I am for. By now I have a hypothesis of what is going on here. It is a bit too early to make my incompetence public. But as soon as I have proven my theory right, I'll report back here and explain in detail, to hopefully prevent others from falling into the same self-inflicted pit...

I'll be back!

Hi Ferdinand,

I'm sorry. Unfortunately I cannot close this thread yet. I honestly do not know what's going on.
So, yes, my S24 does not expose the issue currently. This is with the actual plugin under development (not the example provided here) and after removing and re-adding all plugins. I shouldn't have left this as my only test. It seems to have led me in the wrong direction. Yet, I was so happy...

So the removal of plugins led me to the hypothesis that I have some interdependency in my own plugins, which may have fixed itself, perhaps due to a different load order.

I did not see any potential in the various MessageData components, though, but suspected something else:
Yes, here I have to admit (and I am aware we are not supposed to be doing this), my plugins are separated into multiple submodules. So I am indeed polluting sys.path (and am also reloading modules in 'C4DPL_RELOADPYTHONPLUGINS'). In my defense, with projects of a certain size I actually see no other option. Anyway, my plugins indeed share a bunch of code in various utility modules and I suspected this to be the cause of all my misery.
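
For readers unfamiliar with the pattern described here, a minimal sketch of sharing utility modules between plugins could look like the following. It is not the code from the plugins in question: the names `add_plugin_path` and `reload_shared_modules` and the module name prefix convention are assumptions made purely for illustration. In Cinema 4D the reload step would presumably be triggered when the plugin receives C4DPL_RELOADPYTHONPLUGINS; that wiring is not shown.

```python
import importlib
import sys


def add_plugin_path(path):
    """Append path to sys.path once; return True if it was added."""
    if path in sys.path:
        return False
    sys.path.append(path)
    return True


def reload_shared_modules(prefix):
    """Reload every already imported module whose name starts with prefix.

    Names are processed in sorted order, so a package is reloaded before
    its submodules. Returns the names of the modules that were reloaded.
    """
    names = sorted(name for name in list(sys.modules) if name.startswith(prefix))
    reloaded = []
    for name in names:
        module = sys.modules.get(name)
        if module is None:
            continue
        importlib.reload(module)
        reloaded.append(name)
    return reloaded
```

Guarding the sys.path entry keeps repeated plugin reloads from adding the same directory again and again, which at least limits the pollution mentioned above.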

But by now, I am pretty sure this is not the case. Even worse, when going back to R21, the problem was there again, immediately. So I started removing plugins there again. And after removing all plugins except the test example posted above, it still happens. And here it really happens quite fast and reproducibly, most of the time the first time I click Cancel in the modal requester. In a row of ten consecutive tests, it mostly happened within the first three click cycles; at most I needed five.

By the way, something I hadn't mentioned before: C4D is not only locked up, but also burning one CPU core, so my educated guess would be that C4D is spinning on some spin lock, though of course it may as well be caught in an infinite loop (please let's not discuss whether spin locks belong to the set of endless loops).

Taking into account that it only happens with a certain probability, it obviously is also some kind of race condition. This may also explain why small changes to the "C4D ecosystem" (i.e. removing plugins) are capable of masking the problem. And in the end CPU speed and number of cores most likely have this potential as well. Here it's running on a quite old Intel Core i7-3930K (6 cores, 12 threads).

Nevertheless, after having spent all day on this issue, I am quite sure the code posted above is either doing something really sinister or, if what it does is in the realm of things we are allowed to do, there is some issue in C4D. It is most reproducible in my R21. In S24, I admit, with only the above test plugin in the system I can still no longer reproduce this. But I am sure I saw it in S24 as well before, so maybe due to different runtime characteristics it is just way less likely... who knows. I could as well be wrong though, and in S24 it was a different issue in my code, which got implicitly fixed by now.

During my experiments I have further changed the example code (e.g. removed the use of DialogPopup.Message(), added more buttons to play with different ways to reproduce, removed flags not needed for reproduction, increased button sizes so Manuel could rapid-fire them more easily, added a second CommandData to host the popup dialog...). So far, I can reproduce it only with the intermediate popup dialog, regardless of whether this popup dialog gets spawned directly from the main dialog or via the second CommandData that owns the popup dialog.

This is the part that worries me most. Could it be that the real issue is not opening the modal dialog from MessageData.CoreMessage(), but that there is something wrong with spawning another asynchronous dialog from an asynchronous dialog? And that I only noticed it because the modal dialog spawned by the MessageData, for whatever reason, increases the chances of the issue occurring? Which in the end would mean I have a completely different and way more serious issue lingering in my plugin. It just went unnoticed until this dreaded modal dialog came into play...

Anyway, here's the updated version of the test plugin, just in case somebody still feels motivated to look into the issue:
TestDialogFromMessageData_2.pyp

Sorry to be a nuisance.

Cheers,
Andreas

Edit: One more finding: Once I make the popup dialog modal (which would basically destroy my workflow) the problem also seems to be no longer reproducible.

Edit: I was shooting too fast. I also got it with a modal popup dialog.

Hello Andreas,

I am as unsure what to do as you are. But I just saw this in your new code, which made me flinch, although I understand the idea behind it.

global g_dlg
g_dlg = DialogRequester()

Where the flinching part is the global keyword, although the idea seems valid that Python's GC could erroneously collect the dialog, because, for example, something does not handle the GIL correctly in the C++ backend. And that reminded me that I already found it a bit odd that you had that function for opening the dialog floating around in your first code example. Have you tried attaching the dialog to your MessageData? Something like what I added at the end of this posting? This seems a bit safer than just saying "eh, something global".

PS: We will probably talk about this tomorrow, and then I will have another look, but I thought this might be worth a shot. I have written my example below "blind", it is only meant to convey an idea. It has not been tested.

Cheers,
Ferdinand

class MessageDataTest(c4d.plugins.MessageData):
    """MessageDataTest implementation that includes a replacement for your
    function OpenModalRequester().

    This is nothing special, I just added the dialog to this class, to
    ensure that we never produce a dangling dialog reference because Python's
    GC did something stupid.
    """

    # This all should not be necessary, since your dialog is modal, i.e, we 
    # should never run into the case that Python's GC does something stupid, 
    # since in your implementation we never leave OpenModalRequester() before 
    # the dialog is being closed. But here we are paranoid and attach the 
    # dialog to the MessageData, which at least judging by your example should 
    # not make a difference for you. But since you have access
    # to this MessageData implementation, you could still open it with 
    #   MessageDataTest._openModalRequester()
    # from the outside if you need to. The reason for this is of course to
    # make the reference counting never go below one for the dialog.
    _requesterDialog = None

    @classmethod
    def _openModalRequester(cls):
        """Handles opening the class bound modal dialog.
        """
        # Don't repeat yourself version of opening the dialog.
        openMe = lambda item: item.Open(dlgtype=c4d.DLG_TYPE_MODAL, 
                                        pluginid=PLUGIN_ID_REQUESTER)

        # The dialog has not been instantiated yet.
        if not isinstance(cls._requesterDialog, c4d.gui.GeDialog):
            cls._requesterDialog = DialogRequester()
            openMe(cls._requesterDialog)
        # There is a dialog instance, but it has been closed.
        elif not cls._requesterDialog.IsOpen():
            openMe(cls._requesterDialog)
        # This is being called not on the main thread for some reason, so we 
        # bail. The logic here being that when the modal dialog is still open,
        # we should be still on the main thread.
        else:
            pass

    def CoreMessage(self, id, bc):
        if id == PLUGIN_ID_COREMESSAGE:
            MessageDataTest._openModalRequester()
        return True

Hi Ferdinand,

in the original code this global variable does not exist. I am roughly aware of the intricacies of global variables and try to avoid them as much as our cat avoids water (for those not so familiar with cats: pretty much). It was more a sign of desperation that I added it here today while experimenting and playing around. In the original code the dialog is actually just held in a local variable of the static global function. As it is modal, the dialog should not be needed anymore after Open() has returned. But sure, I will test your code as soon as I get back to work tomorrow morning.

Thanks for the suggestion and for still thinking about my problem.

Cheers,
Andreas

Good morning Ferdinand,

you guessed perfectly right, here in my plugins DialogRequester is basically a more generic and versatile replacement for c4d.gui.MessageDialog() and c4d.gui.QuestionDialog(). Hence the implementation of a static OpenModalRequester() function. It's used all over the place and so far worked very nicely, until I stumbled upon this issue when trying to use it in CoreMessage().

I tested your piece of code. Unfortunately, as we both expected, it doesn't make a difference. We can probably rule out Python's cleanup as the cause here.

One question though:
In your _openModalRequester() function, how do you deduce that you are not on the main thread simply from checking whether the dialog instance exists or is open? I can't follow the logic in your comment. After all, it is more or less a static function. Even if implemented as a member function, it could be called from anywhere and in an arbitrary context. But maybe this comment is meant to be read only in the isolated context of this example, and other cases are simply not considered.
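
To make the question concrete: whether a dialog instance exists or is open says nothing about the calling thread, so a thread check would have to be explicit. The sketch below uses only Python's standard `threading` module to show the idea; `is_main_thread`, `open_requester_if_main_thread` and `open_func` are hypothetical names. Inside Cinema 4D the check would presumably be `c4d.threading.GeIsMainThread()` instead of the standard library comparison, which is an assumption and untested here.

```python
import threading


def is_main_thread():
    """Return True when the caller runs on Python's main thread."""
    return threading.current_thread() is threading.main_thread()


def open_requester_if_main_thread(open_func):
    """Call open_func() only on the main thread.

    Returns the result of open_func(), or None when called from any
    other thread (hypothetical policy: bail out instead of opening GUI).
    """
    if not is_main_thread():
        return None
    return open_func()
```

With a check like this, the "bail" branch no longer depends on the state of the dialog and stays correct no matter where the function is called from.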

And lastly here is a vastly simplified example. Actually the MessageData is not even needed, and neither is the intermediate popup dialog. It is enough to send the event to 'DialogMain.CoreMessage()'. The chances are definitely lower, but I get C4D to lock up this way as well; I need maybe ten to fifteen clicks.

Simplified version:
TestDialogFromMessageData_3.pyp

Cheers,
Andreas

Edit: Actually this last finding (the issue also occurring with a CM sent directly to the dialog itself) is quite funny. Because in the previous version of the example there's the button sending the CM directly to the MessageData ("Send CM dirctly (no issue)") and with this I am not able to get C4D to lock up. But my assumption now is that it actually has a chance to lock up as well, even if ever so small, for whatever reason. At least I see no real difference between CoreMessage in a dialog and in a MessageData.

Hello Andreas,

thanks for the updates. The garbage collection idea was a bit of a shot in the dark. I will not have time to tackle it today, but I have asked Maxime to test if he can reproduce your issue. I will have another look at it tomorrow with the newer versions. Just for my clarity:

  1. If I understood you correctly, you said you could reproduce this more reliably on R21, right? Although R21 has left the support cycle, this would be of interest for us, since it would give us better chances of reproducing it.
  2. And my current understanding is that you were also able to reproduce this on R21 without any further plugins installed, right?
  3. Do you have any third party performance or security tools installed on your system? Examples would be RAM managers, firewalls or virus scanners, although the last are probably not going to be an issue. The emphasis lies on third party, i.e., anything provided by Windows, e.g., Windows Defender, does not count.

Cheers,
Ferdinand

Hello Ferdinand,

  1. Yes, I can reproduce it way easier in R21.
  2. Correct. I even removed Redshift. Only the above demo plugin is installed and running.
  3. Nope, no "performance tools". Only MSE/Windows Defender and Windows Firewall as protective tools.

Cheers,
Andreas

Edit: And please take your time. No stress. In the end, whatever the outcome of this thread will be, I will need to find some kind of solid workaround anyway. It will need to work in R19 to S24+ and most of these versions will never get any fixes anymore.

Hello Andreas,

sorry for the slight delay. So, I did try again with V3 of your plugin and in R21.116 I had no luck in locking up Cinema 4D. I did massage the Send CM button in short, medium, and long intervals of clicks. I did remove all my other plugins on that Cinema installation, and I did even install a tool to artificially put my machine under load so that Cinema only has a fraction of my RAM and CPU capacity left. All this was however fruitless, I am unable to reproduce this.

Which then got me thinking that I might misunderstand the circumstances that lead to the problem. Below is a screencast of what I was doing the whole time. Could you please confirm that this is what leads to a freeze on your machine?

demo2.gif

edit: We are able to reproduce this now, but it depends heavily on the exact version used. We are not able to reproduce this on S24 at all and not on R21.116 (the version I did use). But it is reproducible on R21.105 and .202.

edit2: So, I am still unable to reproduce this on my machine. I have tested R21.116, .202 and .208 (the last R21 revision). I also did test S24.108 and .111 again without any luck. The ability to reproduce this seems to be connected to the hardware configuration of the machine (I have 32 GB RAM and 12 GB VRAM here). The weird thing is though, that we did test R21.116 on the machine where it did crash on R21.202 and could not make it crash there either.

We will have to see what we are going to do about this and how far it reaches into revisions of Cinema that are still in the support cycle. It is probably not going to be fun to find the root of this. I will post an update here once we have come to a conclusion.

Cheers,
Ferdinand

Hi Ferdinand,

thanks for your extensive testing. I feel a bit sorry for having brought this up. Also I wouldn't have expected it to be that hard and specific to reproduce, because here in my R21 the chances for it to occur are not that low. Anyway, I can confirm you are doing exactly the right steps to reproduce it.

As almost always with my work, in the end I need something working on multiple versions of C4D. So during the weekend I restructured some parts, to get those requesters out of CoreMessage() as much as possible. And where this was not possible, I went back to the good old standard requesters (c4d.gui.MessageDialog()). So far, this workaround seems to hold. Fingers crossed.
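
One way to get requesters out of CoreMessage(), sketched here under assumptions and not taken from the actual plugin, is to let the message handler only record a request and to show the requester later from a context that is known to be safe. `RequesterQueue`, `post` and `drain` are hypothetical names; where `drain` would be called in Cinema 4D (for example from a dialog timer) is deliberately left open.

```python
from collections import deque


class RequesterQueue:
    """Collects requester texts so they can be shown outside CoreMessage()."""

    def __init__(self):
        self._pending = deque()

    def post(self, text):
        """Record a request; cheap enough to call from a message handler."""
        self._pending.append(text)

    def drain(self, show):
        """Pass each pending text to show() in posting order.

        Returns the number of requests that were shown.
        """
        count = 0
        while self._pending:
            show(self._pending.popleft())
            count += 1
        return count
```

In such a setup CoreMessage() would only call `post()`, while `show` could be any callable that opens the requester, for instance a thin wrapper around c4d.gui.MessageDialog().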

So, from my side, you can almost consider this closed. It's completely up to Maxon to judge whether this may or may not point to some internal issue. And given the fact that it seems way less reproducible in S24 on my system as well, I can fully understand that there are other, more pressing priorities.

I said "almost considered closed", because I'd like to ask two more questions in the context of this issue:

  1. Probably most important for me: There is nothing inherently wrong with my approach of opening requesters from CoreMessage() (when paying attention to some rules, like using a specific custom message,...)?

  2. This may already be considered off topic, let me know, if I should open a separate thread. Could you provide a bit more info on how the Plugin ID parameter is used by GeDialogs? I mean, the golden rule seems to be, when you open a GeDialog from a CommandData, the CommandData's plugin ID gets passed here. Fine. But what if not? If a GeDialog gets opened from another GeDialog? Or like in my case from a MessageData? Is passing zero ok? Is passing a unique plugin ID, registered only for this purpose ok, even if it is not related to a registered CommandData? One thing C4D seems to use the ID for is to store size information. Which can already be a bit annoying, if like in my case, this dialog is used as a requester with varying content, because once it was opened with some more content, it will remain large, even when displaying less content later on. But ok, that's just cosmetic (though I was thinking to register a bunch of plugin IDs to mitigate this issue). But I could imagine the plugin ID to be used for more serious stuff as well. Like managing contexts or event loops. So long text, short question, can I do harm with this plugin ID?

Cheers,
Andreas

Hey Andreas,

no need to feel sorry. It is valuable for us to be aware of this, as this could cause more serious problems further down the road, even though it is hard to reproduce for now. We have pushed this off to QA for now, due to them having the required tools (hardware) to assess this more thoroughly.

  1. I do not see anything inherently wrong with that. But the only ones who could answer this with complete certainty are the developers who wrote the Cinema 4D core. And as long as we cannot say with a reasonable degree of certainty that this is a reproducible bug that we want to address, I will not bother them with this, since there is other "stuff" in front of the queue for them anyway.

  2. I have forked the second part of your question, as I would like to keep this thread clean. I anticipate that QA will confirm this bug, that it will then go to the developers, and that we will then report back here, which would get a bit convoluted with a second question being discussed here as well. The topic can be found here.

I understand that this is not the most satisfying procedure for you, as it will take a bit of time. The issue of yours must go through our bug tracking system now first, rather than taking the shortcut we sometimes offer here, of us talking with the developers and then creating an issue if we decide to do so.

Cheers,
Ferdinand