json builder incorrectly caches translation when building multiple languages #13448

Open
bmispelon opened this issue Mar 21, 2025 · 0 comments
bmispelon commented Mar 21, 2025

Describe the bug

I have a project that uses Sphinx as a library (using sphinx.application.Sphinx) and that builds documentation in several different languages.
We noticed that some translated strings would incorrectly persist between multiple builds. That is to say, building language A then language B would result in documentation B having some strings from language A.
Adding a combination of patch_docutils, docutils_namespace, and _clean_up_global_state helped, but not completely. In particular the string "Python Module Index" would keep whichever translation it had in the first built language.

I believe the cause of this issue is two-fold:

  1. The custom JSON encoder in sphinxcontrib.serializinghtml was not updated correctly after the refactor in 363cdc0 (lazily translated strings are no longer a subclass of UserString).
  2. The string "Python Module Index" comes from a class-level variable (sphinx.domains.python.PythonModuleIndex.localname) and is therefore cached the first time that module is imported. If a language is active at the time of import, the translation will stay cached even if a different language is activated.
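The second cause can be illustrated without Sphinx. The following is only a toy model: `CATALOGS`, `activate`, `translate`, `LazyMessage` and `ModuleIndex` are hypothetical names, and the French string is a made-up catalog entry, not Sphinx's actual translation. It shows how a class-level attribute that is translated eagerly keeps the language that was active when the class body ran, while a lazy object follows the active language.

```python
# Toy model, not Sphinx code: every name below is hypothetical.
CATALOGS = {
    "en": {"Python Module Index": "Python Module Index"},
    "fr": {"Python Module Index": "Index des modules Python"},
}
_active = {"lang": "en"}


def activate(lang):
    """Switch the language used by translate()."""
    _active["lang"] = lang


def translate(message):
    """Eager lookup: returns a plain str for the language active right now."""
    return CATALOGS[_active["lang"]].get(message, message)


class LazyMessage:
    """Deferred lookup: translated every time it is converted to str."""

    def __init__(self, message):
        self._message = message

    def __str__(self):
        return translate(self._message)


activate("fr")


class ModuleIndex:
    # A class body runs once, when the class is defined (i.e. at import time).
    eager_name = translate("Python Module Index")   # frozen as a French str
    lazy_name = LazyMessage("Python Module Index")  # resolved on each use


activate("en")
print(ModuleIndex.eager_name)      # still the French string
print(str(ModuleIndex.lazy_name))  # follows the active language
```

In this model the eager attribute corresponds to what I believe happens to `PythonModuleIndex.localname` when the module is first imported while a language is active.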

How to Reproduce

Our translation infrastructure is fairly complex, but I managed to narrow down the bug to this testcase:

from shutil import rmtree
from pathlib import Path
import json
import sys

from sphinx.application import Sphinx

from sphinx.locale import _TranslationProxy
from sphinx.testing.util import _clean_up_global_state
from sphinx.util.docutils import docutils_namespace, patch_docutils
from sphinxcontrib.serializinghtml import jsonimpl

CURRENTDIR = Path(__file__).parent
SOURCEDIR = CURRENTDIR / "docs"
BUILDDIR = CURRENTDIR / "build"


class FixedSphinxJSONEncoder(jsonimpl.SphinxJSONEncoder):
    def default(self, obj):
        # Handle _TranslationProxy correctly
        if isinstance(obj, _TranslationProxy):
            return str(obj)
        return super().default(obj)


def build(lang):
    with patch_docutils(SOURCEDIR), docutils_namespace():
        Sphinx(
            srcdir=SOURCEDIR,
            outdir=BUILDDIR / lang,
            doctreedir=BUILDDIR / lang / ".doctrees",
            confdir=None,
            warning=None,
            status=None,
            buildername="json",
            confoverrides={"language": lang},
        ).build()
    # Clean up global state after building each language.
    _clean_up_global_state()


def test():
    with (BUILDDIR / "en" / "py-modindex.fjson").open() as f:
        data = json.load(f)

    if (s := data["indextitle"]) == "Python Module Index":
        print("Looks like the bug was fixed, yay!")
        sys.exit(0)
    else:
        print(f"Incorrect translation detected: {s}")
        sys.exit(1)


def apply_fix():
    # Importing the module early, before any language is activated, makes
    # `PythonModuleIndex.localname` a _TranslationProxy object rather than
    # a plain string.
    from sphinx.domains import python  # noqa: F401 (imported for its side effect)

    # Monkeypatch the custom JSON encoder to handle translation proxies correctly
    jsonimpl.SphinxJSONEncoder = FixedSphinxJSONEncoder


if __name__ == "__main__":
    # Clean up source and build dir (from previous runs)
    rmtree(BUILDDIR, ignore_errors=True)
    rmtree(SOURCEDIR, ignore_errors=True)
    SOURCEDIR.mkdir()

    # Create a single .rst file containing a py:module directive
    (SOURCEDIR / "index.rst").write_text(".. py:module:: sphinxwtf")

    # Uncomment to apply the workaround fix
    # apply_fix()

    for lang in ["fr", "en"]:
        print(f"Building {lang}")
        build(lang)

    test()

This script should be run in an environment where sphinx is installed, or if you have uv you can use this one-liner (assuming you saved the code as script.py):

uv run --with sphinx script.py

(Note that this script deletes and recreates two directories, docs and build, next to script.py.)

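The script builds the same document first in French, then in English. It then loads py-modindex.fjson from the English build and checks whether the "Python Module Index" string is in English. I've included an apply_fix() function that works around the issue in two steps: it imports sphinx.domains.python early so that PythonModuleIndex.localname is a lazy _TranslationProxy, and it monkeypatches the JSON encoder so that it serializes that proxy as a string.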
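The encoder half of the workaround can be sketched in isolation with only the standard library. This is a minimal sketch, not the sphinxcontrib.serializinghtml code: `LazyText` is a hypothetical stand-in for `_TranslationProxy`, and `ProxyAwareEncoder`, `titles` and `current` are made up for the example. It relies on the documented `json.JSONEncoder.default` hook, which is called for objects the encoder cannot serialize natively.

```python
import json


class LazyText:
    """Hypothetical stand-in for a lazy translation proxy (not a str subclass)."""

    def __init__(self, resolve):
        self._resolve = resolve

    def __str__(self):
        return self._resolve()


class ProxyAwareEncoder(json.JSONEncoder):
    def default(self, obj):
        # Resolve the lazy object at serialization time, so the output
        # reflects whichever language is active for this build.
        if isinstance(obj, LazyText):
            return str(obj)
        return super().default(obj)


# Made-up catalog and language switch, for illustration only.
titles = {"fr": "Index des modules Python", "en": "Python Module Index"}
current = {"lang": "fr"}
title = LazyText(lambda: titles[current["lang"]])

french = json.dumps({"indextitle": title}, cls=ProxyAwareEncoder)
current["lang"] = "en"
english = json.dumps({"indextitle": title}, cls=ProxyAwareEncoder)

print(json.loads(french)["indextitle"])
print(json.loads(english)["indextitle"])
```

Without the `default` override, `json.dumps` raises `TypeError` for the `LazyText` object, because it is neither a `str` nor any other type the encoder knows.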

Environment Information

Platform:              linux; (Linux-6.13.7-arch1-1-x86_64-with-glibc2.41)
Python version:        3.13.2 (main, Feb  5 2025, 08:05:21) [GCC 14.2.1 20250128])
Python implementation: CPython
Sphinx version:        8.2.3
Docutils version:      0.21.2
Jinja2 version:        3.1.6
Pygments version:      2.19.1

Sphinx extensions

No extensions are involved in this bug.

Additional context

No response
