[plugin] NewsDownloader: make <title> match less greedy (#13070)

On most pages the greedy `(.*)` capture works fine, but on pages with multiple `</title>` tags it runs to the last one and can match very large amounts of text.

Valid examples include an inline `<svg>` in the HTML, where [`<title>`](https://developer.mozilla.org/en-US/docs/Web/SVG/Element/title) is used similarly to `title=""` in HTML, but of course the page could also simply be invalid.
Frans de Jonge
2025-01-15 10:07:27 +01:00
committed by GitHub
parent 87d1678b02
commit 8314838add

@@ -321,7 +321,7 @@ function EpubDownloadBackend:createEpub(epub_path, html, url, include_images, me
local base_url = socket_url.parse(url)
local cancelled = false
-local page_htmltitle = html:match([[<title>(.*)</title>]])
+local page_htmltitle = html:match([[<title[^>]*>(.-)</title>]])
logger.dbg("page_htmltitle is ", page_htmltitle)
-- local sections = html.sections -- Wikipedia provided TOC
local bookid = "bookid_placeholder" --string.format("wikipedia_%s_%s_%s", lang, phtml.pageid, phtml.revid)
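
The effect of the pattern change can be reproduced outside the plugin with a short standalone sketch. The sample pages and the variable names `html`, `html_attr`, `greedy`, `lazy`, `old_attr`, and `new_attr` are made up for illustration and are not part of NewsDownloader; only the two patterns are taken from the diff.

```lua
-- Hypothetical sample page: an inline <svg> carries its own <title>,
-- so the document contains two </title> closing tags.
local html = [[
<html><head><title>Front page</title></head>
<body><svg><title>Logo</title></svg></body></html>
]]

-- Old pattern: the greedy (.*) extends to the LAST </title> in the page.
local greedy = html:match([[<title>(.*)</title>]])

-- New pattern: the lazy (.-) stops at the FIRST </title>.
local lazy = html:match([[<title[^>]*>(.-)</title>]])

-- Hypothetical page whose <title> tag carries an attribute.
local html_attr = [[<head><title lang="en">Hello</title></head>]]

-- The old pattern requires a literal "<title>" and finds nothing here;
-- the new pattern's [^>]* skips over the attribute.
local old_attr = html_attr:match([[<title>(.*)</title>]])
local new_attr = html_attr:match([[<title[^>]*>(.-)</title>]])

print(greedy)
print(lazy)
print(old_attr, new_attr)
```

With this input, `lazy` is `Front page`, while `greedy` also swallows the markup between the two titles and ends in `Logo`. On a real page that gap can span most of the document.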