mirror of
https://github.com/koreader/koreader.git
synced 2025-08-10 00:52:38 +00:00
[plugin] NewsDownloader: make <title> match less greedy (#13070)
Some checks failed
macos / macOS ${{ matrix.image }} ${{ matrix.platform }} 🔨${{ matrix.xcode_version }} 🎯${{ matrix.deployment_target }} (10.15, 13, x86-64, 15.2) (push) Has been cancelled
macos / macOS ${{ matrix.image }} ${{ matrix.platform }} 🔨${{ matrix.xcode_version }} 🎯${{ matrix.deployment_target }} (11.0, 14, ARM64, 15.4) (push) Has been cancelled
The previous pattern, `<title>(.*)</title>`, works fine on most pages, but because `.*` is greedy it matches up to the last `</title>` in the document, so on pages with multiple `</title>` tags it can capture a very large amount of text. Valid examples include an inline `<svg>` in the HTML, where [`<title>`](https://developer.mozilla.org/en-US/docs/Web/SVG/Element/title) is used similarly to the `title=""` attribute in HTML, but of course a page could also simply be invalid.
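To illustrate the difference, here is a minimal sketch comparing the old and new patterns on made-up input. The sample HTML strings and the variable names (`html`, `html_attr`, `greedy`, `lazy`) are assumptions for illustration only; the two patterns themselves are taken from the diff.

```lua
-- Hypothetical sample page: a document <title> plus an inline SVG with its own <title>.
local html = "<html><head><title>Front page</title></head>"
    .. "<body><svg><title>Icon tooltip</title></svg></body></html>"

-- Old pattern: the greedy `.*` extends to the LAST </title> in the string.
local greedy = html:match([[<title>(.*)</title>]])

-- New pattern: the lazy `.-` stops at the FIRST </title>,
-- and `[^>]*` allows attributes inside the opening tag.
local lazy = html:match([[<title[^>]*>(.-)</title>]])

print(greedy) --> Front page</title></head><body><svg><title>Icon tooltip
print(lazy)   --> Front page

-- Hypothetical page whose <title> tag carries an attribute.
local html_attr = [[<head><title lang="en">Front page</title></head>]]
print(html_attr:match([[<title>(.*)</title>]]))       --> nil
print(html_attr:match([[<title[^>]*>(.-)</title>]]))  --> Front page
```

Note that `[^>]*` would also accept other tag names that merely start with `title`. That seems an acceptable trade-off for a best-effort title lookup, but it is an observation about the pattern, not something stated in the commit.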
@@ -321,7 +321,7 @@ function EpubDownloadBackend:createEpub(epub_path, html, url, include_images, me
     local base_url = socket_url.parse(url)
 
     local cancelled = false
-    local page_htmltitle = html:match([[<title>(.*)</title>]])
+    local page_htmltitle = html:match([[<title[^>]*>(.-)</title>]])
     logger.dbg("page_htmltitle is ", page_htmltitle)
     -- local sections = html.sections -- Wikipedia provided TOC
     local bookid = "bookid_placeholder" --string.format("wikipedia_%s_%s_%s", lang, phtml.pageid, phtml.revid)