Does some RS check whether each resource is listed in the Package Document ? #810
Comments
This requirement was previously buried in OCF. See issue #626. I'm not terribly interested in what happens to invalid EPUBs. We allow resources to be transported in the container that aren't for use in any publication, so technically the instruction would be to not unpack the image so it's not available at all, just like the other "stuff". Whether this requirement is enforceable depends on whether there is active processing of the content. I'm hesitant to remove it, as removing it would create a situation where you could build out functionality by ignoring the manifest. |
@murata0204 is there a change you want to propose here? |
I suppose that this requirement is enforceable by epubcheck, but I don't believe that RSs bother to check it. If few RSs check it, future conformance testing will be hampered. I would like to replace MUST by SHOULD. |
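To make the scenario concrete, here is a minimal sketch of the kind of check being discussed: given one XHTML content document and the set of container paths listed in the manifest, report the images the document references that the manifest does not list. The function name `unlisted_image_refs` and its parameters are hypothetical, and this is not how epubcheck or any reading system is known to implement it.

```python
import posixpath
import xml.etree.ElementTree as ET
from typing import List, Set
from urllib.parse import unquote, urlsplit

XHTML = "{http://www.w3.org/1999/xhtml}"


def unlisted_image_refs(xhtml: str, doc_path: str, listed: Set[str]) -> List[str]:
    """Return container paths of <img> targets that are absent from `listed`.

    Assumes the document is well-formed XML without undeclared entities
    (ElementTree would reject something like &nbsp;).
    """
    root = ET.fromstring(xhtml)
    base = posixpath.dirname(doc_path)
    missing = []
    for img in root.iter(XHTML + "img"):
        parts = urlsplit(img.get("src", ""))
        if parts.scheme or parts.netloc or not parts.path:
            continue  # remote or empty reference: out of scope for this sketch
        # Resolve the relative reference against the content document's folder.
        target = posixpath.normpath(posixpath.join(base, unquote(parts.path)))
        if target not in listed:
            missing.append(target)
    return missing
```

A fuller check would also cover stylesheets, scripts, fonts, and SVG references; only images are handled here because that is the case raised in this issue.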
It would seem to invalidate certain reading systems, like iBooks, that use their own configuration files, too. I'm still not sure if the rendition mapping document is used in the processing of a rendition or fits in some magical space above. And while it would be ideal for a reading system to unpack only the resources listed in the manifest, that's not even enough. The check would also have to happen at rendering, as it's easy to dream up a scenario where a resource is legitimately available but not valid (a two-rendition publication where both reference the same image but one misses the manifest entry). I had suggested we change to "It must not depend on any resources not listed in..." but then removed the comment. Maybe it is worth considering, since isn't the intent more that the RS must not fail to process a package that doesn't have a proprietary file of some sort? (I don't like that it validates proprietary implementations, but that bridge has already been crossed.) Otherwise, I guess a "should" at least makes it more realistic. |
"should" sounds good to me... |
Closing this issue as we agreed to no change. My memory fails me if it was only a deferral until after the draft, though, so reopen if I have this wrong. |
Reopening because I wrote a test for Makoto's scenario, and it is not interoperable. Thorium does not display an image that is not in the manifest, but iBooks, ADE, and Calibre do. I expect most commercial reading systems would show the image, for fear of breaking content. The language is a challenge:
It MUST NOT use any resources not listed in the Package Document in the processing of the Package.
What do we mean by "processing" or by "use any resources"? The old com.apple.ibook.display-options.xml file certainly can influence the rendering of an EPUB. But is it a resource? Is the activity of reading that file (not mentioned in the package file) part of processing of the Package? |
Helicon Books has a packaging process that will not allow any resource that is not listed in the package to be inside the EPUB. The Helicon Books reading application (HBreader) will also not display any image that is not listed in the manifest.
--
Ori Idan CEO Helicon Books
http://www.heliconbooks.com
|
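A packaging-time check of the kind described above could look roughly like the following sketch. It is only an illustration, not Helicon's actual process: the function names are hypothetical, only the first `rootfile` in `META-INF/container.xml` is consulted (so multiple renditions are ignored), and everything under `META-INF/`, the `mimetype` file, and the package document itself are treated as exempt.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple
from urllib.parse import unquote

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def manifest_paths(zf: zipfile.ZipFile) -> Tuple[str, Set[str]]:
    """Return the package document path and the container paths its manifest lists."""
    container = ET.fromstring(zf.read("META-INF/container.xml"))
    # Assumption: only the first (default) rendition is checked.
    opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
    base = posixpath.dirname(opf_path)
    package = ET.fromstring(zf.read(opf_path))
    listed = set()
    for item in package.findall("opf:manifest/opf:item", OPF_NS):
        href = item.get("href", "").split("#")[0]
        if "://" in href:
            continue  # remote resource, not stored in the container
        listed.add(posixpath.normpath(posixpath.join(base, unquote(href))))
    return opf_path, listed


def unlisted_files(zf: zipfile.ZipFile) -> List[str]:
    """List container files that the manifest does not declare."""
    opf_path, listed = manifest_paths(zf)
    exempt = {"mimetype", opf_path}
    return sorted(
        name
        for name in zf.namelist()
        if not name.endswith("/")
        and not name.startswith("META-INF/")
        and name not in exempt
        and name not in listed
    )
```

A packager could refuse to build when `unlisted_files` returns a non-empty list, while a more lenient tool could simply report the paths.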
To be very pragmatic: this means that, under the current spec, iBooks or Calibre would not be conformant implementations, and I do not think it would be in anybody's interest to get there. A not-too-distant analogy is the HTML content vs. browser behaviour. We know that browsers do accept invalid content, and they do something reasonable with it. Similarly, while a conformant EPUB 3.3 MUST list all those resources, I think it would be perfectly fine for a RS to say that it SHOULD NOT use any of those resources. I think this type of approach should be valid for the RS-s in general, i.e., we may want to go through the RS requirements and see where similar situations occur. With the current separation of content vs. RS into two documents this may become much easier. |
There are clearly files not listed in the package document that are needed for processing (the container file), so the restriction is confusing (it also only appeared in 3.1 when we introduced packages). I believe we already give the package document logic precedence elsewhere, so it's not really a loophole to fork the standard. Is there a security reason why we need to state anything about resources not listed in the package document, though? Otherwise, what is the end goal of this restriction, whether required or recommended? I'm not sure why unlisted resources are any less secure than listed ones: if you can sneak a malicious file into the container and modify a content document to reference it, I can't imagine it's that hard to modify the package document, too. Accidental omission seems like the more likely cause of unlisted resources, plus the possibility of multiple renditions (but again I'm not sure why this matters for processing a specific package document). I'm not even sure this belongs in the package document section, since resources not listed in the manifest are by definition external to any processing of the package document. It also intersects with OCF processing in a weird way. Plus there are publication resources that could be outside the container, so it can get confusing with any web-linked content. In other words, we probably need to go back to basics on this one and figure out exactly what it is we're trying to prevent and why. |
It's not about security imo, it is about interoperability. RS developers should find in the spec guidance about the standard behaviors expected by the community. Therefore, if an XHTML page contains an image which is not referenced in the package (certainly an omission), all RSes should behave the same. Note: It would also be good to allow files in the zip that are not listed in the package document, because zip archives may then contain mixed formats (for instance, an additional JSON manifest which should remain unknown to the EPUB machinery). |
But I don't see that we'll ever get interoperability if we only recommend practices. It sounds like this change justifies what everyone is doing, not moving all reading systems to provide the same experience. (Everyone passes a recommendation, in other words.) If there aren't security issues, and we want the same experience regardless of RS, it seems like the requirement should be to not ignore resources needed to render the publication regardless of whether they are listed in the manifest. But even that seems complicated if some reading systems won't unpack resources not listed in the manifest. We need a lot more depth in terms of processing logic (especially since unzipping is not required). |
I must disagree and even oppose such a requirement (for reasons related to the architecture of Readium software, which must internally list resources which are served to the internal browser engine). If the EPUB spec forbids having resources used in content but not listed in the package document, the spec authors cannot force RSes to handle such ghost resources.
One must be pragmatic: EPUB defines a file format and RSes have been developed without a precisely defined processing model so far. It is too late to create fully constraining processing models now. In most cases, a proper set of best practices is the best we can do today. |
That's the big problem we face in any efforts at interoperability. And to be clear, I don't think interoperability is a realistic outcome exactly because it's far too late in the life of EPUB 3. If that's our goal, then we can't have recommendations. I'm not proposing it's the solution here. I'm still more interested in finding out what the current requirement is hoping to achieve before we try to rewrite it:
|
Perhaps there's a way to resolve this. I believe it's entirely reasonable for a reading system to display, for example, an image linked from a content document that's not listed in the manifest. I also think it's entirely reasonable for the core spec to require all resources to be listed in the manifest. How about we keep this as an authoring requirement in the core (and in EPUBCheck) and remove this restriction in the RS spec? |
+1 to Dave. |
This sounds reasonable to me, but an attentive content creator may very well ask the question of "If a Reading System does not check/care about this, why am I obligated to add a complete list of resources in my package file?" We should have a generic description why we have these, what is the reason, etc, the content creator should really follow the spec (and not only for fear of epubcheck...) (I must admit I do not have a clear answer either.) |
Hello, from the standpoint of Readium implementations, there is an expectation that publication resources are properly declared in the manifest, because each individual asset can be associated with additional properties defined at authoring time (i.e. beyond the mere fact of being present in the directory of the zip container, or on the filesystem in the case of exploded / unzipped publications). Most notably, publication resources can be obfuscated or encrypted. This kind of "meta" information is not intrinsic to the assets themselves, this requires additional authored data. For this reason, a typical publication server in Readium implementations makes no attempt to fetch "local" resources that cannot be found / are not declared in the publication manifest (I used the term "local" in contrast with "remote" HTTP resources that completely bypass the locally-instantiated publication server). Technically-speaking, it would be possible to refactor existing Readium implementations to include a fallback to the zip directory of the publication container (or the filesystem in the case of exploded / unzipped publications), whenever a referenced resource cannot be found in the publication manifest. However, personally I like to think of a "publication" as a well-defined / bounded set of resources, even if this requires more effort at authoring stage to produce the exhaustive list of referenced assets (i.e. publication manifest). |
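The two behaviours described above (serve only declared resources, or optionally fall back to the zip directory) can be sketched as follows. This is an illustration under stated assumptions, not Readium code: `fetch_local`, the `manifest` mapping of container paths to media types, and the `fallback_to_container` flag are all hypothetical names.

```python
import zipfile
from typing import Dict, Optional


def fetch_local(
    zf: zipfile.ZipFile,
    manifest: Dict[str, str],
    path: str,
    fallback_to_container: bool = False,
) -> Optional[bytes]:
    """Return the bytes of a local resource, or None if it must not be served.

    `manifest` maps container paths to media types. A real implementation
    would also carry the authored metadata mentioned above (obfuscation,
    encryption), which is exactly what an unlisted resource lacks.
    """
    in_container = path in zf.namelist()
    if path in manifest:
        # Declared resource: serve it if the container really holds it.
        return zf.read(path) if in_container else None
    if fallback_to_container and in_container:
        # Lenient mode: the resource exists but has no manifest metadata.
        return zf.read(path)
    return None  # strict mode: undeclared resources are treated as not found
```

With the default `fallback_to_container=False`, an image missing from the manifest is simply not found, which matches the Thorium behaviour reported earlier in this thread.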
First of all, Colibrio supports loading resources that are not listed in the OPF manifest. We use EPUB OCF's (the zip file's) central directory as our "canonical manifest" as we can always trust that this is complete (which is almost never the case for the OPF). The OPF manifest we treat more like additional resource metadata for the EPUB Publication context. So I think this may be a helpful way to think about the future role of the OPF manifest. We can re-define its usage to be a collection of extra metadata for publication resources. And use the OCF as the "real", complete manifest, which is something that we get out of the box anyway. For now though, until we decide otherwise, we should REQUIRE all publication resources to be listed in the manifest, and in the Reading System document we should tell implementors to handle exceptions by loading the resources anyway, or to degrade gracefully. PS. I am really a sucker for the manifest and am very much for keeping it complete. |
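The "central directory as canonical manifest" idea can be illustrated with a small sketch. It is not Colibrio's implementation; `resource_index` and the `manifest` mapping of container paths to media types are hypothetical names chosen for the example.

```python
import zipfile
from typing import Dict, Optional


def resource_index(
    zf: zipfile.ZipFile, manifest: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Map every file in the container to its manifest media type.

    The zip central directory decides which resources exist; the OPF
    manifest only contributes metadata. Unlisted files map to None.
    """
    return {
        info.filename: manifest.get(info.filename)
        for info in zf.infolist()
        if not info.is_dir()
    }
```

A reading system built this way can still tell listed and unlisted resources apart (the `None` entries), so it can load them anyway or degrade gracefully as suggested above.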
I think we should keep it as it is today: authors MUST include all resources in the manifest and an RS MUST use only resources that are in the manifest.
We can, however, rephrase the RS requirement as SHOULD NOT use resources that are not in the manifest.
|
"SHOULD NOT" would be too strict in the RS document I think. This will break many existing publications. |
Currently it is MUST NOT, and that started all this discussion.
So in order not to break current RS implementations, we can reduce it to SHOULD NOT.
I think we should keep the requirement for packaging as it is today (MUST).
|
Further to the good points that @danielweck has made, I'd add that another thing unlisted resources do is provide a means of circumventing core media type rules and fallbacks, as well as requirements on where resources are hosted. Authoring requirements are a good start, but they're also brittle, as all an author has to do is ignore them depending on how they distribute the content. I wonder, similar to what @iherman says, if additional context would help here on both sides. For example, the RS requirement might become:
Similarly, on the authoring side we can note that these are the reasons why authors need to make sure the manifest is complete (i.e., to ensure complete rendering). |
@iherman : .... the content creator should really follow the spec (and not only for fear of epubcheck...) I'm creating content that meets the specification. And then I verify my EPUB3 file with EPUBCheck. The approval I received after this control process means that I am a content creator who has made a package according to specification and has technical reliability. To ensure "interoperability" understanding of my content with RS:
I have a suggestion for "interoperability" (between the creator and RS) and that this or other similar behavior can occur. Permission to Modify Content: The creator can allow RS behavior in the manifest (OPF). With this new attribute, the creator knows that its permission will be taken into account by RS. This desire of the content creator will also not be perceived as the stringency of the EPUB3 specifications (MUST). |
What we could do is to suggest that Reading Systems show a user facing warning, or a confirmation when a content document requests a resource that is not listed in the manifest. This would allow the RS to "fix the quirk", but only if granted explicit permission from the user. I have suggested a similar thing for unlisted remote resources before. |
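As a rough illustration of the confirmation idea, the sketch below gates unlisted resources behind a user decision. All names are hypothetical, and asking only once per publication is an assumption made for the example rather than something proposed in this thread.

```python
from typing import Callable, Iterable, Optional


class UnlistedResourceGate:
    """Decide whether a requested local resource may be loaded."""

    def __init__(self, listed: Iterable[str], confirm: Callable[[str], bool]):
        self.listed = set(listed)
        # `confirm` stands in for the user-facing warning or dialog.
        self.confirm = confirm
        # Assumption: the user is asked at most once per publication.
        self.decision: Optional[bool] = None

    def allows(self, path: str) -> bool:
        if path in self.listed:
            return True  # declared in the manifest: no prompt needed
        if self.decision is None:
            self.decision = bool(self.confirm(path))
        return self.decision
```

For example, `UnlistedResourceGate(listed, confirm=lambda path: False)` reproduces the strict behaviour, while a callback wired to a dialog would let the user opt in to "fixing the quirk".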
If an HTML content document references an image file that is not listed in the current package document, does some RS refuse to handle it?
2.2 Reading System Conformance