How to analyze ZIM content size? #957

benoit74 · 2025-02-24T20:42:55Z

For openzim/mwoffliner#2180 I had to analyze the ZIM content.

I did it with the python-libzim binding because I'm way more comfortable with it.

The struggle I had (which luckily was not a blocker) is that while it is possible to access an Item's size (uncompressed AFAIK), I did not find any way to get its compressed size. It was hence hard to be 100% sure where the increased ZIM size came from.
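For reference, the uncompressed side of such an analysis can be sketched roughly as below. This is a minimal sketch, not the exact script used for the linked issue. It assumes python-libzim exposes `Archive`, `all_entry_count`, the underscore-prefixed `_get_entry_by_id`, `Entry.is_redirect`, `Entry.get_item()`, `Item.size`, `Item.mimetype` and `Item.path` the way I understand them; `iter_items` and `size_by_mimetype` are hypothetical helper names, not part of the binding.

```python
import pathlib
from collections import defaultdict
from typing import Iterable, Iterator, List, Tuple


def iter_items(zim_path: str) -> Iterator[Tuple[str, str, int]]:
    """Yield (path, mimetype, uncompressed size) for every non-redirect entry.

    Assumption: relies on python-libzim's reader API, including the
    underscore-prefixed `_get_entry_by_id`, which may change between releases.
    """
    # Imported lazily so the aggregation helper below works without libzim.
    from libzim.reader import Archive

    archive = Archive(pathlib.Path(zim_path))
    for idx in range(archive.all_entry_count):
        entry = archive._get_entry_by_id(idx)
        if entry.is_redirect:
            continue  # redirects carry no content of their own
        item = entry.get_item()
        yield item.path, item.mimetype, item.size


def size_by_mimetype(
    items: Iterable[Tuple[str, str, int]],
) -> List[Tuple[str, int]]:
    """Sum uncompressed sizes per mimetype, largest total first."""
    totals = defaultdict(int)
    for _path, mimetype, size in items:
        totals[mimetype] += size
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `size_by_mimetype(iter_items("some.zim"))` (the file name is a placeholder) gives a per-mimetype breakdown, but only of uncompressed bytes, which is exactly the limitation described above.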

Is this expected, since only clusters are compressed and individual items therefore have no compressed size of their own? Or is it simply something missing in the binding(s)? Should I have used another tool / zimtool for this analysis?

Even a rough estimate of the compression factor for each item would help to dig deeper in such situations. Maybe simply exposing the clusters, which cluster each item belongs to, and each cluster's compression factor (compressed and uncompressed size, for instance) would be sufficient.
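To illustrate how that information could be used, here is a rough sketch of the estimate I have in mind. Everything in it is hypothetical: `ClusterInfo`, the item-to-cluster mapping and `estimate_compressed_sizes` are not existing libzim or python-libzim APIs as far as I know; they only describe the shape of the data being asked for.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ClusterInfo:
    """Hypothetical per-cluster sizes in bytes (not an existing libzim type)."""

    compressed_size: int
    uncompressed_size: int

    @property
    def ratio(self) -> float:
        """Compressed/uncompressed ratio; 1.0 for an empty cluster."""
        if self.uncompressed_size == 0:
            return 1.0
        return self.compressed_size / self.uncompressed_size


def estimate_compressed_sizes(
    clusters: Dict[int, ClusterInfo],
    items: Dict[str, Tuple[int, int]],
) -> Dict[str, float]:
    """Estimate each item's compressed size from its cluster's ratio.

    `items` maps an item path to (cluster id, uncompressed item size).
    """
    return {
        path: size * clusters[cluster_id].ratio
        for path, (cluster_id, size) in items.items()
    }
```

The estimate applies the cluster-wide ratio uniformly to every item in the cluster, so it stays rough: two items sharing a cluster may compress very differently.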

benoit74 added the question label Feb 24, 2025
