How to analyze ZIM content size? #957

Open
benoit74 opened this issue Feb 24, 2025 · 0 comments

benoit74 commented Feb 24, 2025

For openzim/mwoffliner#2180 I had to analyze the ZIM content.

I did it with the python-libzim binding because I'm much more comfortable with it.

The struggle I had (which luckily was not a blocker) is that while it is possible to access an Item's size (uncompressed, AFAIK), I did not find any way to get its compressed size. It was hence hard to be 100% sure where the increased ZIM size came from.
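For context, here is a minimal sketch of the kind of analysis that is possible today: summing uncompressed item sizes per MIME type. It assumes python-libzim's `Archive`, `all_entry_count`, `Entry.is_redirect`, `Entry.get_item()`, `Item.mimetype` and `Item.size`; `_get_entry_by_id` is a private method of the binding and may change between releases. The helper names `total_size_by_mimetype` and `iter_item_sizes` are made up for illustration.

```python
from collections import defaultdict
from pathlib import Path


def total_size_by_mimetype(items):
    """Sum uncompressed sizes per MIME type from (mimetype, size) pairs."""
    totals = defaultdict(int)
    for mimetype, size in items:
        totals[mimetype] += size
    return dict(totals)


def iter_item_sizes(zim_path):
    """Yield (mimetype, uncompressed size) for every non-redirect entry."""
    # Imported lazily so the aggregation helper works without libzim installed.
    from libzim.reader import Archive

    archive = Archive(Path(zim_path))
    for entry_id in range(archive.all_entry_count):
        # Assumption: private accessor of the binding, not a stable API.
        entry = archive._get_entry_by_id(entry_id)
        if entry.is_redirect:
            continue  # redirects carry no content of their own
        item = entry.get_item()
        yield item.mimetype, item.size
```

Calling `total_size_by_mimetype(iter_item_sizes("some.zim"))` (hypothetical file name) gives uncompressed totals only, so it cannot tell how much each MIME type contributes to the on-disk ZIM size.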

Is that expected, i.e. there is no per-item compressed size because we only compress clusters, not individual items? Or is it just something missing from the binding(s)? Should I have used another tool / zimtool to do this analysis?

Having at least a rough estimate of the compression factor for every item would help to analyze such situations a bit more deeply. Maybe simply exposing clusters, which cluster each item belongs to, and each cluster's compression factor (compressed and uncompressed size, for instance) would be sufficient.
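To illustrate what such a rough estimate could look like if cluster data were exposed, here is a sketch that spreads a cluster's compressed size over its items in proportion to their uncompressed sizes. Everything here is hypothetical: the binding does not currently provide the cluster's compressed size or its item membership, so both are plain inputs, and the function name is invented for this example.

```python
def estimate_item_compressed_sizes(cluster_compressed_size, item_sizes):
    """Estimate per-item compressed size within one cluster.

    cluster_compressed_size: compressed size of the cluster in bytes
        (hypothetical input, not exposed by the binding today).
    item_sizes: mapping of item path to uncompressed size in bytes.
    """
    total_uncompressed = sum(item_sizes.values())
    if total_uncompressed == 0:
        return {path: 0.0 for path in item_sizes}
    # Single compression factor for the whole cluster, applied to every item.
    factor = cluster_compressed_size / total_uncompressed
    return {path: size * factor for path, size in item_sizes.items()}
```

This is only an approximation: items in the same cluster do not necessarily compress equally well, so the per-item figure is a share of the cluster rather than a real measurement.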
