Picture Description in Output #993

Open
rhlarora84 opened this issue Feb 16, 2025 · 3 comments
Labels
question Further information is requested

Comments

@rhlarora84

Question

I am using PictureDescriptionApiOptions to generate a description of each image. I want to replace the image placeholders in the output (export_to_) with the picture description.
Currently only three ImageRefMode values are available (EMBEDDED, PLACEHOLDER, REFERENCED), and none of them adds the image description to the output.

Is there a way to do so? I see that the descriptions are currently stored as annotations on the picture items.
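To illustrate the request, here is a minimal post-processing sketch, not a docling feature. It assumes the exported Markdown marks each picture with the literal placeholder `<!-- image -->` and that the descriptions have already been collected from the picture annotations in document order. The function name `fill_picture_placeholders` and the `PLACEHOLDER` constant are hypothetical.

```python
from typing import Iterable

# Assumed placeholder text in the exported Markdown; adjust to whatever the export emits.
PLACEHOLDER = "<!-- image -->"


def fill_picture_placeholders(
    markdown: str,
    descriptions: Iterable[str],
    placeholder: str = PLACEHOLDER,
) -> str:
    """Replace each placeholder, in document order, with the next description."""
    parts = markdown.split(placeholder)
    descs = list(descriptions)
    out = [parts[0]]
    for i, part in enumerate(parts[1:]):
        # Keep the placeholder when no description is available for this picture.
        text = descs[i] if i < len(descs) and descs[i] else placeholder
        out.append(text)
        out.append(part)
    return "".join(out)


if __name__ == "__main__":
    md = "Intro\n\n<!-- image -->\n\nEnd"
    print(fill_picture_placeholders(md, ["A bar chart of quarterly revenue."]))
```

This only works if placeholders and descriptions line up one to one in the same order, which is an assumption worth checking against the actual export.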

rhlarora84 added the question label on Feb 16, 2025
@dolfim-ibm
Contributor

We are planning to address this with custom serializers for picture items: some use cases need the description, others the text produced by OCR, others the graph data, etc. Some initial work on this should come in the next few days.

@FloMrt

FloMrt commented Feb 17, 2025

@dolfim-ibm If I understand correctly, that means we will be able to combine multiple elements in the output for picture items?
For example: description (given by a VLM) + text from OCR + reference + ...?

That would be a very good generic solution!

@dolfim-ibm
Contributor

Yes, that is what we would like to allow. Each use case will clearly need a different output, so instead of overloading the default output with content, we are thinking of making serialization modular and customizable.
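As a rough sketch of what "modular and customizable" could look like (this is not the planned docling API): a picture serializer can be assembled from small parts, each returning one piece of output or nothing. The `Picture` class and all function names below are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Picture:
    """Hypothetical stand-in for a picture item; not a docling type."""
    description: Optional[str] = None  # e.g. produced by a VLM
    ocr_text: Optional[str] = None     # e.g. produced by OCR
    uri: Optional[str] = None          # reference to the image file


# A part renders one aspect of a picture, or returns None to contribute nothing.
Part = Callable[[Picture], Optional[str]]


def description_part(pic: Picture) -> Optional[str]:
    return pic.description


def ocr_part(pic: Picture) -> Optional[str]:
    return f"Text in image: {pic.ocr_text}" if pic.ocr_text else None


def reference_part(pic: Picture) -> Optional[str]:
    return f"![image]({pic.uri})" if pic.uri else None


def make_serializer(parts: List[Part], sep: str = "\n\n") -> Callable[[Picture], str]:
    """Build a serializer that joins the non-empty outputs of the chosen parts."""
    def serialize(pic: Picture) -> str:
        rendered = (part(pic) for part in parts)
        return sep.join(r for r in rendered if r)
    return serialize


if __name__ == "__main__":
    pic = Picture(description="A bar chart.", ocr_text="Q1 Q2", uri="img/1.png")
    serialize = make_serializer([description_part, ocr_part, reference_part])
    print(serialize(pic))
```

Each use case then picks its own list of parts, for example only `description_part` for retrieval pipelines, or all three for a full export.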
