Picture Description in Output #993

rhlarora84 · 2025-02-16T15:42:11Z

Question

I am using PictureDescriptionAPIOption to generate a description of the image. I want to replace the image placeholders in the output (export_to_) with the picture description.
Currently, only three ImageRefMode values are available (EMBEDDED, PLACEHOLDER, REFERENCED), and none of them adds the description of the image.

Is there a way to do this? I can see that the descriptions are currently stored in the annotations.
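Until the output can include descriptions directly, one possible workaround is to post-process the exported text. The sketch below assumes that the Markdown export marks each picture with the placeholder string `<!-- image -->`, and that you have already collected the descriptions (for example, from each picture's annotations) in the same order in which the pictures appear in the output. `inject_descriptions` is a hypothetical helper and is not part of Docling.

```python
from typing import Sequence

# Assumption: the Markdown export marks each picture with this placeholder.
IMAGE_PLACEHOLDER = "<!-- image -->"


def inject_descriptions(
    markdown: str,
    descriptions: Sequence[str],
    placeholder: str = IMAGE_PLACEHOLDER,
) -> str:
    """Replace the i-th placeholder with the i-th description (hypothetical helper).

    Assumes descriptions are ordered like the pictures in the exported text.
    Placeholders without a matching (non-empty) description are left as-is.
    """
    parts = markdown.split(placeholder)
    out = [parts[0]]
    for i, tail in enumerate(parts[1:]):
        if i < len(descriptions) and descriptions[i]:
            out.append(f"*Image description:* {descriptions[i]}")
        else:
            out.append(placeholder)  # no description available: keep placeholder
        out.append(tail)
    return "".join(out)


md = "# Title\n\n<!-- image -->\n\nText\n\n<!-- image -->\n"
print(inject_descriptions(md, ["A bar chart"]))
```

Because the replacement is positional, this only works if the descriptions and the placeholders are in the same order. A native serializer option would avoid that fragility.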

dolfim-ibm · 2025-02-17T07:05:23Z

We are planning to address this with custom serializers for picture items, since some use cases need the description, others the text produced by OCR, others the graph data, etc. Some initial work on this should land in the next few days.

FloMrt · 2025-02-17T08:38:51Z

@dolfim-ibm If I understand correctly, that means we will be able to combine multiple elements in the output for picture items?
For example: description (generated by a VLM) + OCR text + reference + ...?

That could be a very good generic solution!

dolfim-ibm · 2025-02-17T09:24:57Z

Yes, that is what we would like to allow. Each use case will clearly need a different output, so instead of overloading the output with content, we are thinking of making it modular and customizable.
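The modular, composable idea discussed here can be illustrated with a small composition pattern. This is a minimal sketch, not Docling's serializer API (which was still being designed at the time). `PictureInfo`, the `*_part` functions and `make_picture_serializer` are all hypothetical names. Each part renders one aspect of a picture, and each use case chooses which parts to combine and in what order.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class PictureInfo:
    """Hypothetical container for what a pipeline knows about one picture."""
    description: Optional[str] = None  # e.g. generated by a VLM
    ocr_text: Optional[str] = None     # text recognized inside the picture
    reference: Optional[str] = None    # e.g. a path or URI to the image file


# A part renders one aspect of a picture, or returns None if it has nothing to add.
Part = Callable[[PictureInfo], Optional[str]]


def description_part(p: PictureInfo) -> Optional[str]:
    return f"Description: {p.description}" if p.description else None


def ocr_part(p: PictureInfo) -> Optional[str]:
    return f"OCR text: {p.ocr_text}" if p.ocr_text else None


def reference_part(p: PictureInfo) -> Optional[str]:
    return f"![image]({p.reference})" if p.reference else None


def make_picture_serializer(
    parts: Sequence[Part], sep: str = "\n\n"
) -> Callable[[PictureInfo], str]:
    """Compose the selected parts in order, skipping any that produce nothing."""
    def serialize(p: PictureInfo) -> str:
        rendered = (part(p) for part in parts)
        return sep.join(r for r in rendered if r)
    return serialize


pic = PictureInfo(description="A bar chart", ocr_text="2023 2024", reference="fig1.png")
rag_serializer = make_picture_serializer([description_part, ocr_part])
md_serializer = make_picture_serializer([reference_part, description_part])
print(rag_serializer(pic))
print(md_serializer(pic))
```

For example, a RAG pipeline might combine the description and OCR text, while a Markdown export keeps the image link first and adds the description below it. Parts with nothing to contribute return None and are skipped. As a result, a picture without OCR text does not leave an empty section.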

rhlarora84 added the "question" (Further information is requested) label on Feb 16, 2025.