Extracting text from PDF images using a custom LLM model and placing image docs in the proper order of the PDF #1049
Unanswered
Navanit-git asked this question in Q&A
Replies: 1 comment 5 replies
-
I think what you are describing is what the picture description option lets you do. See https://ds4sd.github.io/docling/examples/pictures_description/. You will be able to define the vision model you prefer.
-
Hi,
I am working on extracting the text and images from a PDF.
This is the PDF I am using: link
For this I used the code below,
which runs on my 4 GB GPU.
In parallel, I am also running a VLM on the images,
which takes around 16 GB of GPU memory. Is there a way to combine the two so that the image details appear in place of the image placeholders in the Markdown file?
Also, for page three of the PDF, the Markdown output I get looks like this:
How can I make sure that image A ends up next to the text for A? And finally, instead of the image itself, I want the image description from the VLM, without overusing the GPU.
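
For illustration only, here is a minimal sketch of the placeholder substitution asked about above, using just the Python standard library. It assumes the Markdown export marks every picture with the literal string `<!-- image -->` and that the VLM descriptions are already available as a list in the same order as the pictures appear in the document. The function name `fill_image_placeholders` and the sample strings are hypothetical and are not part of docling.

```python
import re

# Assumption: each picture in the exported Markdown is marked by this string.
IMAGE_PLACEHOLDER = "<!-- image -->"


def fill_image_placeholders(
    markdown: str,
    descriptions: list[str],
    placeholder: str = IMAGE_PLACEHOLDER,
) -> str:
    """Replace the i-th placeholder with the i-th description, in document order."""
    remaining = iter(descriptions)

    def substitute(match: re.Match) -> str:
        # If there are more placeholders than descriptions,
        # leave the extra placeholders untouched.
        return next(remaining, match.group(0))

    return re.sub(re.escape(placeholder), substitute, markdown)


# Hypothetical sample input: two pictures, each followed by its own text.
sample_md = "<!-- image -->\n\nText for A\n\n<!-- image -->\n\nText for B\n"
sample_descriptions = ["Description of image A.", "Description of image B."]
print(fill_image_placeholders(sample_md, sample_descriptions))
```

Because the substitution only needs the finished Markdown and a list of strings, the layout pass and the VLM pass could in principle run one after the other rather than at the same time, which may help with the GPU memory concern. Whether the descriptions line up with the right pictures depends on the VLM seeing the images in the same order as the export emits them.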