Dataset image path incorrectly loaded (multimodal dataset image path error) #7046

Open
1 task done
SovietLongbow opened this issue Feb 24, 2025 · 0 comments
Labels
bug Something isn't working pending This problem is yet to be addressed

Comments

@SovietLongbow

Reminder

  • I have read the above rules and searched the existing issues.

System Info

(llama) root@autodl-container-30634997bd-fe761c7a:~# llamafactory-cli env

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-5.15.0-78-generic-x86_64-with-glibc2.35
  • Python version: 3.11.10
  • PyTorch version: 2.4.1+cu121 (GPU)
  • Transformers version: 4.45.2
  • Datasets version: 2.21.0
  • Accelerate version: 0.34.2
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA GeForce RTX 4090 D

Reproduction

My dataset contains records like this:

    {
      "messages": [
        {
          "role": "user",
          "content": "解释这张图片"
        },
        {
          "role": "assistant",
          "content": "一个坐在推车上的女孩旁边有两个双手插着口袋的小孩走在室外的道路上"
        }
      ],
      "images": [
        "/root/autodl-tmp/image_set/3f040261c9543402c4804c520fbd29bcba5137cf.jpg"
      ]
    },

The image path is under the autodl-tmp folder.

However, when I start evaluation with this command:

    llamafactory-cli train \
        --stage sft \
        --model_name_or_path /root/autodl-tmp/swift/llava-1.5-7b-hf \
        --preprocessing_num_workers 16 \
        --finetuning_type lora \
        --quantization_method bitsandbytes \
        --template llava \
        --flash_attn auto \
        --dataset_dir /root/LLaMA-Factory/data \
        --eval_dataset testSet00 \
        --cutoff_len 1024 \
        --max_samples 100000 \
        --per_device_eval_batch_size 2 \
        --predict_with_generate True \
        --max_new_tokens 512 \
        --top_p 0.7 \
        --temperature 0.95 \
        --output_dir saves/LLaVA-1.5-7B-Chat/lora/eval_2025-02-24-09-11-19 \
        --do_predict True

it raises the following exception:
FileNotFoundError: [Errno 2] No such file or directory: '/root/autodl_tmp/image_set/3f040261c9543402c4804c520fbd29bcba5137cf.jpg'

The program turned the '-' in the path into a '_' (autodl-tmp became autodl_tmp) for a reason that I do not understand, which causes the error.

How can I solve this problem?

P.S. dataset_info.json:

    {
      "trainSet00": {
        "file_name": "/root/LLaMA-Factory/data/202502/trainSet00.json",
        "formatting": "sharegpt",
        "columns": {
          "messages": "messages",
          "images": "images"
        },
        "tags": {
          "role_tag": "role",
          "content_tag": "content",
          "user_tag": "user",
          "assistant_tag": "assistant"
        }
      },
      "testSet00": {
        "file_name": "/root/LLaMA-Factory/data/202502/testSet00.json",
        "formatting": "sharegpt",
        "columns": {
          "messages": "messages",
          "images": "images"
        },
        "tags": {
          "role_tag": "role",
          "content_tag": "content",
          "user_tag": "user",
          "assistant_tag": "assistant"
        }
      }
    }

Others

The image path does exist.
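
One way to check the image paths independently of LLaMA-Factory is a small script that reads the dataset file and reports every listed image that is not on disk. The following is only a minimal sketch: the helper name `find_missing_images` is hypothetical and not part of LLaMA-Factory, and it assumes the dataset file is a JSON list of records that each carry an optional `images` list, as in the record shown under Reproduction.

```python
import json
import os


def find_missing_images(dataset_path):
    """Return the image paths listed in the dataset file that do not exist on disk.

    Hypothetical helper for diagnosis only; assumes a JSON list of records,
    each with an optional "images" list of file paths.
    """
    with open(dataset_path, encoding="utf-8") as f:
        records = json.load(f)

    missing = []
    for record in records:
        # Records without an "images" key are treated as text-only samples.
        for image_path in record.get("images", []):
            if not os.path.isfile(image_path):
                missing.append(image_path)
    return missing
```

Calling `find_missing_images("/root/LLaMA-Factory/data/202502/testSet00.json")` and getting an empty list would suggest that the paths stored in the file are intact and that the '-' to '_' change happens later, somewhere during loading or preprocessing. A non-empty list containing `autodl_tmp` paths would instead point at the dataset file itself.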

@SovietLongbow SovietLongbow added bug Something isn't working pending This problem is yet to be addressed labels Feb 24, 2025