
[Bug]: llamaindex-cli missing glob pattern? #11798

Open
hemanth opened this issue Mar 10, 2024 · 2 comments · May be fixed by #17904
Labels
bug Something isn't working P1

Comments


hemanth commented Mar 10, 2024

Bug Description

llamaindex-cli rag --files isn't accepting glob patterns.

Version

0.0.7

Steps to Reproduce

$ llamaindex-cli rag --files node/**/*.js
usage: llamaindex-cli [-h] {rag,download-llamapack,download-llamadataset,upgrade,upgrade-file,new-package} ...
llamaindex-cli: error: unrecognized arguments: node/benchmark/_cli.js node/benchmark/_http-benchmarkers.js 

Relevant Logs/Tracebacks

No response

@hemanth hemanth added bug Something isn't working triage Issue needs to be triaged/prioritized labels Mar 10, 2024

dosubot bot commented Mar 10, 2024

Hey @hemanth! 👋 Great to see you back here. Hope you've been doing well!

Thank you for reporting this issue. Based on the information you've provided and the context I have, it seems like the llamaindex-cli rag --files command isn't accepting glob patterns in version 0.0.7 of LlamaIndex due to how the --files argument is handled.

In the handle_cli method, the iglob function from the glob module is used to iterate over the files specified by the --files argument. The files argument is passed directly to iglob without checking if it contains a glob pattern or not. Here's the relevant part of the code:

if files is not None:
    documents = []
    for _file in iglob(files, recursive=True):
        _file = os.path.abspath(_file)
        if os.path.isdir(_file):
            reader = SimpleDirectoryReader(
                input_dir=_file,
                filename_as_id=True,
                file_extractor=self.file_extractor,
            )
        else:
            reader = SimpleDirectoryReader(
                input_files=[_file],
                filename_as_id=True,
                file_extractor=self.file_extractor,
            )

        documents.extend(reader.load_data(show_progress=verbose))
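The sketch below shows what iglob does when it receives an intact pattern. It builds a small throwaway tree; the directory layout and file names are made up to mirror the report. With recursive=True, ** matches zero or more directories, so one pattern string is enough to find every .js file under node/.

```python
import glob
import os
import tempfile

# Throwaway tree mirroring the report's layout (names are illustrative only).
with tempfile.TemporaryDirectory() as root:
    bench = os.path.join(root, "node", "benchmark")
    os.makedirs(bench)
    for name in ("_cli.js", "_http-benchmarkers.js", "README.md"):
        open(os.path.join(bench, name), "w").close()

    # recursive=True lets "**" span any number of directories, so this single
    # pattern reaches node/benchmark/*.js and skips the non-.js file.
    pattern = os.path.join(root, "node", "**", "*.js")
    matches = sorted(os.path.basename(p) for p in glob.iglob(pattern, recursive=True))
    print(matches)  # ['_cli.js', '_http-benchmarkers.js']
```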

The iglob function is correctly used to iterate over files matching the glob pattern provided in the --files argument. The problem is therefore more likely in how the pattern reaches the CLI than in iglob itself. If node/**/*.js is left unquoted, the shell expands it into several paths before llamaindex-cli runs. Because --files takes a single value, argparse binds the first path to it and reports the rest as unrecognized arguments, which matches the error above. Quoting the pattern, as in llamaindex-cli rag --files "node/**/*.js", should pass it through unexpanded so that iglob can resolve it.
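The "unrecognized arguments" error can be reproduced with argparse alone. This is a minimal sketch under two assumptions:

- --files is declared with argparse's default single-value nargs, which the later fix's commit message implies.
- The parser below is a stand-in, not the CLI's actual code.

shlex.split is used only to approximate how a POSIX shell tokenizes the quoted command.

```python
import argparse
import shlex

# Stand-in parser (assumption): --files takes one value, argparse's default.
parser = argparse.ArgumentParser(prog="llamaindex-cli")
subparsers = parser.add_subparsers(dest="command")
rag = subparsers.add_parser("rag")
rag.add_argument("--files")

# Unquoted glob: the shell has already expanded it, so argv holds several
# paths. Only the first binds to --files; the rest are left over, and
# parse_args() would exit with "unrecognized arguments: ...".
expanded = ["rag", "--files",
            "node/benchmark/_cli.js", "node/benchmark/_http-benchmarkers.js"]
args, extras = parser.parse_known_args(expanded)
print(args.files)  # node/benchmark/_cli.js
print(extras)      # ['node/benchmark/_http-benchmarkers.js']

# Quoted glob: shell-style quoting keeps the pattern as one literal string,
# which the CLI can later hand to iglob.
quoted = shlex.split('rag --files "node/**/*.js"')
print(parser.parse_args(quoted).files)  # node/**/*.js
```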

I also found a similar closed issue, "[Bug]: llamaindex-cli unrecognized arguments: - files ./README.md". It suggests using the correct command-line argument for files, which is --files or -f, not - files. The correct command should be either llamaindex-cli rag --files "./README.md" or llamaindex-cli rag -f "./README.md".

Unfortunately, I couldn't find any changes in the handle_cli method or the iglob function usage in the LlamaIndex codebase after version 0.0.7.

Could you please provide more information about your environment? Specifically, the operating system and its version, the shell you're using, and the exact command you're running? This will help us better understand the issue and find a solution.


@logan-markewich logan-markewich added P1 and removed triage Issue needs to be triaged/prioritized labels Mar 11, 2024

hemanth commented Mar 21, 2024

@logan-markewich I would like to pair on fixing this :)

ericwb added a commit to ericwb/llama_index that referenced this issue Feb 24, 2025
This change modifies llamaindex-cli so that the --files argument
properly handles glob patterns.

To handle globs like the example given in the issue, the number
of arguments (nargs) must be set to + so that argparse returns a
list of files or patterns.

Because argparse now returns a list, the way the files are
processed had to be restructured, and the function signature
changed accordingly.

Fixes run-llama#11798

Signed-off-by: Eric Brown <[email protected]>
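The sketch below illustrates the approach the commit message describes; it is not the actual patch. Two things in it are assumptions:

- collect_files is a hypothetical helper name.
- The parser is a stand-in for the real CLI's.

With nargs="+", --files yields a list. Each item is then expanded with iglob, so the shell-expanded form and a quoted pattern end up selecting the same files.

```python
import argparse
import os
import tempfile
from glob import iglob


def collect_files(patterns):
    """Hypothetical helper: expand each --files value with iglob.

    Handles quoted patterns (expanded here) and paths the shell already
    expanded (iglob yields an existing literal path as-is).
    """
    found = {}
    for pattern in patterns:
        for path in iglob(pattern, recursive=True):
            # dict keys drop duplicates while keeping first-seen order
            found[os.path.abspath(path)] = None
    return list(found)


# Stand-in parser: nargs="+" makes argparse return a list of one or more values.
parser = argparse.ArgumentParser(prog="llamaindex-cli")
rag = parser.add_subparsers(dest="command").add_parser("rag")
rag.add_argument("--files", nargs="+")

with tempfile.TemporaryDirectory() as root:
    bench = os.path.join(root, "node", "benchmark")
    os.makedirs(bench)
    names = ["_cli.js", "_http-benchmarkers.js"]
    for name in names:
        open(os.path.join(bench, name), "w").close()

    # Unquoted on the command line: the shell passes several literal paths.
    shell_args = parser.parse_args(
        ["rag", "--files", *(os.path.join(bench, n) for n in names)])
    from_shell = collect_files(shell_args.files)

    # Quoted on the command line: one pattern that iglob expands itself.
    pattern = os.path.join(root, "node", "**", "*.js")
    quoted_args = parser.parse_args(["rag", "--files", pattern])
    from_pattern = collect_files(quoted_args.files)

    print(sorted(from_shell) == sorted(from_pattern))  # True
```

Deduplicating matters when patterns overlap, for example node/**/*.js together with node/benchmark/*.js. Without it, the same file would be loaded twice. A dict is used instead of a set so the files keep the order in which they were found.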
@ericwb ericwb linked a pull request Feb 24, 2025 that will close this issue

2 participants