Experimental natural language retrievers using duck db #15642
Thanks @colombod. Do you want to include this in our `llama-index-experimental` package or create your own `retrievers` integration?
It looks like you might be trying to do the latter but incorporate it into the `llama-index-experimental` package, which is probably not the way to go. Specifically, if we put it in the experimental package, then we wouldn't need the `pyproject.toml`, nor would we need the `llama-index-retrievers-natural-language` subfolder.
Hi, thank you for the comment. I am not sure I am totally following; my idea was to add it to the |
@colombod Yeah, as Andrei mentioned, if you want this to be in the experimental package, we can make a folder like |
...ma_index/experimental/retrievers/llama-index-retrievers-natural-language/nl_csv_retriever.py
...x/experimental/retrievers/llama-index-retrievers-natural-language/nl_data_frame_retierver.py
28772db to f319e51
8c6045d to 5553944
This is more about natural language retrieval than about DuckDB |
e0a1910 to 2377bef
@logan-markewich what is the issue with the build? Can I get a hint or some help? |
This commit adds the following files:
- `llama-index-retrievers-natural-language/__init__.py`: Imports `PandasQueryEngine` and `PandasInstructionParser`.
- `llama-index-retrievers-natural-language/BUILD`: Adds Python sources.
- `llama-index-retrievers-natural-language/nl_csv_retriever.py`: Defines the `NLCSVRetriever` class, which retrieves data from a CSV file using natural language queries.
- `llama-index-retrievers-natural-language/nl_data_frame_retierver.py`: Defines the `NLDataframeRetriever` class, which retrieves data from a pandas DataFrame using natural language queries.
- `llama-index-retrievers-natural-language/nl_json_retriever.py`: Defines the `NLJsonRetriever` class, which retrieves data from a JSON file using natural language queries.
These retrievers provide capabilities to retrieve data based on natural language queries. They utilize the Llama Index query engine and support querying CSV, JSON, and pandas DataFrames.
This commit adds a new feature to LlamaIndex that enables the use of natural language to retrieve information from pandas DataFrames, CSV files, and JSON objects. Instead of running Python code, this feature uses DuckDB to run SQL queries, which addresses the security concerns of running arbitrary code. The DuckDB session is in memory and does not alter the original data. Additionally, the schema is used to generate a description of the dataset and its potential uses. This description and ontology are then used to calculate a ranking score against the query bundle. These changes give LlamaIndex an alternative approach for retrieving information using natural language queries.
This commit adds a new result ranking prompt to the NLDataframeRetriever class. The prompt allows users to provide a schema and query, and asks them to rate the relevance of the schema in modeling the domain of the query. The relevance must be a number between 0 and 1, where 1 indicates high relevance and 0 indicates low relevance.
The significant changes include:
- Added DEFAULT_RESULT_RANKING_TMPL constant for the result ranking template
- Added DEFAULT_RESULT_RANKING_PROMPTROMPT constant for the result ranking prompt template
- Updated NLDataframeRetriever constructor to accept a result_ranking_prompt parameter
- Initialized self._result_ranking_prompt with either the provided parameter or the default prompt template
- Modified NLDataframeRetriever.complete() method to use self._result_ranking_prompt as part of the LLM completion request
These changes allow users of NLDataframeRetriever to easily rank the relevance of schemas in modeling their queries, providing more accurate results.
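As a rough illustration of the ranking step this commit message describes, the sketch below formats a schema and a query into a prompt and parses a model reply into a score clamped to the 0 to 1 range. The template wording and the names `EXAMPLE_RANKING_TMPL`, `build_ranking_prompt`, and `parse_relevance` are hypothetical and are not taken from the PR; the LLM call itself is left out.

```python
import re

# Hypothetical template: the real DEFAULT_RESULT_RANKING_TMPL text is not shown in this thread.
EXAMPLE_RANKING_TMPL = (
    "Schema:\n{schema}\n\n"
    "Query: {query}\n\n"
    "Rate how relevant the schema is for modeling the domain of the query.\n"
    "Answer with a single number between 0 and 1."
)


def build_ranking_prompt(schema: str, query: str) -> str:
    """Fill the template with the dataset schema and the user query."""
    return EXAMPLE_RANKING_TMPL.format(schema=schema, query=query)


def parse_relevance(reply: str, default: float = 0.0) -> float:
    """Extract the first number from an LLM reply and clamp it to [0, 1]."""
    match = re.search(r"-?(?:\d+(?:\.\d+)?|\.\d+)", reply)
    if match is None:
        return default  # no number in the reply: fall back to the default score
    return min(1.0, max(0.0, float(match.group())))
```

Clamping keeps a malformed or out-of-range reply from producing a score outside the range the prompt asks for.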
This commit adds new natural language retrievers to the codebase. The `llama-index-retrievers-natural-language` package has been removed, and a new package called `natrual_language` has been created.
Significant changes:
- Deleted the `llama-index-retrievers-natural-language` package
- Added the `natrual_language` package
- Renamed the `BUILD` file from `llama-index-retrievers-natural-language` to `natrual_language`
- Renamed the following files from `llama-index-retrievers-natural-language` to `natrual_language`:
  - nl_csv_retriever.py
  - nl_data_frame_retierver.py
  - nl_json_retriever.py
  - README.md
The new natural language retrievers provide capabilities for retrieving data using natural language queries. This change enhances the functionality of the codebase by introducing more flexible and user-friendly retrieval options.
2377bef to d86da4f
@colombod great, it merged! Now just need to cook up an example notebook ;) |
@logan-markewich @colombod Are we still on time to fix the typo in the integration name? That will affect the import paths, it would be nice to fix before this gets spread. |
Description
This pull request adds support for natural language retrievers on top of DuckDB.
Compared to other approaches, this uses DuckDB to run SQL queries instead of running Python code. This is important because it addresses the security concerns of running arbitrary code. The DuckDB session is an in-memory one, so the original data cannot be altered by the retriever.
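A minimal sketch of the isolation idea, making no assumptions about the PR's actual code: rows are copied into an in-memory database and the query runs against that copy, so the source rows are never modified. Python's standard-library `sqlite3` is used here as a stand-in for DuckDB to keep the example dependency-free; `run_query_on_copy` and the sample data are hypothetical.

```python
import sqlite3


def run_query_on_copy(rows, columns, sql):
    """Load rows into a throwaway in-memory table named `data` and run `sql` on it."""
    con = sqlite3.connect(":memory:")  # the database exists only for this call
    try:
        col_defs = ", ".join(f'"{c}"' for c in columns)
        con.execute(f"CREATE TABLE data ({col_defs})")
        placeholders = ", ".join("?" for _ in columns)
        con.executemany(f"INSERT INTO data VALUES ({placeholders})", rows)
        return con.execute(sql).fetchall()
    finally:
        con.close()


source_rows = [("alice", 30), ("bob", 25)]  # sample data for illustration
# Even a destructive statement only touches the in-memory copy.
run_query_on_copy(source_rows, ["name", "age"], "DELETE FROM data")
result = run_query_on_copy(
    source_rows, ["name", "age"], "SELECT name FROM data WHERE age > 26"
)
```

After the `DELETE`, `source_rows` is unchanged and the second call still sees both rows, because each call works on a fresh copy.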
The schema is also used to generate a description of the dataset and what it could be used for. The description and ontology are then used to calculate a ranking score against the query bundle.
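One way to picture the schema-to-description step is sketched below: column names and types are rendered as plain text and embedded in a prompt that asks a model to describe the dataset. The function `schema_to_text`, the prompt wording, and the sample schema are assumptions made for illustration and do not come from the PR.

```python
def schema_to_text(table_name, columns):
    """Render (column_name, column_type) pairs as plain text a prompt can embed."""
    lines = [f"Table: {table_name}"]
    for name, col_type in columns:
        lines.append(f"- {name} ({col_type})")
    return "\n".join(lines)


# Hypothetical schema, for illustration only.
schema_text = schema_to_text("orders", [("order_id", "INTEGER"), ("total", "DOUBLE")])
description_prompt = (
    "Describe what this dataset contains and what questions it could answer.\n\n"
    + schema_text
)
```

The model's answer to `description_prompt` would then be the text compared against the query when computing a ranking score.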
New Package?
Did I fill in the `tool.llamahub` section in the `pyproject.toml` and provide a detailed README.md for my new integration or package?
Version Bump?
Did I bump the version in the `pyproject.toml` file of the package I am updating? (Except for the `llama-index-core` package)
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Suggested Checklist:
`make format; make lint` to appease the lint gods