refactor: document processing interface #419

Merged on Mar 25, 2025 (38 commits)

micpst (Collaborator) commented Mar 18, 2025

closes #402

This PR focuses solely on reducing complexity and unifying the interface of the document ingest API. The unstructured parser has been heavily refactored to make better use of the library API. Intermediate handlers now operate on raw elements rather than intermediate representations; after a few iterations, that abstraction no longer seems necessary.
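To illustrate the idea of routing documents to parsers and letting later handlers work directly on raw elements, here is a minimal sketch. It is not the code from this PR: `Element`, `PlainTextParser`, `ParserRouter`, and `uppercase_text` are hypothetical names invented for the example, and the real interfaces in `ingestion/parsers` and `ingestion/enrichers` may look quite different.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Element:
    """Hypothetical raw element produced by a parser (not the ragbits class)."""

    kind: str  # e.g. "text" or "image"
    content: str


class Parser(Protocol):
    """Anything that turns raw document content into elements."""

    def parse(self, raw: str) -> list[Element]: ...


class PlainTextParser:
    """Toy parser: one text element per non-empty line."""

    def parse(self, raw: str) -> list[Element]:
        return [Element("text", line) for line in raw.splitlines() if line.strip()]


class ParserRouter:
    """Maps a document type to the parser registered for it."""

    def __init__(self, parsers: dict[str, Parser]) -> None:
        self._parsers = parsers

    def get(self, document_type: str) -> Parser:
        try:
            return self._parsers[document_type]
        except KeyError:
            raise ValueError(f"no parser registered for {document_type!r}") from None


def uppercase_text(elements: list[Element]) -> list[Element]:
    """Toy enricher that works on raw elements, with no intermediate wrapper type."""
    return [
        Element(e.kind, e.content.upper()) if e.kind == "text" else e
        for e in elements
    ]


router = ParserRouter({"txt": PlainTextParser()})
elements = uppercase_text(router.get("txt").parse("hello\n\nworld"))
```

In this sketch the enricher receives the same `Element` objects the parser emitted, which is the simplification the description refers to: there is one element type flowing through the pipeline instead of a second intermediate representation.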

The docs will be updated in follow-up PRs: they require a heavy rewrite, and I don't want this PR to exceed 2k lines. For now I have updated only the API reference, since that was straightforward.

github-actions bot (Contributor) commented Mar 18, 2025

Code Coverage Summary

Filename                                                                                                        Stmts    Miss  Cover    Missing
------------------------------------------------------------------------------------------------------------  -------  ------  -------  ---------------------------------------------------------------
packages/ragbits-cli/src/ragbits/cli/__init__.py                                                                   31       4  87.10%   73-74, 81-82
packages/ragbits-cli/src/ragbits/cli/_utils.py                                                                     23       4  82.61%   47, 65-67
packages/ragbits-cli/src/ragbits/cli/state.py                                                                      79       3  96.20%   50-51, 61
packages/ragbits-cli/tests/unit/test_state.py                                                                      72       2  97.22%   103-104
packages/ragbits-conversations/src/ragbits/conversations/__init__.py                                                0       0  100.00%
packages/ragbits-conversations/src/ragbits/conversations/history/__init__.py                                        0       0  100.00%
packages/ragbits-conversations/src/ragbits/conversations/history/compressors/__init__.py                            3       0  100.00%
packages/ragbits-conversations/src/ragbits/conversations/history/compressors/base.py                               10       0  100.00%
packages/ragbits-conversations/src/ragbits/conversations/history/compressors/llm.py                                29       1  96.55%   79
packages/ragbits-conversations/src/ragbits/conversations/history/stores/__init__.py                                 3       0  100.00%
packages/ragbits-conversations/src/ragbits/conversations/history/stores/base.py                                    17       0  100.00%
packages/ragbits-conversations/src/ragbits/conversations/history/stores/sql.py                                     66       1  98.48%   128
packages/ragbits-conversations/tests/unit/history/test_llm_compressor.py                                           64       0  100.00%
packages/ragbits-conversations/tests/unit/history/test_sql_store.py                                                48       2  95.83%   29-30
packages/ragbits-core/src/ragbits/core/__init__.py                                                                  6       2  66.67%   8-9
packages/ragbits-core/src/ragbits/core/cli.py                                                                       6       0  100.00%
packages/ragbits-core/src/ragbits/core/config.py                                                                   17       0  100.00%
packages/ragbits-core/src/ragbits/core/options.py                                                                  17       0  100.00%
packages/ragbits-core/src/ragbits/core/types.py                                                                     9       0  100.00%
packages/ragbits-core/src/ragbits/core/audit/__init__.py                                                           74       5  93.24%   42-45, 52-54
packages/ragbits-core/src/ragbits/core/audit/base.py                                                              183      35  80.87%   156-165, 249, 256, 262-264, 271-274, 335, 337, 341-345, 390-409
packages/ragbits-core/src/ragbits/core/audit/cli.py                                                               132       2  98.48%   91-92
packages/ragbits-core/src/ragbits/core/embeddings/__init__.py                                                       5       0  100.00%
packages/ragbits-core/src/ragbits/core/embeddings/base.py                                                          16       2  87.50%   40, 53
packages/ragbits-core/src/ragbits/core/embeddings/exceptions.py                                                    17       7  58.82%   7-8, 17, 26-27, 36, 45
packages/ragbits-core/src/ragbits/core/embeddings/litellm.py                                                       38      18  52.63%   78-112
packages/ragbits-core/src/ragbits/core/embeddings/noop.py                                                          29       1  96.55%   89
packages/ragbits-core/src/ragbits/core/embeddings/sparse.py                                                        58      29  50.00%   24-25, 28, 67-97
packages/ragbits-core/src/ragbits/core/llms/__init__.py                                                             4       0  100.00%
packages/ragbits-core/src/ragbits/core/llms/base.py                                                                80       2  97.50%   67, 79
packages/ragbits-core/src/ragbits/core/llms/exceptions.py                                                          20       5  75.00%   17, 26-27, 36, 45
packages/ragbits-core/src/ragbits/core/llms/factory.py                                                             12       2  83.33%   30, 51
packages/ragbits-core/src/ragbits/core/llms/litellm.py                                                             83      22  73.49%   93, 132, 142, 172-200, 222-227, 238
packages/ragbits-core/src/ragbits/core/llms/local.py                                                               55      27  50.91%   9-12, 67-75, 87-88, 109-120, 141-157
packages/ragbits-core/src/ragbits/core/llms/mock.py                                                                30       3  90.00%   70-73
packages/ragbits-core/src/ragbits/core/prompt/__init__.py                                                           2       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/_cli.py                                                              44      21  52.27%   25-33, 47-49, 63-65, 73-75, 89-97
packages/ragbits-core/src/ragbits/core/prompt/base.py                                                              28       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/exceptions.py                                                         7       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/parsers.py                                                           35       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/prompt.py                                                           147       1  99.32%   202
packages/ragbits-core/src/ragbits/core/prompt/discovery/__init__.py                                                 2       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/discovery/prompt_discovery.py                                        36       2  94.44%   55-56
packages/ragbits-core/src/ragbits/core/utils/__init__.py                                                            0       0  100.00%
packages/ragbits-core/src/ragbits/core/utils/_pyproject.py                                                         38       1  97.37%   113
packages/ragbits-core/src/ragbits/core/utils/config_handling.py                                                    72       8  88.89%   16, 54-55, 62-63, 152-154
packages/ragbits-core/src/ragbits/core/utils/decorators.py                                                         29       0  100.00%
packages/ragbits-core/src/ragbits/core/utils/dict_transformations.py                                               72       3  95.83%   24, 27, 108
packages/ragbits-core/src/ragbits/core/utils/pydantic.py                                                           13       2  84.62%   13, 16
packages/ragbits-core/src/ragbits/core/vector_stores/__init__.py                                                    3       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_stores/_cli.py                                                       50       4  92.00%   67, 89, 95, 119
packages/ragbits-core/src/ragbits/core/vector_stores/base.py                                                       70       2  97.14%   39, 182
packages/ragbits-core/src/ragbits/core/vector_stores/chroma.py                                                     75       2  97.33%   68, 106
packages/ragbits-core/src/ragbits/core/vector_stores/hybrid.py                                                     44       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_stores/in_memory.py                                                  47       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_stores/pgvector.py                                                  137      26  81.02%   103, 205-213, 244, 257-259, 330-365
packages/ragbits-core/src/ragbits/core/vector_stores/qdrant.py                                                     72       4  94.44%   71-86, 134
packages/ragbits-core/tests/cli/__init__.py                                                                         0       0  100.00%
packages/ragbits-core/tests/cli/test_cli_trace_handler.py                                                          48       3  93.75%   30, 43, 56
packages/ragbits-core/tests/cli/test_vector_store.py                                                              115       0  100.00%
packages/ragbits-core/tests/integration/vector_stores/test_vector_store.py                                         94       0  100.00%
packages/ragbits-core/tests/unit/__init__.py                                                                        0       0  100.00%
packages/ragbits-core/tests/unit/test_options.py                                                                   21       0  100.00%
packages/ragbits-core/tests/unit/audit/__init__.py                                                                  0       0  100.00%
packages/ragbits-core/tests/unit/audit/test_cli.py                                                                107       0  100.00%
packages/ragbits-core/tests/unit/audit/test_trace.py                                                               97       3  96.91%   16, 19, 22
packages/ragbits-core/tests/unit/embeddings/__init__.py                                                             0       0  100.00%
packages/ragbits-core/tests/unit/embeddings/test_from_config.py                                                    20       0  100.00%
packages/ragbits-core/tests/unit/llms/__init__.py                                                                   0       0  100.00%
packages/ragbits-core/tests/unit/llms/test_base.py                                                                 98       0  100.00%
packages/ragbits-core/tests/unit/llms/test_from_config.py                                                          16       0  100.00%
packages/ragbits-core/tests/unit/llms/test_litellm.py                                                              80       0  100.00%
packages/ragbits-core/tests/unit/llms/factory/__init__.py                                                           0       0  100.00%
packages/ragbits-core/tests/unit/llms/factory/test_get_preferred_llm.py                                            12       0  100.00%
packages/ragbits-core/tests/unit/prompts/__init__.py                                                                0       0  100.00%
packages/ragbits-core/tests/unit/prompts/test_parsers.py                                                           65       0  100.00%
packages/ragbits-core/tests/unit/prompts/test_prompt.py                                                           206       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/__init__.py                                                      0       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/prompt_classes_for_tests.py                                     30       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/test_prompt_discovery.py                                        18       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/__init__.py                       2       1  50.00%   3
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/__init__.py               3       2  33.33%   2-4
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/temp_prompt1.py          14       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/temp_prompt2.py          14       0  100.00%
packages/ragbits-core/tests/unit/utils/__init__.py                                                                  0       0  100.00%
packages/ragbits-core/tests/unit/utils/test_config_handling.py                                                     64       2  96.88%   27-28
packages/ragbits-core/tests/unit/utils/test_decorators.py                                                          26       2  92.31%   17, 39
packages/ragbits-core/tests/unit/utils/test_dict_transformations.py                                                69       0  100.00%
packages/ragbits-core/tests/unit/utils/pyproject/test_find.py                                                      13       0  100.00%
packages/ragbits-core/tests/unit/utils/pyproject/test_get_config.py                                                 9       0  100.00%
packages/ragbits-core/tests/unit/utils/pyproject/test_get_instace.py                                               37       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/__init__.py                                                          0       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_chroma.py                                                      65       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_from_config.py                                                 40       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_hybrid.py                                                      74       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_in_memory.py                                                  102       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_pgvector.py                                                   164       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_qdrant.py                                                      78       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/__init__.py                                            2       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/_main.py                                              92       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/cli.py                                                39       2  94.87%   85, 104
packages/ragbits-document-search/src/ragbits/document_search/documents/__init__.py                                  0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/documents/document.py                                 66       1  98.48%   46
packages/ragbits-document-search/src/ragbits/document_search/documents/element.py                                  83      14  83.13%   96, 113, 174-182, 192, 201-203
packages/ragbits-document-search/src/ragbits/document_search/documents/exceptions.py                               16       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/__init__.py                          9       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/azure.py                            97      13  86.60%   73-74, 109-110, 181-192
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/base.py                             64       4  93.75%   166-167, 170-171
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/gcs.py                              65       1  98.46%   45
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/git.py                              94       3  96.81%   192, 199, 215
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/hf.py                               60      12  80.00%   55-58, 62-63, 94, 101-102, 117-119
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/local.py                            40       2  95.00%   40, 80
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/s3.py                              105      49  53.33%   51-58, 72-96, 116-132, 164, 181
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/web.py                              45       2  95.56%   62, 79
packages/ragbits-document-search/src/ragbits/document_search/ingestion/__init__.py                                  0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/enrichers/__init__.py                        4       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/enrichers/base.py                           17       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/enrichers/exceptions.py                     14       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/enrichers/image.py                          30       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/enrichers/router.py                         26       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/__init__.py                          3       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/base.py                             28       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/exceptions.py                       14       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/router.py                           24       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/unstructured.py                     63      12  80.95%   127-145, 179, 224-239
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/__init__.py                       5       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/base.py                          83       7  91.57%   140-144, 203, 206-207
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/batched.py                       68       8  88.24%   162, 202-214, 251-252
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/ray.py                           34       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/sequential.py                    22       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/__init__.py                                  0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/__init__.py                       6       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/base.py                           9       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/llm.py                           25       4  84.00%   47-50
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/multi.py                         27       4  85.19%   51-54
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/noop.py                           6       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/prompts.py                       26       2  92.31%   65, 87
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/__init__.py                        3       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/base.py                           17       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/litellm.py                        18       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/noop.py                           11       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/reciprocal_ranked_fusion.py       22       2  90.91%   50, 60
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/rerankers_answerdotai.py          22       0  100.00%
packages/ragbits-document-search/tests/__init__.py                                                                  0       0  100.00%
packages/ragbits-document-search/tests/helpers.py                                                                   3       0  100.00%
packages/ragbits-document-search/tests/cli/test_ingest.py                                                          21       0  100.00%
packages/ragbits-document-search/tests/cli/test_search.py                                                          71       0  100.00%
packages/ragbits-document-search/tests/integration/__init__.py                                                      0       0  100.00%
packages/ragbits-document-search/tests/integration/test_git_source.py                                              85       6  92.94%   148-157
packages/ragbits-document-search/tests/integration/test_rerankers.py                                               26       6  76.92%   21-43
packages/ragbits-document-search/tests/integration/test_sources.py                                                 24      10  58.33%   23-33, 41-46
packages/ragbits-document-search/tests/integration/test_unstructured.py                                            12       4  66.67%   62-67
packages/ragbits-document-search/tests/unit/test_aws_source.py                                                     24       0  100.00%
packages/ragbits-document-search/tests/unit/test_azure_blob_source.py                                              76       0  100.00%
packages/ragbits-document-search/tests/unit/test_config.py                                                         63       0  100.00%
packages/ragbits-document-search/tests/unit/test_document_parser_router.py                                         24       0  100.00%
packages/ragbits-document-search/tests/unit/test_document_parsers.py                                               47       0  100.00%
packages/ragbits-document-search/tests/unit/test_document_search.py                                               218       1  99.54%   428
packages/ragbits-document-search/tests/unit/test_documents.py                                                      13       0  100.00%
packages/ragbits-document-search/tests/unit/test_element_enricher_router.py                                        23       0  100.00%
packages/ragbits-document-search/tests/unit/test_element_enrichers.py                                              48       0  100.00%
packages/ragbits-document-search/tests/unit/test_elements.py                                                       20       0  100.00%
packages/ragbits-document-search/tests/unit/test_gcs_hf_sources.py                                                 53       8  84.91%   17-18, 53-58
packages/ragbits-document-search/tests/unit/test_git_source.py                                                    135       0  100.00%
packages/ragbits-document-search/tests/unit/test_ingest_strategies.py                                              43       0  100.00%
packages/ragbits-document-search/tests/unit/test_local_file_source.py                                              13       0  100.00%
packages/ragbits-document-search/tests/unit/test_rephrasers.py                                                     40       0  100.00%
packages/ragbits-document-search/tests/unit/test_rerankers.py                                                      81       1  98.77%   25
packages/ragbits-document-search/tests/unit/test_source_discriminator.py                                           36       0  100.00%
packages/ragbits-document-search/tests/unit/test_source_exceptions.py                                              22       0  100.00%
packages/ragbits-document-search/tests/unit/test_web_source.py                                                     43       0  100.00%
packages/ragbits-document-search/tests/unit/testprojects/project_with_instance_factory/__init__.py                  0       0  100.00%
packages/ragbits-document-search/tests/unit/testprojects/project_with_instance_factory/factories.py                22       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/__init__.py                                                          0       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/cli.py                                                              45      16  64.44%   109-119, 130-149
packages/ragbits-evaluate/src/ragbits/evaluate/config.py                                                            6       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/evaluator.py                                                        51      25  50.98%   43-48, 71-78, 99-109, 122, 136, 151-155
packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/__init__.py                                              7       2  71.43%   20-21
packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/base.py                                                  7       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/metrics/__init__.py                                                  2       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/metrics/base.py                                                     21       5  76.19%   24-25, 54, 67, 79
packages/ragbits-evaluate/src/ragbits/evaluate/pipelines/__init__.py                                               11       4  63.64%   23-26
packages/ragbits-evaluate/src/ragbits/evaluate/pipelines/base.py                                                   14       2  85.71%   23, 41
packages/ragbits-evaluate/src/ragbits/evaluate/pipelines/document_search.py                                        29      12  58.62%   35-36, 52-55, 61-68, 80-82
packages/ragbits-guardrails/src/ragbits/guardrails/__init__.py                                                      0       0  100.00%
packages/ragbits-guardrails/src/ragbits/guardrails/base.py                                                         15       0  100.00%
packages/ragbits-guardrails/src/ragbits/guardrails/openai_moderation.py                                            19       5  73.68%   29-33
packages/ragbits-guardrails/tests/unit/test_openai_moderation.py                                                   35       0  100.00%
TOTAL                                                                                                            7212     517  92.83%

Diff against main

Filename                                                                                           Stmts    Miss  Cover
-----------------------------------------------------------------------------------------------  -------  ------  --------
packages/ragbits-document-search/src/ragbits/document_search/_main.py                                 -6       0  +100.00%
packages/ragbits-document-search/src/ragbits/document_search/documents/document.py                    +4      -1  +1.71%
packages/ragbits-document-search/src/ragbits/document_search/documents/element.py                     -8       0  -1.49%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/enrichers/__init__.py          +4       0  +100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/enrichers/base.py             +17       0  +100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/enrichers/exceptions.py       +14       0  +100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/enrichers/image.py            +30       0  +100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/enrichers/router.py           +26       0  +100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/__init__.py            +3       0  +100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/base.py               +28       0  +100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/exceptions.py         +14       0  +100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/router.py             +24       0  +100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/parsers/unstructured.py       +63     +12  +80.95%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/base.py             +4      +1  -0.84%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/ray.py              -1       0  +100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/sequential.py       -1       0  +100.00%
packages/ragbits-document-search/tests/integration/test_unstructured.py                              -36      -6  -12.50%
packages/ragbits-document-search/tests/unit/test_document_parser_router.py                           +24       0  +100.00%
packages/ragbits-document-search/tests/unit/test_document_parsers.py                                 +47       0  +100.00%
packages/ragbits-document-search/tests/unit/test_element_enricher_router.py                          +23       0  +100.00%
packages/ragbits-document-search/tests/unit/test_element_enrichers.py                                +48       0  +100.00%
packages/ragbits-document-search/tests/unit/test_ingest_strategies.py                                 +7       0  +100.00%
TOTAL                                                                                               +328      +6  +0.45%

Results for commit: 9716f64

Minimum allowed coverage is 60%
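
For readers checking the numbers, the Cover column appears to be the share of statements that were executed, `(Stmts - Miss) / Stmts`. The sketch below recomputes the TOTAL row and compares it with the 60% minimum. The function name and the rule that a file with zero statements counts as 100% are assumptions inferred from the table, not taken from the coverage tool's source.

```python
def coverage_percent(stmts: int, miss: int) -> float:
    """Share of executed statements, in percent.

    Assumption: a file with no statements is reported as fully covered,
    which matches the empty __init__.py rows in the table.
    """
    if stmts == 0:
        return 100.0
    return 100.0 * (stmts - miss) / stmts


MINIMUM_COVERAGE = 60.0  # threshold quoted in the report

# TOTAL row of the summary table: 7212 statements, 517 missed.
total = coverage_percent(7212, 517)
passes = total >= MINIMUM_COVERAGE

print(f"{total:.2f}%")  # prints "92.83%"
```

The same formula reproduces individual rows, for example 31 statements with 4 missed gives 87.10%.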

♻️ This comment has been updated with latest results

github-actions bot (Contributor) commented Mar 18, 2025

Trivy scanning results.

Report Summary

┌─────────┬──────┬─────────────────┬─────────┐
│ Target  │ Type │ Vulnerabilities │ Secrets │
├─────────┼──────┼─────────────────┼─────────┤
│ uv.lock │ uv   │       21        │    -    │
└─────────┴──────┴─────────────────┴─────────┘
Legend:

  • '-': Not scanned
  • '0': Clean (no security findings detected)

uv.lock (uv)

Total: 21 (MEDIUM: 11, HIGH: 9, CRITICAL: 1)

┌──────────────────┬────────────────┬──────────┬────────┬───────────────────┬───────────────┬──────────────────────────────────────────────────────────────┐
│ Library          │ Vulnerability  │ Severity │ Status │ Installed Version │ Fixed Version │ Title                                                        │
├──────────────────┼────────────────┼──────────┼────────┼───────────────────┼───────────────┼──────────────────────────────────────────────────────────────┤
│ aiohttp          │ CVE-2024-52303 │ MEDIUM   │ fixed  │ 3.10.8            │ 3.10.11       │ aiohttp: aiohttp memory leak when middleware is enabled when │
│                  │                │          │        │                   │               │ requesting a resource...                                     │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-52303                   │
│                  ├────────────────┤          │        │                   │               ├──────────────────────────────────────────────────────────────┤
│                  │ CVE-2024-52304 │          │        │                   │               │ aiohttp: aiohttp vulnerable to request smuggling due to      │
│                  │                │          │        │                   │               │ incorrect parsing of chunk...                                │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-52304                   │
├──────────────────┼────────────────┼──────────┤        ├───────────────────┼───────────────┼──────────────────────────────────────────────────────────────┤
│ gradio           │ CVE-2025-23042 │ CRITICAL │        │ 4.44.1            │ 5.11.0        │ Gradio Blocked Path ACL Bypass Vulnerability                 │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2025-23042                   │
│                  ├────────────────┼──────────┤        │                   ├───────────────┼──────────────────────────────────────────────────────────────┤
│                  │ CVE-2024-47867 │ HIGH     │        │                   │ 5.0.0         │ Gradio lacks integrity checking on the downloaded FRP client │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-47867                   │
│                  ├────────────────┤          │        │                   │               ├──────────────────────────────────────────────────────────────┤
│                  │ CVE-2024-47870 │          │        │                   │               │ Gradio has a race condition in update_root_in_config may     │
│                  │                │          │        │                   │               │ redirect user traffic                                        │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-47870                   │
│                  ├────────────────┤          │        │                   │               ├──────────────────────────────────────────────────────────────┤
│                  │ CVE-2024-47871 │          │        │                   │               │ Gradio uses insecure communication between the FRP client    │
│                  │                │          │        │                   │               │ and server                                                   │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-47871                   │
│                  ├────────────────┼──────────┤        │                   │               ├──────────────────────────────────────────────────────────────┤
│                  │ CVE-2024-47164 │ MEDIUM   │        │                   │               │ Gradio's is_in_or_equal function may be bypassed             │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-47164                   │
│                  ├────────────────┤          │        │                   │               ├──────────────────────────────────────────────────────────────┤
│                  │ CVE-2024-47165 │          │        │                   │               │ Gradio's CORS origin validation accepts the null origin      │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-47165                   │
│                  ├────────────────┤          │        │                   │               ├──────────────────────────────────────────────────────────────┤
│                  │ CVE-2024-47167 │          │        │                   │               │ Gradio vulnerable to SSRF in the path parameter of           │
│                  │                │          │        │                   │               │ /queue/join                                                  │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-47167                   │
│                  ├────────────────┤          │        │                   │               ├──────────────────────────────────────────────────────────────┤
│                  │ CVE-2024-47868 │          │        │                   │               │ Gradio has several components with post-process steps allow  │
│                  │                │          │        │                   │               │ arbitrary file leaks                                         │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-47868                   │
│                  ├────────────────┤          │        │                   │               ├──────────────────────────────────────────────────────────────┤
│                  │ CVE-2024-47872 │          │        │                   │               │ Gradio has an XSS on every Gradio server via upload of       │
│                  │                │          │        │                   │               │ HTML...                                                      │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-47872                   │
├──────────────────┼────────────────┤          │        ├───────────────────┼───────────────┼──────────────────────────────────────────────────────────────┤
│ jinja2           │ CVE-2024-56201 │          │        │ 3.1.4             │ 3.1.5         │ jinja2: Jinja has a sandbox breakout through malicious       │
│                  │                │          │        │                   │               │ filenames                                                    │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-56201                   │
│                  ├────────────────┤          │        │                   │               ├──────────────────────────────────────────────────────────────┤
│                  │ CVE-2024-56326 │          │        │                   │               │ jinja2: Jinja has a sandbox breakout through indirect        │
│                  │                │          │        │                   │               │ reference to format method...                                │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-56326                   │
│                  ├────────────────┤          │        │                   ├───────────────┼──────────────────────────────────────────────────────────────┤
│                  │ CVE-2025-27516 │          │        │                   │ 3.1.6         │ jinja2: Jinja sandbox breakout through attr filter selecting │
│                  │                │          │        │                   │               │ format method                                                │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2025-27516                   │
├──────────────────┼────────────────┼──────────┤        ├───────────────────┼───────────────┼──────────────────────────────────────────────────────────────┤
│ litellm          │ CVE-2025-0628  │ HIGH     │        │ 1.55.0            │ 1.61.15       │ LiteLLM Has an Improper Authorization Vulnerability          │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2025-0628                    │
├──────────────────┼────────────────┤          │        ├───────────────────┼───────────────┼──────────────────────────────────────────────────────────────┤
│ python-multipart │ CVE-2024-53981 │          │        │ 0.0.12            │ 0.0.18        │ python-multipart: python-multipart has a DoS via deformation │
│                  │                │          │        │                   │               │ multipart/form-data boundary                                 │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-53981                   │
├──────────────────┼────────────────┤          │        ├───────────────────┼───────────────┼──────────────────────────────────────────────────────────────┤
│ starlette        │ CVE-2024-47874 │          │        │ 0.38.6            │ 0.40.0        │ starlette: Starlette Denial of service (DoS) via             │
│                  │                │          │        │                   │               │ multipart/form-data                                          │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-47874                   │
├──────────────────┼────────────────┤          │        ├───────────────────┼───────────────┼──────────────────────────────────────────────────────────────┤
│ transformers     │ CVE-2024-11392 │          │        │ 4.44.2            │ 4.48.0        │ transformers: Hugging Face Transformers MobileViTV2          │
│                  │                │          │        │                   │               │ Deserialization of Untrusted Data Remote Code Execution...   │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-11392                   │
│                  ├────────────────┤          │        │                   │               ├──────────────────────────────────────────────────────────────┤
│                  │ CVE-2024-11393 │          │        │                   │               │ transformers: Hugging Face Transformers MaskFormer Model     │
│                  │                │          │        │                   │               │ Deserialization of Untrusted Data Remote Code...             │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-11393                   │
│                  ├────────────────┤          │        │                   │               ├──────────────────────────────────────────────────────────────┤
│                  │ CVE-2024-11394 │          │        │                   │               │ transformers: Hugging Face Transformers Trax Model           │
│                  │                │          │        │                   │               │ Deserialization of Untrusted Data Remote Code...             │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-11394                   │
│                  ├────────────────┼──────────┤        │                   │               ├──────────────────────────────────────────────────────────────┤
│                  │ CVE-2024-12720 │ MEDIUM   │        │                   │               │ Transformers Regular Expression Denial of Service (ReDoS)    │
│                  │                │          │        │                   │               │ vulnerability                                                │
│                  │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-12720                   │
└──────────────────┴────────────────┴──────────┴────────┴───────────────────┴───────────────┴──────────────────────────────────────────────────────────────┘

@micpst force-pushed the mp/refactor-document-processing branch from bdb0e54 to e1a8d59 on March 18, 2025 14:10
@micpst force-pushed the mp/refactor-document-processing branch from d08ffb6 to 9d9622a on March 22, 2025 14:20
@micpst marked this pull request as ready for review on March 23, 2025 23:24
@micpst merged commit 92306b5 into main on Mar 25, 2025
7 checks passed
@micpst deleted the mp/refactor-document-processing branch on March 25, 2025 15:06