Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: evaluate testing #390

Open
wants to merge 27 commits into
base: main
Choose a base branch
from
Open

feat: evaluate testing #390

wants to merge 27 commits into from

Conversation

kdziedzic68
Copy link
Collaborator

No description provided.

Copy link
Contributor

github-actions bot commented Mar 5, 2025

badge

Code Coverage Summary

Filename                                                                                                        Stmts    Miss  Cover    Missing
------------------------------------------------------------------------------------------------------------  -------  ------  -------  ---------------------------------------------------------------
packages/ragbits-cli/src/ragbits/cli/__init__.py                                                                   31       4  87.10%   73-74, 81-82
packages/ragbits-cli/src/ragbits/cli/_utils.py                                                                     23       4  82.61%   47, 65-67
packages/ragbits-cli/src/ragbits/cli/state.py                                                                      79       3  96.20%   50-51, 61
packages/ragbits-cli/tests/unit/test_state.py                                                                      72       2  97.22%   103-104
packages/ragbits-conversations/src/ragbits/conversations/__init__.py                                                0       0  100.00%
packages/ragbits-conversations/src/ragbits/conversations/history/__init__.py                                        0       0  100.00%
packages/ragbits-conversations/src/ragbits/conversations/history/compressors/__init__.py                            3       0  100.00%
packages/ragbits-conversations/src/ragbits/conversations/history/compressors/base.py                               10       0  100.00%
packages/ragbits-conversations/src/ragbits/conversations/history/compressors/llm.py                                29       1  96.55%   79
packages/ragbits-conversations/src/ragbits/conversations/history/stores/__init__.py                                 3       0  100.00%
packages/ragbits-conversations/src/ragbits/conversations/history/stores/base.py                                    17       0  100.00%
packages/ragbits-conversations/src/ragbits/conversations/history/stores/sql.py                                     66       1  98.48%   128
packages/ragbits-conversations/tests/unit/history/test_llm_compressor.py                                           64       0  100.00%
packages/ragbits-conversations/tests/unit/history/test_sql_store.py                                                48       2  95.83%   29-30
packages/ragbits-core/src/ragbits/core/__init__.py                                                                  6       2  66.67%   8-9
packages/ragbits-core/src/ragbits/core/cli.py                                                                       6       0  100.00%
packages/ragbits-core/src/ragbits/core/config.py                                                                   17       0  100.00%
packages/ragbits-core/src/ragbits/core/options.py                                                                  17       0  100.00%
packages/ragbits-core/src/ragbits/core/types.py                                                                     9       0  100.00%
packages/ragbits-core/src/ragbits/core/audit/__init__.py                                                           74       5  93.24%   42-45, 52-54
packages/ragbits-core/src/ragbits/core/audit/base.py                                                              183      35  80.87%   156-165, 249, 256, 262-264, 271-274, 335, 337, 341-345, 390-409
packages/ragbits-core/src/ragbits/core/audit/cli.py                                                               132       2  98.48%   91-92
packages/ragbits-core/src/ragbits/core/embeddings/__init__.py                                                       5       0  100.00%
packages/ragbits-core/src/ragbits/core/embeddings/base.py                                                          16       2  87.50%   40, 53
packages/ragbits-core/src/ragbits/core/embeddings/exceptions.py                                                    17       7  58.82%   7-8, 17, 26-27, 36, 45
packages/ragbits-core/src/ragbits/core/embeddings/litellm.py                                                       38      18  52.63%   78-112
packages/ragbits-core/src/ragbits/core/embeddings/noop.py                                                          29       1  96.55%   89
packages/ragbits-core/src/ragbits/core/embeddings/sparse.py                                                        58      29  50.00%   24-25, 28, 67-97
packages/ragbits-core/src/ragbits/core/llms/__init__.py                                                             4       0  100.00%
packages/ragbits-core/src/ragbits/core/llms/base.py                                                                80       2  97.50%   67, 79
packages/ragbits-core/src/ragbits/core/llms/exceptions.py                                                          20       5  75.00%   17, 26-27, 36, 45
packages/ragbits-core/src/ragbits/core/llms/factory.py                                                             12       2  83.33%   30, 51
packages/ragbits-core/src/ragbits/core/llms/litellm.py                                                             83      22  73.49%   93, 132, 142, 172-200, 222-227, 238
packages/ragbits-core/src/ragbits/core/llms/local.py                                                               55      27  50.91%   9-12, 67-75, 87-88, 109-120, 141-157
packages/ragbits-core/src/ragbits/core/llms/mock.py                                                                30       3  90.00%   70-73
packages/ragbits-core/src/ragbits/core/prompt/__init__.py                                                           2       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/_cli.py                                                              44      21  52.27%   25-33, 47-49, 63-65, 73-75, 89-97
packages/ragbits-core/src/ragbits/core/prompt/base.py                                                              28       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/exceptions.py                                                         7       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/parsers.py                                                           35       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/prompt.py                                                           143       1  99.30%   201
packages/ragbits-core/src/ragbits/core/prompt/discovery/__init__.py                                                 2       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/discovery/prompt_discovery.py                                        36       2  94.44%   55-56
packages/ragbits-core/src/ragbits/core/utils/__init__.py                                                            0       0  100.00%
packages/ragbits-core/src/ragbits/core/utils/_pyproject.py                                                         38       1  97.37%   113
packages/ragbits-core/src/ragbits/core/utils/config_handling.py                                                    72       8  88.89%   16, 54-55, 62-63, 152-154
packages/ragbits-core/src/ragbits/core/utils/decorators.py                                                         29       0  100.00%
packages/ragbits-core/src/ragbits/core/utils/dict_transformations.py                                               72       3  95.83%   24, 27, 108
packages/ragbits-core/src/ragbits/core/utils/pydantic.py                                                           13       2  84.62%   13, 16
packages/ragbits-core/src/ragbits/core/vector_stores/__init__.py                                                    3       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_stores/_cli.py                                                       50       4  92.00%   67, 89, 95, 119
packages/ragbits-core/src/ragbits/core/vector_stores/base.py                                                       75       2  97.33%   41, 192
packages/ragbits-core/src/ragbits/core/vector_stores/chroma.py                                                     99       3  96.97%   67, 108, 180
packages/ragbits-core/src/ragbits/core/vector_stores/in_memory.py                                                  54       2  96.30%   86, 93
packages/ragbits-core/src/ragbits/core/vector_stores/pgvector.py                                                  143      31  78.32%   101, 203-211, 241-242, 255-257, 330-370
packages/ragbits-core/src/ragbits/core/vector_stores/qdrant.py                                                     94       7  92.55%   68-85, 123, 146, 199, 206
packages/ragbits-core/tests/cli/__init__.py                                                                         0       0  100.00%
packages/ragbits-core/tests/cli/test_cli_trace_handler.py                                                          48       3  93.75%   30, 43, 56
packages/ragbits-core/tests/cli/test_vector_store.py                                                              115       0  100.00%
packages/ragbits-core/tests/integration/vector_stores/test_vector_store.py                                         82       0  100.00%
packages/ragbits-core/tests/unit/__init__.py                                                                        0       0  100.00%
packages/ragbits-core/tests/unit/test_options.py                                                                   21       0  100.00%
packages/ragbits-core/tests/unit/audit/__init__.py                                                                  0       0  100.00%
packages/ragbits-core/tests/unit/audit/test_cli.py                                                                107       0  100.00%
packages/ragbits-core/tests/unit/audit/test_trace.py                                                               97       3  96.91%   16, 19, 22
packages/ragbits-core/tests/unit/embeddings/__init__.py                                                             0       0  100.00%
packages/ragbits-core/tests/unit/embeddings/test_from_config.py                                                    20       0  100.00%
packages/ragbits-core/tests/unit/llms/__init__.py                                                                   0       0  100.00%
packages/ragbits-core/tests/unit/llms/test_base.py                                                                 98       0  100.00%
packages/ragbits-core/tests/unit/llms/test_from_config.py                                                          16       0  100.00%
packages/ragbits-core/tests/unit/llms/test_litellm.py                                                              80       0  100.00%
packages/ragbits-core/tests/unit/llms/factory/__init__.py                                                           0       0  100.00%
packages/ragbits-core/tests/unit/llms/factory/test_get_preferred_llm.py                                            12       0  100.00%
packages/ragbits-core/tests/unit/prompts/__init__.py                                                                0       0  100.00%
packages/ragbits-core/tests/unit/prompts/test_parsers.py                                                           65       0  100.00%
packages/ragbits-core/tests/unit/prompts/test_prompt.py                                                           191       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/__init__.py                                                      0       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/prompt_classes_for_tests.py                                     30       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/test_prompt_discovery.py                                        18       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/__init__.py                       2       1  50.00%   3
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/__init__.py               3       2  33.33%   2-4
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/temp_prompt1.py          14       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/temp_prompt2.py          14       0  100.00%
packages/ragbits-core/tests/unit/utils/__init__.py                                                                  0       0  100.00%
packages/ragbits-core/tests/unit/utils/test_config_handling.py                                                     64       2  96.88%   27-28
packages/ragbits-core/tests/unit/utils/test_decorators.py                                                          26       2  92.31%   17, 39
packages/ragbits-core/tests/unit/utils/test_dict_transformations.py                                                69       0  100.00%
packages/ragbits-core/tests/unit/utils/pyproject/test_find.py                                                      13       0  100.00%
packages/ragbits-core/tests/unit/utils/pyproject/test_get_config.py                                                 9       0  100.00%
packages/ragbits-core/tests/unit/utils/pyproject/test_get_instace.py                                               37       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/__init__.py                                                          0       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_chroma.py                                                      65       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_from_config.py                                                 40       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_in_memory.py                                                   85       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_pgvector.py                                                   164       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_qdrant.py                                                      81       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/__init__.py                                            2       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/_main.py                                              98       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/cli.py                                                39       2  94.87%   85, 104
packages/ragbits-document-search/src/ragbits/document_search/documents/__init__.py                                  0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/documents/document.py                                 62       2  96.77%   100, 147
packages/ragbits-document-search/src/ragbits/document_search/documents/element.py                                  91      14  84.62%   117, 134, 195-202, 212, 221-223
packages/ragbits-document-search/src/ragbits/document_search/documents/exceptions.py                               11       1  90.91%   17
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/__init__.py                          8       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/azure.py                            97      13  86.60%   73-74, 109-110, 181-192
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/base.py                             64       4  93.75%   166-167, 170-171
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/gcs.py                              65       1  98.46%   45
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/hf.py                               60      12  80.00%   55-58, 62-63, 94, 101-102, 117-119
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/http.py                             44       7  84.09%   55-60, 64, 77
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/local.py                            40       2  95.00%   40, 80
packages/ragbits-document-search/src/ragbits/document_search/documents/sources/s3.py                              105      49  53.33%   51-58, 72-96, 116-132, 164, 181
packages/ragbits-document-search/src/ragbits/document_search/ingestion/__init__.py                                  0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/document_processor.py                       33       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/intermediate_handlers/__init__.py            3       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/intermediate_handlers/base.py                5       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/intermediate_handlers/images.py             36       2  94.44%   71, 101
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/__init__.py                        3       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/base.py                           20       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/dummy.py                          21       7  66.67%   35, 56-62
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/__init__.py           4       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/default.py           48       4  91.67%   99, 104-105, 144
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/images.py            36      14  61.11%   58-67, 74-81, 92, 105
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/pdf.py               19       6  68.42%   23, 35-43
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/utils.py             38      11  71.05%   71, 82-83, 98-101, 110, 121-123
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/__init__.py                       5       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/base.py                          69       9  86.96%   110-118, 162-163
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/batched.py                       54       2  96.30%   229-230
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/ray.py                           35       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/strategies/sequential.py                    23       2  91.30%   69-70
packages/ragbits-document-search/src/ragbits/document_search/retrieval/__init__.py                                  0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/__init__.py                       6       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/base.py                           9       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/llm.py                           25       4  84.00%   47-50
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/multi.py                         27       4  85.19%   51-54
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/noop.py                           6       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/prompts.py                       26       2  92.31%   65, 87
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/__init__.py                        3       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/base.py                           17       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/litellm.py                        18       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/noop.py                           11       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/reciprocal_ranked_fusion.py       22       2  90.91%   50, 60
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/rerankers_answerdotai.py          22       0  100.00%
packages/ragbits-document-search/tests/__init__.py                                                                  0       0  100.00%
packages/ragbits-document-search/tests/helpers.py                                                                   3       0  100.00%
packages/ragbits-document-search/tests/cli/test_ingest.py                                                          21       0  100.00%
packages/ragbits-document-search/tests/cli/test_search.py                                                          71       0  100.00%
packages/ragbits-document-search/tests/integration/__init__.py                                                      0       0  100.00%
packages/ragbits-document-search/tests/integration/test_rerankers.py                                               26       6  76.92%   21-43
packages/ragbits-document-search/tests/integration/test_sources.py                                                 24      10  58.33%   23-33, 41-46
packages/ragbits-document-search/tests/integration/test_unstructured.py                                            48      10  79.17%   52-58, 71-77
packages/ragbits-document-search/tests/unit/test_aws_source.py                                                     24       0  100.00%
packages/ragbits-document-search/tests/unit/test_azure_blob_source.py                                              76       0  100.00%
packages/ragbits-document-search/tests/unit/test_config.py                                                         63       0  100.00%
packages/ragbits-document-search/tests/unit/test_document_processor.py                                             17       0  100.00%
packages/ragbits-document-search/tests/unit/test_document_search.py                                               219       1  99.54%   430
packages/ragbits-document-search/tests/unit/test_documents.py                                                      13       0  100.00%
packages/ragbits-document-search/tests/unit/test_elements.py                                                       20       0  100.00%
packages/ragbits-document-search/tests/unit/test_gcs_hf_sources.py                                                 53       8  84.91%   17-18, 53-58
packages/ragbits-document-search/tests/unit/test_http_source.py                                                    17       0  100.00%
packages/ragbits-document-search/tests/unit/test_ingest_strategies.py                                              22       0  100.00%
packages/ragbits-document-search/tests/unit/test_intermediate_handlers.py                                          33       0  100.00%
packages/ragbits-document-search/tests/unit/test_local_file_source.py                                              13       0  100.00%
packages/ragbits-document-search/tests/unit/test_providers.py                                                      41       0  100.00%
packages/ragbits-document-search/tests/unit/test_rephrasers.py                                                     40       0  100.00%
packages/ragbits-document-search/tests/unit/test_rerankers.py                                                      81       1  98.77%   25
packages/ragbits-document-search/tests/unit/test_source_discriminator.py                                           36       0  100.00%
packages/ragbits-document-search/tests/unit/testprojects/project_with_instance_factory/__init__.py                  0       0  100.00%
packages/ragbits-document-search/tests/unit/testprojects/project_with_instance_factory/factories.py                22       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/__init__.py                                                          4       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/cli.py                                                              43       3  93.02%   131, 133, 135
packages/ragbits-evaluate/src/ragbits/evaluate/config.py                                                            7       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/evaluator.py                                                        60       1  98.33%   55
packages/ragbits-evaluate/src/ragbits/evaluate/optimizer.py                                                        92      18  80.43%   167-173, 192, 195-196, 199, 203-209, 212-215
packages/ragbits-evaluate/src/ragbits/evaluate/utils.py                                                            52      34  34.62%   27-34, 46-50, 64-67, 83-95, 106-115, 125-126
packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/__init__.py                                              7       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/base.py                                                  7       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/hf.py                                                   10       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/local.py                                                14       1  92.86%   34
packages/ragbits-evaluate/src/ragbits/evaluate/factories/__init__.py                                               19       6  68.42%   35-37, 42-44
packages/ragbits-evaluate/src/ragbits/evaluate/metrics/__init__.py                                                  2       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/metrics/base.py                                                     21       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/metrics/document_search.py                                          23       0  100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/pipelines/__init__.py                                               14       1  92.86%   45
packages/ragbits-evaluate/src/ragbits/evaluate/pipelines/base.py                                                   29       1  96.55%   27
packages/ragbits-evaluate/src/ragbits/evaluate/pipelines/document_search.py                                        34       6  82.35%   62-65, 74-78
packages/ragbits-evaluate/tests/cli/test_run_evaluation.py                                                         24       0  100.00%
packages/ragbits-evaluate/tests/unit/test_dataloaders.py                                                           36       0  100.00%
packages/ragbits-evaluate/tests/unit/test_evaluation_pipeline.py                                                   71       0  100.00%
packages/ragbits-evaluate/tests/unit/test_metrics.py                                                               75       0  100.00%
packages/ragbits-evaluate/tests/unit/test_optimization.py                                                          72       0  100.00%
packages/ragbits-guardrails/src/ragbits/guardrails/__init__.py                                                      0       0  100.00%
packages/ragbits-guardrails/src/ragbits/guardrails/base.py                                                         15       0  100.00%
packages/ragbits-guardrails/src/ragbits/guardrails/openai_moderation.py                                            19       5  73.68%   29-33
packages/ragbits-guardrails/tests/unit/test_openai_moderation.py                                                   35       0  100.00%
TOTAL                                                                                                            7259     567  92.19%

Diff against main

Filename                                                                       Stmts    Miss  Cover
---------------------------------------------------------------------------  -------  ------  --------
packages/ragbits-evaluate/src/ragbits/evaluate/__init__.py                        +4       0  +100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/cli.py                             -2     -13  +28.58%
packages/ragbits-evaluate/src/ragbits/evaluate/config.py                          +1       0  +100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/evaluator.py                       +9     -24  +47.35%
packages/ragbits-evaluate/src/ragbits/evaluate/optimizer.py                      +92     +18  +80.43%
packages/ragbits-evaluate/src/ragbits/evaluate/utils.py                          +52     +34  +34.62%
packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/__init__.py             0      -2  +28.57%
packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/hf.py                 +10       0  +100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/local.py              +14      +1  +92.86%
packages/ragbits-evaluate/src/ragbits/evaluate/factories/__init__.py             +19      +6  +68.42%
packages/ragbits-evaluate/src/ragbits/evaluate/metrics/base.py                     0      -5  +23.81%
packages/ragbits-evaluate/src/ragbits/evaluate/metrics/document_search.py        +23       0  +100.00%
packages/ragbits-evaluate/src/ragbits/evaluate/pipelines/__init__.py              +3      -3  +29.22%
packages/ragbits-evaluate/src/ragbits/evaluate/pipelines/base.py                 +15      -1  +10.84%
packages/ragbits-evaluate/src/ragbits/evaluate/pipelines/document_search.py       +5      -6  +23.73%
packages/ragbits-evaluate/tests/cli/test_run_evaluation.py                       +24       0  +100.00%
packages/ragbits-evaluate/tests/unit/test_dataloaders.py                         +36       0  +100.00%
packages/ragbits-evaluate/tests/unit/test_evaluation_pipeline.py                 +71       0  +100.00%
packages/ragbits-evaluate/tests/unit/test_metrics.py                             +75       0  +100.00%
packages/ragbits-evaluate/tests/unit/test_optimization.py                        +72       0  +100.00%
TOTAL                                                                           +523      +5  +0.53%

Results for commit: 9c7af53

Minimum allowed coverage is 60%

♻️ This comment has been updated with latest results

@kdziedzic68 kdziedzic68 changed the title Feat/evaluate testing feat: evaluate testing Mar 5, 2025
Copy link
Contributor

github-actions bot commented Mar 5, 2025

Trivy scanning results.

Report Summary

┌─────────┬──────┬─────────────────┬─────────┐
│ Target │ Type │ Vulnerabilities │ Secrets │
├─────────┼──────┼─────────────────┼─────────┤
│ uv.lock │ uv │ 19 │ - │
└─────────┴──────┴─────────────────┴─────────┘
Legend:

  • '-': Not scanned
  • '0': Clean (no security findings detected)

For OSS Maintainers: VEX Notice

If you're an OSS maintainer and Trivy has detected vulnerabilities in your project that you believe are not actually exploitable, consider issuing a VEX (Vulnerability Exploitability eXchange) statement.
VEX allows you to communicate the actual status of vulnerabilities in your project, improving security transparency and reducing false positives for your users.
Learn more and start using VEX: https://trivy.dev/v0.60/docs/supply-chain/vex/repo#publishing-vex-documents

To disable this notice, set the TRIVY_DISABLE_VEX_NOTICE environment variable.

uv.lock (uv)

Total: 19 (MEDIUM: 10, HIGH: 8, CRITICAL: 1)

┌──────────────────┬────────────────┬──────────┬────────┬───────────────────┬───────────────┬──────────────────────────────────────────────────────────────┐
│ Library │ Vulnerability │ Severity │ Status │ Installed Version │ Fixed Version │ Title │
├──────────────────┼────────────────┼──────────┼────────┼───────────────────┼───────────────┼──────────────────────────────────────────────────────────────┤
│ aiohttp │ CVE-2024-52303 │ MEDIUM │ fixed │ 3.10.8 │ 3.10.11 │ aiohttp: aiohttp memory leak when middleware is enabled when │
│ │ │ │ │ │ │ requesting a resource... │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-52303
│ ├────────────────┤ │ │ │ ├──────────────────────────────────────────────────────────────┤
│ │ CVE-2024-52304 │ │ │ │ │ aiohttp: aiohttp vulnerable to request smuggling due to │
│ │ │ │ │ │ │ incorrect parsing of chunk... │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-52304
├──────────────────┼────────────────┼──────────┤ ├───────────────────┼───────────────┼──────────────────────────────────────────────────────────────┤
│ gradio │ CVE-2025-23042 │ CRITICAL │ │ 4.44.1 │ 5.11.0 │ Gradio Blocked Path ACL Bypass Vulnerability │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2025-23042
│ ├────────────────┼──────────┤ │ ├───────────────┼──────────────────────────────────────────────────────────────┤
│ │ CVE-2024-47867 │ HIGH │ │ │ 5.0.0 │ Gradio lacks integrity checking on the downloaded FRP client │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-47867
│ ├────────────────┤ │ │ │ ├──────────────────────────────────────────────────────────────┤
│ │ CVE-2024-47870 │ │ │ │ │ Gradio has a race condition in update_root_in_config may │
│ │ │ │ │ │ │ redirect user traffic │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-47870
│ ├────────────────┤ │ │ │ ├──────────────────────────────────────────────────────────────┤
│ │ CVE-2024-47871 │ │ │ │ │ Gradio uses insecure communication between the FRP client │
│ │ │ │ │ │ │ and server │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-47871
│ ├────────────────┼──────────┤ │ │ ├──────────────────────────────────────────────────────────────┤
│ │ CVE-2024-47164 │ MEDIUM │ │ │ │ Gradio's is_in_or_equal function may be bypassed │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-47164
│ ├────────────────┤ │ │ │ ├──────────────────────────────────────────────────────────────┤
│ │ CVE-2024-47165 │ │ │ │ │ Gradio's CORS origin validation accepts the null origin │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-47165
│ ├────────────────┤ │ │ │ ├──────────────────────────────────────────────────────────────┤
│ │ CVE-2024-47167 │ │ │ │ │ Gradio vulnerable to SSRF in the path parameter of │
│ │ │ │ │ │ │ /queue/join │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-47167
│ ├────────────────┤ │ │ │ ├──────────────────────────────────────────────────────────────┤
│ │ CVE-2024-47868 │ │ │ │ │ Gradio has several components with post-process steps allow │
│ │ │ │ │ │ │ arbitrary file leaks │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-47868
│ ├────────────────┤ │ │ │ ├──────────────────────────────────────────────────────────────┤
│ │ CVE-2024-47872 │ │ │ │ │ Gradio has an XSS on every Gradio server via upload of │
│ │ │ │ │ │ │ HTML... │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-47872
├──────────────────┼────────────────┤ │ ├───────────────────┼───────────────┼──────────────────────────────────────────────────────────────┤
│ jinja2 │ CVE-2024-56201 │ │ │ 3.1.4 │ 3.1.5 │ jinja2: Jinja has a sandbox breakout through malicious │
│ │ │ │ │ │ │ filenames │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-56201
│ ├────────────────┤ │ │ │ ├──────────────────────────────────────────────────────────────┤
│ │ CVE-2024-56326 │ │ │ │ │ jinja2: Jinja has a sandbox breakout through indirect │
│ │ │ │ │ │ │ reference to format method... │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-56326
│ ├────────────────┤ │ │ ├───────────────┼──────────────────────────────────────────────────────────────┤
│ │ CVE-2025-27516 │ │ │ │ 3.1.6 │ jinja2: Jinja sandbox breakout through attr filter selecting │
│ │ │ │ │ │ │ format method │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2025-27516
├──────────────────┼────────────────┼──────────┤ ├───────────────────┼───────────────┼──────────────────────────────────────────────────────────────┤
│ python-multipart │ CVE-2024-53981 │ HIGH │ │ 0.0.12 │ 0.0.18 │ python-multipart: python-multipart has a DoS via deformation │
│ │ │ │ │ │ │ multipart/form-data boundary │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-53981
├──────────────────┼────────────────┤ │ ├───────────────────┼───────────────┼──────────────────────────────────────────────────────────────┤
│ starlette │ CVE-2024-47874 │ │ │ 0.38.6 │ 0.40.0 │ starlette: Starlette Denial of service (DoS) via │
│ │ │ │ │ │ │ multipart/form-data │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-47874
├──────────────────┼────────────────┤ │ ├───────────────────┼───────────────┼──────────────────────────────────────────────────────────────┤
│ transformers │ CVE-2024-11392 │ │ │ 4.44.2 │ 4.48.0 │ transformers: Hugging Face Transformers MobileViTV2 │
│ │ │ │ │ │ │ Deserialization of Untrusted Data Remote Code Execution... │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-11392
│ ├────────────────┤ │ │ │ ├──────────────────────────────────────────────────────────────┤
│ │ CVE-2024-11393 │ │ │ │ │ transformers: Hugging Face Transformers MaskFormer Model │
│ │ │ │ │ │ │ Deserialization of Untrusted Data Remote Code... │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-11393
│ ├────────────────┤ │ │ │ ├──────────────────────────────────────────────────────────────┤
│ │ CVE-2024-11394 │ │ │ │ │ transformers: Hugging Face Transformers Trax Model │
│ │ │ │ │ │ │ Deserialization of Untrusted Data Remote Code... │
│ │ │ │ │ │ │ https://avd.aquasec.com/nvd/cve-2024-11394
└──────────────────┴────────────────┴──────────┴────────┴───────────────────┴───────────────┴──────────────────────────────────────────────────────────────┘

Compute the evaluation results for the given pipeline and data.

Args:
pipeline: The pipeline to be evaluated.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see an argument pipeline

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added


class EvaluationPipeline(Generic[EvaluationTargetT], WithConstructionConfig, ABC):
"""
Collection evaluation pipeline.
"""

CONCURRENCY: int = 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to have this as a part of EvaluationConfig so it could be changed if needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. also renamed the parameter to batch_size - should be more intuitive. probably we should also support multiple batching strategies here as we do withing document processing

"""
model = EvaluationConfig.model_validate(config)
dataloader: DataLoader = DataLoader.subclass_from_config(model.dataloader)
pipeline: EvaluationPipeline = EvaluationPipeline.subclass_from_config(model.pipeline)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After checking the code, I am no longer sure that moving Evaluator to EvaluationPipeline is a good move. Here you are creating an EvaluationPipeline object inside EvaluationPipeline. This is confusing for the user and mixes two different abstractions that imho should be separated - the code being evaluated and the evaluation orchestration. I think we should undo this change and bring back Evaluator.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed it to be a class method now. would not say it is confusing - as well one can say that all class methods are confusing. To me stateless object is more confusing than this and actually the code which is being evaluated is the object evaluation_target (eg. DocumentSearch) and EvaluationPipeline is already kind of orchestration for eval execution

@kdziedzic68 kdziedzic68 force-pushed the feat/evaluate-testing branch from 4957491 to 175c3eb Compare March 18, 2025 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

refactor: get rid of Evaluator class feat: add tests for ragbits.evaluate module
3 participants