Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Optional CA Certificate Support to Bulk Import Functions #2672

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

abd-770
Copy link

@abd-770 abd-770 commented Feb 24, 2025

Description:

Issue: #2671
This PR addresses the above issue by adding optional CA certificate support to the below bulk import functions:

  • bulk_import
  • get_import_progress
  • list_import_jobs

Now, users with TLS-mode milvus can use these functions in pymilvus, using their SSL-certificate.

@sre-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: abd-770
To complete the pull request process, please assign xuanyang-cn after the PR has been reviewed.
You can assign the PR to them by writing /assign @xuanyang-cn in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot
Copy link

Welcome @abd-770! It looks like this is your first PR to milvus-io/pymilvus 🎉

@abd-770
Copy link
Author

abd-770 commented Feb 26, 2025

@longjiquan @XuanYang-cn
I'll modify the commit details after all the changes are finalised.
Also, is there an way to test the changes to confirm it's working as expected.

@abd-770
Copy link
Author

abd-770 commented Mar 3, 2025

@longjiquan @XuanYang-cn
Can you guys review the PR?

@XuanYang-cn
Copy link
Contributor

@abd-770 We are reviewing this PR. In the mean time, could you please fix the DCO error?

@XuanYang-cn XuanYang-cn modified the milestones: 2.5.5, 2.5.6 Mar 3, 2025
@yhmo
Copy link
Contributor

yhmo commented Mar 3, 2025

@abd-770

The "**kwargs" is passed into _post_request() method for each API, and eventually passed into the requests.post() method.

I suppose you can call the get_import_progress() like this:

resp = get_import_progress(
    url=uri,
    job_id="455462875463422512",
    verify=CERTPATH
)

@XuanYang-cn XuanYang-cn removed this from the 2.5.6 milestone Mar 3, 2025
@XuanYang-cn XuanYang-cn added the wontfix This will not be worked on label Mar 3, 2025
@abd-770
Copy link
Author

abd-770 commented Mar 3, 2025

@yhmo @XuanYang-cn
I see, but as for the user convenience, it's much preferred to add a dedicated certificate field to the function, since many users connect to milvus via TLS.
Alot of users, including me, were under the impression that these function only supports non-TLS calls and we were using the deprecated bulk import functions, until I looked into the pymilvus source code and found out about this.

We can also update the documentation for the same and provide certificate path to the syntax of the function.

@yhmo
Copy link
Contributor

yhmo commented Mar 4, 2025

@abd-770

request has two parameters for tls:

    :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.

In pymilvus, for one-way tls, the client needs to pass a ca file: https://github.com/milvus-io/pymilvus/blob/master/examples/cert/example_tls1.py
For two-way tls, the client needs to provide two files: a pem and a client key: https://github.com/milvus-io/pymilvus/blob/master/examples/cert/example_tls2.py

Looks like the "verify" parameter is for one-way tls, only passing one path of ca file. It can be a boolean value which means internal ca.
The "cert" parameter is for two-way tls, which can be a tuple of two files. Seems it also can be a string for one-way tls, one ca path, I am not sure.
Your code change only handles one case: the "verify" for one ca path.

I think the cert_path should cover the three cases, I suggest to rename the cert_path to "cert"

def _post_request(
    url: str, api_key: str, params: {}, timeout: int = 20, cert: Optional[bool, str, tuple] = None, **kwargs
)

If cert is a bool, pass it as "verify".
If cert is a ca path, pass it as request("verify"), maybe also ok to pass it to request("cert"), I am not sure.
If cert is a tuple, pass it as request("cert").

@abd-770
Copy link
Author

abd-770 commented Mar 5, 2025

@yhmo
I can work on implementing the 2-way TLS for these functions.
I will create a seperate issue/PR for it.

Meanwhile for this PR, can we get it merged?

If cert is a bool, pass it as "verify".

For this case, you mean setting the value of verify=true?

        resp = requests.post(
            url=url, headers=_http_headers(api_key), json=params, timeout=timeout, verify=true, **kwargs
        )

@yhmo
Copy link
Contributor

yhmo commented Mar 5, 2025

You can read the source code of "requests.api", under the path [path_to_python_packages]/site-packages/requests/api.py

def post(url, data=None, json=None, **kwargs):
    return request("post", url, data=data, json=json, **kwargs)

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content_type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

The description of "param verify" and "param cert".
"verify" can be a boolean or a path of certificate file
"cert" can be a path of cert file or a tuple of two cert paths

@abd-770
Copy link
Author

abd-770 commented Mar 18, 2025

@yhmo @XuanYang-cn
Two-way TLS with client side authentication is added to the bulk functions.
Can you take a look at the changes and suggest any improvements.
Also, is there any way to test these functionalities.

@abd-770
Copy link
Author

abd-770 commented Mar 23, 2025

@yhmo
can you review this PR?

@abd-770
Copy link
Author

abd-770 commented Apr 2, 2025

@XuanYang-cn ^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-dco size/M wontfix This will not be worked on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants