feat: support non vectorized managed function #1373

jialuoo · 2025-02-06T23:03:02Z

b/391680147

bigframes/functions/_function_session.py

shobsi

I'll review the test code in the next batch

bigframes/functions/_function_client.py

bigframes/functions/_function_session.py

shobsi · 2025-02-19T09:18:56Z

bigframes/functions/_function_session.py

+        def wrapper(func):
+            nonlocal input_types, output_type
+
+            if not callable(func):


Can lines 808-837 be put in a common function?

Yeah, there is a TODO on top of the wrapper. I'll do it in a follow-up PR, if you agree.

I see, let's create an issue for tracking

bigframes/series.py

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

shobsi · 2025-02-19T18:06:51Z

bigframes/session/__init__.py

+                assets can be located through the following properties set in the
+                object:
+
+                `bigframes_managed_function` - The bigquery managed function


We should document bigframes_bigquery_function (related to the other comment)

shobsi · 2025-02-19T18:10:24Z

bigframes/functions/_function_session.py

@@ -570,11 +647,12 @@ def try_delattr(attr):
            func.bigframes_cloud_function = (
                remote_function_client.get_cloud_function_fully_qualified_name(cf_name)
            )
-            func.bigframes_remote_function = (
+            func.bigframes_function = (


[nit] I think for clarity we should call the new attribute "bigframes_bigquery_function".

shobsi · 2025-02-25T07:19:00Z

bigframes/dataframe.py

+            # TODO(jialuo): Deprecate the "bigframes_remote_function" attribute.
+            # We have some tests using pre-defined remote_function that were
+            # defined based on "bigframes_remote_function" instead of
+            # "bigframes_bigquery_function". So we need to fix those pre-defined


Let's chat offline about which tests need the logic here to depend on both attributes. If possible, we should rely on the new attribute and keep the older attribute only for backward compatibility.
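
One way to read the suggestion above is a lookup that prefers the new attribute and falls back to the old one only for functions created before the rename. This is a minimal sketch, assuming the two attribute names quoted in this thread; the helper name get_bigquery_function_name is hypothetical and is not part of bigframes.

```python
from typing import Callable, Optional

# Attribute names taken from the review thread.
NEW_ATTR = "bigframes_bigquery_function"
OLD_ATTR = "bigframes_remote_function"  # kept only for backward compatibility


def get_bigquery_function_name(func: Callable) -> Optional[str]:
    """Return the BigQuery function name attached to func, or None.

    Hypothetical helper: the new attribute wins, and the old attribute
    is consulted only when the new one is absent.
    """
    name = getattr(func, NEW_ATTR, None)
    if name is None:
        name = getattr(func, OLD_ATTR, None)
    return name
```

With a helper like this, callers never read the old attribute directly, so removing it later would touch a single place.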

shobsi · 2025-02-25T07:57:59Z

bigframes/functions/_function_client.py

+        is_row_processor,
+    ):
+        """Create a BigQuery managed function."""
+        self._create_bq_connection()


A connection is not mandatory for a managed function.

shobsi · 2025-02-25T08:01:00Z

bigframes/functions/_function_session.py

+                    ibis_signature.output_type
+                ),
+                language="python",
+                runtime_version="python-3.11",


We should pick this up from the environment instead of hard coding
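
A possible shape for this, as a sketch only: derive the string from the interpreter running the client code. It assumes the runtime identifier follows the "python-<major>.<minor>" pattern seen in the diff above, and it does not check whether the service actually supports that version. The function name current_runtime_version is hypothetical.

```python
import sys


def current_runtime_version() -> str:
    """Build a runtime identifier such as "python-3.11" from the local interpreter.

    Assumption: the identifier format is "python-<major>.<minor>".
    """
    return f"python-{sys.version_info.major}.{sys.version_info.minor}"
```

Matching the local interpreter also keeps the pickled function and the runtime that unpickles it on the same minor version, which is generally the safer pairing for cloudpickle payloads.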

shobsi · 2025-02-25T08:26:12Z

bigframes/functions/_function_client.py

+
+        managed_function_options = {
+            "runtime_version": runtime_version,
+            "entry_point": "managed_func",


[nit] maybe call it "bigframes_handler"

shobsi · 2025-02-25T08:32:37Z

bigframes/functions/_function_client.py

+
+udf = cloudpickle.loads({pickled})
+
+def managed_func(*args, **kwargs):


I think kwargs is redundant here; we can just use args.
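
For illustration, a handler that forwards positional arguments only might look like the sketch below. It assumes the function is always invoked with positional values, which is what makes kwargs unnecessary. The names make_handler and square are hypothetical, and the wrapper uses the "bigframes_handler" name suggested earlier in this review. The cloudpickle loading step from the diff is left out so the sketch stays self-contained.

```python
from typing import Any, Callable


def make_handler(udf: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a user function in an entry point that takes positional args only."""

    def bigframes_handler(*args: Any) -> Any:
        # Forward the positional values unchanged to the user function.
        return udf(*args)

    return bigframes_handler


def square(x: int) -> int:
    return x * x


handler = make_handler(square)
```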

shobsi · 2025-02-25T08:36:46Z

bigframes/functions/_function_session.py

+            self._try_delattr(func, "is_row_processor")
+            self._try_delattr(func, "ibis_node")
+
+            bq_function_name = name if name else func.__name__


Let's not use func.__name__: multiple users using a common name with entirely different code could end up overwriting each other's functions. See how provision_bq_remote_function determines the name of the BQ function from the hash of the user code + dependencies.
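
The idea of a content-derived name can be sketched as follows. This is not the logic in provision_bq_remote_function, whose details are not shown in this thread; it is an illustrative stand-in. The function name make_function_name, the "bigframes_" prefix, and the choice of SHA-256 are all assumptions.

```python
import hashlib
from typing import Iterable, Optional


def make_function_name(
    code_bytes: bytes,
    packages: Optional[Iterable[str]] = None,
    prefix: str = "bigframes_",
) -> str:
    """Derive a deterministic name from the user code and its dependencies.

    code_bytes is any stable serialization of the user function, for
    example its pickled payload. Packages are sorted so that their order
    does not change the resulting name.
    """
    digest = hashlib.sha256()
    digest.update(code_bytes)
    for package in sorted(packages or []):
        # A separator byte keeps ("ab", "c") distinct from ("a", "bc").
        digest.update(b"\0" + package.encode("utf-8"))
    return prefix + digest.hexdigest()[:32]
```

Identical code and dependencies map to the same name, so repeated deployments can reuse the function, while two users who both call their function "foo" but write different bodies get different names.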

shobsi · 2025-02-25T08:51:42Z

tests/system/small/functions/test_managed_function.py

+
+
+@pytest.fixture(scope="module")
+def bq_cf_connection() -> str:


Since a connection is optional for a managed UDF, let's run the tests without one. We can have one or two separate tests in the large tests that specifically test an explicit connection.

shobsi · 2025-02-25T08:59:47Z

tests/system/small/functions/test_managed_function.py

+    pd_int64_col = scalars_pandas_df["int64_col"]
+    pd_int64_col_filter = pd_int64_col.notnull()
+    pd_int64_col_filtered = pd_int64_col[pd_int64_col_filter]
+    pd_result_col = pd_int64_col_filtered.apply(lambda x: x * x)


No need to use an independent lambda; .apply(square) would work on a pandas Series. (If you found such usage elsewhere, it was probably written before the remote function could be applied to a scalar directly, as in the op on line 62.)

feat: support non vectorized managed function

86eb396

product-auto-label bot added the "size: xl" and "api: bigquery" labels Feb 6, 2025

jialuoo self-assigned this Feb 6, 2025

jialuoo added 9 commits February 6, 2025 23:11

fix mf tests

63a5145

fix dataframe apply

538723d

fix series apply

de3ed7f

fix the decorator in tests

24caee3

fix test remote func

2f0b88d

add more tests

f1843ad

remove unused import

c7dba3b

refactor rf in bff session

63765fa

del udf args

62f3baf

jialuoo commented Feb 7, 2025

jialuoo requested a review from shobsi February 7, 2025 18:27

jialuoo marked this pull request as ready for review February 7, 2025 18:27

jialuoo requested review from a team as code owners February 7, 2025 18:27

jialuoo added 2 commits February 12, 2025 19:13

fix docstring

7f0f752

Merge branch 'main' into mf3-scalar

d9400bd

shobsi reviewed Feb 19, 2025

jialuoo added 3 commits February 19, 2025 08:55

Merge branch 'main' into mf3-scalar

60724fa

Merge branch 'main' into mf3-scalar

f6efbe7

resolve the comments

9b5363d

shobsi mentioned this pull request Feb 21, 2025

feat: support routines with ARRAY return type in read_gbq_function #1412

Merged

4 tasks

jialuoo and others added 2 commits February 24, 2025 19:17

fix the attribute naming

eb758f7

🦉 Updates from OwlBot post-processor

601b137

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

shobsi reviewed Feb 25, 2025
