Commit aa3a735

Merge pull request #1362 from syucream/fix/join-attrs-docs
Enhance Advanced Search documentation and join attrs implementation
2 parents 1c6c6e7 + 0c80039 commit aa3a735

2 files changed: +72 -25 lines changed

docs/content/advanced/advanced_search.md

+43 -12
@@ -31,17 +31,48 @@ Advanced Search is a powerful feature that allows you to search across multiple
 
 ### Advanced Features
 
-- **Search Chain**
-  - Follow relationships between entries
-  - Search through referenced objects
-  - Chain multiple searches to traverse complex relationships
-  - Results include both direct matches and related entries
-
-- **Export Functionality**
-  - Export search results to various formats
-  - Asynchronous processing for large result sets
-  - Progress tracking for export tasks
-  - Download exported files when ready
+#### Join Attrs
+
+Join Attrs enables relationship traversal in search results. Key points:
+
+- **Implementation**
+  - Sequential processing: root -> join targets
+  - Each join triggers new Elasticsearch query
+  - Supports OBJECT and ARRAY type references
+
+- **Critical Considerations**
+  1. **Pagination Behavior**
+     ```python
+     # Example: Request 100 items
+     root_results = search(limit=100) # Returns 100 root items
+     joined_results = join_and_filter() # May return 0-100 items
+     next_page_starts_at = 101 # Regardless of joined result size
+     ```
+     - Pagination applies to root level only
+     - Join/filter operations may reduce result size
+     - Each page may return fewer items than requested
+
+  2. **Performance Impact**
+     - N+1 query pattern with multiple joins
+     - No optimization for deep joins with filters
+
+  3. **Result Count Accuracy**
+     - Total count represents root level matches only
+     - Actual result count may be lower after joins/filters
+     - Cannot predict exact total after joins without full scan
+
+#### Search Chain
+- Follow relationships between entries
+- Search through referenced objects
+- Chain multiple searches to traverse complex relationships
+- Results include both direct matches and related entries
+
+#### Export Functionality
+
+- Export search results to various formats
+- Asynchronous processing for large result sets
+- Progress tracking for export tasks
+- Download exported files when ready
 
 ## Access Methods
 
@@ -87,7 +118,6 @@ Access Advanced Search programmatically through REST endpoints:
 - Leverage search chains for complex relationship queries
 - Monitor export task progress for large result sets
 - Consider pagination for large result sets in API usage
-
 ## For Developers
 
 ### Architecture Overview
@@ -173,3 +203,4 @@ Access Advanced Search programmatically through REST endpoints:
 - Integration tests for API endpoints
 - Performance tests for search operations
 - ACL verification tests
+
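Reviewer note on the pagination caveat documented above: the behavior can be illustrated with a small in-memory sketch. Everything in it (`RootItem`, `ROOT_ITEMS`, `search_root`, `join_and_filter`, `fetch_page`) is hypothetical and stands in for the Elasticsearch-backed code; it is not the project's implementation. It uses a 0-based offset, whereas the docs example counts from 1. The only point is that the page is cut at the root level before the join filter runs, so a page can come back short while the next offset and the total count still reflect root-level matches.

```python
from dataclasses import dataclass


@dataclass
class RootItem:
    id: int
    ref_ok: bool  # hypothetical flag: True when the referenced entry passes the join filter


# Hypothetical data set: 250 root matches, of which every third survives the join filter.
ROOT_ITEMS = [RootItem(id=i, ref_ok=(i % 3 == 0)) for i in range(1, 251)]


def search_root(offset: int, limit: int) -> list[RootItem]:
    """Root-level search: pagination is applied here and only here."""
    return ROOT_ITEMS[offset : offset + limit]


def join_and_filter(page: list[RootItem]) -> list[RootItem]:
    """Join step: drops root items whose referenced entry fails the filter."""
    return [item for item in page if item.ref_ok]


def fetch_page(offset: int, limit: int) -> tuple[list[RootItem], int, int]:
    page = search_root(offset, limit)
    joined = join_and_filter(page)
    next_offset = offset + limit  # advances by limit, not by len(joined)
    total_count = len(ROOT_ITEMS)  # root-level total, not the post-join total
    return joined, next_offset, total_count


first_page, next_offset, total_count = fetch_page(offset=0, limit=100)
print(len(first_page), next_offset, total_count)
```

A client that wants full pages after joins would have to keep requesting root pages until enough joined items accumulate, which is the trade-off the docs call out.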

entry/api_v2/views.py

+29 -13
@@ -240,6 +240,11 @@ class AdvancedSearchAPI(generics.GenericAPIView):
     """
     NOTE for now it's just copied from /api/v1/entry/search, but it should be
     rewritten with DRF components.
+
+    Join Attrs implementation notes:
+    - Pagination is applied at root level first, then join & filter operations
+    - This may result in fewer items than requested limit
+    - Each join triggers a new ES query (N+1 pattern)
     """
 
     @extend_schema(
@@ -275,8 +280,18 @@ def _get_joined_resp(
             prev_results: list[AdvancedSearchResultRecord], join_attr: AdvancedSearchJoinAttrInfo
         ) -> tuple[bool, AdvancedSearchResults]:
             """
-            This is a helper method for join_attrs that will get specified attr values
-            that prev_result's ones refer to.
+            Process join operation for a single attribute.
+
+            Flow:
+            1. Get related entities from prev_results
+            2. Extract referral IDs and names
+            3. Execute new ES query for joined entities
+            4. Apply filters if specified
+
+            Note:
+            - Each call triggers new ES query
+            - Results may be reduced by join filters
+            - Pagination from root level may lead to incomplete results
             """
             entities = Entity.objects.filter(
                 id__in=[result.entity["id"] for result in prev_results]
@@ -364,21 +379,20 @@ def _get_joined_resp(
 
         # === End of Function: _get_joined_resp() ===
 
-        def _get_ref_id_from_es_result(attrinfo):
-            if attrinfo["type"] == AttrType.OBJECT:
-                if attrinfo.get("value") is not None:
+        def _get_ref_id_from_es_result(attrinfo) -> list[int | None]:
+            match attrinfo["type"]:
+                case AttrType.OBJECT if attrinfo.get("value") is not None:
                     return [attrinfo["value"].get("id")]
 
-            if attrinfo["type"] == AttrType.NAMED_OBJECT:
-                if attrinfo.get("value") is not None:
+                case AttrType.NAMED_OBJECT if attrinfo.get("value") is not None:
                     [ref_info] = attrinfo["value"].values()
                     return [ref_info.get("id")]
 
-            if attrinfo["type"] == AttrType.ARRAY_OBJECT:
-                return [x.get("id") for x in attrinfo["value"]]
+                case AttrType.ARRAY_OBJECT:
+                    return [x.get("id") for x in attrinfo["value"]]
 
-            if attrinfo["type"] == AttrType.ARRAY_NAMED_OBJECT:
-                return sum([[y["id"] for y in x.values()] for x in attrinfo["value"]], [])
+                case AttrType.ARRAY_NAMED_OBJECT:
+                    return sum([[y["id"] for y in x.values()] for x in attrinfo["value"]], [])
 
             return []
 

@@ -443,6 +457,8 @@ def _get_ref_id_from_es_result(attrinfo):
         total_count = deepcopy(resp.ret_count)
 
         for join_attr in join_attrs:
+            # Note: Each iteration here represents a potential N+1 query
+            # The trade-off is between query performance and result accuracy
             (will_filter_by_joined_attr, joined_resp) = _get_joined_resp(resp.ret_values, join_attr)
             # This is needed to set result as blank value
             blank_joining_info = {
@@ -465,8 +481,8 @@ def _get_ref_id_from_es_result(attrinfo):
             }
 
             # this inserts result to previous search result
-            new_ret_values = []
-            joined_ret_values = []
+            new_ret_values: list[AdvancedSearchResultRecord] = []
+            joined_ret_values: list[AdvancedSearchResultRecord] = []
             for resp_result in resp.ret_values:
                 # joining search result to original one
                 ref_info = resp_result.attrs.get(join_attr.name)
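Reviewer note on the "N+1" comment added to the loop: the query cost can be pictured with a counting stub. `CountingBackend` and `advanced_search` are hypothetical names for this sketch only and do not mirror the view's real signatures. The sketch assumes one backend query for the root search plus one per join attribute, which is what the docstrings in this commit describe.

```python
class CountingBackend:
    """Hypothetical stand-in for the search backend; it only counts queries."""

    def __init__(self) -> None:
        self.query_count = 0

    def search(self, label: str) -> list[str]:
        self.query_count += 1
        return [f"{label}-hit"]


def advanced_search(backend: CountingBackend, join_attr_names: list[str]) -> list[str]:
    results = backend.search("root")  # one query for the root level
    for name in join_attr_names:
        # one additional query per join attribute, issued sequentially
        results.extend(backend.search(f"join:{name}"))
    return results


backend = CountingBackend()
hits = advanced_search(backend, ["owner", "rack", "vlan"])
print(backend.query_count)
```

With three join attributes the stub records four queries, so request latency grows linearly with the number of joins.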
