Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable literal string for code search #33590

Merged
merged 8 commits into from
Feb 16, 2025

Conversation

darren
Copy link
Contributor

@darren darren commented Feb 14, 2025

Close: #33588

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Feb 14, 2025
@pull-request-size pull-request-size bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 14, 2025
@github-actions github-actions bot added the modifies/go Pull requests that update Go code label Feb 14, 2025
@wxiaoguang
Copy link
Contributor

The feature is really useful. There are some design concerns:

  • I think we shouldn't add a new "IsKeywordLiteral" flag, there are various indexers, so we could make them parse the input(keyword)
  • The work could be done in PerformSearch, then no need to make caller to prepare the flags
  • When trimming the quotes, does it need to also try to decode the content? For example: "var := \"value\";"?

(I think I could help to make some improvements to address the concerns and add some tests in this PR)

@pull-request-size pull-request-size bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 14, 2025
@darren darren force-pushed the bleve-literal-search branch from b57120a to e87e418 Compare February 14, 2025 05:34
@darren
Copy link
Contributor Author

darren commented Feb 14, 2025

  • I think we shouldn't add a new "IsKeywordLiteral" flag, there are various indexers, so we could make them parse the input(keyword)
  • The work could be done in PerformSearch, then no need to make caller to prepare the flags

so add IsKeywordLiteral to SearchOptions but not in prepareSearch?

Since we can make the literal search work in bleve first then in elasticsearch later,
add IsKeywordLiteral to the option so the code can be shared.

  • When trimming the quotes, does it need to also try to decode the content? For example: "var := \"value\";"?

I think it is only necessary to trim the left and right double qutote.

@wxiaoguang
Copy link
Contributor

Thank you for the update, I think I could try to make it also works with "git grep" and "elasticsearch", will make some more changes in a few days.

@pull-request-size pull-request-size bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 14, 2025
@silverwind
Copy link
Member

silverwind commented Feb 14, 2025

This would be useful indeed. I've been longing for exact search since a long time because neither "fuzzy" nor "match" mode supports exact search. Currently I'm using elasticsearch backend, so would appreciate if this syntax works for elasticsearch too.

@silverwind
Copy link
Member

silverwind commented Feb 14, 2025

BTW I don't think we should add any more search modes and ideally also not required quotes for exact search. Search should just work like GitHub where there is only one mode and it matches exact and I assume non-exact too.

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Feb 14, 2025
@lunny lunny added this to the 1.24.0 milestone Feb 15, 2025
@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Feb 15, 2025
@lunny lunny added the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label Feb 15, 2025
@silverwind silverwind merged commit 950abfe into go-gitea:main Feb 16, 2025
26 checks passed
@GiteaBot GiteaBot removed the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label Feb 16, 2025
}
contentQuery = q
} else {
q := bleve.NewMatchQuery(opts.Keyword)
Copy link
Member

@jpraet jpraet Feb 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me like an unintended regression from https://github.com/go-gitea/gitea/pull/32210/files#diff-e1866799c5d7d231d9df896b84d4a9fa54d404d150e26ac6b205eaf0fea3790bL243-R270 that the default changed from NewMatchPhraseQuery to NewMatchQuery. When searching for something like swaggerParameterBodies it is currently searching separately on swagger, parameter and bodies.

So I wonder if it shouldn't simply be switched back to NewMatchPhraseQuery, without requiring the quotes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree (I seldom use the "code search" feature, and just tried to help to make this PR's code more general).

@jpraet @lunny Would you like to propose some PRs to fix the regression? And maybe make the elasticsearch have the same behavior (I will improve the git grep search later). Thank you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bsofiato can you confirm the switch from MatchPhraseQuery to MatchQuery for bleve in #32210 was unintentional?

Copy link
Contributor

@bsofiato bsofiato Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bsofiato can you confirm the switch from MatchPhraseQuery to MatchQuery for bleve in #32210 was unintentional?

Hi @jpraet , I made that PR some time ago, but I don't recall being intentional (i.e., thinking about the PR at hand -- search by filename --, it doesn't make much sense to change from MatchPhraseQuery to MatchQuery).

zjjhot added a commit to zjjhot/gitea that referenced this pull request Feb 17, 2025
* giteaofficial/main:
  Add API to support link package to repository and unlink it (go-gitea#33481)
  [skip ci] Updated translations via Crowdin
  Update JS and PY dependencies (go-gitea#33587)
  [chore] add git mailmap for proper attribution of authorship (go-gitea#33612)
  Move commits signature and verify functions to service layers (go-gitea#33605)
  add spacing between sign in button's icon and text (go-gitea#33609)
  enable literal string for code search (go-gitea#33590)
  [skip ci] Updated translations via Crowdin
  Artifacts download api for artifact actions v4 (go-gitea#33510)
  Fix bug when get commit (go-gitea#33602)
  Fix mirror bug (go-gitea#33597)
  Fix typo in HTML attribute (go-gitea#33599)
  Use default Git timeout when checking repo health (go-gitea#33593)
  Improve commits list performance to reduce unnecessary database queries (go-gitea#33528)
  Performance optimization for pull request files loading comments attachments (go-gitea#33585)
  Fix PR's target branch dropdown (go-gitea#33589)
@lunny lunny added the backport/v1.23 This PR should be backported to Gitea 1.23 label Feb 17, 2025
@GiteaBot
Copy link
Collaborator

I was unable to create a backport for 1.23. @darren, please send one manually. 🍵

go run ./contrib/backport 33590
...  // fix git conflicts if any
go run ./contrib/backport --continue

@GiteaBot GiteaBot added the backport/manual No power to the bots! Create your backport yourself! label Feb 17, 2025
@lunny
Copy link
Member

lunny commented Feb 17, 2025

I added the backport label here, so we can also backport the bug fixes that address the regression.

lunny pushed a commit that referenced this pull request Feb 25, 2025
Fix regression from #32210 which unintentionally changed the search mode
for bleve from MaatchPhraseQuery to MatchQuery.

On the main branch, meanwhile with #33590 a "literal code search" mode
(by using quotes) was implemented as workaround for this unexpected code
search behavior. Maybe that feature needs some redesign as it turns out
to have been caused by a regression.

But this PR at least already fixes the regression for 1.23.x
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/manual No power to the bots! Create your backport yourself! backport/v1.23 This PR should be backported to Gitea 1.23 lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. modifies/go Pull requests that update Go code size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support search for literals for repository code search with indexer
8 participants