Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make stats and string matching case-insensitive #9

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ltning
Copy link

@ltning ltning commented Sep 1, 2023

New: Strings read from the inc/excl files are rendered lowercase when adding to the internal lists. Names from log are lowercased at ingestion.

As it was, stats were produced for each variation of each name, in a case sensitive manner. This balloons the stats massively and makes it less intuitive to make per-name/domain statistics and monitoring. It is assumed that in the vast majority of cases, case is not interesting.

@@ -41,7 +41,7 @@ func (m LogMatcher) ExtractInfo(line string) LogMatch {
if len(match) > 0 {
result.Matched = true
result.QueryClient = match[1]
result.QueryName = match[2]
result.QueryName = strings.ToLower(match[2])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I think the idea is great, but I detect a possible bug here. Since BIND will log the query exactly as received, I think we want to fold the entire input down to lower case first or make the match case insensitive. If we don't, we'll have an issue under the following circumstances:

  • The list of matches includes "FOOBAR.com" (but this is folded to "foobar.com") by the changes in names_collector.go
  • The client queries for "FooBar.com" -> therefore, no match against "foobar"

Conversely, if we fold the whole line down to lower case (maybe on line 39 with line = strings.ToLower(line), then we'll have a successful match. Since DNS RFC says that DNS names are not case sensitive, a non-case match is indeed a match... I think this is fine.

To me, it feels like it's more obvious to fold to lower case than to check/enforce the regex to have the case-insensitive prepend flag. WDYT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm, I'm not sure what I missed? Wouldn't my PR make sure that 'foobar.com' is inserted in the array and by ToLower-ing the name when looking it up, isn't this going to cover both cases?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're close - the only issue is that the string coming into this function (in line) is coming straight from the log file. I did a quick test and confirmed that BIND will log the value exactly as it is received. So, earlier on in the PR, you're correctly coaxing the search strings to lower case - but at this point, the lower case string is compared to a mixed case value in line if the user queried for "FoObAr". So I'm thinking that if we convert line to lower case around line 39, it'd be all set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants