Make sure we always read PSL data as UTF-8 #99

Merged 1 commit on Sep 19, 2016
6 changes: 4 additions & 2 deletions lib/twingly/public_suffix_list.rb
@@ -1,3 +1,4 @@
+require "addressable/idna"
Contributor

Was this missing before? I don't see where it's used now.

Collaborator Author

This was missing before, and the specs fail without it: the spec requires this file directly, whereas previously the library was only required indirectly.

Collaborator Author

We call Addressable::IDNA.to_ascii here. The call predates this PR, but there were no specs then, so it made sense to add the require in the commit that creates the spec file.

Contributor

Thanks, I hadn't checked the full file, so I didn't spot the call to to_ascii.

require "public_suffix"

module Twingly
@@ -7,8 +8,9 @@ class PublicSuffixList
     private_constant :ACE_PREFIX

     # Extend the PSL with ASCII form of all internationalized domain names
-    def self.with_punycoded_names
-      list_data = File.read(PublicSuffix::List::DEFAULT_LIST_PATH)
+    def self.with_punycoded_names(encoding: Encoding::UTF_8)
+      list_path = PublicSuffix::List::DEFAULT_LIST_PATH
+      list_data = File.read(list_path, encoding: encoding)
       list = PublicSuffix::List.parse(list_data, private_domains: false)

       punycoded_names(list).each do |punycoded_name|
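For context on why the explicit encoding: keyword matters: File.read without one tags the returned string with Encoding.default_external, which Ruby derives from the process locale. The following is a minimal sketch of that behaviour, not code from this PR. The helper read_with_default_external and the sample file are hypothetical, and it assumes Encoding.default_internal is left at nil.

```ruby
require "tempfile"

# Hypothetical helper (not part of this PR): read a file the "old" way,
# without an explicit encoding, while pretending the process runs under
# the given default external encoding (e.g. what a C/POSIX locale gives).
def read_with_default_external(path, default_external)
  original = Encoding.default_external
  Encoding.default_external = default_external
  File.read(path)
ensure
  # Always restore the process-wide setting.
  Encoding.default_external = original
end

# Hypothetical sample data standing in for the PSL file.
Tempfile.create("psl-sample") do |file|
  file.write("com\n")
  file.flush

  implicit = read_with_default_external(file.path, Encoding::US_ASCII)
  explicit = File.read(file.path, encoding: Encoding::UTF_8)

  puts implicit.encoding # US-ASCII
  puts explicit.encoding # UTF-8
end
```

Passing encoding: on every read, as the new method signature does, makes the result independent of how the host's locale happens to be configured.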
30 changes: 30 additions & 0 deletions spec/lib/twingly/public_suffix_list_spec.rb
@@ -0,0 +1,30 @@
+require "spec_helper"
+
+require "twingly/public_suffix_list"
+
+describe Twingly::PublicSuffixList do
+  describe ".with_punycoded_names" do
+    subject { described_class.with_punycoded_names(encoding: encoding) }
+
+    context "when the list data is read with the default encoding" do
+      subject { described_class.with_punycoded_names }
+
+      it { is_expected.to be_a(PublicSuffix::List) }
+    end
+
+    context "when the list data is read as UTF-8" do
+      let(:encoding) { Encoding::UTF_8 }
+
+      it { is_expected.to be_a(PublicSuffix::List) }
+    end
+
+    context "when the list data is read as US-ASCII" do
+      let(:encoding) { Encoding::US_ASCII }
+
+      it "parsing the data will fail" do
+        expect { subject }.
+          to raise_error(ArgumentError, "invalid byte sequence in US-ASCII")
+      end
+    end
+  end
+end
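The US-ASCII context above can be reproduced without the public_suffix gem. This is a rough sketch under stated assumptions, not the library's parser: read_list_data and with_sample_list are hypothetical helpers, and the sample rule is illustrative data only. It shows that the same bytes are valid when tagged as UTF-8 but form a broken string when tagged as US-ASCII, which is the condition under which string operations such as regexp matching typically raise ArgumentError.

```ruby
require "tempfile"

# Hypothetical helper mirroring the File.read call in the PR.
def read_list_data(path, encoding: Encoding::UTF_8)
  File.read(path, encoding: encoding)
end

# Hypothetical sample: a tiny PSL-like file with one non-ASCII rule,
# written as raw UTF-8 bytes.
def with_sample_list
  Tempfile.create("psl-sample") do |file|
    file.binmode
    file.write("// sample data\ncom\n\u516C\u53F8.cn\n")
    file.flush
    yield file.path
  end
end

with_sample_list do |path|
  utf8  = read_list_data(path)
  ascii = read_list_data(path, encoding: Encoding::US_ASCII)

  puts utf8.valid_encoding?  # true
  puts ascii.valid_encoding? # false

  begin
    # Matching a regexp against a string with invalid bytes is one of the
    # operations expected to raise ArgumentError.
    ascii =~ /\A\s*\z/
  rescue ArgumentError => e
    puts "#{e.class}: #{e.message}"
  end
end
```

No bytes are altered by either read; only the encoding tag differs, so the failure surfaces later, when the parser first inspects the string.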