-
Notifications
You must be signed in to change notification settings - Fork 6
Deduplication --> Antispam #293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
# Conflicts: # cmd/gcp/main.go
# Conflicts: # go.mod # go.sum # Conflicts: # go.sum # Conflicts: # go.sum
# Conflicts: # deployment/modules/aws/tesseract/conformance/main.tf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are dedup related variables to be deleted in the deployment directory.
deployment/live/aws/test/README.md
Outdated
2. Run `terragrunt apply`. | ||
|
||
2. Run `terragrunt apply`. If this fails to create the antispam database, | ||
connect the RDS instance to your VM using the instrunctions bellow, and run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
connect the RDS instance to your VM using the instrunctions bellow, and run | |
connect the RDS instance to your VM using the instructions below, and run |
# creating the aws_db_instance. | ||
# This requires that the machine running terraform has access | ||
# to the DB instance created above. This is _NOT_ the case when | ||
# github actions are applying the terraform. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# github actions are applying the terraform. | |
# GitHub actions are applying the terraform. |
cmd/gcp/main.go
Outdated
@@ -53,7 +54,8 @@ var ( | |||
origin = flag.String("origin", "", "Origin of the log, for checkpoints and the monitoring prefix.") | |||
bucket = flag.String("bucket", "", "Name of the bucket to store the log in.") | |||
spannerDB = flag.String("spanner_db_path", "", "Spanner database path: projects/{projectId}/instances/{instanceId}/databases/{databaseId}.") | |||
spannerDedupDB = flag.String("spanner_dedup_db_path", "", "Spanner deduplication database path: projects/{projectId}/instances/{instanceId}/databases/{databaseId}.") | |||
spannerAntispamDB = flag.String("spanner_antispam_db_path", "", "EXPERIMENTAL: Spanner antispam deduplication database path projects/{projectId}/instances/{instanceId}/databases/{databaseId}.") | |||
inMemoryAntispamCacheSize = flag.Uint("inmemory_antispam_cache_size", 2<<10, "Maximum number of entries to keep in the in-memory antispam cache.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inMemoryAntispamCacheSize = flag.Uint("inmemory_antispam_cache_size", 2<<10, "Maximum number of entries to keep in the in-memory antispam cache.") | |
inmemoryAntispamCacheSize = flag.Uint("inmemory_antispam_cache_size", 2<<10, "Maximum number of entries to keep in the in-memory antispam cache.") |
cmd/aws/main.go
Outdated
dbHost = flag.String("db_host", "", "AuroraDB host") | ||
dbPort = flag.Int("db_port", 3306, "AuroraDB port") | ||
dbUser = flag.String("db_user", "", "AuroraDB user") | ||
dbPassword = flag.String("db_password", "", "AuroraDB password") | ||
dbMaxConns = flag.Int("db_max_conns", 0, "Maximum connections to the database, defaults to 0, i.e unlimited") | ||
dbMaxIdle = flag.Int("db_max_idle_conns", 2, "Maximum idle database connections in the connection pool, defaults to 2") | ||
dedupPath = flag.String("dedup_path", "", "Path to the deduplication database.") | ||
inMemoryAntispamCacheSize = flag.Uint("inmemory_antispam_cache_size", 2<<10, "Maximum number of entries to keep in the in-memory antispam cache.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inMemoryAntispamCacheSize = flag.Uint("inmemory_antispam_cache_size", 2<<10, "Maximum number of entries to keep in the in-memory antispam cache.") | |
inmemoryAntispamCacheSize = flag.Uint("inmemory_antispam_cache_size", 2<<10, "Maximum number of entries to keep in the in-memory antispam cache.") |
cmd/aws/main.go
Outdated
if *antispamDBName != "" { | ||
antispam, err = aws_as.NewAntispam(ctx, antispamMysqlConfig().FormatDSN(), aws_as.AntispamOpts{}) | ||
if err != nil { | ||
klog.Exitf("Failed to create new GCP antispam storage: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
klog.Exitf("Failed to create new GCP antispam storage: %v", err) | |
klog.Exitf("Failed to create new AWS antispam storage: %v", err) |
if *dbPort == 0 { | ||
klog.Exit("--db_port must be set") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default dbPort
is 3306 instead of 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is working as intended: it's ok to use the default port, but 0 would indicate that it's the default integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If --db_port
is not set, the *dbPort
value would be 3306
. The log message is not valid.
if *dbUser == "" { | ||
klog.Exit("--db_user must be set") | ||
} | ||
// Empty passord isn't an option with AuroraDB MySQL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Empty passord isn't an option with AuroraDB MySQL. | |
// Empty password isn't an option with AuroraDB MySQL. |
deployment/live/aws/test/README.md
Outdated
@@ -95,7 +99,9 @@ go run ./cmd/aws \ | |||
--db_port=3306 \ | |||
--db_user=tesseract \ | |||
--db_password=${TESSERACT_DB_PASSWORD} \ | |||
--dedup_path=test-static-ct | |||
--antispam_db_name=antispam_db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--antispam_db_name=antispam_db | |
--antispam_db_name=antispam_db \ |
deployment/live/aws/test/README.md
Outdated
@@ -187,7 +193,9 @@ go run ./cmd/aws \ | |||
--db_port=3306 \ | |||
--db_user=tesseract \ | |||
--db_password=${TESSERACT_DB_PASSWORD} \ | |||
--dedup_path=test-static-ct | |||
--antispam_db_name=antispam_db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--antispam_db_name=antispam_db | |
--antispam_db_name=antispam_db \ |
ddl = [ | ||
"CREATE TABLE IF NOT EXISTS FollowCoord (id INT64 NOT NULL, nextIdx INT64 NOT NULL) PRIMARY KEY (id)", | ||
"CREATE TABLE IF NOT EXISTS IDSeq (h BYTES(32) NOT NULL, idx INT64 NOT NULL) PRIMARY KEY (h)", | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ddl = [ | |
"CREATE TABLE IF NOT EXISTS FollowCoord (id INT64 NOT NULL, nextIdx INT64 NOT NULL) PRIMARY KEY (id)", | |
"CREATE TABLE IF NOT EXISTS IDSeq (h BYTES(32) NOT NULL, idx INT64 NOT NULL) PRIMARY KEY (h)", | |
] | |
ddl = [ | |
"CREATE TABLE IF NOT EXISTS FollowCoord (id INT64 NOT NULL, nextIdx INT64 NOT NULL) PRIMARY KEY (id)", | |
"CREATE TABLE IF NOT EXISTS IDSeq (h BYTES(32) NOT NULL, idx INT64 NOT NULL) PRIMARY KEY (h)", | |
] |
cmd/aws/main.go
Outdated
@@ -230,3 +236,32 @@ func storageConfigFromFlags() taws.Config { | |||
MaxIdleConns: *dbMaxIdle, | |||
} | |||
} | |||
|
|||
func antispamMysqlConfig() *mysql.Config { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func antispamMysqlConfig() *mysql.Config { | |
func antispamMySQLConfig() *mysql.Config { |
https://google.github.io/styleguide/go/decisions#initialisms
cmd/aws/main.go
Outdated
dbHost = flag.String("db_host", "", "AuroraDB host") | ||
dbPort = flag.Int("db_port", 3306, "AuroraDB port") | ||
dbUser = flag.String("db_user", "", "AuroraDB user") | ||
dbPassword = flag.String("db_password", "", "AuroraDB password") | ||
dbMaxConns = flag.Int("db_max_conns", 0, "Maximum connections to the database, defaults to 0, i.e unlimited") | ||
dbMaxIdle = flag.Int("db_max_idle_conns", 2, "Maximum idle database connections in the connection pool, defaults to 2") | ||
dedupPath = flag.String("dedup_path", "", "Path to the deduplication database.") | ||
inMemoryAntispamCacheSize = flag.Uint("inmemory_antispam_cache_size", 2<<10, "Maximum number of entries to keep in the in-memory antispam cache.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'2048' might well be enough, but I'd be tempted default to a larger size which can be tuned down later/via flag if need be.
I've been using 256k as a place-holder in the conformance bins (32+8 bytes of data ==> 10MB of raw cached data plus whatever overhead Go's map
incurs)
cmd/gcp/main.go
Outdated
@@ -53,7 +54,8 @@ var ( | |||
origin = flag.String("origin", "", "Origin of the log, for checkpoints and the monitoring prefix.") | |||
bucket = flag.String("bucket", "", "Name of the bucket to store the log in.") | |||
spannerDB = flag.String("spanner_db_path", "", "Spanner database path: projects/{projectId}/instances/{instanceId}/databases/{databaseId}.") | |||
spannerDedupDB = flag.String("spanner_dedup_db_path", "", "Spanner deduplication database path: projects/{projectId}/instances/{instanceId}/databases/{databaseId}.") | |||
spannerAntispamDB = flag.String("spanner_antispam_db_path", "", "EXPERIMENTAL: Spanner antispam deduplication database path projects/{projectId}/instances/{instanceId}/databases/{databaseId}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to remove the EXPERIMENTAL:
warning here, or also add it to the aws
flags? Should probably be consistent unless we do actually want to indicate there's a difference.
cmd/gcp/main.go
Outdated
@@ -157,9 +159,19 @@ func newGCPStorage(ctx context.Context, signer note.Signer) (*storage.CTStorage, | |||
return nil, fmt.Errorf("failed to initialize GCP Tessera storage driver: %v", err) | |||
} | |||
|
|||
var antispam tessera.Antispam | |||
// Persistent antispam is currently experimental, so there's no terraform or documentation yet! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think there might be terraform now (at least in Tessera), although docs may need updating :)
"--roots_pem_file=/bin/test_root_ca_cert.pem", | ||
"--origin=${var.base_name}${var.origin_suffix}", | ||
"--signer_public_key_secret_name=${var.signer_public_key_secret_name}", | ||
"--signer_private_key_secret_name=${var.signer_private_key_secret_name}", | ||
"--inmemory_antispam_cache_size=25000000", # About 1GB of memory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For potential future CL, wondering if something like https://pkg.go.dev/github.com/dustin/go-humanize might be helpful in making numerical flags understand more friendly representations of large numbers?
internal/ct/handlers.go
Outdated
if err != nil { | ||
klog.Warningf("AddCertIndex(): failed to store certificate index: %v", err) | ||
klog.V(2).Infof("%s: %s => storage.Add", log.origin, method) | ||
index, err := log.storage.Add(ctx, entry)() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to do something with index.IsDup
and fetch the old timestamp if it's true?
storage/storage.go
Outdated
awaiter := tessera.NewPublicationAwaiter(ctx, cts.reader.ReadCheckpoint, 10*time.Millisecond) | ||
idx, cpRaw, err := awaiter.Await(ctx, f) | ||
if err != nil { | ||
return 0, 0, fmt.Errorf("error waiting for Tessera future and its integration: %v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should create a single instance of this (maybe in the storage?) and reuse it across all add
requests, otherwise you'll have lots and lots of goroutines spun up, many requests for /checkpoint
, etc.
Might be worth upping the loop time to something a bit higher, say 200
-300ms
? 10ms
is fetching /checkpoint
100x per second!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. There's already a TODO somewhere to fine tune this value together with the request deadline. I've raised it to 200ms for now, and we can edit it later if need be.
storage/storage.go
Outdated
|
||
eIdx := idx.Index % layout.EntryBundleWidth | ||
if uint64(len(eb.Entries)) < eIdx { | ||
return 0, 0, fmt.Errorf("failed to read entry %d in entry bundle %d", eIdx, eBIdx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: entry bundle at index %d has only %d entries, but wanted at least %d
This is really a fatal log corruption failure; if ckptSize > eBIdx
, then the EB must exist and have sufficient entries.
storage/storage.go
Outdated
e := staticct.Entry{} | ||
if err := e.UnmarshalText([]byte(eb.Entries[eIdx])); err != nil { | ||
return 0, 0, fmt.Errorf("failed to unmarshal entry %d in entry bundle %d: %v", eIdx, eBIdx, e) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe one to keep in the back-pocket, but you could just read the timestamp from the first 8 bytes here, and avoid reading/parsing the rest of the entry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah good point, the timestamp is at the beginning of the entry! I think we can do even smarter things here, since we already have to parse every entries in the bundle to get a to a specific entry since we don't know the size of every entry. I've left a TODO to make this better, once the APIs are a bit more mature and we know exactly what we need to read (as a matter of fact, it's changing in this PR!). The change you suggested was too small not to be implemented, so I just did it tough.
if t > math.MaxInt64 { | ||
return 0, fmt.Errorf("invalid data tile: timestamp %d > math.MaxInt64", t) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to duplicate the check here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woops.
Towards #292
While I'm there, fix the aws test deployment instructions.