Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Fix #35 #38

Closed
wants to merge 9 commits into from
Closed

Conversation

guizmaii
Copy link
Contributor

@guizmaii guizmaii commented Nov 7, 2019

fix #35

Problems

As I understand it, the issue #35 expresses two different problems.

As I understand it, the first problem is the lack of extensibility of the error model in SimpleSource.

Currently, there's a finite set of possible error Reasons that the user has to use to produce a CommandError.
When a user want its own error to express his business error, he can only specify a message, which is a String, to customize the CommandError.
So, for now, the user business errors are expressed as aString combined with a Reason that, hopefully, matches the business error expressed in the message.

The second problem expressed in #35, is that currently the errors have to be handled in two different places:

  • The CommandHandler for business errors
  • Outside the CommandHandler for NotFoundErrors

To keep things simple, in this PR I only try to fix the first problem.

Proposal

My proposition in this PR follows these constraints.

I want the user to be able to extend our error model to express its own business errors.
I want composability.
I want something that a Java developer can understand.
I want as much as type safety as possible.
I want to stay close to the previous error management model because I can't spend 2 weeks to work on that.

My proposition in this PR to fix that problem is the following.

The typical Java error management is done via exceptions.
Exceptions are extensible (inheritance) and composable (thanks to their constructor magic).
So, in this PR I explore a solution based on exceptions.

The solution is simple.
Now, the CommandError extends RuntimeException.
The Reasons become (final) classes that extends CommandError.
If the user want to express his own errors, he can create classes extending CommandError.

Limitations

1. Exceptions serialization

SimpleSource is encoding the CommandErrors as a product of a String and a Reason (which is an enum) for a good reason: it serializes them.

As explained in this comment #38 (comment), the problem with exceptions is that, apparently, no one (except maybe Akka, I still have to check that) knows how to correctly and efficiently serialize them.

As explained in the previous comment, I implemented a serialization mechanism but:

  • it's not an efficient one
  • it produces potentially huge messages.
  • I think that it's unsafe because, IIRC, Java can only serialize classes that extend Serializable AND that contain fields that are themselves serializable. Otherwise, I don't know what happen. My hypothesis is that it crashes at runtime?
  • I don't think that it's very robust in regards of error modeling evolutions. If the user changes his encoding of one of its 'CommandError', I don't think that Java will be able to deserialize old errors.

The advantage of that mechanism is that it's generic and the user doesn't need to provide any custom serializer.

2. Java Serialization

The serialization mechanism provided uses the Java serialization mechanism which is known to be awful.

Example of use

I adapted the simplesource-examples with this solution here: https://github.com/simplesourcing/simplesource-examples/pull/21/files

@@ -207,7 +198,7 @@ private static Schema aggregateUpdateResultSchema(final Schema aggregateSchema)
private static GenericRecord fromReason(final Schema schema, final CommandError commandError) {
return new GenericRecordBuilder(schema)
.set(ERROR_MESSAGE, commandError.getMessage())
.set(ERROR_CODE, commandError.getReason().name())
.set(ERROR, Base64.getEncoder().encodeToString(SerializationUtils.serialize(commandError)))
Copy link
Contributor Author

@guizmaii guizmaii Nov 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Avro, we can maybe be more efficient in terms of space use by using Base64.getEncoder().encode in place of Base64.getEncoder().encodeToString and declare the field as a byteType here https://github.com/simplesourcing/simplesource/pull/38/files#diff-717500181ed887fe65e9940056d96a5cR236

but the problem is still the same in terms of computation time and doesn't help for the JSON serialization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested. I can't make the previous proposition work. I don't know why.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would definitely be better for Avro serialization.

Making the field a bytesType should work. You may need to wrap the byte[] as a ByteBuffer
ByteBuffer.wrap(SerializationUtils.serialize(commandError)) or something like that.

For Json, a Base64 encoded string would be acceptable.

private static final String WRITE_SEQUENCE = "writeSequence";
private static final String AGGREGATION = "aggregate_update";

private static GenericRecord fromReason(final Schema schema, final CommandError commandError) {
return new GenericRecordBuilder(schema)
.set(ERROR_MESSAGE, commandError.getMessage())
.set(ERROR_CODE, commandError.getReason().name())
.set(ERROR_CLASS, commandError.getClass().getCanonicalName())
Copy link
Contributor Author

@guizmaii guizmaii Nov 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ERROR_MESSAGE and ERROR_CLASS are here for debug purpose. They're not used in the deserialization mechanism.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we are embedding Java serialised classes in Avro, we won't otherwise be able to view the exception on via the avro console viewer, so we probably still need these, much as I'd like to remove them.

}

private static Schema aggregateUpdateResultSchema(final Schema aggregateSchema) {
final Schema reasonSchema = SchemaBuilder
.record(aggregateSchema.getName() + "Reason")
.fields()
.name(ERROR_MESSAGE).type().stringType().noDefault()
.name(ERROR_CODE).type().stringType().noDefault()
.name(ERROR_CLASS).type().stringType().noDefault()
Copy link
Contributor Author

@guizmaii guizmaii Nov 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ERROR_MESSAGE and ERROR_CLASS are here for debug purpose. They're not used in the deserialization mechanism.

@@ -207,7 +202,8 @@ private static Schema aggregateUpdateResultSchema(final Schema aggregateSchema)
private static GenericRecord fromReason(final Schema schema, final CommandError commandError) {
return new GenericRecordBuilder(schema)
.set(ERROR_MESSAGE, commandError.getMessage())
.set(ERROR_CODE, commandError.getReason().name())
.set(ERROR_CLASS, commandError.getClass().getCanonicalName())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ERROR_MESSAGE and ERROR_CLASS are here for debug purpose. They're not used in the deserialization mechanism.

@@ -272,7 +273,8 @@ public JsonElement serialize(
private JsonElement serializeReason(final CommandError commandError) {
final JsonObject wrapper = new JsonObject();
wrapper.addProperty(ERROR_MESSAGE, commandError.getMessage());
wrapper.addProperty(ERROR_CODE, commandError.getReason().name());
wrapper.addProperty(ERROR_CLASS, commandError.getClass().getCanonicalName());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ERROR_MESSAGE and ERROR_CLASS are here for debug purpose. They're not used in the deserialization mechanism.

@@ -126,7 +129,8 @@ public JsonElement serialize(
private JsonElement serializeReason(final CommandError commandError) {
final JsonObject wrapper = new JsonObject();
wrapper.addProperty(ERROR_MESSAGE, commandError.getMessage());
wrapper.addProperty(ERROR_CODE, commandError.getReason().name());
wrapper.addProperty(ERROR_CLASS, commandError.getClass().getCanonicalName());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ERROR_MESSAGE and ERROR_CLASS are here for debug purpose. They're not used in the deserialization mechanism.

import java.util.function.Function;
import java.util.function.Supplier;

public class OptionalExtension {
Copy link
Contributor Author

@guizmaii guizmaii Nov 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Remove this class

import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
import static org.junit.jupiter.api.Assertions.assertTrue;

class FutureResultTest {
Copy link
Contributor Author

@guizmaii guizmaii Nov 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed these tests just because I didn't want to bother to make them compile again.

TODO: Put back these tests

/**
* https://stackoverflow.com/a/28526052/2431728
*/
@Value(staticConstructor = "of")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Remove this class

private static final String WRITE_SEQUENCE = "writeSequence";
private static final String AGGREGATION = "aggregate_update";

private static GenericRecord fromReason(final Schema schema, final CommandError commandError) {
return new GenericRecordBuilder(schema)
.set(ERROR_MESSAGE, commandError.getMessage())
.set(ERROR_CODE, commandError.getReason().name())
.set(ERROR_CLASS, commandError.getClass().getCanonicalName())
.set(ERROR, Base64.getEncoder().encodeToString(SerializationUtils.serialize(commandError)))
Copy link
Contributor Author

@guizmaii guizmaii Nov 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inefficient in terms of computation time and space taken by the produced message.

The question is: How do you correctly and efficiently serialize an Exception?

Kryo still does not provide a solution for that apparently. See EsotericSoftware/kryo#532
Chill doesn't help either: twitter/chill#327

Maybe Akka can help us here. They have a Status type that is a kind of serializable Try. See https://github.com/akka/akka/blob/b006567751f1ef63588b00e20ec51c905e8562d3/akka-actor/src/main/scala/akka/actor/Actor.scala#L310

I don't know how they serialize it, for now.

Off topic: Interesting read: https://lemire.me/blog/2019/01/30/what-is-the-space-overhead-of-base64-encoding/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is the big problem here.
Whatever we use as exception / error type has to be serializable.
I wouldn't want to bring in any library dependencies for this either.

@guizmaii guizmaii changed the title [WIP] Fix #35 [Proposal] Fix #35 Nov 8, 2019
Copy link
Contributor

@szoio szoio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with this change. Big improvement on what we have right now, more type safe, I suppose idiomatic, and in the end cleaner and simpler.
Would like to see the error being serialized as a ByteArray for Avro serialization.
The one downside is the errors are not viewable as Avro, so we have to keep the string message as well to make it debuggable. I can live with that.
Biggest downside/risk is we are tied an opaque exception serialization mechanism that may conceivable need to change, and we have no easy way of evolving this.

@@ -207,7 +198,7 @@ private static Schema aggregateUpdateResultSchema(final Schema aggregateSchema)
private static GenericRecord fromReason(final Schema schema, final CommandError commandError) {
return new GenericRecordBuilder(schema)
.set(ERROR_MESSAGE, commandError.getMessage())
.set(ERROR_CODE, commandError.getReason().name())
.set(ERROR, Base64.getEncoder().encodeToString(SerializationUtils.serialize(commandError)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would definitely be better for Avro serialization.

Making the field a bytesType should work. You may need to wrap the byte[] as a ByteBuffer
ByteBuffer.wrap(SerializationUtils.serialize(commandError)) or something like that.

For Json, a Base64 encoded string would be acceptable.

@szoio
Copy link
Contributor

szoio commented Nov 12, 2019

Thinking about this a bit more, for me, for me Java serialisation is a bit of a deal breaker.
As much as Simple Sourcing is Java end-to-end, all the data it produces should ideally be readable by any supported Kafka client in any language.

@andybritz
Copy link
Contributor

Tbh not sure this yet achieve its primary goals.

  1. Removal of string comparisons on custom errors.
  2. Simplification

I think while this adds some typing, it comes at a high complexity and I suspect potentially with no backward compatibility. I think we need spend more time on this on.

@guizmaii
Copy link
Contributor Author

guizmaii commented Nov 19, 2019

@szoio @andybritz Yes, after a second thought, I also think that this proposal is not good enough. The serialization problem is important. We can't solve it with a simple solution like this.

I don't remember how Lagom is handling this problem, especially with Java since I always used it with Scala, but it may be interesting to see how they solve it.

The problem exposed in #35, to be fixed, requires to first find how to serialize errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Better client API for error handling
3 participants