Refactor how spark options are passed to Options/JetStreamOptions builders #30
Yes, I agree the V1 approach to options is not great, and there's definitely added complexity stemming from the original coding that got worse as I added features on top of it.
What would be good is to have the options passed to the connector be compatible with what nats.java uses, so that the connector code can simply pass them directly to the nats.java builder. If we can do this, then I'm OK with forgoing any attempt at keeping the V1 option names (especially since I haven't actually properly released V2 yet, so I'm fine doing that break and announcing it clearly in the V2 release notes).
I'm close to getting a working version of this. Here's an example of what it looks like:

```scala
val initDF = spark.readStream
  .format("org.apache.spark.sql.nats") // will look for org.apache.spark.sql.nats.DefaultSource
  .option("io.nats.client.url", s"nats://${nats_host}:${nats_port}")
  .option("io.nats.client.credential.bytes", credentialBytesString)
  .option("io.nats.client.keyStore", nats_key_store)
  .option("io.nats.client.keyStorePassword", nats_key_store_password)
  .option("io.nats.client.trustStore", nats_trust_store)
  .option("io.nats.client.trustStorePassword", nats_trust_store_password)
  .option("io.nats.client.ssl.context.factory.class", factoryClassName)
  .option("io.nats.client.tls.algorithm", tls_alg)
  .option("io.nats.pullsubscribe.stream", nats_subscriber_stream)
  .option("io.nats.pullsubscribe.name", nats_durable) // ...durable will also work
  .option("io.nats.jetstream.prefix", nats_api_prefix)
  .option("io.nats.streaming.payloadCompression", "none") // none or zlib
  .option("io.nats.streaming.maxBatch", "12000") // maximum messages in a minibatch
  .option("io.nats.streaming.maxFetchCount", "1000") // maximum messages per fetch call
  .option("io.nats.streaming.maxFetchWaitMs", "5000") // maximum wait per fetch, in milliseconds
  .load()
```

Any preliminary feedback before I finalize and open a PR?
I still need to add support for the NATS sink and for consumer-create options; however, I don't have a good way to test in our cloud. Can NATS be self-hosted in Docker? If so, setting up a compose file with NATS and Spark could be used to run integration tests for this library.
I like this pitch for parameter names!
And you should be able to run NATS in Docker very easily; at least I do.
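For reference, a minimal compose sketch for self-hosting NATS with JetStream for integration tests, assuming the official `nats` Docker image (the service name and ports here are illustrative, not part of this project):

```yaml
# docker-compose.yml — hypothetical test setup, not part of this repo
services:
  nats:
    image: nats:latest
    command: ["-js", "-m", "8222"] # enable JetStream and HTTP monitoring
    ports:
      - "4222:4222" # client connections
      - "8222:8222" # monitoring endpoint
```

Tests could then point the connector at `nats://localhost:4222`. Equivalently, `docker run --rm -p 4222:4222 nats:latest -js` starts a throwaway JetStream-enabled server.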
What motivated this proposal?
I really like this library and want it to succeed. However, while trying to contribute I'm feeling significant technical debt on how options are passed in the code.
Currently the pattern for passing options to Options/JetStreamOptions is somewhat ad hoc; I can't find a common pattern for naming them. This makes it difficult to develop on top of the v2 library and confusing for users. Ideally the option names would be self-descriptive and follow a consistent nested, dot-separated pattern.
For example, I'm working on a large feature for adding SSL/TLS support and there's no obvious option name for the many options I need to add. I could reuse some of them from v1 but that doesn't follow a pattern either.
The code itself is difficult to follow, which makes changes a struggle. For example, the "nats.host" spark option is passed to JetStreamConfig, which is passed to NatsConnectionConfig, which is used by the options builder. This makes adding a new option a task that spans multiple files (this feels to me like a classic case of technical debt). Ideally this structure would be flattened so that adding a new option is possible in one class with an intentional name.
I propose a new structure here that aims to standardize how options are passed to the library and used in code, reducing technical debt by leveraging abstractions already defined in nats.java to 1) standardize the spark option names, and 2) reduce the number of internal classes used by the library.
What is the proposed change?
I would love to establish a new pattern that makes exposing builder parameters to spark follow a common option name and coding pattern based on the builders that nats.java uses.
Here is what I propose for the spark option names: "nats.[alias for builder class].[builder method]".
The internal config classes should reflect this hierarchy directly instead of feeling arbitrarily structured as they are today.
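As a rough sketch of how this naming pattern could be consumed internally (the object and method names here are illustrative, not the actual connector API): group the Spark options by their builder alias, then hand each group to the corresponding nats.java builder.

```scala
// Hypothetical helper: split "nats.<alias>.<method>" spark options into
// per-builder groups, e.g.
//   Map("nats.connection.server" -> u, "nats.pull.stream" -> s)
//   => Map("connection" -> Map("server" -> u), "pull" -> Map("stream" -> s))
object NatsOptionGroups {
  def group(sparkOptions: Map[String, String]): Map[String, Map[String, String]] =
    sparkOptions.toSeq
      .collect {
        // keep only keys of the form nats.<alias>.<method...>
        case (key, value) if key.startsWith("nats.") && key.count(_ == '.') >= 2 =>
          val rest = key.stripPrefix("nats.")
          val dot = rest.indexOf('.')
          // the method part may itself contain dots (e.g. ssl.context.factory.class)
          (rest.take(dot), rest.drop(dot + 1), value)
      }
      .groupBy(_._1)
      .map { case (alias, entries) =>
        alias -> entries.map { case (_, method, value) => method -> value }.toMap
      }
}
```

Each alias group then maps one-to-one onto a nats.java builder, so adding a new option means adding one builder-method mapping rather than threading a field through several config classes.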
For example, instead of the host option being passed from JetStreamConfig -> NatsConnectionConfig -> builder (which is tricky to track down and requires updating code in multiple places), the host would be passed from the spark option "nats.connection.server" -> NatsConnectionConfig -> builder.
This gives a clear way to use and contribute to this library that matches the nats.java documentation.
This could include a few quality-of-life exceptions that differ from nats.java (for example, accepting "nats.connection.port" and "nats.connection.host" instead of a single "...server" option). Following the nats.java standard by default will help new developers and users, and keep this library in sync with its dependency on nats.java.
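One hedged sketch of such a quality-of-life alias, assuming the "nats.connection.server" name proposed above (the object name and the host/port option names are illustrative):

```scala
// Hypothetical normalization: fold host/port aliases into the canonical
// "nats.connection.server" option before options are grouped and applied.
object ConnectionAliases {
  def normalize(opts: Map[String, String]): Map[String, String] = {
    val host = opts.get("nats.connection.host")
    val port = opts.get("nats.connection.port")
    (host, port) match {
      // only synthesize "server" when it wasn't given explicitly
      case (Some(h), Some(p)) if !opts.contains("nats.connection.server") =>
        (opts - "nats.connection.host" - "nats.connection.port") +
          ("nats.connection.server" -> s"nats://$h:$p")
      case _ => opts
    }
  }
}
```

Keeping such exceptions in one normalization step means the rest of the code only ever sees the canonical nats.java-shaped names.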
Who benefits from this change?
Users, since the options will follow a consistent pattern and they can leverage the nats.java docs to use this library.
Developers, since building on this library to add new features using the options will be easier.
What alternatives have you evaluated?
Maybe I'm missing something? Updating the documentation to clarify the current pattern for option names and to describe how to contribute new ones would also work. However, I can't for the life of me figure out what the current pattern is; it feels arbitrary, since many of the nats.java library options are not exposed and data is passed to builders across layers of arbitrary config classes.