Skip to content

feat(input_schema): Enable secret objects #515

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

MFori
Copy link
Member

@MFori MFori commented Jun 5, 2025

Based on the issue:

An ideal solution would be if we could encrypt objects

This is proposal to enable isSecret boolean property for object fields in Input Schema.
Currently this is available only for string fields with editors either textfield or textarea.

The object fields with isSecret: true won't be able to specify some properties that "normal" object fields can do, such as: default, prefill, patternKey, patternValue, minProperties, maxProperties.
Editor can be only json or hidden. This restriction is basically the same as with the secret string property.

In addition to change in the Input Schema's JSON schema it's also needed to change the encryption/decryption logic.
If input is string, there is no change and the encrypted value is stored in ENCRYPTED_VALUE:base64:base64 form.
In case of object input, we need to keep the type of the encrypted value to be still object because of validation of input in other stages.

This propose to store encrypted objects as object with structure:

{
  "secret": "encrypted-stringified-json-of-original-object"
}

where the secret key, contains same string value as normal encrypted string property.

When decrypting an encrypted value the logic is exactly the same as with string. We would check if the object has secret field and if the value is string that match the encrypted string regex.

I didn't find any other place where would this change causing issues.

@MFori MFori self-assigned this Jun 5, 2025
@MFori MFori added the t-console Issues with this label are in the ownership of the console team. label Jun 5, 2025
@github-actions github-actions bot added this to the 116th sprint - Console team milestone Jun 5, 2025
@github-actions github-actions bot added the tested Temporary label used only programatically for some analytics. label Jun 5, 2025
@MFori MFori requested review from fnesveda, gippy, jancurn and valekjo June 5, 2025 15:22
@MFori MFori marked this pull request as ready for review June 5, 2025 15:23
@jancurn
Copy link
Member

jancurn commented Jun 6, 2025

Thanks for the proposal. I have a few notes/questions:

  • Cookies typically have array type in input schema, not object. Perhaps we could support arrays too right away?
  • Why don't we simply do JSON.stringify for any value? If the object is a string, the text will be formatted with double quotes. If an object, then with brackets. We simply encrypt this value on stroing, decrypt on retrieval, do JSON.parse() and everything works find, no? What am I missing?
  • Will this be backwards compatible? Let's say if we add isSecret: true to cookies field in Website Content Crawler, will it keep working when users call it in API the old way or have stored tasks unencrypted?

@gippy
Copy link
Member

gippy commented Jun 7, 2025

Ah sorry, the first one is my fault, I forgot that cookies are an array when specifying it

@MFori
Copy link
Member Author

MFori commented Jun 7, 2025

Cookies typically have array type in input schema, not object. Perhaps we could support arrays too right away?

Yeah, that should be straightforward, in the same way as with objects

Why don't we simply do JSON.stringify for any value? If the object is a string, the text will be formatted with double quotes. If an object, then with brackets. We simply encrypt this value on stroing, decrypt on retrieval, do JSON.parse() and everything works find, no? What am I missing?

The issue is that if the type of the field is defined as object then JSON Schema (and Ajv) validates the value to be object and client (Actor) expect to get object from Actor.getInput(). This is fulfilled if the value is not encrypted, but once we want to encrypt it and do just JSON.stringify (and encrypt this string value) the value would be string and JSON schema validation would be failing.

We can add json editor to string field so in the UI it would look like the input is object, but then in Actor the developer would need manually parse the string (which can fail, because there wouldn't be any validation that it's object when running via API).

Will this be backwards compatible? Let's say if we add isSecret: true to cookies field in Website Content Crawler, will it keep working when users call it in API the old way or have stored tasks unencrypted?

Yeah it would be, this works the same as string field decryption and if the value doesn't look like it's encrypted no decryption is done at all. Value looks like it's encrypted means in this case that the object has field secret which is string that match the regex ENCRYPTED_VALUE:base64:base64.

will it keep working when users call it in API the old way

There won't by any change in API call at all, since the input is provided unencrypted and is encrypted by API internally

@valekjo
Copy link
Member

valekjo commented Jun 11, 2025

Hey,
sorry, I won't be able to properly review it. I don't have a super strong opinion on this, but I'm more inclined to encoding the values in a different way - here is why :)

I understand that the representation was chosen so that the basic type is always matching, and it's not a problem for the validation.

But the basic type (string/array/object) is all that can be validated, nothing else - so it's not very strong.

So instead I'd go with something like ENCRYPTED_JSON_VALUE:..., which would be always string - because it's passed around as string. The user can input anything (meaning the editor can be whatever), and the Actor would decrypt and parse those fields.

Until it's decrypted, nobody should know what's inside (except the user).

From the perspective of types, you'd have string on the input, and unknown when you parse it in the Actor (it would be up to the author to deal with that).

We can also use that later to cover the string case we have now.

To compare it, the solution in the PR would look like this when you pass it encrypted:

{
	stringField: encrypted,
	objectField: { encrypted },
	arrayField: [ encrypted ],
}	

And like this when you pass it un-encrypted:

{
	stringField: raw,
	objectField: raw,
	arrayField: raw
}

Meaning that the encrypted case looks weird - we need a special representation for every type.

I'd be more inclined to

{
	stringField: encrypted,
	objectField: encrypted,
	arrayField: encrypted,
}

And if passed as not-encrypted:

{
	stringField: JSON.stringify(raw),
	objectField: JSON.stringify(raw),
	arrayField: JSON.stringify(raw),
}

Meaning that the not-encrypted case looks weird - which is fine, because the encrypted is the default.

I won't make it to the rest of the discussion, just wanted to chip in with this :)

@MFori
Copy link
Member Author

MFori commented Jun 11, 2025

So instead I'd go with something like ENCRYPTED_JSON_VALUE:..., which would be always string - because it's passed around as string. The user can input anything (meaning the editor can be whatever), and the Actor would decrypt and parse those fields.

I don't see any additional value having the ENCRYPTED_JSON_VALUE:... prefix in the encrypted string over the ENCRYPTED_VALUE because we wouldn't parse it anywhere as:

  • on FE the secret value is write only, we never open the editor and fill the stored value in
  • in Actor's Actor.getInput() we still need to return string representation, because as a developer I would expect the value match the type of the field - so again no parsing JSON string to object (it's up to Actor developer to parse it).

Generally speaking with you suggestion we still wouldn't support encryption objects and arrays.

Only one thing we can do is to add support for json editor for secret string input, which would behave in UI as an object input. But running via API would require to stringify the input.

@jancurn
Copy link
Member

jancurn commented Jun 15, 2025

I only read this quickly, but I feel:

{
	stringField: "secret_value:...",
	objectField: {  "secret": "secret_value:..."" },
	arrayField: [ {  "secret": "secret_value:..."" } ],
}	

is a weird "Potemkin" JSON, which will not even be compatible with objects produced by our editors and now the newly introduced sub-schemas, so it only solves some cases, but not the others.

The issue is that if the type of the field is defined as object then JSON Schema (and Ajv) validates the value to be object and client (Actor) expect to get object from Actor.getInput(). This is fulfilled if the value is not encrypted, but once we want to encrypt it and do just JSON.stringify (and encrypt this string value) the value would be string and JSON schema validation would be failing.

Is this really a problem? When you call an Actor via API, you pass a full unencrypted JSON input, which is validated as is, and encrypted when storing to KV-store. It's only when Actor one would fetch the JSON from KV-store they will receive an object which will not pass validation. But everyone should use Actor.getValue() which auto-decrypts it and returns valid object also, no?

@MFori
Copy link
Member Author

MFori commented Jun 16, 2025

Is this really a problem? When you call an Actor via API, you pass a full unencrypted JSON input, which is validated as is, and encrypted when storing to KV-store. It's only when Actor one would fetch the JSON from KV-store they will receive an object which will not pass validation. But everyone should use Actor.getValue() which auto-decrypts it and returns valid object also, no?

Yes when you're calling an Actor via API, this should work as you are describing, but when you would start the Actor from UI, the encrypted input is stored right as you pasted it to UI input and this encrypted value is then used to call the API. And in this step we would be sending string where object is expected. I did this diagram to showcase the issue:

secret_input

What we can do is to catch the error produced by JSON Schema and check if the field has isSecret set to true and if the string looks like encrypted object. Here we can use the @valekjo's suggestion with ENCRYPTED_JSON_VALUE prefix. I'm not sure how easy would be to catch the error from Ajv - it's probably doable, but then the input wouldn't be "valid" against the schema without our additional logic.

which will not even be compatible with objects produced by our editors and now the newly introduced sub-schemas, so it only solves some cases, but not the others.

I supposed that the secret field wouldn't be able to define sub-schema, same as we're doing with properties like pattern, minLength and maxLength which are forbidden once the string input is marked as secret. Object properties like patternKey, patternValue, minProperties and maxProperties are already excluded from secret object property in this PR.

@jancurn
Copy link
Member

jancurn commented Jun 16, 2025

Thanks for the clarification. But it really feels like we're changing the external logic (and consequently the Actor specification that we want to make an open standard) just to make some internal implementation easier for us, which is a short-sighted way to design open systems.

I'm sure there's some smarter way to do it. For example, we could prefix the encrypted JSON with ENCRYPTED_JSON/xyz:val, where xyz would be a hash of the field's schema (skipping non-important properties like description and ordering keys alphabetically for stable hashes). This way, we could validate that the schema changed and the encrypted value might no longer match, indicating it somehow in UI to the users.

Also I don't think it's a good idea to block sub-schemas for encrypted objects or arrays. For example, we want to use this new functionality for storing cookies, which need to have a specific format. If we don't validate what users enter in UI or use in API, it will be just another source of user errors and bad UX. We probably cut this corner back in the days when we designed encrypted value for strings, but let's not perpetuate that hack in the future if we can do it better now.

CCing @mtrunkat for visibility

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t-console Issues with this label are in the ownership of the console team. tested Temporary label used only programatically for some analytics.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants