✨ Multiple Language Support + Application Discovery in Hub #173

Merged: 10 commits, May 28, 2024 (showing changes from 4 commits)
392 changes: 392 additions & 0 deletions enhancements/multiple-provider/README.md
@@ -0,0 +1,392 @@
---
title: support-multiple-languages
authors:
- "jortel@redhat.com"
reviewers:
- TBD
approvers:
- TBD
creation-date: 2024-04-25
last-updated: 2024-04-25
status: provisional|implementable|implemented|deferred|rejected|withdrawn|replaced
see-also:
replaces:
superseded-by:
---

# Support Multiple Languages


## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] User-facing documentation is created

## Summary

### Problem ###

Currently: each addon defines a single container.
Currently: analysis is performed by running a task that specifies the `tackle2-addon-analyzer` addon. The name of the addon is hard-coded in the UI. This addon image incorporates support for the Go (golang) and Java providers.

The analyzer images are being refactored and the analyzer image will no longer bundle language providers. Instead, each provider will be packaged as a separate image. To support this, addons will need to define multiple containers: a _main_ container for the addon and additional (sidecar) containers, one for each provider.

Future features may require runtime selection of an addon based on the _kind-of_ task and the _kind-of_ subject (application|platform|...) the task is executed on.
Examples:
- application technology discovery.
- platform discovery.
- platform analysis.

Questions/Assumptions:
- Do we need to support multi-language applications? _Yes_.
- Can the filtering of targets/rulesets and selection of providers be driven by `language` tags? _Yes_.

## Motivation

Applications may be implemented in languages other than Java.
Addons other than analysis will soon be introduced.

### Goals

- Dynamically determine which addon to execute based on facts known about the application in inventory.
- Dynamically determine which language providers to execute based on facts known about the application in inventory.
- User can run analysis without knowing the languages.
- Targets (rules) may be filtered by language.
- Targets (rules) may be pre-filtered by the UI when language is known.
- Language defaults to Java (until the automated discovery feature is added).
- User-initiated task execution (example: analysis) must be prioritized.
Member:

I do think we need to talk in this enhancement about how we are going to deal with providers that support being run on different platforms (ie. linux/amd64 and windows/amd64).

Contributor Author:

Since the addon and extensions run in the same pod, the addon would need to be selected by OS. The addon selector could be used for this. For example: tag:OperatingSystem=Windows would select the Windows version of the analyzer addon.


### Non-Goals

## Proposal

#### Filtering/Selection ####
Targets will be filtered and Providers will be selected by `Language` tag(s). This will likely depend on the Application Discovery feature.

#### Addon Extensions ####

An Addon Extension is primarily an optional container definition, along with selection criteria, defined by a new CR. Providers will be defined to the hub as Addon Extensions. The first supported selection criterion will be application tag matching.

#### Task Pod ####
The task pod will be configured with a _main_ container for the addon and a _sidecar_ container for each of the providers. All containers will mount an `EmptyDir` volume into /shared for sharing files such as credentials and source code trees.
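
For illustration, a minimal sketch of the pod the task manager might generate for an analysis task with a single Java provider extension selected (the pod name and container names are illustrative; the images are taken from the Addon/Extension example later in this document):

```
apiVersion: v1
kind: Pod
metadata:
  name: task-analyzer-1234            # hypothetical generated pod name
  namespace: konveyor-tackle
spec:
  containers:
  - name: addon                       # main container defined by the Addon CR
    image: quay.io/jortel/tackle2-addon-analyzer:provider
    volumeMounts:
    - name: shared
      mountPath: /shared              # credentials, source trees, output
  - name: java                        # sidecar defined by the selected Extension CR
    image: quay.io/konveyor/java-external-provider
    volumeMounts:
    - name: shared
      mountPath: /shared
  volumes:
  - name: shared
    emptyDir: {}                      # shared EmptyDir volume
```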

#### Custom Providers ####
Users may define new providers for testing using the UI by adding a new Extension CR that is mapped to a new Language tag. Then, add/replace the language tag on the application so that the extension (provider) is selected.

Testing with the API is simpler. The user would add the Extension and submit an analysis task through the API that references the extension explicitly.
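
For example, a task submitted through the API might reference a custom extension explicitly; `my-provider` is a hypothetical Extension CR name, and the field shapes follow the API section below:

```
addon: analyzer
extensions:
- my-provider     # hypothetical custom provider extension
```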

### User Stories

### Implementation Details/Notes/Constraints

Constraints:
- Must support running the analyzer (addon) and providers in either:
- same container
- separate containers
- Must support _remote_ providers running outside the cluster.
- Must ensure all containers are terminated when analysis has completed.
- Must provide the user with provider output/logs.
- Must provide credentials through provider settings. Example: maven settings.
- Must support TCP port assignment which is injected into the provider settings.
- Must support provider specific settings.

### Security, Risks, and Mitigations

## Design Details (core)

Support optional addon _Extensions_. Each extension is a (sidecar) container and most likely a _service_. For example: an RDBMS or LSP provider. All containers within a task pod are injected with the same items as the _main_ addon container:
- Shared (EmptyDir) volume.
- Security token.
- Task (id).

Both addon and extension selection may be either:
- explicit - (specified on the task)
- determined by matched criteria.

#### Selection ####

A selector uses label-selector syntax and supports the following:
- **tag**:_category_=_tag_
- **platform**:(source|target)=_kind_

Examples:

- Java: `selector: tag:Language=Java`
- Java or XML: `selector: tag:Language=Java || tag:Language=XML`
- No language tag: `selector: ! tag:Language=`
- Java and Linux: `selector: tag:Language=Java && tag:OperatingSystem=Linux`
Contributor:

pretty concerned with a new format for selector, the concept of tag: does not exist, and does not make it clear what I can do with this. I am worried that we keep running into each other with concepts, and considering that the hub is another way of running the CLI at scale, I believe that we need to make the UX as cohesive as possible between the two.

I would prefer that either we make the selectors specific (for tag, or whatever else you are thinking of selecting on) or we just start with selector and assume that it will only ever work off of tags.

Contributor Author:

I think the systemic use of a generic concept like selector is a good thing and makes the UX more cohesive. The analysis rules are clearly selected using label selectors. For extensions, they are clearly extension selectors (by nature of scope). The same is true for addons.
I don't see how introducing a completely new term/convention for expressing how addons and extensions are selected makes for a better UX. IMHO, it seems more confusing.
The tag: and platform: prefix provides for extensibility without changing the API and seems like a minor conceptual addition.

Contributor Author (@jortel, May 2, 2024):

Also, there is no reason to expect that extensions and addons will only ever be selected by tags, given that both will need to be selected for source analysis, platform analysis, and TBD.
For source analysis, the addon may be selected by tag.
For platform analysis, the addon will likely be selected by platform (kind).
It's reasonable to anticipate needing the same flexibility for different types of discovery tasks.

The selector is a powerful, compact and extensible mechanism for supporting this and future needs.

Contributor:

The tag: and platform: prefix provides for extensibility without changing the API and seems like a minor conceptual addition.

I worry that this prefix will be expected in other selectors and will not exist; that is the confusion that I am worried about. This may not make sense as a worry, but I wanted to bring it up.

I agree that the <prefix>: on selectors is a suitable mechanism for extending selectors as we move forward for "things", as you said.

There are things that I would want to make sure that we have if we do go forward (and I think that the worry above is worth ignoring).

  1. We should have good validation for whether the prefix is valid. As you said, in the others it is assumed what it is and they are valid, but when we open it up, people can make mistakes that would be hard to track down (why isn't my thing being selected? Oh, I am using language:Java instead of tag:Language=Java, as an example).
  2. we should have a discovery mechanism to know what is valid, and because these selectors are fields in k8s resources, and we want them generic, there is no good way (see csi drivers IMO). Maybe this is a UX that we have to live with, and just make sure that we have good documentation.

WDYT? does that make sense?

Contributor Author (@jortel, May 3, 2024):

  1. Definitely. The implementation reports errors for an unknown selector prefix, and also if the tag: selector references an unknown tag category.
  2. Not sure I understand this exactly. I'd anticipated the extension and addon controllers would validate the k8s references in the CRs. Is that what you meant?

Contributor:

Awesome re: validation. Can you add a bit about that in the enhancement and how it works? If it is there and I missed it, I am sorry.

I was kind of hoping that we would start to think about a "documentation" screen. Maybe this is outside the scope of this enhancement. One of the things that we are trying to do in Kantra is expose, via OpenAPI, what providers give you for rules, and I was hoping that we could do something similar here.

Contributor:

I think that the outcome here is that we need to track two things:

  1. A validation feature in the UI for when selecting the implicit selectors, that you can not use <prefix>:
  2. A validation feature in kantra to do the same.

@ibolton336 @sjd78 @pranavgaikwad let me know if you folks see issues with that.


#### Addon & Extension CRs ####

New _Extension_ CR (Spec):
```
kind: Extension
spec:
  addon (string):
  container (core.v1.Container):
  selector (string):
  metadata (object):
```
Fields:
- `addon` - declares compatibility with an addon.
- `container` - defines the extension container.
- `selector` - defines selector for inclusion in the addon task pod.
- `metadata` - defines unstructured information.

Changed _Addon_ CR (Spec):
Contributor:

I cannot understand the changes to the spec.

Also if we are changing a k8s API, then we need to make sure that we understand what to do on upgrade with all the fields.

Contributor Author:

The incompatible changes are part of a more holistic approach and better symmetry with extensions. The additions are needed for the dynamic selection of addons that we'll almost certainly need for platform discovery (future). As for upgrade: we can include a conversion webhook and version our API, but:

  • web hooks are a pain.
  • the API is at alpha.1
  • the only addon CRs are ours which are installed/updated by our operator.

That said, if folks feel strongly, we can include a conversion web hook. But I really don't think it's worth it.

Contributor:

After looking again, I missed the - and + that you added, which were not the -/+ from the diff; that is my fault, I am sorry :)

If we are just adding fields and not changing fields or removing them, then we just need to handle the sane defaults (for the things that exist).

I agree webhook should be a last resort and is not necessary unless we are changing a field.

Field removal, on the other hand, is not something we should do in one release. Instead, we should deprecate, deal with the consequences of the fields being left in place, and remove them in a future release. I know this is a pain, but it is probably the right thing to do for our users. I think we need to come up with a plan here.

I would suggest that we should follow a similar process for our APIs that the hub API is providing as well.

Contributor Author:

I share the same philosophy here. However, I am all but certain there are no addon CRs in our namespace besides the one (1) we install and manage, mainly because there is no way of running others from the UI. Also, I would argue that an alpha1 version offers no guarantee against this kind of change. It's only as a matter of practicality that I propose we don't do as you're suggesting.
Leaving the resources and image fields in the spec introduces the opportunity for anomalies. What if the image and resources are defined in both the (old) fields and the new container field? Seems like we would be creating complexity and problems unnecessarily.

Contributor:

As discussed w/ Jeff offline:

Plan A:
we remove the fields but turn on strict enforcement. This will signal to users who were using the API that the removed fields are no longer valid.

Plan B:
If there are issues with upgrade/downgrade and removing the fields, we will leave them in place but add validation that makes it so they can't be set, with a deprecation warning, and will hopefully be able to be removed in a future release.

If anyone has thoughts let us know :)

```
- image:
- imagePullPolicy:
- resources:

+ kind (string):
+ container (core.v1.Container):
+ selector:
+ metadata (object):
```

Fields:
- `kind` - declares compatibility with a _kind_ of task.
- `container` - defines the addon container.
- `selector` - defines addon selector for task (kind).
- `metadata` - defines unstructured information.
- `schema` - _FUTURE_ defines the Task.Data schema.

For symmetry and completeness, the `metadata` field is added and the separate addon fields for the container are replaced with the entire k8s _Container_. This is mainly a general improvement while we're changing the CR.

Comment on lines +175 to +183
Member:

Will need a mechanism to declare the platform/os an addon image can be run on.

Example:
```
---
kind: Addon
apiVersion: tackle.konveyor.io/v1alpha1
metadata:
  namespace: konveyor-tackle
  name: analyzer
spec:
  container:
    name: addon
    imagePullPolicy: Always
    image: quay.io/jortel/tackle2-addon-analyzer:provider

---
kind: Extension
apiVersion: tackle.konveyor.io/v1alpha1
metadata:
  namespace: konveyor-tackle
  name: java
spec:
  addon: analyzer
  selector: tag:Language=Java || !tag:Language
  container:
    name: java
    imagePullPolicy: Always
    image: quay.io/konveyor/java-external-provider
    args:
    - --port
    - $(PORT)
    env:
    - name: PORT
      value: ${seq:8000}
    resources:
      limits:
        cpu: 1
        memory: 3Gi
      requests:
        cpu: 1
        memory: 3Gi
  metadata:
    resources:
    - selector: identity:kind=maven
      fields:
      - name: settings
        path: /shared/creds/maven/settings.xml
        key: maven.settings.path
    provider:
      name: java
      address: localhost:$(PORT)
      initConfig:
      - providerSpecificConfig:
          mavenSettingsFile: $(maven.settings.path)

```

#### API ####

Task API resource changes:
```
+ Extensions []string `json:"extensions,omitempty"`
```
Example:
```
addon: analyzer
extensions:
- java
```

#### Port Injection ####

The task manager will provide a set of injector macros. Format: ${_macro_}. Injector macros may be specified in the container spec:
- command
- args
- env

Automatic port assignment will be supported using a _sequence_ (generator) macro.
Format: ${seq:_pool_}.
Example:
```
container:
  name: Test
  env:
  - name: PORT
    value: ${seq:8000}
```
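
As a sketch of the intended behavior (the allocation order is an assumption), two containers in the same task pod drawing from the same pool would receive consecutive ports:

```
# rendered container for the first extension
env:
- name: PORT
  value: "8000"     # first allocation from pool 8000
---
# rendered container for a second (hypothetical) extension also using ${seq:8000}
env:
- name: PORT
  value: "8001"     # next value from the same pool
```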

#### Kinds of Tasks ####

A task (definition) defines a _kind-of_ task which performs a specific action on a specific subject.

To prevent hard-coding tag names in the UI (anywhere), we may consider data-driven approaches.
Introduce _named_ task (definitions) which represent a known _kind-of_ task. Each (definition) defines the input schema and how an addon (that implements the task) may be selected. Addons (optionally) declare that they implement a kind-of task.

New _Task_ CR:
- name
- dependencies:
- data (object)

Fields:
- `name` - the kind-of task.
- `dependencies` list of task names used to define execution dependency.
- `data` - data object passed to the addon.
- `schema` - _FUTURE_ defines the Task.Data schema.

```
kind: Task
name: discovery
addon:
- name: discovery
```
```
kind: Task
name: analysis
dependencies:
- discovery
```
```
kind: Task
name: platform-analysis
```
```
kind: Task
name: platform-discovery
```

Add `Task.Kind` which can be used with the existing task endpoints. Either kind OR addon and extensions may be specified, but not both.

```
+ Kind []string `json:"kind,omitempty"`
```
Example:
```
kind: analyzer
```

Add new read-only endpoints (a response sketch follows the list):
- /tasks/:id/attached - _download (tgz) attached files._
- /k8s/addons - _moved from /addons_
- /k8s/addons/:name - _moved from /addons/:name_, includes extensions.
- /k8s/extensions - _new extensions collection_
- /k8s/extensions/:name - _new extension by name_
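
A hypothetical response sketch for `GET /k8s/addons/analyzer`; the shape is an assumption for illustration, and only the `/k8s` prefix and the inclusion of extensions are proposed above:

```
name: analyzer
container:
  name: addon
  image: quay.io/jortel/tackle2-addon-analyzer:provider
extensions:
- name: java
  container:
    name: java
    image: quay.io/konveyor/java-external-provider
```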

#### Task Priority ####

Tasks will be run in order of:
- priority
- id

Pods are created with the appropriate _PriorityClass_ to ensure node scheduling matches. Task dependencies will be escalated as needed to prevent priority inversion. When (dep) tasks are already scheduled (Pending), the pod will be recreated referencing the updated priority class.

When the nodes are saturated and task pods are phase=Pending, the task manager may reschedule (delete) tasks with lower priority so that higher priority pods can be run by the node scheduler. PriorityClass preemption should handle this.
Contributor:

I am worried about anything that starts to try and reschedule things IIUC. If we are using PriorityClass already then leaving the pods in phase pending should be sufficient unless I am misunderstanding something.

Contributor Author (@jortel, May 2, 2024):

Trust me, I pushed back on this as well.

Actually, references to PriorityClass need to be removed (I thought I had).

The sections here about priority escalation and preemption are a necessary reaction to issues introduced by the automated discovery which is required by the "multi-provider feature". Mainly that analysis tasks cannot be delayed by discovery tasks.

But a few words about PriorityClasses. I looked pretty hard at them with high hopes.
A few problems:

  • They are a cluster resource that affects all pods in all namespaces. I highly doubt any customer will let our operator install them (and they should not). We would need to convince customers to add the classes we need and define them on the Tackle CR. Customers would need to define a default (if one does not exist) to prevent our task pods from running at higher priority than controllers. According to @rromannissen, requiring users to do this is pretty much a deal breaker.
  • Priority classes seem to only define a preference. I cannot find a claim in the documentation that the node scheduler will postpone scheduling lower priority pods (for which there is a node with available resources) to clear capacity for pending higher priority pods requiring more resources than are available. This produces a priority inversion. In practice, what I observed in a resource-bound cluster was that when many small pods are pending, larger, higher priority pods are significantly delayed because it's easier to find small holes than large ones. Unfortunately, our analysis pods require a very large amount of memory, which is likely to produce resource-based scheduling issues except on extremely large clusters.

Note: The tasking system already supports scheduling tasks (pods) in priority order. This is essential for installations where a ResourceQuota is placed in the konveyor namespace. According to @rromannissen, the quota will be common.

IMHO, this is the only complicated topic addressed by the proposal. I'm continuing to run tests to profile scheduling behaviors. I will add a use case to the Use Case section as told to me by @rromannissen.

Contributor:

That is good information about priority classes; thanks for explaining that!

I want to push back on the need. IMO, the benefit is a premature optimization for a theoretical problem. From using Kantra to do language discovery, I am seeing times in milliseconds, and if we end up delaying by that time frame, it is hard for me to see the benefit as analysis is in the 10's of minutes, as far as I know.

To be clear, I would agree that this is a problem that is needed if the delay is in the minutes/10's of minutes.

Can we do this without it, and come back if/when this does become an issue? Maybe we do something simple like an option to turn off automatic discovery as an escape hatch for customers that do hit this issue?

Contributor Author:

The priority escalation feature is simpler and only involves redeploying pods that are still pending. This seems straightforward and simply ensures that the pods submitted to the node scheduler match task priority. This is critical when a quota exists in the namespace.

To us, the preemption feature is the controversial one and what we pushed back on for these reasons:

  • As you stated, it seems like a premature optimization.
  • There is no way to determine which pods should be preempted. The pods blocking the analysis pod are just as likely to belong to other operators as to our own. Even within our own, it is impossible to tell. In all cases it's a best-effort attempt using a sledgehammer.

For these reasons, I made the preemption feature optional, enabled with a setting. I agree we should consider disabling it by default; users can enable it should they experience the problem described in the use case. However, we'll need to convince @rromannissen.

I am not opposed to making automated discovery enabled/disabled via a setting as well. Again, we'll need to get with @rromannissen as to the default.

Contributor:

I think the confusion here is what would be introducing the potential delay for the user requested analysis. My understanding is that language discovery does indeed take milliseconds to execute, but automated discovery would also include the technology stack/insights discovery, which implies the execution of an analysis via LSP and can effectively take minutes to complete. The end goal for automated discovery is to surface language and technology stack information at scale throughout the entire portfolio, so applications get automatically associated to their corresponding archetypes ASAP. But that shouldn't imply that the user is unable to do anything until automated tasks are finished, as that would lead to the system being perceived as unresponsive. The problem and the request are stated in #170


Addon (adaptor)
---

#### Environment (variable) Injection ####

Container environment variables may be injected into the Extension.metadata.
The format is exactly like a k8s environment variable reference: $(_variable_).

Example:
```
container:
  name: Test
  env:
  - name: PORT
    value: 8000
metadata:
  address: localhost:$(PORT)
  ...
```

---

Analyzer Addon (adaptor)
---

delegated to: https://github.com/konveyor/tackle2-addon-analyzer/issues/83

#### Resource Injector ####

Aggregate the initConfig fragments from each Extension.metadata.

Inject information into the initConfig.
- credentials.

Considering: to support this, a set of injectors will be provided to fetch and store hub REST resource (fields). Fetched resources are assigned to variables that may be used in the metadata.
Resource:
- selector (format: \<kind>:\<key>=\<value>)
- fields[]:
  - name: - name.
  - path: - (optional) value stored at the specified path.
  - key: - value stored in key referenced as $(_key_). Assigned to path, when specified.

Example:
```
metadata:
  resources:
  - selector: identity:kind=maven
    fields:
    - name: settings
      path: /shared/creds/maven/settings.xml
      key: settings.path
  - selector: identity=other
    fields:
    - name: user
      key: auth.user
    - name: password
      key: auth.password
  provider:
    initConfig:
      mavenSettingsPath: $(settings.path)
      user: $(auth.user)
      password: $(auth.password)
```

### Test Plan

### Upgrade / Downgrade Strategy

## Implementation History

## Alternatives

## Infrastructure Needed [optional]