
# [Draft] Assets generation and Platform Awareness enhancement #210

Draft: wants to merge 2 commits into master.

## Conversation

rromannissen (Contributor) commented:

Since its first release, the insights that Konveyor could gather from a given application came either from the source code of the application itself (analysis) or from information provided by the different stakeholders involved in the management of the application lifecycle (assessment). This enhancement proposes a third way of surfacing insights about an application: gathering both runtime and deployment configuration from the very platform on which the application is running (discovery), and storing that configuration in a canonical model that can be leveraged by different Konveyor modules or addons.

Aside from that, the support that Konveyor provided for the migration process stopped once the application source code was modified for the target platform, leaving the application ready to be deployed but without the assets required to actually deploy it on the target platform. For example, for an application to be deployed in Kubernetes, it is not only necessary to adapt the application source code to run in containers; it is also necessary to have deployment manifests that define how the application can be deployed in a cluster, a Containerfile to build the image, and potentially some runtime configuration files. This enhancement proposes a way to automate the generation of those assets by leveraging the configuration and insights gathered by Konveyor.

Signed-off-by: Ramón Román Nissen <rromannissen@gmail.com>

- Should there be a dynamic way of registering Platform Types, Discovery Providers and Generator Types? Should that be managed by CRs or could there be an additional mechanism? That would imply adding some dynamic behavior on the UI to render the different fields associated with each of them.
- How can we store sensitive data retrieved by the Discovery Providers?
- How could we handle the same file being rendered by two different _Generators_ (charts)? Is there a way to calculate the intersection of two different Helm charts?


How could we handle the same file being rendered by two different Generators (charts)?

One approach may be to use a different release name for each Generator. WDYT?

Is there a way to calculate the intersection of two different Helm charts?

I'm not aware of a way to intersect charts; maybe the closest thing is to use dependency management.

rromannissen (author):

We wouldn't be using the Helm release concept, as I wouldn't expect the asset generator to have any direct contact with a k8s cluster (that would be something more for a CI/CD pipeline). We are mostly using Helm to render assets via the `helm template` command.

Contributor:

Keep in mind that `helm template` might also reach out to the cluster unless the `--dry-run=client` flag is used. It depends on what has been coded in the charts: if a chart checks for the existence of a certain resource (a secret, for instance), then helm will attempt to retrieve it unless the option is specified, but then the generated template cannot be guaranteed to be the correct one:
https://helm.sh/docs/helm/helm_template/

`--dry-run string[="client"]`: simulate an install. If `--dry-run` is set with no option being specified or as `--dry-run=client`, it will not attempt cluster connections. Setting `--dry-run=server` allows attempting cluster connections.
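For illustration, a fully offline rendering could look like this minimal sketch (the release name, chart directory and values file are assumptions):

```bash
# Render the chart locally; with --dry-run=client helm makes no cluster
# connections, so chart lookup calls return empty results.
helm template my-app ./chart \
  --values values.yaml \
  --dry-run=client \
  --output-dir ./generated
```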

##### Repository Augmentation

- Generated assets could be stored in a branch from the target application repository, or if needed, on a separate configuration repository if the application has adopted a GitOps approach to configuration management.
- Allow architects to seed repositories for migrators to start their work with everything they need to deploy the applications they are working on right away → Ease the change, test, repeat cycle.


What do you mean by "seed repositories"?

rromannissen (author):

Add everything developers need to start deploying the application in the target platform from the very first minute. If a developer can only interact with the source code to adapt the application for the target platform, but is not able to actually deploy the app there to see if it works, it becomes difficult for them to know when the migration is done, at least to a point at which the organization can test that everything behaves as expected.


Deployment could be done by existing CI/CD infrastructure. We implemented this approach for a customer in a workflow: when Move2Kube generated the Dockerfile and manifests, we triggered Tekton to build the image and deploy. We provided a place for customers to define how the pipeline should be triggered.

rromannissen (author):

That's the idea: our assets generator leaves the assets in a place where the corporate CI/CD can pick them up and orchestrate the deployment in whatever way they have designed. That last mile, the deployment itself, is delegated to the corporate CI/CD system; Konveyor doesn't have anything to do with it.

Contributor:

IIUC @rromannissen, you are saying that nothing is stopping the generator from creating the TektonPipeline, but applying and using that pipeline is an exercise left to users outside of Konveyor.

Is that correct?

rromannissen (author):

@shawn-hurley that's it!


- Should there be a dynamic way of registering Platform Types, Discovery Providers and Generator Types? Should that be managed by CRs or could there be an additional mechanism? That would imply adding some dynamic behavior on the UI to render the different fields associated with each of them.
- How can we store sensitive data retrieved by the Discovery Providers?
- How could we handle the same file being rendered by two different _Generators_ (charts)? Is there a way to calculate the intersection of two different Helm charts?
Contributor:

Is the open question how you can layer the file changes on top of each other, or merge them together, so that the generators work together?

rromannissen (author):

If the OpenShift generator (chart) generates a Deployment.yaml and the EAP on OpenShift generator (chart) generates a different Deployment.yaml, how can we merge them? It just came to my mind that we could establish an explicit order of preference when assigning Generators to a Target Platform, so if some resources (files) overlap, the ones with the top preference override the others. That would mean no file merging, but the end result would be a composition (should we call this merge?) of the files rendered by all generators.
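A hypothetical sketch of that composition-by-preference idea, assuming each Generator has already rendered its assets into a numbered directory (all names are illustrative):

```bash
# Copy rendered trees over each other in ascending preference order, so
# the highest-preference Generator wins on overlapping file names.
mkdir -p merged
for dir in generated/10-openshift generated/20-eap-on-openshift; do
  cp -R "$dir"/. merged/   # later (higher-preference) copies overwrite earlier ones
done
```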


I may be missing some context here, but my understanding is that we would have one or more generators (configured by the users) that may provide one or more ways to deploy the same app. In my opinion, we should not merge anything; we should provide the generated manifests (in different folders) per user request and let the user decide what to do about duplicates.

Contributor:

I think this makes sense for the first pass. I believe this could get cumbersome, but waiting for actual user pain makes sense to me.

- Documented way of storing configuration:
  - Keys are documented and have a precise meaning.
  - Similar to Ansible facts, but surfacing different concerns related to the application runtime and platform configuration.
  - RBAC protected.
Contributor:

I would like to explore this a little more. Is the whole configuration RBAC protected, or just some fields? How is the RBAC managed from the hub?

rromannissen (author):

At the moment I think it should be something simple along the lines of "only Admins and/or Architects can see the config", considering how RBAC works now. Once we move authorization over to Konveyor itself (as we've discussed several times in the past), I think we'd have something more flexible that would allow users to have more fine-grained control over this.


- The hub generates a values.yaml file based on the intersection of the _Configuration_ dictionary for the target application, the fixed _Variables_ set in the Generator and the _Parameters_ the user might have provided when requesting the generation, in inverse order of preference (_Parameters_ have top preference over the others, then _Variables_ and finally the _Configuration_ dictionary). That file should also include values inferred from other information stored in the application profile such as tags.
- The values.yaml file is injected by the hub in a _Generator_ task pod that will execute the `helm template` command to render the assets.
- The generated assets are then placed in a branch of the repository associated with the application.
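A minimal sketch of the precedence merge described in the bullets above, assuming each source has been serialized to YAML and using mikefarah/yq v4 (all file names are illustrative):

```bash
# Deep-merge the three sources; keys in later files override earlier
# ones, matching the stated precedence:
# Configuration < Variables < Parameters.
yq eval-all '. as $item ireduce ({}; . * $item)' \
  configuration.yaml variables.yaml parameters.yaml > values.yaml

# The hub would then inject values.yaml into the Generator task pod:
helm template app ./chart --values values.yaml --output-dir ./generated
```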
Contributor:

This would require giving Konveyor write access to a repository; as far as I know, it only needs read access today.

I wonder if being able to download the generated assets from the hub/UI might be a solution worth exploring.

This would allow users to put the files in GitOps in another repo, or just use them locally to test with before committing. They could even mutate the resources before committing.

Just something to consider, not tied to it one way or the other.

rromannissen (author):

@shawn-hurley AFAIK @jortel already has writing to a repository figured out.

Having everything committed to a repo seems cleaner to me, and a user can always make changes in the repo with a clear log of where each change comes from. If we were to allow users to download the files, it would be difficult to tell which parts came from Konveyor and which came from a manual change.

In the end, this is all about organizations being able to enforce standards. If someone wants to override some of those standards, then they should be accountable for that.


What about pushing a PR or MR? It would be up to repo owners to merge the change, and we may not need write permission to the repository.

rromannissen (author):

That implies having to integrate with the different APIs of the most common Git wrappers out there: GitHub, GitLab, Bitbucket, Gitea... That means not only implementing but also maintaining compatibility with all these APIs over time, which would require a considerable investment. I don't think that is a priority at the moment considering the resources we have.

Contributor:

I agree that would not necessarily be hard, but it adds a larger support burden than we would like.

We have talked about this offline, and one of the things we discussed is that this entire flow only works for source applications, not binaries. I think this makes sense for the first pass, and we can pivot if there are issues that customers bring up. There is no need to boil the ocean if we can get something working into users' hands.

Contributor:

So, no requirement for users to see/download the generated templates in the UI?

rromannissen (author):

@jortel I'd say not for the moment, only committing to a repository.


There will be platform related fields in the Application entity. These fields should be considered optional, as applications can still be managed manually or via the CSV import without the need for awareness of the source platform.

A _Source Platform_ section (similar to the Source Code and Binary sections) should be included in the _Application Profile_, including the following fields:

@rromannissen is it possible to have multiple source platforms for a particular application? If we think of the current functionality around cloning the source from a git repo as a "platform" (git repository, there is an API, we populate the source code from this info, and the "analyze" phase is strictly the static code analysis), then we'd definitely need multiple. Maybe if there is an EAP app on k8s, you would have two different source platforms, each responsible for its own details?

rromannissen (author):

Code retrieval remains part of the analysis phase, as repositories are not a platform in which the application is deployed, but rather a place where the application source code is stored. The analysis process should be able to surface configuration though, and I think that we should (and can) leverage analysis findings (probably coming from Insights) to populate the configuration dictionary, aside from the technology tags to automate archetype association as we do now. That should remain independent from the discovery process for different platforms described in this document.

For a "compound" scenario like EAP on K8s, I imagine having a dedicated discovery provider that can handle the specifics of that situation and be able to retrieve information for both the k8s objects and the EAP configuration. Bear in mind that using a vanilla EAP discovery provider would not work for an EAP on k8s scenario, as some (if not all) of the EAP management APIs are disabled in the image.

Contributor:

If I could add to that, we would probably have to start scanning container layers at that point to get the information out of them. This is not impossible, and there are many ways to do this, but it is not something that we have implemented.

We should also consider this, but I think it is outside the scope of this enhancement.

Thoughts?


Do we consider multilayered apps (many source repos, different components deployed on more than one platform) to be in scope for this work?

rromannissen (author):

As per the cardinality we currently have in Konveyor, each component of a distributed application would be treated as what we call an application in the inventory. All components of the same distributed application could be related via runtime dependencies and common tags.

Contributor:

Couldn't they be associated via a migration wave as well or is that the wrong tool for the job?

rromannissen (author):

Migration Waves are meant to break the migration effort into different sprints to enable an iterative approach, so probably not the best tool for that. We discussed in the past the possibility of having applications and components as first-class entities, but that would require further changes in the API and UI/UX that I think go beyond the scope of this enhancement.


Based on our discussion yesterday, it seems they are mostly using stateless apps. With that said, it is OK to keep this out of scope for this work.


## Open Questions

- Should there be a dynamic way of registering Platform Types, Discovery Providers and Generator Types? Should that be managed by CRs or could there be an additional mechanism? That would imply adding some dynamic behavior on the UI to render the different fields associated with each of them.


Dynamic behavior in the UI is solved by products like OCP with frontend plugins for different operators, or by plugins in RHDH (Backstage). I think the question should be: "which mechanism would be a good match for the existing architecture?"

Contributor:

If we are just doing dynamic fields, I think it would make sense to focus on just that, as it is a much more constrained problem (read the OpenAPI spec for a "thing" to determine the type and render the right field for that type). Having a full frontend plugin system is hard IMO, and if we don't need it we shouldn't focus on it.

In the future we may, but I think we should do that work when it becomes an acute problem users are feeling.


I am OK with limiting the scope. Based on this open question it was not clear to me what we intend to provide. Still, "just" dynamic fields may grow beyond our initial design.


- Should there be a dynamic way of registering Platform Types, Discovery Providers and Generator Types? Should that be managed by CRs or could there be an additional mechanism? That would imply adding some dynamic behavior on the UI to render the different fields associated with each of them.
- How can we store sensitive data retrieved by the Discovery Providers?
- How could we handle the same file being rendered by two different _Generators_ (charts)? Is there a way to calculate the intersection of two different Helm charts?


I may be missing some context here, but my understanding is that we would have one or more generators (configured by the users) that may provide one or more ways to deploy the same app. In my opinion, we should not merge anything; we should provide the generated manifests (in different folders) per user request and let the user decide what to do about duplicates.

- Hypervisors and VMs.
- Others...
- Assets generation:
- Flexible enough to generate all assets required to deploy an application on k8s (and potentially other platforms in the future)


The applications may be multilayered, with complex deployments running on different platforms, like a stateless web service (PCF) and a DB or cache (VM). Should we limit ourselves to only parts of the app, or attempt to generate all the deployment assets? Depending on our choices, we may or may not need to think about network layout and corresponding manifests.

rromannissen (author):

How deep we go would totally depend on the Discovery Provider logic and the Helm charts (and potentially other templating technologies in the future) associated with the generator for the target platform. The goal is to provide a framework to enable us and users to do this in a structured way.


## Proposal

### Personas / Actors


Do we see a place for SRE or platform engineering in this effort?

rromannissen (author):

That is something I would consider once we expose this functionality via the Backstage plugin.


I raised this question since the original intention was to connect to the source runtime as well as make sure the app will run in the target runtime without issues. This clearly requires work from SREs to configure access, CI/CD, etc., although based on our discussion with the customer we know it is not the highest priority atm.


There will be platform related fields in the Application entity. These fields should be considered optional, as applications can still be managed manually or via the CSV import without the need for awareness of the source platform.

A _Source Platform_ section (similar to the Source Code and Binary sections) should be included in the _Application Profile_, including the following fields:


Do we consider multilayered apps (many source repos, different components deployed on more than one platform) to be in scope for this work?


##### Discovery Providers

Abstraction layer responsible for collecting configuration about an application on a given platform:


This assumes network connectivity to a platform/agent and admin-level permissions. Is this something we can expect? What should be the process for the agent to be deployed/installed?

rromannissen (author):

Yes, for the live connection approach we'll need some valid credentials and network access. Agents will have to be deployed by the infrastructure teams managing the platforms and exposed to the Hub somehow (TBD).

- *Initial discovery*:
  - The Configuration dictionary gets populated with non-sensitive data. Sensitive data gets redacted or defaults to dummy values.
- *Template instantiation*:
  - A second discovery retrieval happens to obtain the sensitive data and inject it in the instantiated templates (the actual generated assets) without storing the data in the Configuration dictionary.
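A hypothetical sketch of the two stages described above, assuming the dictionary is serialized as YAML and using mikefarah/yq; the key paths, file names and placeholder value are assumptions:

```bash
# Stage 1 (initial discovery): persist only redacted data; the key path
# and placeholder are hypothetical examples.
yq '.datasource.password = "<redacted>"' discovered.yaml > configuration.yaml

# Stage 2 (template instantiation): re-fetch the sensitive value from the
# source platform and inject it into the rendered assets without ever
# storing it in the Configuration dictionary.
helm template app ./chart \
  --values configuration.yaml \
  --set datasource.password="$DB_PASSWORD" \
  --output-dir ./generated
```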


Is there a need to protect access to generated assets with sensitive data?

rromannissen (author):

Very likely, but that would be the responsibility of the user generating the assets, meaning they should take care of storing the assets in a secured repository.


- The hub generates a values.yaml file based on the intersection of the _Configuration_ dictionary for the target application, the fixed _Variables_ set in the Generator and the _Parameters_ the user might have provided when requesting the generation, in inverse order of preference (_Parameters_ have top preference over the others, then _Variables_ and finally the _Configuration_ dictionary). That file should also include values inferred from other information stored in the application profile such as tags.
- The values.yaml file is injected by the hub in a _Generator_ task pod that will execute the `helm template` command to render the assets.
- The generated assets are then placed in a branch of the repository associated with the application.


What about pushing a PR or MR? It would be up to repo owners to merge the change, and we may not need write permission to the repository.


##### Repository Augmentation

- Generated assets could be stored in a branch from the target application repository, or if needed, on a separate configuration repository if the application has adopted a GitOps approach to configuration management.


Generated assets could be stored in a branch. We need to keep in mind that we may have sensitive information added as part of asset generation. I am not sure whether it is a good idea to store those in a repository.

rromannissen (author):

That's exactly how things are done in a full GitOps approach, with configuration for different environments being stored in different configuration repositories with different security levels. Nevertheless, I think it might be interesting to add an additional parameter for Template Instantiation to allow the user to prevent sensitive data from being injected in the generated assets.

##### Repository Augmentation

- Generated assets could be stored in a branch from the target application repository, or if needed, on a separate configuration repository if the application has adopted a GitOps approach to configuration management.
- Allow architects to seed repositories for migrators to start their work with everything they need to deploy the applications they are working on right away → Ease the change, test, repeat cycle.


Deployment could be done by existing CI/CD infrastructure. We implemented this approach for a customer in a workflow: when Move2Kube generated the Dockerfile and manifests, we triggered Tekton to build the image and deploy. We provided a place for customers to define how the pipeline should be triggered.

- Hypervisors and VMs.
- Others...
- Assets generation:
- Flexible enough to generate all assets required to deploy an application on k8s (and potentially other platforms in the future)
istein1, Dec 10, 2024:

There can be all sorts of assets, and it might be tricky to provide every kind. In case discovery detects an asset Konveyor doesn't have in its arsenal, could the option of asking the user to provide that asset source, so that Konveyor could generate it, be considered? Or maybe I'm getting this wrong, and Konveyor is fine with any asset: it would propagate it into a Helm chart and then CI/CD would handle getting the asset installed?

rromannissen (author):

Discovery providers will discover configuration, have it stored in canonical form, and then the generators will generate assets for different target platforms. Considering we will be in control of the discovery providers and generators we ship out of the box, we should take special care to coordinate them to tackle meaningful migration paths such as Cloud Foundry to Kubernetes (meaning shipping a CF discovery provider and a default Kubernetes generator).

- Managed in the administration perspective
- Potential fields:
  - Name
  - Platform Type (Kubernetes, Cloud Foundry, EAP, WebSphere…)

Does this list contain all the supported platforms? Asking in terms of the design and infra needed to test this.

rromannissen (author):

Those are just examples. In a first iteration we should focus on Kubernetes and Cloud Foundry.


### Test Plan

TBD

@rromannissen, could you please suggest one high-level end-to-end test for a common use case? I think that would provide more clarification on what the tests should be focused on.


## Design Details

### Test Plan

@mguetta1, @ibragins, @nachandr, would you please add questions/thoughts/ideas on testing here?

nachandr, Feb 4, 2025:

1. Test each of the components separately: Platform Discovery, Asset Generation.
2. Integration tests for end-to-end workflows: Cloud Foundry -> canonical -> Helm charts.
3. Ensure sensitive data is not exposed.

A read-only _Configuration_ dictionary should also be browsable in the _Application Profile_. For more about _Configuration_, see the [Canonical Configuration model](#canonical-configuration-model) section.

_Target Platforms_ will be surfaced in the _Application Profile_ as read-only data (they can't be manually and individually associated to a single application) and will be inherited from the archetype.

jortel (Contributor), Jan 14, 2025:

What if the user isn't using archetypes?

rromannissen (author):

@jortel considering the complexity assigning target platforms for individual applications would bring, I think we can assume that archetypes are a requirement for the moment, and consider other options if requested in the future.

- Keys are documented and have a precise meaning.
- Similar to Ansible facts, but surfacing different concerns related to the application runtime and platform configuration.
- RBAC protected.
- Injected in tasks by the hub.
Contributor:

Since the discovered platform configuration is stored in the hub and addons have access to the inventory, I think the application ID should be sufficient.

note: The model we have been following is: rather than anticipating and injecting everything an addon may need, addons fetch whatever they need.

rromannissen (author), Jan 20, 2025:

@jortel if that's how it's been done so far, sounds good to me! It makes sense for each generator task to take responsibility for retrieving the data in canonical form from the API and then transforming it into the format that each templating engine requires (for example, a values.yaml file for Helm).
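A hypothetical sketch of that "addons fetch what they need" flow; the endpoint path, application ID, and token handling are assumptions, not documented hub APIs:

```bash
# Fetch the canonical configuration for an application from the hub API,
# then convert the JSON payload into the values.yaml format Helm expects.
curl -s -H "Authorization: Bearer $TASK_TOKEN" \
  "$HUB_BASE_URL/applications/42/configuration" |
  yq -P > values.yaml   # mikefarah/yq: -P pretty-prints (JSON in, YAML out)
```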

- Live connection via API or similar methods.
- Through the filesystem accessing the path in which the platform is installed (suitable for Application Servers and Servlet containers). This would likely be modeled as an agent deployed on the platform host itself.

Configuration discovery could happen in different stages during the lifecycle of an application to avoid storing sensitive data:
Contributor:

How is the sensitive data different from creds? Can we store it encrypted, like creds?

rromannissen (author):

@jortel when this was discussed, some folks argued that storing sensitive data (such as database credentials coming from, for example, an application server) in yet another place like Konveyor could be considered a security threat; that's why this passthrough approach was suggested. I guess having the credentials to the platform that stores that sensitive data (the application server in our example) carries exactly the same threat level, so I'm not sure about this one myself; maybe it's something we could consider in subsequent iterations if requested.

If we were to store this encrypted, I'd consider an overarching entity like secrets, with credentials being one type of secret and sensitive data another.

- _Icon_
- _Generator Type_: Will only include Helm for the moment, but in the future we could include other types like Ansible or other templating engines. Generator type will determine the image that gets used to handle the generator task.
- _Description_
- Repository containing the template files:
Contributor:

Repository type: (Git|Svn)? What if users don't want to manage templates in a repository?

rromannissen (author):

@jortel I think that would bring too much unnecessary complexity for a first iteration. Let's stick to templates being managed in a repo and consider other options if requested. I'll add a field for repository type.

- _Root Path_
- _Branch_
- _Credentials_
- _Variables_: List of prefixed variables that will be injected on template instantiation (the `helm template` command, for example). Variables whose names match ones coming from the Configuration dictionary will override their values.
Contributor:

If variables will come from the discovered configuration and be found in the templates, what is the point of defining them here? The template instantiation could simply resolve any variables found in the templates, right?

rromannissen (author):

@jortel it's a way to enable users to override certain values that might have been found in the source config and enforce configuration values that might have changed between environments (for example, a domain that is different in the target k8s cluster). It's also inspired by Variables from Job Templates in Ansible AWX.

- _Variables_: List of prefixed variables that will be injected on template instantiation (the `helm template` command, for example). Variables whose names match ones coming from the Configuration dictionary will override their values.
- _Parameters_: List of parameters the user will be asked for when generating assets with this template. Similar to [Surveys](https://ansible.readthedocs.io/projects/awx/en/latest/userguide/job_templates.html#surveys) in Ansible AWX.

##### Archetypes, Target Platforms and Generators
Contributor:

Again, what if users are not using archetypes?
Should we support users selecting a generator in the generation wizard?

rromannissen (author):

@jortel again, I think we can assume that archetypes are a requirement for the moment, and consider other options if requested in the future.


##### Changes in the Application entity

There will be platform related fields in the Application entity. These fields should be considered optional, as applications can still be managed manually or via the CSV import without the need for awareness of the source platform.


Just something to note: we probably want to have these platform-specific fields be mutually exclusive to each other so the intermediate representation doesn't get in a weird state.

rromannissen (author):

@JonahSussman yeah, that's what the Platform Type field would be for.

Template instantiation should be considered the act of injecting values in a template to render the target assets (deployment descriptors, configuration files...). For the Helm use case in this first iteration, the process could be as follows:

- The hub generates a values.yaml file based on the intersection of the _Configuration_ dictionary for the target application, the fixed _Variables_ set in the Generator and the _Parameters_ the user might have provided when requesting the generation, in inverse order of preference (_Parameters_ have top preference over the others, then _Variables_ and finally the _Configuration_ dictionary). That file should also include values inferred from other information stored in the application profile such as tags.
- The values.yaml file is injected by the hub in a _Generator_ task pod that will execute the `helm template` command to render the assets.
savitharaghunathan (Member), Jan 17, 2025:

Do we need to validate the generated manifests/assets, or is it out of scope? I have seen this validation step as part of CI/CD automation and local dev validation.

rromannissen (author):

@savitharaghunathan what kind of validation were you thinking about?

savitharaghunathan (Member):

For Kubernetes, there are tools like https://github.com/kubernetes-sigs/kubectl-validate or https://github.com/yannh/kubeconform. For others, maybe validate the generated YAML using yamllint or something.
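For illustration, a minimal sketch of offline schema validation with kubeconform; the chart and values file names are assumptions, the flags are standard kubeconform options:

```bash
# Pipe rendered manifests through kubeconform for offline schema
# validation against a specific Kubernetes version.
helm template app ./chart --values values.yaml |
  kubeconform -strict -summary -kubernetes-version 1.29.0
```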

Contributor:

If helm is creating invalid YAML, I don't know what we or the user can do about it at this step in the process. They may have to fix it locally. I would just as soon assume that helm is generating valid YAML and that they have their own steps for making sure it is safe.

What I really don't want is a small mistake causing a long process to have to be re-run, or a bug in helm blocking a user. If we do add this, we should still allow users to download the files and fix them locally IMO.

rromannissen (author):

I agree with @shawn-hurley, not much we can do on our side.

jortel (Contributor) commented Jan 21, 2025:

Unless I missed it: how will the UI depict the progress and status of an application?

- Discovered on source platform (with link to task for progress and troubleshooting).
- Assets generated for a specific target (with link to task for progress and troubleshooting). Example: EAP/OpenShift.

I don't know how much detail is necessary, but it feels like it should be described.
