Skip to content

feat: add ability to export feature labels for static node #2170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vsoch
Copy link

@vsoch vsoch commented Jun 7, 2025

Problem: We want to use NFD (labels and other functionality) for HPC, but a Kubernetes environment is expected.
Solution: Add an nfd-worker --export mode that allows for generation and export of static labels.

For the HPC use case, we want to be able to export static features without Kubernetes, either to the screen (for inspection) or a file (for saving and use with a scheduler, or other compatibility checking algorithm). Adding support for this is fairly easy - we need to just expose the flag, disable the check for Kubernetes assets, and then generate the labels once and exit (without starting or sending to a server). The usage is simple and intuitive:

# Export labels to the terminal for inspection
nfd-worker --export

# Save to a file path
nfd-worker --export=labels.json

The flag is also useful for cases beyond HPC when a quick inspection is desired. Note that instead of adding two flags (a boolean for export, and string for a filepath) I opted for the design above - one flag that supports being used as a boolean or providing a path. A new flag type is added to support that, and it is generic so could be used for other future cases. For my implementation, for us to check how this new flag was set, we need to pass it to the initFlags function:

args, overrides := initFlags(flags, &exportFlag)

And I think this is OK for now, but if more flags were defined in this manner, we would want to pass a structure with all of them. I decided that was overkill for the time being. For another design choice, I think the generation of the labels should be in the same function to Run(), and to support this I check w.args.Export before setting up any servers / Prometheus, and run the generation (that will handle the export to terminal or file and exit early, if applicable). I have also added tests and documentation, and I tested on an HPC node (in user space) to make sure the labels generated without issue.

I realize I didn't open an issue for discussion - so am happy to have it here! To better explain the use case, this is a feature intended to better support compatibility checking, but for HPC. We won't be able to run compatibility checks assuming labels are available via Kubernetes, but we will be able to export them to files that represent nodes, and then be able to use associated tools to check compatibility artifacts against the known features. This is what I'm working on today. Thanks!

For the HPC use case, we want to be able to export
static features without Kubernetes.

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vsoch
Once this PR has been reviewed and has the lgtm label, please assign marquiz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 7, 2025
@k8s-ci-robot
Copy link
Contributor

Welcome @vsoch!

It looks like this is your first PR to kubernetes-sigs/node-feature-discovery 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/node-feature-discovery has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 7, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @vsoch. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 7, 2025
Copy link

netlify bot commented Jun 7, 2025

Deploy Preview for kubernetes-sigs-nfd ready!

Name Link
🔨 Latest commit 64520e3
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-nfd/deploys/684451f1aeca62000854267d
😎 Deploy Preview https://deploy-preview-2170--kubernetes-sigs-nfd.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@ChaoyiHuang
Copy link
Contributor

# Export labels to the terminal for inspection
nfd-worker --export

# Save to a file path
nfd-worker --export=labels.json

how about using stdout to unify the commandline, for example

# Export labels to the terminal for inspection
nfd-worker --export=stdout

# Save to a file path
nfd-worker --export=labels.json

@vsoch
Copy link
Author

vsoch commented Jun 18, 2025

@ChaoyiHuang thank you for taking a look! How would you distinguish these two?

# Export labels to the terminal for inspection
nfd-worker --export=stdout
nfd-worker --export

They are both going to dump to the output stream. Did you have in mind using different streams, or minimizing other output for the rest of the run for one case? Or are you saying to keep the command the same, but have stdout? I think the issue with that would be not knowing if a user wants to write to a file called stdout vs standard output. With just --export it is clear because the variable isn't defined.

@ChaoyiHuang
Copy link
Contributor

th just --export it is clear because the variable isn't defined.

keep the command the same format (stdout is standard output).

--export is also OK, just have to code a little more.

@ChaoyiHuang
Copy link
Contributor

it may just export to standard output, and no need to support export to path.

if want to save the content to file, it can be done via redirecting the output to file, for example:

nfd-worker --export > labels.json

so the code can be simplified to just support exporting to standard output

nfd-worker --export

@vsoch
Copy link
Author

vsoch commented Jun 19, 2025

The nfd worker generates a lot of logs (and other unexpected output) so I wasn't sure how to cleanly do that. Whereas with writing the file explicitly, I can control exactly the contents.

@@ -178,7 +181,8 @@ func NewNfdWorker(opts ...NfdWorkerOption) (NfdWorker, error) {
}

// k8sClient might've been set via opts by tests
if nfd.k8sClient == nil {
// Export mode does not assume in Kubernetes environment
if nfd.k8sClient == nil && !nfd.args.Export {
Copy link
Contributor

@mfranczy mfranczy Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's better to keep the worker k8s centric (as it implements the logic for updating k8s resources), operate on raw features rather than labels, and add a new subcommand to the nfd client for dumping features.

Raw features can easily be transformed into labels if needed. However, since labels are derived from raw features and the worker configuration, relying solely on labels may cause certain node capabilities to be missed due to default or custom configurations.

The same principle applies to the image compatibility implementation in NFD, where comparisons are performed by checking the features (not the labels).

Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants