Skip to content

Clarification on command buffer specification #1309

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
joshqti opened this issue Feb 3, 2025 · 5 comments
Open

Clarification on command buffer specification #1309

joshqti opened this issue Feb 3, 2025 · 5 comments
Labels
cl_khr_command_buffer Relating to the command-buffer family of extension

Comments

@joshqti
Copy link
Contributor

joshqti commented Feb 3, 2025

  1. In section 5.17.7 on Mutable Commands it is stated: "To facilitate performant usage for pipelined work flows, where applications repeatedly call command-buffer update then enqueue, implementations may defer some of the work to allow clUpdateMutableCommandsKHR to return immediately". What is the expectation from the implementation. Is clUpdateMutableCommandsKHR expect to synchronously update the command buffer?

  2. clUpdateMutableCommandsKHR seems to be defined in a similar manner as clSetKernelArg. Is it not thread safe like clSetKernelArg? Clarification on how updates apply for simultaneous use required. If its not, we need to update Appendix A.

  3. Apps may beneifit from a call such as clEnqueueCommandBufferWithUpdates(mutations to command buffer), which would enqueue a mutable command buffer and take addition arguments passed to clUpdateMutableCommandsKHR. The beneifit is updates and dispatch would be atomic. Any thoughts on this proposal?

@EwanC EwanC added the cl_khr_command_buffer Relating to the command-buffer family of extension label Feb 5, 2025
@EwanC
Copy link
Contributor

EwanC commented Feb 5, 2025

1

The intention of the wording of 1) is that if an OpenCL implementation is built ontop of a driver API, then it could cache the user passed update configuration, and only propagate it to the driver API on enqueue. Equally the implementation could eagerly go straight to the driver API to propagate the update. Similarly for doing any recompilation eagerly or lazily, which is the motivation listed in the spec. Both eager and lazy ways of propagating the update should be valid implementations of the spec.

The API is synchronous, in that when it returns the only 2 mechanisms for the user to view the changes should reflect it A) clGetMutableCommandInfoKHR, and B) effect on command-buffer enqueue. For A) if an implementation wanted
to be lazy it could return it's cached values from the query.

Not sure if clSetKernelArg is analogous, in that an implementation could choose to propagate the argument updates straight away to the driver or cache them until required for a kernel enqueue. We don't have this extra wording around "may defer some work" anywhere else in the spec, so we could remove this wording from the command-buffer extension since speculating on possible implementations is probably confusing things.

2

I think it should be thread-safe. clUpdateMutableCommandsKHR doesn't take a cl_kernel object as part of it's configuration structs, and doesn't transitively modify the arguments of cl_kernels objects in commands created from it. The wording in Appendix A about exceptions to thread safe APIs is "except those that modify the state of cl_kernel objects", so I don't think the entry-point meets that criteria.

I can't recall the exact details of why clSetKernelArg isn't thread safe, so we can discuss this more if it would be advantageous to have that. Ticket #810 asks a similar question to this. Given more than one reader of the spec has to ask about this semantic we should probably add some explicit wording about it.

3

The clEnqueueCommandBufferWithUpdates proposal is tying together two concepts that aren't typically tied together, in other equivalent APIs I'm aware of update is a separate entry-points for this. e.g cudaGraphExecKernelNodeSetParams, zeCommandListUpdateMutableCommandsExp, vkUpdateDescriptorSets. My worry if we had only that entry-point is that it could lead to an unavoidable increase in submission latency on enqueue for users/implementations that want to update eagly ahead of enqueue. But we could have this entry-point as well if folks think it's useful and would be implemented differently from current update + enqueue, tangentially wonder if it would be worth considering a separate extension introducing a clEnqueueNDRangeKernelWithArgs entry-point if there are benefits to be had here.

@joshqti
Copy link
Contributor Author

joshqti commented Feb 10, 2025

  1. We are wondering if an implementation can defer updates, if CL_COMMAND_BUFFER_SIMULTENEOUS_USE is not supported?

  2. We are concened about this multi-threaded scenario: a multithreaded application calls clEnqueueCommandBuffer in thread A. Before thread A can set the state of the command to queued, thread B becomes active. In thread B the application calls clUpdateMutableCommandsKHR on the command buffer. In scenario it becomes difficult for the application to reason about when exactly the updates apply. Our thought was that declaring clUpdateMutableCommandsKHR to be thread unsafe would force the application to serialize update and subsequent use.

  3. This related to item 2, attaching the update to a queue makes it easy for the application to control when exactly the update will be applied. This is especially useful when multiple threads are using the same command buffer. We have followed this approach in our vendor extension, and it has been successful.

@aharon-abramson
Copy link
Contributor

We in Mobileye implement clUpdateMutableCommandsKHR as thread-safe, but since we don't support simultaneous executions, we don't synchronize the implementation of clEnqueueCommandBufferKHR to keep it lockless and optimized. So, we assume the users will take care of synchronizing updating the commands and enqueuing the command buffer.

EwanC added a commit to EwanC/OpenCL-Docs that referenced this issue Feb 13, 2025
Address point 1) of KhronosGroup#1309
by clarifying when an implementation may defer propogating
the changes from `clUpdateMutableCommandsKHR`.

I belive the only semantic implication of this is the removing the
restriction
> This is only possible with a command-buffer created with the CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR flag, as without this the enqueued command-buffer must complete before any modification occurs.

So that implementations without simultaneous use support can still
defer.
EwanC added a commit to EwanC/OpenCL-Docs that referenced this issue Feb 13, 2025
Address point 1) of KhronosGroup#1309
by clarifying when an implementation may defer propogating
the changes from `clUpdateMutableCommandsKHR`.

I belive the only semantic implication of this is the removing the
restriction
> This is only possible with a command-buffer created with the CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR flag, as without this the enqueued command-buffer must complete before any modification occurs.

So that implementations without simultaneous use support can still
defer.
@EwanC
Copy link
Contributor

EwanC commented Feb 19, 2025

PR that covers point 1 only here for review #1315

bashbaug pushed a commit that referenced this issue Apr 1, 2025
* Clarify impl deferal of clUpdateMutableCommandsKHR

Address point 1) of #1309
by clarifying when an implementation may defer propogating
the changes from `clUpdateMutableCommandsKHR`.

I belive the only semantic implication of this is the removing the
restriction
> This is only possible with a command-buffer created with the CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR flag, as without this the enqueued command-buffer must complete before any modification occurs.

So that implementations without simultaneous use support can still
defer.

* Make change a NOTE
@EwanC
Copy link
Contributor

EwanC commented Apr 15, 2025

To summarize what I think the status of the 3 discussion points of this issue are:

  1. Clarification merged in Clarify implementation deferal of clUpdateMutableCommandsKHR #1315
  2. Clarification merged in document command buffer APIs that are not thread-safe #1345
  3. To be proposed as a separate vendor extension (Based on March 18th teleconference minutes)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cl_khr_command_buffer Relating to the command-buffer family of extension
Projects
Status: Needs WG discussion
Development

No branches or pull requests

3 participants