
v21.0.0 Release Todos #1722


Open
19 tasks
zmerp opened this issue Jul 3, 2023 · 11 comments
Labels
enhancement New feature or request release Relative to release branches or future releases

Comments

@zmerp
Member

zmerp commented Jul 3, 2023

To be done no earlier than just before releasing v21.0.0 stable, or after a devXX bump.

  • Remove ClientControlPacket::VideoErrorReport
  • Merge ControlSocket with StreamSocket
  • Refactor packet protocol
    • Make prefix data little-endian
    • Make the packet length include itself (add 4 bytes to the count; see the sketch after this list)
    • Remove the extra 4 bytes from the max shard length
  • Send combined eye gaze with separate field
  • Send eye poses already in local head space
  • Use a JSON string in place of the VideoStreamingCapabilities packet
  • Make multimodal input protocol default
  • Send velocity for skeletal hand tracking
  • Use mesh foveated rendering (deferred; needs protocol support to negotiate disabling FFE)
  • Add static controller offsets on the client, and make the parameter exposed by the dashboard default to [0,0,0]
  • Include the DecoderConfig packet in the video frame header, to avoid having to request another IDR and resend the DecoderConfig twice
  • Make all stream header packets extensible with a Vec<u8> (don't use JSON). Values can only be appended, never removed. Alternatively, investigate Cap'n Proto or FlatBuffers, which support protocol extensions natively.
  • Use mutually exclusive tracking sources (e.g. 2 eyes or combined gaze, fb or htc face tracking, fb or pico body tracking), and wrap it with Option.
  • Remove limited range encoding support
  • Change dashboard API to not require a body for requests. This aligns better with the ureq 3.0 API, and standard HTTP practices
  • Reorder settings: create a SteamVR tab and rename Headset to Tracking; for each tracking method, put source settings in Tracking and sink settings in SteamVR or Extra->Sinks (for example OSC/UDP/VMC)
  • Switch to bincode 2.0
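
A minimal sketch of the refactored length prefix described in the packet protocol items above, assuming the prefix is a plain length field (function names are illustrative, not ALVR's actual API):

```rust
// Sketch: little-endian length prefix whose value counts its own 4 bytes.
fn write_length_prefix(payload: &[u8], out: &mut Vec<u8>) {
    let total_len = (payload.len() + 4) as u32; // length includes itself
    out.extend_from_slice(&total_len.to_le_bytes()); // little-endian prefix
    out.extend_from_slice(payload);
}

fn read_payload_len(prefix: [u8; 4]) -> usize {
    // Subtract the prefix's own 4 bytes to recover the payload length.
    u32::from_le_bytes(prefix) as usize - 4
}
```
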
@zmerp zmerp added the release Relative to release branches or future releases label Jul 3, 2023
@stale stale bot added the stale label Aug 7, 2023
@alvr-org alvr-org deleted a comment from stale bot Aug 8, 2023
@stale stale bot removed the stale label Aug 8, 2023
@zmerp zmerp added the enhancement New feature or request label Aug 8, 2023
@shinyquagsire23 shinyquagsire23 pinned this issue Nov 2, 2024
@shinyquagsire23
Contributor

Adding some items of my own:

  • Headset and controller types default to Automatic instead of Quest 2, driven by the client
    • I think it'd be neat to allow a headset to send controller data and display a single tracked gamepad instead, maybe. Or at least differentiate between Joy-Cons and whatever SLAM controllers might exist on AVP. Also, maybe differentiate controllers vs. the Logitech stylus on Quest headsets.
  • Write-only SteamVR settings JSON, separate from session.json
  • (Tangential to mesh FFR) Allow padding at the edges of the video stream so that Weird Resolutions can be used?
  • Event signaling from OpenVR -> client, to allow passthrough to be driven by SteamVR
    • i.e. SteamVR sees a settings change, checks if passthrough was turned on, and tells the client to activate passthrough shaders
  • Extra video streams for alpha channel and/or depth or motion vectors? I'm less certain about this now; I think SteamVR composites all apps into a single layer.
  • Change headset FPS at runtime based on app performance?
    • Would need to handle the possibility of the client not being able to switch, due to passthrough constraints or whatever else.
  • Per-frame view transforms that don't cause SteamVR to hiccup, maybe?
    • Ocular parallax for instance can be simulated as a per-frame IPD change
    • Vision Pro also has 'comfort options' to introduce (I believe, not certain) a software-based vertical IPD (vertical offsets might also exist in other headsets as a display calibration thing, worth verifying)
  • Chaperone APIs for the client maybe

@zmerp
Member Author

zmerp commented Nov 2, 2024

@shinyquagsire23 Your points either don't need to be protocol breaks (they can be protocol extensions) or may take too long to be included in v21, assuming you haven't started working on them. For FFR, I'm thinking we should leave it out, because we need Android + Windows + Linux implementations, which will take a long time. The idea is to first switch both Windows and Linux to wgpu, like the client, so it will be far easier to do all at once.

@shinyquagsire23
Contributor

Yeah, I'm mostly just spitballing on things that could be marginally improved if a protocol break is already happening. E.g. headset info is only sent as a single string, and it gets kinda iffy to turn that back into usable information on the streamer. IMO it could split the info out a bit more to help futureproofing (e.g. a Quest 3 could be split as manufacturer=Meta, type=Quest, subtype=Quest 3, so if the Quest 4 comes out the streamer could match on Meta and Quest, and the subtype could have aesthetic fallbacks targeted at Quests). See the illustrative sketch below.
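
For illustration, the split could look something like this; the field names are hypothetical, not an existing ALVR type:

```rust
// Hypothetical structured replacement for the single display-name string.
struct HeadsetInfo {
    manufacturer: String, // e.g. "Meta"
    device_type: String,  // e.g. "Quest"
    subtype: String,      // e.g. "Quest 3"; an unknown subtype can fall back
                          // to aesthetics targeted at the matched device_type
}
```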

@zmerp
Member Author

zmerp commented Nov 2, 2024

@shinyquagsire23 I understand what you're saying, but we can still have a universal generic fallback. It's not a big deal; to avoid it, the user should have both client and server on the same version. So let's also keep the display name as a string: passing it as a Platform variable could fail deserialization. On the server side we should try to match the strings one by one.

Indeed we could make things clearer and rename display_name to platform.

@shinyquagsire23
Contributor

shinyquagsire23 commented May 9, 2025

Proposal: compression on tracking packets (zstd? It needs to be fast to compress and decompress).

  • JSON will compress down much smaller for any shenanigans similar to the v20 don't-break-the-protocol changes
  • Compression reduces network latency because the last byte arrives faster
  • Probably also good for serialized FP32; the upper bits are likely repetitive enough to save a few bytes
  • Should be optional per-packet via a header u8, in case compression makes the result larger. Could be something like 0=uncompressed, 1=zstd, 2=[future new compression]. Kinda iffy on supporting different compression formats, but it could be negotiated on connect (sketched below).
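
A minimal sketch of the per-packet tag byte, assuming the zstd crate's encode_all; the enum values and function names are illustrative:

```rust
// Sketch of the proposed per-packet compression tag. Values are illustrative
// and would be negotiated on connect; zstd::encode_all is from the zstd crate.
#[repr(u8)]
enum Compression {
    None = 0,
    Zstd = 1,
    // Future formats can be appended here.
}

fn encode_packet(payload: &[u8]) -> std::io::Result<Vec<u8>> {
    let compressed = zstd::encode_all(payload, 1)?;
    // Fall back to the raw payload when compression makes the result larger.
    let (tag, body) = if compressed.len() < payload.len() {
        (Compression::Zstd as u8, compressed.as_slice())
    } else {
        (Compression::None as u8, payload)
    };
    let mut packet = Vec::with_capacity(1 + body.len());
    packet.push(tag);
    packet.extend_from_slice(body);
    Ok(packet)
}
```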

@curoviyxru
Contributor

needs to be fast to compress and decompress

lz4?

@shinyquagsire23
Contributor

@curoviyxru Yeah, LZ4 might be a better match: the compression + decompression time just has to be smaller than the time it would take to send the size difference between the compressed and uncompressed bytes over the network. I'd probably want to benchmark both just to be sure (see the break-even sketch below).
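
A back-of-envelope version of that break-even condition; all names are hypothetical and the inputs would come from benchmarks:

```rust
// Compression pays off only when the codec time beats the wire time saved
// by sending fewer bytes. All inputs are measured per packet.
fn compression_pays_off(
    bytes_saved: usize, // uncompressed_len - compressed_len
    bandwidth_bytes_per_sec: f64,
    compress_plus_decompress_sec: f64,
) -> bool {
    let wire_time_saved = bytes_saved as f64 / bandwidth_bytes_per_sec;
    compress_plus_decompress_sec < wire_time_saved
}
```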

@zmerp
Member Author

zmerp commented May 10, 2025

For compressing the tracking packet I was thinking of some ad hoc algorithm. A Pose could be compressed from 28 bytes to 12 bytes (DeviceMotion would go from 52 to 24).
A Vec3 can be represented by 3 u16s. We can represent a 20x20x20 meter space with a precision of 0.3mm (this conversion could be configurable). The advantage over f16 is that we don't get degraded precision just by moving a couple of meters away from the center.
A Quat can be represented by a 3-component u16 vector using the Rodrigues formula. The size of the vector space would be 2pi x 2pi x 2pi. This has a precision of 0.0001 radians (or about 0.005 deg).
For the angular velocity we may want to stick to f16 instead.
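
A minimal sketch of this quantization scheme, assuming a 20 m cube centered on the origin and plain [f32; 3] vectors (function names are illustrative):

```rust
use std::f32::consts::PI;

const SPACE_SIZE: f32 = 20.0; // meters; configurable per the proposal

// Position: map each component from [-10, 10) m to a u16 (~0.3 mm per step).
fn quantize_position(v: [f32; 3]) -> [u16; 3] {
    v.map(|c| {
        let normalized = (c / SPACE_SIZE + 0.5).clamp(0.0, 1.0);
        (normalized * u16::MAX as f32).round() as u16
    })
}

fn dequantize_position(q: [u16; 3]) -> [f32; 3] {
    q.map(|c| (c as f32 / u16::MAX as f32 - 0.5) * SPACE_SIZE)
}

// Rotation: a quaternion converted to a rotation vector (axis * angle, the
// Rodrigues form) has components in [-pi, pi], so the same mapping over a
// 2pi range gives ~0.0001 rad per step.
fn quantize_rotation_vector(rv: [f32; 3]) -> [u16; 3] {
    rv.map(|c| {
        let normalized = (c / (2.0 * PI) + 0.5).clamp(0.0, 1.0);
        (normalized * u16::MAX as f32).round() as u16
    })
}
```

Position plus rotation is then six u16s, i.e. 12 bytes, matching the 28 -> 12 figure above.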

@shinyquagsire23 Personally, I don't like the idea of compressing the data with any generic compression method; it may end up significantly slower, or at least more complex than needed.

@shinyquagsire23
Contributor

@zmerp So, I've run tests on quantization, at least in the context of hand tracking: int16 is probably my minimum, but that was in view/camera space. And yeah, fp16 is garbage. I'd be nervous applying int16 to velocities and head poses in particular in world space; 0.3mm will quickly become noticeable at higher res, and I'm not sure there's a good way to mask the quantization with dithering, since velocity error would multiply.

To be honest, I have doubts that compressing pose packets will have good latency yields, but it would be good to be sure whether there are (lossless) benefits on the table. I also kinda worry that some things like eye tracking might need double-precision floats. But in any case I'll look into it, and only push for it if the numbers look compelling enough (minimum: 1ms or more of latency reduction).

@zmerp
Member Author

zmerp commented May 11, 2025

The main use case for this kind of compression is trackers and skeletons (body/hands). We can choose to exclude the head, controllers, and eyes. (Face tracking, on the other hand, could take more extreme compression, down to 1 byte per parameter.) I've heard of issues from some people when enabling hand tracking. This is because the hand skeleton tips the tracking structure size over the packet MTU, either turning it into a jumbo packet or triggering our packetization and reordering algorithm (I forgot how long ago the report was made). A rough illustration follows below.
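
As a back-of-envelope check on the MTU point, assuming OpenXR's 26 joints per hand and the 28-byte uncompressed Pose from the earlier comment (actual ALVR struct sizes may differ):

```rust
// Why hand skeletons can tip a tracking packet over the MTU: two hands of
// full-precision joint poses nearly fill a typical 1500-byte Ethernet MTU
// before head, controller, or eye data is even added.
const JOINTS_PER_HAND: usize = 26; // OpenXR XR_HAND_JOINT_COUNT_EXT
const POSE_BYTES: usize = 28; // uncompressed Pose size from the proposal
const BOTH_HANDS_BYTES: usize = 2 * JOINTS_PER_HAND * POSE_BYTES; // 1456

fn main() {
    println!("hand skeleton payload: {BOTH_HANDS_BYTES} bytes");
}
```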

@psych0oR

Would be nice if FOV-Tangent were added; it really is a big deal for me, at least.
