Specs: https://github.com/threefoldtech/zos/tree/master-3/specs/grid3
# Zosv3 node registration in TFGrid DB + required zos improvements
## Node registration
In v3, all nodes, users, ... will be registered in the TFGrid DB. This gives us a replicated database of all entities which somehow interact on the grid, and information on how they can reach each other on the planetary network (in the case of services). As a result, all nodes will need to register themselves in this db when they first boot. To do this, the following flow is needed (sketched below):

- register the node in the TFGrid DB, obtaining a twin id
- start the RMB with the previously loaded twin id
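A minimal Go sketch of this first-boot flow follows. All identifiers here (`TFGridDB`, `EnsureTwin`, `EnsureNode`, `startRMB`) are hypothetical placeholders, not the actual zos or TFGrid APIs; the sketch only illustrates the order of operations.

```go
// Hypothetical sketch only: none of these names are real zos or TFGrid APIs.
package registration

import (
	"context"
	"fmt"
)

// TFGridDB abstracts the replicated registry (hypothetical interface).
type TFGridDB interface {
	// EnsureTwin creates or looks up the twin for this identity; the twin
	// record holds the planetary network address others use to reach us.
	EnsureTwin(ctx context.Context, pubKey, planetaryIP string) (uint32, error)
	// EnsureNode creates or updates the node record under the given twin.
	EnsureNode(ctx context.Context, twinID, farmID uint32) (uint32, error)
}

// registerAndServe runs the first-boot flow: ensure a twin, ensure the node
// record, then start the RMB with the previously loaded twin id.
func registerAndServe(ctx context.Context, db TFGridDB, pubKey, planetaryIP string, farmID uint32) error {
	twinID, err := db.EnsureTwin(ctx, pubKey, planetaryIP)
	if err != nil {
		return fmt.Errorf("twin registration: %w", err)
	}
	if _, err := db.EnsureNode(ctx, twinID, farmID); err != nil {
		return fmt.Errorf("node registration: %w", err)
	}
	return startRMB(ctx, twinID)
}

// startRMB stands in for launching the message bus, so the node can accept
// reservations over the planetary network (placeholder implementation).
func startRMB(ctx context.Context, twinID uint32) error {
	fmt.Printf("rmb started for twin %d\n", twinID)
	return nil
}
```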
This flow prepares a new or existing node to accept reservations through the RMB, over the planetary network. For now, the activation service allows nodes to get funds to execute these calls on chain. At no point should the farmer have to set up anything manually (when it comes to the TFGrid DB; it is of course up to the farmer to boot the nodes with the correct parameters).

## Billing
Billing will happen through the generation of a usage report. Every x amount of time (interval to be defined), the node sends a report containing the actually consumed resources of a workload to the chain. Based on this, the chain is able to bill the user (by deducting his balance in favor of the address set in the farm of the node). A proposal for this has been submitted (TODO: include billing proposal after approval). By making the node actively send usage reports, it is also detectable when the node is down, and no tokens will be charged for this period.
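A sketch of what the reporting loop could look like. The report shape, the interval, and the chain client are all placeholders, since the interval is still to be defined and the billing proposal is not yet linked:

```go
// Hypothetical sketch of the periodic usage-report loop; all types and
// field names are placeholders pending the billing proposal.
package billing

import (
	"context"
	"time"
)

// UsageReport captures the resources a workload actually consumed during
// one reporting window (placeholder fields).
type UsageReport struct {
	WorkloadID string
	CRU, MRU   uint64 // consumed compute / memory units
	SRU, HRU   uint64 // consumed SSD / HDD storage units
	NRU        uint64 // consumed network units
}

// Chain abstracts the client used to submit reports (hypothetical).
type Chain interface {
	SubmitUsageReport(ctx context.Context, r UsageReport) error
}

// reportLoop periodically gathers consumption and pushes it to the chain.
// Because the node actively sends these, a missing report doubles as a
// downtime signal: the chain simply does not bill for silent periods.
func reportLoop(ctx context.Context, chain Chain, gather func() []UsageReport, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for _, r := range gather() {
				// A real implementation would retry or buffer failed
				// submissions; we drop the error here for brevity.
				_ = chain.SubmitUsageReport(ctx, r)
			}
		}
	}
}
```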
## Missing zos features

### Rate limiting
Right now there is no rate limiting, i.e. a single workload can consume all of the network and disk bandwidth. This should be rate limited to ensure proper operation, and to protect the hardware. The two cases are somewhat different:
Some workloads could hit the underlying disk hard, which causes degraded performance for other workloads. It might also be damaging for the disk itself; an example of this would be Chia mining, which is notorious for making short work of SSDs. Since all workloads which access disks will be run in a VM, we can utilise the disk rate limiting feature of cloud-hypervisor (see the sketch below). This should just be a sane constant for everyone.
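cloud-hypervisor models its disk rate limiter as token buckets for bandwidth and ops. The sketch below shows how zos could fill in such a config when building a VM; the JSON field names mirror cloud-hypervisor's API as we understand it, and both they and the concrete numbers should be treated as assumptions, not tested values.

```go
// Sketch: a fixed ("sane constant") disk rate limit applied to every VM
// disk via a cloud-hypervisor-style token-bucket rate limiter. Field names
// and numbers are assumptions for illustration.
package vm

// TokenBucket allows Size bytes (or ops) per RefillTime milliseconds.
type TokenBucket struct {
	Size         uint64 `json:"size"`
	OneTimeBurst uint64 `json:"one_time_burst,omitempty"`
	RefillTime   uint64 `json:"refill_time"`
}

// RateLimiter bundles the bandwidth and ops buckets.
type RateLimiter struct {
	Bandwidth *TokenBucket `json:"bandwidth,omitempty"`
	Ops       *TokenBucket `json:"ops,omitempty"`
}

// DiskConfig is the per-disk configuration passed to the hypervisor.
type DiskConfig struct {
	Path        string       `json:"path"`
	RateLimiter *RateLimiter `json:"rate_limiter_config,omitempty"`
}

// defaultDiskLimit caps every disk at ~100 MiB/s and 10k IOPS, refilled
// each second. The same constant applies to all workloads, protecting the
// disk (e.g. from Chia-style abuse) without per-user tuning.
func defaultDiskLimit(path string) DiskConfig {
	return DiskConfig{
		Path: path,
		RateLimiter: &RateLimiter{
			Bandwidth: &TokenBucket{Size: 100 << 20, RefillTime: 1000},
			Ops:       &TokenBucket{Size: 10000, RefillTime: 1000},
		},
	}
}
```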
Network is trickier, as the node does not know how much bandwidth is available in the upstream network. As such, a maximum value should probably not be set. Rather, we should ensure that every workload gets fair access to the network. This could be done by installing the cake qdisc on the network interface (for uploading; downloading will be slightly more convoluted and requires an intermediate buffer device). Since most traffic goes to `br-pub`, we can queue there, as we don't really care how the user utilizes the network inside of his own network resource (it only affects him); a sketch follows below. In the future, it is probably better to evolve to an eBPF based approach. This can also simplify the network setup we have right now.
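A sketch of attaching cake to `br-pub` by shelling out to `tc` (a real implementation would likely drive netlink directly). `unlimited` skips a bandwidth cap, since the node cannot know the upstream capacity, and `flows` asks cake for per-flow fairness; treat the exact keyword choice as a starting point, not a tested tuning.

```go
// Sketch: install the cake qdisc on br-pub for egress fairness. We shell
// out to `tc` here for clarity.
package network

import (
	"fmt"
	"os/exec"
)

// installCake attaches cake as the root qdisc on the given interface.
// `replace` is idempotent: it installs cake or swaps out whatever root
// qdisc is currently attached. `unlimited` avoids guessing the upstream
// bandwidth; `flows` enables per-flow fair queueing so one workload
// cannot starve the others.
func installCake(iface string) error {
	cmd := exec.Command("tc", "qdisc", "replace", "dev", iface, "root", "cake", "unlimited", "flows")
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("installing cake on %s: %w (%s)", iface, err, out)
	}
	return nil
}
```

Usage would simply be `installCake("br-pub")` during network setup, after the bridge is up.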