About separation MSB and LSB and quantity input? #10

Open
ICsin opened this issue Dec 12, 2022 · 7 comments

ICsin commented Dec 12, 2022

After reading your paper, I have a few doubts and hope you can help.
First, what is the role of delta in the quantization performed by the quantSign function? (The comment says it quantizes the input into arbitrary bits, but the result obtained using delta is still a float.)
Then FastSign (which is based on quantSign) is used in PGBinaryConv2d to binarize the input.
At that point, have the high bits of the binarized input already been separated? And why is out_msb multiplied by two thirds?
Moreover, in the FracBNN computation I did not understand how the MSB and LSB are separated. Neither the shift operation described in the paper nor the sparse handling of the LSB is clearly expressed in the code. The end result seems to be just a choice between out_msb and out_full.

ychzhang (Contributor):

Let me try to split the questions as follows:

  1. What is the delta in QuantSign?
    Delta is the spacing between the quantization grid points. When b=2, we quantize the input range [-1, 1] to {-1, -1/3, 1/3, +1}; hence delta = 2/3 in this case.
  2. How are the bits separated in software?
    As mentioned in answer 1, there are four possible inputs {-1, -1/3, 1/3, +1}. On hardware, however, they are represented as the unsigned integers {00, 01, 10, 11}. We know that inputs = (MSB << 1) + LSB = MSB * 2 + LSB. We found that the mapping {bit_0 = -1/3, bit_1 = 1/3} between hardware and PyTorch satisfies this equation. Based on this, on the PyTorch side we can separate the input bits as MSB = 1/3 * sign(inputs) (see the sketch after this list).
  3. Why is out_msb multiplied by 2/3?
    out_msb actually computes conv(MSB << 1). Since MSB = 1/3 * sign(inputs), conv(MSB) << 1 = conv(1/3 * sign(inputs)) * 2 = 2/3 * conv(sign(inputs)).
  4. Why is the end result a choice between out_msb and out_full?
    PGConv works as follows: when out_msb > threshold, return out_msb + out_lsb, which is essentially out_full; otherwise, return out_msb. Therefore the return value is indeed a choice between out_msb and out_full.
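
Here is a minimal PyTorch sketch of points 2–4 (illustrative only; `split_bits` and the `threshold` value are placeholders, not the repository code):

```python
import torch
import torch.nn.functional as F

def split_bits(x_q):
    # x_q holds 2-bit activations in {-1, -1/3, 1/3, +1} (delta = 2/3).
    msb = (1.0 / 3.0) * torch.sign(x_q)   # MSB part, in {-1/3, +1/3}
    lsb = x_q - 2.0 * msb                 # LSB part, in {-1/3, +1/3}
    return msb, lsb

x_q = torch.tensor([[[-1.0, -1/3, 1/3, 1.0]]])   # toy (N, C, L) input
w   = torch.tensor([[[1.0, -1.0, 1.0]]])         # binary weights
msb, lsb = split_bits(x_q)
assert torch.allclose(2.0 * msb + lsb, x_q)      # inputs = MSB*2 + LSB

out_msb  = (2.0 / 3.0) * F.conv1d(torch.sign(x_q), w)   # = conv(MSB << 1)
out_lsb  = F.conv1d(lsb, w)
out_full = out_msb + out_lsb                             # = conv(x_q)
assert torch.allclose(out_full, F.conv1d(x_q, w))

threshold = 0.0                                          # placeholder gate
out = torch.where(out_msb > threshold, out_full, out_msb)
```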

Let me know if the above answers your question.

ICsin commented Dec 13, 2022

Thank you very much for your answer; I have a new question.
In the hardware design, the inputs {-1, -1/3, 1/3, 1} are represented as {00, 01, 10, 11}, so should the result computed by the hardware be multiplied by 1/3?

ychzhang (Contributor):

Yes, correct. See here:
https://github.com/cornell-zhang/FracBNN/blob/main/xcel-cifar10/source/pgconv.h#L71
msb_scale = 1/3 << 1, so essentially msb_conv is multiplied by 1/3 and then shifted left by one bit.
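
In case it helps, a tiny numeric check of this mapping (illustrative only, not the HLS source):

```python
# 2-bit code u in {0, 1, 2, 3} corresponds to the value (2*u - 3) / 3,
# i.e. {-1, -1/3, 1/3, 1}. Reconstructing it from the two ±1 bit
# convolutions only needs msb_scale = 2/3 (= 1/3 << 1) and lsb_scale = 1/3.
for u in range(4):
    msb, lsb = u >> 1, u & 1
    val = (2 * u - 3) / 3
    assert abs(val - ((2 / 3) * (2 * msb - 1) + (1 / 3) * (2 * lsb - 1))) < 1e-9
```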

ICsin commented Dec 14, 2022

Thanks for your prompt response. How did you deal with the slow training caused by thermometer encoding? Is it a configuration issue? My GPU is a 3070 Ti.

ychzhang (Contributor):

How much slowdown did you observe with/without thermometer encoding?
The released InputEncoder is fairly efficient in terms of GPU latency in our experiments. I can imagine two sources of slow training: (1) the expanded input channels due to encoding; (2) other quantization functions, which are unrelated to thermometer encoding. We currently do not have a good solution for either.
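
For reference, a rough sketch of how thermometer encoding expands the input channels (illustrative only; the function name, the `levels` parameter, and the exact threshold placement are assumptions, not the released InputEncoder):

```python
import torch

def thermometer_encode(x, levels=8):
    # x: float image in [0, 1], shape (N, C, H, W).
    # Returns a binary tensor of shape (N, C*levels, H, W); this channel
    # expansion is the main extra cost in the first layer.
    t = torch.arange(1, levels + 1, device=x.device).float() / (levels + 1)
    out = (x.unsqueeze(2) >= t.view(1, 1, levels, 1, 1)).float()
    return out.flatten(1, 2)

x = torch.rand(2, 3, 32, 32)
print(thermometer_encode(x, levels=8).shape)   # torch.Size([2, 24, 32, 32])
```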

ICsin commented Dec 15, 2022

Thanks for answering my doubts. Because I am using it on a 3D CNN, the slowdown is more noticeable: training is almost 70 times slower after adding thermometer encoding. Perhaps I can speed up training by increasing the R parameter to reduce the number of expanded channels.
Thank you very much for your recent help.

ICsin commented Dec 16, 2022

Sorry to bother you again; there is something I forgot to ask last time.
What is the role of FastSign in the software? (When computing the MSB, the input has to go through FastSign.)

[image: FracBnn]
Also, I want to implement FracBNN in Verilog. For the computation where {00, 01, 10, 11} is split into MSB and LSB, you said last time to multiply by 1/3 and then shift left by 1 bit, but the paper seems to compute the output directly. Would it work to simply multiply the output by 1/3 and pass it to the next layer?
