Skip to content

consider shifts instead of division #8

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
nerdralph opened this issue Feb 29, 2020 · 3 comments
Open

consider shifts instead of division #8

nerdralph opened this issue Feb 29, 2020 · 3 comments

Comments

@nerdralph
Copy link

Division on AVRs is very expensive in both time and space. Signed long division will increase code size by about 150 bytes.

Even better would be to do the averaging by summation only. Do a fixed 64 loops, adding the ADC value to a uint16_t. Since the AVR ADC is 10 bits, 64 loops will give a final result that fits in 16 bits without overflow.

@martin2250
Copy link
Owner

martin2250 commented Feb 29, 2020

Hi Ralph,

you're right, using long division for such a simple task is overkill. That said, shouldn't the compiler optimize the division if the denominator is hard-coded (may be wrong, I never tested this)? Maybe we should encourage people to use powers of two for the number of samples preferably. I don't think fixing the number of samples to 64 is a good idea, we should give users the choice to trade speed for noise.

A good compromise could be adding a function that just take a single sample, that would allow advanced users to write their own loop and be clever by using shifts and such.

Martin

@nerdralph
Copy link
Author

nerdralph commented Feb 29, 2020 via email

@jerryfaust
Copy link

jerryfaust commented May 28, 2021

Hello all.

I'm trying something different that seems to work fairly well, and I would be interested in any feedback.

I am running a 4-axis stepper motor system, and I am doing a pretty lot during each arduino loop() call. Trying out this library, calling the read() method on every loop was too costly, bogging down the axis movement. When reading this conversation and considering optimizations, it occurred to me that I could call the read method ONCE during setup(), taking the 64 readings and placing them into a rolling array. Then on each call to loop(), I have implemented a readNext() method, that just takes one reading, places it into the next position in the rolling array, and gets a new average based on that. This significantly reduced the time required during each iteration, and I think produces similar results to the original algorithm. My loop iterates over 2000 times per second when it's busy, and so the average readings are refreshed many times over. Here is (most of) the new code.

constexpr auto SAMPLES = 16;
int _values[SAMPLES];
int _currentSlot = 0;
int _lastValue = 0;
int _sum = 0;

// Call once during setup()
int ADCTouchClass::read(byte ADCChannel)
{
	//long _value = 0;
	short _value = 0; // , _sum = 0;
	for (int _counter = 0; _counter < SAMPLES; _counter++)
	{
		// set the analog pin as an input pin with a pullup resistor
		// this will start charging the capacitive element attached to that pin
		pinMode(ADCChannel, INPUT_PULLUP);

		// connect the ADC input and the internal sample and hold capacitor to ground to discharge it
		ADMUX |= 0b11111;
		// start the conversion
		ADCSRA |= (1 << ADSC);

		// ADSC is cleared when the conversion finishes
		while ((ADCSRA & (1 << ADSC)));

		pinMode(ADCChannel, INPUT);

		// get value
		_value = analogRead(ADCChannel);
		// save in rolling array
		_values[_counter] = _value;
		// add to running total
		_sum += _value;
	}
	// get average
	_lastValue = _sum / SAMPLES;
	// return 
	return _lastValue;
}

// call once at top of every loop()
int ADCTouchClass::readNext(byte ADCChannel)
{
	short _value = 0; // , _sum = 0;
	// set the analog pin as an input pin with a pullup resistor
	// this will start charging the capacitive element attached to that pin
	pinMode(ADCChannel, INPUT_PULLUP);

	// connect the ADC input and the internal sample and hold capacitor to ground to discharge it
	ADMUX |= 0b11111;
	// start the conversion
	ADCSRA |= (1 << ADSC);

	// ADSC is cleared when the conversion finishes
	while ((ADCSRA & (1 << ADSC)));

	pinMode(ADCChannel, INPUT);

	// get current value
	_value = analogRead(ADCChannel);
	// adjust sum
	_sum = _sum - _values[_currentSlot] + _value;
	// place in rolling array
	_values[_currentSlot++] = _value;
	// determine next slot to fill
	_currentSlot = _currentSlot % SAMPLES;
	// now get new sum
	//for (int i = 0; i < SAMPLES; i++)
	//{
	//	_sum += _values[i];
	//}
	// get average
	_lastValue = _sum / SAMPLES; // (_sum >> 4); // 
	// return 
	return _lastValue;
}

int ADCTouchClass::lastValue()
{
	return _lastValue;
}

Additional comments:

  1. I have stripped out the ifdef's for the Tiny processor (mostly for readability here).
  2. I have reduced the number of readings to 16 and still get what I think are reasonable results.
  3. I store the most recent average in _lastValue so that within the remainder of my loop, I can query the value without having to do another reading.
  4. I tried using shift-right rather than division (as suggested in this conversation), but did not notice any real difference in performance; perhaps because of compiler optimization, staying with powers of 2.
  5. And although I don't think it makes much difference at 16 samples, rather than re-totaling the sum on every iteration, I simply subtract the previous value in that array element and add the new value, maintaining _sum without having to add them all up again.
  6. I realize that I could factor out the common code of the two methods...

Any thoughts?

Regards,
Jerry

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants