consider shifts instead of division #8

nerdralph · 2020-02-29T12:48:26Z

Division on AVRs is very expensive in both time and space. Signed long division will increase code size by about 150 bytes.

Even better would be to do the averaging by summation only. Do a fixed 64 loops, adding the ADC value to a uint16_t. Since the AVR ADC is 10 bits, 64 loops will give a final result that fits in 16 bits without overflow.

martin2250 · 2020-02-29T16:13:33Z

Hi Ralph,

you're right, using long division for such a simple task is overkill. That said, shouldn't the compiler optimize the division if the denominator is hard-coded (may be wrong, I never tested this)? Maybe we should encourage people to use powers of two for the number of samples preferably. I don't think fixing the number of samples to 64 is a good idea, we should give users the choice to trade speed for noise.

A good compromise could be adding a function that just take a single sample, that would allow advanced users to write their own loop and be clever by using shifts and such.

Martin

nerdralph · 2020-02-29T16:32:01Z

gcc will usually optimize division by a power of two into right shifts. For division by the default of 200, there's not much that can be optimized. If you want to offer the choice between quick (i.e. 1 sample) and lower noise from multiple samples, then limit the samples to 64. That's more than enough to average out the noise, and it will still fit in a uint_16. Using a long to accumulate samples increases the code size, and provides no tangible benefit over a limit of 64 samples.

…

On Sat, Feb 29, 2020 at 12:13 PM Martin Pittermann ***@***.***> wrote: Hi Ralph, you're right, using long division for such a simple task is overkill. That said, shouldn't the compiler optimize the division if the denominator is hard-coded (may be wrong, I never tested this)? Maybe we should encourage people to use powers of two for the number of samples preferably. I don't think fixing the number of samples to 64 is a good idea, we should give users the choice to trade speed for noise. A good compromise could be adding a function that just take a single sample, that would allow advanced users to be clever and use shifts and such. Martin — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#8?email_source=notifications&email_token=ABKNZ6ST4AF2GTWJY6RF2GTRFEZ25A5CNFSM4K6WKU32YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENL55EA#issuecomment-592961168>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABKNZ6QDEIZ7CXOV7W7S3FLRFEZ25ANCNFSM4K6WKU3Q> .

jerryfaust · 2021-05-28T02:11:04Z

Hello all.

I'm trying something different that seems to work fairly well, and I would be interested in any feedback.

I am running a 4-axis stepper motor system, and I am doing a pretty lot during each arduino loop() call. Trying out this library, calling the read() method on every loop was too costly, bogging down the axis movement. When reading this conversation and considering optimizations, it occurred to me that I could call the read method ONCE during setup(), taking the 64 readings and placing them into a rolling array. Then on each call to loop(), I have implemented a readNext() method, that just takes one reading, places it into the next position in the rolling array, and gets a new average based on that. This significantly reduced the time required during each iteration, and I think produces similar results to the original algorithm. My loop iterates over 2000 times per second when it's busy, and so the average readings are refreshed many times over. Here is (most of) the new code.

constexpr auto SAMPLES = 16;
int _values[SAMPLES];
int _currentSlot = 0;
int _lastValue = 0;
int _sum = 0;

// Call once during setup()
int ADCTouchClass::read(byte ADCChannel)
{
	//long _value = 0;
	short _value = 0; // , _sum = 0;
	for (int _counter = 0; _counter < SAMPLES; _counter++)
	{
		// set the analog pin as an input pin with a pullup resistor
		// this will start charging the capacitive element attached to that pin
		pinMode(ADCChannel, INPUT_PULLUP);

		// connect the ADC input and the internal sample and hold capacitor to ground to discharge it
		ADMUX |= 0b11111;
		// start the conversion
		ADCSRA |= (1 << ADSC);

		// ADSC is cleared when the conversion finishes
		while ((ADCSRA & (1 << ADSC)));

		pinMode(ADCChannel, INPUT);

		// get value
		_value = analogRead(ADCChannel);
		// save in rolling array
		_values[_counter] = _value;
		// add to running total
		_sum += _value;
	}
	// get average
	_lastValue = _sum / SAMPLES;
	// return 
	return _lastValue;
}

// call once at top of every loop()
int ADCTouchClass::readNext(byte ADCChannel)
{
	short _value = 0; // , _sum = 0;
	// set the analog pin as an input pin with a pullup resistor
	// this will start charging the capacitive element attached to that pin
	pinMode(ADCChannel, INPUT_PULLUP);

	// connect the ADC input and the internal sample and hold capacitor to ground to discharge it
	ADMUX |= 0b11111;
	// start the conversion
	ADCSRA |= (1 << ADSC);

	// ADSC is cleared when the conversion finishes
	while ((ADCSRA & (1 << ADSC)));

	pinMode(ADCChannel, INPUT);

	// get current value
	_value = analogRead(ADCChannel);
	// adjust sum
	_sum = _sum - _values[_currentSlot] + _value;
	// place in rolling array
	_values[_currentSlot++] = _value;
	// determine next slot to fill
	_currentSlot = _currentSlot % SAMPLES;
	// now get new sum
	//for (int i = 0; i < SAMPLES; i++)
	//{
	//	_sum += _values[i];
	//}
	// get average
	_lastValue = _sum / SAMPLES; // (_sum >> 4); // 
	// return 
	return _lastValue;
}

int ADCTouchClass::lastValue()
{
	return _lastValue;
}

Additional comments:

I have stripped out the ifdef's for the Tiny processor (mostly for readability here).
I have reduced the number of readings to 16 and still get what I think are reasonable results.
I store the most recent average in _lastValue so that within the remainder of my loop, I can query the value without having to do another reading.
I tried using shift-right rather than division (as suggested in this conversation), but did not notice any real difference in performance; perhaps because of compiler optimization, staying with powers of 2.
And although I don't think it makes much difference at 16 samples, rather than re-totaling the sum on every iteration, I simply subtract the previous value in that array element and add the new value, maintaining _sum without having to add them all up again.
I realize that I could factor out the common code of the two methods...

Any thoughts?

Regards,
Jerry

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

consider shifts instead of division #8

consider shifts instead of division #8

nerdralph commented Feb 29, 2020

martin2250 commented Feb 29, 2020 •

edited

Loading

Uh oh!

nerdralph commented Feb 29, 2020 via email

Uh oh!

jerryfaust commented May 28, 2021 •

edited

Loading

Uh oh!

consider shifts instead of division #8

consider shifts instead of division #8

Comments

nerdralph commented Feb 29, 2020

martin2250 commented Feb 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nerdralph commented Feb 29, 2020 via email

Uh oh!

jerryfaust commented May 28, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

martin2250 commented Feb 29, 2020 •

edited

Loading

jerryfaust commented May 28, 2021 •

edited

Loading