Well, the Arduino Nano's performance was atrocious, but I blame that mostly on the Arduino framework itself. After a little research, it seems that reading an analog input with the framework busy-waits until the conversion finishes. A single read takes roughly 100 µs, so most of the loop time is spent waiting for the ADC to complete its conversion. That's what I will target to make it more efficient.
In my first version of the program, I read the solenoid inputs as four analog inputs and four digital inputs. I could just as easily have used two analog inputs and six digital inputs, halving the number of analog reads. The second program, ArduinoBareOPP.ino, uses the smaller number of analog inputs to minimize those reads.
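To put numbers on the "roughly 100 µs" figure, here is a back-of-the-envelope estimate. It assumes the Nano's 16 MHz clock, the ADC prescaler of 128 that the Arduino core configures at startup, and the ATmega328P's 13 ADC clock cycles per normal conversion (the very first conversion after enabling the ADC takes longer and is ignored here). It is an estimate, not a measurement.

```latex
\begin{aligned}
t_{\text{conv}} &= \frac{13 \times 128}{16\ \text{MHz}} = 104\ \mu\text{s} \\
4 \times t_{\text{conv}} &\approx 416\ \mu\text{s} \quad \text{(four analog reads per loop)} \\
2 \times t_{\text{conv}} &\approx 208\ \mu\text{s} \quad \text{(two analog reads per loop)}
\end{aligned}
```

If those assumptions hold, four blocking reads would account for most of the 488 µs loop time measured later, and switching to two analog inputs alone would save roughly 200 µs per pass.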
Since the Arduino framework forces me to busy-wait when reading the analog inputs, it is time to bypass it and add an interrupt. The ADC interrupt gets called when a conversion completes. At that point, I set the mux to the next channel and kick off that conversion, so the conversions happen in the background and nothing busy-waits for them to finish. In the interrupt, I also fill currData, which holds the current value of the inputs. This is also where each ADC sample is converted to a single digital bit by comparing it against a threshold.
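The interrupt logic described above can be sketched as follows. This is a minimal illustration, not the code from ArduinoBareOPP.ino: so that it compiles and runs on a PC, the registers are replaced by hypothetical stand-ins (sim_ADMUX, sim_ADCSRA), and adc_isr, adcChannel, NUM_ANALOG_INPUTS and ADC_THRESHOLD are names and values I made up for the example. Only currData comes from the original description.

```c
#include <stdint.h>

/* Hypothetical PC stand-ins for the ATmega328P ADC registers. On the Nano,
   <avr/io.h> supplies ADMUX, ADCSRA and ADSC, the sample is read from ADC,
   and adc_isr() would be the body of ISR(ADC_vect). */
static volatile uint8_t sim_ADMUX = 0;
static volatile uint8_t sim_ADCSRA = 0;
#define SIM_ADSC 6              /* ADSC ("start conversion") is bit 6 of ADCSRA */

#define NUM_ANALOG_INPUTS 2     /* hypothetical: two analog inputs on channels 0 and 1 */
#define ADC_THRESHOLD 512       /* hypothetical: midpoint of the 10-bit ADC range */

static volatile uint8_t currData = 0;    /* bit n holds the state of analog input n */
static volatile uint8_t adcChannel = 0;  /* channel whose conversion just finished */

/* Called once per completed conversion with the 10-bit result. */
static void adc_isr(uint16_t sample)
{
    uint8_t mask = (uint8_t)(1u << adcChannel);

    /* Threshold the sample down to a single digital bit. */
    if (sample >= ADC_THRESHOLD)
        currData |= mask;
    else
        currData &= (uint8_t)~mask;

    /* Select the next channel; MUX3..MUX0 are the low four bits of ADMUX,
       so the reference-select and ADLAR bits above them are preserved. */
    adcChannel = (uint8_t)((adcChannel + 1u) % NUM_ANALOG_INPUTS);
    sim_ADMUX = (uint8_t)((sim_ADMUX & 0xF0) | adcChannel);

    /* Start the next conversion and return; nothing busy-waits. */
    sim_ADCSRA |= (uint8_t)(1u << SIM_ADSC);
}
```

Calling adc_isr(900) and then adc_isr(100) leaves bit 0 of currData set, bit 1 clear, and the mux pointing back at channel 0. On real hardware, the very first conversion still has to be started once (for example from setup(), with the ADC interrupt enabled via ADIE in ADCSRA); after that the interrupt keeps the round robin going by itself.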
Lastly, I've heard that digitalRead does a lot of extra work on every call, such as looking up the port and bit for the pin number and checking whether PWM needs to be turned off on that pin. Why not just read the ATmega328P registers directly and get all the bits at once? The code at the beginning of loop() does the bit manipulation needed to make this happen with a single read each of the port C and port B input registers (PINC and PINB). I haven't converted the outputs to direct register writes, but they change so much less frequently that it won't make much difference.
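Here is one way that bit manipulation could look. The pin assignment is a hypothetical one I chose for illustration (two analog inputs on A0/A1, four digital inputs on A2 to A5, which are PC2 to PC5, and two on D8/D9, which are PB0/PB1); the real mapping lives in ArduinoBareOPP.ino. The function name pack_inputs is also made up. Taking the register values as parameters keeps it runnable on a PC.

```c
#include <stdint.h>

/* Hypothetical layout of the packed input byte:
     bits 0-1 : analog inputs A0, A1, already thresholded by the ADC interrupt
     bits 2-5 : digital inputs on A2-A5 (PC2-PC5)
     bits 6-7 : digital inputs on D8, D9 (PB0-PB1)
   Inputs are treated as active-high; invert with ~ if they use pull-ups. */
static uint8_t pack_inputs(uint8_t pinc, uint8_t pinb, uint8_t analogBits)
{
    uint8_t data = (uint8_t)(analogBits & 0x03);   /* bits 0-1 from the ADC ISR */
    data |= (uint8_t)(pinc & 0x3C);                /* PC2-PC5 already sit in bits 2-5 */
    data |= (uint8_t)((pinb & 0x03) << 6);         /* PB0-PB1 shifted up to bits 6-7 */
    return data;
}
```

On the Nano this would be called as pack_inputs(PINC, PINB, currData), which is two register reads and a few AND/OR/shift instructions instead of six digitalRead calls. Choosing the digital pins so that most of them already line up with their target bit positions (as PC2 to PC5 do here) keeps the shifting to a minimum.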
Immediately after flashing the code, I noticed that the LED I use to time the loop flickers much less. (Before, I could see a very faint flicker in the LED; now it looks continuously, dimly lit.) That is a good sign, but the real test is putting it on the logic analyzer. The main loop now takes between 23.5 µs and 29 µs, a significant improvement over the original 488 µs (roughly 1/17 to 1/21 of the time).
The LED pulses normally last 4 µs, but every once in a while one lasts 11 µs. This probably means the ADC interrupt fired during that pulse and takes about 6.5 µs to 7 µs to run. That is in the same ballpark as the 5.5 µs spread between the fastest and slowest loop times (23.5 µs vs. 29 µs). The 4 µs baseline also suggests that I should take the time to stop using digitalWrite and write directly to the register, since that pulse width is presumably dominated by digitalWrite's own overhead.
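A direct register write for the timing LED could look like this. It assumes the timing LED is the Nano's on-board LED on D13, which is PB5; the actual pin may differ. sim_PORTB, LED_BIT, led_on and led_off are hypothetical names, and sim_PORTB stands in for the real PORTB register so the sketch runs on a PC.

```c
#include <stdint.h>

/* Stand-in for the PORTB output register. On the Nano, <avr/io.h> provides
   PORTB directly and the bodies become PORTB |= _BV(PORTB5) and
   PORTB &= ~_BV(PORTB5). */
static volatile uint8_t sim_PORTB = 0;
#define LED_BIT 5   /* assumption: timing LED on D13, which is PB5 */

/* Set or clear only the LED bit, leaving the other port B pins alone. */
static void led_on(void)  { sim_PORTB |= (uint8_t)(1u << LED_BIT); }
static void led_off(void) { sim_PORTB &= (uint8_t)~(1u << LED_BIT); }
```

On the AVR itself, avr-gcc typically compiles each of those PORTB statements to a single sbi or cbi instruction, which takes 2 cycles (about 125 ns at 16 MHz), so the timing pulse should shrink from microseconds to a fraction of one and disturb the loop measurement far less.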
Here is a link to the bare metal version of the code.