Monthly Archives: September 2015

9/25/2015, Neopixel Eye Candy

Looks like this weekend is the time to try to hook up the Neopixels.  I ended up making a bunch of small changes to get things working as efficiently as possible.

So I got a little overzealous in the belief that I had infinite processing power.  The SPI bus is running at 2.4 MHz, so each bit is 417 ns, and a byte takes 8 * 417 ns = 3.33 us.  Hmm, that’s pretty darn fast come to think of it.  At the default 24 MHz clock, 3.33 us is only 80 clock cycles.  Assuming an average of 1.5 clocks per assembly instruction (it is a RISC based processor, so most instructions take 1 cycle, but loads, stores, and pushes onto the stack take more cycles, and branches are particularly long since the pipeline may need to be flushed), the processor has about 53 instructions per byte.  In a tight loop the processor was not able to keep the FIFO filled.

The processor is only running at 24 MHz (the default), so let’s kick that up to 48 MHz, the fastest the processor can run.  At that speed the processor keeps the FIFO full without an issue.  The dream of determining each pixel’s color on the fly, while the data is being streamed out, is probably not possible without hand coding assembly.  It is much simpler to allocate the required memory at initialization, and update the RAM on the periodic timer.
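As a quick sanity check of the arithmetic above, here is a small C sketch of the per-byte budget.  The 1.5 clocks-per-instruction figure is the same rough assumption used in the text, and the function names are illustrative, not from any Cypress library.

```c
/* Rough CPU budget for feeding an SPI shift register one byte at a time.
 * Assumption: an average of 1.5 CPU clocks per instruction. */
#define SPI_HZ        2400000.0
#define CLKS_PER_INSN 1.5

/* Time to shift out one byte (8 SPI bits), in microseconds: 8 * 416.67 ns. */
double byte_time_us(void)
{
    return 8.0 * 1e6 / SPI_HZ;
}

/* CPU clocks that elapse while one byte is shifted out. */
double cycles_per_byte(double cpu_hz)
{
    return byte_time_us() * cpu_hz / 1e6;
}

/* Approximate number of instructions available per byte. */
double insns_per_byte(double cpu_hz)
{
    return cycles_per_byte(cpu_hz) / CLKS_PER_INSN;
}
```

With these definitions, cycles_per_byte(24e6) comes out to 80 and insns_per_byte(24e6) to about 53; at 48 MHz both double, to 160 clocks and about 107 instructions.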

I kept thinking about what else could be done.  The TX FIFO for the SCB can be changed to 16 bits wide instead of 8 bits wide.  This is somewhat annoying since each pixel takes 9 bytes of data, an odd number, so every other pixel must start in the middle of a 16-bit FIFO word.  The last optimization is using the TX FIFO level to kick off the processing.  If the level is set to 4, there are five 16-bit words (4 TX FIFO slots, plus the shift register itself) still waiting to go out when the interrupt fires.  At 16 * 417 ns = 6.67 us per word, that is about 33.33 us of interrupt latency that can be tolerated without underflowing the FIFO and corrupting the update of the Neopixel chain.
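Here is a minimal sketch of packing the 9-byte-per-pixel stream into 16-bit FIFO words, plus the latency figure that follows from the FIFO level.  It assumes the SPI block shifts out the upper byte of each 16-bit word first (check the SCB bit-order setting on the real part), and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SPI_HZ 2400000.0

/* Pack a byte stream into 16-bit FIFO words, first byte in the upper half
 * (assumes MSB-first shifting). An odd trailing byte is padded with zeros.
 * Returns the number of words written. */
size_t pack_fifo_words(const uint8_t *bytes, size_t nbytes, uint16_t *words)
{
    size_t nwords = 0;
    for (size_t i = 0; i < nbytes; i += 2) {
        uint16_t hi = bytes[i];
        uint16_t lo = (i + 1 < nbytes) ? bytes[i + 1] : 0;
        words[nwords++] = (uint16_t)((hi << 8) | lo);
    }
    return nwords;
}

/* Interrupt latency that can be tolerated when the FIFO-level interrupt
 * fires with `level` words still queued, plus the word in the shift register. */
double tolerated_latency_us(unsigned level, unsigned bits_per_word)
{
    return (level + 1) * bits_per_word * 1e6 / SPI_HZ;
}
```

Because 9 is odd, pixel 1 starts at byte 9, which lands in the low half of word 4; that is the boundary crossing mentioned above.  tolerated_latency_us(4, 16) gives the 33.33 us figure.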

So for 64 Neopixels, the RAM required is (64 * 9) bytes for holding the data sent on the SPI bus, plus 64 bytes holding the current command for each Neopixel, for a total of 640 bytes.  The processor has 8K, so using only half the RAM for this allows about 400 Neopixels per board (4096 bytes / 10 bytes per pixel is about 409).  Of course, there is no reason that you couldn’t put multiple boards into the machine to support more, but well, that seems ludicrous.
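The buffers above could be declared statically along these lines.  The sketch also shows where the 9 bytes per pixel come from, assuming the common SPI trick of sending a Neopixel 1 as the SPI bits 110 and a 0 as 100 (at 2.4 MHz that gives roughly 833/417 ns and 417/833 ns high/low pulses), and assuming the GRB byte order used by WS2812-style pixels.  All names are illustrative.

```c
#include <stdint.h>

#define NUM_PIXELS    64
#define SPI_BYTES_PIX 9                     /* 24 color bits * 3 SPI bits = 72 bits */

/* Allocated once at initialization: 576 + 64 = 640 bytes. */
static uint8_t spi_buf[NUM_PIXELS * SPI_BYTES_PIX];
static uint8_t pixel_cmd[NUM_PIXELS];       /* one command byte per pixel */

/* Expand one pixel into 9 SPI bytes: 1 -> 110, 0 -> 100, MSB first (assumed). */
void encode_pixel(uint8_t *out, uint8_t g, uint8_t r, uint8_t b)
{
    uint32_t color = ((uint32_t)g << 16) | ((uint32_t)r << 8) | b;  /* GRB */
    uint32_t acc = 0;                       /* pending SPI bits, newest in the LSBs */
    int nbits = 0;
    int pos = 0;

    for (int i = 23; i >= 0; i--) {
        acc = (acc << 3) | (((color >> i) & 1u) ? 0x6u : 0x4u);
        nbits += 3;
        while (nbits >= 8) {                /* emit each full byte */
            out[pos++] = (uint8_t)(acc >> (nbits - 8));
            nbits -= 8;
        }
    }
}

/* Re-encode every pixel into the SPI buffer, e.g. from the periodic timer. */
void encode_frame(const uint8_t grb[][3])
{
    for (int p = 0; p < NUM_PIXELS; p++)
        encode_pixel(&spi_buf[p * SPI_BYTES_PIX], grb[p][0], grb[p][1], grb[p][2]);
}
```

Encoding an all-zero pixel produces the repeating bytes 0x92 0x49 0x24, and a full-scale channel produces 0xDB 0x6D 0xB6.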

Here is a quick picture of the demo system setup:

Neopixel Demo

Quick video of the Neopixels working:

There is something weird going on with the first Neopixel.  I might have gotten it a little too hot when I soldered the wires to the strip.  (I didn’t “tin” the pads before soldering on the leads, which meant I had to use a lot more heat.)  It could also be a bug in the program, and I should probably send a blank 16-bit word before sending the first bit of data.  At this point, it is good enough to hand off.

I’m getting pretty familiar with the processor at this point, so it is getting faster for me to add new features.  I really have to work on the serial commands needed to interface with the next generation of the boards.

9/17/2015, Neopixels, Get Off My Back

So multiple people have asked me about Neopixels and whether I support them.  I was so busy looking at other stuff that I simply did not have the bandwidth to look into Neopixels.  As the work on SS3 was winding down this summer and I started to look into the next generation boards, I thought that I should do some research to make sure that I didn’t preclude their use.  In July I did read about them, and even included a wing board to make interfacing with them as simple as possible.

I was originally hesitant about Neopixels because they add one more layer of difficulty to programming a pinball machine.  Right now, the OPP hardware only supports turning lights on and off (either incandescent or LED bulbs).  The framework supports turning the bulbs on or off, or blinking them slowly or rapidly.  The blinking happens automatically in the framework, so the user doesn’t have to bother with changing the lights all the time.  Neopixels are not simply on/off; they can also display different colors, which makes them even more difficult to use.

I spent a couple of days trying to figure out the best way to command a “smart” Neopixel controller.  One option would be to continuously send the color for each pixel, which would provide all the features.  However, that would use a lot of the bandwidth of the serial links between the boards and is very inefficient.

Instead, I decided to use a byte for each of the pixels.  The byte contains a command (turn pixel on, blink quickly, blink slowly, fade quickly, or fade slowly) and a color index, which looks up the pixel’s color in a color table.  The color table contains 32 possible colors, and of course there will be commands to change the color table as necessary.  (Each entry in the color table contains 8 bits each for the red, green, and blue parts of the color.)  The blink commands simply turn the pixels on and off with the chosen color in a synchronized fashion.  The fade commands go from dark to bright, then back to dark to allow the pixels to pulse.
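One way to pack and interpret the per-pixel byte is sketched below.  The exact layout (command in the top 3 bits, color index in the low 5 bits), the blink and fade periods, and all names are assumptions for illustration; only the five commands, the 32-entry table, and the 8 bits per color channel come from the text.  Because every pixel is evaluated against the same refresh counter, blinking pixels stay in sync.

```c
#include <stdint.h>

/* Hypothetical byte layout: bits 7..5 = command, bits 4..0 = color index. */
enum pix_cmd { CMD_ON, CMD_BLINK_FAST, CMD_BLINK_SLOW, CMD_FADE_FAST, CMD_FADE_SLOW };

typedef struct { uint8_t r, g, b; } rgb_t;

rgb_t color_table[32];                       /* 8 bits each of red, green, blue */

uint8_t make_cmd(enum pix_cmd cmd, uint8_t index)
{
    return (uint8_t)(((unsigned)cmd << 5) | (index & 0x1Fu));
}

/* Scale a color by level/255. */
static rgb_t scale(rgb_t c, uint32_t level)
{
    rgb_t out = { (uint8_t)(c.r * level / 255), (uint8_t)(c.g * level / 255),
                  (uint8_t)(c.b * level / 255) };
    return out;
}

/* Color shown by one pixel on refresh number `tick` (one tick = 20 ms).
 * Periods are in ticks and are illustrative assumptions. */
rgb_t pixel_color(uint8_t cmd_byte, uint32_t tick)
{
    rgb_t c = color_table[cmd_byte & 0x1Fu];
    const rgb_t dark = { 0, 0, 0 };
    uint32_t period;
    int fade;

    switch (cmd_byte >> 5) {
    case CMD_ON:         return c;
    case CMD_BLINK_FAST: period = 10;  fade = 0; break;   /* 200 ms */
    case CMD_BLINK_SLOW: period = 50;  fade = 0; break;   /* 1 s */
    case CMD_FADE_FAST:  period = 50;  fade = 1; break;
    case CMD_FADE_SLOW:  period = 150; fade = 1; break;
    default:             return dark;                     /* unused codes */
    }

    uint32_t phase = tick % period;
    uint32_t half = period / 2;
    if (!fade)
        return phase < half ? c : dark;      /* synchronized on/off */

    /* Triangle wave: dark -> bright -> dark over one period. */
    uint32_t ramp = phase < half ? phase : period - phase;
    return scale(c, ramp * 255 / half);
}
```

Storing only the index and command per pixel keeps the serial traffic to one byte per pixel, while the 20 ms refresh does the blinking and fading locally.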

I’m planning on trying to get the Neopixel code up and running on the PSoC 4200 board this weekend or next.  (It really depends on how quickly I can bring up the debugger).  I’ll use short button presses to change the color, and long button presses to change the commands to demonstrate that it is working properly.

Here is a concrete example of why the simple generic code that Creator provides will not work and should be avoided.  To talk to the Neopixels, the processor uses the SCB SPI bus component to stream the data.  The SPI bus will be running at 2.4 MHz to meet Neopixel timing specs (3 SPI bits for every 1 Neopixel bit, matching the Lady Ada Uberguide specs).  The Neopixels will be refreshed every 20 ms so that blinking and fading can happen automatically.  That boils down to 9 bytes of data/Neopixel * 64 max Neopixels = 576 bytes of data every 20 ms.  (It will support more Neopixels, but 64 seems like a good starting point.)  The standard functions created by the tool are blocking calls.  Blocking calls wait in a busy loop if they can’t put all the data into the TX buffer, and shifting out 576 bytes at 2.4 MHz takes about 1.9 ms.  If a solenoid needed to be fired during that time, it couldn’t happen because the processor would be busy updating the Neopixels.  Not very efficient at all.  Instead, the code will watch for the FIFO empty interrupt, toss another 8 bytes of data onto the FIFO when it fires, and then go back to normal processing.  This means that the processing of the Neopixel data is distributed over the 20 ms period, and latencies for other processing will be reduced.
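The non-blocking refill can be sketched in portable C.  Since this runs on a PC, a plain struct stands in for the 8-byte hardware TX FIFO, and the caller empties it between calls to play the part of the shift register.  On the real part, spi_tx_isr() would be called from the SCB FIFO interrupt and would write the TX FIFO register instead.  All names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 8                  /* byte-wide TX FIFO, per the text */

/* PC stand-in for the hardware TX FIFO. */
typedef struct {
    uint8_t data[FIFO_DEPTH];
    size_t  count;                    /* bytes currently queued */
} fake_fifo_t;

/* Progress through one pre-encoded frame (9 SPI bytes per pixel). */
typedef struct {
    const uint8_t *buf;
    size_t len;
    size_t next;                      /* index of next byte to queue */
} tx_state_t;

/* "FIFO empty" handler: top up the FIFO and return immediately instead of
 * busy-waiting. Returns 1 once the whole frame has been queued, so the
 * caller can disable the interrupt until the next 20 ms refresh. */
int spi_tx_isr(tx_state_t *s, fake_fifo_t *fifo)
{
    while (fifo->count < FIFO_DEPTH && s->next < s->len)
        fifo->data[fifo->count++] = s->buf[s->next++];
    return s->next == s->len;
}
```

For a 576-byte frame the handler runs 72 times, each time doing a handful of byte copies and returning, rather than one call that spins until the whole frame is queued.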

So why is this post named “Get Off My Back”?  I’m hoping that either this weekend or next weekend I will have the code up and running, and be able to hand off a demo unit so Dave can play with it and see its capabilities.  He currently makes replacement backbox lighting mods for Stern machines and does a really nice job.  This should give him a very low-cost solution so he will be able to make his backbox mods that much more exciting.

8/31/2015, PSOC 4200, The Good, The Bad, and The Ugly

Nice name for the entry, seeing as it is already ten days after I started writing this post.  This entry is going to be mostly about bringing up an embedded processor from scratch.

So I started working with the PSoC 4200 over the last couple of weeks.  After going down a good number of rabbit holes, and not being able to figure out if certain tools would work together, I decided to simply suck it up and use the Cypress Creator software to start.  The bonus of that is that within a couple of hours I was able to download the example project, and burn it onto the board and run it.  I then made a small change, recompiled and threw that down onto the board to make certain that I was actually programming the board successfully.

When I bring up a processor from scratch, I like to follow a pretty rigorous path.  It basically moves from the absolutely easiest stuff to the more difficult stuff.  Here is the order that I tend to tackle the projects, or embedded programming 101:

  1. Blink an LED.  Almost every board has an LED on it, and if not, throw down a resistor and an LED on one of the output pins.  The first incarnation uses a loop and a counter to blink the LED on and off.  I then change the counter limit to make it blink faster or slower to prove to myself that I am altering the code successfully.  (At this point, I hook up a debugger and make sure that I can view and step through the code to confirm that the debugger is set up properly.  Currently, I don’t have a debugger, but I’m hoping to grab one in the next few weeks.)
  2. Blink the LED using a timer to switch between the LED on and the LED off.  This proves that I understand configuring the timers and the clocks within the chip.  It is very easy to accidentally miss a clock divider, or be off by a little bit.  This step also requires understanding clock routing within the chip.
  3. Use the timer to cause an interrupt and change the LED on/off state in an interrupt.  Requires understanding of interrupts, interrupt vector table, and how to clear the interrupt sources.
  4. Transmit characters continuously on the UART interface.  Make sure that the baud rate is set up properly for the UART, and that the data shows up correctly on a PC.
  5. Echo characters received on the UART interface back out the transmit interface.  When I type an ‘x’ on the keyboard, I should see it echoed back.
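Steps 2 and 4 mostly come down to getting clock dividers right, and that arithmetic can be checked on a PC before touching any registers.  This sketch assumes a generic 16-bit timer with a power-of-two prescaler (1 to 1024) whose counter runs from 0 to the period value, and a UART that samples each bit 16 times.  Those are assumptions for illustration, not the PSoC 4200's actual timer or SCB register layout.

```c
#include <stdint.h>

/* Pick the smallest power-of-two prescaler that lets a 16-bit counter hit
 * the requested toggle rate. The counter runs 0..period, so period is
 * ticks - 1. Returns 0 on success, -1 if no prescaler fits. */
int timer_setup(uint32_t clk_hz, uint32_t toggle_hz,
                uint32_t *prescale, uint32_t *period)
{
    for (uint32_t p = 1; p <= 1024; p <<= 1) {
        uint32_t ticks = clk_hz / p / toggle_hz;
        if (ticks >= 1 && ticks <= 65536) {
            *prescale = p;
            *period = ticks - 1;
            return 0;
        }
    }
    return -1;
}

/* Rounded integer divider for a UART that samples each bit `oversample`
 * times; also reports the baud rate actually achieved. */
uint32_t uart_divider(uint32_t clk_hz, uint32_t baud, uint32_t oversample,
                      uint32_t *actual_baud)
{
    uint32_t div = (clk_hz + baud * oversample / 2) / (baud * oversample);
    *actual_baud = clk_hz / (div * oversample);
    return div;
}
```

From a 24 MHz clock, a 2 Hz toggle (a 1 Hz blink) needs a prescaler of 256 and a period of 46874; forgetting the minus one is exactly the kind of off-by-a-little error mentioned in step 2.  For 115200 baud with 16x oversampling the divider is 13, giving an actual rate of 115384 baud, about 0.16% fast.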

After finishing those five simple steps, I’m usually familiar enough with the processor that everything else is simply reading documentation and digging through registers.  Doing the above steps forces the programmer to read and understand the documentation, since every company documents their chip in very different ways.  Some companies have a single 2000 page programming reference guide, while other companies break each hardware subsection into a different document.

So one of the absurd things with the Cypress Creator IDE is that it tries to force the programmer into using their canned components as opposed to actually programming the processor.  If a UART is needed, drag and drop it into the fake processor schematic and fill in a couple of fields.  No need to understand the registers that actually underlie the component.  Same thing with I/O pins, SPI buses, etc.  The downside is that the code becomes very large, because it pulls in all of these generic components, most of whose functionality is not being used.

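Here is the actual C code that blinks the LED on the board:

/* Bare-metal LED blink with no Creator components.
   Register addresses were dug out of Creator's generated code. */
typedef volatile unsigned long R32;   /* 32-bit memory-mapped register */
typedef unsigned long U32;

#define GPIO_PRT1_DR        0x40040100   /* port 1 output data */
#define GPIO_PRT1_PS        0x40040104   /* port 1 pin state (inputs) */
#define GPIO_PRT1_PC        0x40040108   /* port 1 configuration (drive modes) */
#define GPIO_PRT1_INTR_CFG  0x4004010c   /* port 1 interrupt configuration */
#define GPIO_PRT1_INTR      0x40040110   /* port 1 interrupt status */
#define GPIO_PRT1_PC2       0x40040114   /* port 1 configuration 2 */
#define HSIOM_PORT_SEL1     0x40010004   /* port 1 pin function select */

int main(void)
{
   /* Initialization code */
   U32 count = 0;

   *((R32 *)GPIO_PRT1_DR) = 0xff;          /* drive all port 1 outputs high */
   *((R32 *)GPIO_PRT1_PC) = 0x00180000;    /* pin 6 drive mode (bits 20:18) = 6, strong drive */
   *((R32 *)GPIO_PRT1_INTR_CFG) = 0;       /* no pin interrupts */
   *((R32 *)GPIO_PRT1_PC2) = 0;

   /* Send the GPIO bit to the hardware pin: pin 6 select (bits 27:24) = 0, GPIO */
   *((R32 *)HSIOM_PORT_SEL1) &= ~0x0f000000;

   for (;;)
   {
      /* Busy-wait counter: outputs low halfway through, high at the end */
      if (count == 0x10000)
      {
         *((R32 *)GPIO_PRT1_DR) = 0x00;
      }
      if (count >= 0x20000)
      {
         *((R32 *)GPIO_PRT1_DR) = 0xff;
         count = 0;
      }
      count++;
   }
}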

Looking at the above code, it is a grand total of five initialization statements, and then a simple loop with a counter changing the LED bit on and off.  Compiling the code takes maybe two or three seconds and produces 100 bytes of object code.  Using the Creator tool and a PWM component to blink the LED on and off, it takes twenty to thirty seconds to compile, and generates a couple K of functions that I may or may not use.  The long compile time is caused by auto-generating code, routing and configuring clocks inside the chip, and running the FPGA-style generator.  I’m not really using any of those resources currently, so it is simply a waste of my time.

The other issue is that Cypress has allowed their documentation to take a back seat to their code generation tool.  The documentation is not only poor, but also incomplete.  I tried to find the addresses of certain registers, and they are not listed in any of the documents.  I had to dig through their generated code to figure out the addresses of the registers.  I would say that their documentation is very poor, even below Microchip if that is possible.  Right now Freescale and ST Micro have very good and very complete documentation.  Microchip and Cypress don’t compare.

They provide a bootloader.  That is fabulous, especially if it is written well.  The only problem is that it seems to have been written by a first or second year co-op student.  The bootloader size should be as small as possible.  The Microchip bootloaders I wrote were either 768 or 1024 bytes depending on the processor.  The Freescale bootloader was 1024 bytes.  (It was actually about 530 bytes, but because of the flash protection scheme, code could only be protected in 512 byte blocks, so I had to move to two blocks.)  Their bootloader component is 6400 bytes, or about 1/5 of the 32K of code space in the processor.  What the heck are they doing in there?  Maybe they are running the SETI program with the unused cycles.

A second issue with the bootloader is that it requires RAM for the interrupt vector table.  Admittedly, that is the easiest way to write a bootloader, but it means that 196 bytes of RAM are sucked up by that table.  A better, but more complex, method is to create a jump table in low flash and move the exception vector table into the application code, so nothing relies on a copy of the table in RAM.  While it will incur a couple extra clock cycles of delay during an interrupt, the RAM is preserved for use by the application code.  That would have been a showstopper with the old processor that only had 8K of Flash and 512 bytes of RAM.  Since this processor has 32K of Flash and 4K of RAM, it is less of an issue.
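The jump-table approach can be illustrated in C.  In this sketch the bootloader owns a fixed set of interrupt stubs, and each stub forwards through the application's vector table, which lives at a flash address both sides agree on.  Here the table is an ordinary const array so the example runs on a PC; the table size, names, and placement are hypothetical.  The extra load and indirect call in each stub are the couple of extra clock cycles mentioned above, and since both tables are const they would normally be placed in flash rather than RAM.

```c
typedef void (*isr_t)(void);

#define NUM_IRQS 4                     /* tiny table for illustration */

/* ---- Application side ---- */
static volatile int timer_hits;        /* counts timer interrupts for the demo */

static void app_default_isr(void) { }
static void app_timer_isr(void)   { timer_hits++; }

/* Application vector table. On hardware the linker would place this at a
 * fixed flash address (hypothetical APP_VECTOR_BASE) known to the bootloader. */
const isr_t app_vectors[NUM_IRQS] = {
    app_default_isr, app_timer_isr, app_default_isr, app_default_isr
};

/* ---- Bootloader side ---- */
/* The bootloader only knows where the table is, not what is in it. */
static const isr_t *const app_table = app_vectors;

/* One stub per IRQ; the hardware vector for IRQ 1 would point here. */
void boot_irq1_stub(void)
{
    app_table[1]();                    /* load the handler address, then call it */
}

/* Generic form of the same forwarding, handy for exercising every entry. */
void boot_irq_dispatch(unsigned irq)
{
    if (irq < NUM_IRQS)
        app_table[irq]();
}
```

Updating the application only rewrites app_vectors and the code it points to; the bootloader's stubs never change.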

Right now I don’t have a debugger.  I’m gonna grab a Pioneer board (it costs about $25), which includes a debugger and a PSoC 4200 processor.  Having a debugger will really accelerate the speed of developing the code.