Measuring the Steps to Design Checkoff
I’ve recently had the opportunity to rebuild a NAND flash controller for Arasan Chip Systems, Inc. The purpose of this redesign was to upgrade an existing flash controller so that it can process flash requests and commands at even faster speeds than ever before. Indeed, this new controller is designed to handle flash chips that operate at speeds of up to 1.6GB/s while being backwards compatible with all previous speeds. Those who know my typical design goals will already know my personal goals for this design: throughput, throughput, throughput!
What I wanted to discuss today, however, was a basic management problem I came across while working on this project. At one time, during development, I was asked by Arasan’s management team where I was at in the project, and how much longer the project would take to accomplish.
Gosh, where do you start?
First, let’s face reality, despite how much an engineer would like to avoid such questions, these are valid questions. Customers have a right to know how far along you are, and when you think you’ll have the project completed. In other words, you might not like the question, but it is one that needs answering.
Perhaps the best and most truthful answer might be, “I’m working on it, and it’ll get done when it gets done.” While this might be true, it doesn’t allow you the ability to express your progress at all. Customers want to know your timeline, and they want to know that the timeline you give them is well justified. Put simply, you need to break projects down into tasks–tasks that need to take place between now and project completion. Each task then needs to be given a time estimate, leading up to an ultimate completion date for each project.
Here’s an example of what such a task list might look like.
You might find it better to view something like this via a Gantt chart rather than as a list of independent tasks, but let’s pull on this idea of a task list a bit more for discussion. Why? Because, once the interfaces to a module have been defined, then the design of each module in a larger system can become an independent task which can then be separated from the rest of the design. Of course, this only goes so far: often when designing a module, you will often discover changes that need to be made to the interface, and some tasks (like simulation) cannot begin until other tasks complete, but this is still a good place to start.
One of the benefits of independent tasks is that they can be accomplished in any order. Over time, I’ve even found myself walking around in a large project in a somewhat scatterbrained fashion and doing things out of order. This leads to two clear problems. The first is, how shall I know if an individual task is complete or not? The second one is like it. For those tasks that are not complete, how shall I measure how much of the task remains?
Answering these questions will be the topic of this article.
Measuring design progress
The current solution that I came up with to measure design progress involves building a list of design components (modules), and then discussing their status in a sort of stop-light chart. For example, Fig. 2 below shows five components of a design together with their various states of development.
Many of the parts and pieces of this example design are already posted on Github. For example, you can find the AXI Crossbar or the ZipCPU’s AXI-lite instruction fetch all available on line. You can even find my AXI DMA on github. Other components, such as the “Custom DMA”, the custom “Micro-Controller”, or the “Top Level Design” listed here, were not yet built much less posted anywhere. Each component is then given a row in this chart, where each column contains a task to be accomplished before the component is completed. Each box is then filled in with a color and a status to describe the reason for the color.
In general, my color scheme follows:
Red: I color tasks red if either the task not yet started, or if known problems exist with the task. Perhaps a formal proof fails. Perhaps the component fails in simulation. Either case would show up here in red.
Yellow: Yellow is for partial completion–a work in progress. To be yellow, the design component must at least be written, but may not (yet) pass any verification steps. When it comes to formal proofs, a yellow box would indicate that the design passed at one time, but for some reason the proof needs to be re-run–perhaps because of some changes that have since been made to the design, perhaps the interface requirements needed to be updated, or perhaps because a new configuration now needs to be tested. Whatever the reason, yellow shows progress without yet showing completion.
Green: A green box is good to go. The task is completed. Both the design portion is complete, and the given portion of testing has been completed as well. No more attention needs to be paid to a green task.
Gray: Not applicable. For example, I tend to only formally verify leaf components of my designs for complexity reasons. The top level of a design rarely sees the light of formal tools, and therefore needs to be tested via simulation only.
Now that we know the colors, let’s look at the columns.
The first column simply provides an overview of the status of the component. Is it in development? If development hasn’t yet started, the component will be colored in red. Has the component been verified? Is it being tested? Has it been signed off? Once a component has received a formal sign off, and hence is ready for integrated simulation, then I’ll color it green. All of this is captured by the first column.
The next four columns follow from my discussion of the “Four Keys to getting your design to work the first time” article. These columns indicate whether or not the design component has passed an interface check, a contract check, induction, or cover. In summary:
The interface check verifies the interfaces the component is supposed to maintain. If done well, a formal interface definition can define both sides of the interface, and so assumptions made in one module can be verified as assertions on the other side of any interface.
As an example from Fig. 2, the AXI-Lite fetch module maintains two interfaces: one with the AXI-Lite bus, and a second between the fetch and the CPU. Both interfaces need to be clearly defined, and then the design will need to pass a formal property check against this definition in order for the design to pass this box.
The contract check is an internal check, designed to prove that the design component does what it is supposed to do internally.
Continuing with the example of the AXI-Lite fetch, the contract check would follow:
First, pick an arbitrary memory address together with an arbitrary 32-bit value (the instruction) that you will assume be at that address.
I will also pick another arbitrary bit to decide whether or not reading from this address will produce a bus error or not.
When the AXI-Lite bus returns an instruction associated with that address, assume the value returned matches the arbitrary instruction selected above. The extra bit will also determine whether or not this return is OKAY or represents some form of bus error or not.
Now prove (i.e. assert) that the instruction passed to the CPU that is associated with this address matches the arbitrary instruction value picked earlier. Or, if the bus returned an error, then assert that the illegal instruction flag is sent to the CPU with this instruction.
Note that the check above follows my “master rule” of formal verification: Assume inputs, such as the instruction returned by the bus, and then assert outputs–such as the instruction returned to the CPU.
The induction check will give you the confidence you need that your interface and contract properties are not just held for the first twenty clock periods, but rather for all time.
As an example here, it can be a challenge to formally verify that a component is fully AXI compliant. There’s just a lot of assertions involved. If you only attempt 64-cycles of a bounded model check, then you’ll never know whether or not your design would fail on the 65th cycle of a 256-beat burst, or whether or not your design can handle two 32-beat bursts in a row.
If the component being tested were a cache component, the induction check might help verify that nothing spoiled the cache between when the instruction was first returned to the CPU, and when it may have been returned some time later. (How long? Longer than the bound of the proof!)
For these questions, you will need to pass induction.
Finally, the cover check is left over to make certain that, in spite of any assumptions you may have made, your design is still able to accomplish whatever tasks were assigned to it.
Years ago, when visiting a helicopter squadron, I remember being introduced to the test pilots of the squadron. Why would a squadron need test pilots, I asked? The helicopters were decades old, and their performance well established, what was the purpose of the test pilots or the test pilot section they were members of? The answer I was given surprised me: After taking a helicopter apart to fix it, it’s important to shake it around in a test flight just to make certain everything was properly put back in place. In digital design, we might call such a test a “regression” test.
In a similar fashion, I use cover checks in a sanity checking role. There’s been more than once where I’ve convinced myself that a design passes a formal verification check, only to discover later that I had erroneously assumed the design would remain in reset the whole time. A proper cover check would’ve failed in this case, which would then lead me to the problem in my proof.
Once all of the various components in a design pass their formal checks, I’ll then turn to simulation so that I can demonstrate the functionality of the design. Note that, at this point, I’m no longer trying to prove the design works–that’s already been done via formal methods. Instead, I’m now trying to demonstrate that the design works. Frankly, there’s no way I could make any simulation as thorough as the formal proofs preceding it, but there remain lots of good reasons for using simulations and bugs that still get caught in this final step.
The simulation column might also be where I’d include code coverage statistics, if the project required them.
Some projects have an additional column that I haven’t listed in this chart. This column would capture whether or not the design as a whole has been hardware tested and proven. This might mean that the design has been placed onto an FPGA, and that it has been demonstrated to work on the FPGA. It might also mean that an ASIC has been built from the design. The important part here is that, while this is a very useful column, it’s not necessarily an important check off criteria for all designs.
With that as background, let’s go back and examine Fig. 2 a bit.
As you can see from the list of components, my AXI crossbar has failed its induction check and changes have been made to it since it was last formally verified. This is evidence of work that needs to be accomplished.
The AXI-lite instruction fetch, however, had only recently been built when I put this chart together. At the time, it was still a very new design. Still, it has passed all of its property checks, and was then being used in simulation testing. (It’s now been used quite successfully in several projects, but that’s another story for another time.)
The microcontroller in this example, however, had suffered from catastrophic failures in simulation. (I had tried to skip the formal checks after making some minor changes and then to run directly to simulation. Skipping steps often feels like you are moving faster, when in this case it meant I had to go back and do some rework.) Since bugs had been found in the microcontroller, I marked it in red so that I would remember to go back and update the formal proof so these bugs never come back again.
The custom DMA should be self-explanatory–it simply hadn’t (yet) been built, so it’s line was red through and through.
That brings us to the “Top Level Design” component. This level will not be formally verified, and so those boxes have been grayed out. This is for two reasons. First, if everything beneath it was properly verified, then this level shouldn’t need to be verified at all. Second, formal verification depends upon an exhaustive search over all potential states. This search is exponentially complex and can easily grow beyond what the designer’s (patience and) computer power are able to handle. Therefore, the top level design will be the focus of a lot of simulation work, but that won’t be able to happen until all the parts and pieces composing it have been built.
Since I started building charts like this one, I’ve found other uses for building charts like this. Three particular uses include scheduling, cost estimation, and explaining design proposals in contract bids.
When scheduling, I’ll place the amount of time in each box that I expect a task to take. The sum of all the remaining times in a project should then be the amount of work time remaining in a project.
Do I always get this right? No. A better question to ask might be whether I ever get this estimate right at all. The answer is probably not, but it’s at least a good first order estimate.
If you can figure out how much time a project will take, then cost estimation becomes easier. For most of my projects, the primary cost driver is engineering time–my own personal time in particular. Once a time has been estimated for each of the tasks in the project, all that really remains is to multiply that time by an hourly rate.
Tracking the time used in any particular task is also an important part of estimating such acquisition measures as the Actual Cost of Work Performed (ACWP) or the Budgeted Cost of Work Performed (BCWP) for those customers that require these numbers.
Yes, I also use this approach when bidding for contracts as well.
However, when bidding for contracts, I’ll often list components instead of components and tasks, and then rearrange those components into something with more meaning. Perhaps its a data flow diagram of some type. Perhaps its a system block diagram. Either way, such a diagram then helps me to illustrate for the customer that I understand the task at hand, while the stop-light colors help to illustrate the status of the various library components I might bring to the project.
In short, I’ve found this stop-light approach very valuable when communicating with customers.
I’ve now used this component list progress chart for several projects and purposes. It has worked so well for me, that I intend to use it again and again on future projects as the projects and needs allow.
And ye shall hear of wars and rumours of wars: see that ye be not troubled: for all these things must come to pass, but the end is not yet. For nation shall rise against nation, and kingdom against kingdom: and there shall be famines, and pestilences, and earthquakes, in divers places. (Matt 24:6-7)