May 12, 2011
Better FPGAs, Sooner
FPGAs built on the latest 28-nanometer (nm) manufacturing processes can host designs as complex as a five-million-gate ASIC. That’s good news for designers who need to implement a lot of complex logic in an FPGA for deployment into a production system, quickly iron out bugs and then get to production fast. And it’s good news for an estimated 90 percent of ASIC designers who use FPGAs to check their work in physical prototypes.
The challenge in working with these large FPGAs has become a familiar issue for ASIC designers in the past few years – how to expand tools and methodologies beyond the scope of a small project. Making the most of today’s large FPGAs demands a combination of new organizational approaches and tools that can support them.
There’s no “one size fits all” design methodology, but many of the techniques used to speed up any design team activity apply equally well to large FPGA projects. Match the approach to the goal: Are certain portions of the design subject to “must-have” performance specs, while getting “good-enough” results in a tight timeframe is the priority for the rest of the design? Do large tasks need to be broken into smaller elements that can be worked on independently, or in parallel? What’s the best way to localize the impact of changes and shorten error-checking and correction loops? Does the design need to be built on what was done before?
A lot of this is obvious. What’s important is whether or not the design tools make it easy to apply these project-management techniques to the design goals of complex FPGAs.
Hierarchy and blocks
One of the most powerful ways to deal with a complex project is to break it down into a hierarchy of sub-tasks that can be worked on independently and in parallel.
Working within a hierarchy enables designs to be refined from broad concepts to implementation details, with each part of the design progressing independently and at a different rate. For example, this approach would allow a chip design team to finish the detailed implementation of an external bus interface before other parts of the design are complete. It would also allow changes in the design specs to be isolated into smaller portions of the overall design while preserving the rest, thus reducing schedule impact. The designer can combine the use of a bottom-up style flow, where completed blocks are integrated together, with a top-down flow, where multiple blocks of the design are being synthesized from the register-transfer level (RTL) at the same time.
Support for the kind of hierarchical and geographically distributed design approaches that many design teams have typically had to implement manually can be found in Synopsys’ latest Synplify FPGA synthesis and analysis tools. The design can be broken apart for development on multiple machines in multiple places by multiple designers. The design environment enables teams to develop in parallel, to synchronize and integrate changes, and to reuse design modules more easily.
Dividing a design in this way helps isolate “frozen” pre-verified blocks that need to be left alone from blocks that are still in progress, restricting the amount of synthesis necessary to update the overall design and helping to stabilize it.
This is done by enabling designers to define RTL partitions/blocks – so-called “locked compile points” – prior to synthesis, thus maintaining these partitions throughout synthesis and, optionally, throughout placement and routing. The software can also define compile points automatically and will propagate the design’s interface-timing constraints from the top level to each automatic compile point to ensure that timing goals are met, correctly converting any gated and generated clocks along the way.
Block-based design has its disadvantages. The design tools may not be able to optimize the design across partition boundaries, which can sometimes result in lower performance. One way to minimize this issue is to ensure that critical paths do not cross block boundaries.
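One way to act on this advice is to scan a timing report for paths that start in one block and end in another. The sketch below is a minimal illustration, not a Synplify feature: the cell names, the block map and the slack values are all hypothetical, and a real flow would read them from whatever report format the tools produce.

```python
from typing import Dict, List, Tuple

# A timing path is (start cell, end cell, slack in ns). All data is hypothetical.
Path = Tuple[str, str, float]


def paths_crossing_blocks(paths: List[Path], block_of: Dict[str, str]) -> List[Path]:
    """Return paths whose endpoints sit in different blocks, worst slack first."""
    crossing = [
        (src, dst, slack)
        for src, dst, slack in paths
        if block_of[src] != block_of[dst]
    ]
    return sorted(crossing, key=lambda p: p[2])


# Hypothetical mapping of cells to the blocks (partitions) that contain them.
block_of = {
    "u_bus/ff1": "bus_if",
    "u_bus/ff2": "bus_if",
    "u_dsp/ff1": "dsp",
    "u_dsp/ff2": "dsp",
}

paths = [
    ("u_bus/ff1", "u_bus/ff2", 0.8),
    ("u_bus/ff2", "u_dsp/ff1", -0.3),
    ("u_dsp/ff1", "u_dsp/ff2", 0.1),
    ("u_dsp/ff2", "u_bus/ff1", 0.5),
]

crossing = paths_crossing_blocks(paths, block_of)
for src, dst, slack in crossing:
    print(f"{src} -> {dst}: slack {slack:+.2f} ns")
```

Paths that cross a boundary with little or negative slack are candidates for moving the boundary or registering the block outputs.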
The ultimate form of block-based design involves taking advantage of predefined intellectual property (IP) blocks, from your own library or from a third party. There’s a catch here for ASIC designers prototyping with FPGAs: The rework needed to move an ASIC RTL description to an FPGA flow can be significant. One solution is Synopsys DesignWare digital IP building blocks, which permit users to employ the same consistent set of IP instantiations in their RTL for both the ASIC and FPGA design flows, allowing the ASIC and FPGA tools to take care of the finer details of optimal implementation.
Being able to decompose a large FPGA project into a hierarchy of blocks that can be worked on independently provides a pathway to greater parallelism in the design tools. For example, a server farm can run several slightly different versions of a block design in parallel and then choose the one with the best results upon completion.
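The "run several versions and keep the best" idea can be sketched in a few lines. In this illustration `run_variant` is a hypothetical stand-in that returns made-up slack figures. On a real farm it would launch a tool run for one block and parse the result. Threads are used here only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Made-up worst-slack results (ns) for four hypothetical option sets.
FAKE_RESULTS_NS = {1: -0.40, 2: 0.15, 3: 0.05, 4: -0.10}


def run_variant(variant_id: int) -> Dict[str, float]:
    """Stand-in for one synthesis run of a block with a given option set."""
    return {"variant": variant_id, "worst_slack_ns": FAKE_RESULTS_NS[variant_id]}


def best_of(variant_ids: List[int]) -> Dict[str, float]:
    """Run all variants concurrently and return the one with the best slack."""
    with ThreadPoolExecutor(max_workers=len(variant_ids)) as pool:
        results = list(pool.map(run_variant, variant_ids))
    return max(results, key=lambda r: r["worst_slack_ns"])


best = best_of([1, 2, 3, 4])
print(f"keep variant {best['variant']} (worst slack {best['worst_slack_ns']:+.2f} ns)")
```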
Multiprocessing (using multiple CPU cores on a workstation in parallel) can accelerate synthesis by up to 2X when synthesizing the partitions/blocks of a design in parallel and combining the results at the end. Multiprocessing is a natural way to cut runtimes in a block-based flow, especially when the synthesis tool can automatically partition the design and farm out synthesis to the available processors.
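A rough way to see why the gain depends on how the design is partitioned is to estimate the parallel runtime from per-block runtimes. The sketch below assigns blocks to cores longest-first and adds a serial step for combining results. The block runtimes, the merge time and the core count are hypothetical, and the model ignores memory and licensing limits, so it does not reproduce the tool's actual figures.

```python
import heapq
from typing import List


def parallel_runtime(block_minutes: List[float], cores: int) -> float:
    """Greedy longest-first assignment; returns the busiest core's finish time."""
    loads = [0.0] * cores
    heapq.heapify(loads)
    for minutes in sorted(block_minutes, reverse=True):
        lightest = heapq.heappop(loads)  # core that is free soonest
        heapq.heappush(loads, lightest + minutes)
    return max(loads)


block_minutes = [30, 20, 20, 10, 10, 10]  # hypothetical per-block synthesis times
merge_minutes = 10                        # hypothetical serial step to combine results

serial_total = sum(block_minutes) + merge_minutes
parallel_total = parallel_runtime(block_minutes, cores=2) + merge_minutes
speedup = serial_total / parallel_total

print(f"serial {serial_total} min, parallel {parallel_total} min, speedup {speedup:.2f}X")
```

One very large block, or a long serial merge step, caps the achievable speedup no matter how many cores are available.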
By isolating parts of the design that require tuning, designers can save iteration runtime for synthesis by only re-synthesizing the block that is changing. Runtime for place and route is also reduced by performing incremental place and route on only the parts of the design that changed while attempting to meet timing specs.
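The bookkeeping behind "re-synthesize only what changed" can be illustrated with content fingerprints. The sketch below is an assumption about how a team might track changes in its own scripts, not a description of how any synthesis tool does it internally. The block names and RTL snippets are hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(rtl_text: str) -> str:
    """Content hash of a block's RTL source."""
    return hashlib.sha256(rtl_text.encode("utf-8")).hexdigest()


def blocks_to_resynthesize(previous: Dict[str, str], sources: Dict[str, str]) -> List[str]:
    """Return names of blocks that are new or whose source hash has changed."""
    return sorted(
        name for name, text in sources.items()
        if previous.get(name) != fingerprint(text)
    )


# Hypothetical RTL from the last run and from the current run.
last_run = {
    "bus_if": "module bus_if(); endmodule",
    "dsp": "module dsp(); endmodule",
}
this_run = {
    "bus_if": "module bus_if(); endmodule",             # unchanged
    "dsp": "module dsp(); /* retimed */ endmodule",     # edited
    "dma": "module dma(); endmodule",                   # newly added
}

previous = {name: fingerprint(text) for name, text in last_run.items()}
dirty = blocks_to_resynthesize(previous, this_run)
print("re-synthesize:", dirty)
```

A change to shared constraints or to a block's interface would also need to mark its neighbors as dirty. The sketch leaves that out.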
Some place-and-route tools support incremental schemes that limit place-and-route changes to only those parts of the netlist that have been revised, yet still try to meet timing. While this technique can halve overall place-and-route runtimes, it can cope with only a limited number of iterations before a full place-and-route run becomes necessary again. This approach is also likely to offer limited success on critical paths, or in “high utilization” designs that use a high proportion of the available gates. In these cases, even small changes can sometimes have a wide effect.
Along with sheer hardware and partitioning power, raw synthesis runtime acceleration can provide designers with fast design feedback and fast initial board implementations [see Figure 1].
Figure 1: Synopsys’ estimates suggest that the latest improvements to Synplify Premier’s fast synthesis mode can accelerate runtimes by up to 4X compared with standard logic synthesis.
For example, Synplify’s fast synthesis mode can speed things up by disabling certain optimizations. The trade-off may be a design implementation that has a slower clock speed and uses more chip resources, but this may not matter if the goal is just to get rapid feedback on some design ideas.
There are various ways of using fast synthesis, depending on a designer’s goals. For example, a designer wanting to tune a design’s RTL constraints can use fast synthesis to perform a number of synthesis-only iterations within normal, tight constraints. But if the goal is simply to get a design into an FPGA and onto a board as quickly as possible, running the synthesis with loose timing constraints should save 25 percent of the normal runtime.
The same approach is possible in place and route, which typically takes more than half of the total design iteration time for a multi-million-gate FPGA project. Some FPGA place-and-route tools have fast or low-effort modes, which sacrifice the quality of results in favor of shorter runtimes. This feature can be useful for checking a small design change.
There are other enhancements that can help designers balance design time and quality of results. For example, some synthesis tools can be set to keep running despite encountering errors so that they don’t have to be restarted every time a handful of errors is detected.
Meeting design goals
The aforementioned project-management techniques can help teams develop new versions of their designs more quickly, trading off the quality of results for reduced runtimes. Eventually, though, a team’s attention will turn to meeting the final design goals, particularly in terms of power consumption and performance.
Design teams can manage dynamic power by utilizing features in Synplify Premier that generate switching activity during synthesis, which can be used for power analysis and driving power optimizations with FPGA vendor tools. Other features include automatic power optimizations for third-party RAM and DSP blocks, including choosing smaller blocks and powering down functions when such blocks are not being accessed.
Another valuable capability is applying the full power of the tool’s logic synthesis, placement and physical-synthesis engines in a coordinated way to meet timing requirements. This can be useful when a design meets timing constraints during logic synthesis but breaks them during the place-and-route phase, usually due to poor timing correlation between the two environments.
Physical synthesis tools improve timing predictability by both synthesizing and placing the logic to build a more accurate interconnect utilization and delay model, and then passing the resultant placement constraints on to the FPGA vendor’s router.
Many divide-and-conquer techniques are now applicable to FPGA design and will be well recognized by ASIC designers and by managers endeavoring to complete large projects. What may be news is that a combination of support for hierarchical and block-based design, both bottom-up and top-down flows, and advanced optimization functions is now available in a single set of FPGA synthesis tools.
About the authors
Angela Sutton brings more than 20 years of experience in semiconductor design tools to her role as staff product marketing manager for Synopsys, Inc. She is responsible for the FPGA Implementation Product Line. Prior to joining Synopsys, Ms. Sutton worked as senior product marketing manager in charge of FPGA implementation tools at Synplicity, Inc., which was acquired by Synopsys in May 2008. Ms. Sutton has also held various business development, marketing and engineering positions at Cadence, Mentor Graphics and LSI Logic. At LSI Logic she was responsible for marketing its line of digital video semiconductor products and platforms.
Ms. Sutton holds a BSc in Applied Physics from Durham University, UK, and a PhD in Engineering from Aberdeen University, UK.
Jeff Garrison brings more than 20 years of experience in marketing and software engineering to his role as director of product marketing for FPGA implementation at Synopsys, Inc. His responsibilities include product strategy, definition, and launch for Synopsys’ FPGA products including Synplify, Synplify Pro and Synplify Premier.
Prior to joining Synopsys, Mr. Garrison worked as senior director of product marketing at Synplicity, Inc., which was acquired by Synopsys in May 2008. Mr. Garrison has also held positions as a senior product marketing manager for several IC design products at Cadence Design Systems and in product engineering and technical support at VLSI Technology, and worked in the software support division of Hewlett Packard. Mr. Garrison holds a bachelor’s degree in computer science from Indiana University.