Network Systems DesignLine | Tip of the Week: Correcting soft errors can't be an afterthought

To ensure the highest network QoS and compliance with service level agreements, the ternary content-addressable memory (TCAM) must use up-to-date error correction code (ECC) techniques. Here they are.

Why talk about soft errors at all? Surely the problems were identified decades ago and countermeasures put in place? Well, yes, up to a point. Design countermeasures such as reducing crosstalk, and manufacturing countermeasures such as eliminating borophosphosilicate glass (BPSG), using low-alpha packaging, and eliminating lead, have all contributed significantly to reducing the failure rate due to soft errors.

The problem is that these techniques cannot eradicate soft errors completely. Moreover, shielding against cosmic radiation would require several feet of concrete, which is generally impractical. Worse yet, the consequences of a soft error can be orders of magnitude greater in today's complex systems than in the simpler systems of yesteryear. For example, a modern complex system may well need a reboot to recover from a particular soft error, severely affecting quality of service (QoS). That's why the soft error problem, which started out as a static random access memory (SRAM) problem in aeronautics and space applications, increasingly worries network equipment providers, who are imposing ever more stringent soft error specifications and restrictions.

SRAM soft error rates, expressed in failures in time (FIT, where one FIT is one failure per billion device-hours) per megabit of memory, have fallen with each successive process technology node. But what about ternary content-addressable memory (TCAM)? TCAM failure rates are much the same as those of SRAM at the 90nm node, but the TCAM rate has trended upward over the last two nodes, so we must prepare for the likelihood that it will exceed the SRAM rate at 65nm. And because the rate is quoted per megabit, continual increases in memory capacity will exacerbate the problem further.
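Because the rate is normalized per megabit, a device's total rate scales linearly with its capacity. The short Python sketch that follows works through that arithmetic. The per-megabit rate, capacities, and device count are hypothetical placeholders rather than measured figures, and the MTBF conversion assumes a constant (exponential) failure rate.

```python
# Sketch of soft-error-rate scaling. All input numbers are HYPOTHETICAL
# placeholders, not measured TCAM or SRAM data.

HOURS_PER_BILLION = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def device_fit(fit_per_mbit: float, capacity_mbit: float) -> float:
    """Total FIT for one device whose rate is specified per megabit."""
    return fit_per_mbit * capacity_mbit


def mtbf_hours(total_fit: float) -> float:
    """Mean time between failures for a constant failure rate (exponential model)."""
    return HOURS_PER_BILLION / total_fit


def system_mtbf_years(fit_per_mbit: float, capacity_mbit: float, devices: int) -> float:
    """MTBF in years for `devices` identical memories in one system."""
    total = device_fit(fit_per_mbit, capacity_mbit) * devices
    return mtbf_hours(total) / (24 * 365)


if __name__ == "__main__":
    # Hypothetical: 1,000 FIT/Mbit, 4 devices, capacity doubled from 18 to 36 Mbit.
    for capacity in (18, 36):
        years = system_mtbf_years(1000, capacity, devices=4)
        print(f"{capacity} Mbit x 4 devices: MTBF ~ {years:.2f} years")
```

At a fixed per-megabit rate, doubling capacity doubles the total FIT and halves the MTBF, which is the scaling effect described above.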

This trend is critical, because TCAM is the underlying technology in the network search engines that provide network packet classification services and packet forwarding.
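To see why a single upset in a TCAM matters, recall how a ternary lookup works: each entry stores a value and a mask, masked-out bits match anything ("X"), and the first matching entry in priority order wins. The Python model that follows is a simplified software illustration, not how any particular search engine is built; the 8-bit keys, entries, and actions are hypothetical. It shows how one flipped mask bit silently changes where a packet is sent.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class TcamEntry:
    value: int   # bit pattern to compare against the search key
    mask: int    # 1 = "care" bit, 0 = "don't care" (X)
    action: str  # hypothetical lookup result, e.g. a next hop


def tcam_lookup(table: List[TcamEntry], key: int) -> Optional[str]:
    """Software model of a TCAM search: the first matching entry wins (priority order)."""
    for entry in table:
        if (key & entry.mask) == (entry.value & entry.mask):
            return entry.action
    return None


table = [
    TcamEntry(value=0xA0, mask=0xF0, action="port A"),   # matches keys 0xA0..0xAF
    TcamEntry(value=0x00, mask=0x00, action="default"),  # matches everything
]

key = 0x26
print(tcam_lookup(table, key))  # default

# Model a single-event upset that flips bit 7 of the first entry's mask.
upset = [replace(table[0], mask=table[0].mask ^ 0x80), table[1]]
print(tcam_lookup(upset, key))  # port A: the packet is silently misdirected
```

No error is raised in the corrupted case; the lookup simply returns a different, plausible-looking answer, which is why such failures are hard to catch after the fact.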

According to Olivier Lauzeral, president of iRoC Technologies Corporation, a provider of soft error assessment tools, soft error professional services and radiation test services, "The great majority of failures in SRAM and TCAM are due to single event upsets, or single bit errors."

Consequently, to ensure the highest network QoS and compliance with service level agreements, the TCAM must employ the most up-to-date error correction code (ECC) techniques. So, what are these techniques?

Error correction code is applied at several levels in a system to minimize the time a soft error survives. In high-reliability systems, for instance, ECC is applied all the way down the function chain from the application level to ensure correct system function. However, the point at which an error, or rather its effect, is detected has a major impact on QoS. Continuous error monitoring and elimination at the box and network level may well avoid many catastrophic failures, but errors detected "late" in the application chain may not be correctable, so detecting the error at the chip level is the highest-reliability approach.

The question is: should ECC be integrated into the TCAM or should it be applied externally? After all, external ECC has worked for years and it is the industry's time-honored means of correcting memory errors.

External ECC requires the system designer to develop the requisite circuitry, increasing design time, effort, and cost, as well as component cost and board space. It also requires command and control from the system's processors and may increase system latency, potentially degrading system performance. These disadvantages, especially the performance penalty, will only grow as TCAMs become larger.

Integrating ECC into the TCAM eliminates these problems. Integrated ECC circuitry is optimized specifically for the TCAM, so it adds only the minimum gate count and cost needed to do the job. In addition, the ECC operation is "hidden" within the device's latency, a parameter that is specified and predictable across the range of operating conditions. And the semiconductor manufacturer does the design and verification work. For the system designer, ECC becomes an "X": a don't-care.

For these reasons, IDT integrated ECC into its search accelerator. The ECC corrects single-bit errors and detects double-bit errors (SECDED) in the core, and checks bus parity on the interfaces. Very importantly, it also checks content that has not been recently accessed, to detect "silent" failures: errors that would otherwise remain undetected until the data is accessed, and that could have catastrophic consequences such as a misdirected 911 emergency call. Moreover, the integrated ECC operates in background mode and thus does not degrade search performance.
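The SECDED behavior and the background check of idle entries (commonly called scrubbing) can be modeled in software. The sketch that follows uses an extended Hamming code, a textbook SECDED construction in which 64 data bits become a 72-bit codeword, plus a hypothetical `scrub` pass that re-reads every stored word, writes back single-bit corrections, and flags double-bit errors. It is an illustrative assumption about how such a scheme can work, not IDT's circuit, which the article does not describe; the function names are made up for this example.

```python
from typing import List, Optional, Tuple


def _layout(k: int) -> Tuple[int, List[int]]:
    """Hamming layout for k data bits: returns (n, data positions).

    Positions 1..n hold the Hamming code; powers of two are check bits.
    Position 0 holds the overall parity bit that adds double-error detection.
    """
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    n = k + r
    data_pos = [p for p in range(1, n + 1) if p & (p - 1)]  # non-powers of two
    return n, data_pos


def _syndrome(word: int, n: int) -> int:
    """XOR of the positions of all set bits in 1..n (0 for a valid codeword)."""
    s = 0
    for p in range(1, n + 1):
        if (word >> p) & 1:
            s ^= p
    return s


def secded_encode(data: int, k: int) -> int:
    n, data_pos = _layout(k)
    word = 0
    for i, p in enumerate(data_pos):
        if (data >> i) & 1:
            word |= 1 << p
    s = _syndrome(word, n)
    for j in range(s.bit_length()):  # set check bit 2^j for each 1 in the syndrome
        if (s >> j) & 1:
            word |= 1 << (1 << j)
    if bin(word).count("1") % 2:  # overall parity bit makes the total weight even
        word |= 1
    return word


def secded_decode(word: int, k: int) -> Tuple[Optional[int], str, int]:
    """Return (data or None, status, possibly corrected codeword)."""
    n, data_pos = _layout(k)
    s = _syndrome(word, n)
    odd = bin(word).count("1") % 2 == 1
    if s == 0 and not odd:
        status = "ok"
    elif odd and s <= n:
        word ^= 1 << s  # single error at position s (s == 0: the parity bit itself)
        status = "corrected"
    else:
        return None, "uncorrectable", word  # double (or worse) error: detect only
    data = 0
    for i, p in enumerate(data_pos):
        if (word >> p) & 1:
            data |= 1 << i
    return data, status, word


def scrub(memory: List[int], k: int) -> Tuple[List[int], List[int]]:
    """Hypothetical background pass: check every stored word, even idle ones."""
    corrected, uncorrectable = [], []
    for addr, word in enumerate(memory):
        _, status, fixed = secded_decode(word, k)
        if status == "corrected":
            memory[addr] = fixed  # write the repaired codeword back
            corrected.append(addr)
        elif status == "uncorrectable":
            uncorrectable.append(addr)  # report to the host; cannot be repaired here
    return corrected, uncorrectable


if __name__ == "__main__":
    K = 64
    originals = [0x0123456789ABCDEF, 0xDEADBEEF, 0]
    mem = [secded_encode(d, K) for d in originals]
    mem[0] ^= 1 << 17               # single-event upset
    mem[2] ^= (1 << 3) | (1 << 40)  # two upsets in one word
    print(scrub(mem, K))            # ([0], [2])
    print(secded_decode(mem[0], K)[0] == originals[0])  # True
```

Scrubbing matters because a single upset that sits unnoticed in an idle entry can later be joined by a second upset in the same word, turning a correctable error into an uncorrectable one.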

As we progress down the process technology curve, as TCAMs become ever larger, and as we squeeze every last bit of performance out of them, integrated ECC for soft error correction is the only cost-effective solution for equipment providers. Correcting soft errors cannot be an afterthought.

About the Author
Dave Cech is Director of Product Management and Marketing for the IP Co-processor product line at Integrated Device Technology Inc. (IDT; San Jose, Calif.). He earned a bachelor's degree in mathematics and economics from the University of California, Santa Barbara, and a master's degree in business administration from California's Golden Gate University. He can be reached at:
