Network Systems DesignLine | Start your crypto engine--cryptographic acceleration in SoCs

 HOW-TO

Start your crypto engine--cryptographic acceleration in SoCs

Cryptographic offload engines span such markets as DRM, VPN, storage and MACsec. Implementing configurable engines enables you to meet performance requirements while preserving the gate count economics that SoC designers need to hit end market cost goals. Here's how.

Encryption requirements are now found in almost every new SoC design. From digital rights management through storage security to virtual private network (VPN) applications, security is becoming a mandatory feature. Throughput requirements in modern networks are also rising significantly, so the processing required for encryption and decryption is substantial. This article focuses on symmetric offload in a packet processing system for IPsec, but the concepts apply equally well to SSL, SRTP and link security. The assumption is that the keys have already been derived, either through an administrative process or through a key exchange using asymmetric cryptography in software, so the SoC designer is focusing on the bulk encryption and hashing of packets in a VPN-enabled gateway design.

A typical SoC architecture for such a gateway is shown in Figure 1. This approach to cryptographic offload is sometimes referred to as look-aside security offload, in contrast to a flow-through engine, which captures VPN traffic directly from the MAC and processes it in line without significant processor interaction. The crypto engines presented in this paper are optimized for gate count and throughput in gateways, security appliances and handheld devices, and they scale in throughput from 1 Mbps up to 1 Gbps. Flow-through engines are best suited to ultra-high performance applications at 1 to 10 Gbps rates in high end security appliances.


Figure 1. Gateway SoC Architecture

To better understand the operation of the SoC system design, it is worth reviewing the routing and transformation that a packet is subjected to. A packet arriving at the gateway through an IPsec VPN tunnel active on the WAN port undergoes the following transformations:

  • The inbound WAN MAC DMAs the packet into embedded memory
  • NAT software determines that the packet requires an IPsec transform
  • Software matches the packet to a security policy and security association
  • The packet transformation and decryption are done in software and/or hardware
  • Software routes the packet to the L2 switch
  • The L2 switch forwards the packet to the destination LAN port
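
The software steps in the list above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the gateway's actual software: the Packet and SecurityAssociation types, the process_inbound function and the placeholder_transform routine are hypothetical names, the security association is matched on the ESP SPI alone, and the "decryption" is a stand-in that merely reverses the payload.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

ESP_PROTOCOL = 50  # IP protocol number assigned to IPsec ESP


@dataclass
class Packet:
    protocol: int          # IP protocol field of the received packet
    spi: int               # ESP Security Parameters Index (unused if not ESP)
    payload: bytes
    lan_port: str = ""     # filled in by the routing step


@dataclass
class SecurityAssociation:
    cipher: str                          # e.g. "AES" or "3DES"
    transform: Callable[[bytes], bytes]  # software routine or offload driver


def process_inbound(pkt: Packet,
                    sa_db: Dict[int, SecurityAssociation],
                    route: Callable[[Packet], str]) -> Optional[Packet]:
    """Apply the software steps that follow the MAC's DMA into memory."""
    if pkt.protocol == ESP_PROTOCOL:             # packet needs an IPsec transform
        sa = sa_db.get(pkt.spi)                  # match to a security association
        if sa is None:
            return None                          # no matching SA: drop the packet
        pkt.payload = sa.transform(pkt.payload)  # done in software and/or hardware
    pkt.lan_port = route(pkt)                    # routing decision for the L2 switch
    return pkt


def placeholder_transform(data: bytes) -> bytes:
    """Stand-in for decryption; reverses the bytes so the effect is visible."""
    return data[::-1]
```

The transform field is the point where the design decision discussed next shows up: it can point at a software cipher routine or at a driver for an offload engine without changing the rest of the path.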

The designer must therefore decide how to implement the packet transformation and cryptographic operations. Will the embedded CPU and associated software be able to handle the load? Or will some form of offload engine be required to reach the overall system performance goal? This paper explores the offload options available to designers and provides guidance on the class of engine to employ for each level of required throughput. The terms cryptographic, crypto and cipher are used interchangeably in the following text, and generally apply to hashing operations as well.

Processor-based security
The first option is to perform all security processing on the existing embedded processor. Extensive analysis of the cryptographic load on common embedded processors such as ARM and MIPS yields load factors of 30 MIPS per Mbps for the three-pass version of the Data Encryption Standard, commonly referred to as triple-DES or 3DES, and 10 MIPS per Mbps for the Advanced Encryption Standard (AES). 3DES remains relevant to current cryptographic design because it is a popular cipher option for IPsec in virtual private networking.

AES is the preferred cipher recommended by the U.S. National Institute of Standards and Technology (NIST) and is replacing 3DES, but the migration is gradual, with many legacy devices still active in networks. If an IPsec virtual private network design were to target 10 Mbps, for example, and the traffic exhibited a mix of 50% AES and 50% 3DES, the overall load on the processor would be 5 Mbps x 10 MIPS per Mbps plus 5 Mbps x 30 MIPS per Mbps, or 200 MIPS--a reasonable load for today's high speed embedded processors but a heavy burden for the low cost, low performance processors found in many handset devices.

Consider what happens if the traffic is increased to 100 Mbps. The processor load jumps to 2,000 MIPS, which means that most or all of the capability of the embedded processor is consumed by symmetric cryptography, leaving little or no capacity for the other activities required of the gateway. Options often considered by designers are to use a larger processor, increase the clock rate or even add an additional processor, but these solutions are rarely optimal from a cost perspective--e.g. incremental license fees, process migration expense, and larger gate counts.
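
The load estimate can be reproduced for any throughput and cipher mix with a few lines of code. The sketch below simply multiplies the per-cipher load factors quoted above (30 MIPS per Mbps for 3DES, 10 MIPS per Mbps for AES) by each cipher's share of the traffic; the function name crypto_load_mips and the dictionary layout are illustrative choices, not part of any tool.

```python
# Load factors quoted in the text, in MIPS per Mbps of traffic.
MIPS_PER_MBPS = {"3DES": 30.0, "AES": 10.0}


def crypto_load_mips(throughput_mbps: float, mix: dict) -> float:
    """Estimate processor load in MIPS for a given traffic mix.

    mix maps a cipher name to its fraction of the traffic; the
    fractions must sum to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic fractions must sum to 1")
    return sum(throughput_mbps * fraction * MIPS_PER_MBPS[cipher]
               for cipher, fraction in mix.items())


if __name__ == "__main__":
    for rate in (10, 100):
        load = crypto_load_mips(rate, {"AES": 0.5, "3DES": 0.5})
        print(f"{rate} Mbps, 50/50 AES and 3DES: {load:.0f} MIPS")
```

Because the model is linear, the load scales directly with throughput, which is why a tenfold increase in traffic quickly outruns what a faster clock or a larger core can absorb.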

Today's SoC designs can leverage one of five offload options which are explored in this paper:

  • Implement Processor Instruction Set Enhancements
  • Integrate Slave Cryptographic IP Cores
  • Integrate Cryptographic Cores with Linear Mastering Capability
  • Integrate Cryptographic Cores with Scatter/Gather Mastering Capability
  • Integrate Packet Transformation Engines

Option 1: Enhanced instruction set
Some embedded processor architectures, such as the Tensilica Xtensa and ARC processor cores, allow the designer to add custom instructions. Cryptographic functions typically involve many fine-grained bit manipulation operations, which take a large number of instructions to perform in software. Adding enhanced instructions helps by decreasing the total number of instructions required to perform a cryptographic operation. Some examples include:

  • Barrel shift/rotate instructions
  • Wide register manipulation, e.g. AES works on 128-bit blocks
  • Specific bit manipulation instructions, e.g. an instruction to perform the AES/DES S-box manipulation on a wide register
  • Galois Field arithmetic

This approach has a significant drawback in that the enhanced instruction set must be supported in the tool chain. If this issue can be overcome, the designer can expect an improvement of up to 50% in throughput, which may be enough to meet the system performance requirement. If not, adding a separate cryptographic engine as an IP core is the right path to follow.
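
To see why such operations are expensive without dedicated instructions, consider how two of them look when built from generic shifts, masks and XORs. The sketch below is written in Python for readability rather than as target code; the function names are illustrative. It shows a 32-bit rotate composed of two shifts, an OR and a mask, and a multiplication in GF(2^8) using the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 as defined in FIPS 197.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """32-bit rotate left built from two shifts, an OR and a mask."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def xtime(b: int) -> int:
    """Multiply a byte by x (that is, by 2) in GF(2^8), AES polynomial."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B  # reduce modulo x^8 + x^4 + x^3 + x + 1
    return b & 0xFF


def gf256_mul(a: int, b: int) -> int:
    """Shift-and-add multiply in GF(2^8): eight xtime steps plus conditional XORs."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result
```

Each call to gf256_mul runs an eight-iteration loop of tests, shifts and XORs for a single byte product. A custom Galois Field or rotate instruction collapses that sequence into one operation, which is where the instruction count savings come from.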

Option 2: Pure slave cryptographic IP cores
The next level of offload is the addition of a pure slave crypto module. This brings cryptographic acceleration with a minimal gate footprint, as the crypto module consists of only the circuitry required to implement the datapath algorithm. The cipher engine is simply memory mapped as a set of control and data registers, and software must sequence each block of data through it. This greatly accelerates the cryptographic algorithm, but requires substantial host processor involvement to feed the cipher datapath (see Figure 2).


Figure 2. Slave cryptographic IP cores

Because the host processor must still feed the cipher core block by block, designers generally apply this class of offload engine to IPsec requirements up to 10 Mbps, in applications such as residential gateways and VDSL modems.
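
The processor involvement described above can be made concrete with a small simulation. Everything in the sketch below is an assumption made for illustration: the register offsets, the control and status bits, and the SlaveCipherCore class are hypothetical and do not describe any particular IP core, and the "datapath" is a placeholder that XORs the data with the key rather than a real cipher. What it does show is the driver pattern: for every 16-byte block, software writes the data registers, starts the engine, polls for completion and reads the result back.

```python
import struct

# Hypothetical register map; a real core defines its own offsets and bits.
REG_KEY0, REG_DATA0, REG_CTRL, REG_STATUS = 0x00, 0x10, 0x20, 0x24
CTRL_START, STATUS_DONE = 0x1, 0x1
BLOCK_BYTES = 16  # AES block size


class SlaveCipherCore:
    """Software stand-in for a memory-mapped slave core; counts bus accesses."""

    def __init__(self):
        self.regs = {}
        self.accesses = 0

    def write_reg(self, offset, value):
        self.accesses += 1
        self.regs[offset] = value & 0xFFFFFFFF
        if offset == REG_CTRL and value & CTRL_START:
            self._run_datapath()

    def read_reg(self, offset):
        self.accesses += 1
        return self.regs.get(offset, 0)

    def _run_datapath(self):
        # Placeholder transform (XOR with the key words), NOT a real cipher.
        self.regs[REG_STATUS] = 0
        for i in range(4):
            data = self.regs.get(REG_DATA0 + 4 * i, 0)
            key = self.regs.get(REG_KEY0 + 4 * i, 0)
            self.regs[REG_DATA0 + 4 * i] = data ^ key
        self.regs[REG_STATUS] = STATUS_DONE


def process_buffer(core, key, data):
    """Sequence every block through the core, as host software must for a pure slave."""
    if len(key) != BLOCK_BYTES or len(data) % BLOCK_BYTES:
        raise ValueError("key must be one block; data must be whole blocks")
    for i, word in enumerate(struct.unpack("<4I", key)):
        core.write_reg(REG_KEY0 + 4 * i, word)
    out = bytearray()
    for off in range(0, len(data), BLOCK_BYTES):
        block = data[off:off + BLOCK_BYTES]
        for i, word in enumerate(struct.unpack("<4I", block)):
            core.write_reg(REG_DATA0 + 4 * i, word)       # feed the datapath
        core.write_reg(REG_CTRL, CTRL_START)               # start the engine
        while not core.read_reg(REG_STATUS) & STATUS_DONE:
            pass                                            # poll for completion
        words = [core.read_reg(REG_DATA0 + 4 * i) for i in range(4)]
        out += struct.pack("<4I", *words)                   # read the result back
    return bytes(out)
```

In this sketch each block costs ten register accesses (four writes, one start, one status read, four reads) on top of the one-time key load, all executed by the host processor. That per-block overhead is the involvement that limits a pure slave core as throughput rises.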
