
General Description of AST's CDSP-k5 Customizable Complex DSP Core
(preliminary information)

The CDSP-k5 is the fifth member of a family of high-performance, customizable fixed-point DSP cores. It is meant to be used as an embedded cell in ASICs developed on most 0.25u and smaller process technologies. It is highly customizable and can be targeted at a large number of technologies thanks to its parameterized, HDL-only design.

The CDSP-k5 processes two words in parallel (up to 32 bits each), interpreted either as a complex number or as a pair of real numbers. The CDSP-k5 hosts a 4-unit complex multiplier that sustains one complex MAC per clock cycle, or two parallel real MACs per clock cycle; in both cases saturation logic can be enabled to assist these operations.
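
As a rough illustration of the two operating modes described above, the following Python sketch models a complex multiply-accumulate built from four real products, and a pair of independent real multiply-accumulates, each with optional saturation. The function names, the (re, im) tuple layout, and the choice to saturate the result to the word width are assumptions made for this example; the actual accumulator width, rounding and fixed-point scaling of the core are not given here.

```python
def saturate(value, bits):
    """Clamp a signed integer to the range of a bits-wide two's-complement word."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def complex_mac(acc, a, b, bits=32, sat=True):
    """Return acc + a*b, with complex values given as (re, im) integer pairs.

    The four products stand in for the four multiplier units (assumed mapping).
    """
    a_re, a_im = a
    b_re, b_im = b
    p0 = a_re * b_re
    p1 = a_im * b_im
    p2 = a_re * b_im
    p3 = a_im * b_re
    re = acc[0] + p0 - p1
    im = acc[1] + p2 + p3
    if sat:
        re, im = saturate(re, bits), saturate(im, bits)
    return (re, im)


def dual_real_mac(acc, a, b, bits=32, sat=True):
    """Return two independent real MACs: (acc0 + a0*b0, acc1 + a1*b1)."""
    r0 = acc[0] + a[0] * b[0]
    r1 = acc[1] + a[1] * b[1]
    if sat:
        r0, r1 = saturate(r0, bits), saturate(r1, bits)
    return (r0, r1)
```

For example, `complex_mac((0, 0), (1, 2), (3, 4))` accumulates (1 + 2j)(3 + 4j), and with `bits=16` a result that exceeds the 16-bit range is clamped rather than wrapped.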

The CDSP-k5 includes a high performance, customizable hardware acceleration unit that has been optimized for most of the common DSP algorithms (MAC/FIR/Correlation, LMS, IIR, FFT/iFFT, Search/Min/Max, Matrix/Vector operations). The modular design of the core allows stripped-down versions to be easily obtained, while a number of list-box/check-box customizable features enable on-the-fly tuning of the design to match the user's specifications. The user-guided customization process can thus achieve a highly efficient, low power and small area implementation, making the CDSP-k5 well suited for high-volume, low-cost applications, while still delivering world-class performance.

Some of the CDSP-k5 general registers can be used to interface application-specific hardware accelerators; this offers the user a convenient and effective way to tightly interact with the internal CDSP structure. Also, both the ALU and the MAC can be completely replaced and/or complemented with user-defined hardware structures.

A number of productivity tools have been developed to ease the development and deployment of DSP applications on the CDSP. These include an Assembly Language Integrated Development Environment (aIDE) and a collection of standard DSP functions (CDSPLib). A K&R C Language Integrated Development Environment (cIDE) is currently under development.
 
 

Architectural features:

  • Single-cycle execution for most instructions.
  • Operates directly on complex numbers, or on pairs of real numbers
  • Highly orthogonal, two-operand instruction set, with one operand residing in a register, and the other in a register or memory location
  • Unified data memory addressing replaces the traditional DSPs' X and Y data memories
  • Configurable MAC unit optimized for most common DSP algorithms, enabling execution speeds comparable with cutting-edge parallel DSP processors on the market
  • Saturation logic built into both the ALU and the MAC units
  • Up to four index registers fully featured with modulo and bit-reversed post-increment addressing capability
  • Zero-cycle Block-repeat capability plus a standard looping instruction
  • Dynamic shift instruction, plus a choice of static shifts (both arithmetic and logical)
  • Compact code and large addressing space
  • Low power dissipation, achieved by disabling the logic modules that are inactive in a given clock cycle
  • Less-than-one-cycle response when in wait mode, allowing fast synchronization with predictable asynchronous events
  • Six internal 64-bit data buses enabling up to six internal complex-data transfers per cycle, or twelve internal real-data transfers per cycle
  • Special bank-based memory architecture enabling efficient usage of data types that are smaller than a processor word
  • Synchronous program memory implementable as a RAM/ROM combination, giving the DSP run-time programmability
  • Interface registers to allow application-specific hardware acceleration modules to be tightly integrated with the core
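
The modulo and bit-reversed post-increment addressing modes listed above can be sketched in Python as follows. This is a generic model of the two techniques, not a description of the CDSP-k5 address generators: the function names, argument order and buffer layout are assumptions for illustration.

```python
def modulo_post_increment(index, step, base, length):
    """Return (address_used, next_index) for a circular buffer.

    The buffer occupies addresses base .. base + length - 1; the index is
    used first and then advanced, wrapping around inside the buffer.
    """
    address = index
    next_index = base + (index - base + step) % length
    return address, next_index


def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of value."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_reversed_sequence(bits):
    """Addresses visited when a linear counter is bit-reversed (FFT ordering)."""
    return [bit_reverse(i, bits) for i in range(1 << bits)]
```

Modulo addressing keeps a delay line for FIR or correlation in a fixed-size circular buffer without explicit wrap-around tests, while bit-reversed addressing reorders the inputs or outputs of a radix-2 FFT without a separate shuffle pass.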

 


Customizable features include:

  • The size of the processor word (up to 32 bits)
  • The RAM and ROM sizes
  • The number of general registers
  • The number of index registers and the features of the address generators, including modulo and bit-reversed addressing modes
  • The performance of the MAC unit, ranging from a simple multiplier producing one result bit per cycle up to a state-of-the-art, fully pipelined, single-cycle complex hardware accelerator
  • The saturation and rounding options built into the ALU and the MAC
  • The amount of shifting for the static shift instructions
  • The addressing space (up to 2 GW)
  • The number, size and operation mode of the communication ports
  • And more...
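
To make the limits quoted in this list concrete, the following Python sketch collects a few of the customizable parameters in a validated record. The class name `CdspConfig`, its field names and its default values are purely hypothetical and do not correspond to any AST tool or file format; the only figures taken from the text are the 32-bit maximum word size, the maximum of four index registers, and the 2 GW addressing space, which is read here as 2^31 words (an assumption).

```python
from dataclasses import dataclass

MAX_WORD_BITS = 32                 # "up to 32 bits"
MAX_INDEX_REGISTERS = 4            # "up to four index registers"
MAX_ADDRESS_WORDS = 2 * 2**30      # "up to 2 GW", assumed to mean 2**31 words


@dataclass(frozen=True)
class CdspConfig:
    """Hypothetical container for a few customization choices."""
    word_bits: int = 16
    index_registers: int = 4
    address_space_words: int = 64 * 1024
    saturation: bool = True

    def __post_init__(self):
        # Reject values outside the limits quoted in the feature lists.
        if not 1 <= self.word_bits <= MAX_WORD_BITS:
            raise ValueError("word_bits must be between 1 and 32")
        if not 1 <= self.index_registers <= MAX_INDEX_REGISTERS:
            raise ValueError("index_registers must be between 1 and 4")
        if not 1 <= self.address_space_words <= MAX_ADDRESS_WORDS:
            raise ValueError("address_space_words exceeds 2 GW")

    @property
    def pair_bits(self):
        """Width of the two-word (complex or real-pair) datum."""
        return 2 * self.word_bits
```

With a 32-bit word, `pair_bits` evaluates to 64, which matches the 64-bit width quoted for the internal data buses.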

 


Performance for a typical 0.25u/3V technology implementation:

  • The CDSP-k5 is implemented in two versions: a 4-stage pipeline version, CDSP-k5-4, and a 6-stage pipeline version, CDSP-k5-6. The 4-stage version chains memory accesses and internal DSP processing in the same clock cycle, while the 6-stage version pipelines them into separate stages; as a result, the 6-stage version runs at twice the clock speed of the 4-stage version. The critical path inside the CDSP-k5 is less than 5ns, leading to 200MHz operation for the 6-stage version (CDSP-k5-6), and 100MHz operation for the 4-stage version (CDSP-k5-4).
  • The CDSP-k5 operates at a sustained rate of 100MIPS at 100MHz, or 200MIPS at 200MHz. Very high performance is achieved for typical DSP algorithms by having up to eight internal arithmetic units, plus two address generator units, working in parallel every clock cycle. The CDSP's ALU and MAC units have been designed with special emphasis on efficient usage of the hardware resources during typical DSP algorithms, leading to effective speeds of 2GOPS (Giga Operations Per Second). Examples of algorithms that fully utilize this computing power are Complex FIR, Complex Correlation, Complex Matrix Multiplication and Complex Energy calculation. For other algorithms, such as Real FIR, Real Correlation, Real Energy calculation, FFT, iFFT, LMS-based Complex FIR update and Echo Cancellation, speeds between 1GOPS and 1.5GOPS are obtained.
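
The figures above can be cross-checked with a short calculation. The sketch below assumes that each of the eight arithmetic units and two address generators counts as one operation per cycle, and that the 4-stage version chains two segments of roughly 5ns each in one clock period. Both are readings of the text rather than published definitions, and the function names are hypothetical.

```python
def clock_mhz(critical_path_ns, chained_segments=1):
    """Clock rate in MHz when `chained_segments` critical paths share one cycle."""
    return 1000.0 / (critical_path_ns * chained_segments)


def peak_gops(clock_rate_mhz, arithmetic_units=8, address_units=2):
    """Peak operations per second, in GOPS, assuming one operation per unit per cycle."""
    ops_per_cycle = arithmetic_units + address_units
    return clock_rate_mhz * ops_per_cycle / 1000.0


six_stage_mhz = clock_mhz(5.0)                        # 6-stage: one 5ns segment per cycle
four_stage_mhz = clock_mhz(5.0, chained_segments=2)   # 4-stage: two segments chained
six_stage_gops = peak_gops(six_stage_mhz)
```

Under these assumptions the 6-stage version works out to 200MHz and 2GOPS, and the 4-stage version to 100MHz, in line with the numbers quoted.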
