# **Design and Implementation of Word-Parallel Digital Associative Memories**

Yusuke Oike<sup>†</sup>, Makoto Ikeda<sup>†‡</sup>, and Kunihiro Asada<sup>†‡</sup>

<sup>†</sup>Dept. of Electronic Engineering, University of Tokyo <sup>‡</sup>VLSI Design and Education Center (VDEC), University of Tokyo 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan {y-oike,ikeda,asada}@silicon.u-tokyo.ac.jp

Abstract: We present word-parallel digital associative memories with exact Hamming/Manhattan distance computation. A logic-in-memory digital implementation realizes a word-parallel, hierarchical search architecture. It operates at high speed even for a large number of input bits, and it detects data close to the input in fewer clock cycles. The circuit implementation places no theoretical limit on data capacity and allows low-voltage operation below 1.0 V for system-on-a-chip applications. This capacity scalability makes it straightforward to evaluate Manhattan distance using thermometer encoding. We have designed 64-bit 32-word associative memories using a 1P5M 0.18 $\mu$m CMOS process. They operate at 411.5 MHz and 40.0 MHz at supply voltages of 1.8 V and 0.75 V, respectively.

## I. INTRODUCTION

Some applications, such as data compression, pattern recognition, multimedia, and intelligent processing, require considerable memory access and processing time. Content-addressable memories have therefore been developed to reduce the access and data processing time for nearest-match detection [1]–[5]. Their circuit implementations are compact because they employ analog circuit techniques for Hamming/Manhattan distance estimation. However, it is difficult to operate them with faultless precision in a deep sub-micron (DSM) process and at a low supply voltage. Moreover, the feasible data capacity is limited by the analog operation. They are therefore not suitable for a system-on-a-chip VLSI in DSM process technologies.

We have proposed a hierarchical search architecture capable of word-parallel Hamming/Manhattan distance computation [6]. It has three principal advantages. (1) The hierarchical search architecture enables a high-speed search in a large database. The search clock period is limited by $O(\sqrt{N})$ or $O(\log M)$ at an *N*-bit *M*-word data capacity. In addition, there is in theory no limitation on the number of data patterns *M*, the bit length *N*, or the data distance *D*. (2) The architecture supports low-voltage operation in a DSM process. The circuit implementation tolerates device fluctuation and allows low-voltage operation below 1.0 V, which is difficult to attain with conventional analog approaches. (3) It provides additional functions for associative processing: the architecture outputs data addresses together with their exact Hamming/Manhattan distance, in sorted order.

## II. HIERARCHICAL ARCHITECTURE AND CIRCUIT DESIGN

We propose a logic-in-memory architecture using word-parallel search signal propagation via chained search circuits. The Hamming distance search operation consists of data comparison, search signal propagation, and mismatch masking. First, an input is compared with all the template data in bit parallel using XOR gates. Then, the mismatch bits are counted in word parallel by the chained search circuits, as shown in Fig.1. Because the search clock period is limited by the search signal propagation through the chained search circuits, the template data are divided into blocks that are connected by hierarchical nodes, as shown in Fig.2. Each hierarchical node provides a permission signal to the next block and to the next hierarchical node. The permission signal makes a mismatch bit maskable.
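The three steps above (comparison, propagation, masking) can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit of Fig.1: it assumes that in each clock the search signal of a word stops at its first unmasked mismatch bit and that this bit is masked for the next clock, so a word at distance *d* is detected in clock *d* + 1. The function name `hamming_search` and its parameters are hypothetical, and the hierarchical block structure is not modeled.

```python
from typing import List, Tuple


def hamming_search(query: int, templates: List[int],
                   n_bits: int) -> List[Tuple[int, int, int]]:
    """Return (clock, address, distance) tuples in detection order."""
    word_mask = (1 << n_bits) - 1
    # Data comparison: bit-parallel XOR of the input with every template.
    mismatches = [(query ^ t) & word_mask for t in templates]
    detected = [False] * len(templates)
    results = []
    # One search-signal propagation per clock, evaluated for all words.
    for clock in range(1, n_bits + 2):
        for addr, m in enumerate(mismatches):
            if detected[addr]:
                continue
            if m == 0:
                # No unmasked mismatch remains: the signal reaches the
                # end of the chain and the word is detected.
                detected[addr] = True
                results.append((clock, addr, clock - 1))
            else:
                # Assumption: mask one mismatch bit (the lowest one)
                # so the signal can pass it in the next clock.
                mismatches[addr] = m & (m - 1)
        if all(detected):
            break
    return results


if __name__ == "__main__":
    stored = [0b10110010, 0b10110011, 0b01001101, 0b10010010]
    for clock, addr, dist in hamming_search(0b10110010, stored, 8):
        print(f"clock {clock}: address {addr}, distance {dist}")
```

In this model the detection order is the sorted order of Hamming distance, and an *N*-bit word needs at most *N* + 1 clocks.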



Fig. 1. Circuit configurations of an associative memory cell: (a) using a static circuit implementation, (b) using a dynamic circuit implementation. (SRAM part of even-numbered cell is omitted)



Fig. 2. Block diagram: (a) associative memories, (b) a hierarchical word structure.

## III. CHIP IMPLEMENTATION

We have designed 64-bit 32-word associative memories with the static circuit implementation shown in Fig.1 (a) using a 1P5M 0.18 $\mu$m CMOS process<sup>1</sup>. Fig.3 shows the chip microphotograph and the cell layouts. The chip also contains 64-bit 2-word associative memories with the dynamic circuit implementation shown in Fig.1 (b) as a feasibility test. A two-stage hierarchical structure is implemented as shown in Fig.2 (b). Because the number of hierarchical nodes differs from one propagation path to another, the number of blocks and the bit length of each block are chosen to minimize the critical path.

<sup>1</sup>The VLSI chip in this study has been fabricated through VLSI Design and Education Center (VDEC), University of Tokyo in collaboration with Hitachi Ltd. and Dai Nippon Printing Co.

Fig. 4. Measured waveforms of the search signal propagation.

## IV. MEASUREMENT RESULTS

Fig.4 shows waveforms measured with an electron beam probe at room temperature. The delay time of the search signal propagation is 2.18 ns in the worst case. In addition to Hamming distance computation, the associative memories are capable of Manhattan distance computation using a thermometer encoding technique [5]. Fig.5 shows function test results of Manhattan distance computation. The number of elapsed clock cycles represents the distance of the detected data, so all the data are detected in sorted order of Hamming/Manhattan distance. Furthermore, the associative memories provide the detected data addresses with the exact Hamming/Manhattan distance regardless of the bit length, the number of words, and the data distance. This feature is important for ensuring high capacity scalability and high reliability in distance computation.
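The reason thermometer encoding turns a Hamming distance search into a Manhattan distance search can be checked with a short sketch. The helper names (`thermometer`, `encode_vector`, `hamming`) and the choice of 7 levels per element are illustrative assumptions; the source does not specify how the 64-bit words are partitioned.

```python
from typing import Sequence


def thermometer(value: int, levels: int) -> int:
    """Encode 0 <= value <= levels as a `levels`-bit code whose
    lowest `value` bits are 1 (e.g. 3 of 7 -> 0b0000111)."""
    if not 0 <= value <= levels:
        raise ValueError("value out of range for thermometer code")
    return (1 << value) - 1


def encode_vector(values: Sequence[int], levels: int) -> int:
    """Concatenate the thermometer codes of all elements into one word."""
    word = 0
    for v in values:
        word = (word << levels) | thermometer(v, levels)
    return word


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two words."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    a, b = [3, 0, 7], [5, 2, 4]
    manhattan = sum(abs(x - y) for x, y in zip(a, b))
    # Two thermometer codes differ in exactly |x - y| bit positions,
    # so the Hamming distance of the encoded words equals the
    # Manhattan distance of the original vectors.
    print(manhattan, hamming(encode_vector(a, 7), encode_vector(b, 7)))
```

The cost of this encoding is word length: an element with *L* levels occupies *L* bits rather than about log2(*L* + 1) bits, which is why scalability in bit length matters here.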

The measurement results show that the operation speed is 411.5 MHz and 40.0 MHz at supply voltages of 1.8 V and 0.75 V, respectively. Fig.6 (a) shows the operation speed as a function of the supply voltage from 0.75 V to 1.8 V. The total search time increases in proportion to the distance of the detected data. For example, nearest-match detection is completed in 17 clock periods (i.e., 41.3 ns) when the nearest-match data has a 16-bit distance from the input. The worst-case operation of nearest-match detection or data sorting requires 65 clock periods and thus takes 158.0 ns. Fig.6 (b) shows the relation between the search clock period and the data capacity. The hierarchical search architecture maintains a high-speed search operation in a large database. Table I summarizes the chip specifications.
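The quoted search times follow from the clock frequency and the distance. A minimal sketch of that arithmetic, assuming (as the two examples in the text imply) that data at a distance of *d* bits is detected after *d* + 1 clock periods; the function name `search_time_ns` is hypothetical:

```python
def search_time_ns(distance_bits: int, clock_mhz: float) -> float:
    """Search time in ns for data at the given distance,
    assuming detection takes (distance + 1) clock periods."""
    clocks = distance_bits + 1
    period_ns = 1e3 / clock_mhz  # 1 / MHz expressed in ns
    return clocks * period_ns


if __name__ == "__main__":
    # 16-bit distance at 411.5 MHz: 17 clocks
    print(f"{search_time_ns(16, 411.5):.1f} ns")
    # Worst case for 64-bit words at 411.5 MHz: 65 clocks
    print(f"{search_time_ns(64, 411.5):.1f} ns")
```

At 411.5 MHz the clock period is about 2.43 ns, which gives roughly 41.3 ns for 17 clocks and 158.0 ns for 65 clocks, matching the figures above.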



Fig. 6. Performance evaluation: (a) operation frequency vs. power supply voltage, (b) data capacity vs. search clock period.

TABLE I. Chip specifications.

| Parameter                | Value                                                        |
|--------------------------|--------------------------------------------------------------|
| Process                  | 1P5M 0.18 μm CMOS process                                    |
| Supply Voltage           | 0.75 V – 1.8 V                                               |
| Capacity                 | 64 bit $\times$ 32 word                                      |
| Module Size              | $475 \mu\text{m} \times 1160 \mu\text{m} (0.55 \text{mm}^2)$ |
| Measured Operation Speed | 411.5 MHz (@ 1.8 V)                                          |
|                          | 40.0 MHz (@ 0.75 V)                                          |
| Max. Search/Sorting Time | 158.0 ns (0-bit to 64-bit HD)                                |
| Power Dissipation        | 51.3 mW (@ 1.8 V, 400 MHz)                                   |
|                          | 1.18 mW (@ 0.75 V, 40 MHz)                                   |

## V. CONCLUSIONS

We have proposed a new concept and circuit implementation of high-speed, low-voltage associative memories with exact Hamming/Manhattan distance computation. A hierarchical search architecture attains a high-speed search operation in a large database. The digital circuit implementation provides high tolerance for device fluctuation and allows low-voltage operation below 1.0 V. Furthermore, it is capable of a continuous search operation for data sorting in addition to the traditional nearest-match detection. We have designed 64-bit 32-word associative memories using a 0.18 $\mu$m CMOS process. They operate at 411.5 MHz and 40.0 MHz at supply voltages of 1.8 V and 0.75 V, respectively.

## REFERENCES

- [1] T. Yamashita et al., "Neuron MOS Winner-Take-All Circuit and Its Application to Associative Memory," *ISSCC Dig. Tech. Papers*, pp. 236 – 237, 1993.
- [2] M. Nagata et al., "A Minimum-Distance Search Circuit using Dual-Line PWM Signal Processing and Charge-Packet Counting Techniques," *ISSCC Dig. Tech. Papers*, pp. 42 – 43, 1997.
- [3] M. Ikeda et al., "Time-Domain Minimum-Distance Detector and Its Application to Low-Power Coding Scheme on Chip-Interface," *Proc. of Eur. Solid-State Circuit Conf. (ESSCIRC)*, pp. 464 – 467, 1998.
- [4] H. J. Mattausch et al., "An Architecture for Compact Associative Memories with Deca-ns Nearest-Match Capability up to Large Distances," *ISSCC Dig. Tech. Papers*, pp. 170 – 171, 2001.
- [5] H. J. Mattausch et al., "Fully-Parallel Pattern-Matching Engine with Dynamic Adaptability to Hamming or Manhattan Distance," *Symp. on VLSI Circuits Dig. Tech. Papers*, pp. 252 – 255, 2002.
- [6] Y. Oike et al., "A High-Speed and Low-Voltage Associative Co-Processor With Hamming Distance Ordering Using Word-Parallel and Hierarchical Search Architecture," *Proc. of IEEE Custom Integrated Circuits Conf.* (CICC), pp. 643 – 646, 2003.