Smart Image Sensors and Associative Engines
for Three Dimensional Image Capture

A Dissertation
Submitted to the Department of
Electronic Engineering,
the University of Tokyo
in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy

Supervisor: Professor Kunihiro Asada

Yusuke Oike

DEPARTMENT OF ELECTRONIC ENGINEERING,
THE UNIVERSITY OF TOKYO

December 2004

# Abstract

This thesis focuses on smart image sensors and associative engines for three-dimensional image capture. We address current issues in a high-speed and high-resolution 3-D image capture system, and propose new frame access techniques, sensing schemes, sensor architectures, and circuit designs. We also propose new associative engines with high capacity scalability for 3-D image processing.

Chapter 2 proposes a high-speed dynamic frame access technique for a real-time, high-resolution 3-D image sensor. It permits a compact pixel circuit and achieves high-speed position detection on the sensor plane. A $640 \times 480$ 3-D image sensor has been designed and successfully demonstrated in a real-time, high-resolution range finding system. It attains 65.1 range maps/s and a range accuracy of 0.87 mm at a distance of 1200 mm. A scaled-up version with $1024 \times 768$ pixels has also been developed. Furthermore, we propose a column-parallel ambient light suppression technique that is applicable to the dynamic frame access technique. A $352 \times 288$ 3-D image sensor efficiently suppresses high-contrast ambient light, device fluctuations, and select timing variations. These techniques realize a real-time 3-D image capture system with high pixel resolution using low-intensity beam projection.

Chapter 3 presents a row-parallel frame access architecture for a 1,000-fps range finder. It employs a chained search circuit embedded in each pixel. A $375 \times 365$ ultra-fast range finder attains a frame access rate of 394.5 kHz, which is capable of delivering 1052 range maps/s. These techniques open the way to future applications that require extremely high-speed and high-accuracy 3-D image capture.
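The two figures quoted for the Chapter 3 sensor can be related by simple arithmetic. The sketch below assumes that one range map is assembled from 375 frame accesses, one per scan position. That count is an assumption inferred from the array dimension, not a statement from the abstract, and the function name is hypothetical.

```python
def range_maps_per_second(frame_access_rate_hz: float, frames_per_range_map: int) -> float:
    """Range-map rate when each map is built from a fixed number of frame accesses."""
    if frames_per_range_map <= 0:
        raise ValueError("frames_per_range_map must be positive")
    return frame_access_rate_hz / frames_per_range_map


FRAME_ACCESS_RATE_HZ = 394.5e3  # frame access rate quoted in the abstract
FRAMES_PER_RANGE_MAP = 375      # ASSUMPTION: one frame access per scan position

rate = range_maps_per_second(FRAME_ACCESS_RATE_HZ, FRAMES_PER_RANGE_MAP)
print(f"{rate:.1f} range maps/s")  # prints "1052.0 range maps/s"
```

Under this assumption the quotient matches the quoted 1052 range maps/s.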

Chapter 4 proposes a new sensing scheme for low-intensity beam detection in a robust range finding system. It realizes high sensitivity, high selectivity, and availability under wide-range background illumination. A $120 \times 110$ position sensor achieves highly sensitive light detection at a signal-to-background ratio of -18.0 dB in 48.0 dB background illumination. It is advantageous for application fields that require eye-safe light projection under various measurement conditions.

Chapter 5 presents a pixel-level color image sensor with efficient ambient light suppression. A $64 \times 64$ prototype image sensor supports innate color capture and object extraction for image recognition in various measurement situations. Furthermore, we present a low-intensity beacon detector for augmented reality systems. A $128 \times 128$ prototype beacon detector achieves high-speed beacon detection of $4850 \text{ bit/ID-sec}$ at a signal-to-background ratio of -10.0 dB. It acquires a scene image together with the locations, IDs, and additional information of multiple target objects simultaneously and in real time. These features realize a robust augmented reality system in various scene conditions.

Chapter 6 proposes a new concept and circuit implementation for a high-speed, low-voltage associative engine with exact Hamming distance search. It imposes no limit on data capacity and maintains high-speed operation with a large database, owing to a hierarchical search architecture and synchronous search logic embedded in each memory cell. The circuit implementation realizes high tolerance to device fluctuations in deep sub-micron (DSM) process technologies and low-voltage operation below 1.0 V. A 64-bit 32-word associative engine achieves an operation speed of 411.5 MHz at 1.8 V and also attains low-voltage operation at 40 MHz and 0.75 V.
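As a point of reference for what an exact Hamming distance search returns, the following is a minimal software model. It is not the circuit proposed in the thesis: the hierarchical, word-parallel hardware is replaced here by a plain sequential scan. The function names are hypothetical, and breaking ties in favor of the lowest address is an assumption made for illustration.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two words differ."""
    return bin(a ^ b).count("1")


def nearest_match(stored: list[int], query: int) -> tuple[int, int]:
    """Return (address, distance) of the stored word closest to the query.

    ASSUMPTION: ties are resolved in favor of the lowest address.
    """
    if not stored:
        raise ValueError("stored must not be empty")
    address = min(range(len(stored)),
                  key=lambda i: (hamming_distance(stored[i], query), i))
    return address, hamming_distance(stored[address], query)


# Toy database of 4-bit words; the engine in the abstract holds 32 words of 64 bits.
database = [0b1111, 0b1010, 0b0001]
print(nearest_match(database, 0b1011))  # prints "(0, 1)"
```

Addresses 0 and 1 are both at distance 1 from the query, so the assumed tie-break selects address 0.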

Chapter 7 shows a hierarchical multi-chip architecture using fully digital, word-parallel associative memories based on Hamming distance. The multi-chip structure efficiently realizes high capacity scalability by using an inter-chip pipelined priority decision circuit. The performance evaluation shows that the hierarchical multi-chip architecture is capable of high-speed, continuous associative processing based on Hamming distance with a megabit data capacity.

Chapter 8 describes a new word-parallel architecture and digital circuit implementation for accurate, wide-range Manhattan distance computation, which employs a hierarchical search path and a weighted search clock technique. The weighted search clock technique performs wide-range associative processing with fewer additional cycles. An associative engine with 64 words of 8 bits $\times$ 32 elements has successfully performed the Manhattan distance computation. The worst-case search time for sorting all the stored data is 5.85 µs at a supply voltage of 1.8 V.
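The result of a Manhattan distance search over multi-element words can be stated compactly in software. The sketch below is only a functional reference for the distance and for ranking all stored words by it. It does not model the hierarchical search path or the weighted search clock. The function names are hypothetical, and ordering ties by address is an assumption.

```python
ELEMENTS_PER_WORD = 32  # from the abstract
BITS_PER_ELEMENT = 8    # from the abstract
# Largest possible distance between two words: every element differs by 255.
MAX_DISTANCE = ELEMENTS_PER_WORD * (2 ** BITS_PER_ELEMENT - 1)


def manhattan_distance(a: list[int], b: list[int]) -> int:
    """Sum of absolute element-wise differences between two words."""
    if len(a) != len(b):
        raise ValueError("words must have the same number of elements")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_distance(stored: list[list[int]], query: list[int]) -> list[int]:
    """Addresses of all stored words, nearest first.

    ASSUMPTION: words at equal distance are ordered by ascending address.
    """
    return sorted(range(len(stored)),
                  key=lambda i: (manhattan_distance(stored[i], query), i))


# Toy example with 3-element words to keep the output readable.
words = [[0, 0, 0], [10, 10, 10], [1, 2, 3]]
print(rank_by_distance(words, [1, 1, 1]))  # prints "[0, 2, 1]"
print(MAX_DISTANCE)                        # prints "8160"
```

With 32 elements of 8 bits each, the distance can range from 0 to 8160, which indicates the width of the search range the hardware has to cover.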

Chapter 9 discusses associative processing for 3-D image capture. We address a 3-D object-clipping algorithm and present an associative processing flow using a chain search algorithm. We demonstrate the feasibility of associative processing for 3-D object clipping.

The frame access techniques and sensing schemes efficiently realize a high-speed, high-resolution, and robust 3-D image capture system. The digital associative processing architectures attain high-speed data search and high capacity scalability. The proposed smart image sensors and associative engines will therefore contribute significantly to the advancement of 3-D image capture systems and become a driving force for future applications with high-quality 3-D images.

# Acknowledgements

I would like to express my heartfelt gratitude to Prof. Kunihiro Asada for his keen insight, guidance, encouragement, and faith in me throughout my graduate studies. His enthusiasm for teaching and research offered challenging opportunities to express my creativity without barriers, and his constant support and fruitful discussion on my research, since my undergraduate years, led me to become a full-fledged member of society and brought my research to success. I feel very fortunate to have taken him as my supervisor, and the precious experiences in my study days will be irreplaceable assets in my life.

I am deeply grateful to Prof. Makoto Ikeda for meaningful discussion on my research and for making many opportunities for my chip fabrication. He was willing to spend time providing a comfortable environment to promote my research progress and a relaxed atmosphere to exchange opinions. His constructive support was indispensable for making my research activities successful.

I wish to give my thanks to Mr. Hiroaki Yoshida, who has been a research colleague in Asada-Ikeda laboratory since my undergraduate years, for his frank discussion and unique ideas. His aggressive attitude toward research gave me fresh incentive and encouragement to enhance my ability.

I would like to acknowledge Mr. Toru Nakura, who is a research colleague in Asada-Ikeda laboratory, for his extensive knowledge and professional experience. His thoughtful advice and suggestions enlightened me on various possibilities for professional career development.

I would like to extend hearty thanks to Dr. Tomohiro Nezuka, who is currently in Thine Electronics, Inc., for his technical advice and fruitful discussion on design of image sensors. I worked hard on trying to follow his technical knowledge, design techniques, and enthusiasm for research and development. The inherited knowledge, experience, and enthusiasm were fundamental for the success of my research and will be invaluable assets in my professional career.

I am grateful to Dr. Hiroaki Yamaoka, who is currently in Toshiba Corp., for his relaxed talk on both private and professional topics. His generous personality provided a friendly atmosphere and a pleasant time in the laboratory.

I would like to give my thanks to Dr. Tohru Ishihara, who is currently in Fujitsu Laboratories of America, Inc., for his assistance with the chip design environment, reliable advice on chip design, and contribution as a network administrator of the laboratory.

I am thankful to all the colleagues in Asada-Ikeda laboratory for their helpful advice, heartfelt encouragement, a comfortable research environment, and a pleasant time: in particular, Mr. Ruotong Zheng, for his generous assistance in establishing the chip design environment; Mr. Tetsuya Iizuka, for his contribution as a network administrator of the laboratory; Ms. Noriko Yokochi and Ms. Naomi Yoshida, for their helpful assistance with my research activities in the laboratory.

I am also grateful to all the past colleagues in Asada-Ikeda laboratory for their invaluable advice and suggestions: in particular, Dr. Takahiro Yamashita, who is currently in Semiconductor Technology Academic Research Center (STARC), for his professional expertise on circuit design; Dr. Satoshi Komatsu, who is currently in VLSI Design and Education Center (VDEC), the University of Tokyo, for his practical experience in chip test and analysis; Dr. Yoshinori Murakami, who is currently in Nissan Motor Co., Ltd., for his penetrating comments from an industrial perspective.

I would like to acknowledge my dissertation committee for their extremely valuable suggestions and comments: Prof. Tadashi Shibata, for his expertise and willingness to make unique and constructive suggestions on my research topic; Prof. Kiyoharu Aizawa, for his inspiring suggestions to expand my ideas to new application fields; Prof. Hideki Imai, for his precious comments for finding value in my research from his professional perspective; and Prof. Minoru Fujishima, for his empirical knowledge and comments on circuit design.

I would like to express my appreciation to Prof. Jun Ohta, Prof. Shoji Kawahito, Prof. Takayuki Hamamoto, Prof. Takayasu Sakurai, Prof. Tadahiro Kuroda, Prof. Tetsushi Koide, Prof. Kazutoshi Kobayashi, Prof. Makoto Nagata, and Dr. Kenichi Okada, for their precious suggestions on my research, for giving opportunities of technical discussion, and for their considerate support for the success of my technical presentations.

I am grateful to the Takeda Foundation for financial support of the Takeda Scholarship Award. I could dedicate myself to research for the three years owing to the full scholarship. I would like to acknowledge Prof. Yasuo Tarui and all the members of the foundation for their exciting discussion on various technical fields.

I would like to thank all the members of VLSI Design and Education Center (VDEC), the University of Tokyo, for their support in chip fabrication. The VLSI chips in this study have been designed with CAD tools of Synopsys, Inc. and Cadence Design Systems, Inc., and fabricated through the chip fabrication program of VDEC, in collaboration with Rohm Corp., Hitachi Ltd., Semiconductor Technology Academic Research Center (STARC), Toppan Printing Corp., and Dai Nippon Printing Corp.

Finally, I would like to express my greatest appreciation to my parents, Hirokazu and Setsuko, and my elder brother, Shunsuke, for their constant support and encouragement in my life, and I also wish to express my genuine gratitude to my fiancee, Yukari, for her tender love and mental sustenance.

# Contents

Abstract

Acknowledgements

List of Figures

List of Tables

## Chapter 1 Introduction

1.1 Background
1.2 Key Components of 3-D Image Capture
  1.2.1 Smart Image Sensors
  1.2.2 Associative Engines
1.3 Research Objectives and Thesis Organization

## Chapter 2 Real-Time and High-Resolution 3-D Image Sensors

2.1 Introduction
2.2 Concept of High-Speed Dynamic Access
2.3 Circuit Configurations
  2.3.1 Sensing Procedure
  2.3.2 Pixel Circuit
  2.3.3 Adaptive Threshold Circuit
  2.3.4 Time-Domain Analog-to-Digital Converters
  2.3.5 Binary-Tree Priority Address Encoder
  2.3.6 Intensity-Profile Readout Circuit
2.4 Design of 640 × 480 Real-Time 3-D Image Sensor
  2.4.1 Sensor Configuration
  2.4.2 Chip Implementation
2.5 Development of Real-Time 3-D Image Capture System
  2.5.1 Overall System Configuration
  3.5.1 Chip Implementation
  3.5.2 Limiting Factors of Frame Rate
  3.5.3 Access Rate and Pixel Resolution
  3.5.4 Fast Range Detection with Stereo Range Finders
  3.5.5 Measurement Results
3.6 Design of 375 × 365 Ultra Fast Range Finder
  3.6.1 Sensor Configuration
  3.6.2 Chip Implementation
3.7 Measurement Results
  3.7.1 Frame Access Rate
  3.7.2 Range Accuracy
  3.7.3 Ultra Fast Range Finding
3.8 Summary

## Chapter 4 High-Sensitive Demodulation Sensors for Robust Beam Detection

4.1 Introduction
4.2 Sensing Scheme and Circuit Realization
  4.2.1 Demodulation Sensing Scheme
  4.2.2 Pixel Circuit Realization
4.3 Sensor Configurations
4.4 Chip Implementation
4.5 Measurement Results
  4.5.1 Measurement Setup and Preliminary Tests
  4.5.2 Sensitivity and Dynamic Range
  4.5.3 Selectivity
  4.5.4 Frame Rate
  4.5.5 Range Finding Results
4.6 Summary

## Chapter 5 Extension of Demodulation Sensing

5.1 Introduction
5.2 Concept of Color Demodulation Imaging
  5.2.1 Target Applications
  5.2.2 System Configuration
  5.2.3 Sensing Scheme with Ambient Light Suppression
5.3 Circuit Configurations of Color Demodulation
  5.3.1 Pixel-Level Color Demodulation
  5.3.2 Pixel Circuit
  5.3.3 Asymmetry Offset of Bidirectional Integration
  5.3.4 Simulation of Pixel-Level Demodulation
5.4 Design of 64 × 64 Color Demodulation Imager
5.5 Measurement Results of Color Demodulation Imager
  5.5.1 Efficient Ambient Light Suppression
  5.5.2 Pixel-Level Color Imaging
  5.5.3 Application to Time-of-Flight Range Finding
5.6 ID Beacon Detector for Augmented Reality System
5.7 Circuit Configurations of ID Beacon Detector
  5.7.1 Pixel Circuit and Operation
  5.7.2 Analog and Digital Readout Circuits
5.8 Design of 128 × 128 ID Beacon Detector
  5.8.1 Sensor Configuration
  5.8.2 Chip Implementation
5.9 System Setup for Augmented Reality
  5.9.1 System Configuration
  5.9.2 Beacon Protocol
5.10 Measurement Results of ID Beacon Detector
  5.10.1 Frame Rate with ID-Beacon Detection
  5.10.2 Sensitivity and Dynamic Range
  5.10.3 Performance Comparison
5.11 Summary

## Chapter 6 Digital Associative Engine for Hamming Distance Search

6.1 Introduction
6.2 Concept of Digital Hamming Distance Search
  6.2.1 Basic Search Operation
  6.2.2 Word-Parallel and Hierarchical Search Structure
  6.2.3 Manhattan-Distance Evaluation Using Thermometer Encoding
6.3 Circuit Configuration
  6.3.1 Logic-in-Memory Search Circuit
  6.3.2 Priority Address Encoder
6.4 Chip Implementation
6.5 Measurement Results and Discussions
  6.5.1 Function Tests
  6.5.2 Area and Capacity
  6.5.3 Operation Speed
  6.5.4 Power Dissipation
6.6 Summary

## Chapter 7 Scalable Multi-Chip Architecture Using Digital Associative Engines

7.1 Introduction
7.2 Concept of Scalable Multi-Chip Architecture
  7.2.1 Performance Characteristics of Digital Associative Engine
  7.2.2 Multi-Chip Structures
7.3 Circuit Realization and Operation
  7.3.1 Hierarchical Inter-Chip Connections
  7.3.2 Extended Associative Memory Configuration
  7.3.3 Pipelined Priority Decision Circuit
7.4 Module Generator for Various Capacities
7.5 Performance Evaluation
  7.5.1 Area and Capacity
  7.5.2 Search Cycle Time and Inter-Chip Bit Rate
  7.5.3 Hamming-Distance Search Time
7.6 Summary

## Chapter 8 Digital Associative Engine with Wide Search Range Based on Manhattan Distance

8.1 Introduction
8.2 Manhattan Distance Search Algorithm and Circuit Realization
  8.2.1 Element Circuit Structure
  8.2.2 Absolute Flag Generation
  8.2.3 Distance Counting Operation
  8.2.4 Weighted Search Clock Technique
  8.2.5 Nearest Match Detection in Candidates
8.3 Chip Implementation
8.4 Measurement Results and Discussions
  8.4.1 Operation Speed and Power Dissipation
  8.4.2 Search Range
  8.4.3 Area and Capacity
8.5 Summary

## Chapter 9 Associative Processing for 3-D Image Capture

9.1 Introduction
9.2 Associative Processing for 3-D Object Clipping
9.3 Circuit Configurations
9.4 Performance Evaluation
9.5 Summary

## Chapter 10 Conclusions

## Bibliography

## List of Publications

# List of Figures

1.1 3-D image capture.
1.2 Typical 3-D measurement methods: (a) the stereo-matching method, (b) the depth-from-defocus method, (c) the time-of-flight method, (d) the light-section method.
1.3 Principle of the light-section range finding.
1.4 Principle of triangulation-based range calculation.
1.5 The state-of-the-art image sensors with 3-D imaging capability based on the light-section method.
1.6 Imaging system configurations: (a) the conventional imaging system, (b) a smart imaging system.
1.7 Parallel image processing configurations.

2.1 Conventional frame access techniques: (a) analog readout, (b) digital readout.
2.2 High-speed dynamic access technique.
2.3 Sensing procedure of the high-speed dynamic access.
2.4 Pixel circuit configuration and operation.
2.5 Schematic and operation of the adaptive thresholding and TDA-ADC.
2.6 Relation between a pixel value and a discharging time of $V_{col}$ at a threshold level.
2.7 Schematic of a binary-tree priority encoder.
2.8 Timing diagram of the high-speed position detection.
2.9 Block diagram of the sensor.
2.10 Chip microphotograph.
2.11 Overall system configuration.
2.12 Photographs of the 3-D image capture system.
2.13 Measurement result of 2-D image capture.
2.14 Measurement result of sheet beam detection.
2.15 Range finding speed and pixel resolution with comparison.
2.16 Measured range accuracy.
2.17 Measurement results of 3-D image capture.
2.18 Measured 3-D images of moving objects.
2.19 3-D image capture system using multiple range finders.
2.20 Photographs of 3-D image capture system using multiple range finders.
2.21 Synthesized 3-D image using multiple range finders.
2.22 Block diagram of the 1024 × 768 3-D image sensor.
2.23 Chip microphotograph.
2.24 Possible range finding rate of the XGA 3-D image sensor.
2.25 Possible range accuracy of the XGA 3-D image sensor.
2.26 Measured images and object extraction: (a) a 2-D image with 1024 × 768 pixels, (b) a range map, (c) object extraction using range information.
2.27 Reconstructed 3-D images: (a) a wireframe model, (b) a texture-mapped 3-D object.
2.28 Measurement setup for real-time 3-D image capture with XGA pixel resolution.
2.29 Measured 3-D images of a moving object using the XGA 3-D image sensor.
2.30 Active pixel detection in the high-speed dynamic access technique.
2.31 Concept of ambient light suppression for the high-speed dynamic access technique.
2.32 Pixel circuit with pixel-parallel ambient suppression.
2.33 Timing diagram of pixel-parallel suppression circuit: (a) 2-D imaging mode, (b) 3-D imaging mode.
2.34 Chip microphotograph and pixel layout.
2.35 Preliminary tests of pixel-parallel ambient light suppression: (a) camera module, (b) 2-D image without ambient light suppression, (c) 2-D image with ambient light suppression.
2.36 Adaptive threshold circuit for high-speed dynamic access.
2.37 Error condition of the high-speed dynamic access technique under strong ambient light.
2.38 Adaptive reset level control circuit for column-parallel suppression technique.
2.39 Chip microphotograph.
2.40 Photo diode structure with an n⁺-diff/p-sub photo diode.
2.41 Photo diode structure with a biased transistor and an n-well/p-sub photo diode.
2.42 Simulation results of column-parallel suppression of ambient light levels.
2.43 Simulation results of column-parallel suppression of select timing variations.
2.44 Simulation results of column-parallel suppression of device fluctuations.
2.45 Measured waveforms of the column outputs: (a) without reset feedback, (b) with reset feedback.
2.46 Timing diagram of the column-parallel timing calibration.
2.47 Measurement setup: (a) front side of the camera board, (b) back side of the camera board, (c) system overview, (d) a measured 2-D image, (e) a measured range map.
2.48 Reconstructed wireframes.

3.1 Frame access methods: (a) raster scan, (b) row-access scan, (c) row-parallel scan.
3.2 Position detection flow: (a) the conventional row-access scan method, (b) the proposed row-parallel scan method.
3.3 Row-parallel position detection architecture.
3.4 Schematic of a pixel circuit.
3.5 Timing diagram of row-parallel position detection.
3.6 Procedure of row-parallel active pixel search.
3.7 Bit-streamed column address flow for row-parallel address acquisition.
3.8 Schematic of a row-parallel processor.
3.9 Timing diagram of a row-parallel processor.
3.10 A triangulation-based light-section range finding system: (a) system configuration, (b) relation between a range accuracy and a beam position on the focal plane.
3.11 Sub-pixel center position detection: (a) single-sampling method, (b) multi-sampling method.
3.12 Sub-pixel resolution as a function of the number of samplings.
3.13 Block diagram of a prototype position detector.
3.14 Simplified row-parallel processors implemented in the prototype position detector.
3.15 Chip microphotograph.
3.16 Limiting factors of frame rate in a reset-per-frame mode and a reset-per-scan mode.
3.17 Simulated search time per frame for position detection of the fabricated chip.
3.18 Simulated search time in high pixel resolution.
3.19 System configuration of fast range detection using stereo range finders.
3.20 Principle of fast range detection using stereo range finders.
3.21 Measurement system.
3.22 Measurement results.
3.23 Simplified block diagram of 4 × 4 pixels.
3.24 Chip microphotograph and pixel layout.
3.25 Pipeline operation diagram.
3.26 Cycle time of active pixel search and data readout.
3.27 Test equipment for the worst-case frame access.
3.28 Measured waveforms of the worst-case frame access to an electrical test pattern at 432 MHz.
3.29 Measured range accuracy: (a) single-sampling mode, (b) multi-sampling mode.
3.30 Photograph of a range finding system.
3.31 Measurement result of range finding.

4.1 Basic idea of the demodulation sensing.
4.2 Pixel circuit implementation of the demodulation sensing.
4.3 Timing diagram of the pixel circuit operation.
4.4 Array structure and timing diagram.
4.5 Pixel layout.
4.6 Chip microphotograph.
4.7 Measurement setup.
4.8 Photographs of the measurement setup: (a) a camera module with the position sensor, (b) a spot beam source with X-Y scanning mirrors.
4.9 High sensitive position detection in nonuniform background illumination.
4.10 Sensitivity and dynamic range.
4.11 Selectivity of the demodulation sensing.
4.12 Relation between the correlation frequency and the sensitivity.
4.13 Linearity of the measured range data.
4.14 Measured range maps.

5.1 Preprocessing for image recognition.
5.2 System configuration using a modulated RGB flashlight.
5.3 Photocurrent demodulation by two in-pixel integrators: (a) the conventional demodulation, (b) the proposed demodulation.
5.4 Timing diagram of photocurrent demodulation: (a) the conventional demodulation, (b) the proposed demodulation.
5.5 Pixel configuration: (a) two integrators per pixel, (b) pixel-level color demodulation with four integrators per pixel, (c) timing diagram of a projected RGB flashlight.
5.6 Pixel circuit configuration and layout in a 0.35 µm process technology.
5.7 Timing diagram.
5.8 Asymmetry offset of bidirectional integration.
5.9 Simulation waveforms of pixel-level demodulation: (a)–(d) the present sensing scheme, (e) the conventional sensing scheme.
5.10 Sensor block diagram.
5.11 Schematic of offset canceller.
5.12 Implemented charge-distributed 8-bit A/D converter.
5.13 Chip microphotograph.
5.14 Output voltage vs. modulated light intensity $E_R$: (a) $E_{bg} = 0\ \mu\mathrm{W/cm^2}$, (b) $E_{bg} = 200\ \mu\mathrm{W/cm^2}$, (c) $E_{bg} = 500\ \mu\mathrm{W/cm^2}$, (d) conventional demodulation without efficient ambient light suppression.
5.15 Saturation level of $E_R$ vs. ambient light intensity $E_{bg}$: (a) measurement results of the present sensing scheme, (b) reference of the conventional sensing scheme.
5.16 Offset voltage $V_{Oo}$ vs. ambient light intensity $E_{bg}$.
5.17 Measurement results of color imaging with ambient light suppression.
5.20 Augmented reality system with active optical devices.
5.21 Pixel circuit configuration.
5.22 Timing diagram of the pixel circuit.
5.23 Analog/digital readout circuit.
5.24 Timing diagram of digital readout.
5.25 Block diagram of the smart image sensor.
5.26 Chip microphotograph and pixel layout.
5.27 Measurement system structure.
5.28 Measured waveforms.
5.29 Coding method and packet format.
5.30 Reproduced image with ID information.
5.31 Sensitivity and dynamic range of ID beacon detection.

6.1 Basic Hamming distance search operation without hierarchical structure.
6.2 Hierarchical structure: (a) search signal path, (b) permission signal path.
6.3 Operation diagram of hierarchical search.
6.4 Manhattan-distance estimation using thermometer encoding.
6.5 Static circuit implementation of the associative memory cell: (a) odd-numbered cell, (b) even-numbered cell.
6.6 Timing diagram of search circuit.
6.7 Dynamic circuit implementation of the associative memory cell: (a) odd-numbered cell, (b) even-numbered cell.
6.8 Schematics of: (a) detected data selector, (b) binary-tree priority encoder.
6.9 Block diagram: (a) associative engine, (b) word structure.
6.10 Chip microphotograph.
6.11 Functional test results of Hamming-distance estimation.
6.12 Functional test results of Manhattan-distance estimation.
6.13 Layout of the associative memory cell: (a) static circuit implementation, (b) dynamic circuit implementation.
6.14 Measured waveforms of the search signal propagation.
6.15 Operation frequency and power supply voltage.
6.16 Cycle time and data capacity.

7.1 Operation diagram of a fully digital and word-parallel associative memory.
7.2 Possible multi-chip structures: (a) a bus structure with a scan controller, (b) a star structure with a WTA processor, (c) the present hierarchical structure.
7.3 Examples of inter-chip wiring in a multi-chip structure: (a) a star structure, (b) the present hierarchical structure.
7.4 Hierarchical multi-chip structure using embedded binary-tree pipelined priority decision circuits.
7.5 Block diagram of associative memory for multi-chip configuration.
7.6 Simplified schematics of binary-tree priority decision circuits: (a) intra-chip priority decision circuit and address encoder, (b) inter-chip pipelined priority decision circuit.
7.7 Timing diagram of PPD circuit for 8 chips.
7.8 Module generator functions.
7.9 Module generator execution example.
7.10 Examples of module generation: (a) 128-bit 256-word module for a single chip, (b) 256-bit 256-word module for 16-chip structure.
7.11 Search cycle time and inter-chip bit rate.
7.12 Additional latency for the multi-chip structure.
7.13 Total search time as a function of Hamming distance of the detected data.

8.1 Application examples of Manhattan-distance search.
8.2 Block diagram: (a) an 8-bit element structure, (b) a word structure with hierarchical search path.
8.3 Circuit configuration of an 8-bit element cell.
8.4 Search operation flow: (a) absolute flag generation, (b) distance counting operation, (c) weighted search clock supply.
8.5 Word-parallel distance calculation circuits using autonomous weighted search clocks.
8.6 Nearest match detection flow in candidates.
8.7 Circuit configuration: (a) a nearest match detector for candidates, (b) a binary-tree priority encoder simplified with 8 inputs.
8.8 Block diagram of Manhattan-distance associative engine.
8.9 Chip microphotograph and layout of an element cell.
8.10 Power supply voltage vs. search clock period.
8.11 Characteristics of the present continuous search operation for wide-range associative processing.

9.1 Basic operation of associative processing for 3-D object clipping.
9.2 Associative processing flow for 3-D image capture.
9.3 Word structure and circuit configuration.
9.4 Simulation results of 3-D object clipping.
## List of Tables

<table>
<thead>
<tr>
<th>Table</th>
<th>Title</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>2.1</td>
<td>Chip specifications.</td>
<td>24</td>
</tr>
<tr>
<td>2.2</td>
<td>Performance of the VGA 3-D image sensor.</td>
<td>30</td>
</tr>
<tr>
<td>2.3</td>
<td>Chip specifications.</td>
<td>37</td>
</tr>
<tr>
<td>2.4</td>
<td>Chip specifications.</td>
<td>47</td>
</tr>
<tr>
<td>2.5</td>
<td>Chip specifications.</td>
<td>52</td>
</tr>
<tr>
<td>3.1</td>
<td>Chip specifications.</td>
<td>70</td>
</tr>
<tr>
<td>3.2</td>
<td>Measurement results and comparisons.</td>
<td>79</td>
</tr>
<tr>
<td>3.3</td>
<td>Chip specifications.</td>
<td>81</td>
</tr>
<tr>
<td>3.4</td>
<td>Chip performance.</td>
<td>87</td>
</tr>
<tr>
<td>4.1</td>
<td>Chip specifications.</td>
<td>96</td>
</tr>
<tr>
<td>4.2</td>
<td>Performance specifications.</td>
<td>103</td>
</tr>
<tr>
<td>5.1</td>
<td>Specifications of the prototype image sensor.</td>
<td>119</td>
</tr>
<tr>
<td>5.2</td>
<td>Parameters of the beacon detector.</td>
<td>131</td>
</tr>
<tr>
<td>5.3</td>
<td>Performance comparison.</td>
<td>136</td>
</tr>
<tr>
<td>6.1</td>
<td>Specifications of the digital associative engine.</td>
<td>151</td>
</tr>
<tr>
<td>7.1</td>
<td>Comparison among multi-chip structures.</td>
<td>157</td>
</tr>
<tr>
<td>7.2</td>
<td>Area of associative memory module.</td>
<td>165</td>
</tr>
<tr>
<td>8.1</td>
<td>Core area and SRAM ratio.</td>
<td>180</td>
</tr>
<tr>
<td>8.2</td>
<td>Specifications of the associative engine.</td>
<td>181</td>
</tr>
</tbody>
</table>
Chapter 1

Introduction

1.1 Background

Three-dimensional (3-D) image capture has a wide variety of applications such as computer vision, robot vision, and position adjustment. In recent years, 3-D computer graphics have become common in movies and television, and users handle them interactively on personal computers and video game machines. In the near future, 3-D imaging systems will be applied to an increasingly wide range of applications such as 3-D movies, object extraction, gesture recognition, virtual reality, and security. Current and future 3-D applications will therefore require a high-speed, high-quality, and robust 3-D imaging system.

3-D image capture is mainly composed of range finding and 3-D data processing as shown in Figure 1.1. A range finder acquires object shapes and locations in a target scene. A 3-D data processor performs texture mapping, object segmentation, and so on. As process technology advances, the number of transistors on an LSI chip increases, and image sensors attain higher speed and resolution. Furthermore, signal processing functions are integrated on the sensor chip to form a smart image sensor [1], [2]. A smart image sensor thus has the potential for high-speed and high-quality range finding. Thanks to the same advances in process technology, a 3-D data processor is also able to handle large amounts of image data at high speed. In particular, a highly parallel image processor, such as an associative engine, is able to close the performance gap between a signal processor and memories in high-quality 3-D image capture. A smart image sensor and an associative engine will be key components of an advanced 3-D imaging system.

3-D imaging systems have been realized on the basis of classic range finding methods such as the stereo-matching method [3]–[9], the depth-from-defocus method [10]–[12], the time-of-flight method [13]–[19], and the light-section method [20]–[27]. These methods are categorized as either passive or active range finding methods. Typical passive range finding methods are the stereo-matching method and the depth-from-defocus method. The stereo-matching method provides a simple system configuration with two or more cameras as shown in Figure 1.2 (a). The stereo-matching processing, however, requires huge computational effort at a high pixel resolution, and the range resolution and accuracy depend on target surface patterns. Therefore, the stereo-matching method is used for 3-D image capture with rough range accuracy. The depth-from-defocus method estimates the distance between a camera and a target object by fine focal adjustment as shown in Figure 1.2 (b). The range resolution and accuracy strongly depend on the target condition, since the depth-from-defocus method requires explicit surface patterns and edges of a target object to adjust the focus. On the other hand, typical active range finding methods are the time-of-flight method and the light-section method. In the time-of-flight method, a projected light is reflected from a target object with a delay proportional to the distance as shown in Figure 1.2 (c). The arrival time of the reflected light is acquired by a special photo detector. The range resolution is basically determined by the time resolution independently of the target distance; therefore, the time-of-flight method is suitable for long-distance range finding. The range accuracy is, however, limited to a few centimeters by the electronic shutter speed of the special photo detector.

The light-section method is capable of high range accuracy, and it is efficient for high-quality 3-D image capture of a middle-range target scene. A light-section range finding system consists of a sheet beam projector and a position sensor as shown in Figure 1.2 (d). A sheet beam is projected onto a target object at an angle of $\alpha_p$, and a position sensor obtains a target scene image as shown in Figure 1.3. The sensor detects the position of the reflected beam on the sensor plane, which provides the incidence angle $\alpha_i$. The distance between a target object and the position sensor is acquired by triangulation. Figure 1.4 shows the principle of the triangulation-based range calculation. The image sensor detects the projected beam at $e(x_e, y_e)$ on the sensor plane when a target object is placed at $p(x_p, y_p, z_p)$. The incidence angles, $\alpha_i$ and $\theta$, are given by
\[ \tan \alpha_i = \frac{f}{x_e}, \quad (1.1) \]
\[ \tan \theta = \frac{f}{y_e}, \quad (1.2) \]
where $f$ is the focal length of the camera. $\alpha_p$ and $\alpha_i$ are also represented by
\[ \tan \alpha_p = \frac{l}{d/2 - x_p}, \quad (1.3) \]
\[ \tan \alpha_i = \frac{l}{d/2 + x_p}, \quad (1.4) \]
where $l$ is a length of a perpendicular line from a target position, $p$, to $x$-axis. Therefore, $x_p$ and $l$ are given by
\[ x_p = \frac{d(\tan \alpha_p - \tan \alpha_i)}{2(\tan \alpha_p + \tan \alpha_i)}, \quad (1.5) \]
\[ l = \frac{d \tan \alpha_p \tan \alpha_i}{\tan \alpha_p + \tan \alpha_i}. \quad (1.6) \]
Here, $y_p = l \sin \theta$ and $z_p = l \cos \theta$. Thus, $y_p$ and $z_p$ are also given by
\[ y_p = \frac{d \tan \alpha_p \tan \alpha_i \sin \theta}{\tan \alpha_p + \tan \alpha_i}, \quad (1.7) \]
\[ z_p = \frac{d \tan \alpha_p \tan \alpha_i \cos \theta}{\tan \alpha_p + \tan \alpha_i}. \quad (1.8) \]
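The range calculation above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the thesis: the function name `triangulate` and the choice of radians are assumptions, and $d$ is taken as given by the geometry of Figure 1.4.

```python
import math

def triangulate(d, alpha_p, alpha_i, theta):
    """Return (x_p, y_p, z_p, l) from the closed-form expressions for
    x_p, l, y_p and z_p in the text. Angles are in radians (assumption)."""
    tp = math.tan(alpha_p)
    ti = math.tan(alpha_i)
    x_p = d * (tp - ti) / (2.0 * (tp + ti))   # lateral position
    l = d * tp * ti / (tp + ti)               # perpendicular length to the x-axis
    y_p = l * math.sin(theta)
    z_p = l * math.cos(theta)
    return x_p, y_p, z_p, l

# Hypothetical example values, chosen only for illustration.
x_p, y_p, z_p, l = triangulate(d=300.0,
                               alpha_p=math.radians(60.0),
                               alpha_i=math.radians(45.0),
                               theta=0.1)
```

A quick sanity check is that the result satisfies $\tan \alpha_p \, (d/2 - x_p) = l$ and $\tan \alpha_i \, (d/2 + x_p) = l$, which are the two relations the closed forms are derived from.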

Figure 1.4 Principle of triangulation-based range calculation.

The light-section range finding realizes high-accuracy 3-D image capture; however, many frames are necessary for position detection during beam scanning in order to acquire a range map. For example, a 1024 × 1024 range map at video rate requires high-speed image capture of over 30,000 frames per second (fps). It is difficult for a standard image sensor to attain such a high frame rate. Figure 1.5 shows the state-of-the-art image sensors with 3-D imaging capability based on the light-section method. A high-speed CMOS active pixel sensor (CMOS APS) using column-parallel analog-to-digital converters (ADCs) has achieved 500 fps with a 1024 × 1024 pixel resolution [28]. Moreover, one of the state-of-the-art high-speed image sensors achieves 10,000 fps with a 352 × 288 pixel resolution by using pixel-parallel ADCs [29]. A standard frame access architecture such as that of these high-speed 2-D image sensors, however, makes it difficult to realize a high-speed and high-quality 3-D imaging system as shown in Figure 1.5. Smart position sensors have been reported for fast range finding [25]–[27]. These position sensors are customized for quick position detection of an incident sheet beam on the sensor plane, but their performance is insufficient for real-time 3-D image capture with a high pixel resolution. Therefore, new frame access architectures are desired for a high-quality 3-D image capture system.
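The frame-rate requirement quoted above follows from simple arithmetic, sketched below. The sketch assumes one frame per beam position and one beam position per pixel column; the function names are hypothetical. Under the same assumption, the 394.5 kHz frame access rate and 1052 range maps/s quoted in the abstract for the 375 × 365 range finder are mutually consistent.

```python
def required_frame_rate(scan_positions, range_maps_per_second):
    """Frames per second needed when every beam position costs one frame."""
    return scan_positions * range_maps_per_second

def achievable_range_map_rate(frame_rate, scan_positions):
    """Range maps per second that a given frame rate can sustain."""
    return frame_rate / scan_positions

# 1024 beam positions at 30 range maps/s: 1024 * 30 = 30720 frames/s.
fps_needed = required_frame_rate(1024, 30)

# A 500-fps sensor with 1024 beam positions stays below one range map per second.
maps_at_500_fps = achievable_range_map_rate(500, 1024)
```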

1.2 Key Components of 3-D Image Capture

In this section, concepts of a smart image sensor and an associative engine are presented as key components of a high-speed and high-quality 3-D image capture system.
1.2.1 Smart Image Sensors

A 3-D image capture system based on the light-section method requires an image sensor with several features such as high-speed position detection, operation over a wide range of ambient illumination, and robust beam selectivity. It is generally difficult for a standard image sensor, such as a CCD imager or a CMOS APS, to realize these application-specific features. Therefore, a special image sensor customized for 3-D image capture is desired to satisfy the application requirements.

A smart image sensor is an application-specific image sensor with signal processing functions on the sensor chip, which is also called a computational image sensor, a functional image sensor, or a vision chip [1], [2], [30]. In conventional imaging systems, the image sensing device and the signal processing device are separated as shown in Figure 1.6 (a). Such an imaging system offers great flexibility in image processing; however, all the image data must be transferred from the image sensor to the signal processor through an analog-to-digital converter. On the other hand, a smart image sensor includes processing elements on the focal plane as shown in Figure 1.6 (b). Many smart image sensors have been reported with various configurations and functions. For example, an edge detection function [31]–[33], a noise reduction function [34]–[36], a variable resolution scan [37]–[39], a motion detection function [40]–[46], and an image compression function [47]–[50] have been implemented as smart image sensors. These smart image sensors take advantage of the two-dimensional array structure for parallel signal processing.

Figure 1.6 Imaging system configurations: (a) the conventional imaging system, (b) a smart imaging system.

A smart image sensor has the potential to realize a high-speed and high-quality 3-D imaging system, and some smart image sensors have been developed as high-speed range finders based on the light-section method [20]–[27]. However, the state-of-the-art smart image sensors do not meet the requirements of future 3-D imaging systems for 3-D movies and scientific surveillance as shown in Figure 1.5. These 3-D imaging applications need higher speed, higher pixel resolution, higher range accuracy, and more robustness. Therefore, a new sensing scheme and a new frame access architecture are required for a smart image sensor as a key component of 3-D image capture.

1.2.2 Associative Engines

The growing processor-memory performance gap has become an impediment to system performance, particularly where applications require vast amounts of memory bandwidth [51]–[53]. Many image preprocessing algorithms require huge amounts of memory access, and they cause memory bottlenecks in a standard microprocessor configuration. Therefore, the integration of processing into memories has been proposed and implemented for various image processing algorithms [9], [54]–[61], as shown in Figure 1.7. Although these parallel image processors generally sacrifice flexibility, they achieve high-speed image processing. They are usually applied to two-dimensional image filtering and pattern matching, but they are also expected to be applicable to three-dimensional data processing.

An associative engine is one of the parallel image processors based on content addressable memories (CAMs), in which data similar to a given input are retrieved from pre-stored data. It has a wide variety of applications such as pattern recognition, code-book-based data compression, multimedia, intelligent processing, and learning systems. Basic CAMs have been developed to reduce the memory access and data processing time as reported in [62]–[66]. They are capable of quickly detecting complete-match data in pre-stored data. Furthermore, advanced CAMs with associative processing based on Hamming or Manhattan distance have been developed for more flexible and complex data processing [67]–[76].
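As a purely functional illustration of what such associative processing computes, the following sketch performs a nearest-match search in software. It is not a model of any circuit in this thesis: the function names are hypothetical, the search is sequential rather than word-parallel, and resolving ties in favor of the lowest address is an assumption.

```python
def hamming_distance(a, b):
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")

def manhattan_distance(u, v):
    """Sum of absolute element-wise differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))

def nearest_match(stored, query, distance):
    """Address of the stored word closest to the query.
    min() keeps the first minimum, so ties go to the lowest address (assumption)."""
    return min(range(len(stored)), key=lambda i: distance(stored[i], query))

# Hypothetical stored data, for illustration only.
words = [0b1010, 0b1111, 0b0001]
vectors = [(0, 0), (5, 5), (2, 3)]
addr_h = nearest_match(words, 0b1011, hamming_distance)
addr_m = nearest_match(vectors, (3, 3), manhattan_distance)
```

A complete-match CAM corresponds to the special case in which only a distance of zero is accepted.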

3-D data processing also requires huge amounts of memory access and data processing time, thus the parallel image processors based on associative memories are efficient for high-speed and high-quality 3-D image capture. However, the conventional associative memories employing analog circuit techniques have critical problems in device scaling, capacity scalability, search range, search precision, and so on. Therefore, a new associative engine with a high capacity scalability and a flexible search function is desired for 3-D data processing such as calibration, object segmentation, target recognition and so on.

1.3 Research Objectives and Thesis Organization

This thesis focuses on smart image sensors and associative engines for three dimensional image capture. New sensor architectures and circuit designs are presented for advanced 3-D image capture systems and augmented image sensing systems in Chapter 2 through Chapter 5. Then, new architectures and circuit realization of digital associative engines are shown in Chapter 6 through Chapter 8, and applied to a 3-D image capture system in Chapter 9.

Chapter 2 proposes a high-speed dynamic frame access technique and circuit implementation to realize a real-time and high-resolution 3-D image sensor. Ambient light suppression techniques are also proposed for low-intensity beam detection in the dynamic frame access. A prototype 3-D image sensor with 640 × 480 pixels using the dynamic frame access technique attains a real-time and high-resolution range finding system. Then, a scale-up version with 1024 × 768 pixels is also developed. Furthermore, a 352 × 288 3-D image sensor with column-parallel ambient light suppression is presented to demonstrate the feasibility of the proposed techniques and the applicability to a real-time, high-resolution and robust 3-D image capture system.

Chapter 3 targets 1,000-fps range finding based on the light-section method for new applications of 3-D image capture such as shape measurement of structural deformation and destruction, scientific observation of high-speed moving objects, fast visual feedback systems in robot vision, and quick inspection of industrial components. A concept of row-parallel position detection is presented for the ultra fast range finding. A $375 \times 365$ range finder with new row-parallel search circuits is shown together with the measurement results.

Chapter 4 shows a demodulation sensing scheme for high-sensitivity beam detection over a wide range of ambient light illumination. It realizes a robust range finding system using a low-intensity beam projection in nonideal measurement conditions. A $120 \times 110$ range finder demonstrates the robust beam detection. It is applicable to triangulation-based range finding using a spot beam projection, and it successfully captures a range map of a target object under high-contrast ambient light.

Chapter 5 introduces two smart image sensors as extensions of the demodulation image sensor. One is a pixel-level color demodulation image sensor that supports image recognition. It detects a projected flashlight while suppressing ambient light, based on the demodulation sensing scheme. Every pixel provides innate color and depth information of a target object for color-based categorization and depth-key object extraction. A prototype image sensor with $64 \times 64$ pixels shows the feasibility of the color demodulation function. The other is a low-intensity ID beacon detector for augmented reality systems. It acquires a scene image together with the locations, IDs, and additional information of multiple target objects simultaneously in real time. A prototype image sensor with $128 \times 128$ pixels demonstrates the low-intensity ID beacon detection.

Chapter 6 proposes a new concept and circuit implementation for a high-speed associative engine with exact Hamming distance computation. It employs a word-parallel and hierarchical search architecture using a logic-in-memory digital implementation. The circuit implementation enables high tolerance for device fluctuations in a deep sub-micron process and a low-voltage operation.

Chapter 7 shows a scalable multi-chip architecture based on the digital associative processing presented in Chapter 6. As with standard memories, a multi-chip structure is the most efficient way to scale capacity. The present architecture attains fully chip- and word-parallel Hamming distance computation with faultless precision, no throughput decrease, and an additional clock latency of $O(\log P)$ for a configuration with $P$ chips. The performance evaluations demonstrate the capacity scalability, which is important for handling large amounts of range data at high speed in a 3-D image capture system.

Chapter 8 proposes a hardware-oriented search algorithm based on Manhattan distance. The search algorithm is efficiently implemented using the hierarchical search structure presented in Chapter 6. The word-parallel digital associative engine attains accurate and wide-range Manhattan distance computation. It has a wide variety of application fields such as pattern recognition, data compression, and intelligent processing. Furthermore, it is suitable for 3-D data preprocessing such as object segmentation, calibration, and target recognition.

Chapter 9 introduces associative processing for 3-D image capture. 3-D object clipping is efficiently implemented by using the associative engine based on Manhattan distance. Based on the performance estimation, the possibility of real-time and high-resolution 3-D image processing is shown.

Finally, Chapter 10 gives conclusions of this thesis.
Chapter 2

Real-Time and High-Resolution 3-D Image Sensors

2.1 Introduction

This chapter targets a real-time and high-resolution 3-D image sensor, which captures a range map with VGA (640 × 480) or higher pixel resolution at 30 range maps/s. As presented in Chapter 1, a range finding system based on the light-section method requires thousands of images every second for real-time 3-D image capture. For example, video-rate 3-D imaging with a 1024 × 1024 pixel resolution needs over 30,000 fps. Such a frame rate is difficult to achieve with a standard readout architecture such as a CCD; thus, smart position sensors for fast range finding have been reported in [25]–[27]. The sensor in [25] employs a row-parallel winner-take-all (WTA) circuit to realize 100 range maps/s with 64 × 64 range data. Its pixel size is smaller than that of [26] because of the row-parallel architecture. The pixel resolution, however, is limited by the precision of the current-mode WTA circuit. Therefore, it is difficult to realize a sufficiently high frame rate for a real-time and high-resolution 3-D imaging system. A 3-D image sensor using a pixel-parallel architecture [26] is capable of 30 range maps/s with a 192 × 124 pixel resolution. It requires a large pixel circuit area for an analog-to-digital converter and frame memories. To reduce the pixel circuit, a 320 × 240 (QVGA) color imager, designed with analog frame memories outside the pixel array, has been developed [27]. Its maximum range finding speed is limited to 15 range maps/s with a 160 × 120 pixel resolution. As shown in Figure 1.5, a new frame access technique with a compact pixel configuration is required to attain real-time and high-resolution 3-D image capture.

We propose a new concept of high-speed dynamic access in Section 2.2. Section 2.3 presents circuit configurations for the high-speed dynamic access technique. Section 2.4 describes the design of a 640 × 480 real-time 3-D image sensor. Section 2.5 gives a detailed account of a real-time 3-D imaging system using the 640 $\times$ 480 3-D image sensor. Section 2.6 shows the measurement results. Section 2.7 presents a 3-D image capture system using multiple cameras for full 3-D model reconstruction. Section 2.8 describes a 1024 $\times$ 768 3-D image sensor as a scale-up implementation of the present techniques. In Section 2.9, we propose pixel-parallel and column-parallel ambient light suppression techniques that are adapted for use with the proposed access technique. Finally, Section 2.10 summarizes this chapter.

2.2 Concept of High-Speed Dynamic Access

Figure 2.1 (a) and (b) show the conventional frame access techniques using analog readout and digital readout, respectively. In the analog frame access technique, pixel values are read out via source follower circuits in the same way as in a standard CMOS APS, as shown in Figure 2.1 (a). The peak position of the pixel values is detected after they are converted to digital values. Column-parallel ADCs [28] make the frame access faster; however, each row access still takes a few microseconds. Therefore, the frame access speed is too slow to realize a real-time range finder, although a high pixel resolution is attained. The digital frame access technique is often used in state-of-the-art range finders such as [26]. The pixel array provides digital outputs as the pixel values, so they are quickly obtained by sense amplifiers as shown in Figure 2.1 (b). It achieves a high-speed frame access of a few tens of nanoseconds per row; however, the pixel resolution is limited by the large pixel circuit.

We propose a high-speed dynamic access technique which attains both high pixel resolution and high-speed frame access as shown in Figure 2.2. In the present technique, each pixel provides an analog value, but the readout scheme is based on a dynamic logic operation as in the digital access technique. The present access technique makes efficient use of the output timing variations resulting from the pixel values. The pixel values are reflected in the transition timings of the sense-amplified outputs. Therefore, active pixels with a strong incident intensity are quickly detected by time-domain thresholding. This allows a compact pixel configuration similar to a standard CMOS APS, and attains a high-speed frame access of a few tens of nanoseconds per row.

2.3 Circuit Configurations

2.3.1 Sensing Procedure

Figure 2.3 shows a sensing procedure of the high-speed dynamic access. In the light-section range finding, an image sensor receives a scene image and a projected sheet beam. For 2-D image capture, all pixels are accessed using a raster scan to read out the pixel values. For 3-D image capture, an image sensor obtains a position of the projected sheet beam on the sensor plane. The position detection is carried out as follows.

(a) A row line is accessed using the high-speed dynamic access technique to acquire a position of the projected sheet beam on the sensor plane. The dynamic access is carried out by an adaptive threshold circuit and time-domain approximate ADCs (TDA-ADCs).
Figure 2.3 Sensing procedure of the high-speed dynamic access.

(b) The pixels which receive a strong beam intensity are detected in the row line. The detected pixels are those over the threshold level, which is adaptively determined by the darkest pixel intensity. The adaptive thresholding is implemented using a slope detector on each column output in the time domain to realize quick detection of active pixels. It is important for the high-speed access and detection of active pixels, since the threshold operation requires cancellation of timing fluctuations of the row access speed and robustness against changes in the overall scene illuminance.

(c) The pixel values over the threshold level are converted to digital values by column-parallel TDA-ADCs. The results of the TDA-ADCs improve sub-pixel accuracy through a center-of-gravity calculation using the intensity profile of the projected beam. The adaptive threshold circuit and the approximate ADCs operate at the same time as the dynamic readout operation.

(d) The results of the adaptive thresholding are transferred to the next pipeline stage to get the left and right edge addresses of the active pixels. A binary-tree priority encoder (PE) provides the location of the active pixels and also selects the intensity profile of the active pixels for the third pipeline stage.

(e) The third stage selectively provides the intensity profile of the active pixels as significant information for a high-accuracy range finding.

In this procedure, the image sensor quickly acquires the location and intensity profile of a projected sheet beam as requisites for high-accuracy triangulation, and reduces the data transmission to attain high frame rate for a real-time and high-resolution range finding.
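Steps (c) to (e) can be illustrated functionally with the sketch below. It assumes that each column of a row has already been reduced to a small integer intensity code, with zero meaning an inactive pixel, and that the beam forms a single contiguous run of active pixels; the function name `locate_beam` is hypothetical. The centroid is the center-of-gravity calculation mentioned in step (c), written here in software, so the sketch does not describe where or how the calculation is performed in the actual system.

```python
def locate_beam(intensity_codes):
    """Return (left, right, profile, centroid) for one row, or None if no pixel is active.

    intensity_codes: one integer code per column; 0 means inactive (assumption).
    """
    active = [i for i, code in enumerate(intensity_codes) if code > 0]
    if not active:
        return None
    left, right = active[0], active[-1]          # edge addresses of the active pixels
    profile = intensity_codes[left:right + 1]    # intensity profile between the edges
    total = sum(profile)
    # Intensity-weighted mean column address gives a sub-pixel beam position.
    centroid = sum((left + k) * code for k, code in enumerate(profile)) / total
    return left, right, profile, centroid

# Hypothetical row of intensity codes: the beam covers columns 2 to 4.
result = locate_beam([0, 0, 1, 3, 1, 0])
```

Only the two edge addresses and the short profile leave the row, which is the data reduction that the procedure relies on.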

### 2.3.2 Pixel Circuit

Figure 2.4 shows the pixel circuit configuration and operation diagram. The present sensing scheme allows the same pixel configuration as a 3-transistor CMOS APS [28]. This pixel structure realizes a smaller pixel area and a higher pixel resolution than the conventional range finders [25]–[27]. In 2-D imaging, node $N_1$ is connected to the supply voltage, $V_{dd}$, and node $N_2$ is led to a source follower circuit so that the pixels work as a conventional APS. In 3-D imaging, node $N_1$ is precharged to a high level before the row is selected, and node $N_2$ is connected to the ground level, $V_{ss}$. The bias voltage, $V_{bn}$, in Figure 2.5 is set to a high level in order to connect $N_2$ to the ground level. After the row is selected, the column output of $N_1$ begins to decrease according to each pixel value as shown in Figure 2.4. That is, the output of $N_1$ associated with an active pixel decreases more slowly, so it reaches the threshold voltage later. In this readout method, the relative intensity of the active pixels is acquired shortly after the row access by means of the time-domain dynamic readout scheme with adaptive thresholding.

2.3.3 Adaptive Threshold Circuit

In general, conventional position sensors detect high-intensity pixels using a predetermined threshold intensity. However, the optimal threshold is influenced by fluctuations in the row access speed. It also depends on the overall scene illuminance. In the present sensing scheme, the threshold intensity, $E_{th}$, shown in Figure 2.3 (b), is adaptively determined by the weakest intensity in each row as shown in Figure 2.5 (b) and (c). A column output, $CMP_1$, associated with an inactive pixel changes first and initiates a common trigger signal, $COM$. $COM$ propagates to the trigger inputs of the column-parallel latch sense amplifiers through delay elements of $T_{th}$ and $T_{res}$, which determine the latch timing of each column output, $CMP_i$. $DCK_0$, which is $COM$ delayed by $T_{th}$, triggers the first stage of the latch sense amplifiers. The first delay, $T_{th}$, keeps a threshold margin of $\Delta E_{th}$, shown in Figure 2.3 (b), from the darkest level in the time domain. It cancels fluctuations in the row access speed, which are mainly caused by column-line parasitic resistances. In addition, it provides robustness against changes in the overall scene illuminance. The first-stage outputs, $ACT$, indicate whether a pixel is activated or not. They are transferred to the next priority encoder stage.
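The timing relation described above can be illustrated with a small behavioral sketch. It abstracts each column output to a single transition time and treats a pixel as active when its column output has not yet switched at $DCK_0$; the function name and the nanosecond values are hypothetical, and analog effects such as latch metastability are ignored.

```python
def adaptive_threshold(arrival_times_ns, t_th_ns):
    """Return the ACT flag of each column for one row.

    arrival_times_ns: time at which each column output switches (behavioral abstraction).
    t_th_ns: delay T_th between the common trigger COM and the first latch clock DCK_0.
    """
    t_com = min(arrival_times_ns)   # the first column output to switch initiates COM
    dck_0 = t_com + t_th_ns         # first latch timing
    # A column that has not switched by DCK_0 belongs to an active (bright) pixel.
    return [t > dck_0 for t in arrival_times_ns]

# Hypothetical transition times: only the third column is late enough to be active.
row = [10.0, 10.5, 14.0, 11.0]
act = adaptive_threshold(row, t_th_ns=2.0)

# A delay common to the whole row, for example a slower row access, leaves ACT unchanged.
act_shifted = adaptive_threshold([t + 5.0 for t in row], t_th_ns=2.0)
```

Because the latch clock is referenced to the earliest transition in the same row rather than to a fixed instant, a delay shared by all columns does not move the decision, which is the cancellation property stated in the text.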

Figure 2.6 shows the relation between the voltage, $V_{pd}$, at a photo diode and the discharging time of $V_{col}$ at an adaptive threshold level. $V_{pd}$ decreases from a reset level, $V_{rst}$, depending on the incident light, and the decrease, $\Delta V_{pd}$, is then converted to a discharging time. The reset voltage, $V_{rst}$, makes it possible to adjust the adaptive threshold level, $\Delta E_{th}$, corresponding to the delay $T_{th}$ as shown in Figure 2.6. For example, a $\Delta V_{pd}$ of 200 mV corresponds to discharge periods of 1.72 ns and 7.68 ns when $V_{rst}$ is 2.5 V and 1.8 V, respectively.

2.3.4 Time-Domain Analog-to-Digital Converters

An intensity profile of active pixels is acquired by a column-parallel time-domain approximate ADC (TDA-ADC) at the same time as the adaptive thresholding. The common trigger signal, $COM$, continues to propagate through delays of $T_{res}$ as SA clock signals, $DCK_n$, as shown in Figure 2.5 (c). $DCK_n$ latches the column outputs, $CMP_i$, at the $n$-th stage one after another as shown in Figure 2.5 (b). The arrival timing of a column output depends on the pixel value, so the results of the TDA-ADCs, $INT$, show an approximate intensity of the active pixels, which is normalized by the darkest pixel intensity in the row. For example, in Figure 2.5, the common trigger signal, $COM$, is initiated by $CMP_1$ from the darkest pixel, and then $COM$ generates $DCK_n$ in column parallel. The SAs’ results for $CMP_1$ are all ‘0’ since the pixel value is below the threshold level. On the other hand, those for $CMP_2$ are ‘0000011’, and the number of ‘1’s represents the intensity over the threshold level. The number of ‘1’s is encoded in column parallel and transferred to the intensity profile readout circuit; that is, the result, $INT$, is ‘010’ as the pixel intensity associated with $CMP_2$ in Figure 2.5. The high-speed readout scheme using the present circuits provides the location of the detected pixels and their intensity profile simultaneously.

**Figure 2.5** Schematic and operation of the adaptive thresholding and TDA-ADC.
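The time-domain conversion can be sketched as a thermometer code followed by a count. The following is a behavioural illustration only. It assumes seven latch stages clocked at $T_{th} + n T_{res}$ after $COM$ (the exact stage timing is our assumption), and the function name `tda_adc` and the numeric times are hypothetical.

```python
def tda_adc(flip_time_ns, t_com_ns, t_th_ns, t_res_ns, stages=7):
    """Behavioural model of one column of the TDA-ADC.

    Stage n (n = 1..stages) is latched by DCK_n, assumed here to fire at
    t_com + T_th + n * T_res. A stage stores 1 if the column output has
    not flipped yet at that instant. The bit list is ordered from the
    first stage to the last; the count of ones is returned as a 3-bit
    string, matching the INT code in the text.
    """
    bits = []
    for n in range(1, stages + 1):
        t_dck = t_com_ns + t_th_ns + n * t_res_ns
        bits.append(1 if flip_time_ns > t_dck else 0)
    count = sum(bits)                  # thermometer code -> number of ones
    return bits, format(count, "03b")  # seven stages fit in three bits


# Darkest pixel: it raises COM itself, so every stage reads 0.
dark_bits, dark_int = tda_adc(10.0, t_com_ns=10.0, t_th_ns=2.0, t_res_ns=1.0)

# Brighter pixel: still high at the first two stages, so the count is two.
beam_bits, beam_int = tda_adc(14.5, t_com_ns=10.0, t_th_ns=2.0, t_res_ns=1.0)
```

With these hypothetical times the brighter column yields two ones and the code ‘010’, the same count as the $CMP_2$ example in Figure 2.5.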

2.3.5 Binary-Tree Priority Address Encoder

Figure 2.7 shows a schematic of a binary-tree priority encoder (PE), which receives $ACT$ from the adaptive threshold circuit. The schematic represents a 16-input PE; a 640-input PE is necessary for a $640 \times 480$ (VGA) pixel resolution. The PE consists of a mask circuit, a binary-tree priority decision circuit, and an address encoder. At the mask circuit, $ACT_n$ is compared with its neighbors, $ACT_{n+1}$ and $ACT_{n-1}$, to detect the left and right edges using XOR circuits. The priority decision circuit receives $PRI\_IN_n$ from the mask circuits and generates an output at the minimum address of active pixels, for example, $PRI\_OUT_3$ in Figure 2.7. The left and right edge addresses are encoded at the address encoder. After the first-priority edge has been encoded, the edge is masked by $PRI\_OUT_n$ and $MCK$, and then the location of the next-priority active pixels is encoded. The priority decision circuit remains fast for a large number of inputs owing to its binary-tree structure and compact circuit cell. The delay increases in proportion to $\log(N)$, where $N$ is the number of inputs.
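The edge detection, priority decision, and masking steps can be sketched in software as follows. This is a functional illustration under our own assumptions, not a gate-level description: the helper names are hypothetical, and the recursion only mimics the binary-tree structure in which the lower-address half takes priority.

```python
def edge_flags(act):
    """Mark the left and right edges of each run of active pixels.

    A transition is found by XOR with the neighbouring ACT bit; columns
    outside the row are treated as inactive.
    """
    n = len(act)
    left, right = [], []
    for i in range(n):
        prev_bit = act[i - 1] if i > 0 else 0
        next_bit = act[i + 1] if i < n - 1 else 0
        left.append(act[i] & (act[i] ^ prev_bit))
        right.append(act[i] & (act[i] ^ next_bit))
    return left, right


def tree_priority(flags, lo=0, hi=None):
    """Return the minimum index whose flag is 1, or None if there is none."""
    if hi is None:
        hi = len(flags)
    if hi - lo <= 0:
        return None
    if hi - lo == 1:
        return lo if flags[lo] else None
    mid = (lo + hi) // 2
    found = tree_priority(flags, lo, mid)   # lower addresses win
    return found if found is not None else tree_priority(flags, mid, hi)


def encode_runs(act):
    """Encode (left, right) edge addresses run by run, masking each edge
    after it has been encoded so that the next-priority run is found."""
    left, right = edge_flags(act)
    runs = []
    while True:
        l = tree_priority(left)
        if l is None:
            return runs
        r = tree_priority(right)
        runs.append((l, r))
        left[l] = 0
        right[r] = 0


# Hypothetical 16-input row with two runs of active pixels.
act_row = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
runs = encode_runs(act_row)
```

The recursion depth grows with $\log_2(N)$, which mirrors the delay scaling stated above.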

Figure 2.7 Schematic of a binary-tree priority encoder.
Chapter 2  Real-Time and High-Resolution 3-D Image Sensors

2.3.6 Intensity-Profile Readout Circuit

Using the location of active pixels from the priority decision circuit, an intensity profile of a projected beam is quickly read out by an intensity profile readout circuit. It is utilized for an off-chip gravity center calculation for a high sub-pixel accuracy. An intensity profile of eight active pixels from the left edge is read out in parallel. The width of a projected sheet beam can be controlled within eight pixels per row. Even if the width is over eight pixels, the center position can be calculated using the left and right edge addresses. A 3-b intensity profile achieves a high sub-pixel accuracy under 0.1 pixel theoretically.
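The off-chip gravity center calculation can be sketched as an intensity-weighted mean. This is a minimal example assuming the 3-b codes are used directly as weights; the function names and the sample profile are hypothetical.

```python
def sub_pixel_center(left_edge, profile):
    """Intensity-weighted centre of a beam segment.

    left_edge: column address of the first active pixel.
    profile:   3-b intensity codes (0..7) of consecutive pixels, starting
               at the left edge (up to eight values are read out).
    Returns the centre position in pixels, or None if all codes are zero.
    """
    total = sum(profile)
    if total == 0:
        return None
    weighted = sum(offset * w for offset, w in enumerate(profile))
    return left_edge + weighted / total


def edge_center(left_edge, right_edge):
    """Fallback for beams wider than the profile: midpoint of the edges."""
    return (left_edge + right_edge) / 2


# Hypothetical beam starting at column 100, six pixels wide.
center = sub_pixel_center(100, [1, 3, 6, 7, 5, 2, 0, 0])
midpoint = edge_center(100, 105)
```

In this example the weighted center and the edge midpoint differ by a quarter of a pixel, which is the kind of refinement the intensity profile is meant to provide.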

Figure 2.8 shows a timing diagram of the high-speed position detection. Three pipeline stages take five clock cycles to detect the location address and the intensity profile of active pixels in each row. A sheet beam scans a target scene using a mirror controlled by a triangular waveform, and a range map is acquired in each one-way sweep of the mirror. That is, 30 range maps/s requires a mirror scan of 15 Hz. For example, 480 row access cycles are carried out 640 times in a mirror sweep over a target scene to get $640 \times 480$ range data.

Figure 2.8 Timing diagram of the high-speed position detection.
2.4 Design of 640 × 480 Real-Time 3-D Image Sensor

2.4.1 Sensor Configuration

To start with a feasibility study, we designed and fabricated a prototype chip with 128 × 128 pixels using a 0.6 µm standard CMOS process [77]. Based on the successful experiments with the prototype, we then designed a 3-D image sensor with 640 × 480 pixels using the dynamic access technique. Figure 2.9 shows a block diagram of the 640 × 480 3-D image sensor. It consists of a 640 × 480 (VGA) pixel array, address decoders for row select and reset, column-parallel readout amplifiers with a column selector for 2-D imaging, and a column-parallel position detector for 3-D imaging. The sensor has two readout operations: a standard analog readout and a fast dynamic readout. These readout operations are carried out in a time-division mode for 2-D and 3-D imaging. The column-parallel position detector is composed of 3-stage pipeline modules: an adaptive threshold circuit with time-domain approximate ADCs, a priority address encoder, and an intensity profile readout circuit. It produces the location address of a projected beam and its intensity profile. It achieves high-speed position detection and reduction of redundant information for a real-time and high-resolution 3-D imaging system.

Figure 2.9 Block diagram of the sensor.

2.4.2 Chip Implementation

We have designed and fabricated a 640 × 480 3-D image sensor using the present architecture and circuits in a 0.6 µm standard CMOS process with 2-poly-Si 3-metal layers. Figure 2.10 shows the chip microphotograph. The sensor has a 640 × 480 pixel array, row select and reset decoders, 2-D image readout circuits, an adaptive threshold circuit with column-parallel TDA-ADCs, a 640-input priority encoder, and an intensity profile readout circuit in an 8.9 mm × 8.9 mm die. It has been designed without on-chip correlated double sampling (CDS) circuits and ADCs for 2-D imaging, but they can be implemented on the chip in the same way as in other standard CMOS imagers to reduce fixed pattern noise (FPN) and to achieve high-speed 2-D imaging. A pixel of the 3-D image sensor consists of a photo diode and 3 transistors. The pixel area is 12 µm × 12 µm with a 29.5% fill factor. The photo diode is formed by an n⁺-diffusion in a p-substrate. Table 2.1 summarizes the specifications.
### Table 2.1 Chip specifications.

<table>
<thead>
<tr>
<th>Specification</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Process</td>
<td>2P3M 0.6 μm CMOS process</td>
</tr>
<tr>
<td>Die size</td>
<td>8.9 mm × 8.9 mm</td>
</tr>
<tr>
<td># pixels</td>
<td>640 × 480 pixels (VGA)</td>
</tr>
<tr>
<td># FETs</td>
<td>1.12M FETs</td>
</tr>
<tr>
<td>Pixel size</td>
<td>12.0 μm × 12.0 μm</td>
</tr>
<tr>
<td># FETs/pixel</td>
<td>3 FETs</td>
</tr>
<tr>
<td>Fill factor</td>
<td>29.54 %</td>
</tr>
</tbody>
</table>


**Figure 2.11** Overall system configuration.

2.5 Development of Real-Time 3-D Image Capture System

2.5.1 Overall System Configuration

Figure 2.11 shows an overall system configuration using the real-time VGA 3-D image sensor. The system consists of a camera module with the sensor, a laser beam source with a scanning mirror, and a host computer. The camera module has an integrated system controller, which is implemented on an FPGA. The system controller and the host computer are connected by a Fast SCSI interface. The host computer issues system parameters and operation commands to the system controller and receives measured range data.
2.5.2 System Controller

A real-time range finding system using a high-speed smart image sensor requires high-speed control, processing, and data transmission. We have integrated these functions in an FPGA. It supports several operation modes, including 2-D imaging, active pixel detection, range finding, and calibration. In the 2-D operation mode, it acquires a scene image via external 8-bit ADCs. In the 3-D operation mode, it acquires positions and intensity profiles of a projected sheet beam. It also controls a scanning mirror through an external 12-bit DAC in synchronization with the sensor control. The system controller holds setting parameters of the measurement system, such as the field angle and the baseline distance, which are downloaded from the host computer in advance. The range data are calculated in the system controller using the setting parameters as pre-processing and are transferred to the host computer over a Fast SCSI interface. A SCSI controller is also implemented in the FPGA. The system controller operates at 40 MHz. The data rate of the SCSI interface is 9.3 MB/s.

2.5.3 Software Development

The developed camera module with the system controller is recognized as a scanner device by Windows 98/2000 on the host computer. GUI software that we developed communicates with the system controller via the SCSI interface to download the setting parameters and to acquire the measurement results. A calibration target of known shape is measured at the beginning to obtain calibration parameters. The software can calibrate the measured range data in real time and display 2-D/3-D images in real time.

2.5.4 Real-Time 3-D Image Capture System

Figure 2.12 shows photographs of the 3-D image capture system. The camera board has the VGA 3-D image sensor, the integrated system controller, power supply circuits, a Fast SCSI interface, 8-bit ADCs, a 12-bit DAC for mirror control, and peripheral logic circuits. The laser beam source with a rod lens has a power of 300 mW and a wavelength of 665 nm. The scan mirror can operate up to 100 Hz. The measured data are transferred and displayed on a host computer in real time as shown in Figure 2.12. The current system requires a strong and sharp sheet beam since the photo sensitivity is low due to a standard CMOS process, which is not customized for an image sensor.
2.6 Measurement Results

2.6.1 2-D Imaging and Position Detection

Figure 2.13 shows a 2-D image captured by the present sensor. The sensor has 8-parallel analog outputs and provides a gray scale image by external ADCs. Figure 2.14 shows an example of position detection of a projected sheet beam. In the measurement, the sheet beam is projected on a spherical target object. The sensor provides the left and right edge addresses of consecutively active pixels in each row. That is, a target scene image is unnecessary for the range finding since the required information is selectively provided as the position addresses. The redundant data suppression reduces the bandwidth usage of the measurement system. A reconstructed image of the detected positions is also shown in Figure 2.14. The sensor also provides an intensity profile of the active pixels between the left edge and the right edge in order to improve the sub-pixel resolution. The range data are calculated by triangulation using the locations and the intensity profiles of the projected sheet beam.

2.6.2 Range Finding Speed

In 2-D imaging, eight pixel values are read out in parallel and the readout operation takes 2 $\mu$s. The maximum 2-D imaging speed is 13 fps using 8-parallel high-speed external ADCs. A higher 2-D imaging speed is possible since it is easy to implement conventional readout techniques, such as column-parallel ADCs, in the present sensor architecture. In 3-D imaging, the precharge voltage $V_{pc}$ is set to 3.5 V and the comparison voltage $V_{cmp}$ for adaptive thresholding is set to 3.0 V. Active pixels in a row line are detected in 50 ns at 100 MHz operation. The delay time of the priority encoder stage is 17.2 ns for the left and right edges. The readout time of the intensity profile is 21.5 ns. The location and intensity profile of a projected sheet beam on the sensor plane are acquired in 24.0 $\mu$s because of the pipeline operation. The position detection rate for a projected sheet beam is 41.7k lines/s. By scanning the sheet beam, the 3-D image sensor realizes 65.1 range maps/s with a VGA pixel resolution.
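The quoted rates are consistent with simple arithmetic on the row access time. The following check assumes that one frame access consists of 480 pipelined row accesses of 50 ns each and that one range map needs 640 sheet-beam positions; the variable names are ours.

```python
ROWS, COLS = 480, 640
ROW_ACCESS_NS = 50.0            # active pixel detection per row (from the text)
PIXELS_PER_READ = 8             # 2-D imaging: eight pixel values in parallel
READ_TIME_US = 2.0              # 2-D imaging: time per parallel readout

# 3-D imaging: one frame access per sheet-beam position.
frame_access_us = ROWS * ROW_ACCESS_NS / 1000.0
lines_per_s = 1e6 / frame_access_us
range_maps_per_s = lines_per_s / COLS

# 2-D imaging: full-frame readout through the 8-parallel analog outputs.
frame_2d_us = ROWS * COLS / PIXELS_PER_READ * READ_TIME_US
fps_2d = 1e6 / frame_2d_us

# Triangular mirror scan: one range map per one-way sweep.
mirror_hz_for_30_maps = 30 / 2
```

Under these assumptions the values come out at 24.0 µs per frame access, about 41.7k lines/s, about 65.1 range maps/s, and about 13 fps for 2-D imaging.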

Figure 2.15 shows the pixel resolution and the 3-D imaging speed of the present image sensor in comparison with previous designs. A high-speed 2-D imager [28] achieves 500 fps 2-D imaging with 1M pixels owing to column-parallel ADCs; however, it is difficult for its architecture to realize real-time 3-D imaging based on the light-section method. The state-of-the-art range finders [25]–[27] achieve more than 15 range maps/s. Their pixel circuits are too large to realize an over-VGA pixel resolution, and their architectures cannot maintain a real-time 3-D imaging rate at a high pixel resolution, as shown in Figure 2.15. The present 3-D image sensor is the first real-time range finder with a VGA pixel resolution based on the light-section method.

Figure 2.14 Measurement result of sheet beam detection.
2.6.3 Range Accuracy

Figure 2.16 shows measured distances of a white flat board at 30 range maps/s. The baseline distance between the camera and the beam source is 431.5 mm. The view angle of the camera is 30 degrees. A target object is placed at a distance of around 1200 mm from the camera. The present 3-D image sensor acquires the intensity profile of a projected sheet beam to achieve a high sub-pixel accuracy. With a gravity center calculation using the intensity profiles, the standard deviation of the measured error is 0.26 mm and the maximum error is 0.87 mm at distances of 1170 mm to 1230 mm. For comparison, the standard deviation of the measured error is 0.54 mm and the maximum error is 2.13 mm with the conventional binary-based position calculation. That is, the 3-D image sensor achieves less than half the range error of the conventional methods based on a binary image. An intensity profile could be distorted by device fluctuations, but the measurement results show that an approximate intensity profile of active pixels is effective. Table 2.2 summarizes the performance of the present 3-D image sensor with a VGA pixel resolution.
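To indicate how a detected column position maps to a distance, the following sketch uses textbook triangulation in the plane containing the baseline. It is not the calibrated model used in the measurement system: the pinhole camera, the optical axis perpendicular to the baseline, and all function names are assumptions made for illustration.

```python
import math


def triangulate(baseline_mm, cam_angle_rad, beam_angle_rad):
    """Intersect the camera ray with the beam plane.

    Both angles are measured from the baseline, at the camera and at the
    projector respectively. The camera is at the origin and the projector
    at (baseline_mm, 0). Returns (x, z) in millimetres.
    """
    ta = math.tan(cam_angle_rad)
    tb = math.tan(beam_angle_rad)
    z = baseline_mm * ta * tb / (ta + tb)
    return z / ta, z


def pixel_to_angle(col, n_cols, view_angle_rad):
    """Pinhole model: ray angle from the optical axis for a (possibly
    fractional) column position, positive toward the projector."""
    focal_px = (n_cols / 2) / math.tan(view_angle_rad / 2)
    return math.atan((col - n_cols / 2) / focal_px)


def depth_from_column(col, n_cols, view_angle_rad, baseline_mm, beam_angle_rad):
    """Depth for a detected beam position, assuming the optical axis is
    perpendicular to the baseline."""
    cam_angle = math.pi / 2 - pixel_to_angle(col, n_cols, view_angle_rad)
    return triangulate(baseline_mm, cam_angle, beam_angle_rad)[1]


# Setup values from the text; the beam angle is chosen so that the beam
# crosses the optical axis at 1200 mm (an illustrative choice).
view = math.radians(30.0)
beam = math.atan2(1200.0, 431.5)
z_center = depth_from_column(320.0, 640, view, 431.5, beam)
z_shifted = depth_from_column(320.1, 640, view, 431.5, beam)
```

Because the column position enters as a real number, a sub-pixel center from the intensity profile translates directly into a finer depth estimate than an integer edge address.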
2.6.4 Real-Time 3-D Image Capture

The present 3-D image sensor is capable of capturing a 2-D image and a 3-D image in time division. Figure 2.17 shows measured images by the present 3-D image sensor. A target is placed at a distance of 1200 mm from the camera. The distance between the camera and the beam scanner is 431.5 mm. Figure 2.17 (a) is a captured VGA 2-D image of a hand. Figure 2.17 (b)–(d) are its range maps displayed from different view angles. The brightness of the range maps represents the distance from the range finder to the target object. The range data have already been plotted in 3-D space, so they can be rotated freely as shown in Figure 2.17 (b)–(d). Figure 2.17 (e) is a wire frame reproduced from the measured range data and Figure 2.17 (f) is a closeup of Figure 2.17 (e). The measured images show that the real-time 3-D image sensor with a VGA pixel resolution realizes high-spatial- and high-range-resolution 3-D imaging.

Figure 2.17 Measurement results of 3-D image capture.

The image sensor may fail to detect the beam on a black part of a target object, or on a part colored complementary to red, since the reflected intensity of the projected beam degrades. A long exposure avoids the detection failure with a voltage control of \( V_{rst} \) and \( V_{cmp} \) on condition that the reflected beam is still stronger than the high-contrast scene. Therefore the projected beam intensity also limits the range finding speed in proportion. The current 3-D imaging system requires a strong beam intensity of 300 mW in a room with a constant ambient light to achieve the maximum range finding speed. In the future, it can be improved by, for example, a high-sensitivity photo diode with a micro lens or a correlation technique to suppress the ambient light.

Figure 2.18 Measured 3-D images of moving objects.

Figure 2.18 shows measured 3-D images in real time. In the real-time 3-D imaging, the baseline is set to 300.0 mm. The measured range data can be displayed at any view angle. In Figure 2.18, the range data are plotted as a wire frame at two view angles. In addition, the color of wire frames represents the distance from the camera by the brightness. The brighter regions are closer to the camera than the darker ones. We captured 350 range maps in 15.0 seconds. That is, the 3-D imaging system achieves 23.3 range maps/s, which is limited by the data storage speed on a host computer and the data bandwidth between a camera and a host computer.
2.7 3-D Model Reconstruction by Multiple Cameras

2.7.1 System Configuration

Figure 2.19 shows a 3-D image capture system using multiple range finders. It is capable of capturing a full 3-D model of a target object. Multiple range finders, each of which consists of a 3-D image sensor and a sheet beam projector as presented in Figure 2.12, are placed around a target object. A calibration target is placed at the center position among the range finders. It is a cube 20 cm on a side, and it is used to acquire intra- and inter-camera calibration parameters before the 3-D image capture [78]. The intra-camera calibration parameters provide the relation between the 3-D image sensor and the sheet beam projector in a range finder. On the other hand, the inter-camera calibration parameters provide the relation among the calibration target and the range finders. The range finding method using a calibration cube makes it possible to reconstruct a full 3-D model from range data measured in multiple directions. Figure 2.20 shows a photograph of a prototype 3-D image capture system using multiple range finders. The distance between adjacent range finders is 1200 mm in this measurement setup.

2.7.2 3-D Model Reconstruction by Multiple Cameras

We obtained range data of a target object using two range finders as a preliminary test of the multiple camera system. Figure 2.21 presents a synthesized 3-D model which is reconstructed from range data measured in two different directions. Figure 2.21 (a) shows a target object. Two range finders provide two wireframes of a target object from the different view points as shown in Figure 2.21. The captured wireframes are calibrated in the world coordinate using 12 camera parameters and 8 projector parameters which are acquired for each range finder with a calibration target. The two wireframes are synthesized with a mean range error of 1.6 mm by the calibration method as shown in Figure 2.21.
2.8 Scale-Up Implementation

2.8.1 Design of 1024 × 768 3-D Image Sensor

We have designed a 1024 × 768 (XGA) 3-D image sensor using the proposed dynamic access technique as a scale-up implementation. The XGA 3-D image sensor has been fabricated in a 0.35 μm standard CMOS process. Figure 2.22 shows a block diagram of the 3-D image sensor. It consists of a pixel array, a row reset decoder, a row select decoder and a pixel value readout circuit with a column select decoder. Moreover, a position detector is implemented in the bottom part of the sensor, which consists of an adaptive threshold circuit and two priority address encoders. An intensity profile detector with column-parallel time-domain ADCs is implemented in the top part. The position detector of the bottom part is composed of two pipeline stages. The first stage is the adaptive threshold circuit and the edge detection circuit. It provides the left and right edge positions of consecutively active pixels to the next stage. The second stage is the priority encoders, which provide the addresses of the left and right edges. The edge positions detected by the second stage are masked, and then the next position of active pixels is encoded in the next cycle. The intensity profile detector in the top part has the column-parallel time-domain ADCs to acquire an 8-scale intensity profile of active pixels. The acquired intensity profile is selectively read out by the center position of active pixels, which is calculated by the results of the position detector.

Figure 2.22 Block diagram of the 1024 × 768 3-D image sensor.

Figure 2.23 shows the chip microphotograph, and Table 2.3 summarizes the chip specifications. The image sensor has 1024 × 768 pixels (XGA) in a 9.8 mm × 9.8 mm chip. The total number of transistors is 3.20M transistors. The pixel size is 8.4 μm × 8.4 μm with 29.0% fill factor.

2.8.2 Performance Evaluation

Figure 2.24 shows the range finding speed of the XGA 3-D image sensor estimated by a circuit simulation. The adaptive threshold circuit detects active pixels in 30.0 ns after a
Figure 2.24 Possible range finding rate of the XGA 3-D image sensor.

Figure 2.25 shows a possible range accuracy of the XGA 3-D image sensor in an ideal situation. The range accuracy of the light-section method depends not only on the pixel resolution but also on the setup parameters, for example, the baseline distance between a camera and a beam source, the target distance, and the view angle of the camera. In this simulation, the baseline distance is 300 mm, the target distance is 1100 mm, and the view angle is 20 degrees. An intensity profile acquired by the time-domain ADCs can improve the range accuracy according to the number of scales as shown in Figure 2.25. The maximum range error is 0.36 mm at a distance of 1100 mm in a normal position detection without an intensity profile. Furthermore, the range error theoretically falls below 0.19 mm when an 8-scale intensity profile provided by the time-domain ADCs is used.

2.8.3 Measurement Results

The XGA 3-D image sensor has been applied to a range finding system for a preliminary test. The measurement system is composed of a camera board with the sensor, a scanning mirror, a laser beam source of 300 mW and 665 nm wavelength, and a host computer. The host computer is equipped with digital parallel I/O boards of 2 MB/s for sensor control, an 8-bit A/D board for 2-D imaging, and a 12-bit D/A board for mirror scanning. The host computer controls the sensor and the sheet beam projector, acquires data from the sensor, and calculates 3-D position data. In the measurement setup, the viewing field of the camera is $400 \text{ mm} \times 300 \text{ mm}$ at a distance of 1100 mm. The baseline between the camera and the sheet beam projector is 300 mm.

Figure 2.26 (a) shows a measured 2-D image of a target scene with $1024 \times 768$ pixels. Figure 2.26 (b) is a range map reconstructed from the measured 3-D data. In the range map, brightness represents the distance from the camera. That is, the bright area is close to the camera and the dark area is far from the camera. A range finding system can be applied to various application fields. For example, object extraction is promptly realized by a range map as shown in Figure 2.26 (c). The object extraction method provides a depth-key system instead of a chroma-key system. In the depth-key system, a blue-back screen is unnecessary. Therefore it can be applied to a realistic synthesizing system of real images and computer graphics, which has been reported in [18]. Figure 2.27 shows another application of the 3-D imaging system. The light-section 3-D measurement with a high pixel resolution provides a precise wireframe model as shown in Figure 2.27 (a), which cannot be realized by the time-of-flight techniques [13]–[19] and the conventional light-section techniques [20]–[27]. A texture-mapped 3-D object is reconstructed from the wireframe model and the captured 2-D image as shown in Figure 2.27 (b).

2.8.4 Real-Time Range Finding

Figure 2.28 shows a measurement setup for real-time 3-D image capture with the XGA 3-D image sensor. The system controller is implemented in an FPGA (Altera FLEX10K200E) to achieve a high-speed system control and a high-speed data rate. The system controller and the Fast SCSI interface are mounted on a camera board as shown in Figure 2.28 (a). The system controller operates at a speed of 40 MHz. A laser beam source of 300 mW and 665 nm wavelength is placed at a distance of 150 mm from the camera board. The target distance is set to around 450 mm, and the measurable area is $144 \text{ mm} \times 110 \text{ mm}$ as shown in Figure 2.28 (b). Figure 2.29 shows measurement results of real-time 3-D image capture using the XGA 3-D image sensor. In the real-time 3-D image capture, a range map has $384 \times 240$ 3-D position data, and the system achieves 18.0 range maps/s. The limiting factor of the resolution and the range finding rate is the data bandwidth between the camera module and the host computer via the Fast SCSI interface of 9.3 MB/s.
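The bandwidth limit can be put into perspective with a rough budget per transferred position. The transfer format is not specified here, so this is only an upper bound; it assumes that 1 MB means $10^6$ bytes and that the link carries nothing but range data.

```python
POSITIONS_PER_MAP = 384 * 240      # 3-D position data per range map
MAPS_PER_S = 18.0                  # achieved real-time rate
LINK_BYTES_PER_S = 9.3e6           # Fast SCSI data rate, assuming 1 MB = 10**6 bytes

positions_per_s = POSITIONS_PER_MAP * MAPS_PER_S
bytes_per_position = LINK_BYTES_PER_S / positions_per_s
```

Under these assumptions the link offers roughly 5.6 bytes per position at 18.0 range maps/s, so a larger range map or a higher rate would need a proportionally faster interface.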

2.9 Ambient Light Suppression Techniques

2.9.1 Concept of Ambient Light Suppression

The present dynamic access determines active pixels based on the pixel values, that is, it strongly depends on the incident light intensity. Therefore, the dynamic access technique requires a sufficiently strong laser beam for the active pixel detection, as do the conventional techniques based on the light-section method [20]–[25]. Figure 2.30 shows the active pixel detection of the high-speed dynamic access. Pixel values are determined by the total of an ambient light intensity, $E_{bg}$, and a laser beam intensity, $E_{sig}$. In the access technique, a threshold level is determined by the darkest intensity level. For example, Figure 2.30 (a) is the minimum detectable intensity of a projected laser beam since it can exceed the threshold level at a target surface illuminated by the darkest ambient light. Figure 2.30 (b) can be detected if the laser beam is projected on a bright part of a target surface. On the other hand, it becomes nondetectable on a dark part of a target surface as shown in Figure 2.30 (c). Many applications, however, generally require a low-intensity beam projection for eye safety and robust 3-D image capture.

Figure 2.28 Measurement setup for real-time 3-D image capture with XGA pixel resolution.

Figure 2.29 Measured 3-D images of a moving object using the XGA 3-D image sensor.

Figure 2.31 shows the concept of ambient light suppression for the dynamic access technique. It is based on the inter-frame difference method, where the difference signals between two subsequent frames are used to detect the projected light. In the first frame access, the laser beam projection is turned off, and the image sensor captures the ambient light level, $E_{bg}$. Each reset level is then biased according to the ambient light level for the next frame. In the second frame, where the laser beam is turned on, the ambient light level is canceled by the adaptive reset level. Therefore, all the intensity levels, as shown by (a) through (c) in Figure 2.31, become detectable.
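The benefit of the inter-frame difference can be illustrated numerically. The sketch below subtracts the two frames in software, whereas the sensor cancels the ambient level through the reset level; the function names and intensity values are hypothetical, and intensities are in arbitrary units.

```python
def detect_without_suppression(frame_on, margin):
    """Threshold at the darkest level plus a margin, on the raw intensities."""
    threshold = min(frame_on) + margin
    return [e > threshold for e in frame_on]


def detect_with_suppression(frame_off, frame_on, margin):
    """Same rule applied to the inter-frame difference.

    frame_off: intensities with the beam off (E_bg only).
    frame_on:  intensities with the beam on (E_bg plus E_sig on the beam).
    """
    diff = [on - off for on, off in zip(frame_on, frame_off)]
    threshold = min(diff) + margin
    return [d > threshold for d in diff]


# High-contrast scene; a weak beam (E_sig = 6) lands on a dark column.
e_bg = [2, 20, 5, 18, 3]
e_on = [2, 20, 11, 18, 3]

raw = detect_without_suppression(e_on, margin=3)
suppressed = detect_with_suppression(e_bg, e_on, margin=3)
```

In this example the raw thresholding also flags the two bright background columns, while the difference signal isolates the beam column only.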

A new circuit realization is, however, necessary for the ambient light suppression in the dynamic access technique because the pixel values are not directly provided by the access technique for high-speed active pixel detection. We propose two ambient light suppression techniques: a pixel-parallel suppression technique and a column-parallel suppression technique. The proposed pixel-parallel implementation is a simple way using in-pixel frame memories, but the pixel circuit becomes larger. On the other hand, the proposed column-parallel implementation, which employs a new reset level feedback circuit, efficiently suppresses a high-contrast ambient light, device fluctuations, and timing variations of a row access.

Figure 2.31 Concept of ambient light suppression for the high-speed dynamic access technique.

2.9.2  Pixel-Parallel Suppression Circuit

Figure 2.32 shows a pixel circuit configuration of the pixel-parallel ambient light suppression. The in-pixel correlated double sampling (CDS) circuit, which is reported in [79], usually operates for 2-D imaging as shown in Figure 2.33 (a). First, the voltage level of the photo diode, \( V_{pd} \), is reset by \( RST \). After photo current integration, \( \phi_2 \) initializes \( V_{sh} \) to \( V_{ini} \) at a sample and hold circuit. Then \( V_{pd} \) is reset again for the next frame while \( \phi_1 \) is on. Finally, the output voltage, \( V_{out} \), is obtained according to the signal level, \( V_{sig} \), when \( \phi_1 \) turns off. The pixel values are read out during photo current integration for the next frame. This operation is capable of reset noise suppression by the in-pixel CDS operation. On the other hand, the pixel circuit is also capable of ambient light suppression as shown in Figure 2.33 (b). In the first frame, the sheet beam projector is turned off, and the pixel circuit acquires the ambient light level, \( V_{bg} \). \( V_{sh} \) is boosted from \( V_{ini} \) by \( V_{bst} \), which tracks \( V_{bg} \). After that, the sheet beam projector is turned on, and the pixel receives the total level, \( V_{sig} \), of the ambient light and the projected beam. Finally, the pixel circuit provides the output level, \( V_{out} \), which represents the projected beam intensity since \( V_{sh} \) has been boosted according to the ambient light level.

2.9.3 Feasibility Tests of Pixel-Parallel Suppression

We have designed a 176 × 144 (QCIF) 3-D image sensor with the pixel-parallel ambient light suppression in a 0.35 µm standard CMOS process. Figure 2.34 shows the chip microphotograph and the chip components. It consists of a pixel array with 176 × 144 pixels, a row reset decoder, a row select decoder, control signal drivers, the adaptive threshold circuit, the binary-tree priority encoder, analog readout circuits, column-parallel gain amplifiers, 8-bit ADCs shared by 8 columns, and output buffers. A pixel consists of a photo diode and 10 transistors including 2 MOS capacitors. A photo diode is formed by an n-well in a p-substrate. The fill factor is 22.0 %. The pixel layout is shown in Figure 2.34. The chip specifications are summarized in Table 2.4.

Figure 2.35 shows preliminary test results of the pixel-parallel ambient light suppression. Figure 2.35 (a) is a photograph of a camera module using the designed 3-D image sensor. Figure 2.35 (b) presents a captured 2-D image without ambient light suppression, that is, a normal 2-D image. Figure 2.35 (c) shows a captured 2-D image with ambient light suppression. A target scene illuminated by an ambient light is successfully suppressed. The target scene provides a full output range from 0 V to 2.1 V, and it is suppressed down to less than 100 mV by the ambient light suppression. We have successfully obtained the left and right edge positions of a pulsed spot beam by the high-speed dynamic access technique as shown in Figure 2.35 (d). The spot laser beam of 10 mW and 635 nm wavelength is modulated with a pulse frequency of 20 kHz. The 3-D image sensor is able to ignore a spot laser beam without a pulse modulation as shown in Figure 2.35 (e). Therefore, the 3-D image sensor is applicable to a light-section range finding system under a strong ambient light due to the pixel-level constant light suppression technique.

Figure 2.33  Timing diagram of pixel-parallel suppression circuit: (a) 2-D imaging mode, (b) 3-D imaging mode.

Figure 2.34  Chip microphotograph and pixel layout.

### Table 2.4 Chip specifications.

<table>
<thead>
<tr>
<th>Specification</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Process</td>
<td>2P3M 0.35 µm CMOS process</td>
</tr>
<tr>
<td>Die size</td>
<td>4.9 mm × 4.9 mm</td>
</tr>
<tr>
<td># pixels</td>
<td>176 × 144 pixels (QCIF)</td>
</tr>
<tr>
<td>Pixel size</td>
<td>12.8 µm × 12.8 µm</td>
</tr>
<tr>
<td># FETs/pixel</td>
<td>10 FETs (inc. 2 capacitors)</td>
</tr>
<tr>
<td>Fill factor</td>
<td>22.0 %</td>
</tr>
</tbody>
</table>

Figure 2.35 Preliminary tests of pixel-parallel ambient light suppression: (a) camera module, (b) 2-D image without ambient light suppression, (c) 2-D image with ambient light suppression.
2.9.4 Column-Parallel Suppression Circuit

The pixel-parallel suppression technique suppresses ambient light so that a low-intensity projected beam can be detected. However, other factors also limit the detection sensitivity. Figure 2.36 shows the circuits and operations of the original high-speed dynamic access technique. One factor is the variation of select timing among column lines in the dynamic access technique, as shown in Figure 2.36 (a). These timing variations are caused by the parasitic capacitance and resistance of a row select line. The other factor is device fluctuation of the readout transistors, which causes variations in the discharging speed as shown in Figure 2.36 (b). Both variations produce timing errors in the adaptive threshold circuit as shown in Figure 2.36 (c). These limiting factors are not suppressed by the pixel-parallel suppression technique. Furthermore, the pixel-parallel suppression technique requires a large pixel circuit, which becomes a critical obstacle to attaining a high pixel resolution.

Figure 2.37 shows an error condition of the original high-speed dynamic access technique under strong ambient light. Figure 2.37 (a) shows simulated waveforms of the column lines, $V_{col}$, from pixels with various ambient light levels, $E_{bg}$. Figure 2.37 (b) presents simulated waveforms of the column lines, $V_{col}$, from pixels with a projected beam and various ambient light levels. In this simulation, the projected beam intensity is set to $25 \ E_o$ and the ambient light levels are swept from $E_o$ to $20 \ E_o$, where $E_o$ corresponds to 20 mV at a photo diode. The column outputs, $COL_j$, are generated by comparing $V_{col}$ with a reference voltage. Their transition timings fluctuate with the ambient light intensity as shown in Figure 2.37 (c) and (d). The threshold timing is determined by the earliest transition of $COL_j$ and is delayed from it by $\Delta T_{th}$. In this case, the projected beam intensity is high enough that the variations of ambient light intensity can be ignored. On the other hand, a projected beam becomes undetectable when its intensity is insufficient. Figure 2.37 (e)–(h) show simulated waveforms with an insufficient beam intensity of $10 \ E_o$. In this case, the transition timings of active and inactive pixels overlap, and the active pixel detection fails. Select timing variations and device fluctuations cause a similar error condition.

We propose a column-parallel ambient light suppression using adaptive reset level control as shown in Figure 2.38. In the column-parallel suppression technique, the pixel circuit has the same configuration as in the original dynamic access technique, that is, it basically consists of a photo diode and three transistors. The column-parallel feedback circuits sample the column outputs at the timing of $SCK$ in the dynamic access operation. The sampled voltage levels, which represent the pixel values resulting from ambient light, are used as the next reset levels. That is, the next reset levels, $V_{fb}$, are boosted from the initial reset level by the ambient light level. Therefore, the impact of ambient light is suppressed in the next dynamic access, in which the projected sheet beam is turned on. The technique also suppresses the select timing variations among column lines and the device fluctuations of the readout transistors.
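
The effect of the adaptive reset level can be illustrated with a minimal numerical sketch. It assumes a purely linear pixel model in which light lowers the photo diode level in proportion to its intensity, and an initial reset level of 2000 mV. The model, the reset value, and the names `pixel_level`, `feedback_reset`, and `second_access_level` are illustrative assumptions and do not describe the circuit of Figure 2.38; only the value of $E_o$ (20 mV) is taken from the text.

```python
E_O_MV = 20.0      # E_o corresponds to 20 mV at the photo diode (from the text)
V_RST_MV = 2000.0  # initial reset level in mV (assumed value for illustration)


def pixel_level(reset_mv, ambient_eo, beam_eo=0.0):
    """Pixel level after integration in a linear model: light lowers the level."""
    return reset_mv - (ambient_eo + beam_eo) * E_O_MV


def feedback_reset(sampled_mv):
    """Next reset level V_fb: initial level boosted by the sampled ambient drop."""
    return V_RST_MV + (V_RST_MV - sampled_mv)


def second_access_level(ambient_eo, beam_eo):
    """Level seen in the second access, after one ambient-only first access."""
    sampled = pixel_level(V_RST_MV, ambient_eo)   # first access, beam off
    v_fb = feedback_reset(sampled)                # boosted reset level
    return pixel_level(v_fb, ambient_eo, beam_eo)  # second access, beam on


ambient_sweep = (1, 5, 10, 20)  # ambient levels in units of E_o
levels_without_beam = [second_access_level(a, 0.0) for a in ambient_sweep]
levels_with_beam = [second_access_level(a, 10.0) for a in ambient_sweep]
print(levels_without_beam)  # identical for every ambient level
print(levels_with_beam)     # identical, lowered only by the beam
```

In this simplified model the second-access level depends only on the beam intensity, which is the behaviour the reset feedback is intended to achieve. The real circuit operates on column output timings rather than on static levels.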

### 2.9.5 Feasibility Tests of Column-Parallel Suppression

We have designed a $352 \times 288$ (CIF) 3-D image sensor with the column-parallel suppression technique in a 0.35 $\mu$m standard CMOS process. Figure 2.39 shows the chip layout and the components. It consists of a pixel array with $360 \times 296$ pixels, a row select decoder, a row reset decoder, the adaptive threshold circuit, the binary-tree priority address encoder, the column-parallel adaptive reset feedback circuits, a sample timing generator, and analog output buffers with a column select decoder. The number of effective pixels is $352 \times 288$. The die size is 4.9 mm $\times$ 4.9 mm. We have designed two pixel types, a standard structure and a high-sensitivity structure using a biased transistor, as shown in Figure 2.40 and Figure 2.41, respectively. The standard structure consists of an n$^{+}$-dif/p-sub photo diode and three transistors with a 29.0 % fill factor in 7.9 $\mu$m $\times$ 7.9 $\mu$m. The high-sensitivity structure consists of an n-well/p-sub photo diode and four transistors with a 25.1 % fill factor in 7.9 $\mu$m $\times$ 7.9 $\mu$m. The chip specifications are summarized in Table 2.5.

Figure 2.42 shows simulation waveforms of the column-parallel suppression of ambient light levels. In the first dynamic access, where the projected sheet beam is turned off, the column output timings of $COL_j$ vary according to the ambient light levels. The ambient light levels are swept from $E_o$ to $20 \ E_o$, where $E_o$ corresponds to 20 mV at a photo diode. In the second dynamic access, after the reset feedback by $V_{fb}$, the output timings without an incident beam coincide with each other. Furthermore, the output timings with an incident beam of $10 \ E_o$, under the same range of ambient light levels from $E_o$ to $20 \ E_o$, also coincide in the second dynamic access. Therefore, the column-parallel suppression technique enables detection of a low-intensity beam under various ambient light conditions.

Figure 2.43 shows simulation waveforms of the column-parallel suppression of select timing variations. In the designed CIF 3-D image sensor, the select timing variations are 400 ps. The simulation results show that the select timing variations of 400 ps are suppressed to within 190 ps. In this case, the timing variations are small enough that their impact on the robustness of the dynamic access technique can be ignored. The impact, however, becomes larger at a higher pixel resolution such as XGA. Therefore, the suppression of select timing variations is important and effective for high-quality 3-D image capture.

Figure 2.40 Photo diode structure with an n$^{+}$-dif/p-sub photo diode.

Figure 2.41 Photo diode structure with a biased transistor and an n-well/p-sub photo diode.

Table 2.5 Chip specifications.

<table>
<thead>
<tr>
<th>Specification</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Process</td>
<td>2P3M 0.35 $\mu$m CMOS process</td>
</tr>
<tr>
<td>Die size</td>
<td>4.9 mm $\times$ 4.9 mm</td>
</tr>
<tr>
<td># effective pixels</td>
<td>352 $\times$ 288 pixels (CIF)</td>
</tr>
<tr>
<td>Pixel size</td>
<td>7.9 $\mu$m $\times$ 7.9 $\mu$m</td>
</tr>
<tr>
<td># FETs/pixel</td>
<td>3 FETs (n$^{+}$-dif/p-sub type)</td>
</tr>
<tr>
<td></td>
<td>4 FETs (n-well/p-sub type)</td>
</tr>
<tr>
<td>Fill factor</td>
<td>29.0 % (n$^{+}$-dif/p-sub type)</td>
</tr>
<tr>
<td></td>
<td>25.1 % (n-well/p-sub type)</td>
</tr>
</tbody>
</table>

Figure 2.42 Simulation results of column-parallel suppression of ambient light levels.

Figure 2.43 Simulation results of column-parallel suppression of select timing variations.

Figure 2.44 shows simulation waveforms of the column-parallel suppression of device fluctuations. The output timings also vary due to the device fluctuations of the readout transistors, because the discharging speed of the dynamic access depends on the transistor characteristics. In this simulation, SS represents a transistor model with a slow NMOS and a slow PMOS, TT represents a typical transistor model, and FF represents a model with a fast NMOS and a fast PMOS. All the output timings, $COL_j$, coincide with each other in the second access. The column-parallel suppression technique successfully reduces the timing variations resulting from transistor fluctuations.

Figure 2.44 Simulation results of column-parallel suppression of device fluctuations.

Figure 2.45 Measured waveforms of the column outputs: (a) without reset feedback, (b) with reset feedback.

Figure 2.45 shows measured waveforms of the column outputs, $V_{col}$, using partially shielded pixels embedded in the pixel array for functional tests. Four pixels are implemented in the pixel array with fully open, two-thirds open, one-third open, and closed metal shields, respectively. Thus, the incident light levels at the photo diodes differ according to the aperture ratio. Figure 2.45 (a) shows the waveforms of $V_{col}$ in the first frame access, in which the pixel values are read out by the original high-speed dynamic access technique. Therefore the incident light variations cause the timing variations shown in Figure 2.45 (a). In a measurement setup for 3-D image capture, the timing variations are caused by high-contrast ambient light, device fluctuations of the readout transistors, and select timing variations. In the second frame access, that is, after the reset feedback operation, the output timings of $V_{col}$ coincide and the variations are suppressed. The column-parallel suppression technique stabilizes the adaptive threshold level and enables robust position detection of an incident sheet beam. Figure 2.46 shows a timing diagram of the column-parallel timing calibration. The feedback levels are provided to the pixels as reset levels for the next integration. A pulsed laser beam turns on during the second integration period without reset and select operations.

Figure 2.47 shows a measurement setup of the 3-D image sensor with column-parallel timing calibration. The system consists of a camera board with the 3-D image sensor, a scan mirror, a laser beam source, and a host PC. Figure 2.47 (a) and (b) show the front and back sides of the camera board, respectively. The camera board is composed of the 3-D image sensor with a lens, an 8-bit ADC for 2-D imaging, a 12-bit DAC for mirror scan, an FPGA for system control and data transmission, and peripheral logic and analog circuits. The FPGA operates at 20 MHz. A SCSI interface is also implemented in the FPGA for communication between the camera board and the host PC. Figure 2.47 (c) shows a photograph of the measurement setup. The distance between the camera and the beam scanner is 200 mm, and the distance to the target object is 1100 mm. The laser beam source has a power of 300 mW and a wavelength of 665 nm. Figure 2.47 (d) and (e) are a measured 2-D image and a range map of a target object. In the range map, the brightness represents the distance from the camera. Figure 2.48 shows an example of reconstructed wireframes measured by the 3-D image sensor.

Figure 2.47 Measurement setup: (a) front side of the camera board, (b) back side of the camera board, (c) system overview, (d) a measured 2-D image, (e) a measured range map.

Figure 2.48 Reconstructed wireframes.

2.10 Summary

We have proposed a high-speed dynamic frame access technique and its circuit implementation for a real-time and high-resolution 3-D image sensor. The high-speed readout scheme makes a standard and compact pixel circuit usable, and quickly obtains the location and intensity profile of a projected sheet beam on the sensor plane. The column-parallel position detector reduces redundant data transmission for a real-time measurement system. A $640 \times 480$ 3-D image sensor has been successfully demonstrated in a real-time and high-resolution range finding system. The maximum range finding speed is 65.1 range maps/s. Owing to a gravity center calculation using the intensity profile, the maximum range error is 0.87 mm and the standard deviation of the error is 0.26 mm at a distance of 1200 mm. We have shown a range finding system using multiple range finders for full 3-D model capture. A scaled-up version with $1024 \times 768$ pixels has also been developed.

Furthermore, we have proposed pixel-parallel and column-parallel ambient light suppression techniques that are suited to the proposed access technique. A $352 \times 288$ 3-D image sensor with column-parallel ambient light suppression has been presented. The proposed column-parallel suppression technique employs adaptive reset feedback circuits, and efficiently reduces the effects of high-contrast ambient light, device fluctuations, and select timing variations. It realizes a high-speed 3-D image capture system using a low-intensity beam projection, and attains robust dynamic frame access at a high operating speed and a high pixel resolution.

Chapter 3

Row-Parallel Position Sensors for Ultra Fast Range Finding

3.1 Introduction

This chapter targets 1,000-fps range finding based on the light-section method for new applications of 3-D image capture. Ultra fast range finding opens up additional applications such as shape measurement of structural deformation and destruction, scientific observation of high-speed moving objects, quick inspection of industrial components, and fast visual feedback systems in robot vision. A 1,000-fps range finding system based on the light-section method requires a very high frame rate for position detection of a projected sheet beam. For example, a 1,000-fps range finding system with a practical pixel resolution such as $320 \times 240$ (QVGA) pixels requires a frame access rate of over 300 kHz, since a range map is reconstructed from 320 frames of a scanning sheet beam at the QVGA pixel resolution. Such a fast frame access rate is out of reach even for the state-of-the-art smart position sensors [20]–[27] and high-speed 2-D image sensors [28], [29], whose frame access rates are below 50 kHz at most. Therefore, a new frame access architecture is necessary for ultra fast range finding.
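
The frame access requirement quoted above follows from a one-line calculation, sketched here in Python. The function names are illustrative. The second calculation assumes that the $375 \times 365$ sensor needs one frame per column position, i.e. 375 frames per range map, and uses the 394.5 kHz frame access rate stated in the abstract.

```python
def required_frame_access_rate(range_maps_per_s, frames_per_map):
    """Frame access rate in Hz when each range map needs `frames_per_map` frames."""
    return range_maps_per_s * frames_per_map


def range_map_rate(frame_access_rate_hz, frames_per_map):
    """Range maps per second obtained from a given frame access rate."""
    return frame_access_rate_hz / frames_per_map


# QVGA example from the text: 1,000 range maps/s, 320 frames per range map.
qvga_rate_hz = required_frame_access_rate(1000, 320)

# Assumed 375 frames per range map for the 375 x 365 range finder.
maps_per_s_375 = range_map_rate(394.5e3, 375)

print(qvga_rate_hz, maps_per_s_375)
```

The QVGA case gives 320 kHz, consistent with the "over 300 kHz" requirement, and the 394.5 kHz access rate corresponds to 1052 range maps/s under the stated assumption.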

In Section 3.2, we present a new row-parallel active pixel search architecture. Section 3.3 shows circuit configurations and operations of the row-parallel active pixel search. Section 3.4 proposes a multi-sampling technique for sub-pixel position detection. Section 3.5 presents preliminary tests of a prototype position detector with $128 \times 16$ pixels, and discusses the potential capacity and the limiting factors. Section 3.6 shows design of a $375 \times 365$ ultra fast range finder. In Section 3.7, a system setup and measurement results are presented, and Section 3.8 summarizes this chapter.
3.2 Concept of Row-Parallel Position Detection

Conventional image sensors generally employ a raster scan method or a row-access scan method. The raster scan method sequentially accesses all the pixels to find a few active pixels on the focal plane as shown in Figure 3.1 (a). The row-access scan method also needs to access all the pixel values. In row-access image sensors such as [25]–[27] and [80], however, the active pixels in a row line can be scanned and detected in column parallel as shown in Figure 3.1 (b). Therefore, the row-access scan method is more suitable for high-speed position detection than the raster scan method. Figure 3.2 (a) shows the position detection flow of the row-access scan method. First, some pixels are activated by a strong incident beam. Then the pixel values in a row line are read out, and the active pixels are scanned and detected in column parallel. The left and right edge addresses of consecutively activated pixels are acquired. If another incident beam exists in the row line, the search and address encoding operations are repeated. After that, the next row line is accessed and the pixel values are read out again. The number of access and search operations is thus proportional to the number of row lines of the sensor array, which becomes a bottleneck for the frame access rate. As a result, the frame access rate is limited to around 50 kHz.

Figure 3.1 Frame access methods: (a) raster scan, (b) row-access scan, (c) row-parallel scan.

Figure 3.1 (c) shows the proposed row-parallel scan method on the focal plane. In the row-parallel scan method, the active pixels in every row line are scanned simultaneously in row parallel, and the addresses are then acquired also in row parallel. Therefore, there is no access iteration in proportion to the pixel resolution as shown in Figure 3.2 (b). The row-parallel architecture is implemented on the sensor plane as shown in Figure 3.3. The row-parallel search operation is carried out by a chained search circuit embedded in each pixel. Search signals are provided from the left side of the sensor. They propagate from pixel to pixel via the in-pixel search circuits in row parallel, and the propagation is interrupted at the active pixel in every row line. For address acquisition, it is practically difficult to implement an address encoder in every row, since an image sensor requires a regularly spaced array structure. If a conventional address encoder were implemented in each pixel, it would require many transverse wires per row and a large circuit area per pixel. Therefore we propose a bit-streamed column address flow for row-parallel address acquisition with a compact circuit implementation. Column address streams are injected at the top of the sensor in column parallel, and change direction at the pixels detected by the search circuits. The address acquisition scheme requires just one vertical wire per column and one transverse wire per row, so it is suitable for a high-resolution pixel array. A pixel consists of a photo detector, a 1-bit A/D converter, a search circuit, and a part of the address encoder. The proposed search procedure and circuit implementation provide faster position detection, higher scalability of pixel resolution, a smaller pixel size, and simpler control than the conventional row-parallel structures such as [25] and [81].

3.3 Circuit Configurations and Operations

3.3.1 Pixel Circuit

Figure 3.4 shows a pixel circuit configuration with row-parallel position detection functions. It consists of a photo detector with a reset circuit, a 1-bit A/D converter with a data latch circuit, a pixel value readout circuit, a search mode switch circuit, a chained search circuit, and a part of the address encoder. The photo diode voltage, $V_{pd}$, is set to the reset voltage, $V_{rst}$, by $RST$. The 1-bit A/D converter receives $V_{pd}$ and determines the pixel value. $V_{pd}$ falls to a low level at an active pixel with a strong incident intensity. Therefore the converter provides ‘0’ as an active pixel value and ‘1’ as an inactive pixel value. A transistor biased by $V_b$ helps to reduce the short-circuit current and to control the threshold level of the A/D conversion. The pixel value readout circuit provides a binary image for functional tests. The search mode switch circuit and the chained search circuit are dedicated to the row-parallel active pixel search. The address encoding part connects a column address line with a row address line.

3.3.2 Row-Parallel Chained Search Operation

The row-parallel search operation is carried out by the chained search circuit embedded in each pixel. It detects the left edge of consecutively activated pixels in each row. Figure 3.5 shows a timing diagram of the pixel circuit, and Figure 3.6 shows the procedure of the row-parallel active pixel search. The search mode switch circuit, which is implemented by a pass-transistor XOR, provides the control signal, $CTR$, of the search circuit. For the left edge detection, $LSW$ is set to a high level and $RSW$ is set to a low level. As a result of pixel activation, the active pixel values are ‘0’ and the others are ‘1’ as shown in Figure 3.6 (a). A search signal, $SCH_0$, is applied at the left end of each row line. It passes through the inactive pixels one after another via the in-pixel search circuits, since their control signal, $CTR$, is at a high level. The search signal propagation is interrupted at the first active pixel encountered, as shown in Figure 3.6 (b); that is, it detects the left edge of the consecutively activated pixels. After the row-parallel address acquisition, $LSW$ turns off and $RSW$ turns on, so that all the pixel values are inverted for the right edge detection as shown in Figure 3.6 (c). Namely, the active pixel values change to ‘1’ and the interrupted search signal immediately resumes from the left edge. It passes through the active pixels one after another and then stops at the pixel next to the right edge.

The worst-case delay of the search operation is the signal propagation through all the pixels in a row line, so the search clock cycle is determined by this worst-case delay. The center position of the incident beam can be calculated from the left and right edge addresses. The number of search cycles is independent of the number of consecutively activated pixels. If another active pixel exists in the same row, all the pixel values can be inverted again by switching $LSW$ and $RSW$, and the search operation restarts from the detected right edge toward the next left edge. Therefore the row-parallel search operation is capable of position detection for multiple incident beams by continuing the search. The last search signal, $SCH_n$, serves as a search completion signal that indicates whether any active pixel remains in each row.
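
The search procedure for one row can be sketched behaviourally as follows. This is a software model under simplifying assumptions, not the circuit: the chained propagation is replaced by a loop, and the names `propagate` and `find_beams` are illustrative. Pixel values follow the convention of the text, with ‘0’ for active and ‘1’ for inactive pixels.

```python
def propagate(values, start, pass_value=1):
    """Advance a search signal from pixel `start` through consecutive pixels
    whose value equals `pass_value`. Returns the index where it stops, or
    len(values) if it reaches the end of the row (search completion)."""
    i = start
    while i < len(values) and values[i] == pass_value:
        i += 1
    return i


def find_beams(row):
    """Return (left_edge, right_edge + 1) for every run of active pixels.
    `row` holds pixel values: 0 = active, 1 = inactive."""
    n = len(row)
    beams = []
    position = 0
    while True:
        # Left edge search: the signal passes through inactive pixels.
        left = propagate(row, position)
        if left == n:
            break  # completion signal: no further active pixel in this row
        # Right edge search: all pixel values are inverted, so the signal
        # now passes through the active pixels and stops just after them.
        inverted = [1 - v for v in row]
        right_next = propagate(inverted, left)
        beams.append((left, right_next))
        position = right_next  # continue the search for another beam
    return beams


print(find_beams([1, 1, 0, 0, 0, 1, 1, 0, 1]))
```

Each beam costs two propagation steps regardless of its width, which reflects the statement that the number of search cycles does not depend on the number of consecutively activated pixels.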
3.3.3 Row-Parallel Address Acquisition

Figure 3.7 shows the row-parallel operation for address acquisition using a bit-streamed column address flow in the case of the left edge detection. A column address line is connected with a row address line by a part of the address encoder in the detected pixel. The row-parallel address acquisition needs just two pass transistors in a pixel as shown in Figure 3.4. The two input signals are $SCH_i$ and $\overline{SCH_{i+1}}$. At the detected left edge, $SCH_i$ from the previous pixel is at a high level, but the next search signal, $SCH_{i+1}$, is still at a low level since the search signal stops there. Therefore both inputs, $SCH_i$ and $\overline{SCH_{i+1}}$, are at a high level only at the detected pixel. A bit-streamed address signal is provided from a column address line to a row address line via the two pass transistors. The column address streams never conflict with each other in the same row line, since only one left or right edge is detected at a time by the row-parallel search in each row. The bit-streamed address signals are injected from the LSB to the MSB, and received by the row-parallel processors. The number of address acquisition cycles grows logarithmically with the horizontal pixel resolution.
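
A behavioural sketch of the bit-streamed address flow is given below. It assumes that every column line carries one bit of its own column index per cycle, LSB first, and that the detected pixel of each row simply forwards the bit on its column line to the row line. The function name `acquire_addresses` and the list-based representation are illustrative assumptions.

```python
def acquire_addresses(detected_columns, n_columns):
    """Model of row-parallel address acquisition.

    detected_columns[r] is the column index of the detected pixel in row r,
    or None if the row has no detected pixel. Assumes n_columns >= 2.
    Returns the addresses received per row and the number of cycles used."""
    n_bits = (n_columns - 1).bit_length()  # address width in bits
    received = [[] for _ in detected_columns]
    for k in range(n_bits):
        # Cycle k: every column line carries bit k of its own address.
        column_lines = [(c >> k) & 1 for c in range(n_columns)]
        for r, col in enumerate(detected_columns):
            if col is not None:
                # The detected pixel connects its column line to the row line.
                received[r].append(column_lines[col])
    # Row-parallel receivers reassemble the LSB-first bit streams.
    addresses = [
        sum(bit << k for k, bit in enumerate(bits)) if bits else None
        for bits in received
    ]
    return addresses, n_bits


addresses, cycles = acquire_addresses([5, None, 127, 0], 128)
print(addresses, cycles)
```

For 128 columns the model uses 7 cycles, and for 1024 columns it would use 10, which illustrates the logarithmic growth of the acquisition time with the horizontal resolution.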
3.3.4 Row-Parallel Processing

**Figure 3.7** Bit-streamed column address flow for row-parallel address acquisition.

**Figure 3.8** Schematic of a row-parallel processor.

The range-finding image sensor has row-parallel processors, which receive the bit-streamed address signals, $ADD_j$, and the search completion signals, $SCH_{375}$. Figure 3.8 shows a schematic of the row-parallel processor. It consists of a selector with a signal receiver, a full adder, 18-bit registers, 18-bit output buffers, and data readout circuits. The selector switches between the two processing functions, an address acquisition mode and an activation counting mode. Figure 3.9 shows a timing diagram of the row-parallel processor. A bit-streamed address signal is received by a low-threshold inverter because the address signal cannot swing to the supply voltage due to the pass transistors in a pixel. In a multi-sampling operation, the row-parallel processor counts the number of usable pixel activations using the search completion signal, since some search operations find no active pixel. The address acquisition mode and the activation counting mode are switched by $MLT$. The left edge address is stored in the registers. Then the right edge address is accumulated onto the left edge address by $CK_r$ and $CK_w$ in sequential order from the LSB to the MSB. $ENB$ is provided to disable an input of the full adder for carry accumulation in a multi-sampling operation. The accumulated address represents the center position of the active pixels. The results are transferred to the output buffers by $TR$, and then read out by $SEL_k$ during the search operation for the next frame. The row-parallel processing is executed concurrently with the row-parallel address acquisition. Owing to the high-speed position detection, the row-parallel processor is capable of a multi-sampling operation.
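
The LSB-first accumulation of the right edge address onto the stored left edge address can be sketched as a bit-serial addition with a single full adder. The sketch below is a functional model only: the control signals $MLT$, $ENB$, $CK_r$, and $CK_w$ are not modelled, and the helper names are illustrative. The 18-bit register width is taken from the text.

```python
REGISTER_WIDTH = 18  # register width stated in the text


def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out


def to_bits(value, width):
    """LSB-first list of bits."""
    return [(value >> k) & 1 for k in range(width)]


def from_bits(bits):
    return sum(bit << k for k, bit in enumerate(bits))


def accumulate(register_bits, stream_bits):
    """Add an LSB-first address stream onto the register, one bit per cycle.
    Cycles after the stream has ended only ripple the remaining carry."""
    result = []
    carry = 0
    for k, reg_bit in enumerate(register_bits):
        in_bit = stream_bits[k] if k < len(stream_bits) else 0
        s, carry = full_adder(reg_bit, in_bit, carry)
        result.append(s)
    return result


left_edge = 100        # example left edge address
right_edge_next = 107  # example address of the pixel next to the right edge
register = to_bits(left_edge, REGISTER_WIDTH)
register = accumulate(register, to_bits(right_edge_next, 9))
accumulated = from_bits(register)
print(accumulated, accumulated / 2)
```

Halving the accumulated value gives the center position in a coordinate system where pixel $i$ spans the interval from $i$ to $i + 1$; in this example the center is 103.5.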

### 3.4 Multi-Sampling Position Detection

3-D range data are calculated from the beam projection angle, $\alpha_p$, and the incident angle, $\alpha_i$, as shown in Figure 3.10. The incident beam angle, $\alpha_i$, is obtained from the incident beam position on the focal plane. Therefore the range resolution and accuracy depend on the resolution of position detection on the sensor. Sub-pixel resolution of position detection efficiently improves the range accuracy. A multi-sampling technique is implemented to acquire an intensity profile of the incident beam for a fine sub-pixel resolution.

Figure 3.10 A triangulation-based light-section range finding system: (a) system configuration, (b) relation between a range accuracy and a beam position on the focal plane.

In a multi-sampling method, all the pixel values are updated repeatedly during a photo integration. Pixels with a stronger incident intensity are activated earlier and are found active in more of the samplings, as shown in Figure 3.11. In the conventional single-sampling mode, the acquired data are binary, so the calculated center position has a 0.5 sub-pixel resolution as shown in Figure 3.11 (a). In the multi-sampling mode, on the other hand, the number of samplings in which a pixel is found active represents a scale of the intensity profile as shown in Figure 3.11 (b). Even a few scales provide a fine sub-pixel resolution of center position detection and improve the range accuracy. Figure 3.12 shows a theoretical estimation of the sub-pixel resolution as a function of the number of samplings. A Gaussian distribution is assumed for the beam intensity profile. The sub-pixel resolution improves efficiently with 2 – 8 samplings. For example, a 4-sampling mode is capable of a 0.2 sub-pixel resolution.
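
The following sketch illustrates how repeated samplings refine the center position. It assumes a linear integration model in which a pixel becomes active once its intensity multiplied by the elapsed time reaches a threshold, and it averages the left and right edge positions over the samplings that contain an active pixel. The intensity values, the threshold, the sample times, and the function names are all illustrative assumptions, not measured data.

```python
def first_run_edges(active):
    """Return (left_edge, right_edge + 1) of the first run of active pixels,
    or None if no pixel is active."""
    n = len(active)
    left = 0
    while left < n and not active[left]:
        left += 1
    if left == n:
        return None
    right_next = left
    while right_next < n and active[right_next]:
        right_next += 1
    return left, right_next


def center_from_samplings(rates, threshold, sample_times):
    """Average center position over all samplings with an active pixel.
    rates[p] is the assumed integration rate (intensity) of pixel p."""
    total = 0
    usable = 0
    for t in sample_times:
        active = [rate * t >= threshold for rate in rates]
        found = first_run_edges(active)
        if found is None:
            continue  # sampling without any active pixel is not counted
        total += found[0] + found[1]
        usable += 1
    return total / (2 * usable) if usable else None


rates = [0, 1, 3, 4, 2, 0, 0]  # illustrative beam profile
single = center_from_samplings(rates, 4, [4])          # one sampling at the end
multi = center_from_samplings(rates, 4, [1, 2, 3, 4])  # four samplings
print(single, multi)
```

A single sampling can only return multiples of 0.5 pixel, whereas the accumulated result of four samplings moves in steps of 0.125 pixel. Positions are expressed in a coordinate system where pixel $p$ spans the interval from $p$ to $p + 1$.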

3.5 Preliminary Tests of 128 × 16 Position Detector

In this section, we present preliminary tests of a prototype position detector with 128 × 16 pixels to show the feasibility of the proposed row-parallel search operation and to discuss its potential capability at a higher pixel resolution. We introduce two operation modes, a reset-per-scan mode and a reset-per-frame mode, and discuss their advantages and drawbacks. In addition, we propose a fast range detection system with stereo range finders as one application of the prototype position detector.
Figure 3.11 Sub-pixel center position detection: (a) single-sampling method, (b) multi-sampling method.

Figure 3.12 Sub-pixel resolution as a function of the number of samplings.

Figure 3.13 Block diagram of a prototype position detector

Figure 3.14 Simplified row-parallel processors implemented in the prototype position detector.

3.5.1 Chip Implementation

Figure 3.13 shows a block diagram of a 128 × 16 prototype position detector using the proposed row-parallel search technique. It consists of a 128 × 16 pixel array, a column address generator, row-parallel processors with 32-bit SRAMs per row, and a memory controller.

The position detector is designed with row-parallel processors simplified for a single-sampling function as shown in Figure 3.14. A row-parallel processor consists of a latch sense amplifier to receive a column address stream, a full adder, random access memories with a read/write circuit, output buffers for pipelined data readout, and some control logic. It receives the bit-serial addresses $x_i$ and $x_j + 1$ in row-parallel address acquisition when the left and right edges of the active pixels with a strong incident intensity are at $x_i$ and $x_j$. The latch sense amplifier holds the bit-serial address of $x_i$ and stores it in the 32-bit SRAMs if the search signal has not arrived, that is, if an active pixel still exists in the row. On the other hand, a reserved address, 0, is stored in the SRAMs when there is no active pixel in the row; it is interpreted as no active pixel in post-processing. When the frontier position of the scanning laser beam is needed, the address data in the SRAMs are transferred to the output buffers and read out. To get the center position of the active pixels for a standard range finding system, the next bit-serial address, $x_j + 1$, is accumulated onto the left edge address, $x_i$, in row parallel before being transferred and read out. The 32-bit SRAMs can hold four edge addresses of active pixels or four accumulated addresses of the left and right active edges. This preprocessing reduces data transmission and also makes it possible to obtain the positions of multiple sheet beams in one frame.

We have designed and fabricated the prototype position detector in a 0.35 $\mu$m standard CMOS process. Figure 3.15 shows a microphotograph of the fabricated chip. The pixel circuit has a photo diode and 18 transistors in 16.25 $\mu$m $\times$ 16.25 $\mu$m pixel area with 20.15 % fill factor. The position sensor occupies 2.5 mm $\times$ 0.3 mm. Table 3.1 summarizes the chip specifications.

---

**Figure 3.15** Chip microphotograph.

**Table 3.1** Chip specifications.

<table>
<thead>
<tr>
<th>Specification</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Process</td>
<td>2P3M 0.35 $\mu$m CMOS process</td>
</tr>
<tr>
<td>Sensor size</td>
<td>2.5 mm $\times$ 0.3 mm</td>
</tr>
<tr>
<td># pixels</td>
<td>128 $\times$ 16 pixels</td>
</tr>
<tr>
<td>Pixel size</td>
<td>16.25 $\mu$m $\times$ 16.25 $\mu$m</td>
</tr>
<tr>
<td># trans. / pixel</td>
<td>18 transistors</td>
</tr>
<tr>
<td>Fill factor</td>
<td>20.15 %</td>
</tr>
</tbody>
</table>
3.5.2 Limiting Factors of Frame Rate

A range finding system based on the light-section method can be realized with two position detection modes, a reset-per-scan mode and a reset-per-frame mode. In both modes, the pixels that reach a high integration level because of a strong incident intensity are activated.

In a reset-per-scan mode, the integration time of each pixel takes one scan interval after a reset operation. The activated frontier positions of the scanning beam are detected during the integration. Here the limiting factors of frame rate are the access rate for active pixels and the incident intensity of scanning beam. The frame rate of $f_{psd}$ is given by

$$f_{psd} = \frac{1}{\max(T_{acc}, T_{pa1})} = \min(f_{acc}, f_{pa1}) \tag{3.1}$$

where $T_{acc}$ and $f_{acc}$ are the access time and rate for active pixels, and $T_{pa1}$ and $f_{pa1}$ are the pixel activation time and rate with a scanning beam as shown in Figure 3.16 (a). The access rate, $f_{acc}$, is determined by the search time for active pixels. The pixel activation rate, $f_{pa1}$, is associated with the integration time needed to exceed the threshold level after reset, and is determined by the intensity of the scanning beam. The reset-per-scan mode can achieve a high frame rate with a short access interval, although it needs a projected beam whose incident intensity is sufficiently strong relative to the ambient illumination. However, this mode is not applicable to some cases with multiple or complex-shaped target objects, since we assume that the projected beam is scanned in one direction, from left to right, on the sensor plane.

Figure 3.16 Limiting factors of frame rate in a reset-per-frame mode and a reset-per-scan mode.

Figure 3.17 Simulated search time per frame for position detection of the fabricated chip.

In a reset-per-frame mode, the integration time takes one frame interval with a reset operation. The position detection is carried out after the integration that follows the reset operation. Thus the frame interval is the total of the integration time and the access time for active pixels. The frame rate, $f_{psd}$, is given by

$$f_{psd} = \frac{1}{T_{acc} + T_{pa2}} = \frac{f_{acc} \cdot f_{pa2}}{f_{acc} + f_{pa2}}$$

(3.2)

where $T_{pa2}$ and $f_{pa2}$ are the pixel activation time and rate with scanning beam in a reset-per-frame mode as shown in Figure 3.16 (a). The pixel activation rate of $f_{pa2}$ is determined by the intensity of the scanning beam in the same way as $f_{pa1}$. The sensitivity of the reset-per-frame mode is, however, lower than that of the reset-per-scan mode since the projected beam has an intensity profile with spatial distribution as shown in Figure 3.16 (b). The intensity of inactive pixels, which is under the threshold level of $E_{th}$, is wasted by a reset operation in the next frame of the reset-per-frame mode. The efficiency, $Q$, is given by

$$Q = \frac{E_{act}}{E_{all}}$$

(3.3)

where $E_{all}$ is the total intensity of the projected beam and $E_{act}$ is the total intensity at active pixels. Therefore the pixel activation rate of the reset-per-frame mode is lower than that of the reset-per-scan mode in the same situation as shown in Figure 3.16 (a). A high access rate $f_{acc}$ makes the frame rate faster, although $f_{pa2}$ is dominant when the beam intensity is insufficient. Unlike the reset-per-scan mode, the reset-per-frame mode can be applied to multiple and complex-shaped target objects, since the location of a projected beam on the sensor plane is unrestricted thanks to the reset operation per frame.
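The two frame-rate limits of Eq. (3.1) and Eq. (3.2) can be compared numerically. The following Python sketch is only illustrative: the function names are hypothetical, and the example rates are placeholder values chosen for the comparison, not measured data.

```python
def frame_rate_reset_per_scan(f_acc, f_pa1):
    """Eq. (3.1): access and activation overlap, so the slower one limits the rate."""
    return min(f_acc, f_pa1)


def frame_rate_reset_per_frame(f_acc, f_pa2):
    """Eq. (3.2): integration and access are serialized, so their times add."""
    t_acc = 1.0 / f_acc
    t_pa2 = 1.0 / f_pa2
    return 1.0 / (t_acc + t_pa2)


# Placeholder rates (assumptions for illustration only).
example_f_acc = 2.0e6   # access rate [Hz]
example_f_pa = 50.0e3   # pixel activation rate [Hz]

print(frame_rate_reset_per_scan(example_f_acc, example_f_pa))
print(frame_rate_reset_per_frame(example_f_acc, example_f_pa))
```

With the same access and activation rates, the reset-per-frame result is always below the reset-per-scan result, because the two intervals add instead of overlapping.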

### 3.5.3 Access Rate and Pixel Resolution

Figure 3.17 shows post-layout simulation results of a search time for the row-parallel position detection. The maximum propagation delay of a search signal is 71 ns, and the 7-bit address acquisition for 128 columns takes 140 ns. The total search time to get the position of the left edge is 216 ns per frame.
In a reset-per-frame mode, the frame interval, which is the total of the photo integration time and the search time for active pixels, is 30.2 $\mu$s, where we assume the photo integration time is 30 $\mu$s. In a reset-per-scan mode, the search operation is repeated and the frontier positions of the scanning sheet beam are detected during the photo current integration. The frame interval is the same as the search time if the scanning beam is sufficiently intense. Figure 3.18 shows the relation between the row-pixel resolution and the search time of active pixels. Here we assume that the column-pixel resolution is the same as the row-pixel resolution (i.e., $N \times N$ pixel resolution) and that the active pixels lie on the same vertical line, which is the worst case because the capacitive load of the address line is at its maximum. Real-time range finding with 30 range maps/s and $1024 \times 1024$ pixels requires a search time of 32.5 $\mu$s or less. The present architecture achieves a 918 ns search time per frame at a 1024-pixel horizontal resolution in a 0.35 $\mu$m CMOS process as shown in Figure 3.18. It is fast enough not only for real-time but also for beyond-real-time range finding and visual feedback.
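The real-time requirement quoted above can be reproduced with a short calculation. This Python sketch assumes that one frame access is needed per horizontal beam position, so that one range map takes as many frames as there are columns; the helper names are hypothetical.

```python
def required_frame_rate(range_maps_per_s, n_columns):
    """Frame accesses per second needed, assuming one frame per column position."""
    return range_maps_per_s * n_columns


def range_maps_per_second(frame_access_rate, n_columns):
    """Range maps per second obtainable from a given frame access rate."""
    return frame_access_rate / n_columns


required = required_frame_rate(30, 1024)   # frames/s for 30 range maps/s
budget_us = 1e6 / required                 # time budget per frame in microseconds
simulated_rate = 1.0 / 918e-9              # rate implied by the 918 ns search time

print(required, budget_us, range_maps_per_second(simulated_rate, 1024))
```

Under this assumption the 918 ns search time leaves a wide margin against the per-frame budget, which is consistent with the beyond-real-time claim in the text.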

### 3.5.4 Fast Range Detection with Stereo Range Finders

We present a fast range detection system using stereo range finders, shown in Figure 3.19, as one application of the prototype position detector. In the reset-per-scan mode, the scanning sheet beam activates pixels from the right to the left on the sensor plane. Then, two position sensors detect the edge of the active pixels. Here, the edge address of the left position sensor is \( x_L \), and that of the right one is \( x_R \). The difference between \( x_R \) and \( x_L \) represents the distance from the position sensors. A light-section system usually uses a pair of one laser scanner and one sensor, since the range data can be acquired from them using the triangulation principle. It is, however, difficult for a standard range finding configuration to realize a 1,000-fps range finding system because it requires very fast and accurate swing control of the beam scanner. The range detection system using stereo range finders is capable of ultra fast range finding without accurate beam scanning.

Figure 3.20 shows a principle of the fast range detection using stereo range finders. Two position sensors detect the positions of the beam reflection on the sensor planes, respectively. For example, we assume that the right position sensor detects it as \( e_1 \) at \( x = x_R \), and the left one detects it as \( e_2 \) at \( x = x_L \) when a target object is placed at \( p(x_p, y_p, z_p) \). \( \alpha_1 \) and \( \alpha_2 \) are given by the detected positions, \( x_R \) and \( x_L \), as follows:

\[
\tan \alpha_1 = -\frac{f}{x_R}, \tag{3.4}
\]

\[
\tan \alpha_2 = \frac{f}{x_L}. \tag{3.5}
\]

where \( f \) is a focal depth of cameras. Substituting \( \alpha_1 \) and \( \alpha_2 \) for \( \alpha_p \) and \( \alpha_l \) in Eq. (1.3) through Eq. (1.8), \( p(x_p, y_p, z_p) \) is obtained. From Eq. (3.4) and Eq. (3.5), \( x_R - x_L \) is given by

\[ x_R - x_L = \frac{f(\tan \alpha_1 + \tan \alpha_2)}{\tan \alpha_1 \tan \alpha_2}. \tag{3.6} \]

Comparing Eq. (1.8) with Eq. (3.6), we obtain
\[ z_p = \frac{f \cdot d \cos \theta}{x_R - x_L}. \tag{3.7} \]

Figure 3.20 Principle of fast range detection using stereo range finders.

Rough range data can be calculated more simply for some applications of quick range detection such as collision prevention. From Eq. (3.7), $|z_p|$ is a monotonically decreasing function of $x_R - x_L$ as follows:
\[ |z_p| \propto \frac{1}{x_R - x_L}. \tag{3.8} \]

Therefore, the difference between the two addresses represents the distance between the sensors and a target object. Thus, we can define a threshold level, $d_{th}$, for range detection, and we can quickly determine whether an object is placed within $z_{th}$ or not as follows:
\[ x_R - x_L > d_{th} \quad \text{(nearer than the threshold),} \tag{3.9} \]
\[ x_R - x_L < d_{th} \quad \text{(farther than the threshold),} \tag{3.10} \]

where we assume a target field angle of $y$-axis is narrow and $\cos \theta = 1$. The range threshold, $z_{th}$, for the range detection is given by
\[ z_{th} = \frac{d \cdot f}{d_{th}}. \tag{3.11} \]
The range detection system using stereo range finders is suitable for high-speed range finding applications such as collision prevention since it enables a high-speed scanning beam and a simple range calculation.
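The range calculation of Eq. (3.7) and the threshold of Eq. (3.11) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, and $x_R$, $x_L$, $f$, and $d$ are taken to be expressed in the same length unit, so pixel addresses would first have to be scaled by the pixel pitch.

```python
def range_from_addresses(x_r, x_l, f, d, cos_theta=1.0):
    """Eq. (3.7): z_p = f * d * cos(theta) / (x_R - x_L)."""
    difference = x_r - x_l
    if difference == 0:
        raise ValueError("x_R and x_L coincide; the range is undefined")
    return f * d * cos_theta / difference


def address_threshold(z_th, f, d):
    """Eq. (3.11) solved for d_th: d_th = d * f / z_th (cos(theta) = 1)."""
    return d * f / z_th


def is_within_range(x_r, x_l, f, d, z_th):
    """True when |z_p| does not exceed the range threshold z_th."""
    return abs(range_from_addresses(x_r, x_l, f, d)) <= z_th


# Hypothetical example values in millimetres.
print(range_from_addresses(x_r=1.0, x_l=-1.0, f=10.0, d=100.0))
print(address_threshold(z_th=500.0, f=10.0, d=100.0))
```

For collision prevention only the comparison against the threshold is needed, so the division in Eq. (3.7) can be avoided by precomputing $d_{th}$ once.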

### 3.5.5 Measurement Results

A measurement setup of the prototype position detector has been developed as shown in Figure 3.21. It consists of the position detector on a test board, a scanning mirror with a laser beam source of 300 mW and 665 nm wavelength, an FPGA for system control, and a host PC. In this system, the position detector and the scanning mirror are controlled by the FPGA, and the acquired position data are transferred to the host PC after capturing. The FPGA was operated at 80 MHz due to the limitation of the testing equipment. In this case, the search time was 450 ns per frame and the photo integration time was 30 $\mu$s at $V_{rst}$ of 1.4 V. The search time is limited by the control speed of the FPGA in the measurement. To realize a 2-camera system for high-speed 3-D imaging, the hardware cost doubles because two position sensors are needed. The computational effort of range calculation is almost the same, since the detected positions of the additional sensor are simply used for triangulation instead of the swing position of the scanning mirror. The data transmission, however, doubles if the range calculation is not carried out on the FPGA.
Figure 3.22 shows the measurement results of the present position detector. The positions of the left and right active pixels were acquired as shown in Figure 3.22 (a). That is, the projected sheet beam is located between these edges on the sensor plane. The position detector has a row-parallel processor to calculate the center position on the chip to reduce the data transmission.

Figure 3.22 (b) shows sequentially captured positions of a 2 kHz scanning sheet beam in a reset-per-frame mode. Here the position detector provides the center address calculated by the row-parallel processor. The measurement result shows that the access rate $f_{acc}$ is 2.22 MHz and the pixel activation rate $f_{pa2}$ is 33.3 kHz. In the measurement, the center position of a projected beam is calculated on the sensor plane, so two search operations for the left and right active pixels are needed. An effective resolution of 256 pixels is realized by the center calculation to improve the range accuracy. Here the frame interval takes 30.9 $\mu$s per frame, which includes the 30.0 $\mu$s integration time. Thus the frame rate $f_{psd}$ is 32.2 kHz.

Figure 3.22 (c) shows the frontier positions of a scanning sheet beam during a photo integration in a reset-per-scan mode. In the measurement situation, the mirror scanning within the camera angle is limited to 2 kHz by the scan drive of the galvanometer mirror. Though a frame interval of 4 $\mu$s is sufficient to get the position of the 2 kHz scanning beam, this sensor achieves up to 2.22 MHz, the same as the access rate $f_{acc}$. In this regard, the scan speed must be 17.4 kHz to get the full performance of the position sensor with a 128-pixel horizontal resolution. The frame rate $f_{psd}$ could also be limited by the pixel activation rate $f_{pa1}$ if the intensity of the projected beam is insufficient. The pixel activation rate of the reset-per-scan mode can be 233 kHz in the measurement system, where the efficiency $Q$ of Eq. (3.3) is about 1/7. That is, the possible frame rate $f_{psd}$ with a 128-pixel horizontal resolution is 233 kHz. On the other hand, the measurement results also show that the position sensor achieves a frame rate of 2.22 MHz if suitable test equipment with a sufficiently strong projected beam and a higher-speed scanning mirror is available. To achieve the maximum frame rate of the present sensor, we need a high-power laser beam source of 2.5 W. This requirement can be reduced by using a high-sensitivity photo detector instead of the current photo detector in a standard digital CMOS process. The performance evaluation and comparisons are summarized in Table 3.2. The simulation and measurement results show that the proposed row-parallel search architecture has the potential for ultra fast range finding over 1,000 range maps/s with a high pixel resolution.
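The rates quoted above can be cross-checked with a few lines of Python. This is a rough sketch: it assumes that the reset-per-scan activation rate scales as $f_{pa2}/Q$, which is an interpretation of the figures above rather than a formula stated in the text, and the variable names are illustrative.

```python
f_acc = 2.22e6   # measured access rate [Hz]
f_pa2 = 33.3e3   # measured pixel activation rate in reset-per-frame mode [Hz]
q = 1.0 / 7.0    # approximate efficiency Q of Eq. (3.3)

f_pa1 = f_pa2 / q                # assumed scaling; close to the quoted 233 kHz
f_psd_scan = min(f_acc, f_pa1)   # Eq. (3.1): activation-limited in this setup
t_search = 1.0 / f_acc           # close to the measured 450 ns search time

print(f_pa1, f_psd_scan, t_search)
```

In this setup the activation rate, not the access rate, bounds the reset-per-scan frame rate, which matches the conclusion that a stronger beam is needed to reach 2.22 MHz.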

Figure 3.22 Measurement results.

### 3.6 Design of 375 × 365 Ultra Fast Range Finder

#### 3.6.1 Sensor Configuration

We have designed a 375 × 365 ultra fast range finder using the proposed row-parallel search architecture. Figure 3.23 shows an overview of the row-parallel scan image sensor simplified to 4 × 4 pixels. It consists of a pixel array, bit-streamed column address generators at the top part, row-parallel processors with data registers and output buffers at the right part, a row scanner at the left part, and a multiplexer at the bottom part. These components are controlled by an on-chip sensor controller with a phase locked loop (PLL) module. Pixels in a row line are connected with neighbor pixels by a search signal path. Column address streams are provided from the address generators to each vertical wire. Then the bit-streamed address signals are injected to horizontal wires at the detected pixels. The row-parallel processors receive the bit-streamed address signals and the search completion signals from the right pixels in each row.

#### 3.6.2 Chip Implementation

A 375 × 365 3-D range-finding image sensor using the present row-parallel architecture has been fabricated in a 0.18 μm standard CMOS process with 1-poly-Si 5-metal layers. The die size is 5.9 mm × 5.9 mm. Figure 3.24 shows the chip microphotograph and the pixel layout. The sensor consists of a 375 × 365 pixel array, a column-parallel address generator, and row-parallel processors with 18-bit registers and output buffers. A row scanner and a column multiplexer are also implemented to acquire a binary 2-D image for test. The row-parallel operations are executed by an on-chip sensor controller with a phase locked loop (PLL) module. In total, 3.74 M transistors are implemented. The supply voltage is 1.8 V. The pixel size is 11.25 $\mu$m $\times$ 11.25 $\mu$m with a 22.8% fill factor. Each pixel consists of a photo diode and 24 transistors. The photo diode is composed of an $n^+$-diffusion and a p-substrate. It is split into several rectangular slices to improve the sensitivity since the present CMOS process has no option of silicide layer removal. Table 3.3 shows the chip specifications.

---

**Table 3.2 Measurement results and comparisons.**

<table>
<thead>
<tr>
<th></th>
<th># pixels</th>
<th>frame access rate</th>
<th>range maps/s (rps)</th>
<th>limiting factor</th>
</tr>
</thead>
<tbody>
<tr>
<td>The present prototype</td>
<td>128 × 16</td>
<td>32.2 kHz(1)</td>
<td>252 rps</td>
<td></td>
</tr>
<tr>
<td>– reset per frame</td>
<td>1024 × 1024</td>
<td>(31.4 kHz)(2)</td>
<td>30.6 rps</td>
<td></td>
</tr>
<tr>
<td>The present prototype</td>
<td>128 × 16</td>
<td>233 kHz(1)</td>
<td>1.74k rps</td>
<td></td>
</tr>
<tr>
<td>– reset per scan</td>
<td>128 × 16</td>
<td>2.22 MHz(1)</td>
<td>17.3k rps(4)</td>
<td></td>
</tr>
<tr>
<td></td>
<td>1024 × 1024</td>
<td>1.09 MHz(2)</td>
<td>1.06k rps</td>
<td></td>
</tr>
<tr>
<td>Brajovic et al. [25]</td>
<td>32 × 64</td>
<td>6.4 kHz</td>
<td>100 rps</td>
<td></td>
</tr>
<tr>
<td>Sugiyama et al. [27]</td>
<td>160 × 120</td>
<td>3.3 kHz</td>
<td>15 rps</td>
<td></td>
</tr>
<tr>
<td>Required rate for real time</td>
<td>1024 × 1024</td>
<td>30.7 kHz</td>
<td>(for 30 rps)</td>
<td></td>
</tr>
</tbody>
</table>

(1) Measurement results with 2 kHz scanning beam of 300 mW.  
(2) Simulation results in parentheses.  
(3) Possible range finding rate with high-speed scanning mirror.  
(4) Possible range finding rate with strong beam intensity.

### 3.7 Measurement Results

#### 3.7.1 Frame Access Rate

The row-parallel position detection is pipelined in three stages on the sensor as shown in Figure 3.25. The first stage is a photocurrent integration for pixel activation. The second stage is a row-parallel operation of active pixel search and address acquisition. The last stage is a data readout operation from output buffers. The photocurrent integration period is called a pixel activation time. It depends on the incident beam intensity and the sensitivity of a photo diode. That is, the pixel activation time can be controlled by the beam intensity. On the other hand, the access time is limited by a search operation with address acquisition or a data readout operation. Therefore our principal aim is to achieve a short access time for high-speed position detection.

Figure 3.24 Chip microphotograph and pixel layout.

Table 3.3 Chip specifications.

<table>
<thead>
<tr>
<th>Specification</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Process</td>
<td>1P5M 0.18 μm CMOS process</td>
</tr>
<tr>
<td>Die size</td>
<td>5.9 mm × 5.9 mm</td>
</tr>
<tr>
<td>Resolution</td>
<td>375 × 365 pixels</td>
</tr>
<tr>
<td>Pixel size</td>
<td>11.25 μm × 11.25 μm</td>
</tr>
<tr>
<td>Fill factor</td>
<td>22.8 %</td>
</tr>
<tr>
<td>Pixel configuration</td>
<td>1 PN-junction PD, 24 FETs / pixel</td>
</tr>
<tr>
<td>Total FETs</td>
<td>3.74 M transistors</td>
</tr>
</tbody>
</table>

Figure 3.26 shows a cycle time of each pipeline stage at a 400 MHz operation. The worst case of search signal propagation takes 90 ns. Thus the search path refresh and the search operations for the left and right edges each need 90 ns. The row-parallel address acquisition takes less than 200 ns in the worst case. The worst case of address acquisition means that all the detected pixels are placed on the same column, because the load capacitance of a column address generator then becomes largest and limits the injection speed of the bit-streamed column address signals. The total cycle time of search and address acquisition is 670 ns. The limiting factor of access time is the digital readout stage from output buffers, which requires 2737.5 ns. Therefore the search and address acquisition can be repeated 4 times in the data readout period while keeping the frame access rate.

Figure 3.25 Pipeline operation diagram.

Figure 3.26 Cycle time of active pixel search and data readout.

We have tested the maximum access rate of the designed sensor. The sensor has a function of user-specified pixel activation. The worst-case situation is set by an electrical pattern on the sensor plane. Figure 3.27 shows a data readout circuit and the test equipment for probing the output signals. Output buffers in each row are selected by $SEL_k$. The position results are read out by dynamic readout circuits precharged by $PRE$, and received by sense amplifiers synchronized with $SACK$. The reference voltage $V_{ref}$ is set to 300 mV below the supply voltage. The output signals are probed with parasitic capacitances of $C_{IN}$ and $C_{PB}$, which are 7 pF and 13 pF, respectively. All the active pixels are set in the 374-th column as the worst-case situation. The expected results were successfully acquired up to a 432 MHz operation. Figure 3.28 shows measured waveforms of the worst-case frame access to an electrical test pattern at 432 MHz. The image sensor achieves a frame access rate of 394.5 kHz, which corresponds to 1052 range maps/s with $375 \times 365$ range data. The data rate is 144 Mbit/s per pin at the maximum frame access rate.

Figure 3.28 Measured waveforms of the worst-case frame access to an electrical test pattern at 432 MHz.

Figure 3.29 Measured range accuracy: (a) single-sampling mode, (b) multi-sampling mode.
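The timing figures in this section are mutually consistent, as the following Python sketch shows. It assumes that the data readout takes a fixed number of clock cycles, so that its duration scales inversely with the clock frequency, and that one range map needs one frame access per column; both are assumptions made for this check, and the variable names are illustrative.

```python
T_SEARCH_CYCLE_NS = 670.0   # search + address acquisition at 400 MHz (from the text)
T_READOUT_NS = 2737.5       # data readout at 400 MHz (from the text)
N_COLUMNS = 375             # horizontal resolution of the sensor

# Number of search/acquisition cycles that fit in one readout period.
samples_per_readout = int(T_READOUT_NS // T_SEARCH_CYCLE_NS)

# Assumption: readout length is a fixed cycle count, independent of clock rate.
readout_cycles = round(T_READOUT_NS * 1e-9 * 400e6)

frame_access_rate = 432e6 / readout_cycles          # [Hz] at the 432 MHz clock
range_maps_per_s = frame_access_rate / N_COLUMNS    # one frame per column assumed

print(samples_per_readout, readout_cycles, frame_access_rate, range_maps_per_s)
```

Under these assumptions the readout stage alone reproduces the quoted frame access rate and range-map rate, which supports the statement that readout is the limiting pipeline stage.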

#### 3.7.2 Range Accuracy

Figure 3.29 shows the measured range accuracy at a target distance of around 600 mm. The X-axis represents the target distance and the Y-axis represents the measured distance. Figure 3.29 (a) shows the measured results in the conventional single-sampling mode. The maximum range error is 2.78 mm and the standard deviation of error is 1.02 mm. The conventional single-sampling mode achieves 0.46 % range accuracy with a 0.5 sub-pixel resolution. The range error is typically dominated by a pixel quantization error of position detection on the focal plane. Therefore the range error can be suppressed by a multi-sampling technique with 4 scales as shown in Figure 3.29 (b). The maximum range error is 1.10 mm and the standard deviation is 0.47 mm in the same situation. The multi-sampling mode achieves 0.18 % range accuracy, which corresponds to a 0.2 sub-pixel resolution.

The range accuracy can suffer from a threshold fluctuation of pixel activation on the sensor plane. The peak-to-peak threshold fluctuation is about 150 mV including the reset voltage drop on the sensor, which is measured by binary 2-D images at various reset voltages. An intensity profile with 4 scales is, however, not seriously affected by the fluctuation, because the fluctuation has a strong correlation with the location on the sensor and it is small enough to calculate the center position in a local area. The timing of pixel activation is separated from the search and address acquisition operations as shown in Figure 3.5. That is, the pixel activation is executed after the search path refresh and before the search signal propagation. Therefore the pixel activation is not affected by the crosstalk caused by digital signaling on the focal plane.

#### 3.7.3 Ultra Fast Range Finding

Figure 3.30 shows a photograph of the present measurement setup. The baseline between a camera and a beam projector is set to 180 mm. The target distance is 600 mm and the target scene is $90 \times 90 \text{ mm}^2$. A 300 mW laser beam is expanded by a rod lens into a sheet beam with 5 mm width. The beam wavelength is 665 nm. Figure 3.31 shows an example of measured range images. The measured 3-D data are plotted on three-dimensional coordinates as a wireframe model (a) of a target object (b) in Figure 3.31. In the present measurement setup, the limiting factor of the range finding is the pixel activation time. Therefore the system requires a more sensitive photo detector or a sharper and stronger laser beam. Table 3.4 summarizes the chip performance.

Figure 3.31 Measurement result of range finding.

Table 3.4 Chip performance.

<table>
<tbody>
<tr>
<td>Supply voltage</td>
</tr>
<tr>
<td>Max. clock freq.</td>
</tr>
<tr>
<td>Frame access rate</td>
</tr>
<tr>
<td>Data rate</td>
</tr>
<tr>
<td>Range finding speed</td>
</tr>
<tr>
<td>Sub-pixel resolution</td>
</tr>
<tr>
<td>Range accuracy</td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td>Power dissipation</td>
</tr>
</tbody>
</table>
### 3.8 Summary

We have proposed a row-parallel frame access architecture for a 1,000-fps range finder, which has many potential applications such as shape measurement of structural deformation and destruction, quick inspection of industrial components, scientific observation of high-speed moving objects, and fast visual feedback systems in robot vision. The row-parallel search operations are executed by a chained search circuit embedded in a pixel on the focal plane. The bit-streamed column address flow realizes row-parallel address acquisition with a compact circuit implementation. Moreover, a multi-sampling technique is available for range accuracy improvement.

We have shown the feasibility and the potential capability using a prototype position detector with $128 \times 16$ pixels. A $375 \times 365$ ultra fast range finder has also been designed and fabricated in a 1P5M $0.18 \mu m$ standard CMOS process. It achieves a high-speed frame access rate with multiple samplings. The maximum frame access rate is 394.5 kHz with 4 samplings, which is capable of 1052 range maps/s provided that the measurement setup has a sufficiently strong beam intensity. The maximum range error is 1.10 mm at a target distance of 600 mm. The accuracy has been improved up to a 0.2 sub-pixel resolution by the multi-sampling technique. The present techniques and circuits will open the way to future applications that require extremely high-speed and high-accuracy 3-D image capture.

## Chapter 4 High-Sensitive Demodulation Sensors for Robust Beam Detection

### 4.1 Introduction

This chapter describes a demodulation position sensor with efficient ambient light suppression for a robust range finding system. In particular, some applications of 3-D image capture, such as a walking robot and a recognition system in vehicles, require both availability under various background illumination and light projection that is safe for human eyes. The conventional image sensors and range finders detect a position of peak intensity on the sensor plane to acquire a position of a projected beam in a range finding system [20]–[27]. Therefore, these sensors require a strong beam projection when a target object is placed in a nonideal environment with a strong ambient light.

A possible method to suppress the background illumination is an interframe difference method, where the difference signals between two subsequent frames are used to detect a projected light. This method has also been implemented in the proposed high-speed dynamic access as presented in Section 2.9; however, it takes at least a frame interval time for the ambient light suppression. Color filters mounted on the sensors can suppress the background illumination and realize high-sensitivity photo detection. Sunlight, however, contains a wide distribution of wavelengths with strong intensity, so color filters are not sufficient for some applications. A high-sensitivity position sensor capable of electronic suppression of the background illumination is required in such situations.

A correlation technique, such as [82]–[84], is a possible solution to the problems. These correlation sensors can suppress the background illumination to obtain a high sensitivity. Their dynamic range, however, is limited by the linear difference circuit due to the voltage signal saturation. They are not applicable to a strong contrast image in an outdoor environment.

We have proposed a new sensing scheme for high-sensitivity and wide-dynamic-range photo detection which employs a logarithmic-response correlation circuit [85]. It has successfully overcome the saturation problem of [82]–[84] resulting from an ambient light. In this chapter, we propose a new circuit realization using a current-mode suppression circuit to improve the light detection sensitivity. Section 4.2 presents a concept of the demodulation sensing scheme and a pixel circuit realization using a current-mode suppression circuit. Section 4.3 describes sensor configurations and peripheral circuits. Section 4.4 shows the design of a 120 × 110 position sensor using the demodulation sensing scheme. Section 4.5 presents performance evaluation and application to range finding. Finally, Section 4.6 summarizes this chapter.

### 4.2 Sensing Scheme and Circuit Realization

#### 4.2.1 Demodulation Sensing Scheme

Figure 4.1 illustrates a sensing scheme for high-sensitivity and wide-dynamic-range photo detection. In the light-section range finding system, a laser beam modulated by a pulse generator is projected on a target object. The photo detector receives a reflection of the projected laser beam and the background illumination together. A photo current generated by the incident light is fed into a low-pass filter. An output current of the low-pass filter is subtracted from the original photo current. The subtraction is realized using a current-mode circuit instead of a voltage mode circuit in [85] to avoid saturation. The output current is alternating when the incident light includes a modulated light. A logarithmic-response circuit limits the amplitude of current swing to avoid a saturation problem of a correlation circuit after the constant current suppression. The limited current swing is divided into two integrators by an external correlation signal. A marked difference voltage between the outputs of each integrator is acquired only when the incident light has the correlation frequency. The low-pass filter and the current-mode subtraction circuit realize the adaptive suppression of constant illumination. The logarithmic-response circuit and the correlation circuit are dedicated to wide-dynamic-range and high-sensitivity photo detection.

Figure 4.2 Pixel circuit implementation of the demodulation sensing

#### 4.2.2 Pixel Circuit Realization

Figure 4.2 shows a pixel circuit implementation of the present demodulation sensing. The pixel consists of a photo diode, a current-mode suppression circuit with low-pass filters, a bias circuit for the low-pass filters, a logarithmic I-V converter, two integrators for correlation, and two source follower circuits for readout. The transistor size (W/L) is also shown by micrometers ($\mu$m) in Figure 4.2. The size of coupled or cascaded transistors is omitted in
A photo current of $I_{pd}$ is generated in proportion to the incident light intensity. The photo current is copied as a current of $\alpha I_{pd}$, where $\alpha$ is a gain of the current copier circuit. Its average current, $\alpha I_{avg}$, is generated by a low-pass filter and it is subtracted from $\alpha I_{pd}$. The low-pass filter consists of two biased transistors ($M_0$ and $M_1$) and two capacitors ($C_0$ and $C_1$). The biased transistors are used for a resistor of the low-pass filter, which are based on HRES (Horizontal RESistor) presented in [86]. A drain-source current, $I_{M0}$, of the transistor, $M_0$, is controlled by the gate voltage of $V_{g0}$. The bias circuit makes the gate-source voltage of $V_{q}$ constant in each pixel for constant resistance. The saturation current of the biased transistor, $M_0$, is half of the bias current of $I_b$ controlled by $V_r$.

Figure 4.3 shows a timing diagram of the pixel circuit operation. Here, $f_0$ is a correlation frequency. When the incident light includes a modulated light, the photo current, $I_{pd}$, has two components of a constant current of $I_{dc}$ by an ambient light and an alternating current of $I_{ac}$ by a modulated light.

$$I_{pd} = I_{dc} + I_{ac}. \quad (4.1)$$

The low-pass filter generates the average current, $\alpha I_{\text{avg}}$, as follows.

$$\alpha I_{\text{avg}} = \alpha \overline{I_{\text{pd}}} = \alpha (I_{\text{dc}} + \overline{I_{\text{ac}}}), \quad (4.2)$$

where the overline denotes the time average. The constant current, $I_{\text{dc}}$, is adaptively suppressed by the current-mode suppression circuit. Here, a time constant of the low-pass filter is designed at 1.2 ms in a typical situation. It can be adjusted by the external bias voltage, $V_r$. The output current, $I_{\text{mod}}$, of the suppression circuit is given by

$$I_{\text{mod}} = \alpha I_{\text{pd}} - \alpha I_{\text{avg}} = \alpha (I_{\text{ac}} - \overline{I_{\text{ac}}}). \quad (4.3)$$

The output current, $I_{\text{mod}}$, is converted to a voltage level of $V_{\text{mod}}$ by a logarithmic-response circuit.

$$V_{\text{mod}} = \beta \log(I_0 + I_{\text{mod}}), \quad (4.4)$$

where $\beta$ is a gain factor of the logarithmic-response circuit and $I_0$ is an offset current. The output is divided into two capacitors, $C_2$ and $C_3$, by the external signals, $MPY^+$ and $MPY^-$, synchronized with the correlation frequency. The voltages, $V_{mpy+}$ and $V_{mpy-}$, at $C_2$ and $C_3$ are read out as $V_{out+}$ and $V_{out-}$ by source follower circuits, respectively.

When the incident light contains only the background illumination, the photo current is constant and $I_{\text{mod}}$ is zero. In this case, the difference voltage between $V_{out+}$ and $V_{out-}$ is zero, and the pixel is recognized as an inactive pixel. On the other hand, the marked difference between $V_{out+}$ and $V_{out-}$ is acquired only when the incident light has the frequency synchronized with the correlation signal. The pixel is recognized as an active pixel when the difference voltage exceeds the reference voltage, $V_{\text{cmp}}$, as follows.

$$V_{out+} - V_{out-} \geq V_{\text{cmp}}. \quad (4.5)$$
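The signal chain of Eq. (4.1) through Eq. (4.5) can be illustrated with a simple behavioral simulation. The Python sketch below is not the pixel circuit itself: it assumes a single first-order low-pass filter, square-wave modulation, integrators modeled as averages over each correlation phase, and arbitrary units. The function name, the 10 kHz modulation frequency, and all default parameter values except the 1.2 ms time constant are hypothetical.

```python
import math


def demodulated_difference(i_dc, i_ac, f_mod_hz, f_corr_hz,
                           tau=1.2e-3, alpha=1.0, beta=1.0, i_0=1.0,
                           dt=1e-6, t_settle=20e-3, t_integrate=2e-3):
    """Behavioral model returning V_out+ - V_out- in arbitrary units."""
    period_mod = round(1.0 / (f_mod_hz * dt))     # modulation period [samples]
    period_corr = round(1.0 / (f_corr_hz * dt))   # correlation period [samples]
    n_settle = round(t_settle / dt)
    n_total = n_settle + round(t_integrate / dt)

    i_avg = 0.0                       # low-pass filter state
    sum_plus = sum_minus = 0.0
    n_plus = n_minus = 0
    for n in range(n_total):
        beam_on = (n % period_mod) < period_mod // 2
        i_pd = i_dc + (i_ac if beam_on else 0.0)           # Eq. (4.1)
        i_avg += (i_pd - i_avg) * dt / tau                 # first-order low-pass
        i_mod = alpha * (i_pd - i_avg)                     # Eq. (4.3)
        # Eq. (4.4); the clamp is a modeling guard against log of a non-positive value.
        v_mod = beta * math.log(max(i_0 + i_mod, 1e-12))
        if n < n_settle:
            continue                                       # wait for the filter to settle
        if (n % period_corr) < period_corr // 2:           # MPY+ phase
            sum_plus += v_mod
            n_plus += 1
        else:                                              # MPY- phase
            sum_minus += v_mod
            n_minus += 1
    return sum_plus / n_plus - sum_minus / n_minus


ambient_only = demodulated_difference(1000.0, 0.0, 10e3, 10e3)
with_beam = demodulated_difference(1000.0, 1.0, 10e3, 10e3)
print(ambient_only, with_beam)
```

In this model a modulated component a thousand times weaker than the constant background still produces a clear difference voltage, while the background alone yields a difference close to zero, which is the behavior that Eq. (4.5) relies on.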

### 4.3 Sensor Configurations

Figure 4.4 shows a sensor structure with the present photo detectors. It consists of a pixel array, a row-select address decoder, row buffers of correlation signals, column-parallel subtraction circuits, and comparators with a column-select decoder. Both of the output voltages, $V_{out+}$ and $V_{out-}$, are read out into the subtraction circuit. The difference voltage between $V_{out+}$ and $V_{out-}$ is compared with the reference voltage $V_{\text{cmp}}$ at the column-parallel comparators. All pixels of a selected row are determined to be activated or not in parallel. Figure 4.4 also shows its timing diagram. After a pixel is selected, its output voltages, $V_{out+}$ and $V_{out-}$, are sampled on each node of $C_{dif}$ by $\phi_1$. When $\phi_2$ turns on, a voltage of $V_+$ at a node of $C_{dif}$ is given by

$$V_+ = V_{out^+} - V_{out^-} + V_o,$$

where $V_o$ is an offset voltage for adjustment of the input range of the comparator. The reference voltage, $V_{cmp}$, of the comparator is given by

$$V_{cmp} = V_{ref} + V_o. \quad (4.7)$$

$V_+$ is compared with $V_{cmp}$ at a latch sense amplifier when $\phi_3$ turns on. A pixel is activated when the difference voltage exceeds the threshold voltage of $V_{ref}$. When the incident light of the selected pixel contains a modulated light synchronized with the correlation frequency, the difference voltage becomes large as shown in Case 1 of Figure 4.4. Alternatively, the
Chapter 4  High-Sensitive Demodulation Sensors for Robust Beam Detection

Variations in the characteristics of the two readout paths for $V_{out+}$ and $V_{out-}$ cause an offset between the output voltages, $V_+$ and $V_-$, so the comparator requires a large margin in the threshold level. That is, $V_{cmp}$ must be set higher, and a larger difference voltage between $V_{out+}$ and $V_{out-}$ is then required. The variations can therefore decrease the sensitivity of the present sensing scheme. Their influence is suppressed by the threshold margin at the column-parallel comparators so that active pixels receiving a correlated incident light are detected. On the other hand, the uniformity of the circuits over the array hardly influences the performance since the suppression of an ambient light and the correlation of an incident light are carried out in pixel parallel.

4.4 Chip Implementation

We designed and fabricated a prototype chip with $16 \times 16$ photo detectors using a 0.6 $\mu$m standard CMOS process [87] for a preliminary test. Based on the successful experiments with the prototype, we then designed a $120 \times 110$ position sensor for robust beam detection. Figure 4.5 shows a pixel layout of the designed position sensor. It consists of a photo diode and 43 transistors, including 4 MOS capacitors. The capacitance of $C_0$ and $C_1$, which are shown in Figure 4.2, is 370 fF, and that of $C_2$ and $C_3$ is 150 fF. The pixel area is 60 $\mu$m $\times$ 60 $\mu$m with a 13.5 % fill factor. The photo diode is formed by an $n^+$-diffusion in a p-substrate. Figure 4.6 shows a chip microphotograph of the $120 \times 110$ position sensor. The process technology is a standard 0.6 $\mu$m CMOS process with 2-poly-Si and 3-metal layers. The die size is 8.9 mm $\times$ 8.9 mm. The chip consists of a pixel array of $120 \times 110$ pixels, a row select decoder, control signal drivers for demodulation, column-parallel subtraction circuits, and column-parallel comparators. Table 4.1 summarizes the chip specifications.
4.5 Measurement Results

4.5.1 Measurement Setup and Preliminary Tests

For performance evaluation, a measurement setup has been constructed with a laser pointer with a 635 nm wavelength, a pulse generator for modulation, an LCD light projector for nonuniform background illumination, and a host computer as shown in Figure 4.7. Figure 4.8 shows a camera module with the 120 × 110 position sensor and a spot beam source with X-Y scanning mirrors.

Figure 4.9 shows a preliminary test of position detection for a low-intensity beam projection against strong and nonuniform background illumination. A modulated laser beam corresponding to 4 klx is projected on a target object. The maximum intensity of the background illumination is about 80 klx. In this measurement, the correlation frequency is set at 8 kHz and the correlation operation lasts 0.7 ms. A distance between the position sensor and the target object is about 600 mm. The position sensor clearly detects a position of the projected laser beam as shown in Figure 4.9. The light detection has a tolerance to not only nonuniform background illumination but also target colors. In this measurement setup, range data of a target object are acquired by triangulation using X-Y scanning of the spot laser beam.

Figure 4.8 Photographs of the measurement setup: (a) a camera module with the position sensor; (b) a spot beam source with X-Y scanning mirrors.

Figure 4.9 High sensitive position detection in nonuniform background illumination.

4.5.2 Sensitivity and Dynamic Range

Figure 4.10 shows the relation between a background intensity, $E_{bg}$, and the minimum detectable intensity, $E_{\text{sig, min}}$, of a projected light. In this measurement for sensitivity and dynamic range, the modulation frequency is 1 kHz and the frame interval is 5 ms. To evaluate the sensitivity of the light detection, the intensities of the projected light and the background illumination are measured by the photo current, $I_{pd}$, generated by each incident light. This is because the projected laser beam has only a 635 nm wavelength, to which the photo detector is relatively sensitive, whereas the background light contains distributed wavelengths. Illuminance corresponding to the background photo current is shown in the upper axis of Figure 4.10 as a reference.

The experimental results of the present sensor are shown by (a) in Figure 4.10. The present sensor enables the use of a low-intensity projected light owing to the suppression of an ambient light. The minimum SBR (Signal-to-Background Ratio), which represents the sensitivity of the light detection, is -22.8 dB. SBR is defined as follows:

\[
SBR = 10 \log \frac{E_{\text{sig, min}}}{E_{\text{bg}}}. \tag{4.8}
\]

In addition, the high-sensitivity light detection is available without saturation in a wide range of background illumination. The high sensitivity under -18 dB SBR is achieved in more than 48 dB range of background illumination. For example, the projected light intensity can
be equivalent to $\sim 1.2 \times 10^3$ lx in outdoor environment, where the background intensity is $\sim 1.1 \times 10^5$ lx. It also can be equivalent to $\sim 22$ lx in a room, where the background intensity is $\sim 1.0 \times 10^3$ lx.
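As a quick check of the figures above, the following minimal sketch evaluates Eq. (4.8) for the two quoted illuminance pairs. It assumes that the logarithm in Eq. (4.8) is base 10 and that illuminance in lx can stand in for $E_{\text{sig, min}}$ and $E_{bg}$; the function names are illustrative and not taken from the thesis.

```python
import math


def sbr_db(e_sig_min, e_bg):
    """Signal-to-background ratio in dB, following Eq. (4.8) with a base-10 log."""
    return 10.0 * math.log10(e_sig_min / e_bg)


def min_signal_for_sbr(e_bg, sbr):
    """Projected-light intensity that corresponds to a given SBR (in dB)."""
    return e_bg * 10.0 ** (sbr / 10.0)


# Illuminance pairs quoted in the text (projected light, background), in lx.
outdoor = sbr_db(1.2e3, 1.1e5)
indoor = sbr_db(22.0, 1.0e3)
print(f"outdoor SBR: {outdoor:.1f} dB, indoor SBR: {indoor:.1f} dB")

# Projected intensity needed at -18 dB SBR under the outdoor background level.
print(f"{min_signal_for_sbr(1.1e5, -18.0):.0f} lx at -18 dB SBR")
```

The helper `min_signal_for_sbr` simply inverts Eq. (4.8), which is convenient when a beam source has to be sized for a given background level.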

Figure 4.10 shows that the sensitivity becomes worse under low-level irradiance conditions due to a response speed and device mismatch of the current mirror, hence a higher intensity level of the projected beam is required to keep the correlation speed and S/N under the low-level irradiance conditions. The maximum dynamic range is limited by the test equipment. According to a circuit simulation, the limiting factor of dynamic range will be a saturation problem of the logarithmic-response photo detector. In other words, the reverse bias voltage at a photo diode becomes low due to a strong incident light so that the photo diode cannot get the photo current in proportion to the incident light.

For comparison, the capabilities of our previous work [85] and the conventional correlation sensors [82]–[84] are shown by (b) and (c) in Figure 4.10, respectively. The present position sensor is more applicable to a wide variety of applications than the conventional sensors due to the higher sensitivity and dynamic range. The high-sensitivity and wide-dynamic-range beam detection is achieved by a current-mode dc suppression circuit for saturation avoidance and a correlation circuit for small signal accumulation with a logarithmic-response circuit.

In this measurement, the noise level caused by factors such as transistor mismatch has been evaluated by adjusting the threshold of a column-parallel comparator under a constant incident illumination, since the present sensor provides only a binary image based on correlation. The correlation output of the column-parallel subtraction circuit is theoretically at the same level as $V_o$, which is shown in Figure 4.4. That is, the noise level can be acquired by the threshold adjustment as the offset voltage from $V_o$. The average noise level of the present sensor was 42.3 mV. The standard deviation of the noise level was 15.7 mV. In the range finding, the threshold voltage is set to the sum of $V_o$, the average noise level, and a threshold margin. The noise fluctuation is suppressed by the threshold margin of 100 mV to detect the active pixels.
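The threshold setting described above amounts to simple arithmetic, sketched below. Voltages are expressed relative to $V_o$, and the interpretation of the margin as a multiple of the measured standard deviation is an added illustration rather than a statement from the thesis; all names are hypothetical.

```python
AVG_NOISE_MV = 42.3   # average noise level reported in the text
NOISE_STD_MV = 15.7   # standard deviation of the noise level reported in the text
MARGIN_MV = 100.0     # threshold margin used in the range finding


def threshold_above_offset_mv(avg_noise_mv=AVG_NOISE_MV, margin_mv=MARGIN_MV):
    """Comparator threshold measured from V_o: average noise plus margin."""
    return avg_noise_mv + margin_mv


def is_active(diff_mv, threshold_mv=None):
    """Classify a pixel from its correlation output (difference from V_o, in mV)."""
    if threshold_mv is None:
        threshold_mv = threshold_above_offset_mv()
    return diff_mv >= threshold_mv


margin_in_sigma = MARGIN_MV / NOISE_STD_MV
print(f"threshold = V_o + {threshold_above_offset_mv():.1f} mV")
print(f"margin = {margin_in_sigma:.1f} standard deviations of the noise level")
```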

Range finding based on the light-section method generally suffers from reflectance variations of a target surface. However, the present system is affected less than the conventional systems since the signal-to-background ratio needed to detect the projected beam is maintained over a wide range. This is because reflectance variations often affect both the ambient light and the projected beam, although the effect depends on their wavelength spectra.
4.5.3 Selectivity

The correlation technique suppresses another projected light with a modulation frequency of \( f_1 \), which is not equal to a correlation frequency of \( f_0 \). Figure 4.11 shows the difference voltage of the correlation outputs, which is \( V_{out+} - V_{out-} \), at various incident light frequencies. In this measurement, the correlation frequency of \( f_0 \) is set to 1 kHz and the frame interval is 5 ms. The measurement result shows that the suppression ratio is less than -7 dB. Particularly, the suppression ratio of even harmonics of \( f_0 \) is less than -13 dB. Thus the projected light of even-harmonics frequencies can be ideally separated in a multiple-light-projection system. Such a separation of concurrently projected lights is important for a triangulation-based range finding to reduce a dead angle, where an object is illuminated by multiple light sources from different directions.
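Why even harmonics are suppressed particularly well can be illustrated with an idealized model. The sketch below assumes a linear correlator that multiplies a 50 % duty square-wave light by a $\pm 1$ reference at $f_0$ and integrates over one 5 ms frame. This ignores the logarithmic-response circuit and all device non-idealities, so it does not reproduce the measured suppression ratios; it only shows the trend. All function names are hypothetical.

```python
def square_wave(t, freq):
    """Return 1.0 in the first half of each period of `freq`, else 0.0."""
    return 1.0 if (t * freq) % 1.0 < 0.5 else 0.0


def correlation_output(f_light, f_corr=1000.0, frame=5e-3, steps=50_000):
    """Integrate light(t) * reference(t) over one frame with the midpoint rule.

    The reference is +1 during the MPY+ half-period and -1 during the MPY-
    half-period, modelling V_out+ - V_out- for an ideal linear correlator.
    """
    dt = frame / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        reference = 1.0 if square_wave(t, f_corr) else -1.0
        total += square_wave(t, f_light) * reference * dt
    return total


def relative_response(f_light, f_corr=1000.0):
    """Response to light at f_light, normalized to in-phase light at f_corr."""
    return abs(correlation_output(f_light, f_corr)) / abs(
        correlation_output(f_corr, f_corr)
    )


for harmonic in (1, 2, 3, 4):
    print(harmonic, round(relative_response(harmonic * 1000.0), 3))
```

In this ideal model each half-period of the reference contains a whole number of periods of an even harmonic, so the positive and negative contributions cancel, whereas odd harmonics leave a residual.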

4.5.4 Frame Rate

The present position sensor has a trade-off between the sensitivity and the frame rate. Figure 4.12 shows the relation between the correlation frequency and the sensitivity. The gain of correlation decreases at a high correlation frequency due to parasitic capacitances of a photo diode. That is, a time constant of the photo diode and the logarithmic-response circuit is a limiting factor of the demodulation sensing technique. The present position sensor attains a correlation frequency of 10 kHz at -16 dB SBR, and the correlation interval is 0.5 ms in this situation. That is, a possible frame rate of the position sensor is 2000 fps at -16 dB SBR. The achievable frame rate at -22.8 dB, which is the minimum SBR of the present sensor, is 400 fps using a 2 kHz correlation frequency. The frame rate at -18 dB SBR, which is available over a 48 dB range of background illumination, is 1200 fps using a 6 kHz correlation frequency.

Figure 4.12 Relation between the correlation frequency and the sensitivity.
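The three operating points above follow one rule, sketched here: the frame rate is the correlation frequency divided by the number of correlation cycles per frame. The value of five cycles per frame is inferred from the quoted figures (10 kHz with a 0.5 ms correlation interval) and is an assumption, not a stated design parameter.

```python
CYCLES_PER_FRAME = 5  # inferred from 10 kHz x 0.5 ms; an assumption


def frame_rate_fps(correlation_freq_hz, cycles_per_frame=CYCLES_PER_FRAME):
    """Frame rate when each frame integrates a fixed number of correlation cycles."""
    return correlation_freq_hz / cycles_per_frame


# (SBR in dB, correlation frequency in Hz) pairs quoted in the text.
for sbr_db_value, f_corr in ((-16.0, 10e3), (-18.0, 6e3), (-22.8, 2e3)):
    print(f"{sbr_db_value:6.1f} dB SBR: {frame_rate_fps(f_corr):.0f} fps")
```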

4.5.5 Range Finding Results

We have applied the 120 × 110 position sensor to a triangulation-based range finding system using a spot beam projection. Figure 4.13 shows the range accuracy of the range finding system. A target object is a flat panel, and it is placed at a distance from 1000 mm to 1100 mm. The maximum range error over the full area is 3.2 mm, and the standard deviation of range error is 0.89 mm. In an effective area of 110 × 100 pixels, the maximum measurement error is 1.5 mm, and a standard deviation of range error is 0.60 mm. The range finding system attains an accuracy of 0.3 % at a distance of 1000 mm.

Figure 4.13  Linearity of the measured range data.

Table 4.2  Performance specifications.

<table>
<thead>
<tr>
<th>Specification</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Power supply</td>
<td>5.0 V</td>
</tr>
<tr>
<td>Sensitivity (SBR)</td>
<td>-22.8 dB SBR</td>
</tr>
<tr>
<td>Dynamic range</td>
<td>&gt; 48 dB ( &lt; -18 dB SBR)</td>
</tr>
<tr>
<td>Selectivity</td>
<td>-13 dB suppression ratio (for even harmonics of $f_0$)</td>
</tr>
<tr>
<td>Light detection rate</td>
<td>2000 fps (at -16 dB SBR)</td>
</tr>
<tr>
<td>Depth resolution</td>
<td>1.5 mm at 1000 mm</td>
</tr>
<tr>
<td>Power dissipation</td>
<td>250 mW</td>
</tr>
</tbody>
</table>

120 × 110-point range data of a target object are acquired by X-Y scanning of a spot laser beam. In the condition of -13 dB SBR, range maps are acquired as shown in Figure 4.14 (a), (b) and (c). Brightness of the range map represents the distance from the range finder to the target. A wire frame of the target object (d) is reproduced from the range data as shown in Figure 4.14 (e). The performance specifications of the position sensor and the range finding system are summarized in Table 4.2. In the measurement system using a spot beam projection with X-Y scanning, the range finding takes about 66 seconds. With a sheet beam projection and X scanning, it can be reduced to about 0.5 seconds since far fewer frames are needed per range map. In addition, the present sensor could achieve a higher range finding speed with a more sensitive photo diode customized for image sensors, since the correlation speed is limited by the photo diode of a standard CMOS process.
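The scan times quoted above can be reproduced with the rough estimate below. It assumes one frame per beam position and a 5 ms frame interval; the interval is taken from the sensitivity measurement and is only inferred to apply here because $120 \times 110 \times 5$ ms matches the quoted 66 seconds. The number of sheet-beam positions is likewise an assumption.

```python
FRAME_INTERVAL_S = 5e-3  # assumed frame interval (value used in Section 4.5.2)


def spot_scan_time_s(cols=120, rows=110, frame_interval_s=FRAME_INTERVAL_S):
    """X-Y spot scanning: one frame per measured point."""
    return cols * rows * frame_interval_s


def sheet_scan_time_s(positions=120, frame_interval_s=FRAME_INTERVAL_S):
    """X scanning of a sheet beam: one frame per beam position (assumed)."""
    return positions * frame_interval_s


print(f"spot beam:  {spot_scan_time_s():.1f} s")
print(f"sheet beam: {sheet_scan_time_s():.2f} s")
```

With these assumptions the sheet-beam estimate is of the same order as the roughly 0.5 seconds stated in the text; the exact figure depends on how many beam positions are scanned.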

4.6 Summary

We have proposed a new sensing scheme of low-intensity beam detection for a robust range finding system. A correlation circuit and a current-mode suppression circuit of constant illumination realize high sensitivity, high selectivity, and availability in wide-range background illumination. A 120 × 110 position sensor for robust range finding has been designed and successfully tested. The position sensor achieves high-sensitive light detection of -18 dB SBR over a 48 dB range of background illumination. It also realizes high selectivity to detect only a target beam in a high contrast ambient light due to -13 dB suppression of another incident light with even harmonics of the correlation frequency. We have discussed a trade-off between the sensitivity and the frame rate, and presented the maximum frame rate of 2,000 fps at -16 dB SBR. We have applied the position sensor to a triangulation-based range finding system. It achieves a range accuracy within 1.5 mm at a distance of 1000 mm. The present position sensor is advantageous for future application fields which require a light projection that is safe for human eyes in various measurement environments.
Chapter 5

Extension of Demodulation Sensing

5.1 Introduction

This chapter describes a pixel-level color image sensor and a low-intensity ID beacon detector as extensions of the demodulation sensing scheme.

In Section 5.2 through Section 5.5, we present a pixel-level color image sensor with efficient ambient light suppression using a modulated RGB flashlight. The image sensor employs bidirectional photocurrent integrators for pixel-level demodulation and ambient light suppression. The demodulation function helps to avoid saturation from ambient illumination and to provide innate color information without the false color and intensity loss of color filters. The demodulation function also offers the possibility of TOF range finding to realize depth-key object extraction. These features support image recognition in various imaging situations. Section 5.2 describes a concept of the color demodulation imaging. Section 5.3 presents the circuit configuration of the color demodulation. Section 5.4 shows the design of a prototype color demodulation imager with 64 × 64 pixels. The performance evaluation based on measurement results is discussed in Section 5.5.

In Section 5.6 through Section 5.10, we present a low-intensity ID beacon detector for augmented reality (AR) systems. AR systems are designed to provide an enhanced view of the real world with meaningful information from a computer. Our target AR system uses an optical device with an ID beacon such as a blinking LED. The present ID beacon detector realizes analog readout for 2-D image capture and high-speed digital readout for ID beacon detection simultaneously. The pixel circuit has a logarithmic-response photo detector and an adaptive modulation amplifier to detect a low-intensity ID beacon in a wide range of background illumination. Section 5.6 introduces an augmented reality system with active optical devices. Section 5.7 describes circuit configurations and operations of the proposed ID beacon detector.

5.2 Concept of Color Demodulation Imaging

5.2.1 Target Applications

In recent years, image recognition systems have become important in applications such as security systems, intelligent transportation systems (ITS), factory automation, and robotics. Object extraction from a captured scene is important for such recognition systems. Object extraction generally requires huge computational effort, thus, it is desirable to extract target objects by flashlight decay [88] or time-of-flight (TOF) range finding [18], as shown in Figure 5.1. Color information is also useful for identifying a target object. However, it is difficult for a standard image sensor to acquire the innate color since the color imaging results are strongly affected by ambient illumination. Therefore, a function of ambient light suppression is efficient for image recognition.

Some image sensors with photocurrent demodulation, such as [82]–[85] and the proposed position sensor in Chapter 4, have been presented to suppress a constant light. The conventional techniques [82]–[84] have two photocurrent integrators. One accumulates a signal light and an ambient light together, and then the other accumulates only an ambient light. Therefore, its dynamic range is limited by the ambient light intensity. A logarithmic-response position sensor, which is presented in Chapter 4, expands the dynamic range due to adaptive ambient light suppression. The signal gain, however, changes with the incident light intensity, hence it is not suitable for capturing a scene image. We propose an imaging system configuration using a modulated flashlight and a demodulation image sensor for support of image recognition in various measurement situations. It is capable of providing innate color and depth information of a target object for color-based categorization and depth-key object extraction.

Figure 5.1 Preprocessing for image recognition.

5.2.2 System Configuration

Figure 5.2 shows an imaging system configuration using a modulated RGB flashlight. The RGB flashlight contains three color projections, which are modulated by $\phi_R$, $\phi_G$, and $\phi_B$, respectively. The duty ratio is set to 25%. Each modulation phase is shifted by 90 degrees. A photo detector receives the modulated lights, $E_R$, $E_G$, and $E_B$, from a target scene together with an ambient light, $E_{bg}$. An ambient light is provided by the sun, a fluorescent light, etc. Therefore, the ambient light intensity, $E_{bg}$, is constant or varies at a low frequency. A photocurrent, $I_{pd}$, is generated in proportion to the incident intensity, $E_{total}$, as follows:

$$I_{pd} \propto E_{total} = \begin{cases}
E_R + E_{bg}, & \text{if } t = nT \sim nT + \Delta T \\
E_G + E_{bg}, & \text{if } t = nT + \Delta T \sim nT + 2\Delta T \\
E_B + E_{bg}, & \text{if } t = nT + 2\Delta T \sim nT + 3\Delta T \\
E_{bg}, & \text{otherwise},
\end{cases} \tag{5.1}$$

where $T$ is a cycle time of modulation, $\Delta T$ is a pulse width of each flashlight, and $n$ is the number of modulation cycles in exposure time. The photo detector has four integrators with a demodulation function. $I_{pd}$ is accumulated in each integrator synchronized with $\phi_R$, $\phi_G$, and $\phi_B$. Then, all integrators subtract an ambient light level, $E_{bg}$, from the total level in a modulation cycle of $T$. The short-interval subtraction helps suppress the influence of an ambient light on the color information. The color sensing has no intensity loss caused by color filters.
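The accumulate-then-subtract sequence described above can be sketched as an idealized behavioral model. It treats each time slot of Eq. (5.1) as adding the incident level to one integrator and then removes the ambient level from all four integrators once per modulation cycle. Unit gain and perfectly symmetric integration are assumed, the intensities are in arbitrary units, and the function and key names are hypothetical.

```python
def demodulate_rgb(e_r, e_g, e_b, e_bg, cycles):
    """Ideal four-integrator demodulation over `cycles` modulation cycles."""
    acc = {"R": 0.0, "G": 0.0, "B": 0.0, "O": 0.0}
    # Incident level seen by each integrator during its own time slot, Eq. (5.1).
    slot_level = {"R": e_r + e_bg, "G": e_g + e_bg, "B": e_b + e_bg, "O": e_bg}
    for _ in range(cycles):
        for name, level in slot_level.items():
            acc[name] += level   # accumulation in the slot assigned to this integrator
        for name in acc:
            acc[name] -= e_bg    # per-cycle subtraction of the ambient level
    return acc


# Arbitrary illustrative levels: ambient light much stronger than each color.
print(demodulate_rgb(e_r=40.0, e_g=80.0, e_b=120.0, e_bg=200.0, cycles=25))
```

In this ideal case the fourth output stays at zero; in the actual circuit it carries the asymmetry offset of the bidirectional integration.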

The flashlight imaging originally realizes rough range finding based on flashlight decay [88]. It is sometimes utilized for object extraction, however, the reliability comes under the influence of surface reflectance. Thus, it is difficult to identify multiple objects in a target scene. On the other hand, TOF range finding attains more efficient object extraction, which is called a depth-key technique [18]. A demodulation function is capable of TOF range finding as presented in [13] and [89], and the present system is also capable of depth-key object extraction.

### 5.2.3 Sensing Scheme with Ambient Light Suppression

The conventional demodulation sensors [82]–[84] have two photocurrent integrators as shown in Figure 5.3 (a). Photocurrents, $I_{sig}$ and $I_{bg}$, are generated by a modulated light, $E_{sig}$, and an ambient light, $E_{bg}$, respectively. While the flashlight projection turns on, the total photocurrent of $I_{sig}$ and $I_{bg}$ is accumulated in one of the photocurrent integrators as shown in Figure 5.4 (a). And then, the photocurrent, $I_{bg}$, is accumulated in the other photocurrent integrator while the flashlight projection turns off. The signal level, $V_{sig}$, is calculated from the accumulation results, $V_{sig+bg}$ and $V_{bg}$, after an exposure period.

$$V_{sig} = V_{sig+bg} - V_{bg} = \sum_{i=0}^{n} \frac{(I_{sig} + I_{bg}) \cdot \Delta T}{C_{pd}} - \sum_{i=0}^{n} \frac{I_{bg} \cdot \Delta T}{C_{pd}}, \tag{5.2}$$
Figure 5.3 Photocurrent demodulation by two in-pixel integrators: (a) the conventional demodulation, (b) the proposed demodulation.

Figure 5.4 Timing diagram of photocurrent demodulation: (a) the conventional demodulation, (b) the proposed demodulation.
where \( C_{pd} \) is a parasitic capacitance of a photo diode. Therefore, the dynamic range of [82]–[84] is limited by a saturation level \( V_{sat} \) as follows:

\[
V_{sig+bg} < V_{sat}. \tag{5.3}
\]

In the conventional techniques, the signal level saturates easily owing to an ambient light.

On the other hand, the present sensing scheme suppresses an ambient light at short intervals during an exposure period as shown in Figure 5.3 (b) and Figure 5.4 (b). In a modulation cycle, the photocurrents, \( I_{sig} \) and \( I_{bg} \), are accumulated in each photocurrent integrator in the same way as the conventional sensing scheme. And then, the ambient light intensity is subtracted from the photocurrent integrators in every modulation cycle. Therefore, the signal level, \( V_{sig} \), is directly provided from a pixel output as follows:

\[
V_{sig} = \sum_{i=0}^{n} \left( \frac{(I_{sig} + I_{bg}) \cdot \Delta T}{C_{pd}} - \frac{I_{bg} \cdot \Delta T}{C_{pd}} \right). \tag{5.4}
\]

Thus, the dynamic range is given by

\[
V_{sig} < V_{sat}. \tag{5.5}
\]

In the present sensing scheme, a short demodulation cycle of \( T \) makes the dynamic range higher since it avoids the saturation caused by an ambient light. The other photocurrent integrator provides \( V_O \) as the offset level to cancel asymmetry of bidirectional integration.
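The difference between the saturation conditions of Eq. (5.3) and Eq. (5.5) can be made concrete with a small numerical sketch. The per-slot voltage steps, the saturation level, and the cycle count below are hypothetical values chosen only for illustration, and the sums are evaluated over `n` cycles.

```python
V_SAT_MV = 1000.0  # hypothetical saturation level


def conventional_levels(dv_sig, dv_bg, n):
    """Stored levels of the two integrators in the conventional scheme, Eq. (5.2)."""
    v_sig_bg = n * (dv_sig + dv_bg)
    v_bg = n * dv_bg
    return v_sig_bg, v_bg


def proposed_level(dv_sig, dv_bg, n):
    """Stored level in the present scheme: ambient removed every cycle, Eq. (5.4)."""
    v_sig = 0.0
    for _ in range(n):
        v_sig += dv_sig + dv_bg  # accumulate signal and ambient light
        v_sig -= dv_bg           # subtract the ambient level in the same cycle
    return v_sig


# Hypothetical per-slot steps in mV: weak signal, strong ambient light.
dv_sig, dv_bg, n = 4.0, 50.0, 25
v_sig_bg, v_bg = conventional_levels(dv_sig, dv_bg, n)
print("conventional saturates:", v_sig_bg >= V_SAT_MV)                      # Eq. (5.3)
print("proposed saturates:", proposed_level(dv_sig, dv_bg, n) >= V_SAT_MV)  # Eq. (5.5)
```

The sketch compares only the levels stored after each cycle, as the two inequalities do; it does not model the transient level inside a cycle.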

### 5.3 Circuit Configurations of Color Demodulation

#### 5.3.1 Pixel-Level Color Demodulation

The present sensing scheme employs a bidirectional photocurrent integrator. It is implemented by discrete-time voltage integrators and a fully differential amplifier with bidirectional output drive as shown in Figure 5.5 (a). The gain of the fully differential amplifier is set to 1. In this implementation, a photo detector has two integrators. Thus, a full color pixel requires three photo detectors, which consist of three photo diodes, three amplifiers, and six photocurrent integrators. In the present imaging system, a photo diode can be shared by the integrators as shown in Figure 5.5 (b) since three color projections are separately modulated as shown in Figure 5.5 (c). The pixel-level color demodulation reduces the circuit area for full color imaging. Furthermore, a captured color image has no false color due to the pixel-level imaging.
Figure 5.5 Pixel configuration: (a) two integrators per pixel, (b) pixel-level color demodulation with four integrators per pixel, (c) timing diagram of a projected RGB flashlight.

5.3.2 Pixel Circuit

Figure 5.6 shows a pixel circuit configuration and a pixel layout in a 0.35 μm CMOS process technology. It consists of a photo diode (PD), a fully differential amplifier, four integrators (Σ) with a demodulation function, and four source follower circuits. The gain of the fully differential amplifier is set to 1. The pixel size is 33.0 μm × 33.0 μm with 12.4 % fill factor.

Figure 5.7 shows a timing diagram of the pixel circuit. $\phi_{rst}$ initializes all photocurrent integrators. $\phi_{pd}$ resets $V_{pd}$ at a photo diode. $\phi_p$ and $\phi_m$ switch between an accumulation mode and a subtraction mode. $\phi_s$ and $\phi_r$ perform a sample-and-hold operation for four integrators. $\phi_s$, $\phi_g$, $\phi_b$, and $\phi_o$ make a photocurrent integrator active. In the reset period, all integrators are initialized by $\phi_{rst}$, and $V_{pd}$ at a photo diode is reset to $V_{rst}$ by $\phi_{pd}$. In the first $\Delta T$, the photo detector accumulates the total photocurrent of $I_R$ and $I_{bg}$ in a photocurrent integrator, $\Sigma_1$, since a projected flashlight contains a red light of $E_R$. Then, it accumulates $I_G$ and $I_B$ together with $I_{bg}$ in $\Sigma_2$ and $\Sigma_3$ in the second and third $\Delta T$, respectively, after $V_{pd}$ has been reset again. Finally, $I_{bg}$ is accumulated in $\Sigma_4$, and subtracted from all integrators in the fourth $\Delta T$. The modulation cycle, $T$, is repeated during an exposure period. The pixel values, $V_R$, $V_G$, $V_B$, and $V_O$, are read out through the source follower circuits as the output signals, $V_{Ro}$, $V_{Go}$, $V_{Bo}$, and $V_{Oo}$.

The bidirectional integration is realized by switching two outputs of the fully differential amplifier, \( V_{pd+} \) and \( V_{pd-} \), as shown in Figure 5.8:

\[ V_{mod} = \begin{cases} V_{pd+}, & \text{if } \phi_p = H \text{ and } \phi_m = L \\ V_{pd-}, & \text{if } \phi_p = L \text{ and } \phi_m = H \end{cases} \tag{5.6} \]

The two outputs are given by

\[ V_{pd+} = A_p \cdot \Delta V_{pd} - \Delta V_+ , \tag{5.7} \]
\[ V_{pd-} = - ( A_m \cdot \Delta V_{pd} - \Delta V_- ) , \tag{5.8} \]
\[ \Delta V_{pd} = \frac{I_{total} \cdot \Delta T}{C_{pd}} , \tag{5.9} \]
\[ A_p \approx A_m \approx 1 , \tag{5.10} \]

where \( A_p \) and \( A_m \) are the gain of the fully differential amplifier in an accumulation mode and a subtraction mode, respectively. Both of them are set to 1; however, they are not exactly the same because of the device fluctuation. \( \Delta V_+ \) and \( \Delta V_- \) are the offset levels of \( V_{pd+} \) and \( V_{pd-} \) from the reference voltage \( V_{ref} \), respectively. \( I_{total} \) is a photocurrent generated by an incident light. From Eq. (5.4), we have

\[
V_{sig} = \sum_{i=0}^{n} \left( (A_p \cdot \Delta V_{sig+bg} - \Delta V_+) - (A_m \cdot \Delta V_{bg} - \Delta V_-) \right) , \tag{5.11}
\]

considering the offset variations of bidirectional integration. $\Delta V_{\text{sig}+\text{bg}}$ and $\Delta V_{\text{bg}}$ are given by

\[ \Delta V_{\text{sig}+\text{bg}} = \frac{(I_{\text{sig}} + I_{\text{bg}}) \cdot \Delta T}{C_{pd}}, \tag{5.12} \]
\[ \Delta V_{\text{bg}} = \frac{I_{\text{bg}} \cdot \Delta T}{C_{pd}}. \tag{5.13} \]

Figure 5.7 Timing diagram.

Substituting Eq. (5.12) and Eq. (5.13) into Eq. (5.11) gives

\[ V_{\text{sig}} = V_{\text{out}} + \Delta V_{\text{gain}} + \Delta V_{\text{bias}}. \tag{5.14} \]

$V_{\text{out}}$ is a signal level which is required for a color image. $\Delta V_{\text{gain}}$ is an offset level caused by the gain variations. $\Delta V_{\text{bias}}$ is an offset level caused by the bias fluctuations.

\[ V_{\text{out}} = \frac{A_p \cdot I_{\text{sig}} \cdot n \Delta T}{C_{pd}}, \tag{5.15} \]
\[ \Delta V_{\text{gain}} = \frac{(A_p - A_m) \cdot I_{\text{bg}} \cdot n \Delta T}{C_{pd}}, \tag{5.16} \]
\[ \Delta V_{\text{bias}} = -n(\Delta V_+ - \Delta V_-). \tag{5.17} \]

Figure 5.9 Simulation waveforms of pixel-level demodulation: (a)–(d) the present sensing scheme, (e) the conventional sensing scheme.

On the other hand, the fourth integrator accumulates $I_{bg}$, and then it subtracts $I_{bg}$ from the accumulation. The output level, $V_O$, is given by

$$V_O = \sum_{i=0}^{n} \left( \left( A_p \cdot \Delta V_{bg} - \Delta V_+ \right) - \left( A_m \cdot \Delta V_{bg} - \Delta V_- \right) \right). \tag{5.18}$$

Therefore, the significant signal level, $V_{out}$, is acquired as follows.

$$V_{out} = V_{sig} - V_O.$$  (5.19)

The fourth integrator thus helps suppress the asymmetry offset of bidirectional integration.
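The cancellation can be written out explicitly. The following worked step is a sketch that treats each sum as $n$ identical terms, as Eqs. (5.15) through (5.17) do, and uses only the symbols defined above. The fourth integrator sees no signal light, so its output contains exactly the two offset terms of Eq. (5.14), and the subtraction of Eq. (5.19) leaves the color signal alone.

```latex
\begin{align*}
V_O &= n \left[ (A_p - A_m) \cdot \Delta V_{bg} - (\Delta V_+ - \Delta V_-) \right] \\
    &= \frac{(A_p - A_m) \cdot I_{bg} \cdot n \Delta T}{C_{pd}} - n (\Delta V_+ - \Delta V_-)
     = \Delta V_{gain} + \Delta V_{bias}, \\
V_{sig} - V_O &= \left( V_{out} + \Delta V_{gain} + \Delta V_{bias} \right)
               - \left( \Delta V_{gain} + \Delta V_{bias} \right) \\
              &= V_{out} = \frac{A_p \cdot I_{sig} \cdot n \Delta T}{C_{pd}}.
\end{align*}
```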

5.3.4 Simulation of Pixel-Level Demodulation

Figure 5.9 shows simulation waveforms of the pixel-level demodulation with efficient ambient light suppression. In the simulation condition, a photocurrent, $I_{bg}$, is set to 200 nA,
which is generated by an ambient light of $E_{bg}$. Signal photocurrents, $I_R$, $I_G$, and $I_B$, are set to 40 nA, 80 nA, and 120 nA, respectively, which are generated by a modulated RGB flashlight. A parasitic capacitance of a photo diode, $C_{pd}$, is 73 fF. A sampling capacitance, $C_s$, is 12 fF. An integration capacitance, $C_i$, is 17 fF. $\Delta T$ is set to 0.1 ms. A modulation cycle of 0.4 ms is repeated 25 times in exposure time.

The signal levels are acquired as $|V_R - V_O|$, $|V_G - V_O|$, and $|V_B - V_O|$ with suppressing an ambient light $E_{bg}$ as shown by (a)–(c) in Figure 5.9. $V_O$ is the output of the fourth integrator, and it means the asymmetry offset of bidirectional integration as shown by Eq. (5.18). The present sensing scheme avoids saturation from ambient light intensity, $E_{bg}$, as shown by Eq. (5.4). In the conventional sensing as shown by (e) in Figure 5.9, the signal level can be saturated by a strong ambient light intensity since the integrator accumulates $E_B$ and $E_{bg}$ together without suppressing $E_{bg}$ during an exposure period as shown by Eq. (5.2).

**5.4 Design of 64 × 64 Color Demodulation Imager**

We have designed and fabricated a prototype image sensor with $64 \times 64$ pixels in a 0.35 μm CMOS process. Figure 5.10 illustrates the sensor block diagram. The sensor consists of a $64 \times 64$ pixel array, a row select decoder, control signal drivers, column amplifiers with a column select decoder, a correlated double sampling (CDS) circuit, an offset canceller, an 8-bit charge-distributed ADC, and a sensor controller. The CDS circuit suppresses a fixed pattern noise caused by the column amplifiers. The offset canceller, which is shown in Figure 5.11, subtracts a demodulation offset level, \( V_{Oo} \), from signal output voltages, \( V_{Ro} \), \( V_{Go} \), and \( V_{Bo} \). The signal output voltages are sampled by \( \phi_{sub} \) at capacitors, \( C_{sub} \), and then \( V_{Oo} \) is subtracted from them. \( V_{zero} \) is a bias level of the CDS circuit. A charge-distributed ADC, which is shown in Figure 5.12, is designed for 8-bit analog-to-digital conversion. All components are operated by an on-chip sensor controller. Figure 5.13 shows the chip microphotograph. Specifications of the prototype image sensor are summarized in Table 5.1.

**Figure 5.10** Sensor block diagram.

Figure 5.13 Chip microphotograph.

Table 5.1 Specifications of the prototype image sensor.

<table>
<thead>
<tr>
<th>Specification</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Process</td>
<td>3-metal 2-poly-Si 0.35 μm CMOS</td>
</tr>
<tr>
<td>Die size</td>
<td>4.9 mm × 4.9 mm</td>
</tr>
<tr>
<td># of pixels</td>
<td>64 × 64 pixels</td>
</tr>
<tr>
<td>Pixel size</td>
<td>33.0 μm × 33.0 μm</td>
</tr>
<tr>
<td>Pixel config.</td>
<td>1 PD, 57 FETs and 5 capacitors</td>
</tr>
<tr>
<td>Fill factor</td>
<td>12.4 %</td>
</tr>
</tbody>
</table>

5.5 Measurement Results of Color Demodulation Imager

5.5.1 Efficient Ambient Light Suppression

Figure 5.14 shows measurement results of a signal output voltage, $|V_{Ro} - V_{Oo}|$, as a function of a modulated light intensity, $E_R$. A modulated light and a constant light are directly projected on the sensor plane using red LEDs of 630 nm wavelength. The modulated light has a modulation cycle of 0.2 ms and a pulse width of 0.05 ms. The exposure time is 10 ms. Figure 5.14 (a) shows a signal output voltage with no ambient light. In this case, the present demodulation technique has high linearity, as does the conventional demodulation technique. On the other hand, Figure 5.14 (d) shows that the conventional technique saturates the signal level under a strong ambient light of 200 μW/cm² and 500 μW/cm². In these cases, the present demodulation technique efficiently avoids saturation and keeps high...
Figure 5.14 Output voltage vs. modulated light intensity $E_R$: (a) $E_{bg} = 0 \mu W/cm^2$, (b) $E_{bg} = 200 \mu W/cm^2$, (c) $E_{bg} = 500 \mu W/cm^2$, (d) conventional demodulation without efficient ambient light suppression.

The noise floor of the prototype image sensor is 15.6 mV$_{p-p}$ and 3.4 mV$_{rms}$, measured as $|V_{Ro} - V_{Oo}|$ under a constant light. It includes the gain variations caused by fluctuations of the integration capacitance $C_i$.

Figure 5.15 shows the saturation level of the modulated light intensity, $E_R$, as a function of the ambient light intensity, $E_{bg}$. Figure 5.15 (b) shows that the conventional technique is not suitable for varying ambient light conditions since its saturation level is limited by the total level of $E_R$ and $E_{bg}$. On the other hand, the saturation level of the present technique is not limited by the total intensity, as shown in Figure 5.15 (a), though it is slightly affected by an offset level, $V_O$, caused by asymmetry of the bidirectional integration. Therefore, the present image sensor is applicable to a variety of measurement situations.

Figure 5.16 shows the reason why the saturation level decreases depending on an ambient light intensity in the present demodulation technique. Ideally, the offset level, $V_O$, is independent of $E_{bg}$. However, it contains an offset factor caused by the gain variations, $\Delta V_{gain}$, as shown by Eq. (5.16). $\Delta V_{gain}$ is proportional to an ambient light intensity. Thus, the saturation level of $V_{sig}$ in Eq. (5.14) decreases because of the asymmetry offset of bidirectional integration.
**Figure 5.15** Saturation level of $E_R$ vs. ambient light intensity $E_{bg}$: (a) measurement results of the present sensing scheme, (b) reference of the conventional sensing scheme.

**Figure 5.16** Offset voltage $V_{Oo}$ vs. ambient light intensity $E_{bg}$. 
5.5.2 Pixel-Level Color Imaging

We have demonstrated color imaging using the present image sensor and a modulated RGB flash light as shown in Figure 5.17. The prototype flashlight projector has 8 red LEDs, 8 green LEDs and 16 blue LEDs, whose wavelengths are 630 nm, 520 nm and 470 nm, respectively. The total power consumption is 474 mW. The flashlight and an ambient light of a fluorescent lamp provide around 500 lux and 120 lux, respectively, on a target scene at a distance of 30 cm from the sensor. Color image reconstruction requires the modulated flashlight intensity, the flashlight distribution on a target scene, and the spectral-response characteristics of the image sensor. In this measurement, we acquired the sensitivity of all pixels for the prototype flashlight projector by using a white board. It provides calibration parameters for non-uniformity of a modulated flashlight, spectral-response characteristics and sensitivity variations from integration capacitance fluctuations. A target scene is shown in Figure 5.17 (b), and a captured color image is shown in Figure 5.17 (c). It is reconstructed from the sensor outputs of Figure 5.17 (d)–(f). It has color information corresponding to $64 \times 64 \times 3$ pixels of a standard color imager since every pixel provides RGB colors.
5.5.3 Application to Time-of-Flight Range Finding

Figure 5.18 (a) shows a system configuration of TOF range finding. A pulsed light is reflected from a target object with a delay time of $T_d$ as shown in Figure 5.18 (b). The delay, $T_d$, resulting from a target distance, $L_o$, changes demodulation outputs, $V_1$ and $V_2$. Two photocurrent integrators, $\Sigma_1$ and $\Sigma_2$, are used for the demodulation. The target distance, $L_o$, is given by

$$L_o = \frac{c T_p}{2} \left( 1 - \frac{V_1}{V_1 + V_2} \right), \quad (5.20)$$

where $c$ is the speed of light and $T_p$ is the pulse width. From Eq. (5.20), the output voltages $V_1$ and $V_2$ are expected to behave as shown in Figure 5.18 (c).
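As a minimal sketch, Eq. (5.20) can be evaluated numerically as follows. The function name and the 100 ns pulse width in the example are illustrative assumptions, not values taken from the measurement setup.

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s


def tof_distance(v1, v2, pulse_width_s):
    """Target distance L_o from Eq. (5.20): (c*T_p/2) * (1 - V1/(V1+V2))."""
    total = v1 + v2
    if total <= 0:
        raise ValueError("V1 + V2 must be positive")
    return 0.5 * C_LIGHT * pulse_width_s * (1.0 - v1 / total)


# Hypothetical example: equal outputs place the target at one quarter of c*T_p.
example_m = tof_distance(1.0, 1.0, 100e-9)
```

When $V_2 = 0$ the expression gives zero distance, and as $V_1$ approaches zero it approaches the maximum range $c T_p / 2$.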

Figure 5.19 shows measurement results of TOF range finding. The measurement setup employs a 5-MHz pulsed laser beam for a spot projection since a field projection requires a strong flashlight intensity and a higher photo sensitivity. The laser beam source has 10 mW power and 665 nm wavelength. In the preliminary test, the present image sensor was operated at 40 MHz, and the TOF range finding was performed under no ambient light. The measured target range is between 60 cm and 120 cm from the sensor. The range offset, which mainly results from a delay of the pulsed modulation, is calibrated at 90 cm. The measured range error is within ±15 cm, and the standard deviation of the error is 7.3 cm. The preliminary test shows the feasibility of TOF range finding using the present image sensor.

5.6 ID Beacon Detector for Augmented Reality System

In recent years, the real world has become closely tied to the computer world due to the wide use of PDAs and their network infrastructure. An augmented reality (AR) system has therefore become important as an interface between the real world and the computer world. In an AR system, information from the computer world is attached to a view of the real world to support human activities. Several methods have been proposed for such an AR system. In a visual tagging system [90], a 2-D barcode with an ID is attached to a target object and captured by a barcode reader. An AR system using RF-ID tags [91] also requires an ID reader. It is therefore difficult for these methods to get both the locations and the IDs of multiple target objects. An AR system using optical devices with an ID beacon, such as [92] and [93], is a possible solution to the problem. It can get a scene image together with the locations and IDs of one or more target objects simultaneously, as shown in Figure 5.20. It, however, limits the carrier speed of the ID beacon because it uses a standard image sensor of 30 fps. Its data rate using a 15 Hz carrier is not enough to identify many moving objects. An AR system using a high-speed smart image sensor, which is presented in [26], achieves a data rate of 120 bit/ID·sec using a 4 kHz carrier and packet transmission [94]. It corresponds to 8-bit ID detection at 15 fps. Yet this is not enough to identify various objects in the real world.

We propose a smart image sensor which is capable of high-speed and low-intensity ID beacon detection for a practical AR system. It employs a digital readout scheme and makes a high-speed carrier of ID beacon available to receive huge amounts of ID information in real time. In addition, a pixel circuit with a logarithmic-response photo detector and an adaptive modulation amplifier allows a low-intensity ID beacon detection for both indoor and outdoor applications. The adaptive sensing and high-speed readout schemes also contribute to an asynchronous system among a sensor and ID beacons.
5.7 Circuit Configurations of ID Beacon Detector

5.7.1 Pixel Circuit and Operation

Figure 5.21 shows a pixel circuit with an adaptive modulation amplifier and analog/digital readout circuits. An incident light generates $V_{pd}$ in logarithmic response to its intensity. The logarithmic-response photo detector helps to avoid saturation over a wide range of background illumination and to keep the reset cycle asynchronous with the ID beacons. The analog signal $V_{pd}$ for a 2-D image is read out by a source follower circuit via a column line, $value_{out}$. The log-response 2-D image is not of high quality, but it is sufficient and suitable for an AR system to recognize what kinds of objects are present in a scene with nonuniform contrast.

On the other hand, $V_{pd}$ is fed into an adaptive modulation amplifier. There the average level, $V_{avg}$, is generated and subtracted from the original $V_{pd}$ so that a low-intensity ID beacon can be detected over a wide range of background illumination. The output swing of $V_{mod}$ is amplified again by a differential amplifier that uses $V_{avg}$ as an adaptive reference voltage. In the code readout circuit with thresholding, $V_{pix}$ of a non-selected pixel is set to a low level. After a pixel is selected by $SEL2$, the voltage level of $V_{pix}$ is determined by comparison with a bias voltage $V_{bn}$. A precharged line, $code_{out}$, changes in accordance with $V_{pix}$. A column-parallel sense amplifier digitizes the ID-beacon signal of the selected pixel.
Figure 5.22 shows a timing diagram of the pixel circuits. In an AR system using active optical devices, an incident light can contain a beacon signal, $E_{sig}$, as well as background illumination, $E_{bg}$. We assume the background illumination is generally constant or low frequency below 100 Hz. When an incident light has a beacon signal, the pixel circuits amplify only the beacon signal and generate $V_{amp}$ due to adaptive constant-illumination suppression. The adaptive suppression requires the average level, $V_{avg}$, of $E_{sig} + E_{bg}$. Therefore 1-bit data of a target ID is coded using 2 cycles of carrier to keep 50% duty. That is, ‘01’ and ‘10’ represent ‘1’ and ‘0’, respectively. This coding is the same as [94] using a special image sensor [26], which detects only a positive edge of an incident level.
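
The role of the 50% duty coding can be illustrated with a small numerical sketch. It works in a linear intensity domain with hypothetical values of $E_{bg}$ and $E_{sig}$, and it ignores the logarithmic response of the actual pixel, so it only shows why the average level is independent of the transmitted data.

```python
def manchester_encode(bits):
    """Map each data bit to two carrier symbols: 1 -> (0, 1), 0 -> (1, 0)."""
    symbols = []
    for b in bits:
        symbols.extend((0, 1) if b else (1, 0))
    return symbols


def suppress_average(levels):
    """Subtract the mean level, as an idealized stand-in for V_avg subtraction."""
    avg = sum(levels) / len(levels)
    return [x - avg for x in levels]


data = [1, 1, 1, 0, 1, 0, 0, 0]          # deliberately unbalanced data bits
symbols = manchester_encode(data)
duty = sum(symbols) / len(symbols)        # stays at 0.5 for any data pattern

e_bg, e_sig = 100.0, 5.0                  # hypothetical intensities
levels = [e_bg + e_sig * s for s in symbols]
swing = suppress_average(levels)          # only the beacon swing remains
```

Because the duty ratio stays at 50%, the average level sits midway between the two beacon states, and the residual swing is symmetric around zero whatever the ID value is.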
5.7.2 Analog and Digital Readout Circuits

To utilize a high-speed ID-beacon carrier, a high-speed frame readout is required. Column-parallel dynamic logics with a sense amplifier achieve high-speed sampling and digitization of $V_{\text{pix}}$ as shown in Figure 5.23. First, an output, $\text{code}_{\text{out}}$, is set to a high level by $\text{PRE}$. Then the voltage level of $\text{code}_{\text{out}}$ is compared with $V_{\text{ref}}$ and digitized by a sense amplifier at a positive edge of $SCK$ shortly after a pixel is selected by $SEL2$. Finally the results of the digital frame readout are transferred to output buffers by $OCK$ and sent to an off-chip decoder 32 bits at a time within the next readout cycle. The readout clock reaches 200 MHz in a circuit simulation of a $128 \times 128$ prototype sensor. Supposing that the digital frame rate must be four times the carrier speed to sample asynchronous beacon data without fault, the sensor can utilize a 100 kHz ID-beacon carrier. Figure 5.24 shows a timing diagram of the digital readout.
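
The relation between the readout clock and the digital frame rate can be estimated with the following sketch. It assumes that each 128-bit row is shifted out in 32-bit words at one word per clock cycle with no other overhead; this timing model is an assumption made for illustration and is not stated in the text.

```python
def digital_frame_rate(clock_hz, rows=128, cols=128, out_width=32):
    """Frames per second if every row needs cols/out_width clock cycles (assumed)."""
    cycles_per_row = cols // out_width
    return clock_hz / (rows * cycles_per_row)


simulated = digital_frame_rate(200e6)   # 390625.0 frames/s
measured = digital_frame_rate(40e6)     # 78125.0 frames/s
carrier_at_4x = simulated / 4           # about 97.7 kHz
```

Under this assumption, a 200 MHz clock gives roughly 390,000 frames/s, which is consistent with a carrier of about 100 kHz at four samples per carrier cycle, and a 40 MHz clock gives roughly 78,000 frames/s.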

5.8 Design of \(128 \times 128\) ID Beacon Detector

5.8.1 Sensor Configuration

The present sensor consists of a pixel array with an adaptive modulation amplifier, two row-select decoders, source follower readout circuits with a column selector, column-parallel dynamic logics with a sense amplifier for digital readout, and a multiplexer with output buffers as shown in Figure 5.25. In a pixel, a low-intensity incident light from ID beacon is amplified by logarithmic-response and adaptive constant-illumination suppression to realize high-sensitivity beacon detection in wide range of background illumination. When the pixel is selected, the amplified beacon signals are digitized by a column-parallel dynamic logic with a sense amplifier and an in-pixel thresholding readout circuit. The digital readout scheme achieves high-speed beacon sampling and low-intensity beacon detection by a compact circuit implementation. In addition, the digital beacon readout operates independently of analog readout for 2-D image. A beacon decoder, an ADC for 2-D image, and a sensor controller in Figure 5.25 are implemented in an FPGA, not integrated in the present prototype sensor.

5.8.2 Chip Implementation

We designed and fabricated a smart sensor using the present pixel circuit in a 0.35 μm CMOS process. Figure 5.26 shows a microphotograph of the smart image sensor. It has a 128 × 128 pixel array with independent analog/digital readout circuits. The pixel circuit occupies 26.0 μm × 26.0 μm with a 13.4% fill factor. The pixel layout is also shown in Figure 5.26. The photo diode is formed by an n⁺-diffusion in a p-substrate. The in-pixel capacitance \( C_0 \) in Figure 5.21 is 200 fF. The parameters of the fabricated sensor are summarized in Table 5.2. The power dissipation is 682 mW at 40 MHz with a 4.2 V power supply. The pixel circuit is better suited to a high pixel resolution than the conventional special smart sensor [26] since its pixel size is about 1/4 of that in [26].
Figure 5.26 Chip microphotograph and pixel layout.

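Table 5.2 Parameters of the beacon detector.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Process</td>
<td>0.35 μm CMOS</td>
</tr>
<tr>
<td>Chip size</td>
<td></td>
</tr>
<tr>
<td># pixels</td>
<td>128 × 128 pixels</td>
</tr>
<tr>
<td>Pixel size</td>
<td>26.0 μm × 26.0 μm</td>
</tr>
<tr>
<td>Fill factor</td>
<td>13.4 %</td>
</tr>
<tr>
<td>Power Dissipation</td>
<td>682 mW (at 40 MHz, 4.2 V supply)</td>
</tr>
</tbody>
</table>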

5.9 System Setup for Augmented Reality

5.9.1 System Configuration

Figure 5.27 shows a measurement system of the fabricated sensor. It consists of the smart sensor with a lens, an external ADC, an FPGA and a host computer. The FPGA operates at 40 MHz and performs sensor control, ID decoding and data transmission. A red LED of 620 nm wavelength is used as a target ID beacon. Figure 5.28 shows measured waveforms of $V_{pd}$, $V_{mod}$ and $V_{amp}$ in Figure 5.21 when the beacon carrier speed is 40 kHz. Our pixel circuits amplify $V_{pd}$ adaptively and generate $V_{amp}$ for digital readout.
5.9.2 Beacon Protocol

Figure 5.29 shows a coding method and a packet format in the ID beacon detection system. The present ID beacon detector requires a 50% duty ratio of the beacon signal for ambient light suppression; therefore we applied Manchester encoding to the packet format. That is, a beacon source transfers ‘01’ and ‘10’ to the smart image sensor as the ID signal of ‘1’ and ‘0’, respectively. Furthermore, the smart image sensor acquires a 40 kHz beacon carrier at a sampling frequency of 80 kHz. Under this condition, the smart image sensor performs a high-speed digital frame access of 80,000 frames/s. The conventional augmented reality system using a special smart sensor [94] also uses Manchester encoding since the special smart sensor [26] detects only a rising edge of a beacon signal. Therefore, the present smart image sensor and system are also capable of asynchronous communication in the same way as [94]. In the present system, a packet consists of 4 bits for header information, 16 bits for data, and 2 bits for footer information as shown in Figure 5.29. For performance comparison, we use the same packet format and 3 packets/frame transmission as [94]. In the present system, packet transmission and scene image capture are carried out asynchronously.
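
The packet structure described above can be sketched as follows. The header and footer bit patterns are not given in the text, so `HEADER` and `FOOTER` below are hypothetical placeholders; only the field lengths (4, 16 and 2) and the ‘01’/‘10’ coding come from the description.

```python
HEADER = [1, 1, 1, 0]   # hypothetical 4-bit header pattern (placeholder)
FOOTER = [0, 0]         # hypothetical 2-bit footer pattern (placeholder)


def manchester_encode(bits):
    """1 -> (0, 1) and 0 -> (1, 0), as in the coding described in the text."""
    out = []
    for b in bits:
        out.extend((0, 1) if b else (1, 0))
    return out


def manchester_decode(symbols):
    """Inverse of manchester_encode; rejects pairs that are not '01' or '10'."""
    bits = []
    for i in range(0, len(symbols), 2):
        pair = (symbols[i], symbols[i + 1])
        if pair == (0, 1):
            bits.append(1)
        elif pair == (1, 0):
            bits.append(0)
        else:
            raise ValueError("invalid Manchester pair")
    return bits


def build_packet(id_value):
    """4-bit header + 16 coded bits carrying an 8-bit ID + 2-bit footer."""
    id_bits = [(id_value >> i) & 1 for i in range(7, -1, -1)]
    return HEADER + manchester_encode(id_bits) + FOOTER


def parse_packet(packet):
    """Recover the 8-bit ID from a packet produced by build_packet."""
    payload = packet[len(HEADER):len(packet) - len(FOOTER)]
    value = 0
    for b in manchester_decode(payload):
        value = (value << 1) | b
    return value
```

For example, `build_packet(0xA5)` yields a 22-symbol sequence, and `parse_packet` recovers `0xA5` from it.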

5.10 Measurement Results of ID Beacon Detector

5.10.1 Frame Rate with ID-Beacon Detection

The present smart image sensor has two frame rates, one for analog readout and one for digital readout, as mentioned previously. The analog frame rate is 30 fps in the measurement system, which is limited by the external ADC. It is enough for a real-time AR system. If the pixel resolution becomes higher, a high-speed ADC or a column-parallel ADC will be required to keep 30 fps of 2-D image capture. The digital frame rate should be adapted to the ID-beacon carrier. Therefore we set the frame rate to four times the carrier speed in order to sample an ID beacon without fault. In the measurement system, an ID beacon using a 40 kHz carrier was successfully sampled. We applied packet transmission to the measurement system for asynchronous ID-beacon sampling. A packet consists of a 4-bit header, 16-bit coded data and a 2-bit footer to transfer 8-bit data for an ID. In addition, a packet sequence of the ID beacon is repeated 3 times in one frame of AR images. This packet protocol is based on [94]. In this situation, the data bandwidth is 4850 bit/ID·sec, which provides 160-bit data for each target ID at 30 fps. The proposed scheme has further potential for high-speed sampling since the present speed is limited by the sensor control speed of the FPGA and the photo sensitivity of a standard digital CMOS process. Figure 5.30 shows a reproduced image with ID information from a blinking LED. It carries additional information about the target object as well as its ID number owing to the large bandwidth.
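
One way to read the reported bandwidth is the following back-of-the-envelope calculation. It assumes that each of the 22 packet bits occupies one carrier cycle and that the 3 repetitions carry no new data; this interpretation is an assumption, although it lands close to the reported 4850 bit/ID·sec and 160 bits per frame.

```python
carrier_hz = 40_000            # beacon carrier used in the measurement
packet_bits = 4 + 16 + 2       # header + coded data + footer
id_bits_per_packet = 8         # 16 coded bits carry 8 data bits
repeats_per_packet = 3         # each packet is sent 3 times
frame_rate = 30                # AR image frames per second

packets_per_second = carrier_hz / packet_bits
bits_per_second = packets_per_second * id_bits_per_packet / repeats_per_packet
bits_per_frame = bits_per_second / frame_rate
```

With these assumptions, `bits_per_second` is about 4848 and `bits_per_frame` is about 162.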

Figure 5.30 Reproduced image with ID information.
5.10.2 Sensitivity and Dynamic Range

Figure 5.31 shows the sensitivity and dynamic range of ID-beacon detection. The pixel circuit can detect a low-intensity incident swing of ID beacon in wide range of background illumination. The minimum detectable intensity of ID beacon is measured using TEGs of a pixel circuit. To evaluate the sensitivity of the photo detection, the ID-beacon intensity and the background intensity are normalized by the photo current, \(I_{pd}\), of each incident light. The illuminance corresponding to the background photo current is shown in Figure 5.31 (in the upper axis) for reference. We define \(10 \log \frac{E_{sig}}{E_{bg}}\) as SBR (Signal-to-Background Ratio), which stands for the sensitivity of beacon detection. High sensitivity below -10.0 dB SBR is achieved in wide range of 40 dB background illumination.
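
The SBR definition can be written as a short helper. The sketch assumes a base-10 logarithm, as is usual for decibel quantities, and the function name is illustrative.

```python
import math


def sbr_db(e_sig, e_bg):
    """Signal-to-background ratio in dB: 10 * log10(E_sig / E_bg)."""
    if e_sig <= 0 or e_bg <= 0:
        raise ValueError("intensities must be positive")
    return 10.0 * math.log10(e_sig / e_bg)


# A beacon swing one tenth as strong as the background corresponds to -10 dB.
example = sbr_db(1.0, 10.0)
```

An SBR below -10 dB therefore means that the detectable beacon swing is less than one tenth of the background level.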

5.10.3 Performance Comparison

The performance comparison is summarized in Table 5.3. The AR system using a 30-fps CCD imager provides 0.2 AR images/s with 16 IDs/frame [93]. Even the state-of-the-art high-speed CMOS imager [29], which achieves 10k fps imaging, utilizes only a 2.5 kHz beacon carrier. The AR system [94] using a special image sensor [26] allows a 4 kHz beacon carrier. It, however, can recognize only 8-bit IDs/frame at 15 fps. The present smart sensor utilizes a 40 kHz carrier and recognizes 160-bit IDs/frame at 30 fps in the same situation. The large bandwidth makes it possible to attach additional, meaningful information from the target objects to an AR image.

### 5.11 Summary

We have presented a pixel-level color image sensor with efficient ambient light suppression. Bidirectional photocurrent integrators realize pixel-level demodulation of a modulated RGB flashlight while suppressing ambient light at short intervals during an exposure period. Therefore, the sensor avoids saturation from ambient illumination and is applicable to non-ideal illumination conditions. Every pixel provides color information without the false color and intensity loss caused by color filters. We have demonstrated the efficient ambient light suppression and the pixel-level color imaging using a 64 × 64 prototype image sensor. Moreover, TOF range finding with ±15 cm range accuracy has been performed to show the feasibility of depth-key object extraction. The measurement results show that the present sensing scheme and circuit implementation support innate color capture and object extraction for image recognition in various measurement situations.

Furthermore, we have presented a low-intensity beacon detector for augmented reality systems. A 128 × 128 prototype beacon detector achieves 30-fps scene capture, 4850 bit/ID·sec using a 40 kHz carrier, and a signal-to-background ratio (SBR) of less than -10.0 dB over more than 40 dB of background illumination for a high-speed and robust AR system with active optical devices. It acquires a scene image together with the locations, IDs and additional information of multiple target objects simultaneously in real time. These features realize a robust augmented reality system in various scene conditions.

---

(1) The sensor is fabricated using a 0.18 µm process.
(2) The sensors are fabricated using a 0.35 µm process.
Chapter 6

Digital Associative Engine for Hamming Distance Search

6.1 Introduction

This chapter proposes a high-speed digital associative engine based on Hamming distance. An associative engine efficiently realizes data compression, pattern recognition, multi-media and intelligent processing, which require huge amounts of memory access and data processing time. Content addressable memories (CAMs) have been developed to reduce these costs, as reported in [62]–[66]; however, they are capable of detecting only complete match data. Therefore, some associative memories have been proposed for quick nearest match detection [67]–[72]. These associative memories employ analog circuit techniques and attain quick nearest match detection with compact circuit implementations. On the other hand, they have difficulty operating with faultless precision in a deep sub-micron (DSM) process and at a low supply voltage. Moreover, the feasible capacity is limited by the analog operation. Therefore, they are not suitable for a large data capacity or for a system-on-chip VLSI in DSM process technologies. An associative engine is also efficient for high-speed 3-D data processing; thus a high-speed and scalable associative engine is desired for 3-D image capture.

The proposed associative engine has three principal advantages as follows.

1. The first advantage is high-speed search in a large database due to a hierarchical search architecture. The search time of our method is limited by $O(\sqrt{N})$ or $O(\log M)$ at $N$-bit $M$-word data capacity. In addition, it has no limitation of the number of data patterns $M$, the bit length $N$ and the search distance theoretically.

2. The second advantage is a capability of a low-voltage operation in DSM. The circuit implementation has tolerance for device fluctuations in DSM and allows a low-voltage operation under 1.0 V, which is difficult for the conventional analog approaches.

3. The third advantage is additional functions for associative processing. The synchronous search logic embedded in a memory cell provides data addresses with the exact Hamming or Manhattan distance in order of the distance. Therefore it realizes high-speed data sorting in addition to nearest match detection for the conventional use. We have designed a 64-bit 32-word associative engine using a 1P5M 0.18 µm CMOS process and successfully demonstrated the high-speed distance estimation and the low-voltage operation with faultless precision.

Section 6.2 introduces a concept of the proposed digital associative computation. Section 6.3 proposes circuit configurations and operations of the digital associative engine. Section 6.4 shows design of the digital associative engine with 64 bit × 32 word memories. In Section 6.5, measurement results and potential capability are discussed. Finally, Section 6.6 summarizes this chapter.

6.2 Concept of Digital Hamming Distance Search

6.2.1 Basic Search Operation

We propose a logic-in-memory architecture using a search signal propagation via chained search circuits in word parallel. Figure 6.1 shows the basic operation of Hamming-distance (HD) estimation without hierarchical search. The operation includes a data comparison, a search signal propagation and a mismatch masking. First the input data string (Data A) is compared with each template data (Data B) using an XOR gate in bit parallel. In Figure 6.1, a match/mismatch bit provides 1/0 as the XOR result, respectively. Then search signals (SS) are injected to each LSB of the template data. A search circuit embedded in a memory cell lets the search signal pass through a match bit and stops it at the first-encountered mismatch bit. Therefore the complete match data (i.e. HD = 0) are detected in the first clock period (clock period 0), when the search signal emerges from the MSB. In the next clock period, the first-encountered mismatch bit is masked simultaneously in each word and the search signals restart propagating to the next mismatch bit. Thus, the data of HD = 1 are detected. In this manner, the data of HD = n are detected in clock period n as shown in Figure 6.1. The search operation can detect not only the nearest match data but also all data in the sorted order of Hamming distance in synchronization with the clock cycle.
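
The basic search can be modeled behaviorally as below. This is a software sketch of the word-parallel operation rather than the circuit itself: words are held as Python integers, a set bit in the XOR result marks a mismatch, and masking the first-encountered mismatch is modeled by clearing the lowest set bit.

```python
def hamming_search(query, templates, width):
    """Return (clock_period, word_index) pairs in order of Hamming distance."""
    # XOR of query and template: each 1 marks a mismatch bit in that word.
    mismatch = [query ^ t for t in templates]
    detected = []
    for period in range(width + 1):
        for idx, m in enumerate(mismatch):
            if m == 0:
                # The search signal passed every bit: word detected this period.
                detected.append((period, idx))
                mismatch[idx] = None
        # Next clock: mask the first-encountered mismatch bit in every word.
        mismatch = [None if m is None else m & (m - 1) for m in mismatch]
        if all(m is None for m in mismatch):
            break
    return detected


templates = [0b1010, 0b1111, 0b0000, 0b1011]
result = hamming_search(0b1010, templates, width=4)
```

In this example the words are reported in clock periods 0, 1, 2 and 2, which equal their Hamming distances from the query.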

### 6.2.2 Word-Parallel and Hierarchical Search Structure

The basic search time is limited by the search signal propagation via chained search circuits. Thus it is linearly-related to the data length due to a ripple-mode search structure. Figure 6.2 shows a hierarchical structure of the search signal propagation for high-speed Hamming-distance estimation in a large input number. The template data are divided into some blocks. Search signals (SS) are injected to all blocks simultaneously. The search path is connected to a hierarchical node (HN), which provides permission signals (PS) to the next block and hierarchical node. The permission signal makes a mismatch bit maskable.

Figure 6.3 shows an operation diagram of the word-parallel and hierarchical search in a case of HD = 2. At the first clock period, the search signals injected to all blocks start propagating through match bits in each block in the same way as in the basic operation. Some propagations are interrupted at the first-encountered mismatch bit in each block. The others pass to the hierarchical nodes, and update the permission signals for the next block and hierarchical node as shown by the clock period 0 in Figure 6.3. In this period, the data of HD = 0 are detected since the search signal has no interruption and it is provided from the last hierarchical node. At the next clock period, only one mismatch bit in each word is masked: the bit that interrupts the search signal propagation and receives a permission signal from the previous hierarchical node. The search signal restarts from the masked bit and updates the permission signals again. Note that the Hamming distance of the data is represented by the number of operated clock cycles at the time the search signal is detected from the last hierarchical node. For example, the data of \( \text{HD} = 2 \) are detected in the clock period 2 as shown in Figure 6.3. In the present architecture, the critical path consists of the search signal path of one block and the hierarchical bypass line. The search time has characteristics similar to those of a carry-bypass adder, so the architecture is applicable to a large database.

Figure 6.2 Hierarchical structure: (a) search signal path, (b) permission signal path.

Figure 6.3 Operation diagram of hierarchical search.
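
The carry-bypass-like behavior of the critical path can be illustrated with a simplified delay model. The model assumes a single hierarchy level, equal block lengths and unit delays per cell and per hierarchical node; these are illustrative assumptions and do not reproduce the optimized block partition of the fabricated chip.

```python
import math


def critical_path(n_bits, block_bits, t_cell=1.0, t_node=1.0):
    """Delay of one block's search path plus the bypass line over all blocks."""
    n_blocks = math.ceil(n_bits / block_bits)
    return block_bits * t_cell + n_blocks * t_node


def best_block_size(n_bits, t_cell=1.0, t_node=1.0):
    """Block length that minimizes the modeled critical path."""
    return min(range(1, n_bits + 1),
               key=lambda b: critical_path(n_bits, b, t_cell, t_node))


ripple = critical_path(64, 64)      # one block: essentially a ripple search
optimum = best_block_size(64)       # near sqrt(64) in this model
```

For 64-bit data, the model is minimized at a block length of 8 with a delay of 16 units, compared with 65 units for a single ripple block, which matches the $O(\sqrt{N})$ trend.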

### 6.2.3 Manhattan-Distance Evaluation Using Thermometer Encoding

All associative memories with Hamming-distance estimation can deal with Manhattan-distance estimation using thermometer encoding as reported in [71]. Figure 6.4 shows an example of the thermometer encoding. A 3-bit binary code can be translated to a 7-bit thermometer code. In general, \( k \)-bit binary data are translated to \( 2^k - 1 \)-bit data using the thermometer encoding. The present architecture also estimates the Manhattan distance between Data A and Data B in the same way as the Hamming-distance estimation, as shown in Figure 6.4. Hardware reusability across a wide variety of applications is important for an associative engine. The thermometer encoding needs a larger data capacity than the normal binary encoding. However, fully parallel Manhattan-distance estimation using the normal binary encoding requires complicated circuits in a memory cell for absolute difference calculation. Therefore the required hardware area using the present architecture and the thermometer encoding can be smaller in many practical cases.

<table>
<thead>
<tr>
<th>bin.</th>
<th>thermo code</th>
<th>MD = 3</th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>0000000</td>
<td>Data A (6): 0111111</td>
</tr>
<tr>
<td>001</td>
<td>0000001</td>
<td>Data B (3): 0000111</td>
</tr>
<tr>
<td>010</td>
<td>0000011</td>
<td>A ⊕ B: 0111000</td>
</tr>
<tr>
<td>011</td>
<td>0000111</td>
<td>Detected at clock period 3</td>
</tr>
<tr>
<td>100</td>
<td>0001111</td>
<td>↓</td>
</tr>
<tr>
<td>101</td>
<td>0011111</td>
<td>↓</td>
</tr>
<tr>
<td>110</td>
<td>0111111</td>
<td></td>
</tr>
<tr>
<td>111</td>
<td>1111111</td>
<td>Estimation result = 3</td>
</tr>
</tbody>
</table>

**Figure 6.4** Manhattan-distance estimation using thermometer encoding.
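
The equivalence shown in Figure 6.4 can be checked with a short sketch. The helper names are illustrative; codes are represented as bit strings for readability.

```python
def thermometer(value, bits):
    """Translate a k-bit value into a (2**k - 1)-bit thermometer code string."""
    length = (1 << bits) - 1
    if not 0 <= value <= length:
        raise ValueError("value out of range for the given bit width")
    return "0" * (length - value) + "1" * value


def hamming(a, b):
    """Number of positions at which two equal-length code strings differ."""
    return sum(x != y for x, y in zip(a, b))


data_a = thermometer(6, 3)           # "0111111"
data_b = thermometer(3, 3)           # "0000111"
distance = hamming(data_a, data_b)   # equals |6 - 3|
```

For multi-component data, concatenating the thermometer codes of all components makes the Hamming distance of the concatenated words equal to the sum of the absolute differences, which is the Manhattan distance.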
6.3 Circuit Configuration

6.3.1 Logic-in-Memory Search Circuit

Figure 6.5 and Figure 6.6 show a schematic and a timing diagram of the associative memory cell implemented by static circuits. It is composed of an SRAM cell, an XOR/XNOR circuit for comparison with the input data, and a search circuit for signal propagation and masking. Even-numbered and odd-numbered search circuits are complementary in order to reduce the critical path and the circuit area. All search paths are swept by setting the search signal (SS) to 0. Then all mask registers are initialized before the search operation starts. In a match bit, the search signal passes to the next bit since the comparison result (M) is true. In a mismatch bit, the search signal stops and waits for the next clock (φ). A false result of M is masked by the next clock, and the search signal restarts from the masked cell where both the search signal (SS) and the permission signal (PS) are true. Therefore all data are detected in order of Hamming or Manhattan distance (D) in word parallel as shown in Figure 6.6. In the circuit implementation, a permission signal is also used as a search signal from a hierarchical node to the next hierarchical node. The static circuit implementation realizes a low-voltage operation and high tolerance for device fluctuations, though it occupies a large circuit area.
Figure 6.6 Timing diagram of search circuit.

Figure 6.7 shows another implementation using dynamic circuits to save search circuit area for a large capacity. All search circuits are precharged by $\phi_1$ before the search operation. A mismatch bit is masked by $\phi_2$ in the same way as in the static circuit implementation. The dynamic circuit implementation realizes a small cell area and a large data capacity; however, it has less tolerance for power supply noise, cross-talk noise and leakage current, especially in a low-voltage operation. Therefore the static circuit implementation is better for SoC applications in DSM process technologies if the area constraint is satisfied.

### 6.3.2 Priority Address Encoder

Figure 6.8 (a) shows a detected data selector, which masks a search output in order to acquire another data of the same distance continuously. The detected data address is acquired by the next priority encoder stage as shown in Figure 6.8 (b). It consists of a priority decision circuit and an address encoder. The detected data selector masks a search output (SO) by the priority decision output (PO). The binary-tree priority encoder realizes a small area and quick address encoding with $O(\log M)$ delay time for $M$-word capacity.
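
The interplay of the binary-tree priority encoder and the detected data selector can be sketched behaviorally as follows. The sketch assumes that the lowest word address has the highest priority and that the number of inputs is a power of two; neither the priority order nor the function names are taken from the schematics.

```python
def priority_encode(so):
    """Binary-tree priority encoder: returns (valid, address) of the first 1."""
    nodes = [(bit == 1, 0) for bit in so]   # leaves: (valid, offset in subtree)
    span = 1                                # inputs covered by each child node
    while len(nodes) > 1:                   # log2(M) merge levels
        merged = []
        for i in range(0, len(nodes), 2):
            (lv, la), (rv, ra) = nodes[i], nodes[i + 1]
            if lv:
                merged.append((True, la))           # left child wins
            elif rv:
                merged.append((True, span + ra))    # right child, offset by span
            else:
                merged.append((False, 0))
        nodes = merged
        span *= 2
    return nodes[0]


def read_all_detected(so):
    """Mask each reported word in turn to list all words of the same distance."""
    so = list(so)
    addresses = []
    while True:
        valid, addr = priority_encode(so)
        if not valid:
            return addresses
        addresses.append(addr)
        so[addr] = 0        # detected data selector masks the reported output


search_outputs = [0] * 32
search_outputs[5] = search_outputs[12] = 1
```

Here `priority_encode(search_outputs)` reports word 5 first, and `read_all_detected(search_outputs)` then yields words 5 and 12 in turn.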
Figure 6.7 Dynamic circuit implementation of the associative memory cell: (a) odd-numbered cell, (b) even-numbered cell.

Figure 6.8 Schematics of: (a) detected data selector, (b) binary-tree priority encoder.
6.4 Chip Implementation

We have designed and fabricated a 64-bit 32-word associative engine using the present architecture and the static circuit implementation in a 1P5M 0.18 μm CMOS process. Figure 6.9 illustrates a block diagram of the associative engine and Figure 6.10 shows the chip microphotograph. The associative engine is composed of a 64-bit 32-word associative memory array, a memory read/write circuit with data buffers, a word address decoder, and a 32-input priority encoder with detected data selectors. A two-stage hierarchical structure is implemented as shown in Figure 6.9 (b). A hierarchical node is realized by a 2-input AND gate. In the two-stage hierarchical structure, the number of hierarchical nodes on each propagation path is different. Therefore the number of blocks and the bit length of each block need to be optimized for the minimum critical path. We have also designed a 64-bit 2-word associative memory using the dynamic circuit implementation for feasibility and performance evaluation.

### 6.5 Measurement Results and Discussions

#### 6.5.1 Function Tests

Figure 6.11 shows functional test results of Hamming-distance estimation using the fabricated associative engine. 64-word temporary data are randomly generated and stored in the memories. The search circuits provide the first output signal in clock period 23; that is, the detected data has a Hamming distance of 23 bits from the input data. The search operation is then interrupted and the detected output is masked in order to acquire other data of the same distance. The search operation resumes when no data of the same distance remain. Therefore, the associative engine can provide multiple data in the same search clock period. For example, two data of $HD = 24$ are detected as shown by clock period 24 in Figure 6.11.

The associative engine is also capable of Manhattan-distance estimation using thermometer encoding in the same way, as shown in Figure 6.12. A 3-bit binary code is encoded to a 7-bit thermometer code. Each word has nine 7-bit thermometer codes (i.e. 63-bit data). In the functional test of Manhattan-distance estimation, the nearest match is detected first: in clock period 8, the 12th word, with a Manhattan distance of 8, is detected as the nearest match. The 2nd and 3rd nearest data are then detected in order.
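The reduction of Manhattan distance to Hamming distance through thermometer encoding can be checked with a short sketch. The Python below is illustrative only: the function names and the sample words are assumptions and do not come from the fabricated chip. It encodes each 3-bit element as a 7-bit thermometer code and compares the Hamming distance of the encoded words with the Manhattan distance of the original elements.

```python
def thermometer(value: int, bits: int = 3) -> list[int]:
    """Encode a `bits`-bit value as a (2**bits - 1)-bit thermometer code."""
    length = (1 << bits) - 1
    if not 0 <= value <= length:
        raise ValueError("value out of range")
    # The lowest `value` positions are 1, the remaining positions are 0.
    return [1 if k < value else 0 for k in range(length)]


def encode_word(elements: list[int], bits: int = 3) -> list[int]:
    """Concatenate the thermometer codes of all elements of one word."""
    word = []
    for element in elements:
        word.extend(thermometer(element, bits))
    return word


def hamming(a: list[int], b: list[int]) -> int:
    """Number of bit positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def manhattan(a: list[int], b: list[int]) -> int:
    """Sum of absolute element-wise differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical words of nine 3-bit elements (63 bits after encoding).
query = [0, 7, 3, 4, 1, 6, 2, 5, 3]
stored = [1, 5, 3, 7, 0, 6, 4, 5, 2]

hd = hamming(encode_word(query), encode_word(stored))
md = manhattan(query, stored)
print(hd, md)  # both values are 10 for these sample words
```

Two thermometer codes differ exactly in the positions between the two encoded values, which is why the per-element Hamming distance equals the absolute difference.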

The present associative engine provides not only the detected data address but also the Hamming or Manhattan distance. Moreover the distance estimation is strictly exact regardless of the bit length, the number of words, and the distance between each data. These features are important for high scalability of data capacity and high reliability for distance estimation, which has not been achieved by the conventional fully parallel architectures based on analog techniques [67]–[72].

#### 6.5.2 Area and Capacity

The designed 64-bit 32-word associative engine occupies 475 $\mu$m $\times$ 1160 $\mu$m (0.55 mm$^2$). The area of a memory macro cell with a static search circuit is 9.6 $\mu$m $\times$ 13.6 $\mu$m (130.56 $\mu$m$^2$) as shown in Figure 6.13 (a). In the static circuit implementation using a 0.18 $\mu$m process, the cell area is 6 times and 3 times as large as a 6T SRAM cell and a complete-match CAM cell, respectively. Figure 6.13 (b) shows a layout of the dynamic circuit implementation. It occupies 7.2 $\mu$m $\times$ 8.8 $\mu$m (63.36 $\mu$m$^2$). In this case, the cell area is 3 times and 2 times as large as a 6T SRAM cell and a complete-match CAM cell, respectively. The number of transistors in the present memory cell is larger than in the conventional analog approaches [67]–[72]. It is, however, difficult for the analog approaches to follow device scaling, especially in a DSM process, while keeping high performance and marginal capacity. The present approach can follow device scaling and operate at a low supply voltage because of the synchronous digital search logic embedded in the memories. Besides, it has no limitation of capacity or search distance. Therefore, the associative engine has more potential for practical use and large capacity than the conventional designs.

#### 6.5.3 Operation Speed

Figure 6.14 shows measured waveforms using an electron beam probe at room temperature. It shows a delay time of the critical path from the search clock (CLK) to a search output (SOi). The delay time for distance search in 64-bit data length is 2.18 ns in the worst case. The operation speed of the fabricated associative engine is 411.5 MHz and 40.0 MHz at 1.8V and 0.75V, respectively. Figure 6.15 shows measurement results of the operation speed in a 0.75V-to-1.8V power supply. The search time depends on the distance in a case that the target application requires only the nearest match data. For example, the nearest match detection is completed in 41.3 ns at 16-bit Hamming distance. The operation speed is higher than the conventional analog approaches. The worst-case operation requires 65 clock periods in a case that the nearest match data has the maximum distance of 64 bit. Therefore it takes 158.0 ns in the worst case.
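The search times quoted above follow from the measured clock frequency, because a word at distance $D$ is detected after $D + 1$ clock periods. The following Python sketch reproduces that arithmetic; the function and constant names are illustrative assumptions.

```python
def search_time_ns(distance_bits: int, clock_mhz: float) -> float:
    """Time needed to detect a word at the given Hamming distance.

    One clock period is spent for each distance value from 0 up to
    `distance_bits`, i.e. distance_bits + 1 periods in total.
    """
    period_ns = 1e3 / clock_mhz
    return (distance_bits + 1) * period_ns


F_MEASURED_MHZ = 411.5  # measured operation speed at 1.8 V

nearest_16 = search_time_ns(16, F_MEASURED_MHZ)  # 17 clock periods
worst_case = search_time_ns(64, F_MEASURED_MHZ)  # 65 clock periods
print(f"{nearest_16:.1f} ns, {worst_case:.1f} ns")  # 41.3 ns, 158.0 ns
```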

Figure 6.16 shows the relation between a search cycle time and data capacity. The search time is limited by the search signal propagation or the priority encoding. The search signal propagation takes $O(\sqrt{N})$ at $N$-bit length due to a two-stage hierarchical structure. On the other hand, the priority encoding takes $O(\log M)$ at $M$-word length due to a binary-tree structure. Therefore the present architecture keeps a high speed operation in a large database as shown in Figure 6.16. The distance estimation has no limitation of data capacity as mentioned above.

Figure 6.15  Operation frequency and power supply voltage.

Figure 6.16  Cycle time and data capacity.

Table 6.1 Specifications of the digital associative engine.

<table>
<thead>
<tr>
<th>Specification</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Process</td>
<td>1P5M 0.18 μm CMOS process</td>
</tr>
<tr>
<td>Power Voltage Supply</td>
<td>0.7 V – 1.8 V</td>
</tr>
<tr>
<td>Organization</td>
<td>64 bit × 32 word memory cells</td>
</tr>
<tr>
<td></td>
<td>32-input priority encoder</td>
</tr>
<tr>
<td>Functions</td>
<td>Nearest match detection</td>
</tr>
<tr>
<td></td>
<td>Distance ordering</td>
</tr>
<tr>
<td>Module Size</td>
<td>475 μm × 1160 μm (0.55 mm²)</td>
</tr>
<tr>
<td>Num. of Transistors</td>
<td>88.5k transistors</td>
</tr>
<tr>
<td>Memory Cell Size</td>
<td>9.6 μm × 13.6 μm (130.56 μm²)</td>
</tr>
<tr>
<td></td>
<td>7.2 μm × 8.8 μm (63.36 μm²) (1)</td>
</tr>
<tr>
<td>Search Time Order</td>
<td>O(√N) (@ N-bit capacity)</td>
</tr>
<tr>
<td>Encoding Time Order</td>
<td>O(log M) (@ M-word capacity)</td>
</tr>
<tr>
<td>Operation Speed</td>
<td>411.5 MHz (@ 1.8V, measured)</td>
</tr>
<tr>
<td></td>
<td>454.5 MHz (@ 1.8V, simulated)</td>
</tr>
<tr>
<td></td>
<td>40.0 MHz (@ 0.75V, measured)</td>
</tr>
<tr>
<td></td>
<td>41.4 MHz (@ 0.75V, simulated)</td>
</tr>
<tr>
<td>Worst-Case Search Time</td>
<td>158.0 ns (0-bit to 64-bit distance)</td>
</tr>
<tr>
<td>Power Dissipation</td>
<td>51.3 mW (@ 1.8V, 400MHz)</td>
</tr>
<tr>
<td></td>
<td>1.18 mW (@ 0.75V, 40MHz)</td>
</tr>
</tbody>
</table>

#### 6.5.4 Power Dissipation

The power dissipation of the associative engine is less than 51.3 mW at a supply voltage of 1.8 V and an operation speed of 400 MHz. In low-voltage operation, it is 1.18 mW at a supply voltage of 0.75 V and an operation speed of 40 MHz. In the conventional analog approaches, the search accuracy becomes unstable in low-voltage operation, and the results are sometimes meaningless. The present search results are strictly exact regardless of the power supply voltage. The specifications of the digital associative engine are summarized in Table 6.1.

### 6.6 Summary

We have proposed a new concept and circuit implementation for a high-speed and low-voltage associative engine with exact Hamming distance search. It has no limitation of data capacity and keeps a high-speed operation in a large database due to a hierarchical search architecture and a synchronous search logic embedded in a memory cell. The circuit implementation realizes high tolerance for device fluctuations in DSM process technologies and a low-voltage operation under 1.0 V. The associative engine provides the exact distance of the detected data, so it is capable of data sorting in order of Hamming distance as well as traditional nearest match detection. A 64-bit 32-word associative engine has been designed using a 1P5M 0.18 $\mu$m CMOS process and successfully tested. It achieves an operation speed of 411.5 MHz at a supply voltage of 1.8 V, and also attains a low-voltage operation of 40 MHz at a supply voltage of 0.75 V.

(1) In Table 6.1: designed using the dynamic circuit implementation as shown in Figure 6.7.
Chapter 7

Scalable Multi-Chip Architecture Using Digital Associative Engines

### 7.1 Introduction

This chapter proposes a scalable multi-chip architecture using the digital associative engine presented in Chapter 6. High capacity scalability is important for associative memories since the required database capacity depends on the application. As with standard memories, a multi-chip structure is the most efficient way to scale capacity. In complete match detection such as [62]–[66], all the detected data are correct results because they are exactly the same as the input. Therefore, the complete match data can be compiled without additional comparison among the detected data even in a multi-chip structure. On the other hand, in the conventional nearest match associative memories [67]–[72], each module provides just the local nearest data since the search operation in each module is executed independently of the others. Thus, global nearest detection requires additional memory access and distance calculation because the exact Hamming distance is not provided by the local nearest match detection. Furthermore, it requires an inter-chip distance comparison among all the local nearest data. These features make it difficult for [67]–[72] to attain high capacity scalability with a multi-chip structure. Digital implementations have the potential for capacity scalability with a multi-chip structure. An 8-chip structure with extra winner-take-all (WTA) processors is reported in [74]. It requires extra 4th, 5th and more pipelined WTA processors on each chip in order to build up a larger database capacity. On the other hand, a fully word-parallel architecture, such as [95] and the associative engine proposed in Chapter 6, is more efficient for high-speed associative processing than [74].

The proposed scalable multi-chip architecture employs the proposed fully word-parallel associative memories, and achieves a high capacity scalability. It is simply realized by extra

Section 7.2 reviews the basic architecture of the fully word-parallel associative engine, and presents a concept of the scalable multi-chip architecture. Section 7.3 shows circuit configurations and operations. Section 7.4 describes a module generator for various capacities to extend the capacity scalability in the design phase. Section 7.5 discusses performance evaluation based on post-layout simulations. Finally, Section 7.6 summarizes this chapter.

### 7.2 Concept of Scalable Multi-Chip Architecture

#### 7.2.1 Performance Characteristics of Digital Associative Engine

The digital associative engine presented in Chapter 6 searches for the nearest match data in word parallel as shown in Figure 7.1. First, the input ($D_{in}$) is compared with all template data ($D_0, D_1, \ldots, D_M$) by using an XOR/XNOR circuit embedded in a memory cell. Next, the number of mismatch bits is counted by a search signal propagation via hierarchically chained search circuits in word parallel. The search circuit is also embedded in a memory cell and controls the search signal propagation based on the comparison results $(D_{in} \oplus D_M)$. A mismatch bit is masked in every word, and then the next mismatch bit is detected by a search signal propagation during a search clock period. The mask and search operations are carried out during a search clock period regardless of where a mismatch bit exists. Therefore the nearest match data are detected faster than the others, and the 2nd and 3rd nearest data are also detected in order of the distance. The associative processing architecture is capable of exact Hamming-distance search for all the template data in the distance order. Finally, the detected address is provided by a priority address encoder.
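The mask-and-search operation can be illustrated with a behavioural model. The Python sketch below is an assumption-laden simplification, not the circuit itself: words are plain integers, the priority encoder is assumed to report the lowest address first, and the interruption of the search clock while several words of the same distance are read out is not modelled. It shows that a word at Hamming distance $D$ is detected in clock period $D$, so the words appear in distance order.

```python
def distance_ordered_search(query: int, templates: list[int], bits: int):
    """Behavioural model of the word-parallel mask-and-search operation.

    Returns (clock_period, word_address) pairs in detection order.
    """
    width_mask = (1 << bits) - 1
    # XOR marks the mismatch bits of every word (word parallel on chip).
    mismatch = [(query ^ t) & width_mask for t in templates]
    pending = set(range(len(templates)))
    results = []
    clock = 0
    while pending:
        # A word whose mismatch bits are all masked lets the search signal
        # reach the end of the chain, so it is detected in this period.
        hits = sorted(a for a in pending if mismatch[a] == 0)
        for addr in hits:  # assumed priority: lowest address first
            results.append((clock, addr))
        pending.difference_update(hits)
        # Every remaining word masks exactly one mismatch bit per clock.
        for addr in pending:
            mismatch[addr] &= mismatch[addr] - 1  # clear the lowest set bit
        clock += 1
    return results


# Hypothetical 8-bit example.
query = 0xB2
templates = [0x4D, 0x93, 0xB0, 0xB2, 0xB3]
print(distance_ordered_search(query, templates, bits=8))
# [(0, 3), (1, 2), (1, 4), (2, 1), (8, 0)]
```

In this model the clock period of each detected word equals its exact Hamming distance, which matches the property that the engine reports the distance along with the address.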

The search cycle time is linearly proportional to the bit length in a serial search path structure. It becomes a bottleneck of the associative processing, hence a hierarchical search structure is implemented for the search signal paths as presented in Section 6.2. The search cycle time is limited by $O(\sqrt{N})$ at an $N$-bit length database due to the two-stage hierarchical structure. Search results are transferred to a priority address encoder to acquire the address output during the next search operation. The priority address encoder is implemented using a binary-tree structure, hence the address encoding time is limited by $O(\log M)$ at an $M$-word database. The search cycle time ($T_c$), which determines the search throughput, is given by

$$T_c = \max(T_1, T_2), \quad (7.1)$$

$$T_1 \propto O(\sqrt{N}), \quad (7.2)$$

$$T_2 \propto O(\log M), \quad (7.3)$$

where $T_1$ is a search propagation time and $T_2$ is a priority address encoding time. $N$ and $M$ are the bit length and the number of words, respectively. The total search time ($T_s$) is given by

$$T_s = T_c \times (D + 1), \quad (7.4)$$

where $D$ is Hamming distance between the input and the detected data.
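The $O(\sqrt{N})$ behaviour of the two-stage hierarchy can be made plausible with a simplified delay model. The sketch below assumes, purely for illustration, that an $N$-bit word is split into equal blocks and that the worst search path costs one unit delay per cell of a single block plus one unit delay per hierarchical node; the actual design uses blocks of different lengths and characterized cell delays, so this is not the author's optimization.

```python
def path_length(n_bits: int, n_blocks: int) -> float:
    """Worst search path in unit delays under the simplified model:
    one block chain (n_bits / n_blocks cells) plus n_blocks nodes."""
    return n_bits / n_blocks + n_blocks


def best_block_count(n_bits: int) -> int:
    """Block count that minimises the modelled worst path."""
    return min(range(1, n_bits + 1), key=lambda b: path_length(n_bits, b))


for n in (64, 256, 1024):
    b = best_block_count(n)
    # The minimum lies at b = sqrt(n), giving a path of 2 * sqrt(n).
    print(n, b, path_length(n, b))
```

Under this model the best partition uses about $\sqrt{N}$ blocks, and the worst path grows as $2\sqrt{N}$ instead of $N$ for a purely serial chain.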

#### 7.2.2 Multi-Chip Structures

Figure 7.2 shows possible multi-chip structures of the present associative memory. Figure 7.2 (a) shows a bus structure with a scan controller, which has high capacity scalability and flexibility. It is, however, difficult to attain a high-speed search operation since the scan controller sequentially searches all the chips for a detected address during a search clock period. Figure 7.2 (b) shows a star structure with a winner-take-all (WTA) processor. The WTA processor simultaneously collects all the detected addresses. It is capable of acquiring a detected address during a search clock period. On the other hand, it requires a special WTA processor according to the number of chips. The address signal wires increase in proportion to $O(P \times \log P)$ in the case of a $P$-chip structure, and all the output signals concentrate on the same WTA processor chip. This becomes a potential problem for capacity scalability and flexibility.

Figure 7.2 Possible multi-chip structures: (a) a bus structure with a scan controller, (b) a star structure with a WTA processor, (c) the present hierarchical structure.

We propose a hierarchical structure using an inter-chip pipelined priority decision (PPD) circuit as shown in Figure 7.2 (c). In the present architecture, the associative memory chips interact with each other using a completion signal (\(D_{cmp}\)) via a hierarchical PPD node embedded in a chip. A completion signal \(D_{cmp}\) represents whether any data are detected in a chip or not, which is provided by intra-chip priority decision results \(PO_m\). The inter-chip PPD circuit determines whether any chip contains a detected address and which chip is given priority for providing a search result. Therefore, a search result can be autonomously provided from the associative memory chip with priority. A long signal wire between chips limits the search operation speed. The present multi-chip structure, however, realizes a two-dimensional chip array with a tree network by short signal wires as shown in Figure 7.3, since a chip is adjacently connected by peer-to-peer interaction with four chips at a maximum. Therefore, it requires short signal wires of \(O(P)\) for an inter-chip PPD circuit and output bus wires of \(O(\log P)\). The present multi-chip architecture enables fully chip- and word-parallel Hamming distance search with no throughput decrease, additional clock latency of \(O(\log P)\), and inter-chip wires of \(O(P)\) for a configuration of \(P\) chips. Table 7.1 shows a comparison among the multi-chip structures at a capacity of 256 bit \(\times\) 256 word per chip. In the comparison, CAM chips are placed in a two-dimensional array, and they are connected by straight wires as shown in Figure 7.3. The wire length is normalized by the pitch of the chip array. In a star structure, we assume that an additional WTA host processor compiles all the detected addresses from CAM chips and searches them in a single search clock.

Figure 7.3 Examples of inter-chip wiring in a multi-chip structure: (a) a star structure, (b) the present hierarchical structure.

Table 7.1 Comparison among multi-chip structures.

<table>
<thead>
<tr>
<th></th>
<th>bus structure</th>
<th>star structure</th>
<th>hierarchical structure</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>16 chips</td>
<td>64 chips</td>
<td>16 chips</td>
</tr>
<tr>
<td>Num. of wires</td>
<td>12</td>
<td>16</td>
<td>208</td>
</tr>
<tr>
<td>Total wire length</td>
<td>180.0</td>
<td>1008.0</td>
<td>311.5</td>
</tr>
<tr>
<td>Search clock latency</td>
<td>16</td>
<td>64</td>
<td>1</td>
</tr>
<tr>
<td>Throughput</td>
<td>1/16</td>
<td>1/64</td>
<td>1 (lossless)</td>
</tr>
</tbody>
</table>

* A hierarchical tree network and address output buses, respectively.

Figure 7.4 Hierarchical multi-chip structure using embedded binary-tree pipelined priority decision circuits.

### 7.3 Circuit Realization and Operation

#### 7.3.1 Hierarchical Inter-Chip Connections

Figure 7.4 shows a hierarchical multi-chip structure using a binary-tree pipelined priority decision (PPD) circuit. All CAM chips are hierarchically connected via PPD nodes as shown in Figure 7.4 (a). A CAM chip that detects data of \( HD = D \) during the \( D \)-th clock period provides an activation signal \( Act_p \) to a PPD node. The activation signal is generated by an intra-chip completion signal \( D_{cmp} \). The hierarchical PPD nodes transfer the activation signals to the next stage while they determine which one is the priority result. Finally, they return the priority decision results \( MPO_p \) to the CAM chips. The priority decision is carried out in a pipeline. Therefore, it requires an additional latency of \( L_c \) clock cycles, which is given by

\[
L_c = 2 \times \log_2 P - 1, \tag{7.5}
\]

where \( P \) is the number of chips in the multi-chip structure. For example, the pipelined priority decision with eight CAM chips is completed in five clocks as shown by clock period numbers in Figure 7.4 (a).

The number of hierarchical PPD nodes \( N_{ppd} \) is given by

\[
N_{ppd} = P - 1, \tag{7.6}
\]

due to a binary-tree structure. Therefore each PPD node can be efficiently embedded in a CAM chip as shown in Figure 7.4 (b). All CAM chips are implemented by the same circuit configuration. This feature enables a multi-chip structure without any additional processor chip. In the multi-chip structure, one PPD node always remains as shown by CAM#0 in Figure 7.4 (b). The remaining PPD node is used for extension of the capacity, hence it attains the high capacity scalability by the flexible number of chips.
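Eq. (7.5) and Eq. (7.6) can be evaluated directly for a given chip count. The Python sketch below is a minimal illustration with assumed function names; restricting the chip count to a power of two is an assumption made here so that $\log_2 P$ is an integer.

```python
def ppd_latency_clocks(num_chips: int) -> int:
    """Additional latency L_c = 2 * log2(P) - 1 of Eq. (7.5), in clocks."""
    if num_chips < 2 or num_chips & (num_chips - 1):
        raise ValueError("num_chips must be a power of two, at least 2")
    levels = num_chips.bit_length() - 1  # log2(P) for a power of two
    return 2 * levels - 1


def ppd_node_count(num_chips: int) -> int:
    """Number of PPD nodes N_ppd = P - 1 of Eq. (7.6)."""
    return num_chips - 1


for p in (8, 16, 256):
    spare_nodes = p - ppd_node_count(p)  # one node per chip, P - 1 used
    print(p, ppd_latency_clocks(p), ppd_node_count(p), spare_nodes)
# 8 chips -> 5 clocks, 7 nodes, 1 spare node
```

Because every chip carries one PPD node and only $P - 1$ are needed, exactly one node is left over for a later extension of the array.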

#### 7.3.2 Extended Associative Memory Configuration

Figure 7.5 shows a block diagram of an associative memory chip extended for the multi-chip structure. It requires two-input multiplexers and shift registers in addition to the single-chip circuit configuration presented in Section 6.4. An associative memory chip provides an activation signal \( Act_p \) to a PPD node when it detects a search output \( SO_m \). In the chip- and word-parallel Hamming distance search, several data of the same Hamming distance can be detected simultaneously across all chips. Therefore, the inter-chip PPD circuit determines which chip is given priority over the other activated chips. An activated chip that receives the priority from the inter-chip PPD circuit provides the detected address and the chip ID as a search result. After the priority word is masked, the other detected words are evaluated again by the intra- and inter-chip priority decision circuits. In this case, all the search signal propagations are interrupted, and the search results \( SO_m \), which are temporarily buffered by shift registers, are provided to the intra-chip priority encoder again. The search signal propagations start again after all the detected addresses are processed, since the priority decision circuit then becomes available for the next search results. \( MCO_p \) is a completion signal of the inter-chip PPD circuit. The number of shift registers \( N_{reg} \) is of logarithmic order in the number of chips as follows:

\[
N_{reg} = 2 \times \log_2 P - 1, \tag{7.7}
\]

since it is determined by the additional clock latency resulting from a hierarchical PPD circuit.

#### 7.3.3 Pipelined Priority Decision Circuit

The intra-chip priority decision is carried out by a binary-tree priority address encoder as presented in Section 6.3. It consists of a priority decision circuit and an address encoder as shown in Figure 7.6 (a). An inter-chip PPD circuit is designed based on the binary-tree priority decision circuit. A PPD node consists of a priority decision cell, an ID decoder, and register buffers. A priority decision cell has three inputs \((Pin)\) and three outputs \((Pout)\) in a similar configuration to the intra-chip priority decision circuit as shown in Figure 7.6 (a) and (b). In the intra-chip priority decision circuit, an input of \(Pin_a\) is also used for a return path from the upper hierarchical level. On the other hand, the inter-chip priority decision circuit loses the original input of \(Pin_a\) since the operations are pipelined. Therefore, an input of \(Pin_a\) is buffered by shift registers in each PPD node. The shift registers are prepared according to the maximum number of chips. The number of buffer stages is set by the chip ID since the return path length is different for the hierarchical levels. Figure 7.7 shows a timing diagram of the inter-chip PPD circuit. The number of buffer stages can be determined by the least true bit of a chip ID because of a binary-tree structure. An inter-chip completion signal, \(MCO_p\), is acquired by \(Pout_c\) of the top node, for example, \(Pout_c\) of CAM#4 in a multi-chip structure with eight chips. The completion signal is provided to each chip along a return path.

Figure 7.6 Simplified schematics of binary-tree priority decision circuits: (a) intra-chip priority decision circuit and address encoder, (b) inter-chip pipelined priority decision circuit.
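The role of the least true bit of the chip ID can be illustrated as follows. The Python sketch assumes that the PPD node embedded in chip $k$ sits at the tree level given by the index of the least significant set bit of $k$; this reading is consistent with CAM#4 holding the top node of an eight-chip structure and CAM#0 holding the unused node, but the exact mapping from level to the number of buffer stages is not reproduced here, and the function name is hypothetical.

```python
def ppd_level(chip_id: int):
    """Index of the least significant set bit of chip_id.

    Returns None for chip 0, whose embedded PPD node stays unused.
    """
    if chip_id == 0:
        return None
    # chip_id & -chip_id isolates the lowest set bit.
    return (chip_id & -chip_id).bit_length() - 1


levels = {chip: ppd_level(chip) for chip in range(8)}
print(levels)
# {0: None, 1: 0, 2: 1, 3: 0, 4: 2, 5: 0, 6: 1, 7: 0}
```

With eight chips this yields four nodes at level 0, two at level 1 and one at level 2, which is the shape of a binary tree with $P - 1 = 7$ nodes.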

### 7.4 Module Generator for Various Capacities

We have developed a module generator for various capacities of the present associative memories. The required capacity of associative memories differs from application to application. Therefore a module generator which automatically provides an optimized structure with any database capacity is also important for high capacity scalability. The present architecture of fully chip- and word-parallel Hamming-distance estimation is simple, regular, and flexible in structure. Therefore an associative memory module with variable capacities can be designed using a common macro cell library which includes a memory cell with a search circuit, a part of an address decoder, a sense amplifier, a word mask circuit, a shift register and so on. Figure 7.8 shows the module generator functions. The module generator partially employs Synopsys HSPICE, Cadence Dracula LPE and Virtuoso. The inputs are hard macro cells and a specification file including their cell sizes and pin locations. First, the library cells are extracted to SPICE netlists by using Dracula LPE, and then the cell performances are characterized by using HSPICE. The characterization can be skipped if the module generator has already characterized the library cells. In particular, the delay of a hierarchical search node is estimated with various fan-outs since the fan-out increases in proportion to the bit length of the next block. Then, the module generator divides the database into hierarchical blocks based on the capacity requirements and the characterization results. A hierarchical structure that provides the minimum search path is generated by simulated annealing. Finally, the library cells are arranged, and the module generator provides a layout script file for Virtuoso. An inter-chip PPD node and additional shift registers are automatically added to the associative memory module according to the specified number of chips.

Figure 7.9 shows an execution example of the module generator. Figure 7.10 (a) and (b) are the module generation examples. Figure 7.10 (a) is a 128-bit 256-word module for a single-chip structure. Figure 7.10 (b) is a 256-bit 256-word module for a 16-chip structure. The module generator also reports the maximum delay.

Figure 7.9 Module generator execution example.

Figure 7.10 Examples of module generation: (a) 128-bit 256-word module for a single chip, (b) 256-bit 256-word module for 16-chip structure.
Table 7.2 Area of associative memory module.

<table>
<thead>
<tr>
<th>Database capacity</th>
<th>Area (module size, mm × mm)</th>
</tr>
</thead>
<tbody>
<tr>
<td>4K (64 b x 64)</td>
<td>0.98 mm² (0.79 x 1.24)</td>
</tr>
<tr>
<td>16K (128 b x 128)</td>
<td>3.02 mm² (1.40 x 2.16)</td>
</tr>
<tr>
<td>64K (256 b x 256)</td>
<td>11.05 mm² (2.63 x 4.20)</td>
</tr>
<tr>
<td>256K (512 b x 512)</td>
<td>38.34 mm² (5.08 x 7.55)</td>
</tr>
<tr>
<td>1M (1024 b x 1024)</td>
<td>146.40 mm² (10.00 x 14.64)</td>
</tr>
</tbody>
</table>

### 7.5 Performance Evaluation

#### 7.5.1 Area and Capacity

Table 7.2 shows the estimated areas of an associative memory module with various database capacities. The number of transistors in the present associative memory cell is larger than in the conventional analog approaches. However, it is difficult for the analog approaches to keep their performance and marginal capacity under device scaling. The present approach can follow device scaling and operate at a low supply voltage because of the synchronous digital search logic embedded in the memories. Therefore, in comparison with the conventional designs, the associative memory has greater potential for practical use and a larger capacity.

#### 7.5.2 Search Cycle Time and Inter-Chip Bit Rate

Figure 7.11 shows the search cycle time for various database capacities, assuming the bit length ($N$) and the number of words ($M$) are the same. The measured performance of the designed associative engine, presented in Section 6.5, is also plotted in Figure 7.11. The search cycle time is limited by the search signal propagation of $O(\sqrt{N})$ or the priority address encoding of $O(\log M)$ as shown by Eq. (7.1). Therefore the hierarchical search structure attains a high-speed search operation in a large database. It achieves a search cycle time of 8.90 ns at a 1024-bit 1024-word database (i.e. 1Mb capacity). The required inter-chip bit rate is determined by the search cycle time. Inter-chip signaling at 454.5 MHz and 112.3 MHz is required for the associative memories of 4K b/chip and 1M b/chip, respectively. These inter-chip transmission speeds are feasible with the latest chip-to-chip interconnect technologies.
#### 7.5.3 Hamming-Distance Search Time

Figure 7.12 shows the additional latency for the multi-chip structure. The binary-tree PPD circuit reduces the additional latency to $O(\log P)$ as shown by Eq. (7.5). Therefore the additional latency is just 133.5 ns even for a 256Mb database which consists of 256 associative memory chips with a 1024-bit 1024-word capacity. Furthermore the multi-chip architecture maintains a continuous search operation with no throughput decrease, which enables the detection of data after the 2nd nearest data. The total search time depends on the Hamming distance between the input and the detected data as shown by Eq. (7.4). Figure 7.13 shows the total search time in 1-, 16-, and 256-chip structures of 256-bit 256-word associative memories as a function of the Hamming distance of the detected data. In these configurations, the search time for the complete match data is 13.6 ns, 45.5 ns, and 81.8 ns at 64Kb, 1Mb, and 16Mb capacities, respectively. Furthermore the search time for the nearest match data is 1.18 $\mu$s, 1.21 $\mu$s, and 1.25 $\mu$s in the worst case, respectively. The hierarchical multi-chip architecture and circuit implementation achieve capacity scalability with small performance degradation.
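The 133.5 ns figure can be reproduced from Eq. (7.5) and the 8.90 ns search cycle time of the 1024-bit 1024-word module reported in Section 7.5.2. The following Python sketch shows the arithmetic; the function name is an illustrative assumption.

```python
import math


def additional_latency_ns(num_chips: int, cycle_time_ns: float) -> float:
    """Additional latency of the inter-chip PPD circuit in nanoseconds.

    Uses L_c = 2 * log2(P) - 1 clock cycles from Eq. (7.5).
    """
    latency_clocks = 2 * int(math.log2(num_chips)) - 1
    return latency_clocks * cycle_time_ns


# 256 chips of 1024 bit x 1024 word, 8.90 ns search cycle time.
print(f"{additional_latency_ns(256, 8.90):.1f} ns")  # 133.5 ns (15 clocks)
```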

Figure 7.12 Additional latency for the multi-chip structure.

Figure 7.13 Total search time as a function of Hamming distance of the detected data.
### 7.6 Summary

We have proposed a hierarchical multi-chip architecture using fully digital and word-parallel associative memories based on Hamming distance. The multi-chip structure efficiently realizes the high capacity scalability by using an inter-chip pipelined priority decision (PPD) circuit. The inter-chip PPD circuit enables fully chip- and word-parallel associative processing by taking advantage of the feature of the digital associative processing architecture, which attains no throughput decrease, additional clock latency of \( O(\log P) \), and inter-chip wires of \( O(P) \) for a configuration of \( P \) chips. The developed module generator automatically optimizes the hierarchical search structure and provides the associative memory module for various capacity requirements. The feasibility of the architecture and circuit implementation has been demonstrated by post-layout simulations with measurement results of a single-chip implementation. The performance evaluation shows that the hierarchical multi-chip architecture is capable of the high-speed and continuous associative processing based on Hamming distance with a megabit database capacity.
Chapter 8

Digital Associative Engine with Wide Search Range Based on Manhattan Distance

### 8.1 Introduction

In this chapter, we propose a digital associative engine with a wide search range based on Manhattan distance. Associative processing based on Manhattan distance supports far more practical applications than that based on Hamming distance, for example, code-book-based image compression [74] and vector-quantization recognition [75], as shown in Figure 8.1. Although associative processors based on Hamming distance are capable of Manhattan distance estimation using thermometer encoding as presented in Section 6.2, they require a $2^i$ bit length for $i$-bit data elements. Therefore, associative processing with a compact bit length requires the natural binary coding for Manhattan distance such as [74]–[76].

The proposed word-parallel associative engine is capable of accurate and wide-range Manhattan-distance computation. The word-parallel digital implementation using a hierarchical search path enables a high-speed search operation with faultless precision, a low-voltage operation mode, and a potential capability of unlimited data capacity. These features are important for system-on-a-chip applications in future process technologies, and they are difficult to attain using the conventional mixed-signal approaches [73], [75]–[76]. Furthermore, it performs a continuous search operation to detect not only the nearest match data but also all data in the sorted order of the exact Manhattan distance, which requires a considerable number of search operations in the conventional architectures [73]–[76]. Word-parallel distance calculation circuits autonomously count the Manhattan distance using a weighted search clock to detect the nearest match data. The unique associative processing with accurate and wide-range Manhattan-distance computation efficiently realizes various new applications such as human-like learning and high-speed data sorting in addition to the conventional use.

Section 8.2 proposes a Manhattan distance search algorithm and its circuit realization. The Manhattan distance computation consists of three operation stages: absolute flag generation, a distance counting operation, and nearest match detection among candidates. These operations are carried out in word parallel using a weighted search clock technique. Section 8.3 shows the design of the digital associative engine with 64 words of 8 bit \( \times \) 32 elements. Measurement results are presented in Section 8.4, and Section 8.5 summarizes this chapter.

### 8.2 Manhattan Distance Search Algorithm and Circuit Realization

#### 8.2.1 Element Circuit Structure

Associative processing based on Manhattan distance generally handles \( i \)-bit \( \times \) \( j \)-element data as shown in Figure 8.1. Manhattan distance computation requires SAD (summation of absolute difference) between an input and all stored data. Figure 8.2 (a) shows an 8-bit element structure. The stored data are divided into blocks and hierarchically connected by a bypass line to reduce the search signal propagation path as shown in Figure 8.2 (b). The 8-bit element consists of 8 SRAM cells, a bit selector, a subtractor based on a half adder (HA) with an absolute function (ABS), a flag register (FR) with a bit comparison function, and a chained search circuit as shown in Figure 8.3.

Figure 8.2 Block diagram: (a) an 8-bit element structure, (b) a word structure with hierarchical search path.

The present algorithm and circuit implementation for Manhattan distance computation are shown in Figure 8.4 through Figure 8.7. First, absolute flags are generated in element parallel. Then, a distance counting operation is executed by a chained search signal propagation in word parallel. It is processed by weighted search clocks which are autonomously provided by word-parallel distance calculation circuits. Finally, the nearest match data is detected in Candidates which are activated by the word-parallel calculation circuits at the same time. All the data can be detected by a continuous search operation in the sorted order of Manhattan distance.
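The function computed by this flow can be stated as a short software reference model. The sketch below is only a behavioral illustration in Python, not the circuit: the names `manhattan_distance` and `sorted_search`, the sample words, and the tie-breaking by lower word address are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def manhattan_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences (SAD) between two equal-length words."""
    if len(a) != len(b):
        raise ValueError("words must have the same number of elements")
    return sum(abs(x - y) for x, y in zip(a, b))


def sorted_search(query: Sequence[int],
                  stored: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Return (word address, distance) pairs in ascending distance order.

    Ties are broken by the lower word address (an assumption of this sketch).
    """
    keyed = [(manhattan_distance(query, word), addr)
             for addr, word in enumerate(stored)]
    return [(addr, dist) for dist, addr in sorted(keyed)]


# Hypothetical 8-bit x 4-element words, shortened from 32 elements for readability.
stored_words = [
    [10, 200, 35, 7],
    [12, 190, 40, 0],
    [255, 0, 0, 255],
    [10, 201, 35, 7],
]
query_word = [11, 198, 36, 5]

ranking = sorted_search(query_word, stored_words)
nearest_address, nearest_distance = ranking[0]
```

The first entry of `ranking` corresponds to the nearest match detection, and reading the list in order corresponds to the continuous search operation that returns all data sorted by exact Manhattan distance.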
#### 8.2.2 Absolute Flag Generation

Figure 8.4 (a) shows the element-parallel absolute flag generation. First, input data \( A_{ij} \) are compared with stored data \( B_{ij} \) from MSB to LSB in element parallel. The comparison determines ‘\( A_{ij} > B_{ij} \)’ or ‘\( A_{ij} < B_{ij} \)’ using the input \( A_{ij} \) and the sum result \( S_{ij} \) of the HA. The comparison result \( F_{jk} \) is stored in a flag register and used for the absolute function by switching the carry result \( C_{ij} \) of the HA between \( A_{ij} \cdot B_{ij} \) and \( \overline{A_{ij}} \cdot B_{ij} \). The absolute difference is calculated in element parallel during the word-parallel summation.
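The MSB-first comparison can be illustrated for a single element as follows. This is a behavioral sketch under simplifying assumptions: the function names are hypothetical, the flag polarity (1 for A > B) is chosen for the example, and the absolute difference is computed with ordinary subtraction rather than with the switched half-adder carry of the circuit.

```python
def absolute_flag(a: int, b: int, bits: int = 8) -> int:
    """Scan from MSB to LSB; the first differing bit decides the comparison.

    Returns 1 if a > b and 0 otherwise (polarity is an assumption of this sketch).
    """
    for i in reversed(range(bits)):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        s_i = a_i ^ b_i          # half-adder sum: 1 where the bits differ
        if s_i:
            return a_i           # a_i = 1 at the first mismatch means a > b
    return 0                     # all bits equal


def absolute_difference(a: int, b: int, bits: int = 8) -> int:
    """Use the flag to choose the subtraction order so the result is non-negative."""
    return a - b if absolute_flag(a, b, bits) else b - a
```

Only the input bit and the sum bit at the first mismatch position are needed to decide the flag, which matches the description that the comparison uses \( A_{ij} \) and \( S_{ij} \).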

#### 8.2.3 Distance Counting Operation

The distance counting operation is executed from LSBs to MSBs of elements in word parallel as shown in Figure 8.4 (b). A sum result \( S_{0j} \) of \( A_{0j} \) and \( B_{0j} \) is set to \( M_{jk} \) as a control signal of a chained search circuit. A search signal detects the first-encountered mismatch bit with \( M_{jk} = 1 \) in each block. The search clock period is limited by the search signal propagation path via chained search circuits. Therefore, a hierarchical search path, which is proposed in Chapter 6, is implemented as shown in Figure 8.2 (b). A bypass search signal \( P_{kb} \) is also used for a mask permission signal to the next block, which makes only one mismatch bit maskable in each word for the next clock period. The interrupted search signal starts again from the masked bit, and finally a search signal can be detected as \( S_{out_k} \) when all the mismatch bits have been masked. Therefore, the operation clocks represent the number of mismatch bits. After that, a distance counting operation is executed again for a carry result \( C_{0j} \) in a similar manner to the counting operation for a sum result \( S_{0j} \). These counting operations are repeated from \( A_{0j} \) to \( A_{7j} \).

![Figure 8.3 Circuit configuration of an 8-bit element cell.](image)

Figure 8.4 Search operation flow: (a) absolute flag generation, (b) distance counting operation, (c) weighted search clock supply.
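The statement that the number of operation clocks equals the number of mismatch bits can be checked with a small behavioral model of one word. The sketch below assumes a flat chain (the hierarchical bypass path is ignored) and a hypothetical function name; it masks exactly one mismatch bit per clock, as the text describes.

```python
from typing import List, Sequence


def count_mismatch_clocks(mismatch_bits: Sequence[int]) -> int:
    """Count the clocks needed until the search signal reaches the word output.

    mismatch_bits[j] plays the role of the control signal M for element j.
    In every clock the search signal stops at the first unmasked mismatch bit,
    and that single bit is masked for the next clock period.
    """
    masked: List[bool] = [False] * len(mismatch_bits)
    clocks = 0
    while True:
        blocked_at = None
        for position, m in enumerate(mismatch_bits):
            if m and not masked[position]:
                blocked_at = position      # search signal is interrupted here
                break
        if blocked_at is None:
            return clocks                  # signal propagates to the output
        masked[blocked_at] = True
        clocks += 1
```

For example, `count_mismatch_clocks([0, 1, 1, 0, 1])` needs three masking clocks, one per mismatch bit.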

#### 8.2.4 Weighted Search Clock Technique

Figure 8.5 shows a word-parallel distance calculation circuit using autonomous weighted search clocks. The word-parallel circuit receives the search output signal \( S_{out_k} \), and it counts the Manhattan distance based on a weight of a search clock \( \phi_{sch} \). A search clock has different weights according to the bit number \( i \) that is currently evaluated in elements. For example, it has a weight of \( 2^i \) and \( 2^{i+1} \)-bit Manhattan distance during a counting operation for \( i \)-th sum and carry outputs, respectively. A word-parallel circuit autonomously provides \( \phi_{sch_k} \) to count all the mismatch bits faster. Therefore, it has a local weight \( Wl_k \) as a current weight of \( \phi_{sch_k} \), and accumulates a global weight \( Wg \) on a residual weight \( Wr_k \) as shown in Figure 8.4 (c). A search clock \( \phi_{sch_k} \) is provided and the local weight \( Wl_k \) is subtracted from \( Wr_k \) when the sum total of \( Wr_k \) and \( Wg \) exceeds \( Wl_k \). The local weight \( Wl_k \) always precedes the global weight \( Wg \) in every word since the global weight \( Wg \) is commonly updated according
#### 8.2.5 Nearest Match Detection in Candidates

The distance counting operation is interrupted at the detection timing of $Act_k$, and then the process moves to nearest match detection for $Candidates$ as shown in Figure 8.6. $Candidates$ are all the words activated by $Act_k$ at the same time. They have different residual weights according to their Manhattan distance from the input since the distance is given by $\Sigma Wg - Wr_k$, where $\Sigma Wg$ is the total distance weight applied before the detection timing of $Act_k$. Note that $Candidates$ are closer to the input than all the other undetected words in the present search algorithm, hence they include the nearest match data. This feature helps to detect the nearest match data, and also enables a continuous search operation for data sorting in order of the exact Manhattan distance. The nearest match detection in $Candidates$ is carried out by a nearest match detector and a priority address encoder. It evaluates each residual weight $Wr_k$ from MSB to LSB as shown in Figure 8.6. The process maintains consistency with each other word. It keeps all residual weights other than the nearest data in $Candidates$, and then the detected nearest data is masked to continue a search operation for the next nearest data. The circuit configuration is shown in Figure 8.7.
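Because the distance of a candidate is $\Sigma Wg - Wr_k$, the nearest candidate is the one with the largest residual weight. A bit-serial MSB-to-LSB evaluation of that maximum can be sketched as below. The elimination rule, the function name, the non-negative residual weights, and the choice of the lowest word address among equal candidates are assumptions of this sketch, not a description of the circuit in Figure 8.7.

```python
from typing import Dict, Tuple


def nearest_in_candidates(residual_weights: Dict[int, int],
                          total_global_weight: int,
                          bits: int) -> Tuple[int, int]:
    """Pick the candidate with the largest residual weight, scanning MSB to LSB.

    residual_weights maps a word address to its non-negative residual weight Wr_k.
    Returns (word address, Manhattan distance = total_global_weight - Wr_k).
    """
    if not residual_weights:
        raise ValueError("no candidates")
    survivors = set(residual_weights)
    for i in reversed(range(bits)):
        ones = {k for k in survivors if (residual_weights[k] >> i) & 1}
        if ones:                 # words with a 0 at this bit cannot be the maximum
            survivors = ones
    address = min(survivors)     # priority encoding among equal candidates (assumed)
    return address, total_global_weight - residual_weights[address]
```

Masking the returned address and calling the function again on the remaining candidates would yield the next nearest data, which is the software counterpart of the continuous search operation.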

### 8.3 Chip Implementation

We have designed and fabricated an associative engine using the present search architecture in a 1P5M 0.18 \( \mu \)m CMOS process. Figure 8.8 illustrates a block diagram of the search engine. It consists of a search memory array with 64 words of 8 bit \( \times \) 32 element, a memory
read/write circuit with data shift registers, a word decoder, word-parallel distance calculation circuits, a priority address encoder for nearest match detection in candidates, and a CAM controller. These components are implemented in a die size of $2.8 \times 2.8 \text{ mm}^2$. Figure 8.9 shows a chip microphotograph and an 8-bit element cell layout. A 32-element word is divided into four blocks to reduce the critical path.

### 8.4 Measurement Results and Discussions

#### 8.4.1 Operation Speed and Power Dissipation

The measurement results show that the operation speed attains 294.1 MHz and the power dissipation is 320.7 mW at a supply voltage of 1.8 V. The total search time for nearest match detection is $2.00\ \mu\text{s}$ in the worst case. Figure 8.10 shows the operation speed as a function of the supply voltage from 0.8 V to 2.0 V. The fully digital implementation enables a low-voltage operation mode down to 0.8 V. It attains an operation frequency of 72.4 MHz and a power dissipation of 15.1 mW at 0.9 V. The associative processing ensures Manhattan distance computation with faultless precision.
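As a rough consistency check on these figures, the worst-case search time can be converted into clock cycles and into an energy bound. The arithmetic below assumes that the clock runs at the maximum frequency for the whole search and that the quoted power dissipation applies throughout it; both are assumptions of this estimate, not reported measurements.

$$
T_{\text{clk}} = \frac{1}{294.1\ \text{MHz}} \approx 3.40\ \text{ns}, \qquad
N_{\text{cycles}} \approx 2.00\ \mu\text{s} \times 294.1\ \text{MHz} \approx 588
$$

$$
E_{\text{search}} \approx 320.7\ \text{mW} \times 2.00\ \mu\text{s} \approx 641\ \text{nJ}
$$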
Figure 8.9 Chip microphotograph and layout of an element cell.

#### 8.4.2 Search Range

Figure 8.11 shows the worst-case search time for wide-range Manhattan distance computation. The present search engine is capable of a continuous search operation to detect all data in the sorted order of the exact Manhattan distance in addition to the nearest match data. It efficiently realizes a wide-range search operation as shown by (a) in Figure 8.11. On the other hand, the conventional architectures require a considerable number of search operations. Figure 8.11 (b) is estimated based on [74] as a conventional digital technique. Figure 8.11 (c) is estimated based on [76] as a conventional mixed-signal technique, assuming that it is scalable to the same capacity as the present coprocessor, since such a long-distance search has not been reported for mixed-signal techniques so far. The capacity scalability is also one of the advantages of the present digital implementation.

#### 8.4.3 Area and Capacity

Table 8.1 shows the core area and SRAM ratio of various data capacities. The integration ratio of SRAMs is almost equivalent to the ratio of 19% of the conventional digital processor [74]. Furthermore, the present architecture has the possibility of a large database capacity in a practical die size since it makes device scaling easier than the conventional mixed-signal techniques. Table 8.2 summarizes the chip specifications.

**Table 8.1** Core area and SRAM ratio.

<table>
<thead>
<tr>
<th>Data size</th>
<th>Core area</th>
<th>SRAM ratio</th>
</tr>
</thead>
<tbody>
<tr>
<td>8-bit 32-ele. 64-word (16K)</td>
<td>2.37 mm²</td>
<td>17.2 %</td>
</tr>
<tr>
<td>8-bit 64-ele. 128-word (64K)</td>
<td>6.70 mm²</td>
<td>21.9 %</td>
</tr>
<tr>
<td>8-bit 128-ele. 256-word (256K)</td>
<td>22.25 mm²</td>
<td>25.3 %</td>
</tr>
<tr>
<td>8-bit 256-ele. 512-word (1M)</td>
<td>81.04 mm²</td>
<td>27.5 %</td>
</tr>
</tbody>
</table>
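The capacity labels in Table 8.1 follow directly from bits × elements × words, and the table also implies an approximate SRAM area per stored bit. The Python sketch below reproduces that arithmetic. It assumes that the SRAM ratio is the share of the core area occupied by SRAM cells; the per-bit areas are derived here for illustration and are not values reported in the thesis.

```python
# Rows of Table 8.1: (bits, elements, words, core area in mm^2, SRAM ratio in %)
TABLE_8_1 = [
    (8, 32, 64, 2.37, 17.2),
    (8, 64, 128, 6.70, 21.9),
    (8, 128, 256, 22.25, 25.3),
    (8, 256, 512, 81.04, 27.5),
]


def capacity_bits(bits: int, elements: int, words: int) -> int:
    """Total number of stored bits."""
    return bits * elements * words


def implied_area_per_bit_um2(bits: int, elements: int, words: int,
                             core_mm2: float, sram_percent: float) -> float:
    """SRAM area per bit in um^2, assuming SRAM area = core area x SRAM ratio."""
    sram_mm2 = core_mm2 * sram_percent / 100.0
    return sram_mm2 * 1e6 / capacity_bits(bits, elements, words)


capacities = [capacity_bits(b, e, w) for b, e, w, _, _ in TABLE_8_1]
areas_per_bit = [implied_area_per_bit_um2(*row) for row in TABLE_8_1]

for row, cap, area in zip(TABLE_8_1, capacities, areas_per_bit):
    print(f"{row[0]}-bit {row[1]}-ele. {row[2]}-word: "
          f"{cap // 1024}K bits, about {area:.1f} um^2 of SRAM per bit")
```

Under this assumption the implied SRAM area per bit stays roughly constant across the four configurations, so the rising SRAM ratio would reflect the peripheral circuits taking a smaller share of the core as the capacity grows.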

### 8.5 Summary

We have proposed a new word-parallel digital architecture and circuit implementation for accurate and wide-range Manhattan distance computation employing a hierarchical search path and a weighted search clock technique. It is capable of the detection of all data in the sorted order of the exact Manhattan distance in addition to the nearest match data. The weighted search clock technique performs the wide-range associative processing with fewer additional cycles. Furthermore, the digital implementation enables a low-voltage operation for SoC applications in future process technologies. It also makes device scaling easier and provides the possibility of a large data capacity with unlimited search distance. An associative engine, with 64 words of 8 bit \( \times \) 32 element, has successfully performed the Manhattan distance computation. The worst-case search time of all data sorting takes 5.85 \( \mu s \) at a supply voltage of 1.8 V.

**Table 8.2** Specifications of the associative engine.

<table>
<thead>
<tr>
<th>Specification</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Process</td>
<td>1P5M 0.18 µm CMOS process</td>
</tr>
<tr>
<td>Chip size</td>
<td>2.8 mm × 2.8 mm</td>
</tr>
<tr>
<td>Power voltage supply</td>
<td>0.8 V – 1.8 V</td>
</tr>
<tr>
<td>Database capacity</td>
<td>8-bit 32-element 64-word templates</td>
</tr>
<tr>
<td>Distance measure</td>
<td>Manhattan distance</td>
</tr>
<tr>
<td>Functions</td>
<td>Nearest detection / All data sorting</td>
</tr>
<tr>
<td>Nearest detection time</td>
<td>1.65 µs ~ 2.00 µs</td>
</tr>
<tr>
<td>All data sorting time</td>
<td>5.85 µs</td>
</tr>
<tr>
<td>Operation speed</td>
<td>294.1 MHz @ 1.8 V</td>
</tr>
<tr>
<td></td>
<td>72.4 MHz @ 0.9 V</td>
</tr>
<tr>
<td>Power dissipation</td>
<td>320.7 mW @ 1.8 V, 294.1 MHz</td>
</tr>
<tr>
<td></td>
<td>15.1 mW @ 0.9 V, 72.4 MHz</td>
</tr>
</tbody>
</table>
Chapter 9

Associative Processing for 3-D Image Capture

### 9.1 Introduction

In this chapter, we present an associative processing flow for 3-D image capture. We have achieved the high-speed and high-resolution smart image sensors for range finding and the high-speed associative engines with high capacity scalability in Chapter 2 through Chapter 8. 3-D image capture requires various associative processing algorithms after the range measurement, such as 3-D object clipping, synthesis of multidirectional range data, and object recognition.

A depth-key technique such as [18] is used for 3-D object clipping; however, it requires a given range in which a target object is placed. It is therefore difficult for the depth-key technique to separate multiple objects placed in the same range. A 3-D object clipping algorithm is thus necessary to search all the 3-D range data for neighbor points according to the relative distance among the 3-D range data. Figure 9.1 shows a basic operation of associative processing for 3-D object clipping. First, a start point is selected on a target object as shown in Figure 9.1 (a). An associative engine searches for the neighbor points within a threshold range, and holds the neighbor points as active 3-D data. Then, the next target point is selected from the active 3-D data as shown in Figure 9.1 (b). After searching for the neighbor points of the new target point, the target point is updated to another active 3-D data point as shown in Figure 9.1 (c). An associative engine continuously searches for the next target point. The chain search algorithm for neighbor points realizes object clipping to obtain a target object as shown in Figure 9.1 (d). Furthermore, it is efficiently performed by an associative engine.

Section 9.2 presents an associative processing algorithm for object clipping. Then, Section
9.3 describes circuit configurations and operations of the associative engine for object clipping. Section 9.4 shows simulation results for the feasibility and the performance evaluation. Section 9.5 summarizes this chapter.

### 9.2 Associative Processing for 3-D Object Clipping

We present an associative processing flow based on the proposed digital associative engines in Figure 9.2. All the 3-D range data are stored in the associative memories. The associative engine for 3-D object clipping is designed on the basis of a Manhattan distance search engine described in Chapter 8. It is capable of word-parallel and exact Manhattan distance computation. Associative processing for 3-D object clipping requires a function of exhaustive range search in addition to the standard associative processing. The associative engine consists of a memory array, search circuits embedded in memories, word-parallel distance calculator, flag
registers, mask registers, and a priority address encoder. The flag registers hold active 3-D data which are within the search range of a target point. The mask registers represent the 3-D data that are already detected during the search operation. The priority address encoder provides the least word address in active 3-D data whose flag registers are activated. The associative processing starts with an initial point, which is arbitrarily selected in the 3-D range data. And then, the object clipping is carried out as follows:

(a) Search the stored range data for neighbor points of the initial point based on Manhattan distance. For example, range data of #1 and #4 are activated as shown in Figure 9.2. In this case, the flag registers of #1 and #4 are updated.

(b) Provide one of the activated range data. In Figure 9.2, a priority address encoder provides an address of #1 based on the flag registers. And then, the range data of #1 is selected and read out. A flag register of the selected range data is masked by the priority decision at the same time.

(c) Search the stored range data again for neighbor points of the selected range data. In Figure 9.2, two range data, #3 and #6, are activated as neighbor points of the point #1. The flag registers of #3 and #6 are incrementally updated, that is, the flag register of #4 is still activated.

(d) Provide one of the activated range data. In Figure 9.2, a priority address encoder provides an address of #3. The range data of #3 is selected and read out. A flag register of #3 is masked.

(e) Continue to search the stored range data for neighbor points of the selected range data in the same way as in (c). In the case shown in Figure 9.2, there are no neighbor points of #3 within a threshold range. Therefore, no range data are activated.

(f) Carry out a readout operation again. A priority address encoder provides an address of #4, which is a neighbor point of the initial data, based on the flag registers. A flag register of #4 is masked after an address of #4 is provided.

One target object is clipped when all the active flags have been read out and masked. After that, an initial point is selected again from the inactivated range data, and the associative processing obtains the next target object. The associative processing basically repeats two operations: a search operation and a readout operation. The algorithm attains exhaustive data search and no redundant data readout for accurate 3-D object clipping.
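The search and readout loop of steps (a) through (f) can be sketched in software as follows. This is a behavioral illustration in Python, assuming hypothetical names (`clip_object`, `flags`, `masks`) and made-up integer coordinates arranged so that the readout order mirrors the example of Figure 9.2. The inner loop over all stored points stands in for the word-parallel search of the associative engine.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int, int]


def manhattan(p: Point, q: Point) -> int:
    """Manhattan distance between two 3-D points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def clip_object(points: Sequence[Point], start: int, threshold: int) -> List[int]:
    """Return the word addresses of one clipped object in readout order."""
    flags = [False] * len(points)   # flag registers: active, not yet read out
    masks = [False] * len(points)   # mask registers: already detected
    masks[start] = True
    readout = [start]
    target = start
    while True:
        # Search operation: activate undetected points within the threshold range.
        for addr, p in enumerate(points):
            if not masks[addr] and manhattan(points[target], p) <= threshold:
                flags[addr] = True
                masks[addr] = True
        # Readout operation: the priority encoder provides the least active address.
        active = [addr for addr, flag in enumerate(flags) if flag]
        if not active:
            return readout          # all active flags have been read out
        target = active[0]
        flags[target] = False       # the selected flag is masked at readout
        readout.append(target)


# Hypothetical range data; address 0 is the initial point.
points = [
    (0, 0, 0),        # 0: initial point
    (8, 0, 0),        # 1: neighbor of 0
    (100, 100, 100),  # 2: another object
    (16, 0, 0),       # 3: neighbor of 1
    (0, 9, 0),        # 4: neighbor of 0
    (105, 100, 100),  # 5: another object
    (8, 0, 9),        # 6: neighbor of 1
]
first_object = clip_object(points, start=0, threshold=10)
second_object = clip_object(points, start=2, threshold=10)
```

Each address appears at most once in the result because a point is masked as soon as it is detected, which corresponds to the property of exhaustive search without redundant readout.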
Figure 9.2 Associative processing flow for 3-D image capture.
### 9.3 Circuit Configurations

Figure 9.3 shows a word structure of the associative engine for 3-D object clipping. It consists of three 12-bit element cells, which are assigned to the \( x \), \( y \) and \( z \) addresses of 3-D data. The element cells are connected by a search signal path via each search circuit. In the \( x \) address element, input data, \( A_{xk} \), are compared with stored data, \( B_{xk} \), by a full adder in word parallel. \( \phi_{\text{add}} \) is a clock signal for the full adder. The overflow carry is registered by \( \phi_{\text{abs}} \) as an absolute flag. A search signal, \( Sch_{xk} \), is injected to the \( x \)-element cell. The search signal propagates to the next element via the search circuit. The search operation basically follows the digital associative engine presented in Section 8.2.

A word-parallel distance calculator provides a search clock, \( \phi_{\text{sch}} \), for distance counting. The distance counting operation is executed in the same fashion as the digital associative engine presented in Section 8.2. In this case, the search operation using weighted search clocks continues until the total global weight, \( \Sigma W_g \), reaches the threshold distance for 3-D object clipping. All the detected words are set to active 3-D data and the flag registers are activated. The associative engine also has a binary-tree priority address encoder, which provides the least address of the active 3-D data. Then, the next search operation is executed.
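A binary-tree priority address encoder that returns the least active address can be modeled as a pairwise reduction. The sketch below is a generic software illustration with a hypothetical function name; the padding to a power of two and the representation of nodes as `(active, address)` pairs are choices made for this example and are not taken from the circuit.

```python
from typing import List, Optional, Sequence, Tuple


def priority_encode(flags: Sequence[int]) -> Optional[int]:
    """Return the least address whose flag is set, or None if no flag is set."""
    size = 1
    while size < len(flags):
        size *= 2                                   # pad the leaves to a power of two
    level: List[Tuple[bool, Optional[int]]] = [
        (bool(flag), addr) for addr, flag in enumerate(flags)
    ]
    level += [(False, None)] * (size - len(flags))
    while len(level) > 1:
        # Each tree node forwards its left child if active, otherwise its right child.
        level = [left if left[0] else right
                 for left, right in zip(level[0::2], level[1::2])]
    active, address = level[0]
    return address if active else None
```

The number of reduction levels grows with the base-2 logarithm of the number of words, so a tree of this kind adds only a logarithmic number of stages as the word count increases.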
### 9.4 Performance Evaluation

We have designed the associative engine for 3-D object clipping using Verilog-HDL. The associative engine contains 76.8K words of 12 bit × 3 element. Three elements in a word are assigned to 12-bit x, y, and z addresses, respectively. In this simulation, the input range map is composed of 320 × 240 range data. It is generated from a range map captured by the XGA 3-D image sensor presented in Section 2.8, and it is down-converted to a QVGA (320 × 240) format. An initial point is set to the center position of the input range map. Then, the search operation sequentially detects all the 3-D range data of a target object on which the initial point is. Finally, the target object is clipped according to the 3-D range data. The search range, i.e. the distance threshold, is set to about 8 mm in this case.

The associative engine for 3-D object clipping requires 81 clocks for a range search operation. The associative engine requires a search operation of 182 MHz to clip all of the target objects from a QVGA 3-D range map. This is feasible in a 0.18 μm CMOS process or the next-generation technologies since the associative engine with 256 bit × 64 word memories achieves a speed of 294 MHz and its speed is limited by a logarithmic order of the number of words as described in Chapter 8. The estimated core area is 995.4 mm² in a 0.18 µm standard CMOS process on the basis of a layout of the associative engine based on Manhattan distance presented in Section 8.3. The core area can be reduced to 248.9 mm² in the case of a 90 nm standard CMOS process. Thus the core area for a QVGA image format is feasible using the current CMOS process technologies.
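The word count and the 90 nm area estimate can be reproduced with simple arithmetic. The area calculation below assumes ideal quadratic scaling of the core area with the feature size, which is consistent with the two quoted figures; it is a sketch of the estimate and not a layout result.

$$
N_{\text{words}} = 320 \times 240 = 76{,}800 \approx 76.8\text{K}
$$

$$
A_{90\,\text{nm}} \approx A_{0.18\,\mu\text{m}} \left( \frac{90\ \text{nm}}{180\ \text{nm}} \right)^{2}
= \frac{995.4\ \text{mm}^2}{4} \approx 248.9\ \text{mm}^2
$$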

### 9.5 Summary

We have discussed an associative processing for 3-D image capture. An associative processing has a potential capability of 3-D object clipping, synthesis of multidirectional range data, and object recognition. We have addressed the object clipping algorithm, and presented an associative processing flow of the chain search algorithm. We have designed the associative engine with 76.8K words of 12 bit × 3 element using Verilog-HDL. The feasibility of the associative processing for 3-D object clipping has been demonstrated by using a range map captured by the XGA 3-D image sensor. The associative engine requires a search operation of 182 MHz to clip all of the target objects from a QVGA 3-D range map. The core area was estimated at 248.9 mm² using a 90 nm standard CMOS process.
Chapter 10

Conclusions

This thesis focused on smart image sensors and associative engines for three-dimensional image capture. We have addressed current issues in high-speed and high-resolution 3-D image capture systems, and proposed new frame access techniques, sensing schemes, sensor architectures, and circuit designs. We have also proposed new associative engines with high capacity scalability. The conclusions of this thesis are as follows.

Chapter 2: We have proposed a high-speed dynamic frame access technique and circuit implementation for a real-time and high-resolution 3-D image sensor. The high-speed readout scheme makes a standard and compact pixel circuit available and quickly acquires the location and the intensity profile of a projected sheet beam on the sensor plane. The column-parallel position detector reduces redundant data transmission for a real-time measurement system. A 640 × 480 3-D image sensor has been successfully demonstrated in a real-time and high-resolution range finding system. The maximum range finding speed is 65.1 range maps/s. The maximum range error is 0.87 mm and the standard deviation of error is 0.26 mm at 1200 mm distance owing to a center-of-gravity calculation using the intensity profile. We have shown a range finding system using multiple range finders for a full 3-D model capture. A scale-up version with 1024 × 768 pixels has also been developed.

Furthermore, we have proposed the pixel-parallel and column-parallel ambient light suppression techniques which are adapted to use in the proposed access technique. A 352 × 288 3-D image sensor with column-parallel ambient light suppression has been presented. The proposed column-parallel suppression technique employs adaptive reset feedback circuits, and efficiently reduces a high-contrast ambient light, device fluctuations, and select timing variations. It realizes a high-speed 3-D image capture system using a low-intensity beam projection, and attains the robust dynamic frame access in a high-speed operation and a high pixel resolution.
Chapter 3: We have proposed a row-parallel frame access architecture for a 1,000-fps range finder, which has many potential applications such as shape measurement of structural deformation and destruction, quick inspection of industrial components, scientific observation of high-speed moving objects, and fast visual feedback systems in robot vision. The row-parallel search operations are executed by a chained search circuit embedded in a pixel on the focal plane. The bit-streamed column address flow realizes row-parallel address acquisition with a compact circuit implementation. Moreover, a multi-sampling technique is available for range accuracy improvement.

We have shown the feasibility and the potential capability using a prototype position detector with 128 × 16 pixels. A 375 × 365 ultra fast range finder has also been designed and fabricated in a 1P5M 0.18 μm standard CMOS process. It achieves a high-speed frame access rate with multiple samplings. The maximum frame access rate is 394.5 kHz with 4 samplings, which is capable of 1052 range maps/s provided that the measurement setup has a sufficiently strong beam intensity. It provides 1.10 mm range accuracy at a target distance of 600 mm. The accuracy has been improved up to a 0.2 sub-pixel resolution by the multi-sampling technique. The present techniques and circuits will open the way to future applications which require extremely high-speed and high-accuracy 3-D image capture.

Chapter 4: We have proposed a new sensing scheme of low-intensity beam detection for a robust range finding system. A correlation circuit and a current-mode suppression circuit of constant illumination realize high sensitivity, high selectivity, and availability in wide-range background illumination. A 120 × 110 position sensor for robust range finding has been designed and successfully tested. The position sensor achieves highly sensitive light detection of -18 dB SBR in 48 dB background illumination. It also realizes high selectivity to detect only a target beam in a high-contrast ambient light owing to -13 dB suppression of another incident light with even harmonics of a correlation frequency. We have discussed a trade-off between the sensitivity and the frame rate, and presented the maximum frame rate of 2,000 fps at -16 dB SBR. We have applied the position sensor to a triangulation-based range finding system. It achieves a range accuracy within 1.5 mm at a distance of 1000 mm. The present position sensor is advantageous for future application fields which require a safe light projection for human eyes in various measurement environments.

Chapter 5: We have presented a pixel-level color image sensor with efficient ambient light suppression. Bidirectional photocurrent integrators realize pixel-level demodulation of a modulated RGB flashlight while suppressing an ambient light at short intervals during an exposure period. Therefore, it avoids saturation from ambient illumination and is applicable to non-ideal illumination conditions. Every pixel provides color information without false color and intensity loss of color filters. We have demonstrated the efficient ambient light suppression and the pixel-level color imaging using a $64 \times 64$ prototype image sensor. Moreover, TOF range finding with $\pm 15$ cm range accuracy has been performed to show the feasibility of depth-key object extraction. The measurement results show that the present sensing scheme and circuit implementation realize the support capability of innate color capture and object extraction for image recognition in various measurement situations.

Furthermore, we have presented a low-intensity beacon detector for augmented reality systems. A $128 \times 128$ prototype beacon detector achieves 30-fps scene capture, 4850 bit/ID-sec using a 40 kHz carrier, and less than -10.0 dB signal-to-background ratio (SBR) in more than 40 dB background illumination for a high-speed and robust AR system with active optical devices. It acquires a scene image together with the locations, IDs, and additional information of multiple target objects simultaneously in real time. These features realize a robust augmented reality system in various scene conditions.

**Chapter 6:** We have proposed a new concept and circuit implementation for a high-speed and low-voltage associative engine with exact Hamming distance search. It achieves no limitation of data capacity and keeps a high-speed operation in a large database owing to a hierarchical search architecture and a synchronous search logic embedded in a memory cell. The circuit implementation realizes high tolerance for device fluctuations in DSM process technologies and a low-voltage operation below 1.0 V. The associative engine provides the exact distance of the detected data, so it has the capability of data sorting in order of Hamming or Manhattan distance as well as traditional nearest match detection. A 64-bit 32-word associative co-processor has been designed using a 1P5M 0.18 $\mu$m CMOS process and successfully tested. It achieves an operation speed of 411.5 MHz at a supply voltage of 1.8 V, and also attains a low-voltage operation of 40 MHz at a supply voltage of 0.75 V.

**Chapter 7:** We have proposed a hierarchical multi-chip architecture using fully digital and word-parallel associative memories based on Hamming distance. The multi-chip structure efficiently realizes the high capacity scalability by using an inter-chip pipelined priority decision (PPD) circuit. The inter-chip PPD circuit enables fully chip- and word-parallel associative processing by taking advantage of the feature of the digital associative processing architecture, which attains no throughput decrease, additional clock latency of $O(\log P)$, and inter-chip wires of $O(P)$ for a configuration of $P$ chips. The developed module generator automatically optimizes the hierarchical search structure and provides the associative memory module for various capacity requirements. The feasibility of the architecture and circuit implementation has been demonstrated by post-layout simulations with measurement results of a single-chip implementation. The performance evaluation shows that the hierarchical multi-chip architecture is capable of the high-speed and continuous associative processing based on Hamming distance with a megabit database capacity.

Chapter 8: We have proposed a new word-parallel digital architecture and circuit implementation for accurate and wide-range Manhattan distance computation employing a hierarchical search path and a weighted search clock technique. It is capable of the detection of all data in the sorted order of the exact Manhattan distance in addition to the nearest-match data. The weighted search clock technique performs the wide-range associative processing with fewer additional cycles. Furthermore, the digital implementation enables a low-voltage operation for SoC applications in future process technologies. It also makes device scaling easier and provides the possibility of a large data capacity with unlimited search distance. An associative engine, with 64 words of 8 bit × 32 element, has successfully performed the Manhattan distance computation. The worst-case search time of all data sorting takes 5.85 µs at a supply voltage of 1.8 V.

Chapter 9: We have discussed an associative processing for 3-D image capture. An associative processing has a potential capability of 3-D object clipping, synthesis of multidirectional range data, and object recognition. We have addressed the object clipping algorithm, and presented an associative processing flow of the chain search algorithm. We have designed the associative engine with 76.8K words of 12 bit × 3 element using Verilog-HDL. The feasibility of the associative processing for 3-D object clipping has been demonstrated by using a range map captured by the XGA 3-D image sensor. The associative engine requires a search operation of 182 MHz to clip all of the target objects from a QVGA 3-D range map. The core area was estimated at 248.9 mm² using a 90 nm standard CMOS process.

As can be seen from the above results, the frame access techniques and sensing schemes efficiently realize high-speed, high-resolution and robust 3-D image capture systems. In addition, the digital associative processing architectures attain high-speed data search and high capacity scalability. Therefore, the proposed smart image sensors and associative engines will make significant contributions to the advancement of 3-D image capture systems and become a driving force of future applications with high-quality 3-D images.
References


List of Publications

Technical Journals


**Commercial Journals**


Proceedings of International Conferences


**Proceedings of Domestic Conferences and Meetings**


Awards


2. Y. Oike, Takeda Scholarship Award 2002 from The Takeda Foundation, Apr. 2002.

