Initial Cell Search Paper


Comparison of Initial Cell Search Algorithms for W-CDMA Systems

by

Sanat Kamal Bahl

Thesis submitted to the Faculty of the Graduate School

of the University of Maryland in partial fulfillment

of the requirements for the degree of Master of Science

2002



Title of Thesis: Comparison of Initial Cell Search Algorithms for W-CDMA Systems

Sanat Kamal Bahl, Master of Science, 2002

Thesis directed by: James F. Plusquellic

Assistant Professor

Dept. of Computer Science and Electrical Engineering

ABSTRACT

In this thesis, an Improved Cell Search Design (Improved CSD) using cyclic codes is

compared with the 3GPP Cell Search Design using comma free codes (3GPP-comma free

CSD) in terms of (1) hardware utilization on a field programmable gate array (FPGA) and

(2) acquisition time for different probabilities of false alarm. Our results indicate that for a channel whose signal-to-noise ratio is degraded with additive white Gaussian

noise (AWGN), the Improved CSD achieves faster synchronization with the base station

and has lower hardware utilization when compared with the 3GPP-comma free CSD

scheme under the same design constraints.



Table of Contents

1.0 Introduction
2.0 Background
3.0 Cell Search Algorithm
    3.1 Synchronization Channels in W-CDMA
    3.2 Cell Search Algorithm
        3.2.1 Stage 1: Slot Synchronization
        3.2.2 Stage 2: Frame Synchronization and Code Group Identification
        3.2.3 Stage 3: Scrambling Code Identification
4.0 Improved Cell Search Design
    4.1 Stage 1: Slot Synchronization
    4.2 Stage 2: Frame Synchronization and Code Group Identification
    4.3 Stage 3: Scrambling Code Identification
        4.3.1 Scrambling Code Generator
        4.3.2 Descrambler
5.0 3GPP-comma free Cell Search Design
    5.1 Stage 2 of 3GPP-comma free Cell Search Design
    5.2 Reduced Length FHT Design
6.0 Experimental Method and Results
    6.1 Experimental Method
        6.1.1 FPGA Design Process
    6.2 Experimental Results
7.0 Summary, Conclusions and Future Work
    7.1 Summary
    7.2 Conclusions
    7.3 Future Work
8.0 References



List of Abbreviations

AMPS Advanced Mobile Phone Service

ASIC Application Specific Integrated Circuit

A/D Analog-to-Digital

AWGN Additive White Gaussian Noise

BS Base Station

Cp Primary Synchronization Code

Cssc Secondary Synchronization Code

Cs Cyclic Hierarchical Sequence

CLB Configurable Logic Block

CPICH Common Pilot Channel

D/A Digital-to-Analog

DFT Discrete Fourier Transform

DSP Digital Signal Processing

DS-CDMA Direct Sequence-Code Division Multiple Access

FHT Fast Hadamard Transformer

FPGA Field Programmable Gate Array

GIC Group Indicator Code

GPS Global Positioning System

GSM Global System for Mobile communication

LC Logic Cell

LFSR Linear Feedback Shift Register

LUT Look-Up Table

MS Mobile Station

PSC Primary Synchronization Code

P-SCH Primary Synchronization Channel

SSC Secondary Synchronization Code

SNR Signal-to-Noise Ratio



SCH Synchronization Channel

S-SCH Secondary Synchronization Channel

3G Third Generation

3GPP Third Generation Partnership Project

TIA Telecommunications Industry Association

W-CDMA Wideband-Code Division Multiple Access



List of Figures

1 DS-CDMA Transmitter-Receiver Block Level Diagram
2 Synchronization Channels in Cell Search
3 Hierarchical Matched Filter (64-chip and 4-symbol accumulation)
4 Hierarchical Matched Filter (16-chip and 16-symbol accumulation)
5 Slot Boundary Detection
6 Frame Synchronization and Code Group Identification
7 Scrambling Code Generator
8 Multiple Scrambling Code Generator
9 Scrambling Code Identification
10 Individual Stage of FHT
11 16 chip FHT
12 Hadamard Code Metrics (Butterfly Operation)
13 2-Slice Virtex-E CLB
14 Detailed View of Virtex-E Slice
15 Comparison of Improved CSD and 3GPP-comma free CSD, PFA = 10^-3
16 Comparison of Improved CSD and 3GPP-comma free CSD, PFA = 10^-4



List of Tables

1 Hierarchical Matched Filter (16 and 64-chip Accumulation)
2 Sequences X1,i and X2,i for Code Groups 1 to 32
3 Masking Functions used in Stage 3: Scrambling Code Generator
4 Allocations of SSCs for Secondary SCH
5 Timing Diagram of Inputs to FHT
6 Reduced Length Walsh Sequences (256 chip sequence to 16 chip sequence)
7 Hardware Specifications of System: Quantization 4 Input Data Bits
8 Hardware Specifications of FHT: 16 and 256 chip sequence



Chapter 1

Introduction

1.0 Introduction

First generation (1G) mobile communications systems were based on analog technol-

ogy and started in the early to mid 1980s. These 1G systems had a number of limitations

which included (1) low quality voice service, (2) limited capacity and (3) inability to pro-

vide global roaming.

Digital second generation (2G) systems were then developed in Europe and the US. These included (1) the Global System for Mobile communication (GSM), which uses time division multiple access (TDMA), in which each user is assigned a particular time slot; (2) the TDMA/136 specification, defined in the US in 1988 by the Telecommunications Industry Association (TIA) with the aim of digitizing the analog Advanced Mobile Phone Service (AMPS); and (3) IS-95, proposed in the US to provide better voice quality and higher capacity and based on CDMA technology. However, the different 2G technologies were not interoperable and were not available across all geographic areas. In addition, the low bit rate of 2G

systems could not meet subscriber demands for multimedia services. Third generation

(3G) systems aim to solve these problems encountered with 2G systems, by promising

global roaming across 3G standards, higher data rates, improved quality of service and



support for multimedia applications. The most popular candidates for 3G cellular systems

are CDMA2000 and Wideband-Code Division Multiple Access (W-CDMA) [1] [2]. Both

of these schemes are based on Direct Sequence-Code Division Multiple Access (DS-

CDMA) technology. In DS-CDMA, the data signals are directly modulated by a digital

code signal.

In a spread spectrum CDMA system, the transmitted signal is spread over a wide fre-

quency band that is wider than the minimum bandwidth required to transmit the informa-

tion being sent. In a typical scenario where there are multiple users or mobile stations

(MSs) in a cell, each user has a unique scrambling code. This scrambling code should be

such that it has low cross correlation properties with the other user codes. The signal

received by the MS from the transmitting base station (BS) is correlated with the user's

scrambling code. This despreads only the signal of that particular user whereas the other

spread spectrum signals will remain spread. A block diagram of a DS-CDMA transmitter

and receiver is shown in Figure 1. Spreading consists of multiplying the input data by a

scrambling code sequence whose bit rate is much higher than the data bit rate. At the

receiving side the signal is multiplied with the same scrambling code sequence that is

exactly synchronized to the received code sequence. The Encoding block shown in Figure

1 is used to add error correcting bits and to perform interleaving in order to protect infor-

mation bits from channel noise and interference. The reverse operations are performed in

the Decoding stage at the receiver.
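The spreading and despreading steps described above can be illustrated with a minimal sketch. It is not taken from the thesis: the function names `spread` and `despread`, the spreading factor of 8 and the two example codes are assumptions chosen for illustration, and the channel is assumed to be noiseless. Data bits and chips are represented as +1/-1.

```python
# Illustrative sketch only: names, codes and spreading factor are assumptions.

def spread(bits, code):
    """Multiply each +1/-1 data bit by every chip of the +1/-1 code."""
    return [b * c for b in bits for c in code]

def despread(chips, code):
    """Correlate each block of len(code) chips with the code and keep the sign."""
    n = len(code)
    out = []
    for i in range(0, len(chips), n):
        corr = sum(x * c for x, c in zip(chips[i:i + n], code))
        out.append(1 if corr >= 0 else -1)
    return out

# Two mutually orthogonal length-8 codes, one per user (zero cross correlation).
code_a = [1, 1, 1, 1, -1, -1, -1, -1]
code_b = [1, -1, 1, -1, 1, -1, 1, -1]

bits_a = [1, -1, 1]
bits_b = [-1, -1, 1]

# The receiver sees the sum of both users' spread signals.
received = [x + y for x, y in zip(spread(bits_a, code_a), spread(bits_b, code_b))]

# Correlating with one user's code despreads only that user's signal.
recovered_a = despread(received, code_a)
recovered_b = despread(received, code_b)
```

Because the two example codes have zero cross correlation, each correlation recovers only the bits of the matching user, which is the property the text asks of the user codes.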



The main difference between W-CDMA and CDMA2000 is that W-CDMA supports asyn-

chronous BSs whereas CDMA2000 relies on synchronized BSs. Synchronous CDMA

systems need an external time reference. A Global Positioning System (GPS) clock can

be used by all BSs to synchronize their operations. This allows the MS to use different

phases of the same scrambling code to distinguish between adjacent BSs. In an asynchronous CDMA system, each BS has an independent time reference, and the MS does not

have prior knowledge of the relative time difference between various BSs. The advantage

of asynchronous operation is that it eliminates the need to synchronize the BSs to an accu-

rate external timing source. However, since there is no external time synchronization

between the adjacent BSs, different phases of the same code cannot be used to distinguish

[Figure 1: DS-CDMA Transmitter-Receiver Block Level Diagram. Blocks shown: Data, Encoding, Scrambling Code Generator, D/A and Baseband (Transmitter); Baseband, A/D, Scrambling Code Generator, Scrambling Code Synchronization, Decoding and Data (Receiver).]


adjacent BSs. Thus, in an asynchronous CDMA system, adjacent BSs can only be identified by using distinct scrambling codes. Consequently, cell search, which involves the

process of achieving code, time and frequency synchronization of the MS with the BS,

takes longer in comparison to a synchronous CDMA system. Cell search is complicated by the presence of signals intended for other mobile stations within a cell as well as signals from other BSs. Thus, it is very important to develop algorithms and hardware implementations that perform cell search with low acquisition time and minimal hardware resources for asynchronous CDMA systems.

Cell search is performed according to the algorithm proposed by Wang et al. [3]. In the

proposed cell search algorithm, code and time synchronization is achieved assuming a

large frequency error and after achieving code and time synchronization, frequency syn-

chronization is performed. In this study we consider the problem of achieving code and

time synchronization. The process of achieving code and time synchronization in the cell

search algorithm for W-CDMA systems is divided into three stages: (1) slot synchronization, (2) frame synchronization and code group identification, and (3) scrambling code

identification. This thesis presents a 3G Partnership Project (3GPP) cell search design

using cyclic codes (Improved CSD) to achieve faster synchronization at lower hardware

complexity. The second part of this thesis compares the two design algorithms for per-

forming initial cell search: the Improved CSD and the 3GPP cell search design using

comma free codes (3GPP-comma free CSD) in terms of (1) acquisition time measure and

(2) hardware specifications on a Xilinx Virtex-E XCV1000E field programmable gate

array (FPGA). The thesis also proposes design improvements in stage 2 of the 3GPP-



comma free CSD beyond those proposed by Li et al. [4]. The 3GPP-comma free CSD

proposed in this thesis uses a Fast Hadamard Transformer (FHT) in stage 2 that achieves

lower hardware complexity and faster decoding. Furthermore, masking functions are used

in stage 3 of both the Improved CSD and the 3GPP-comma free CSD to reduce the num-

ber of scrambling code generators required as described in previous work [4]. This results

in a reduction in the ROM size required to store the initial phases of the scrambling code

generators in stage 3. The Improved CSD proposed in this thesis aims to achieve faster

synchronization between the MS and the BS and thus improves system performance. The

experiments carried out using accumulation over multiple slots in stage 1 indicate that for

an additive white Gaussian noise (AWGN) channel at a high signal-to-noise ratio, the

Improved CSD achieves faster synchronization with the BS and has lower hardware utili-

zation when compared with the 3GPP-comma free CSD scheme under the same design

constraints.

The thesis is organized as follows. Work done by other research groups and suggestions

by the 3GPP working group are presented in Chapter 2. Chapter 3 describes the synchro-

nization channels in W-CDMA cell search and introduces the three step cell search algo-

rithm used in W-CDMA for synchronization between the MS and the BS. Chapter 4

describes the Improved cell search design using cyclic codes proposed as a means of

achieving faster synchronization. Chapter 5 discusses the 3GPP cell search design using

comma free codes. Chapter 6 presents the experimental method and results of the compar-

ison of the two cell search algorithms on a Xilinx Virtex-E XCV1000E FPGA. Chapter 7

is a summary, discussion, and an overview of future directions of this research.



Chapter 2

Background

Cell search design is critical as it impacts the system performance and there is a need to

design efficient receiver structures and algorithms to reduce the cell search time. This

Chapter summarizes efforts by research groups and the 3GPP working groups to design

efficient schemes and algorithms for each of the three stages of the cell search algorithm.

2.0 Background

Wang et al. propose a pipelined process to be used in the first three stages of the cell search

algorithm [3]. The cell search scenarios considered in their study are (1) initial cell

search: when a mobile is switched on and (2) target cell search: during idle and active

modes of the MS. Instead of the serial cell search sequentially searching through code,

time and frequency, their method first acquires code and time synchronization assuming a

larger frequency error and then performs frequency synchronization [3] [5].

The synchronization code sequences used in stage 1 and stage 2 of the cell search algo-

rithm are made up of bits called "chips" which can be either +1 or -1. The synchronization

code sequences are 256 chips in length. If a traditional matched filter is used then a huge

adder circuit (256 input adder) will be required to sum up the correlation results. This will



lead to wastage of hardware resources. Hence, Siemens and Texas Instruments in their

working group draft have suggested a hierarchical matched filter design which uses two

matched filters to reduce the hardware complexity significantly [6]. The details of the

hierarchical matched filter design will be presented in Chapter 4.

The 3GPP specification uses comma free codes in stage 2 of the cell search algorithm

[7] [8]. Nortel Networks in their working group proposal have suggested the use of cyclic

codes in the SCHs [9]. The use of cyclic codes for generating the synchronization codes

will be explained in more detail in Chapter 4. These cyclic codes can reduce hardware uti-

lization and acquisition time if the receiver is properly designed.

To reduce the complexity of searching through all 512 scrambling codes, the concept of code grouping and group indicator codes (GIC) was introduced [10]. The scrambling code is identified by first detecting the code group; once the code group is known, the scrambling code used by the cell can be identified easily because each code group contains only a limited number of codes. This reduces the cell search time significantly. This idea was accepted in the 3GPP specifications. To further

reduce cell search time, frame boundary synchronization is also achieved in stage 2 after

identifying the code group and slot ID [11].

Ericsson in their working group draft have proposed increasing the number of code

groups in stage 2 of the cell search [12]. Increasing the number of code groups reduces

the number of scrambling codes in a code group. Their proposed scheme uses either 256,



128 or 64 code groups in stage 2 of the cell search. They claim that the scheme using 256

code groups is the preferred scheme as it requires only two scrambling code correlators in

stage 3 of initial cell search and achieves reduced hardware complexity.

In stage 2 of the 3GPP-comma free CSD presented in this thesis, an FHT design is proposed as a replacement for the Golay correlator presented by Li et al. [4]. An FHT provides an efficient technique to detect the code group and slot ID in stage 2. Previous FHT designs [13] [14] use substantial hardware resources; hence, a fast and efficient Hadamard transformer is needed to reduce the hardware utilization and to perform faster decoding. A compact and efficient FHT design will also draw less power from the handset.

Siemens in their working group draft have suggested the use of masking functions in

stage 3 to reduce the design complexity for generating the scrambling codes in parallel

[15]. The use of masking functions reduces the number of scrambling code generators

required to generate the codes in parallel. Any masking functions can be selected by the designer as long as the codes they generate have minimum overlap. The use of masking functions reduces the hardware significantly as compared to the previous design by Li et al.

[4].

Li et al. have designed an application specific integrated circuit (ASIC) for performing

cell search in W-CDMA systems [4]. In stage 1 and stage 2 of their cell search design the

authors use a correlator structure to detect the code group and slot ID. The correlator

structure used is a Golay correlator [16]. In stage 3 of the cell search algorithm, 16 scram-



bling code generators are used for generating the codes in parallel.

In summary, most of the literature in this area presents simulation results of the proposed algorithms and does not investigate the hardware complexity of the design schemes, with the exception of the work presented by Li et al. [4]. The designs used by mobile manufacturers are proprietary, and very few documents describe their actual design schemes. It is critical to consider a practical hardware implementation of the cell search algorithm, especially because chip area and power utilization are the two most important factors in a mobile handset.



Chapter 3

Cell Search Algorithm

3.0 Cell Search Algorithm

This Chapter describes the synchronization channels in W-CDMA cell search and intro-

duces the cell search algorithm used in the synchronization of the MS with the BS for W-

CDMA systems.

3.1 Synchronization Channels in W-CDMA

In CDMA systems, spreading codes are used to differentiate physical channels from the

same transmitter, and scrambling codes are used to differentiate transmitters. The MS

needs to achieve code and time synchronization with the BS before any communication

with the BS can start. The process of searching for a code and achieving synchronization

with the BS is called cell search. Cell search is performed in two scenarios: when an MS is switched on (initial cell search) and during active or idle mode (target cell search). Target cell search is used to find handover candidates during a call. Cell search design is important because cell search must be completed with minimum delay, as it impacts the system performance.

Each cell in a CDMA system is identified by its downlink scrambling code which is of

length 38,400 chips. The 38,400 chips form a radio frame which is divided into 15 slots.



Each slot in the radio frame is of 2,560 chips [7].

Figure 2 shows the slot and frame structure of the three synchronization channels used

in cell search: the Primary-Synchronization Channel (P-SCH), Secondary-Synchroniza-

tion Channel (S-SCH) and the Common Pilot Channel (CPICH) [7] [17]. The P-SCH

together with the S-SCH is also called the Synchronization Channel (SCH). In the P-SCH, a

256 chip sequence is transmitted at the start of each slot. The same P-SCH sequence is

used by all the BSs and is transmitted once every slot. As the same sequence is used by all

the transmitting stations, only one matched filter is sufficient to detect the slot boundary

value. To reduce the complexity of the matched filter implementation, a hierarchical

scheme is used as will be explained in detail in Chapter 4. The S-SCH is used for carrying

15 different sequences, one in each slot, for the different code groups and is repeated after

every frame. These sequences are used in identifying the code group. The CPICH is used

[Figure 2: Synchronization Channels in Cell Search. One frame = 15 slots = 38,400 chips (10 msec); one slot = 2,560 chips (0.67 msec) = 10 CPICH symbols; the P-SCH and S-SCH sequences and each CPICH symbol are 256 chips (0.067 msec).]


to carry the downlink common pilot symbols scrambled by the scrambling code of the BS.

Each slot of this channel is divided into 10 symbols, each of 256 chips in length.
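The frame, slot and symbol dimensions given above are related by simple arithmetic, sketched below. The constant names are assumptions introduced for illustration; the numerical values (38,400 chips, 15 slots, 10 msec, 256 chips per symbol) come from the text and Figure 2.

```python
# Illustrative arithmetic; constant names are assumptions, values are from the text.
CHIPS_PER_FRAME = 38_400
SLOTS_PER_FRAME = 15
FRAME_DURATION_MS = 10.0
CHIPS_PER_SYMBOL = 256

chips_per_slot = CHIPS_PER_FRAME // SLOTS_PER_FRAME      # chips in one slot
symbols_per_slot = chips_per_slot // CHIPS_PER_SYMBOL    # CPICH symbols in one slot

# 38,400 chips every 10 msec, expressed in million chips per second (Mcps).
chip_rate_mcps = CHIPS_PER_FRAME / FRAME_DURATION_MS / 1000.0

slot_duration_ms = FRAME_DURATION_MS / SLOTS_PER_FRAME
symbol_duration_ms = slot_duration_ms / symbols_per_slot
```

The resulting chip rate of 3.84 Mcps follows directly from the frame length and frame duration, and the slot and symbol durations round to the 0.67 msec and 0.067 msec shown in Figure 2.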

To reduce the complexity of synchronizing to the BSs in W-CDMA, the concept of code

grouping and the use of code group indicator codes (GIC) were introduced [10]. The 512

scrambling codes used in W-CDMA are divided into code groups. After the code group is

identified then only the scrambling code used by the cell needs to be detected. The num-

ber of possible scrambling codes from which one code needs to be identified depends on

how many code groups are selected in stage 2 of the design. For example, if 32 code

groups are used in stage 2 then the number of candidate scrambling codes in stage 3 is 16. Similarly, if 64 code groups are used then there will be 8 possible scrambling codes. Although the number of scrambling codes is fixed at 512, the number of code groups can be

increased from 32 to 256 [12]. The complexity is further reduced by combining frame

synchronization and code group identification in stage 2 of the cell search algorithm [11].
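The trade-off between the number of code groups and the number of scrambling codes left to test in stage 3 can be tabulated with a short sketch. The helper name `codes_per_group` is an assumption; the total of 512 codes and the group counts are taken from the text.

```python
# Illustrative sketch; the helper name is an assumption.
TOTAL_SCRAMBLING_CODES = 512

def codes_per_group(num_groups):
    """Number of candidate scrambling codes left for stage 3 per code group."""
    if TOTAL_SCRAMBLING_CODES % num_groups != 0:
        raise ValueError("number of groups must divide 512 evenly")
    return TOTAL_SCRAMBLING_CODES // num_groups

# Candidate count for each number of code groups discussed in the text.
candidates = {groups: codes_per_group(groups) for groups in (32, 64, 128, 256)}
```

With 256 code groups only two candidates remain per group, while 32 code groups leave 16 candidates to be correlated in stage 3.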

3.2 Cell Search Algorithm

The process of achieving code and time synchronization in the cell search algorithm is

divided into three stages: (1) slot synchronization, (2) frame synchronization and code

group identification, and (3) scrambling code identification [3] [7] [8] [18].

3.2.1 Stage 1: Slot Synchronization



During stage 1 of the cell search procedure the MS uses the SCH's Primary Synchronization Code (PSC) to acquire slot synchronization to a cell. This is typically done with a

single matched filter matched to the PSC which is common to all cells. The slot timing of

the cell can be obtained by detecting peak values in the matched filter output. The starting

position of the synchronization code may be determined from observations over one slot

duration. However, decisions based on observations over a single slot may be unreliable

when the signal-to-noise ratio (SNR) is low or if fading is severe. Reliable slot synchroni-

zation is required to minimize cell search time. In order to increase reliability, observa-

tions are made over multiple slots and the results are then combined. This ensures that the

correct slot boundary is identified.
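Combining observations over multiple slots can be sketched as follows. This is a simplified illustration, not the thesis design: the function names, the toy slot length of 8 positions (instead of 2,560 chips) and the example values are assumptions, and the matched filter output magnitudes are simply added position by position.

```python
# Illustrative sketch; names, slot length and values are assumptions.

def accumulate_slots(profiles):
    """Add matched filter output magnitudes position by position across slots."""
    total = [0.0] * len(profiles[0])
    for profile in profiles:
        for k, value in enumerate(profile):
            total[k] += abs(value)
    return total

def slot_boundary(profiles):
    """Return the position with the largest accumulated magnitude."""
    total = accumulate_slots(profiles)
    return max(range(len(total)), key=total.__getitem__)

# Matched filter outputs for three slots; the true boundary is at position 5.
profiles = [
    [1, 0, 2, 6, 1, 5, 0, 1],   # a noise peak at position 3 exceeds the true peak
    [0, 1, 1, 0, 2, 6, 1, 0],
    [2, 0, 1, 1, 0, 7, 1, 2],
]

single_slot_decision = slot_boundary(profiles[:1])
multi_slot_decision = slot_boundary(profiles)
```

In this example a decision based on the first slot alone selects the noise peak, whereas the accumulated result selects the true boundary, because the true peak recurs at the same position in every slot and the noise peaks do not.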

3.2.2 Stage 2: Frame Synchronization and Code Group Identification

During stage 2 of the cell search procedure, the MS uses the SCH's Secondary Synchronization Code (SSC) to achieve frame synchronization and identify the code group of the

cell found in stage 1. This is done by correlating the received signal with all possible SSC

sequences and identifying the maximum correlation value. Since the cyclic shifts of the

sequences are unique, the code group as well as the frame synchronization is determined.
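The idea that unique cyclic shifts reveal both the code group and the frame timing can be sketched with a toy codebook. Everything below is hypothetical: the codebook has two groups of length 5 rather than the actual SSC allocation over 15 slots, the function name is an assumption, and counting matching positions stands in for the correlation metric.

```python
# Illustrative sketch with a hypothetical toy codebook (not the 3GPP SSC table).

def identify_group_and_shift(received, codebook):
    """Return (group index, cyclic shift) whose shifted sequence best matches."""
    best = None
    n = len(received)
    for group, seq in enumerate(codebook):
        for shift in range(n):
            candidate = seq[shift:] + seq[:shift]
            score = sum(1 for r, c in zip(received, candidate) if r == c)
            if best is None or score > best[0]:
                best = (score, group, shift)
    return best[1], best[2]

# All cyclic shifts of both sequences are distinct from one another.
codebook = [
    [1, 1, 2, 2, 3],   # code group 0
    [1, 2, 1, 3, 3],   # code group 1
]

# The MS starts observing two slots into the frame of a cell in group 1.
received = [1, 3, 3, 1, 2]
group, shift = identify_group_and_shift(received, codebook)
```

The returned shift is the offset of the first observed slot within the frame, so the frame boundary follows directly from it.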

3.2.3 Stage 3: Scrambling Code Identification

During stage 3 of the cell search procedure, the MS determines the exact primary scram-

bling code used by the cell. The primary scrambling code is typically identified through



symbol-by-symbol correlation over the CPICH with all codes within the code group iden-

tified in stage 2. In this stage, a threshold value is used to decide whether the code has

been identified. The threshold value can be predetermined from a parameter called the probability of false alarm [19].
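One common way to relate a threshold to a false alarm probability is sketched below. It is not necessarily the method of [19]: it assumes that, when the wrong code is tested, the correlator output Y is zero-mean complex Gaussian noise with power `noise_power`, so that |Y|^2 is exponentially distributed and P(|Y|^2 > T) = exp(-T / noise_power). The function names are assumptions.

```python
import math

# Illustrative sketch under a complex Gaussian noise assumption (see lead-in).

def threshold_for_pfa(noise_power, pfa):
    """Threshold T on |Y|^2 giving false alarm probability pfa for noise only."""
    if not 0.0 < pfa < 1.0:
        raise ValueError("pfa must lie strictly between 0 and 1")
    return -noise_power * math.log(pfa)

def false_alarm_probability(noise_power, threshold):
    """Probability that noise alone exceeds the threshold."""
    return math.exp(-threshold / noise_power)

# Thresholds for the two false alarm probabilities compared in the thesis figures.
t_1e3 = threshold_for_pfa(1.0, 1e-3)
t_1e4 = threshold_for_pfa(1.0, 1e-4)
```

Under this assumption a lower false alarm probability requires a higher threshold, which in turn makes a correct code harder to declare at low SNR.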

This three stage cell search algorithm helps in simplifying the synchronization process

of the MS with the BS. Each stage and its hardware implementation will be explained

in the following Chapters.



Chapter 4

Improved Cell Search Design

4.0 Improved Cell Search Design

This Chapter describes the Improved CSD using a set of cyclic codes. The cyclic codes were proposed by Nortel Networks for use on the Secondary SCH [9]. These cyclic codes allow very efficient detection and improve the cell search in terms of acquisition time and hardware utilization. The three stages of the cell search design and their hardware implementation are explained in Sections 4.1, 4.2 and 4.3.

4.1 Stage 1: Slot Synchronization

The MS first needs to acquire the PSC, which is common to all the BSs. This code is of length 256 chips. The matched filter output is given by

$$Y = \sum_{j=0}^{255} R_j \, C_{pj} \qquad (1)$$

where $R_j$ is the jth sample of the received complex signal, and $C_{pj}$ is the jth bit of the PSC.

Hence, a traditional matched filter implementation would require 256 taps and a large

adder circuit. This would increase the delay as well as power consumption at the receiver

which is not desirable. Thus, a hierarchical structure is proposed for performing the

matched filter operations, which needs fewer taps, less circuitry and

lower power consumption [6]. The PSC consists of an unmodulated hierarchical sequence

of length 256 chips, transmitted once every slot. The PSC is the same for every BS in the

system and is transmitted time aligned with the slot boundary. The PSC is chosen to have

good auto-correlation properties. This means that when the PSC sequence is correlated

with itself, the interference from adjacent BSs is minimized and a high peak value is

obtained.

The hierarchical sequences used for generating the PSC are constructed from two con-

stituent sequences X1 and X2 of length n1 and n2, respectively, using the following equa-

tion

Cp(n)=X1(n mod n2)+X2(n div n1) modulo 2, n=0,1,..,(n1*n2)-1 (2)

where n1=n2=16.

The constituent sequences X1 and X2 are both defined as:

X1=X2=(1,1,-1,-1,-1,-1,1,-1,1,1,-1,1,1,1,-1,1) [9].
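The construction in equation (2) can be sketched as follows. Because the constituent sequences are given here as +1/-1 chips, the sketch assumes the usual mapping of binary 0 to +1 and binary 1 to -1, under which modulo 2 addition becomes multiplication of chips. The function name `hierarchical_sequence` is an assumption.

```python
# Illustrative sketch; assumes modulo 2 addition maps to a product of +1/-1 chips.
X1 = [1, 1, -1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, 1, -1, 1]
X2 = list(X1)

def hierarchical_sequence(x1, x2):
    """Build Cp(n) from X1(n mod n2) and X2(n div n1) for n = 0..n1*n2-1."""
    n1, n2 = len(x1), len(x2)
    if n1 != n2:
        raise ValueError("this sketch assumes n1 == n2, as in the text")
    return [x1[n % n2] * x2[n // n1] for n in range(n1 * n2)]

psc = hierarchical_sequence(X1, X2)

# Correlating the sequence with itself at zero lag gives the peak value.
zero_lag_peak = sum(c * c for c in psc)
```

Each block of 16 chips is a copy of X1 whose sign is set by one chip of X2, which is what allows the matched filter to be split into two 16-tap stages.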

There are different techniques in which the hierarchical matched filter can be designed

as shown in Table 1.

Table 1: Hierarchical Matched Filter (16 and 64 chip Accumulation)

                 16 chip       16 symbol     64 chip       4 symbol
                 Accumulator   Accumulator   Accumulator   Accumulator
Register Taps    16            16            64            4
Adder Length     16            16            64            4

The hierarchical matched filter consists of two concatenated matched filter blocks. The design using 64 taps is shown in Figure 3. This solution is not ideal because of the following reasons. First, the matched filter design requires 64 taps. Second, the design needs a 64-input adder as shown in Figure 3. A better solution is to use the design shown in Figure 4. Hence, in stage 1 of both the Improved CSD and the 3GPP-comma free CSD the hierarchical matched filter using 16 chip and 16 symbol accumulation is used.

[Figure 3: Hierarchical Matched Filter (64 chip and 4 symbol accumulation). Labels shown: InData, ShiftRegister 1, ShiftRegister 2, PSCH Code, Adder Tree 1, Adder Tree 2, Result.]


In this design, the first matched filter receives the input signals serially from the BS. Correlation over X1 (16 chip accumulation) is performed before correlation over X2 (16 symbol accumulation). However, the two matched filters can be interchanged; the order is an implementation option. After 16 clock cycles, when shift register 1 is filled, the data stored in shift register 1 is matched in parallel with the code applied to the taps of the matched filter (the tap coefficients). The tap coefficients are the PSC sequence, which is the same for all the BSs. Hence, the same matched filter structure can be used for all the BSs. The adder circuit is implemented as a tree structure with the 16 inputs applied in parallel. If the data bits in shift register 1 match the tap coefficients, then the result of the adder tree will be the highest value possible (16 or greater). The second matched filter has a shift register 2 of 256 registers. Only 16 taps are needed, to match every sixteenth value of shift register 2. The result from the first adder tree is stored in shift register 2 of the second matched filter. After 256 clock cycles, shift register 2 will be filled with the results from the first matched filter. The data in shift register 2 is then matched in parallel with the tap coefficients. The tap coefficients are the same as the PSC sequence. If the data bits match the code sequence, then the result of the second adder tree will be 256 or greater in magnitude, corresponding to the peak value. An advantage of this scheme is that no multiplier circuit is needed, as the correlations can be performed using an adder/subtractor circuit.

Figure 4: Hierarchical Matched Filter (16 chip and 16 symbol accumulation) [diagram: InData, ShiftRegister 1, Adder Tree 1, ShiftRegister 2, Adder Tree 2, PSCH Code, Result]
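The two-stage correlation described above can be sketched in a few lines. This is a minimal illustration, not the thesis hardware: it assumes a 256-chip code built from one 16-chip inner and one 16-chip outer sequence, and it borrows the code group 1 sequence from Table 2 as a stand-in for the PSC constituent sequences (which are defined elsewhere in the thesis). The function names are hypothetical.

```python
# Stand-in 16-chip sequence (code group 1 of Table 2); the real PSC
# constituent sequences are an assumption here.
BASE = [1, 1, 1, -1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, 1, 1]

def build_code(inner, outer):
    """256-chip hierarchical code: the inner sequence repeated, signed by the outer one."""
    n1 = len(inner)
    return [inner[n % n1] * outer[n // n1] for n in range(n1 * len(outer))]

def direct_correlate(samples, code):
    """Conventional matched filter: len(code) taps per output sample."""
    span = len(code)
    return [sum(samples[t + n] * code[n] for n in range(span))
            for t in range(len(samples) - span + 1)]

def hierarchical_correlate(samples, inner, outer):
    """Stage 1 correlates 16 consecutive chips; stage 2 taps every 16th stage-1 result."""
    n1, n2 = len(inner), len(outer)
    stage1 = [sum(samples[t + k] * inner[k] for k in range(n1))
              for t in range(len(samples) - n1 + 1)]
    span = n1 * n2
    return [sum(stage1[t + n1 * b] * outer[b] for b in range(n2))
            for t in range(len(samples) - span + 1)]

code = build_code(BASE, BASE)
samples = [0] * 40 + code + [0] * 24      # code embedded at chip offset 40
direct = direct_correlate(samples, code)
hier = hierarchical_correlate(samples, BASE, BASE)
peak_offset = max(range(len(hier)), key=lambda t: hier[t])
print(peak_offset, hier[peak_offset])
```

Both correlators give identical outputs, but the hierarchical one needs only 16+16 accumulations per output sample because the stage-1 results are shared between neighbouring outputs.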

Each memory cell in shift register 1 is 4 bits wide, assuming that the signal at the input to the digital receiver is sampled with a 4-bit analog-to-digital (A/D) converter. Shift register 2 is 8 bits wide to store the result from the first adder tree block. To perform the correlation, it is not necessary to perform 16*16 operations but only 16+16 accumulation operations, which leads to a considerable reduction in hardware complexity. The hardware complexity of the hierarchical matched filter is calculated as follows. In one slot period (2,560 chips), the receiver has to perform at least 81,920 complex additions (2,560*(16+16)). The traditional matched filter implementation without the hierarchical structure would require 256 complex additions per chip position, i.e. 655,360 (2,560*256) per slot. Thus, the hierarchical matched filter achieves a saving of a factor of 8 in terms of complex additions. From Figure 2, each slot has a duration of 0.67 msec (670 µsec). The complexity of stage 1 in terms of real additions per second is 245 Madds/sec (81,920*2/670 µsec). The incoming complex signal is divided into two components, the cosine part called the "in-phase" (I-phase) and the sine part called the "quadrature-phase" (Q-phase). The factor of 2 is for the two branches I and Q of the complex signal. Thus, stage 1 of the initial cell search requires 81,920 complex additions per slot and a computing power of 245 Madds/sec.
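The complexity figures quoted above can be reproduced with a short calculation. The sketch below only restates the arithmetic of the text; the variable names are illustrative.

```python
CHIPS_PER_SLOT = 2560
SLOT_DURATION_S = 670e-6          # 0.67 msec per slot

# Hierarchical filter: 16 + 16 accumulations per chip position.
hier_adds_per_slot = CHIPS_PER_SLOT * (16 + 16)
# Conventional 256-tap filter: 256 accumulations per chip position.
direct_adds_per_slot = CHIPS_PER_SLOT * 256

saving_factor = direct_adds_per_slot / hier_adds_per_slot
# Factor of 2 for the I and Q branches of the complex signal.
real_adds_per_sec = hier_adds_per_slot * 2 / SLOT_DURATION_S

print(hier_adds_per_slot, direct_adds_per_slot, saving_factor)
print(round(real_adds_per_sec / 1e6), "Madds/sec")
```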

There are two such hierarchical matched filters, one each for the I and Q channels of the received complex signal, as shown in Figure 5. The correlation results over the I and Q channels are combined non-coherently over one slot duration, and the result is stored in an accumulator which is implemented as a shift register. The output of the accumulator is given to a comparator block, which detects the peak value corresponding to the slot boundary of the closest BS. The MS needs to synchronize with this BS. As the code can be affected by AWGN and fading, accumulation over multiple slots is needed to identify the slot boundary correctly. Correct identification of the slot boundary is important because passing a wrong slot boundary to stage 2 increases the acquisition time.
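As a rough sketch of the non-coherent combining and peak detection described above, the function below accumulates I^2 + Q^2 per chip offset over all available slots and returns the offset with the largest energy. The toy slot length of 10 and the sample values are made up for illustration; a real slot has 2,560 chip offsets.

```python
def detect_slot_boundary(corr_i, corr_q, slot_len):
    """Accumulate I^2 + Q^2 per chip offset over all slots and return the peak offset."""
    energy = [0] * slot_len
    for t, (i_val, q_val) in enumerate(zip(corr_i, corr_q)):
        energy[t % slot_len] += i_val * i_val + q_val * q_val
    return max(range(slot_len), key=lambda offset: energy[offset])

SLOT_LEN = 10                      # toy value, not the real 2,560
corr_i = [1] * (3 * SLOT_LEN)      # three slots of matched filter output
corr_q = [0] * (3 * SLOT_LEN)
for slot in range(3):
    corr_i[slot * SLOT_LEN + 4] = 6    # true boundary at offset 4 in every slot
corr_q[7] = 8                          # one-off noise spike in the first slot

one_slot = detect_slot_boundary(corr_i[:SLOT_LEN], corr_q[:SLOT_LEN], SLOT_LEN)
three_slots = detect_slot_boundary(corr_i, corr_q, SLOT_LEN)
print(one_slot, three_slots)
```

With a single slot the noise spike wins; after accumulating three slots the repeated peak at the true boundary dominates, which is the reason given in the text for multi-slot accumulation.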


Figure 5: Slot Boundary Detection [diagram: I-Phase and Q-Phase InData, ShiftRegister 1, Adder Tree 1, ShiftRegister 2, Adder Tree 2, PSCH Code, Non-Coherent Detection Block ((.)^2 + (.)^2), Accumulator, Comparator, Slot Boundary Value, Stage 1 Complete]

4.2 Stage 2: Frame Synchronization and Code Group Identification

The Secondary SCH consists of 15 sequences belonging to a family of cyclic codes

(SSCs), each of length 256 chips. These SSCs are transmitted repeatedly in parallel with

the Primary SCH. The procedure for constructing the cyclic codes is similar to that of the

hierarchical sequence (equation 2) for the Primary SCH except that it uses specific

sequences of length 16 from Table 2 for each code group.

The procedure for constructing the cyclic hierarchical sequence Cs_{i,1} for slot 1 is exactly the same as that for constructing the hierarchical sequence Cp for the Primary SCH. The sequence Cs_{i,1} for slot 1 will be referred to as the zero cyclic shift sequence, as no shift is applied to the constituent sequence X1_i. For slots 2 to 15, the cyclic codes are constructed from the two constituent sequences X1_{i,k-1} and X2_{i,k-1}, of length n1 and n2 respectively, using the following formula

$$Cs_{i,k}(n) = X2_{i,k-1}(n \bmod n_2) + X1_{i,k-1}(n \;\mathrm{div}\; n_1) \pmod 2, \qquad n = 0, 1, \ldots, n_1 n_2 - 1 \tag{3}$$

where i is the code group number,

k=2,3,..,15 is the slot number,

n is the chip number in the slot, n1=n2=16, and

the constituent sequences X1_{i,k-1} and X2_{i,k-1} in each code group i are chosen to be the following sequences from Table 2 [9].



The constituent sequence X2_{i,k-1} (inner sequence) is exactly equal to the base sequence X2_i in every slot, i.e. X2_{i,k-1}=X2_i for all k. The constituent sequence X1_{i,k-1} (outer sequence) is formed from the base sequence X1_i by a cyclic right (clockwise) shift of X1_i by k-1 positions (from 0 to 14) for each slot number k, from 1 to 15. The generation of the cyclic codes can be understood clearly by considering the following example.

For the first code group the sequence is given by

X1_{1,0}=(1,1,1,-1,-1,-1,1,-1,-1,1,1,-1,1,-1,1,1), k=1 for slot 1, no cyclic shift

X1_{1,1}=(1,1,1,1,-1,-1,-1,1,-1,-1,1,1,-1,1,-1,1), k=2 for slot 2, cyclic right shift by 1 position

X1_{1,14}=(1,-1,-1,-1,1,-1,-1,1,1,-1,1,-1,1,1,1,1), k=15 for slot 15, cyclic right shift by 14 positions.
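The worked example above can be checked with a small sketch of the cyclic right shift. The helper names are hypothetical; the sequence is the code group 1 entry quoted in the example.

```python
X1_GROUP1 = [1, 1, 1, -1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, 1, 1]

def cyclic_right_shift(seq, shift):
    """Rotate seq to the right by `shift` positions."""
    shift %= len(seq)
    if shift == 0:
        return list(seq)
    return seq[-shift:] + seq[:-shift]

def outer_sequence(base, slot):
    """Outer sequence for slot number k = 1..15: base shifted right by k-1."""
    return cyclic_right_shift(base, slot - 1)

for slot in (1, 2, 15):
    print(slot, outer_sequence(X1_GROUP1, slot))
```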

Table 2: Sequences X1i and X2i for Code Groups 1 to 32

Code Group   Sequence
1    1 1 1 -1 -1 -1 1 -1 -1 1 1 -1 1 -1 1 1
2    1 -1 1 1 -1 1 1 1 -1 -1 1 1 1 1 1 -1
3    1 1 -1 1 -1 -1 -1 1 -1 1 -1 1 1 -1 -1 -1
4    1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 1 1 -1 1
5    1 1 1 -1 1 1 -1 1 -1 1 1 -1 -1 1 -1 -1
6    1 -1 1 1 1 -1 -1 -1 -1 -1 1 1 -1 -1 -1 1
7    1 1 -1 1 1 1 1 -1 -1 1 -1 1 -1 1 1 1
8    1 -1 -1 -1 1 -1 1 1 -1 -1 -1 -1 -1 -1 1 -1
9    1 1 -1 1 -1 -1 -1 1 1 -1 1 -1 -1 1 1 1
10   1 -1 -1 -1 -1 1 -1 -1 1 1 1 1 -1 -1 1 -1
11   -1 1 -1 -1 -1 -1 -1 1 1 1 1 -1 1 -1 1 -1
12   -1 -1 -1 1 -1 1 -1 -1 1 -1 1 1 1 1 1 1
13   1 -1 -1 -1 1 -1 -1 1 -1 -1 -1 1 1 1 1 -1
14   1 1 -1 1 1 1 -1 -1 -1 1 -1 -1 1 -1 1 1
15   1 -1 -1 -1 -1 1 1 -1 1 1 1 -1 1 1 1 -1
16   1 1 -1 1 -1 -1 1 1 1 -1 1 1 1 -1 1 1
17   1 -1 1 1 -1 1 -1 1 1 1 -1 1 1 1 -1 1
18   1 1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1
19   1 -1 -1 -1 1 -1 -1 1 1 1 1 -1 -1 -1 -1 1
20   1 1 -1 1 1 1 -1 -1 1 -1 1 1 -1 1 -1 -1
21   -1 -1 -1 1 1 -1 -1 1 1 -1 1 -1 1 1 -1 -1
22   -1 1 -1 -1 1 1 -1 -1 1 1 1 1 1 -1 -1 1
23   -1 -1 1 -1 1 -1 1 -1 -1 1 1 -1 -1 -1 -1 -1
24   -1 1 1 1 1 1 1 1 -1 -1 1 1 -1 1 -1 1
25   -1 1 1 1 -1 -1 1 1 1 -1 -1 -1 -1 -1 -1 1
26   -1 -1 1 -1 -1 1 1 -1 1 1 -1 1 -1 1 -1 -1
27   -1 1 1 1 1 1 -1 -1 1 -1 -1 -1 1 1 1 -1
28   -1 -1 1 -1 1 -1 -1 1 1 1 -1 1 1 -1 1 1
29   -1 1 -1 -1 1 1 1 1 1 -1 1 1 1 1 -1 1
30   -1 -1 -1 1 1 -1 1 -1 1 1 1 -1 1 -1 -1 -1
31   -1 1 1 1 -1 -1 1 1 -1 1 1 1 1 1 1 -1
32   -1 -1 1 -1 -1 1 1 -1 -1 -1 1 -1 1 -1 1 1



The same procedure for forming the cyclic codes is used for the other code groups. Thus, for the 32 code groups and the 16 possible cyclic shifts of a length-16 sequence (15 of which are used for the 15 slots in one frame), 512 different cyclic codes with a length of 256 chips each are constructed. In other words, each of the 32 code groups has 16 cyclic codes. This set of 512 (32X16) cyclic codes has good correlation properties that make its members good candidates for the SSCs. Many pairs of cyclic codes are fully orthogonal, as their cross correlation is zero, while some pairs have a small cross correlation. The cross correlation of each cyclic hierarchical sequence Cs_{i,k} with the Cp code of the Primary SCH is small. Each of these 512 cyclic codes is unique to one code group/slot location pair. Thus, it is possible to uniquely determine both the scrambling code group and the frame timing in the second stage of the initial cell search.

By identifying the code group/slot location pair that gives the maximum correlation value, both the code group and the frame synchronization are determined. The output from the matched filter is given to a non-coherent block, which computes the energy over the I and Q channels and passes the result to the comparator module as shown in Figure 6. A search period of one slot (2,560 chips) is enough to uniquely identify the correct code group and the frame timing in the second stage of acquisition when the signal-to-noise ratio is high. This is one major difference from the 3GPP-comma free CSD, where at least three slots are necessary to uniquely identify the correct code group and frame timing. The Improved CSD also uses a smaller ROM, of size 32X16, to store the cyclic codes, compared to the 3GPP-comma free CSD, which uses a ROM of size 32X60 to store the comma free codes.
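The maximum-correlation search over code group/slot pairs can be sketched as follows. This is an illustration under stated assumptions: only code groups 1 to 4 of Table 2 are included, the same Table 2 row is used for both X1_i and X2_i, the modulo 2 sum of equation 3 is written as a product of +1/-1 values, and the received slot is noise free. The function names are hypothetical.

```python
# First four rows of Table 2 (the full design would hold all 32 in ROM).
TABLE2 = {
    1: [1, 1, 1, -1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, 1, 1],
    2: [1, -1, 1, 1, -1, 1, 1, 1, -1, -1, 1, 1, 1, 1, 1, -1],
    3: [1, 1, -1, 1, -1, -1, -1, 1, -1, 1, -1, 1, 1, -1, -1, -1],
    4: [1, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1, -1, 1, 1, -1, 1],
}

def right_shift(seq, shift):
    shift %= len(seq)
    return seq[-shift:] + seq[:-shift] if shift else list(seq)

def cyclic_code(group, slot):
    """256-chip code for (group, slot): fixed inner sequence, outer shifted by slot-1."""
    inner = TABLE2[group]                 # assumption: X2_i equals the Table 2 row
    outer = right_shift(inner, slot - 1)  # assumption: X1_i equals the same row
    return [inner[n % 16] * outer[n // 16] for n in range(256)]

def identify(samples):
    """Return (metric, group, slot) of the best-correlating code."""
    best = None
    for group in TABLE2:
        for slot in range(1, 16):
            metric = sum(s * c for s, c in zip(samples, cyclic_code(group, slot)))
            if best is None or metric > best[0]:
                best = (metric, group, slot)
    return best

received = cyclic_code(3, 7)              # noise-free slot 7 of code group 3
print(identify(received))
```

A receiver following the text would use the I and Q energies rather than a signed correlation, and would try the candidates at the fast clock rate while the buffered slot is held.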



The input data samples for the Secondary SCH are stored in an input buffer with 256 complex memory cells, called the Secondary Buffer, as shown in Figure 6. These input data samples are produced after waveform matched filtering and sampling at the chip rate. The result from the hierarchical matched filter design is then given to a non-coherent module, which calculates the energy over the I and Q channels and passes it to a comparator block.

The ROM-stored code sequences given in Table 2 are each tried in succession before the data from the next slot comes in. The data in the shift register is latched until all these sequences have been correlated. This is achieved in stage 2 of the Improved CSD scheme using two clocks: a slow clock, called the system clock in the design, and a fast clock, which runs at 5X the system clock. The sampling is performed at the slow clock rate (system clock). Once the data is latched in the buffer, the fast clock (5X system clock) is used to perform the correlations.

Figure 6: Frame synchronization and Code Group Identification [diagram: Sampling Counter, Secondary Buffer (1 to 256), Slot Boundary Value, Enable Stage 1 Complete, Matched Filter 1 and Matched Filter 2 (Shift Register, Code Register, 3 levels adder tree, 5X SysClock), ROM 32 X 16 holding the Cyclic Codes, I-Phase, Q-Phase, Non-coherent Detection Block ((.)^2 + (.)^2), Comparator, Code Group, Slot ID, Stage 2 Complete]

The comparator block gives the code group from Table 2 that has the highest correlation with the data sequence, together with the number of shifts which have been applied to the code group sequence. The number of shifts is the same as the slot ID. From the slot ID, the frame boundary can easily be identified because the number of slots in a frame is fixed at 15.

4.3 Stage 3: Scrambling Code Identification

After achieving code group and frame synchronization, the scrambling code is identified

by correlating the symbols in the CPICH with all possible scrambling codes in the code

group. The codes are generated using a scrambling code generator and the descrambling

operation is carried out using a descrambler. The details of the scrambling code generator

and the descrambler used in stage 3 of the cell search are explained in Sections 4.3.1 and

4.3.2 respectively.

4.3.1 Scrambling Code Generator

Each cell is allocated one and only one primary scrambling code. The scrambling code sequences are constructed by combining two real sequences into a complex sequence [7]. Each of the two real sequences is constructed as the position wise modulo 2 sum of 38,400 chip segments of two binary sequences generated by means of two generator polynomials of degree 18. Let x and y be the two sequences respectively. The resulting sequences constitute segments of a set of Gold sequences. The x sequence is constructed using the primitive polynomial 1+X^7+X^18. The y sequence is constructed using the polynomial 1+X^5+X^7+X^10+X^18. The sequence depending on the chosen scrambling code number n is denoted as z_n. Furthermore, let x(i), y(i) and z_n(i) denote the ith symbol of the sequences x, y, and z_n, respectively. The sequences x and y are constructed as

$$x(i+18) = x(i+7) + x(i) \pmod 2, \qquad i = 0, 1, \ldots, 2^{18} - 20 \tag{4}$$

$$y(i+18) = y(i+10) + y(i+7) + y(i+5) + y(i) \pmod 2, \qquad i = 0, 1, \ldots, 2^{18} - 20 \tag{5}$$

The nth Gold code sequence z_n, n=0,1,..,2^18 - 2, is then defined as

$$z_n(i) = x\bigl((i+n) \bmod (2^{18} - 1)\bigr) + y(i) \pmod 2, \qquad i = 0, 1, \ldots, 2^{18} - 2 \tag{6}$$

Finally, the nth complex scrambling code sequence s_n is defined as

$$s_n(i) = z_n(i) + j\, z_n\bigl((i + 131{,}072) \bmod (2^{18} - 1)\bigr), \qquad i = 0, 1, \ldots, 38{,}399 \tag{7}$$

The pattern from phase 0 up to the phase of 38,399 is repeated for every radio frame.
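Equations 4 to 6 translate directly into a pair of shift-register recurrences. The sketch below is a software illustration only. The initial register contents are not given in this text; the values used (x(0)=1 with x(1) to x(17) zero, and y(0) to y(17) all ones) are taken from the 3GPP TS 25.213 specification and should be read as an assumption here. The function names are hypothetical.

```python
PERIOD = 2**18 - 1

def generate_x(length):
    """x(i+18) = x(i+7) + x(i) mod 2 (equation 4)."""
    x = [1] + [0] * 17            # assumed initial state (3GPP TS 25.213)
    for i in range(length - 18):
        x.append(x[i + 7] ^ x[i])
    return x[:length]

def generate_y(length):
    """y(i+18) = y(i+10) + y(i+7) + y(i+5) + y(i) mod 2 (equation 5)."""
    y = [1] * 18                  # assumed initial state (3GPP TS 25.213)
    for i in range(length - 18):
        y.append(y[i + 10] ^ y[i + 7] ^ y[i + 5] ^ y[i])
    return y[:length]

def gold_prefix(n, length):
    """First `length` symbols of z_n (equation 6), for n + length <= 2^18 - 1."""
    assert n + length <= PERIOD   # keeps the index below the wrap-around point
    x = generate_x(n + length)
    y = generate_y(length)
    return [x[i + n] ^ y[i] for i in range(length)]

print(generate_x(24))
print(generate_y(24))
print(gold_prefix(16, 8))
```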



The scrambling code generator used to generate the long codes is shown in Figure 7. A total of 2^18 - 1 = 262,143 scrambling codes, numbered 0,1,..,262,142, can be generated using the code generator. However, not all the scrambling codes are used. The scrambling codes are divided into 512 sets, each consisting of a primary scrambling code and 15 secondary scrambling codes. The primary scrambling codes consist of scrambling codes n=16*i, where i=0,1,..,511. The ith set of secondary scrambling codes consists of scrambling codes 16*i+k, where k=1,2,..,15. There is a one-to-one mapping between each primary scrambling code and the 15 secondary scrambling codes in a set, such that the ith primary scrambling code corresponds to the ith set of secondary scrambling codes. The set of primary scrambling codes is further divided into 32 scrambling code groups, each consisting of 16 primary scrambling codes. The jth scrambling code group consists of primary scrambling codes 16*16*j+16*k, where j=0,1,..,31 and k=0,1,..,15.
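The numbering scheme above is easy to verify with a few lines of code. The sketch assumes 32 groups of 16 primary codes, as used in this design, with k running over all 16 codes of a group; the function names are illustrative.

```python
def primary_codes_in_group(j, codes_per_group=16):
    """Primary scrambling code numbers 16*16*j + 16*k of the jth group."""
    return [16 * codes_per_group * j + 16 * k for k in range(codes_per_group)]

def secondary_codes(i):
    """The 15 secondary scrambling codes 16*i + k attached to the ith primary code."""
    return [16 * i + k for k in range(1, 16)]

all_primary = [n for j in range(32) for n in primary_codes_in_group(j)]
print(primary_codes_in_group(0))
print(primary_codes_in_group(31)[-1], len(all_primary))
print(secondary_codes(1))
```

The 32 groups together cover exactly the 512 primary codes n=16*i, i=0,1,..,511.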

Figure 7: Scrambling Code Generator [diagram: two 18-stage shift registers (stages 17 to 0) with modulo 2 adders, producing the I Channel Code and the Q Channel Code]

In stage 3, 16 scrambling codes need to be generated in parallel. If the scrambling code

generator shown in Figure 7 is used to generate the codes then 16 such code generators

would be required. However, generating the codes in parallel using 16 code generators

could be expensive as a huge ROM would be required to store the initial phases for all the

16 code generators.

Table 3: Masking Functions used in Stage 3: Scrambling Code Generator

          Masking Function For I     Masking Function For Q
          Channel Code in LFSR 1     Channel Code in LFSR 1
Code1     000000000000000001         001000000001010000
Code2     000000000000000010         010000000010100000
Code3     000000000000000100         100000000101000000
Code4     000000000000001000         000000001000000001
Code5     000000000000010000         000000010000000010
Code6     000000000000100000         000000100000000100
Code7     000000000001000000         000001000000001000
Code8     000000000010000000         000010000000010000

Figure 8: Multiple Scrambling Code Generator [diagram: LFSR 1, LFSR 2, Masking Functions for I Channel and Q Channel, ROM 32 X 18 holding the Initial Phases for the Code generator, I Channel Code and Q Channel Code outputs]

In order to reduce the hardware utilization, stage 3 of both designs uses only one scrambling code generator to generate 16 codes in parallel when 32 code groups are used, as shown in Figure 8. Sixteen masking functions are used to generate the codes in parallel [15]. Masking functions can generate codes which have minimum overlap and reduce the hardware circuitry to a single scrambling code generator at the expense of a few logic gates. The masking functions used for generating the codes are given in Table 3. The masking functions for the I and Q Channel Codes in linear feedback shift register (LFSR) 2 were kept fixed as 000000000000000001 and 001111111101100000. Besides reducing the hardware from 16 code generators to one code generator, the design also reduces the ROM size from 512X18, which would be needed if 16 code generators were used, to 32X18.
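A masking function selects a set of register stages whose modulo 2 sum is a shifted copy of the same sequence, so one LFSR can supply several phases at once. The sketch below illustrates this for LFSR 1 with two masks from Table 3. It assumes that the rightmost mask bit selects x(i) and the leftmost selects x(i+17), and that the register starts from x(0)=1 with the other stages zero; both are assumptions made for this illustration, and the helper names are hypothetical.

```python
def x_sequence(length):
    """x(i+18) = x(i+7) + x(i) mod 2, from an assumed initial state."""
    x = [1] + [0] * 17
    for i in range(length - 18):
        x.append(x[i + 7] ^ x[i])
    return x

def apply_mask(mask, x, i):
    """Modulo 2 sum of the stages x(i)..x(i+17) selected by the mask string."""
    out = 0
    for tap, bit in enumerate(reversed(mask)):   # rightmost bit selects x(i)
        if bit == "1":
            out ^= x[i + tap]
    return out

I_MASK_CODE4 = "000000000000001000"   # Table 3, Code4, I channel
Q_MASK_CODE1 = "001000000001010000"   # Table 3, Code1, Q channel

x = x_sequence(131072 + 64)
i_shift_ok = all(apply_mask(I_MASK_CODE4, x, i) == x[i + 3] for i in range(32))
q_shift_ok = all(apply_mask(Q_MASK_CODE1, x, i) == x[i + 131072] for i in range(32))
print(i_shift_ok, q_shift_ok)
```

Under these assumptions the Code4 I channel mask yields the sequence advanced by 3 chips, and the Code1 Q channel mask yields the sequence advanced by 131,072 chips, which is the offset that appears in equation 7.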

4.3.2 Descrambler

Descrambling is carried out using data over the CPICH and the codes generated by the scrambling code generator and masking functions. Counters are used, as shown in Figure 9, to keep track of the votes obtained after the descrambling and the comparison operations. After these operations are completed, the final step is to decide whether cell search has been successful and a code has been found. For this purpose, a parameter called the probability of false alarm rate (PFA) is used to predefine the threshold value (VTH) [19]. The relation can be expressed by the following equation

$$P_{FA} = e^{-V_{TH}/V} \tag{8}$$

where V is twice the variance of the I and Q components.

If the counter exceeds VTH, then the cell search operation is declared a success and the particular long code is identified.

Table 3 (continued): Masking Functions used in Stage 3: Scrambling Code Generator

          Masking Function For I     Masking Function For Q
          Channel Code in LFSR 1     Channel Code in LFSR 1
Code9     000000000100000000         000100000000100000
Code10    000000001000000000         001000000001000000
Code11    000000010000000000         010000000010000000
Code12    000000100000000000         100000000100000000
Code13    000001000000000000         000000001010000001
Code14    000010000000000000         000000010100000010
Code15    000100000000000000         000000101000000100
Code16    001000000000000000         000001010000001000
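Equation 8 can be inverted to obtain the threshold for a target false alarm probability, VTH = -V ln(PFA). The snippet below is a small numerical sketch of that relation; the value V = 2.0 is an arbitrary example, not a figure from the thesis.

```python
import math

def threshold(pfa, v):
    """Threshold VTH that gives false alarm probability pfa (inverse of equation 8)."""
    return -v * math.log(pfa)

def false_alarm_probability(vth, v):
    """Equation 8: PFA = exp(-VTH / V)."""
    return math.exp(-vth / v)

V = 2.0                                   # example value only
for pfa in (1e-2, 1e-3, 1e-4):
    print(pfa, round(threshold(pfa, V), 3))
```

A lower false alarm probability requires a higher threshold, which in turn tends to lengthen the acquisition time.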


Figure 9: Scrambling Code Identification [diagram: Multiple Scrambling Code Generator (ROM 32X18, Initial Phases, Masking Functions for I Channel and Q Channel), Descrambler 1 to Descrambler 16, counters 1 to 16, First Comparator Block, Second Comparator Block, Threshold, Increment Counter, Code Found]

Chapter 5

3GPP-comma free Cell Search Design

5.0 3GPP-comma free Cell Search Design

This Chapter discusses stage 2 of the 3GPP cell search design using comma free codes. Stage 1 and stage 3 of the 3GPP-comma free CSD were kept the same as in the Improved CSD in order to compare stage 2 of the two designs. A Fast Hadamard Transformer (FHT) is proposed for use in stage 2 of the cell search algorithm. To reduce the hardware utilization of the FHT design, reduced length Walsh sequences are proposed, as explained in Section 5.1.

5.1 Stage 2 of 3GPP-comma free Cell Search Design

In CDMA systems, the BS identifies each user in a cell by a unique scrambling code. In order to minimize the interference in a cell when two users transmit at the same time, orthogonal (Walsh) codes are used. The Walsh codes are generated using a Walsh-Hadamard function. When these Walsh codes are transmitted by the BS, they are affected by interference, fading and noise, which may be AWGN. At the receiver, decoding logic is required to determine which of the Walsh codes is the most likely to have been sent. An FHT can be used to provide such decoding circuitry.

The table provided in the 3GPP Specifications for the comma free codes is for 64 code groups. For comparison with the Improved CSD scheme, which uses 32 code groups, only 32 of the possible 64 code groups are used. The 32 secondary SCH sequences are constructed such that their cyclic shifts are unique, i.e., a non-zero cyclic shift less than 15 of any of the 32 sequences is not equivalent to some cyclic shift of any other of the 32 sequences. Also, a non-zero cyclic shift less than 15 of any of the sequences is not equivalent to itself with any other cyclic shift less than 15. Table 4 lists the sequences of SSCs used to encode the 32 different scrambling code groups [7].

Table 4: Allocation of SSCs for Secondary SCH

Scrambling Code Group    Slot number: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Group 0 1 1 2 8 9 10 15 8 10 16 2 7 15 7 16

Group 1 1 1 5 16 7 3 14 16 3 10 5 12 14 12 10

Group 2 1 2 1 15 5 5 12 16 6 11 2 16 11 15 12

Group 3 1 2 3 1 8 6 5 2 5 8 4 4 6 3 7

Group 4 1 2 16 6 6 11 15 5 12 1 15 12 16 11 2

Group 5 1 3 4 7 4 1 5 5 3 6 2 8 7 6 8

Group 6 1 4 11 3 4 10 9 2 11 2 10 12 12 9 3

Group 7 1 5 6 6 14 9 10 2 13 9 2 5 14 1 13

Group 8 1 6 10 10 4 11 7 13 16 11 13 6 4 1 16

Group 9 1 6 13 2 14 2 6 5 5 13 10 9 1 14 10

Group 10 1 7 8 5 7 2 4 3 8 3 2 6 6 4 5

Group 11 1 7 10 9 16 7 9 15 1 8 16 8 15 2 2

Group 12 1 8 12 9 9 4 13 16 5 1 13 5 12 4 8

Group 13 1 8 14 10 14 1 15 15 8 5 11 4 10 5 4

Group 14 1 9 2 15 15 16 10 7 8 1 10 8 2 16 9

Group 15 1 9 15 6 16 2 13 14 10 11 7 4 5 12 3

Group 16 1 10 9 11 15 7 6 4 16 5 2 12 13 3 14

Group 17 1 11 14 4 13 2 9 10 12 16 8 5 3 15 6

Group 18 1 12 12 13 14 7 2 8 14 2 1 13 11 8 11

Group 19 1 12 15 5 4 14 3 16 7 8 6 2 10 11 13

Group 20 1 15 4 3 7 6 10 13 12 5 14 16 8 2 11

Group 21 1 16 3 12 11 9 13 5 8 2 14 7 4 10 15

Group 22 2 2 5 10 16 11 3 10 11 8 5 13 3 13 8

Group 23 2 2 12 3 15 5 8 3 5 14 12 9 8 9 14

Group 24 2 3 6 16 12 16 3 13 13 6 7 9 2 12 7

Group 25 2 3 8 2 9 15 14 3 14 9 5 5 15 8 12

Group 26 2 4 7 9 5 4 9 11 2 14 5 14 11 16 16

Group 27 2 4 13 12 12 7 15 10 5 2 15 5 13 7 4


Table 4 (continued): Allocation of SSCs for Secondary SCH

Scrambling Code Group    Slot number: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Group 28 2 5 9 9 3 12 8 14 15 12 14 5 3 2 15

Group 29 2 5 11 7 2 11 9 4 16 7 16 9 14 14 4

Group 30 2 6 2 13 3 3 12 9 7 16 6 9 16 13 12

Group 31 2 6 9 7 7 16 13 3 12 2 13 12 9 16 6

The 16 SSCs, (C_{ssc,1},..,C_{ssc,16}), are complex-valued with identical real and imaginary components, and are constructed from the position wise multiplication of a Hadamard sequence and a sequence z, defined as z=(b,b,b,-b,b,b,-b,-b,b,-b,b,-b,-b,-b,-b,-b), where b=(1,1,1,1,1,1,-1,-1,1,-1,1,-1,1,-1,-1,1). The Hadamard sequence is obtained from one of the rows of a Hadamard matrix, which consists of +1 and -1 entries. The rows and columns of the Hadamard matrix are mutually orthogonal. The following examples show how to construct a Hadamard matrix

$$H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad H_4 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}$$

In general the Hadamard matrix can be defined recursively as

$$H_{2N} = \begin{bmatrix} H_N & H_N \\ H_N & -H_N \end{bmatrix} \tag{9}$$

where H_N is a matrix of size N X N.

If a vector X with length N is an input, then the vector Y obtained as a result of the Hadamard transform is equal to

$$Y = H_N X \tag{10}$$
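The recursive definition of the Hadamard matrix and the transform Y = H_N X can be sketched in software as below. This is a direct (matrix based) illustration, not the fast transform used in the design; the function names are hypothetical.

```python
def hadamard(n):
    """Build the n X n Hadamard matrix recursively (n must be a power of two)."""
    if n == 1:
        return [[1]]
    half = hadamard(n // 2)
    top = [row + row for row in half]
    bottom = [row + [-v for v in row] for row in half]
    return top + bottom

def hadamard_transform(x):
    """Y = H_N * X computed as N dot products of length N."""
    return [sum(h * v for h, v in zip(row, x)) for row in hadamard(len(x))]

H16 = hadamard(16)
# Transforming a row of H16 gives a single peak at that row's index.
print(hadamard(4))
print(hadamard_transform(H16[5]))
```

Because the rows are mutually orthogonal, the transform of a transmitted row is zero everywhere except at the index of that row, where it equals N.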



The entries in Table 4 denote which SSC to use in the different slots for the different scrambling code groups, e.g. the entry "5" means that SSC C_{ssc,5} shall be used for the corresponding scrambling code group and slot. The kth SSC, C_{ssc,k}, k=1,2,..,16, can be calculated using the following expression:

$$C_{ssc,k} = (1+j)\,\bigl(H_m(0)\,z(0),\; H_m(1)\,z(1),\; H_m(2)\,z(2),\; \ldots,\; H_m(255)\,z(255)\bigr) \tag{11}$$

where m=16(k-1) and H_m(n) is the nth element of row m of the 256X256 Hadamard matrix.

As each element of the Hadamard matrix is either +1 or -1, the multiplication operation used in equation 11 can be reduced to a series of addition/subtraction operations. In general, for an N-point input sample, the FHT algorithm needs to perform N log2 N addition and subtraction operations.
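Equation 11 can be sketched as follows. The sequences b and z are those quoted in the text; the Hadamard row is taken from the recursively constructed 256X256 matrix with rows numbered from 0, whose entry in row m and column n equals (-1) raised to the number of common 1 bits of m and n. The function names are hypothetical.

```python
B = [1, 1, 1, 1, 1, 1, -1, -1, 1, -1, 1, -1, 1, -1, -1, 1]
Z_SIGNS = [1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, -1, -1, -1, -1, -1]
Z = [sign * b for sign in Z_SIGNS for b in B]      # z = (b, b, b, -b, ...), 256 chips

def hadamard_row(m, n=256):
    """Row m of the recursively built n X n Hadamard matrix."""
    return [-1 if bin(m & col).count("1") % 2 else 1 for col in range(n)]

def ssc(k):
    """kth secondary synchronization code, k = 1..16 (equation 11)."""
    row = hadamard_row(16 * (k - 1))
    return [complex(1, 1) * row[n] * Z[n] for n in range(256)]

def correlate(a, b):
    return sum(u * v.conjugate() for u, v in zip(a, b))

print(len(Z), correlate(ssc(1), ssc(1)), correlate(ssc(1), ssc(2)))
```

Since z(n)^2 = 1, the correlation between two different SSCs reduces to the inner product of two different Hadamard rows, which is zero.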

Figure 10 shows an individual stage of the FHT. Each stage has an upper and a lower input terminal. The upper input terminal is configured to receive multiple input signals, which are either Walsh chips (if the stage is the first stage of the FHT) or intermediate correlation coefficients (if the stage is not the first stage of the FHT). If an input of N Walsh chips is to be processed, then the upper input terminal receives N/2 input signal bits and the lower input terminal receives the other N/2 input bits.

Figure 10: Individual Stage of FHT [diagram: Upper Input Terminal, Lower Input Terminal, adder, adder/subtractor, multiplexers, Enable, Output to Next Stage of FHT]

Figure 11: 16 chip FHT [diagram: Sampling Counter, Slot Boundary Value and Enable Stage 1 Complete from Stage 1, Buffer, FHT stages for Phase 1 to Phase 4 (Shift Register, Adder, Adder/Subtractor, En), 3 Bit Counter (C0 LSB, C2 MSB), Hadamard Code Metrics, Comparator, Register to Store Hadamard Row Ids, Phase 5 Detector, ROM 32X60 holding the Comma Free Codes (Table 4, 3GPP 25.213 v4.0), Code Group, Slot ID]

Figure 11 shows the design of an FHT structure which is used for decoding a 16 chip sequence. The proposed design is a very compact and efficient implementation compared to previous designs [13] [14]. The inputs to the FHT are applied according to the timing diagram shown in Table 5. The inputs are applied in a non-sequential order, and hence a buffer is required to initially store the vectors before passing them to the FHT structure. If a 16 chip sequence needs to be decoded, then a buffer of 16 registers is required to initially store the vectors. The addition and subtraction operations in the FHT algorithm are used to generate correlation coefficients for the received Walsh code. The correlation coefficients express the likelihood that a received codeword is the correct Walsh code.

Table 5: Timing Diagram of Inputs to FHT

Phase 1 Upper Input 0 1 2 3 4 5 6 7

Phase 1 Lower Input 8 9 10 11 12 13 14 15

Phase 2 Upper Input 0 1 2 3

Phase 2 Lower Input 4 5 6 7

Phase 3 Upper Input 0 1

Phase 3 Lower Input 2 3

Phase 4 Upper Input 0

Phase 4 Lower Input 1
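The pattern in Table 5 halves at every phase: the first half of the remaining indices goes to the upper input and the second half to the lower input. A minimal sketch that regenerates the table for an N chip input is given below; the function name is hypothetical.

```python
def fht_input_schedule(n):
    """Return (phase, upper_inputs, lower_inputs) for each phase of an n-chip FHT."""
    schedule = []
    width, phase = n, 1
    while width >= 2:
        half = width // 2
        schedule.append((phase, list(range(half)), list(range(half, width))))
        width, phase = half, phase + 1
    return schedule

for phase, upper, lower in fht_input_schedule(16):
    print("Phase", phase, "Upper Input", upper, "Lower Input", lower)
```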


Figure 12: Hadamard Code Metrics (Butterfly Operation) [diagram: inputs y1 to y16 are combined pairwise through Phase 1 to Phase 4 into 16 metrics, ranging from ((y1+y2)+(y3+y4))+((y5+y6)+(y7+y8))+((y9+y10)+(y11+y12))+((y13+y14)+(y15+y16)) to ((y1-y2)-(y3-y4))-((y5-y6)-(y7-y8))-((y9-y10)-(y11-y12))-((y13-y14)-(y15-y16))]

The correlation coefficients are also called the Hadamard code metrics and are generated as shown in Figure 12 for a 16-point FHT. This operation is also called the butterfly operation. The butterfly operation is also used in other digital signal processing (DSP) applications, such as calculating the discrete Fourier transform (DFT). The Walsh code having the largest metric is then selected as the code most likely to have been transmitted.

It is the job of the detector to find which code group and slot ID are being used, from the table provided in the 3GPP specifications [7], using the three Hadamard rows (Walsh codes). The detector needs to identify the code group in the minimum amount of time, which requires a lot of hardware resources. Also, if the correct sequence of Hadamard rows is not identified and given to the detector, additional clock cycles are wasted while it tries to find the sequence in the table provided in the 3GPP specifications. The detection circuitry is used to locate the sequence in the table and hence find the code group and slot ID. Also, in the 3GPP-comma free CSD implementation, two clocks are not needed. Even if two clocks are used, only a marginal gain will be achieved, and only in the detection phase 5 shown in Figure 11. This is because detection of the code group and slot ID cannot start until at least three slots have been identified by phases 1 - 4.
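The table lookup performed by the detector can be sketched as a search for the observed SSC indices among all cyclic positions of every row of Table 4. The sketch below is a software illustration only: it includes just Groups 0 to 3 of Table 4, and the function name and return format are hypothetical.

```python
# First four rows of Table 4 (SSC index per slot 0..14).
TABLE4 = {
    0: [1, 1, 2, 8, 9, 10, 15, 8, 10, 16, 2, 7, 15, 7, 16],
    1: [1, 1, 5, 16, 7, 3, 14, 16, 3, 10, 5, 12, 14, 12, 10],
    2: [1, 2, 1, 15, 5, 5, 12, 16, 6, 11, 2, 16, 11, 15, 12],
    3: [1, 2, 3, 1, 8, 6, 5, 2, 5, 8, 4, 4, 6, 3, 7],
}

def find_candidates(observed):
    """Return (group, slot) pairs where `observed` matches consecutive slots cyclically.

    `slot` is the slot number of the first observed SSC index.
    """
    matches = []
    for group, row in TABLE4.items():
        for start in range(15):
            if all(row[(start + d) % 15] == index for d, index in enumerate(observed)):
                matches.append((group, start))
    return matches

print(find_candidates([1]))          # one slot: ambiguous
print(find_candidates([8, 9, 10]))   # three slots: unique within this subset
print(find_candidates([16, 1, 1]))   # observation wrapping around the frame boundary
```

With one observed slot the result is ambiguous, which illustrates why the comma free scheme needs several slots before the code group and slot ID can be decided.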

There are a number of stages in the FHT design depending on the length of the Walsh

sequence. Each subsequent stage receives an input from the previous stage in half the

number of clock cycles required for the previous stage. This is achieved by reducing the

length of shift register by a factor of two for each subsequent stage of the FHT.



A counter is used as a clock to determine the time interval at which each successive pair

of input signals is received by the FHT. The upper shift registers in each of the stages are

always enabled whereas the lower shift registers are enabled by the bits of the counter.

The length of the counter register depends on how many stages there are in the FHT.

The counter bit C0 is the LSB and C2 is the MSB. Counter bit C2 is alternately low for

four clock cycles and then high for four clock cycles (000...011, 100...111). The bit

C0 alternates between low and high on each clock cycle (000, 001, ... etc.). The number of

bits in the counter depends on the number of stages, which in turn depends on the length of

the Walsh-Hadamard sequence to be used. If there are N Walsh chips then the counter length

must be $\log_2 N$ bits. The length of the shift register in stage $s$ of the design is given

by the relation $(N/4)/2^s$. For example, the length of the shift registers used in the

first stage of the FHT is $(16/4)/2^0 = 4$. Similarly, the length of the registers used in the

other stages can be calculated.
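The counter pattern and the register-length relation above can be tabulated with a short sketch. This is only an illustration, not the thesis RTL: the function names are hypothetical, the counter is taken to be the 3-bit counter (C0 to C2) of the example, and the formula is evaluated only for s = 0, 1, 2 because the text does not restate how many stages carry a shift register.

```python
def shift_register_length(n_chips: int, stage: int) -> int:
    """Shift register length (N/4)/2^s for stage s, with s = 0 for the first stage."""
    return (n_chips // 4) // (2 ** stage)


def counter_bits(cycle: int, width: int = 3) -> list:
    """Return [C0, C1, ..., C(width-1)] of a free-running counter; C0 is the LSB."""
    return [(cycle >> bit) & 1 for bit in range(width)]


if __name__ == "__main__":
    n_chips = 16  # 16 chip Walsh sequence, as in the example in the text
    for stage in range(3):  # assumption: only s = 0, 1, 2 are evaluated
        print(f"stage s={stage}: length {shift_register_length(n_chips, stage)}")
    for cycle in range(8):
        c0, c1, c2 = counter_bits(cycle)
        print(f"cycle {cycle}: C2 C1 C0 = {c2}{c1}{c0}")
```

Over eight cycles C2 stays low for the first four cycles and high for the next four, while C0 toggles on every cycle, which matches the 000...011, 100...111 pattern described above.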

In the first stage, the input signals corresponding to Walsh chips 0 to 7 arrive at the

upper adder whereas the Walsh chips from 8 to 15 are applied to the adder/subtractor cir-

cuit in the lower half of stage 1. During the first four clock cycles, the data bits from the

adder unit are selected by the multiplexer 1 in stage 1. The lower shift register of stage 1

is enabled to store the outputs from the adder/subtractor unit. Thus at the end of four

clock cycles, the upper shift register stores the result of addition of the first four pairs

whereas the lower shift register stores the result of subtraction. In the fifth clock cycle, C2

goes high which disables the lower shift register in stage 1. The result of the upper shift

register in stage 1 and the adder output from stage 1, which gives the addition of a new


pair of inputs, is then passed onto the adder and adder/subtractor unit in stage 2. Thus,

each subsequent stage receives its input from the previous stage. This process is then

repeated for each of the other stages in the FHT. At the end of eight clock cycles, all of the

16 correlation coefficients are generated and the largest coefficient is selected as the most

likely Walsh-Hadamard codeword to have been transmitted. The design is flexible and can

be easily modified to incorporate any chip sequence which has a length of a power of two.
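The data flow described above can be mirrored in software. The following is a minimal sketch of a fast Hadamard transform, not the pipelined hardware design: it assumes Sylvester (natural) ordering of the Walsh rows, processes all samples at once rather than one pair per clock cycle, and uses hypothetical function names. As in the description, the first butterfly stage combines sample i with sample i + N/2 (chips 0 to 7 with chips 8 to 15 for N = 16).

```python
def walsh_row(k: int, n: int) -> list:
    """Row k of the n x n Sylvester-ordered Hadamard matrix, entries +1/-1."""
    return [-1 if bin(k & j).count("1") % 2 else 1 for j in range(n)]


def fht(samples: list) -> list:
    """Fast Hadamard transform: correlates the input with every Walsh row.

    The first butterfly stage pairs sample i with sample i + n/2, and each
    later stage halves that distance.
    """
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    out = list(samples)
    half = n // 2
    while half >= 1:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = out[i], out[i + half]
                out[i], out[i + half] = a + b, a - b  # add / subtract pair
        half //= 2
    return out


def detect_walsh_index(samples: list) -> int:
    """Index of the largest correlation coefficient."""
    coeffs = fht(samples)
    return max(range(len(coeffs)), key=lambda k: coeffs[k])


if __name__ == "__main__":
    received = walsh_row(5, 16)
    received[3] = -received[3]  # one corrupted chip, for illustration only
    print(detect_walsh_index(received))
```

With a clean input the transform produces a single non-zero coefficient equal to the sequence length; with one corrupted chip the correct row still has the largest coefficient.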

5.2 Reduced Length FHT Design

A careful look at the 256 x 256 matrix shows that each 256 chip sequence

can be identified by one of the 16 chip sequences shown in Table 6.

Table 6: Reduced Length Walsh Sequences (256 chip sequence to 16 chip sequence)

Row   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
  1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
  2   1  -1   1  -1   1  -1   1  -1   1  -1   1  -1   1  -1   1  -1
  3   1   1  -1  -1   1   1  -1  -1   1   1  -1  -1   1   1  -1  -1
  4   1  -1  -1   1   1  -1  -1   1   1  -1  -1   1   1  -1  -1   1
  5   1   1   1   1  -1  -1  -1  -1   1   1   1   1  -1  -1  -1  -1
  6   1  -1   1  -1  -1   1  -1   1   1  -1   1  -1  -1   1  -1   1
  7   1   1  -1  -1  -1  -1   1   1   1   1  -1  -1  -1  -1   1   1
  8   1  -1  -1   1  -1   1   1  -1   1  -1  -1   1  -1   1   1  -1
  9   1   1   1   1   1   1   1   1  -1  -1  -1  -1  -1  -1  -1  -1
 10   1  -1   1  -1   1  -1   1  -1  -1   1  -1   1  -1   1  -1   1
 11   1   1  -1  -1   1   1  -1  -1  -1  -1   1   1  -1  -1   1   1
 12   1  -1  -1   1   1  -1  -1   1  -1   1   1  -1  -1   1   1  -1
 13   1   1   1   1  -1  -1  -1  -1  -1  -1  -1  -1   1   1   1   1
 14   1  -1   1  -1  -1   1  -1   1  -1   1  -1   1   1  -1   1  -1
 15   1   1  -1  -1  -1  -1   1   1  -1  -1   1   1   1   1  -1  -1
 16   1  -1  -1   1  -1   1   1  -1  -1   1   1  -1   1  -1  -1   1


Thus in a CDMA receiver, only the first 16 chips of the entire Walsh sequence need to be

used. The buffer, which is used to store the input values, is also reduced in length

from 256 to 16 registers. The proposed design ideas lead to considerable savings in

hardware resources. The reduced length Walsh sequence also helps in achieving faster decoding.

The two designs were synthesized and the hardware resources utilized were compared on

a Xilinx Virtex-E XCV1000E FPGA.
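One way to see a reduction of this kind is to build the matrices and compare them. The sketch below is an illustration under stated assumptions, not the thesis procedure: it assumes the Sylvester construction of the Hadamard matrix, and it assumes (purely for illustration, since this section does not restate which rows of the 256 x 256 matrix are used) that the sequences of interest are its first 16 rows. Under those assumptions the first 16 chips of each such row reproduce Table 6, and the full row is that 16 chip pattern repeated 16 times.

```python
def hadamard(n: int) -> list:
    """n x n Sylvester-ordered Hadamard matrix (n a power of two), entries +1/-1."""
    matrix = [[1]]
    while len(matrix) < n:
        # Doubling step: [[H, H], [H, -H]]
        matrix = ([row + row for row in matrix]
                  + [row + [-v for v in row] for row in matrix])
    return matrix


h16 = hadamard(16)
h256 = hadamard(256)

# Assumption for illustration: the sequences of interest are rows 0..15 of h256.
first_16_chips = [row[:16] for row in h256[:16]]

if __name__ == "__main__":
    print(first_16_chips == h16)                           # truncated rows match Table 6
    print(all(h256[k] == h16[k] * 16 for k in range(16)))  # rows are periodic in 16 chips
```

If the design uses a different subset of rows, the same comparison can be repeated with that subset substituted for `h256[:16]`.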


Chapter 6

Experimental Method and Results

6.0 Experimental Method and Results

This Chapter explains the method used to measure the acquisition time for both of the

cell search designs, the Improved CSD and the 3GPP-comma free CSD. Section 6.1.1 provides

details of the FPGA used for prototyping the algorithms and for comparing the hardware

specifications of both designs. Section 6.2 presents the acquisition time measurements

and the hardware comparison. Section 6.2 also compares the hardware utilization

of the FHT design using 256 and 16 chip sequences.

6.1 Experimental Method

The acquisition time was measured by counting the number of clock cycles used by the

RTL simulation. Because the input chip rate is fixed by the 3GPP specifications, this cycle

count translates directly into an acquisition time. For comparing the hardware

specifications and the maximum frequency of operation of both designs on the FPGA, the

Xilinx Foundation ISE software was used to generate the bit map file for programming the

FPGA. The details of the FPGA and the design process used for the hardware comparison are

explained in Section 6.1.1.
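The conversion from a cycle count to an acquisition time can be sketched as follows. This is a hedged illustration: the 3.84 Mcps chip rate, the 2560 chip slot and the 15 slot frame are the standard W-CDMA values, but the assumption that the design consumes one chip per clock cycle (`cycles_per_chip = 1`) and the function name are not taken from the thesis.

```python
CHIP_RATE_HZ = 3_840_000   # W-CDMA chip rate (3.84 Mcps)
CHIPS_PER_SLOT = 2560      # chips in one slot
SLOTS_PER_FRAME = 15       # slots in one 10 msec frame


def cycles_to_msec(clock_cycles: int, cycles_per_chip: int = 1) -> float:
    """Convert an RTL cycle count to milliseconds.

    cycles_per_chip is an assumption: 1 means one input chip per clock cycle.
    """
    return clock_cycles / (cycles_per_chip * CHIP_RATE_HZ) * 1e3


if __name__ == "__main__":
    print(cycles_to_msec(CHIPS_PER_SLOT))                    # duration of one slot
    print(cycles_to_msec(CHIPS_PER_SLOT * SLOTS_PER_FRAME))  # duration of one frame
```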


6.1.1 FPGA Design Process

The FPGA used for prototyping the designs is a Xilinx Virtex-E XCV1000E BG560

with a speed grade of 6. As the name suggests, FPGAs can be reconfigured

to implement any desired digital circuit. This is made possible by a large number

of small configurable logic blocks (CLBs) and a connection mechanism that

interconnects the CLBs according to the design. The basic building

block of the Virtex-E CLB is the logic cell (LC). Each Virtex-E CLB contains four LCs,

organized in two similar slices, as shown in the block diagram of Figure 13 [20]. An LC

includes a 4-input function generator, carry logic, and a storage element. Virtex-E

function generators are implemented as 4-input look-up tables (LUTs). Along with the LUTs,

the CLB also contains D flip-flops for storing data. The output from the function

generator in each LC drives both the CLB output and the D input of the flip-flop. The

detailed view of a Virtex-E slice is shown in Figure 14 [20].


Figure 13: 2-Slice Virtex-E CLB

Figure 14: Detailed View of Virtex-E Slice


The entire design was coded in Verilog at the Register Transfer Level (RTL). The RTL

design was then synthesized using the Synopsys FPGA Express synthesis tool available

with the Foundation ISE software. The bit map generated was then used to program the

FPGA using the JTAG cable.

6.2 Experimental Results

To compare the acquisition time between the Improved CSD and the 3GPP-comma free

CSD, experiments were carried out using input vectors generated in Matlab. Threshold

values determined for the two probabilities of false alarm ($P_{FA}=10^{-3}$ and $P_{FA}=10^{-4}$)

were 28 and 37 respectively. The number of clock cycles between the start of the system

and the point when the counter in stage 3 exceeds the computed threshold values was

determined. The equivalent gate count and maximum frequency of operation were com-

pared for both the designs using a 256 chip sequence in stage 2 and the same design con-

straints in the FPGA Express synthesis tool on a Xilinx Virtex-E XCV1000E FPGA.
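The measurement described above amounts to finding the first clock cycle at which the stage 3 counter exceeds the threshold. A minimal sketch follows; the function name is hypothetical and the counter trace in the usage example is made-up illustrative data, not a result from the thesis. Only the threshold values 28 and 37 come from the text.

```python
from typing import Iterable, Optional


def first_crossing_cycle(counter_values: Iterable[int], threshold: int) -> Optional[int]:
    """Return the 1-based clock cycle at which the counter first exceeds
    the threshold, or None if it never does."""
    for cycle, value in enumerate(counter_values, start=1):
        if value > threshold:
            return cycle
    return None


if __name__ == "__main__":
    trace = [0, 5, 12, 20, 29, 33, 38]      # made-up counter values, one per cycle
    print(first_crossing_cycle(trace, 28))  # threshold quoted for P_FA = 10^-3
    print(first_crossing_cycle(trace, 37))  # threshold quoted for P_FA = 10^-4
```

A higher threshold can only delay the crossing, which is why the lower false alarm probability is associated with a later detection point in this sketch.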

From the experiments conducted, it was observed that the Improved CSD uses fewer

slots to achieve synchronization than the 3GPP-comma free CSD in

stage 2. The results obtained indicate that when averaging is carried out over 15 slots in

stage 1 of both designs ($P_{FA1}=10^{-3}$ and $V_{TH1}=28$), the Improved CSD has an

acquisition time of 13.66 msec as compared to 14.53 msec for the 3GPP-comma free CSD. Thus,

the Improved CSD achieves an improvement of 0.87 msec for an AWGN channel (Figure 15).

Similarly, an improvement of 0.87 msec was observed when $P_{FA2}=10^{-4}$ and

$V_{TH2}=37$. Figures 15 and 16 show the acquisition time measures for 2, 4, 8 and 15 slots

in stage 1 of the design. The number of slots in the other stages, as discussed in previous

Chapters, was kept fixed: one slot in stage 2 of the Improved CSD, three slots in stage 2

of the 3GPP-comma free CSD, and 15 slots in stage 3 of both designs.


Figure 15: Comparison of Improved CSD and 3GPP-comma free CSD, $P_{FA}=10^{-3}$

Figure 16: Comparison of Improved CSD and 3GPP-comma free CSD, $P_{FA}=10^{-4}$

[Both plots are titled "Acquisition Time Measures: Quantization 4 Input Data Bits". X-axis: Number of Slots in Stage 1. Y-axis: Acquisition Time (in msec). Curves: Improved CSD, 3GPP-comma free CSD.]


As seen from Table 7, the Improved CSD had a lower equivalent gate count (136,297)

and a higher maximum frequency of operation (22.066 MHz) on a Xilinx Virtex-E

XCV1000E FPGA as compared to the 3GPP-comma free CSD when the same constraints

were used in the synthesis of both the designs.

In the FHT design, the input Walsh sequence length can be reduced from 256 chips to

16 chips to reduce the hardware utilization. The proposed idea leads to considerable sav-

ings in hardware resources. The buffer, which is used to store the input value, is reduced

in length from 256 to 16 registers. The reduced length Walsh sequence helps in achieving

faster decoding. The FHT designs using 16 and 256 chip sequences were synthesized and

the hardware resources utilized were compared using a Xilinx Virtex-E XCV1000E

FPGA. The hardware utilization of both FHT designs is compared in Table 8.

The results of the reduced length sequence indicate that the FHT design, using 16 chip

sequence, achieves 90% reduction in hardware resources (equivalent gate count) as compared

to the design which uses 256 chip sequence. Also, the maximum frequency of operation

of the 16 chip FHT (35.679 MHz) is more than double that of the 256 chip FHT (16.025 MHz).

Table 7: Hardware Specifications of System: Quantization 4 Input Data Bits
(FPGA: XCV1000E BG560, Speed Grade 6)

Design                 Number of         Number of       Equivalent    Max. Frequency of Operation
                       Slice Registers   4 Input LUTs    Gate Count    (Post Route Timing)
Improved CSD           9086              7354            136297        22.066 MHz
3GPP-comma free CSD    10141             7777            144180        12.887 MHz

Table 8: Hardware Specifications of FHT: 16 and 256 chip sequence
(FPGA: XCV1000E BG560, Speed Grade 6)

Design                 Number of         Number of       Equivalent    Max. Frequency of Operation
                       Slice Registers   4 Input LUTs    Gate Count    (Post Route Timing)
FHT 16 chips           71                173             1591          35.769 MHz
FHT 256 chips          1070              1370            17191         16.025 MHz
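The percentages and ratios quoted for the hardware comparison can be checked directly from the figures in Tables 7 and 8. The sketch below uses only the tabulated values; the helper name is hypothetical.

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Reduction of `improved` relative to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Equivalent gate counts and maximum frequencies (MHz) from Tables 7 and 8.
fht_gate_reduction = percent_reduction(17191, 1591)      # 256 chip vs 16 chip FHT
fht_frequency_ratio = 35.769 / 16.025                     # 16 chip vs 256 chip FHT
csd_gate_reduction = percent_reduction(144180, 136297)   # 3GPP-comma free vs Improved CSD
csd_frequency_ratio = 22.066 / 12.887                     # Improved vs 3GPP-comma free CSD

if __name__ == "__main__":
    print(f"FHT gate count reduction: {fht_gate_reduction:.1f}%")
    print(f"FHT frequency ratio:      {fht_frequency_ratio:.2f}x")
    print(f"CSD gate count reduction: {csd_gate_reduction:.1f}%")
    print(f"CSD frequency ratio:      {csd_frequency_ratio:.2f}x")
```

The FHT gate count reduction comes out slightly above 90% and the FHT frequency ratio slightly above 2, consistent with the statements in the text.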


Chapter 7

Summary, Conclusions and Future Work

7.0 Summary, Conclusions and Future Work

In this Chapter the conclusions drawn from the experimental results are summarized

and the scope for future work is outlined.

7.1 Summary

In Chapter 2, we discussed some of the previous work done by other research groups

and also the 3GPP working group suggestions. Chapter 3 introduced the cell search algo-

rithm, which is divided into three stages to simplify the synchronization between the MS

and the BS. Chapter 4 discussed the Improved CSD which is the proposed design scheme

to perform initial cell search. The hierarchical matched filter design proposed by Siemens

and Texas Instruments was used in stage 1 of both the cell search designs [6]. In stage 2 of

the initial cell search algorithm, two possible design schemes were compared: the

Improved CSD which uses cyclic codes and the 3GPP-comma free CSD using the comma

free codes. The details of the Improved CSD are described in Chapter 4. In stage 3 of

both the cell search designs, masking functions are proposed to reduce the hardware utili-

zation as compared to the previous design described by Li et al. [4]. Chapter 5 described

the 3GPP-comma free CSD using an FHT design in stage 2 of the cell search algorithm.

Further design improvements are suggested in the FHT design by reducing the length of


the input Walsh sequence from 256 chips to 16 chip sequences. Chapter 6 discussed the

experimental method and presented the results in terms of acquisition time and hardware

utilization for both the Improved CSD and the 3GPP-comma free CSD. The hardware

utilization of the FHT design using 256 chip sequences and of the reduced length design

(16 chip sequences) is also presented.

7.2 Conclusions

For an AWGN channel model in a high signal-to-noise ratio environment, it was found

that accumulation over one slot in the Improved CSD scheme and accumulation over three

slots in the 3GPP-comma free CSD scheme in stage 2 of the cell search algorithm gives

correct code group and slot boundary identification. Because fewer slots are required,

the Improved CSD uses fewer clock cycles in stage 2 than the 3GPP-comma free CSD

to detect the code group and slot ID. This reduction in the number of clock cycles

leads to faster acquisition, fewer dropped calls and lower power consumption during

the synchronization between the MS and the BS. The Improved CSD, which uses cyclic codes,

has lower hardware utilization and a higher maximum frequency of operation than the

3GPP-comma free CSD. In conclusion, the Improved CSD is a better cell search design

than the 3GPP-comma free CSD since it has a faster acquisition time and lower

hardware utilization.


7.3 Future Work

This thesis investigates code and time synchronization of the cell search algorithm. In

addition to code and time synchronization, frequency synchronization between the MS

and the BS needs to be achieved. The receiver design presented in this thesis would need

to include another module to achieve frequency synchronization. Also, the cell search

considered in this thesis is initial cell search. There is another cell search, called

target cell search, which needs to be performed during a call when an MS is in motion and

moves from one cell to another. VLSI implementations that perform target cell search

efficiently need to be investigated.

Kiessling et al. [21] suggest performance enhancements to the W-CDMA initial cell search

algorithm. The authors consider the advantages of oversampling and passing multiple

candidates in the cell search stages instead of one candidate to reduce the cell search time.

Passing multiple candidates in each of the stages will reduce the cell search time but

increase the design complexity a
