US011449404B1

(12) United States Patent
     Ziaja et al.
(10) Patent No.: US 11,449,404 B1
(45) Date of Patent: Sep. 20, 2022

(54) BUILT-IN SELF-TEST FOR PROCESSOR UNIT WITH COMBINED MEMORY AND LOGIC
(71) Applicant: SambaNova Systems, Inc., Palo Alto, CA (US)
(72) Inventors: Thomas Alan Ziaja, Austin, TX (US); Dinesh Rajasavari Amirtharaj, Milpitas, CA (US)
(73) Assignee: SambaNova Systems, Inc., Palo Alto, CA (US)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
(21) Appl. No.: 17/501,969
(22) Filed: Oct. 14, 2021

Related U.S. Application Data
(60) Provisional application No. 63/220,266, filed on Jul. 9, 2021.

(51) Int. Cl.
     G06F 11/27 (2006.01)
     G06F 11/22 (2006.01)
(52) U.S. Cl.
     CPC: G06F 11/27 (2013.01); G06F 11/2236 (2013.01)
(58) Field of Classification Search
     CPC: G06F 11/27; G06F 11/2236
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
3,761,695 A     9/1973   Eichelberger
4,929,889 A *   5/1990   Seiler       G01R 31/318558  714/E11.169
5,369,752 A     11/1994  Giles et al.
5,978,946 A *   11/1999  Needham      G06F 11/2236  714/E11.166
6,249,892 B1 *  6/2001   Rajsuman     G06F 11/2236  714/739
6,249,893 B1 *  6/2001   Rajsuman     G06F 11/2236  714/739
(Continued)

FOREIGN PATENT DOCUMENTS
WO 2010142987 A1  12/2010

OTHER PUBLICATIONS
Prabhakar et al., "Plasticine: A Reconfigurable Architecture for Parallel Patterns", ISCA '17, Jun. 24-28, 2017, Toronto, ON, Canada.
(Continued)

Primary Examiner: Matthew M Kim
Assistant Examiner: Indranil Chowdhury
(74) Attorney, Agent, or Firm: Haynes Beffel & Wolfeld LLP; Sikander M. Khan; André Henri Grouwstra

(57) ABSTRACT
A processor unit includes a memory and an ALU coupled with the memory. The processor unit also comprises a test controller, a test control register, and a signature register. The test controller manages a series of steps to test the processor unit. It overrides an ALU control signal with a replacement ALU control signal, stored in the test control register. It generates a test pattern and writes it to a memory address. It reads memory output data from the memory address, and forwards it to the ALU. The ALU executes an operation on the memory output data based on the replacement ALU control signal. The ALU output provides a test result, which is compressed to obtain a test signature, and stored in the signature register.

22 Claims, 8 Drawing Sheets
[Representative drawing (FIG. 4): configurable unit 400 with memory 410, logic 420, MUX 430, ALU 440, ALU control 450, test I/F 460, register 462, BIST control 470, MUX 472, MUX 473, MUX 474, MISR 480 (compress 482, register 485), and buses 471 and 490 to 498.]
US 11,449,404 B1
Page 2

(56) References Cited

U.S. PATENT DOCUMENTS
6,317,819 B1 *     11/2001  Morton            G06F 9/3012  711/E12.051
6,532,337 B1       3/2003   Yoshinaka
8,214,172 B2       7/2012   Wang et al.
8,924,801 B2       12/2014  Tekumalla et al.
9,739,833 B2       8/2017   Hou et al.
10,831,507 B2      11/2020  Shah et al.
2002/0083388 A1 *  6/2002   Lueck             G01R 31/318547  714/726
2004/0123198 A1    6/2004   Gschwind
2004/0218454 A1    11/2004  Gorman et al.
2005/0268185 A1    12/2005  Vinke et al.
2011/0239070 A1 *  9/2011   Morrison          G06F 11/2236  714/E11.155
2013/0080847 A1    3/2013   Zorian et al.
2014/0281776 A1    9/2014   Champion et al.
2014/0317463 A1    10/2014  Chandra et al.
2015/0206559 A1    7/2015   Priel et al.
2015/0276874 A1    10/2015  Morton
2015/0325314 A1 *  11/2015  Ziaja             G11C 7/1078  365/189.12
2018/0238965 A1    8/2018   Anzou et al.
2019/0204382 A1    7/2019   Pradeep et al.
2020/0258590 A1 *  8/2020   Spica             G01R 31/318597
2020/0310809 A1    10/2020  Hughes et al.
2022/0092247 A1 *  3/2022   Koeplinger        G06F 8/443
OTHER PUBLICATIONS

Seok et al., "Write-through method for embedded memory with compression Scan-based testing," 2012 IEEE 30th VLSI Test Symposium (VTS), Apr. 23-26, 2012, pp. 158-163.
Sitchinava, Thesis: "Dynamic Scan Chains A Novel Architecture to Lower the Cost of VLSI Test," MIT, Sep. 2003, 64 pages.
Podobas et al., A Survey on Coarse-Grained Reconfigurable Architectures From a Performance Perspective, IEEE Access, vol. 2020.3012084, Jul. 27, 2020, 25 pages.
M. Emani et al., "Accelerating Scientific Applications With SambaNova Reconfigurable Dataflow Architecture," in Computing in Science & Engineering, vol. 23, No. 2, pp. 114-119, Mar. 1-Apr. 2021, doi: 10.1109/MCSE.2021.3057203.
U.S. Appl. No. 17/503,227, Notice of Allowance dated Apr. 4, 2022, 9 pages.
U.S. Appl. No. 17/468,024, Office Action dated Jan. 18, 2022, 47 pages.
U.S. Appl. No. 17/468,024, Response to Office Action dated Jan. 18, 2022, filed Apr. 7, 2022, 11 pages.
U.S. Appl. No. 17/468,066, Response to Office Action dated Jan. 18, 2022, filed Apr. 7, 2022, 11 pages.
PCT/US2021/057391, International Search Report and Written Opinion, dated Feb. 24, 2022, 14 pages.
U.S. Appl. No. 17/503,227, Response to Office Action dated Feb. 4, 2022, filed Mar. 17, 2022, 9 pages.
Garg et al., LBIST - A technique for infield safety, Design & Reuse, dated Sep. 21, 2015, 4 pages.
Radhakrishnan, Design for Testability (DFT) Using SCAN, dated Sep. 1999, Issue-2, 13 pages.
Press, Thorough test means testing through the RAM, EDN, dated Sep. 17, 2012, 3 pages.
Li et al., Logic BIST: State-of-the-Art and Open Problems, dated Mar. 16, 2015, 6 pages.
Venkataraman et al., An experimental study of N-detect scan ATPG patterns on a processor, Proceedings of the 22nd IEEE VLSI Test Symposium (VTS 2004), dated May 2004, 7 pages.
Krishna H. V. et al., Techniques to Improve Quality of Memory Interface Tests in SoCs Using Synopsys TetraMAX's RAM Sequential ATPG, Texas Instruments, Bangalore, India, 14 pages. Retrieved on Oct. 20, 2021. Retrieved from the internet [URL: https://pdfcoffee.com/ram-sequential-atpg-pdf-free.html].
MacDonald, Logic BIST, EE5375 University of Texas El Paso (UTEP), dated Nov. 20, 2014, 15 pages.
Einfochips PES, Memory Testing: MBIST, BIRA & BISR | An Insight into Algorithms and Self Repair Mechanism, Einfochips, dated Dec. 11, 2019, 14 pages. Retrieved on Oct. 21, 2021. Retrieved from the internet [URL: https://www.einfochips.com/blog/memory-testing-an-insight-into-algorithms-and-self-repair-mechanism/#utm_source=rss&utm_medium=rss].
U.S. Appl. No. 17/503,227, Office Action dated Feb. 4, 2022, 11 pages.
U.S. Appl. No. 17/468,066, Office Action dated Jan. 18, 2022, 46 pages.

* cited by examiner
U.S. Patent    Sep. 20, 2022    Sheet 1 of 8    US 11,449,404 B1

[FIG. 1: configurable unit 100 with memory 110, ALU 140, ALU control 150, test I/F 160, MBIST 170, ATPG scan connections, and buses 190, 192, 198.]

[FIG. 2: configurable unit 200 with memory 210, logic 220, MUX 230, ALU 240, ALU control 250, test I/F 260, MBIST 270, ATPG scan connections, and buses 290, 292, 294, 296, 297, 298.]

Sheet 2 of 8

[FIG. 3: configurable unit 300 with memory 310, ALU 340, ALU control 350, test I/F 360, register 362, BIST control 370, MUX 372, MUX 374, MISR 380 (compress 382, register 385), and buses 371, 390, 391, 392, 398.]

Sheet 3 of 8

[FIG. 4: configurable unit 400 with memory 410, logic 420, MUX 430, ALU 440, ALU control 450, test I/F 460, register 462, BIST control 470, MUX 472, MUX 473, MUX 474, MISR 480 (compress 482, register 485), and buses 471 and 490 to 498.]
Sheet 4 of 8

[FIG. 5: flowchart of method 500.]
510  Provide a first memory test vector to the memory data input
520  Write the first memory test vector to a first memory address
530  Read memory output data from the first memory address
540  Forward the memory output data to the ALU
550  Replace the ALU control signal
560  Perform ALU operation based on replacement control signal
570  Obtain test result from the ALU data output
580  Compress the test result to obtain a signature
590  Store the signature in a register
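The flow of FIG. 5 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the `Datapath` class, the opcode names, the fixed second ALU operand, and the use of `zlib.crc32` as the compressor are all hypothetical choices made for illustration (the description lists cyclic redundancy check as one possible compression technique, but does not prescribe it).

```python
import zlib

MASK32 = 0xFFFFFFFF

# Hypothetical ALU opcodes; the patent does not define an encoding.
ALU_OPS = {
    "add": lambda a, b: (a + b) & MASK32,
    "xor": lambda a, b: a ^ b,
    "and": lambda a, b: a & b,
}


class Datapath:
    """Toy model of one 32-bit lane: a memory feeding an ALU."""

    def __init__(self, depth=16, normal_op="add"):
        self.memory = [0] * depth
        self.normal_op = normal_op    # what the ALU control circuit would select
        self.replacement_op = None    # set from the test control register in BIST mode

    def write(self, address, data):
        self.memory[address] = data & MASK32

    def read(self, address):
        return self.memory[address]

    def alu(self, operand_a, operand_b):
        # The replacement control signal, when present, overrides the normal one.
        op = self.replacement_op or self.normal_op
        return ALU_OPS[op](operand_a, operand_b)


def run_bist(datapath, test_vectors, replacement_op, second_operand=0x0000FFFF):
    """Walk steps 510-590 for (address, data) vectors and return the signature.

    second_operand is an assumption: the flowchart does not say where the
    ALU's other input comes from.
    """
    datapath.replacement_op = replacement_op                  # step 550
    signature = 0
    for address, data in test_vectors:                        # step 510
        datapath.write(address, data)                         # step 520
        memory_out = datapath.read(address)                   # step 530
        result = datapath.alu(memory_out, second_operand)     # steps 540, 560, 570
        signature = zlib.crc32(result.to_bytes(4, "little"), signature)  # step 580
    datapath.replacement_op = None
    return signature                                          # step 590: stored signature


class FaultyDatapath(Datapath):
    """Same datapath with bit 0 of memory word 2 stuck at 0 on read."""

    def read(self, address):
        value = super().read(address)
        return value & ~1 & MASK32 if address == 2 else value


vectors = [(0, 0x00000000), (1, 0xFFFFFFFF), (2, 0x55555555), (3, 0xAAAAAAAA)]
golden = run_bist(Datapath(), vectors, "xor")
faulty = run_bist(FaultyDatapath(), vectors, "xor")
print("pass" if faulty == golden else "fail")  # prints "fail"
```

The fault is injected in the memory but observed only through the ALU output and the compressed signature, which mirrors the point of the method: memory and logic are exercised together along the normal datapath rather than in isolation.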
Sheet 5 of 8

[FIG. 6: test system 600 with a reconfigurable processor (array of configurable units with BIST, test I/F, I/O I/F, memory I/F), a tester, a host, and a memory.]

[FIG. 7: CGRA processor components: Tile 1 and Tile 2 with units MAGCU1, MAGCU2, AGCU12, AGCU13, AGCU14, AGCU22, AGCU23, AGCU24.]

Sheet 6 of 8

[FIG. 8A: tile 800, an array of configurable units: PCU, PMU, switch (S), AG, and CU elements.]

[FIG. 8B: tile 850, another example array of PCU, PMU, switch (S), AG, and CU elements.]

Sheet 8 of 8

[FIG. 10: configurable unit 1000 with control counter chain, ALU with functional units, scalar FIFO, vector FIFOs, configuration data store 1020, unit configuration load/unload process 1040, daisy chain logic 1093, MUX, MISR with compress block and register 1057, BIST control 1052, test I/F 1050, test bus, command bus, and completion bus.]
BUILT-IN SELF-TEST FOR PROCESSOR UNIT WITH COMBINED MEMORY AND LOGIC

CROSS-REFERENCES

This application claims the benefit of U.S. provisional patent application No. 63/220,266, entitled, "Logic BIST and Functional Test for a CGRA," filed on 9 Jul. 2021. The priority application is hereby incorporated by reference herein for all purposes.

This application is related to U.S. application entitled "Array of Processor Units with Pathway BIST", Ser. No. 17/503,227 filed concurrently herewith, which is hereby incorporated by reference herein for all purposes.

The following are also incorporated by reference for all purposes as if fully set forth herein:
Prabhakar et al., "Plasticine: A Reconfigurable Architecture for Parallel Patterns," ISCA '17, Jun. 24-28, 2017, Toronto, ON, Canada; and
Shah et al., "Configuration Load of a Reconfigurable Data Processor", U.S. Pat. No. 10,831,507, issued Nov. 10, 2020.

BACKGROUND

Technical Field

The technology disclosed relates to built-in self-test (BIST) of integrated circuits. In particular, it relates to testing of processor chips that include one or more modules comprising a datapath with a memory and an ALU.

Context

The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves can also correspond to implementations of the claimed technology.

Integrated circuits that combine multiple processors on a single die, such as used for artificial intelligence or graphics processing, are made in the most advanced semiconductor processes. Such processes always provide new challenges to IC designers, including for testing their correct functionality prior to shipping finished product to customers. Traditionally, a digital IC is tested using automatic test pattern generation (ATPG), which distributes the ATPG test vectors over the IC in scan chains that transport the test vectors from an external tester to scan flipflops in the logic, and that transport test results back from the scan flipflops to the external test machine. Test vectors and test results may be compressed for more efficient interfacing. Tests focus on detecting stuck-at faults, and a coverage (i.e., reachability and observability) percentage in the high nineties has been considered adequate. Additionally, an integrated circuit may include circuits for built-in self-test (BIST) dedicated to specific blocks. Those include memories (memory BIST, or MBIST), other standardized circuits, and some logic (logic BIST, or LBIST). BIST significantly reduces the dependence on an external tester and the cost of testing the IC, which is proportional to the time a tester takes for testing the IC. It can also be used after production, so that an IC in a life-critical application can test itself every time it is powered up.

Logic BIST generates and applies a relatively large number of pseudo-random test vectors to the scan chains, compresses the results obtained at-speed, and compares the compressed results with precompiled compressed results to detect any differences (i.e., errors). However, LBIST has challenges. The pseudo-random test vectors can create paths that are not used in normal operation (false paths), and may detect failures on the false paths. This wastes good ICs. LBIST may also generate extra heat because of heightened activity during test that would not be experienced in normal operation. The extra heat can cause timing violations, and thus functional faults. The heightened activity may also cause crosstalk issues that are not experienced during normal operation. Yet another problem is that LBIST cannot control don't-care bits. Whereas typically with ATPG the test coverage grows roughly linearly with the number of test vectors (until it nears an asymptote), for LBIST the test coverage grows only roughly logarithmically, and the asymptote may be lower than achieved with ATPG.

The fastest digital circuits cannot take the burden of slowdown by flipflops for scan testing with ATPG or LBIST vectors, and they may not be coverable with scan tests. For those cases, functional tests may be developed that directly test for the correct functionality of a circuit or block. Functional tests are used in moderation, as their development consumes much engineering time, and production test may take much tester time.

Processor chips are conventionally tested with ATPG for the logic and MBIST for the memory. The arithmetic logic unit (ALU) performs a number of different operations (on sets of two input numbers). The number of internal states the ALU can have can be exceedingly high, and ATPG scan testing has been considered the only practical solution to achieve good coverage.

However, the logic related to the insertion of MBIST vectors and the extraction of MBIST results creates problems for scan testing (ATPG or LBIST), including interface logic that is not observable, or shadow logic that isn't used in normal operation. Additionally, large processor chips made in advanced semiconductor processes show more failures than is expected on the basis of the scan test coverage for both stuck-at faults and speed-dependent mechanisms that should be found with at-speed tests. Defects that are the suspects for this discrepancy may include (1) bridging (short-circuits), (2) opens (missing connection), (3) defects in re-convergent logic for stuck-at vectors, (4) high-resistive shorts known as non-logic bridging, (5) resistive opens, and (6) coupling faults for at-speed vectors. One approach to capture these defects is functional testing, with the drawbacks mentioned above. Another, called "n-detect", is detecting a defect in n different ways as if it were a stuck-at fault. However, applying n-detect on ATPG increases the cost of testing by n times.

SUMMARY

In a first aspect, implementations of the disclosed technology provide a configurable unit that includes a memory, an ALU coupled with the memory, a test controller, a test control register, and a signature register. The signature register may be coupled with an ALU output to receive ALU output data, compress the ALU output data, and store the compressed result as a test signature. The test controller manages a series of steps. The steps include overriding an
ALU control signal with a replacement ALU control signal (from the test control register). The test controller generates a test pattern and forwards the test pattern to an input of a first circuit, other than a scan chain input. The first circuit output data is forwarded to the ALU, which executes an ALU operation on the first circuit output data, based on the replacement ALU control signal. A test result is obtained from the ALU output, compressed, and stored in the signature register.

The first circuit may be (or include) the memory. The test controller forwards the test pattern to the memory and writes it to a first address. It obtains first circuit output data by reading from the first address. The test pattern may be included in a series of test patterns for detecting a memory error. The test pattern may include a pseudo-random number, focused at testing logic in the datapath, including testing the ALU.

In a second aspect, implementations of the disclosed technology provide a method to test a datapath in a configurable unit. The datapath includes a memory and an ALU. The method includes the following steps. It provides a memory test vector from a series of memory test vectors to the memory, and writes the memory test vector to a first address in the memory. It reads memory output data from the first address, and forwards this data to the ALU. The method replaces a signal on an ALU control input with a replacement ALU control signal, and the ALU performs an operation on the data read from the memory, based on the replacement ALU control signal. A test result is obtained from the ALU data output, and compressed to obtain a test signature. The test signature is stored in a signature register.

Particular aspects of the technology disclosed are described in the claims, detailed description, and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a processor unit, such as used in a Coarse Grain Reconfigurable Architecture (CGRA) chip.
FIG. 2 illustrates a block diagram of another processor unit, such as might be used in a multiprocessor chip.
FIG. 3 illustrates another processor unit with built-in self test.
FIG. 4 illustrates another processor unit with BIST structures in an implementation of the disclosed technology.
FIG. 5 illustrates a method for testing a datapath in a processor unit in an implementation of the disclosed technology.
FIG. 6 is a system diagram illustrating a test system including a reconfigurable processor, a tester, a test host, and a memory, in an implementation of the disclosed technology.
FIG. 7 is a simplified block diagram of components of a CGRA processor.
FIG. 8A is a simplified diagram of a tile comprising an array of configurable units in an implementation of the disclosed technology.
FIG. 8B is another example diagram of a tile comprising an array of configurable units in an implementation of the disclosed technology.
FIG. 9 is a block diagram illustrating an example configurable Pattern Memory Unit (PMU) including BIST circuits.
FIG. 10 is a block diagram illustrating an example configurable Pattern Compute Unit (PCU) including BIST circuits.

In the figures, like reference numbers may indicate functionally similar elements. The systems and methods illustrated in the figures, and described in the Detailed Description below, may be arranged and designed in a wide variety of different implementations. Neither the figures, nor the Detailed Description, are intended to limit the scope as claimed. Instead, they merely represent examples of different implementations of the disclosed technology.

DETAILED DESCRIPTION

Terminology

AGCU: Address generation and coalescing unit.
ALU: arithmetic logic unit.
ATPG: automatic test pattern generation.
BIST: built-in self-test.
CGRA: coarse-grained reconfigurable architecture.
CPU: central processing unit, a datapath along with a control unit.
Datapath: a collection of functional units that perform data processing operations, registers, and buses. The functional units may include memory, ALUs, multipliers, etc.
LFSR: a linear-feedback shift register.
MISR: Multiple-input signature register.
PCU: Pattern compute unit.
PMU: Pattern memory unit.
Processor: an electronic circuit that processes information (data and/or signals).
SIMD: Single instruction, multiple data.

Introduction

The datapath in a configurable unit in a CGRA may, for example, include logic circuits, a memory and an ALU. The ALU functionality may be configurable by an ALU control circuit responsive to a configuration file or bit file in a data flow architecture, or responsive to instructions in instruction cycles in a control flow architecture. The ALU may be or include one or more SIMDs for performing parallel operations. Multiple interconnected configurable units may make up a deep neural net, applicable for a wide spectrum of functions that are enhanced or made possible by artificial intelligence. Because of the large size of CGRA and other processor chips, modern processes are used, and conventional ways of production testing can no longer adequately and cost-effectively find nearly all functional defects.

Memory BIST fails to adequately cover some relevant parts of the datapath. High-coverage scan tests still don't adequately find all defects. ATPG finds mostly stuck-at faults only, and n-detect ATPG scan tests are very expensive. Logic BIST has many challenges and can lead to false rejects.

Implementations of the disclosed technology provide a novel way of testing a configurable unit and other processor units. They equip the configurable unit with a test controller or BIST controller that tests the datapath from input to output, even if it is very wide, and that may provide both tests targeting the memory and tests targeting the ALU and other logic. Tests may be deterministic (for the memory) and/or pseudo-random (for the logic). The BIST controller ensures that the datapath is in a state that is similar to normal operation, so that logic testing becomes quasi-functional testing with generated, rather than designed, tests. It also controls compression of the output data to create a test signature that an external tester can compare with a precompiled signature. The use of generated tests provides the advantages of n-detect without the associated costs.

Implementations

The following detailed description is made with reference to the figures. Example implementations are described to illustrate the technology disclosed, not to limit its scope,
which is defined by the claims. Those of ordinary skill in the art will recognize a variety of equivalent variations on the description that follows.

FIG. 1 illustrates a processor unit 100 such as used in a CGRA. Processor unit 100 may be configurable using configuration data like a bit file, and includes memory 110, ALU 140, and ALU control circuit 150, which controls the function of ALU 140. Data flows from input databus 190 through the blocks and intermediate databuses to output databus 198. Input databus 190 may carry address information and read/write control lines for memory 110, as well as data for memory 110. Input databus 190 may be very wide, and include multiple lanes of parallel data. The memory output data flows through intermediate bus 192 and enters ALU 140. ALU 140 may perform various sorts of operations on the data it receives from intermediate bus 192. The type of ALU operation is controlled by ALU control circuit 150.

Processor unit 100 further includes test interface 160 (for example, a JTAG port) which receives test instructions and test data, and returns test results. Test interface 160 controls MBIST controller 170, which can autonomously test the memory (and in many cases repair some defective locations), and it can send ATPG vectors into the scan chains, and return test results from the scan chains. This is an example of a processor unit including a test pattern generator, a circuit (MBIST controller 170) to apply a test vector from the test pattern generator to a data input of a datapath; and a test result output (test interface 160); configured to output a test result at the test result output. MBIST is a rather effective solution for testing memories, whereas ATPG is an efficient and low-power solution that readily achieves a relatively high coverage of stuck-at faults in logic circuits.

FIG. 2 illustrates a block diagram of another processor unit 200, such as might be used in a multiprocessor chip. In this architecture, there are two input paths, that may each have their own databus. Processor unit 200 may be configurable using configuration data like a bit file. It includes memory 210, logic circuit 220, multiplexer 230, ALU 240, and ALU control circuit 250. An input databus 290 may carry address information and read/write control lines for memory 210, as well as data for memory 210. Input databus 290 may be very wide, and include multiple lanes of parallel data. An input databus 294 may carry data for logic circuit 220. Logic circuit 220 may include combinational logic, flipflops, registers, and other elements. Memory 210 transfers its output data to multiplexer 230 via intermediate bus 292, and logic circuit 220 transfers its output data to multiplexer 230 via intermediate bus 296. Multiplexer 230 selects data from either intermediate bus 292 or intermediate bus 296 and transfers it to ALU 240 via intermediate bus 297. ALU 240 provides its output data on output databus 298.

Processor unit 200 further includes test interface 260 (for example, a JTAG port) and MBIST controller 270, which provide the same functionality as test interface 160 and MBIST controller 170 in FIG. 1. Whereas test interface 160 controlled the scan chains for ATPG vectors for ALU 140 and ALU control circuit 150, test interface 260 controls the scan chains for ATPG vectors for ALU 240, ALU control circuit 250, as well as logic circuit 220. This is an example of a processor unit including a test pattern generator; a circuit (MBIST controller 270) to apply a test vector from the test pattern generator to a data input of a datapath; and a test result output (test interface 260); configured to output a test result at the test result output. This test solution has worked well for both the basic datapath of FIG. 1 and the expanded datapath of FIG. 2. However, for the newest semiconductor process technologies and the very wide datapaths in current configurable unit designs, it misses many defects.

FIG. 3 illustrates another processor unit 300 with built-in self-test. FIG. 3 includes all elements of FIG. 1, with like numbering, such as memory 310, ALU 340, and ALU control circuit 350. The operation of ALU 340 can be determined by an ALU control signal generated in ALU control circuit 350, which can be statically configured in a data flow setting using configuration data from a bit file, or provided in each instruction cycle in a control flow setting by instruction decoding. Input databus 190 is illustrated in two parts, input databus 390 and intermediate bus 391, separated by multiplexer 372, and output databus 398. Processor unit 300 further includes test interface 360 (for example, a JTAG port), test control register 362, BIST controller 370, multiplexer 374, and MISR 380, which may include test result compressor 382 and signature register 385. This is an example of a processor unit including a test pattern generator, a circuit (BIST controller 370) to apply a test vector from the test pattern generator to a data input of a datapath, and a test result output (MISR 380), configured to output a test result at the test result output.

In normal operation, data flows through and is processed in processor unit 300 in the same manner as it flows through and is processed in processor unit 100 of FIG. 1. The BIST circuits may be inactive. An external tester may load test control register 362 with a replacement ALU control word, which it passes to an input of multiplexer 374 as a replacement ALU control signal. In BIST mode, controlled by BIST controller 370, multiplexer 372 replaces input data from input databus 390 with test vectors generated by BIST controller 370. The test vectors may include memory addresses and data to be stored in memory 310, and may be accompanied by read and write control signals for memory 310. In BIST mode, the test patterns may include deterministic vectors targeted at memory testing, and pseudo-random data targeted at logic testing.

BIST controller 370 can be configured to test the whole datapath from the input of memory 310 through the output of ALU 340, using the techniques described herein. For example, BIST controller 370 may generate or output a series of memory tests (test patterns optimized for detecting a memory error, such as a march algorithm, RAM sequential, zero-one, checkerboard, butterfly, sliding diagonal, etc.), but unlike in standard MBIST it may not directly monitor the output of memory 310. It may also generate a series of pseudo-random test vectors, but unlike in LBIST, it doesn't provide the pseudo-random test vectors to scan chains into ALU 340 and ALU control circuit 350. Instead, it provides the pseudo-random test vectors via memory 310 to ALU 340, while controlling both the memory write and read addresses and the ALU functionality (e.g., by overriding the output from ALU control circuit 350 using multiplexer 374).

Output databus 398 outputs the data from ALU 340, for example to another configurable unit, but it also transfers the data to MISR 380. Test result compressor 382 may use any compression technique known in the art to compress the ALU output data, including cyclic redundancy check, ones count, transition count, parity checking, syndrome checking, etc. BIST controller 370 sends a signal to MISR 380 to compress the ALU output data and store the compressed output data as a signature in signature register 385, from where it can be read via test interface 360 by, for example, an external tester, that may compare the signature with a precompiled test signature to determine a test result.
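As an illustration of the signature step, a multiple-input signature register can be modeled in a few lines. This is a minimal sketch, assuming a 32-bit register with Galois-style feedback; the function names, the register width, the seed, and the feedback polynomial (the common CRC-32 polynomial 0x04C11DB7) are assumptions made for the example and are not taken from the patent.

```python
def misr_step(state, word, width=32, poly=0x04C11DB7):
    """One clock of a MISR: shift with polynomial feedback, then XOR in the parallel input."""
    mask = (1 << width) - 1
    feedback = (state >> (width - 1)) & 1   # bit shifted out of the register
    state = (state << 1) & mask
    if feedback:
        state ^= poly                       # apply the (assumed) feedback taps
    return state ^ (word & mask)            # fold in this cycle's ALU output word


def misr_signature(words, width=32, poly=0x04C11DB7, seed=0):
    """Compress a sequence of ALU output words into a single signature."""
    state = seed & ((1 << width) - 1)
    for word in words:
        state = misr_step(state, word, width, poly)
    return state


# Hypothetical ALU output stream and the signature a tester would precompile.
alu_outputs = [0x00000000, 0xFFFFFFFF, 0x55555555, 0xAAAAAAAA, 0x12345678]
golden = misr_signature(alu_outputs)

# The same stream with one bit flipped in one word, as a defect might cause.
corrupted = list(alu_outputs)
corrupted[2] ^= 1 << 7
observed = misr_signature(corrupted)

print("pass" if observed == golden else "fail")  # prints "fail"
```

Because the register update is linear and, with this polynomial, invertible, an error confined to a single word always changes the final signature. Errors spread over several words can in principle cancel (aliasing), which is the usual trade-off for reading out one short signature instead of every ALU output word.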
US 11,449,404 B1 7
8
An implementation does not need to isolate memory 310 from other circuits to perform a test. It uses the memory in situ. In an implementation, BIST controller 370 may generate a first part of the test vectors for testing memory 310, and a second part for testing the logic. By not changing the topology, leaving the datapath intact, and applying test vectors at the input of processor unit 300, an implementation achieves a better coverage of the datapath, and is able to test it at speed. By using pseudo-randomly generated test vectors, the implementation is able to achieve a high n-detect value, and thus a superior coverage of defects beyond just stuck-at faults. By using signature compression, the bandwidth burden on the chip's test bus can remain in check.
In an example implementation, a datapath may include 16 parallel 32-bit lanes for a total width of 512 bits. Additional lines may carry control signals, addresses, parity information, etc.
In a first cycle in a test loop, BIST controller 370, via BIST input bus 371 and multiplexer 372, provides each lane with an address and data for memory 310, and gives it a write instruction. Memory 310 stores the data at the 16 addresses. In a next cycle, BIST controller 370 provides each lane with a next address, and gives a read instruction. Memory 310 retrieves the data, and outputs 16 lanes of parallel data on intermediate bus 392, which transports the data to ALU 340. BIST controller 370 overrides the ALU control circuit 350 in multiplexer 374 and selects the replacement ALU control signal for ALU 340, which processes the data it receives from intermediate bus 392. ALU 340 may include a SIMD, and may thus be capable of processing the 16 parallel lanes of data simultaneously. It outputs the results on output databus 398, which allows MISR 380 to compress the results, and store the compressed results as a signature in signature register 385. An external tester may read the compressed results from the MISR and compare them with precompiled compressed results to determine if they match (pass) or are different (fail).
BIST controller 370 may run 4,096 loops of such tests. The example operation may run 5 loops of testing dedicated to memory 310, and 4,091 loops of testing dedicated to the logic and ALU.
In addition, the implementation may run ATPG tests via test interface 360 into scan chains (not drawn in FIG. 3) anywhere in the datapath, ALU control circuit 350, and any of the BIST circuits, including BIST controller 370, multiplexer 372, multiplexer 374, test control register 362, and MISR 380.
Some implementations may deviate from the architecture shown in FIG. 3. For example, test control register 362 may be embedded in BIST controller 370 or in ALU control circuit 350. The functionality of test control register 362 and multiplexer 374 (i.e., overriding control of the ALU by BIST controller 370) may be integrated in ALU control circuit 350, and instead of controlling multiplexer 374, BIST controller 370 may directly provide a control signal to ALU control circuit 350. Further, test result compressor 382 and signature register 385 may be combined in a single circuit. Although FIG. 3 shows a single test interface, an implementation may have multiple test interfaces to communicate with the various test circuits shown, and to communicate with scan chains.
FIG. 4 illustrates another processor unit 400 with BIST structures in an implementation of the disclosed technology. FIG. 4 includes all elements of FIG. 2, with like numbering. This includes memory 410, logic circuit 420, multiplexer 430, ALU 440, and ALU control circuit 450. However, input databus 290 has been split into two parts, input databus 490 and intermediate bus 491, separated by multiplexer 472 and input databus 294 has been split into input databus 494 and intermediate bus 495, separated by multiplexer 473. Both multiplexers are also coupled with BIST input bus 471, which receives its data from BIST controller 470. Processor unit 400 further includes test interface 460, test control register 462, multiplexer 474, and MISR 480, including test result compressor 482 and signature register 485. Further databuses are intermediate bus 492, intermediate bus 496, intermediate bus 497, and output databus 498. The operation of ALU 440 can be determined by an ALU control signal generated in ALU control circuit 450, which can be statically configured in a data flow setting using configuration data from a bit file, or provided in each instruction cycle in a control flow setting by instruction decoding.
In normal operation, data flows through and is processed in processor unit 400 in the same manner as it flows through and is processed in processor unit 200 of FIG. 2. The BIST circuits may be inactive. An external tester may load test control register 462 with a replacement ALU control word, which it passes to an input of multiplexer 474 as a replacement ALU control signal. In BIST mode, controlled by BIST controller 470, multiplexer 472 may replace input data from input databus 490 with test vectors generated by BIST controller 470. Also, multiplexer 473 may replace input data from input databus 494 with the test vectors. The test vectors for memory 410 may include memory addresses and data to be stored, along with read and write control signals, and patterns to test logic circuit 420. In BIST mode, the test patterns may include deterministic vectors targeted at memory testing, and pseudo-random data targeted at logic testing. BIST controller 470 also overrides control of the ALU 440, via multiplexer 474.
BIST controller 470 tests the whole datapath from the inputs of memory 410 and logic circuit 420 through the output of ALU 440, using the techniques described herein. For example, BIST controller 470 may generate or output a series of memory tests, but unlike in standard MBIST it doesn't directly monitor the output of memory 410. It may also generate a series of pseudo-random test vectors, but unlike in LBIST, it doesn't provide the pseudo-random test vectors to scan chains into logic circuit 420, ALU 440 and ALU control circuit 450. Instead, it provides the pseudo-random test vectors via memory 410 and/or logic circuit 420 to ALU 440, while controlling both the memory write and read addresses and the ALU functionality (e.g., by overriding the output from ALU control circuit 450 using multiplexer 474). Output databus 498 outputs the data from ALU 440, for example to another configurable unit, but it also transfers the data to MISR 480. Test result compressor 482 may use any compression technique known in the art to compress the data, including cyclic redundancy check, ones count, transition count, parity checking, syndrome checking, etc. BIST controller 470 sends a signal to MISR 480 to compress the ALU output data and store the compressed output data as a signature in signature register 485, from where it can be read via test interface 460 by, for example, an external tester, that may compare the signature with a precompiled test signature to determine a test result.
Testing the datapath via memory 410 may be similar or identical to the method described for testing processor unit 300 in FIG. 3. However, processor unit 400 has a secondary input path via input databus 494, intermediate bus 495, logic circuit 420, intermediate bus 496, and multiplexer 430. So, in addition to testing the datapath via memory 410, BIST controller 470 may select BIST input bus 471 at multiplexer 473 to pass test vectors through logic circuit 420.
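One BIST cycle through a datapath like that of FIG. 4 can be sketched in software as follows. The sketch is illustrative only: the ALU operations, the behavior of `logic_circuit`, and all function and parameter names are assumptions that do not appear in the description. Only the structure follows the text, namely that a test vector enters through either the memory path or the logic path, and that a replacement control value, rather than the functional control signal, chooses the ALU operation.

```python
LANES = 16               # 16 parallel lanes, as in the example datapath
LANE_MASK = 0xFFFFFFFF   # each lane is 32 bits wide


def alu(op, lanes):
    """Hypothetical SIMD ALU: applies the same operation to every lane."""
    if op == "pass":
        return list(lanes)
    if op == "not":
        return [~x & LANE_MASK for x in lanes]
    if op == "inc":
        return [(x + 1) & LANE_MASK for x in lanes]
    raise ValueError(f"unknown ALU operation: {op}")


def logic_circuit(lanes):
    """Hypothetical stand-in for the logic circuit: rotates the lanes by one."""
    return lanes[1:] + lanes[:1]


def bist_cycle(memory, address, vector, path, replacement_op):
    """Send one test vector through the chosen input path and the ALU."""
    if len(vector) != LANES:
        raise ValueError("expected one value per lane")
    if path == "memory":
        memory[address] = list(vector)   # write cycle
        data = list(memory[address])     # read cycle
    elif path == "logic":
        data = logic_circuit(list(vector))
    else:
        raise ValueError(f"unknown path: {path}")
    # The replacement control value stands in for the overridden
    # ALU control signal.
    return alu(replacement_op, data)


memory = {}
vector = list(range(LANES))
via_memory = bist_cycle(memory, 3, vector, "memory", "not")
via_logic = bist_cycle(memory, 0, vector, "logic", "pass")
```

In a fuller model, the returned lane values would be handed to a signature compressor after every cycle instead of being inspected directly.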
To ensure testable behavior of ALU 440, an implementation may reset ALU 440 and any other part of the datapath in processor unit 400 at the start of testing, and BIST controller 470 may override the control signal from ALU control circuit 450 in multiplexer 474 to provide a replacement ALU control signal, or otherwise take control of the ALU 440 functionality. At the end of testing, BIST controller 470 may flush the datapath by running a series of zero vectors through it.
Some implementations may deviate from the architecture shown in FIG. 4. For example, test control register 462 may be embedded in BIST controller 470 or in ALU control circuit 450. The functionality of test control register 462 and multiplexer 474 (i.e., overriding control of the ALU by BIST controller 470) may be integrated in ALU control circuit 450, and instead of controlling multiplexer 474, BIST controller 470 may directly provide a control signal to ALU control circuit 450. Further, test result compressor 482 and signature register 485 may be combined in a single circuit. Although FIG. 4 shows a single test interface, an implementation may have multiple test interfaces to communicate with the various test circuits shown, and to communicate with scan chains.
FIG. 5 illustrates a method 500 for testing a datapath in a processor unit in an implementation of the disclosed technology. The datapath includes a memory with a data input and a data output, an ALU with a control input, a data input and a data output, and an intermediate bus coupling the memory data output with the ALU data input. Method 500 includes the following steps:
Step 510: providing a first memory test vector from a series of memory test vectors to the memory data input. The series of memory test vectors may follow any sequence of tests that uncover memory defects, including sequences determined in a march algorithm, RAM sequential, zero-one, checkerboard, butterfly, sliding diagonal, and other memory test algorithms.
Step 520: writing the first memory test vector to a first address in the memory. The first address may be determined by the memory test algorithm that the implementation follows.
Step 530: reading memory output data from the first address in the memory.
Step 540: forwarding the memory output data via the intermediate bus to the ALU.
Step 550: replacing a signal on the control input with a replacement ALU control signal. The replacement ALU control signal ensures that the ALU is testable in a manner that is at least representative for normal operation.
Step 560: performing an ALU operation based on the replacement ALU control signal. The ALU processes the data at its data input according to the replacement ALU control signal, and places the result on its data output as a test result.
Step 570: obtaining the test result from the ALU data output. An implementation may forward the test result to a MISR for Step 580 and Step 590.
Step 580: compressing the test result to obtain a signature. An implementation may use any compression technique known in the art to compress the ALU output data, including cyclic redundancy check, ones count, transition count, parity checking, syndrome checking, etc.
Step 590: storing the signature in a register. The register may be part of a MISR. Implementations may further compare the signature with a precompiled signature to determine a test result. For example, if the signature matches the precompiled signature, the test passes, and if they don't match, the test fails.
Method 500 may further include:
Step 511: providing a first pseudo-random number from a first series of pseudo-random numbers to the memory data input. An implementation may generate the first series of pseudo-random numbers using a first LFSR, with a first length, a first feedback polynomial, and a first seed.
Step 521: writing the first pseudo-random number to a second address in the memory. The second address may be any available address in the memory. The second address may be fixed, or it may be based on an index of the first pseudo-random number in the first series of pseudo-random numbers. For example, each pseudo-random number in the first series of pseudo-random numbers may have a unique index: a first pseudo-random number may have index 0, a second one may have index 1, a third one may have index 2, etc. The second address may increase or decrease with the index, or be any function of the index. In one implementation, the second address includes a one-hot encoded address based on (at least a part of bits included in) the index of the test pattern in the series of test patterns. A one-hot encoded number is a binary number with only a single bit “1”, and all other bits “0”. For example, index 0 may translate to a string of 16 bits 0000 0000 0000 0001; index 1 may translate to 0000 0000 0000 0010, etc. In an implementation where the second address includes a one-hot encoded version of the index, successive second addresses may address successive columns in the memory.
Step 531: reading memory output data from the second address in the memory. Method 500 may proceed with Step 540.
The datapath may further include a logic circuit with a data input and a data output, and a multiplexer with a first input coupled with the memory data output and a second input coupled with the logic circuit data output, and with an output coupled with the ALU data input. Method 500 may further include:
Step 512: providing a second pseudo-random number from a second series of pseudo-random numbers to the logic circuit data input. An implementation may generate the second series of pseudo-random numbers using a second LFSR, with a second length, a second feedback polynomial, and a second seed.
Step 542: forwarding data from the logic circuit data output via the multiplexer and the intermediate bus to the ALU. Method 500 may proceed with Step 550.
The technology disclosed relates to built-in self-test (BIST) of processor chips that include one or more processor units comprising a datapath with a memory and an ALU. The datapath may be very wide. Implementations use a new form of BIST that complements ATPG to support a high fault coverage. It circumvents the problems and limitations of ATPG, LBIST, and MBIST to separate functional and faulty ICs with high confidence.
Implementations may test a configurable unit with ATPG to achieve a high coverage of stuck-at faults, for example 99%. In addition, they may generate test patterns for memory test and functional test. They apply the test patterns to an input of the configurable unit, for instance a memory input or a logic input, and retrieve output data from an output of the configurable unit. Thus, the test patterns run through the full datapath to yield the output data. A BIST controller generates the test patterns, applies them to the configurable unit input, and ensures that the conditions of the datapath generally resemble those of normal operation. The BIST
controller also instructs a MISR to compress the output data into a result signature, and store the result signature in a register. An external tester may access the register, for example via a JTAG test interface, to retrieve the result signature and compare it with a precompiled signature to determine the test result (pass if the result signature equals the precompiled signature, and fail otherwise).
The test patterns may include patterns specifically targeting the memory, and similar to those found in commercially available MBIST, including march tests and traditional tests such as zero-one, checkerboard, butterfly, sliding diagonal, etc. The test patterns may further include a series of pseudorandom numbers that target the ALU, and that are similar to those found in LBIST solutions. While an implementation checks the memory, the BIST controller or the external tester may place the ALU in a “transparent” mode, i.e. the output data equals the ALU input data, or the ALU could be kept in its standard operational mode. While the implementation checks the ALU and any other logic, the BIST controller controls memory addressing for transparent operation. The BIST controller may operate the memory at a fixed address, or it may sequence (in any order) through all available addresses, or through any subset of the available addresses. For example, it may use a one-hot encoded address, where the single address bit that is high sequences through the available address bits.
Implementations support hard-wired, semi-fixed, and programmable modes of the ALU. Where an ALU mode is semi-fixed or programmable, the implementation seizes control of the ALU by replacing an ALU control signal from an ALU controller with a replacement ALU control signal. The replacement ALU control signal may be stored in a register, such as a JTAG test control register. The BIST controller may control a multiplexer and direct it to forward the replacement ALU control signal to the ALU instead of the ALU control signal. An implementation may further clear the state of the ALU prior to applying any test vectors, for example by applying a reset routine, and an implementation may flush the ALU after applying test vectors, for example by applying a series of zero vectors to the datapath.
A Reconfigurable Processor System
FIG. 6 is a system diagram illustrating a test system 600 including a reconfigurable processor 610, a tester 620, a test host 630, and a memory 640, in an implementation of the disclosed technology. As shown in the example of FIG. 6, reconfigurable processor 610, which may be a single semiconductor chip, includes an array of configurable units 615, coupled with a test interface 628, an external I/O interface 638, and an external memory interface 648. Test interface 628 may be coupled with array of configurable units 615 and, optionally, other parts of reconfigurable processor 610 via test bus 616. I/O interface 638 and memory interface 648 may be coupled with array of configurable units 615 via databus 618. Tester 620 is coupled with test interface 628 via lines 625. Test host 630 is coupled with I/O interface 638 via lines 635. Memory 640 is coupled with memory interface 648 via lines 645. Additionally, tester 620 and test host 630 may interface with each other.
Reconfigurable processor 610 may be, or include, a CGRA, whose architecture and functionality will be clarified in successive figures. In any case, array of configurable units 615 includes multiple configurable units, and a configurable unit may include a memory and/or an ALU. For example, a configurable unit may include a PMU, a PCU, or both a PMU and a PCU. A configurable unit further includes a test interface coupled with test bus 616 and dedicated self-test logic as described herein. For example, a configurable unit that includes a memory and an ALU may further include the test circuits shown in and described for FIG. 3. In a reconfigurable processor that separates PMUs and PCUs as individual configurable units, a PMU may hold a first test interface, a first BIST controller, and a datapath multiplexer, similar to multiplexer 372, whereas a PCU may hold a second test interface, a second BIST controller, a test control register, an ALU control multiplexer, and a MISR.
To configure configurable units in array of configurable units 615 with a configuration file, test host 630 can send the configuration file to memory 640 via I/O interface 638, databus 618, and memory interface 648. The configuration file can be loaded in many ways, as suits a particular implementation, including in datapaths outside reconfigurable processor 610. The configuration file can be retrieved from memory 640 via the memory interface 648. Chunks of the configuration file can then be sent in a distribution sequence to configurable units in array of configurable units 615.
Reconfigurable processor 610 and one or more reconfigurable components therewithin (e.g., array of configurable units 615) are referred to as “reconfigurable hardware”, as reconfigurable processor 610 and the one or more components therewithin are configurable and reconfigurable to suit needs of a program being executed thereon. Reconfigurable components can be statically configured in a data flow setting during execution of a function using the components.
FIG. 7 is a simplified block diagram of components of a CGRA processor 700. In this example, CGRA processor 700 has 2 tiles (tile 710 and tile 720). A tile comprises an array of configurable units coupled to a bus system, that may include an array-level network. The bus system includes a top-level network coupling the tiles to external I/O interface 738 (or any number of interfaces). Other implementations may use different bus architectures. The configurable units in each tile may be nodes on the array-level network. Each tile has four AGCUs (e.g., MAGCU1, AGCU12, AGCU13, and AGCU14 in tile 710). The AGCUs are nodes on the top-level network and nodes on the array-level networks, and include resources for routing data among nodes on the top-level network and nodes on the array-level network in each tile.
Nodes on the top level network in this example include one or more external I/O interfaces, including I/O interface 738. The interfaces to external devices include circuits for routing data among nodes on the top-level network and external devices, such as high-capacity memory, host processors, other CGRA processors, FPGA devices, and so on, that are coupled with the interfaces.
One of the AGCUs in a tile in this example is configured to be a master AGCU (MAGCU), which includes an array configuration load/unload controller for the tile. Other implementations may include more than one array configuration load/unload controller, and one array configuration load/unload controller may be implemented by logic distributed among more than one AGCU.
The MAGCU1 includes a configuration load/unload controller for tile 710, and MAGCU2 includes a configuration load/unload controller for tile 720. In other implementations, a configuration load/unload controller can be designed for loading and unloading configuration of more than one tile. In further implementations, more than one configuration controller can be designed for configuration of a single tile. Also, the configuration load/unload controller can be implemented in other portions of the system, including as a stand-alone node on the top-level network and the array level network or networks.
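Returning to the pseudo-random write sequence of Steps 511 and 521 of method 500, the pairing of LFSR output with one-hot addresses derived from the pattern index can be sketched as follows. This is a minimal sketch under stated assumptions: the 4-bit register, the tap positions, the seed, the wrap-around of the index at the address width, and the names `lfsr_series` and `one_hot_address` are all hypothetical and chosen only to keep the example small.

```python
def lfsr_series(seed, taps, length, count):
    """Yield `count` successive states of a Fibonacci-style LFSR.

    `length` is the register width in bits, `taps` lists the bit
    positions XORed together to form the feedback bit, and `seed`
    is the (nonzero) starting state.
    """
    mask = (1 << length) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be nonzero for an XOR-feedback LFSR")
    for _ in range(count):
        yield state
        feedback = 0
        for tap in taps:
            feedback ^= (state >> tap) & 1
        state = ((state << 1) | feedback) & mask


def one_hot_address(index, address_bits=16):
    """Map a pattern index to an address with exactly one bit set.

    Wrapping the index at `address_bits` is an assumption made here
    so that every index yields a valid address.
    """
    return 1 << (index % address_bits)


# Pair each pseudo-random value with the one-hot address for its index.
writes = [
    (one_hot_address(index), value)
    for index, value in enumerate(lfsr_series(seed=1, taps=(3, 2), length=4, count=15))
]
```

With these illustrative taps the 4-bit register visits all 15 nonzero states before repeating, and the addresses for indices 0 and 1 are `0000 0000 0000 0001` and `0000 0000 0000 0010`, matching the one-hot example given for Step 521.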
The top-level network is constructed using top-level switches (switch 711, switch 712, switch 713, switch 714, switch 715, and switch 716) coupled with each other as well as with other nodes on the top-level network, including the AGCUs, and I/O interface 738. The top-level network includes links (e.g., L11, L12, L21, L22) connecting the top-level switches. Data may travel in packets between the top-level switches on the links, and from the switches to the nodes on the network coupled with the switches. For example, switch 711 and switch 712 are coupled by a link L11, switch 714 and switch 715 are coupled by a link L12, switch 711 and switch 714 are coupled by a link L13, and switch 712 and switch 713 are coupled by a link L21. The links can include one or more buses and supporting control lines, including for example a chunk-wide bus (vector bus). For example, the top-level network can include data, request and response channels operable in coordination for transfer of data in a manner analogous to an AXI compatible protocol. See, AMBA® AXI and ACE Protocol Specification, ARM, 2017.
Top-level switches can be coupled with AGCUs. For example, switch 711, switch 712, switch 714 and switch 715 are coupled with MAGCU1, AGCU12, AGCU13 and AGCU14 in tile 710, respectively. Switch 712, switch 713, switch 715 and switch 716 are coupled with MAGCU2, AGCU22, AGCU23 and AGCU24 in tile 720, respectively. Top-level switches can be coupled with one or more external I/O interfaces (e.g., I/O interface 738).
FIG. 8A is a simplified diagram of a tile comprising an array of configurable units 800 in an implementation of the disclosed technology. In this example, array of configurable units 800 includes multiple types of configurable units. The types of configurable units, in this example, include PMU, PCU, switch units (S), and AGCUs (each including two address generators AG and a shared CU). For an example of the functions of these types of configurable units, see Prabhakar et al., “Plasticine: A Reconfigurable Architecture for Parallel Patterns”, as detailed in the section Cross References. Each of the configurable units may include a configuration store comprising a set of registers or flip-flops storing configuration data that represents either the setup or the sequence to run a program, and that can include the number of nested loops, the limits of each loop iterator, the instructions to be executed for each stage, the source of the operands, and the network parameters for the input and output interfaces. In the implementation of array of configurable units 800, PMU and PCU units are arranged in a checkerboard pattern.
Additionally, each of these configurable units contains a configuration store comprising a set of registers or flip-flops that store a status usable to track progress in nested loops or otherwise. A configuration file includes a bitstream representing the initial configuration, or starting state, of each of the components that execute the program. This bitstream is referred to as a bit file. Program Load is the process of setting up the configuration stores in the array of configurable units based on the contents of the bit file to allow all the components to execute a program (i.e., a machine). Program Load may also require loading all PMU memories.
The bus system includes links interconnecting configurable units in the array. The links in the array level network include one or more, and in this case two, kinds of physical data buses: a chunk-level vector bus (e.g., 512 bits of data), and a word-level scalar bus (e.g., 32 bits of data). For instance, interconnect 821 between switch 811 and switch 812 may include a vector bus interconnect with vector bus width of 512 bits, and a scalar bus interconnect with a scalar bus width of 32 bits. Also, a control bus (see FIGS. 9-11) that can comprise a configurable interconnect is included carrying multiple control bits on signal routes designated by configuration bits in the configuration file for the tile. The control bus can comprise physical lines separate from the data buses in some implementations. In other implementations, the control bus can be implemented using the same physical lines with a separate protocol or in time sharing procedure.
The physical buses differ in the granularity of data being transferred. In one implementation, the vector bus can carry a chunk that includes 16 channels (e.g., 512 bits) of data as its payload. The scalar bus can have a 32-bit payload and carry scalar operands or control information. The control bus can carry control handshakes such as tokens and other signals. The vector and scalar buses can be packet-switched, including headers that indicate a destination of each packet and other information such as sequence numbers that can be used to reassemble a file when the packets are received out of order. Each packet header can contain a destination identifier that identifies the geographical coordinates of the destination switch unit (e.g., the row and column in the array), and an interface identifier that identifies the interface on the destination switch (e.g., North, South, East, West, etc.) used to reach the destination unit.
A switch unit, as shown in the example of FIG. 8A, may have eight interfaces. The North, South, East and West interfaces of a switch unit are used for connections between switch units. The Northeast, Southeast, Northwest and Southwest interfaces of a switch unit are each used to make connections with PCU or PMU instances. Two switch units in each tile quadrant have connections to an AGCU that includes multiple address generation (AG) units and a coalescing unit (CU) coupled with the multiple address generation units. The coalescing unit (CU) arbitrates between the AGs and processes memory requests. Each of the eight interfaces of a switch unit can include a vector interface, a scalar interface, and a control interface to communicate with the vector network, the scalar network, and the control network.
During execution of an execution fragment of a machine after configuration, data can be sent via one or more unit switches and one or more links between the unit switches to the configurable units using the vector bus and vector interface(s) of the one or more switch units on the array level network.
A data processing operation implemented by configuration of a tile comprises a plurality of execution fragments of the data processing operation which are distributed among and executed by corresponding configurable units (AGs, CUs, PMUs, and PCUs in this example).
Test circuits in this example comprise configurable units with dedicated BIST circuitry that can be addressed via a test bus such as test bus 616 in FIG. 6. In this example, the BIST circuitry includes BIST logic 801 in AG 818, BIST logic 802 in a PMU, BIST logic 803 in a PCU, and BIST logic in a switch, as described above. In the illustrated embodiment all configurable units (PMUs, PCUs, AGs) in the array may include local dedicated BIST circuitry. In some embodiments, a plurality of the configurable units in the array, which can be fewer than all the configurable units in the array, include local dedicated BIST circuitry. By including separately addressable (via the test bus) BIST controllers for each configurable unit in a plurality of configurable units in the array, an implementation significantly reduces full-chip debug time at a preproduction stage by indicating where a defect is related. While operational in the field, the infor
mation makes it possible to mitigate the results of a defect, for example by replacing a configurable unit, shutting it down, slowing it down, speeding it up, or any other action that keeps array of configurable units 800 functioning acceptably.
In one implementation, the configurable units include configuration and status registers holding unit configuration files loaded in a configuration load process or unloaded in a configuration unload process. The registers can be connected in a serial chain and can be loaded with configuration data through a process of shifting bits through the serial chain. In some implementations, there may be more than one serial chain arranged in parallel or in series. When a configurable unit receives the, for example, 512 bits of configuration data in one bus cycle, the configurable unit shifts this data through its serial chain at the rate of 1 bit per cycle, where shifter cycles can run at the same rate as the bus cycle. It will take 512 shifter cycles for a configurable unit to load 512 configuration bits with the 512 bits of data received over the vector interface.
A configuration file or bit file, before configuration of the tile, can be sent using the same vector bus, via one or more unit switches and one or more links between the unit switches to the configurable unit using the vector bus and vector interface(s) of the one or more switch units on the array level network. For instance, a chunk of configuration data in a unit file particular to a configurable unit PMU 841 can be sent to the PMU 841, via a link 820 between a load controller in the address generator AG and the West (W) vector interface of switch 811, switch 811, and a link 831 between the Southeast (SE) vector interface of switch 811 and PMU 841. Configuration data for the instrumentation network can be included in the configuration data for associated configurable units or provided via other configuration data structures.
The configurable units interface with the memory through multiple memory interfaces. Each of the memory interfaces can be accessed using several AGCUs. Each AGCU contains a reconfigurable scalar data path to generate requests for the off-chip memory. Each AGCU contains FIFOs (first-in-first-out buffers for organizing data) to buffer outgoing commands, data, and incoming responses from the off-chip memory.
Configuration files can be loaded to specify the configuration of the tile including instrumentation logic units and the control bus, for the purposes of particular data processing operations, including execution fragments in the configurable units, interconnect configurations and instrumentation network configurations. Technology for coordinating the loading and unloading of configuration files is described by Shah et al. in “Configuration Load of a Reconfigurable Data Processor”, U.S. Pat. No. 10,831,507, issued Nov. 10, 2020.
FIG. 8B is another example diagram of a tile comprising an array of configurable units 850 in an implementation of the disclosed technology. Unlike in array of configurable units 800, PMU and PCU units are not arranged in a checkerboard pattern. Instead, they are arranged in identical rows with alternating PMU and PCU units. More generally, they are arranged in subarrays such as partial or whole rows, partial or whole columns, or other subarrays spanning one or more rows and one or more columns. Subarrays may be homogeneous, comprising identical circuitry in the sense that each homogeneous subarray could be replaced by another homogeneous subarray. In such a case, a spare homogeneous subarray can be used to replace a defective homogeneous subarray identified by the BIST, see for example U.S. patent application Ser. No. 17/378,399 by Grohoski et al., which is incorporated by reference for all purposes as if fully set forth herein. But in some embodiments, subarrays are not necessarily homogeneous. For example, in its first row, array of configurable units 850 includes subarray 851, subarray 852, and subarray 853. Each of these comprises one PMU and one PCU. For example subarray 851 includes PMU 854 and PCU 855. PMU 854 may include a first set of BIST circuits 856, as will later be illustrated with reference to FIG. 9. PCU 855 may include a second set of BIST circuits 857, as will be illustrated with reference to FIG. 10. Jointly, first set of BIST circuits 856 and second set of BIST circuits 857 provide all test functionality illustrated earlier with reference to FIGS. 3-4. In its second row, array of configurable units 850 includes subarray 861, with a total of three PMU units and three PCU units. The first PMU 862 includes a first set of BIST circuits, and the last PCU 863 includes a second set of BIST circuits. Again, jointly, the first set of BIST circuits and the second set of BIST circuits provide all test functionality illustrated earlier with reference to FIGS. 3-4. However, the datapath in last PCU 863 is much longer than in, for instance, subarray 851.
Array of configurable units 850 comprises, in this example, rows and columns of processors, each of which is a configurable unit. In another example, the array can comprise multiple stacked planes, each plane including rows and columns. The array of configurable units may include N homogeneous sub-arrays, arranged in N identical rows. Also, array of configurable units 850 includes N+1 rows of switch units S that form the routing infrastructure of the array level network. In other embodiments, the subarray can be columns. In yet other embodiments, other spare geometries, such as rectangles consisting of a contiguous subset of rows and columns of PMUs and PCUs, may be utilized.
Although FIGS. 8A-B show arrays of configurable units, more generally the units don't need to be configurable. An array of processor units may be integrated on a single integrated circuit chip. A processor may include one or more local memories and one or more ALUs. An ALU may include a SIMD. An array may consist of subarrays, each comprising one or more processor units, and a set of BIST circuits as described with reference to FIGS. 3-4 to test a datapath in the subarray. The datapath may span one or more processor units. The BIST circuits may include a test controller or BIST controller, an input multiplexer to override input data with test vectors generated by the test controller, a test control register and multiplexer to override a processor control setting or ALU control setting stored in a functional configuration register with a replacement control setting stored in the test control register, and a MISR or more generally, a data compressor and a signature register, with a test result output that can be read from outside the subarray. Test vectors generated by the test controller may include memory test patterns and/or pseudo-random data. An array of processor units comprises two or more subarrays with each one or more processor units and one set of test circuits, such that each of the subarrays is individually testable with the methods presented herein.
FIG. 9 is a block diagram illustrating an example configurable PMU 900 including BIST circuits. Configurable PMU 900 may include scratchpad memory 930 coupled with a reconfigurable scalar datapath 920 configured to calculate addresses (RA, WA) and control (WE, RE) of scratchpad memory 930, along with bus interfaces also used in a PCU (FIG. 10), including for vectors, scalars, and control information. Configurable PMU 900 is a configurable unit that includes an input databus 910 (for example, with vector
inputs, scalar inputs, and control inputs). It also includes a replacement databus 965 for testing in an implementation of the disclosed technology. Configurable PMU 900 includes the following BIST circuits: test interface 950, which may be a JTAG port, first BIST controller 960 (first is used because it may operate in tandem with a second BIST controller used in a PCU), multiplexer 972, multiplexer 974, and multiplexer 976. First BIST controller 960 may be started by control signals from the test bus via test interface 950. When started, first BIST controller 960 takes control of the input data of configurable PMU 900 by deselecting the input databus 910 in multiplexer 972, multiplexer 974, and multiplexer 976, respectively, and selecting replacement databus 965 instead. Then, first BIST controller 960 generates test signals, which may include a series of test patterns targeted at testing memory functionality, for example for memory 931 through memory 934, and pseudo-random numbers focused on testing logic, as disclosed earlier in this document. First BIST controller 960 places the test signals on replacement databus 965.

The input databus 910 may include scalar inputs, and vector inputs, usable to provide write data (WD). An output databus may provide scalar outputs and vector outputs to other configurable units, for example to a PCU. The datapath may be organized as a multi-stage reconfigurable pipeline, including stages of functional units (FUs) and associated pipeline registers (PRs) that register inputs and outputs of the functional units. PMUs can be used to store distributed on-chip memory throughout the array of reconfigurable units.

Scratchpad memory 930 may include multiple memory banks (e.g., memory 931 through memory 934, which may be or include SRAMs). The banking and buffering logic 935 for the memory banks in the scratchpad can be configured to operate in several banking modes to support various access patterns. A computation unit as described herein can include a lookup table stored in scratchpad memory 930, from a configuration file or from other sources. In a computation unit as described herein, reconfigurable scalar datapath 920 can translate a section of a raw input value I for addressing lookup tables implementing a function f(I), into the addressing format utilized by the scratchpad memory 930, adding appropriate offsets and so on, to read the entries of the lookup table stored in scratchpad memory 930 using the sections of the input value I. Each PMU can include write address calculation logic and read address calculation logic that provide write address WA, write enable WE, read address RA and read enable RE to banking and buffering logic 935. Based on the state of scalar FIFO 911 and vector FIFOs 912, and external control inputs, control block 915 can be configured to trigger the write address computation, read address computation, or both, by enabling the appropriate counters 916. A programmable chain of counters 916 (Control Inputs, Control Outputs) and control block 915 can trigger PMU execution.

When testing, first BIST controller 960 starts by selecting replacement databus 965 at the input data multiplexers (multiplexer 972, multiplexer 974, and multiplexer 976). Thus, it overrides any data that may be available on input databus 910. First BIST controller 960 determines a memory address and provides the memory address to banking and buffering logic 935 (at input WA) via replacement databus 965, multiplexer 974, scalar FIFO 911, and reconfigurable scalar datapath 920. It generates a test vector, which may include a memory test and/or pseudo-random data, and provides the test vector to scratchpad memory 930 (WD input) via replacement databus 965, multiplexer 972, and vector FIFOs 912. In a first cycle, it writes the test vector to banking and buffering logic 935 at the memory address by asserting the WE input at banking and buffering logic 935 via replacement databus 965, multiplexer 974, scalar FIFO 911, and reconfigurable scalar datapath 920. In a second cycle, later than the first cycle, first BIST controller 960 controls a memory read operation from the memory address by providing the memory address to the RA input at banking and buffering logic 935 via replacement databus 965, multiplexer 974, scalar FIFO 911, and reconfigurable scalar datapath 920, and asserting the RE input at banking and buffering logic 935 via replacement databus 965, multiplexer 974, scalar FIFO 911, and reconfigurable scalar datapath 920. Scratchpad memory 930 releases the data stored at the memory address to the output databus.

Of course, if the datapath including scratchpad memory 930 and all operational units coupled to it function correctly, the data stored at the memory address matches the test vector.

FIG. 10 is a block diagram illustrating an example configurable PCU 1000 including BIST circuits. Configurable PCU 1000 is a configurable unit that can interface with the scalar, vector, and control buses also used in configurable PMU 900, in this example using three corresponding sets of inputs and outputs (I/O): scalar I/O, vector I/O, and control I/O. Scalar I/Os can be used to communicate single words of data (e.g., 128 bits). Vector I/Os can be used to communicate chunks of data (e.g., 512 bits), in cases such as receiving configuration data in a unit configuration load process and transmitting and receiving data during operation after configuration across a long pipeline between multiple PCUs. Control I/Os can be used to communicate signals on control lines such as the start or end of execution of a configurable unit. Control inputs are received by control block 1090, and control outputs are provided by the control block 1090. An output databus 1089 may comprise the scalar outputs, vector outputs, and control outputs.

Each vector input is buffered in this example using a vector FIFO in a vector FIFO block 1060 which can include one or more vector FIFOs. Likewise, in this example, each scalar input is buffered using a scalar FIFO 1070. Using input FIFOs decouples timing between data producers and consumers and simplifies inter-configurable-unit control logic by making it robust to input delay mismatches.

The configurable unit includes ALU 1080, which may include a SIMD to support multiple reconfigurable data channels. The SIMD may have a multiple-stage (stage 1 ... stage N), reconfigurable pipeline. Chunks of data written into a configuration serial chain in the configurable unit include configuration data for each stage of each data channel in the SIMD. The configuration serial chain in the configuration data store 1020 is coupled with the multiple data channels in ALU 1080 via ALU control input 1021.

A configurable data channel organized as a multi-stage pipeline can include multiple functional units (e.g., functional unit 1081 through functional unit 1086) at respective stages. A computation unit or parts of a computation unit can be implemented in multiple functional units at respective stages in a multi-stage pipeline or in multiple multi-stage pipelines. In the example as shown in FIG. 10, a circuit can be implemented in multiple functional units and multiple memory units. Input registers in functional units can register inputs from scalar FIFO 1070 or vector FIFO block 1060 or from previous stages in a multi-stage pipeline. A functional unit at a stage in a multi-stage pipeline can execute a function, e.g., logical shift, an arithmetic function, comparison, a logical operation, etc., and generate an output.
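As an illustration only, the PMU test sequence described for FIG. 9 (select the replacement databus, write the test vector in a first cycle, read it back in a second cycle, and check that the data matches) can be modeled in software. The Python sketch below is not part of the disclosed hardware. The class `Scratchpad`, the functions `mux` and `bist_write_then_read`, and the stuck-at-zero fault mask are hypothetical names and assumptions introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Scratchpad:
    """Toy stand-in for scratchpad memory 930 behind banking and buffering logic 935."""
    words: int = 16
    stuck_at_zero_mask: int = 0  # hypothetical fault model: these bits are forced to 0 on write
    cells: list = field(default_factory=list)

    def __post_init__(self):
        self.cells = [0] * self.words

    def cycle(self, wa=None, wd=None, we=False, ra=None, re=False):
        """One clock cycle: optional write (WA, WD, WE) and optional read (RA, RE)."""
        if we:
            self.cells[wa] = wd & ~self.stuck_at_zero_mask
        return self.cells[ra] if re else None


def mux(select_replacement, functional_value, replacement_value):
    """Input multiplexer (972/974/976): the test controller picks the replacement databus."""
    return replacement_value if select_replacement else functional_value


def bist_write_then_read(mem, address, test_vector, functional_input=0xDEAD):
    """Return (passed, read_back) for one write-then-read test step."""
    # The controller deselects input databus 910 and selects replacement databus 965.
    wd = mux(True, functional_input, test_vector)
    # First cycle: present WA and WD, assert WE.
    mem.cycle(wa=address, wd=wd, we=True)
    # Second cycle: present RA, assert RE, capture the output databus.
    read_back = mem.cycle(ra=address, re=True)
    return read_back == test_vector, read_back


# A fault-free memory returns the test vector unchanged.
print(bist_write_then_read(Scratchpad(), 3, 0b10101010))
# A memory with bit 3 stuck at zero returns different data, so the step fails.
print(bist_write_then_read(Scratchpad(stuck_at_zero_mask=0b00001000), 3, 0b10101010))
```

In the sketch the comparison is done directly on the read data. In the disclosed technology the read data is instead forwarded through the datapath and compressed into a signature.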
A configurable unit in the array of configurable units includes configuration data store 1020 (e.g., serial chains) to store unit files comprising a plurality of chunks (or sub-files of other sizes) of configuration data particular to the corresponding configurable units. Configurable units in the array of configurable units each include unit configuration load logic 1040 coupled with configuration data store 1020 via line 1022, to execute a unit configuration load process. The unit configuration load process includes receiving, via the bus system (e.g., the vector inputs), chunks of a unit file particular to the configurable unit and loading the received chunks into configuration data store 1020 of the configurable unit. The unit file loaded into configuration data store 1020 can include configuration data, including opcodes and routing configuration, for circuits (e.g., module) implementing the instrumentation logic in multiple functional units and multiple memory units, as described herein.

The configuration data stores in configurable units in the two or more configurable units in this example comprise serial chains of latches, where the latches store bits that control configuration of the resources in the configurable unit. A serial chain in a configuration data store can include a shift register chain for configuration data and a second shift register chain for state information and counter values connected in series.

The input configuration data 1010 can be provided to a vector FIFO as vector inputs, and then be transferred to configuration data store 1020. The output configuration data 1030 can be unloaded from configuration data store 1020 using the vector outputs.

The CGRA uses a daisy-chained completion bus to indicate when a load/unload command has been completed. The master AGCU transmits the program load and unload commands to configurable units in the array of configurable units over a daisy-chained command bus. As shown in the example of FIG. 10, a control block 1090, a daisy-chained completion bus 1091 and a daisy-chained command bus 1092 are coupled to daisy-chain logic 1093, which communicates with the unit configuration load logic 1040. Daisy-chain logic 1093 can include load complete status logic, as described below. The daisy-chained completion bus is further described below. Other topologies for the command and completion buses are clearly possible but not described here.

Configurable PCU 1000 includes the following BIST circuits: test interface 1050, which may be a JTAG port, second BIST controller 1052 (second is used as it may operate in tandem with first BIST controller 960 used in a PMU), test control register 1053, ALU control multiplexer 1054, and MISR 1055, which may include, separate or combined, test result compressor 1056 and signature register 1057. Second BIST controller 1052 may be started by control signals from the test bus via test interface 1050. When started, second BIST controller 1052 takes control of the configuration data by overriding data from configuration data store 1020 with test configuration data previously stored in test control register 1053. ALU control multiplexer 1054, controlled by second BIST controller 1052, selects replacement configuration data from test control register 1053 rather than the configuration data from configuration data store 1020. Second BIST controller 1052 also controls MISR 1055, ensuring that data from output databus 1089 is compressed as disclosed earlier in this document, and that the compressed data is stored in signature register 1057, from where it can be read by an external tester via test interface 1050.

This is one simplified example of a configuration of a configurable processor for implementing a computation unit as described herein. The configurable processor can be configured in other ways to implement a computation unit. Other types of configurable processors can implement the computation unit in other ways. Also, the computation unit can be implemented using dedicated logic in some examples, or a combination of dedicated logic and instruction-controlled processors.

Considerations

We describe various implementations of a processor unit that includes BIST, and methods therefor.

The technology disclosed can be practiced as a system, method, or article of manufacture. One or more features of an implementation can be combined with the base implementation. Implementations that are not mutually exclusive are taught to be combinable. One or more features of an implementation can be combined with other implementations. This disclosure periodically reminds the user of these options. Omission from some implementations of recitations that repeat these options should not be taken as limiting the combinations taught in the preceding sections; these recitations are hereby incorporated forward by reference into each of the following implementations.

Although the description has been described with respect to particular implementations thereof, these particular implementations are merely illustrative, and not restrictive. The description may reference specific structural implementations and methods, and does not intend to limit the technology to the specifically disclosed implementations and methods. The technology may be practiced using other features, elements, methods and implementations. Implementations are described to illustrate the present technology, not to limit its scope, which is defined by the claims. Those of ordinary skill in the art recognize a variety of equivalent variations on the description above.

All features disclosed in the specification, including the claims, abstract, and drawings, and all the steps in any method or process disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. Each feature disclosed in the specification, including the claims, abstract, and drawings, can be replaced by alternative features serving the same, equivalent, or similar purpose, unless expressly stated otherwise.

Although the description has been described with respect to particular implementations thereof, these particular implementations are merely illustrative, and not restrictive. For instance, many of the operations can be implemented in a CGRA, a System-on-Chip (SOC), application-specific integrated circuit (ASIC), programmable processor, or in a programmable logic device such as a field-programmable gate array (FPGA), obviating a need for at least part of the dedicated hardware. Implementations may be as a single chip, or as a multi-chip module (MCM) packaging multiple semiconductor dies in a single package. All such variations and modifications are to be considered within the ambit of the present disclosed technology the nature of which is to be determined from the foregoing description.

Any suitable technology for manufacturing electronic devices can be used to implement the circuits of particular implementations, including CMOS, FinFET, BiCMOS, bipolar, JFET, MOS, NMOS, PMOS, HBT, MESFET, etc. Different semiconductor materials can be employed, such as silicon, germanium, SiGe, GaAs, InP, GaN, SiC, graphene, etc. Circuits may have single-ended or differential inputs, and single-ended or differential outputs. Terminals to circuits may function as inputs, outputs, both, or be in a high-impedance state, or they may function to receive supply power, a ground reference, a reference voltage, a reference current, or other. Although the physical processing of signals may be presented in a specific order, this order may be changed in different particular implementations. In some particular implementations, multiple elements, devices, or circuits shown as sequential in this specification can be operating in parallel.

Any suitable programming language can be used to implement the routines of particular implementations including C, C++, Java, JavaScript, compiled languages, interpreted languages and scripts, assembly language, machine language, etc. Different programming techniques can be employed such as procedural or object oriented.
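As one hedged illustration of such a routine, the PCU test circuits described for FIG. 10 (test control register 1053, ALU control multiplexer 1054, and MISR 1055) can be modeled in Python. This is a behavioral sketch under stated assumptions, not the disclosed circuit: the 16-bit width, the feedback constant `TAPS`, the opcode names, and the function names `misr_step`, `alu_control_mux`, and `run_pcu` are all hypothetical choices made for the example.

```python
MASK = 0xFFFF
TAPS = 0xB400  # assumed 16-bit feedback polynomial for this sketch; not taken from the patent

# Hypothetical ALU operations selected by the control setting.
ALU_OPS = {
    "ADD": lambda a, b: (a + b) & MASK,
    "XOR": lambda a, b: (a ^ b) & MASK,
    "AND": lambda a, b: (a & b) & MASK,
}


def misr_step(signature, data):
    """Shift the signature as a Galois LFSR, then XOR in the parallel data word."""
    lsb = signature & 1
    signature >>= 1
    if lsb:
        signature ^= TAPS
    return (signature ^ data) & MASK


def alu_control_mux(bist_active, functional_opcode, test_opcode):
    """Model of the ALU control multiplexer: the test setting wins while BIST is active."""
    return test_opcode if bist_active else functional_opcode


def run_pcu(inputs, functional_opcode, test_control_register, bist_active):
    """Apply the selected ALU operation to each input pair and compress the results."""
    signature_register = 0
    for a, b in inputs:
        opcode = alu_control_mux(bist_active, functional_opcode, test_control_register)
        signature_register = misr_step(signature_register, ALU_OPS[opcode](a, b))
    return signature_register


vectors = [(3, 1), (0x00FF, 0x0F0F), (0x1234, 0x4321)]
# With BIST active, the functional configuration ("ADD" or "AND") has no effect on the signature.
print(run_pcu(vectors, "ADD", "XOR", True) == run_pcu(vectors, "AND", "XOR", True))
```

The point of the override is visible in the last line: while the test controller selects the test control register, the signature depends only on the test vectors and the replacement control setting, so it can be compared with a value computed in advance.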
Methods embodied in routines can execute on a single processor device or on a multiple processor system. Although the steps, operations, or computations may be presented in a specific order, this order may be changed in different particular implementations. In some particular implementations, multiple steps shown as sequential in this specification can be performed at the same time.

Particular implementations may be implemented in a tangible, non-transitory computer-readable storage medium for use by or in connection with the instruction execution system, apparatus, board, or device. Particular implementations can be implemented in the form of control logic in software or hardware or a combination of both. The control logic, when executed by one or more processors, may be operable to perform that which is described in particular implementations. For example, a tangible non-transitory medium such as a hardware storage device can be used to store the control logic, which can include executable instructions.

Particular implementations may be implemented by using a programmed general-purpose digital computer, application-specific integrated circuits, programmable logic devices, field-programmable gate arrays, optical, chemical, biological, quantum or nanoengineered systems, etc. Other components and mechanisms may be used. In general, the functions of particular implementations can be achieved by any means as is known in the art. Distributed, networked systems, components, and/or circuits can be used. Cloud computing or cloud services can be employed. Communication, or transfer, of data may be wired, wireless, or by any other means.

It will also be appreciated that one or more of the elements depicted in the drawings/figures can also be implemented in a more separated or integrated manner, or even removed or rendered as inoperable in certain cases, as is useful in accordance with a particular application.

Thus, while particular implementations have been described herein, latitudes of modification, various changes, and substitutions are intended in the foregoing disclosures, and it will be appreciated that in some instances some features of particular implementations will be employed without a corresponding use of other features without departing from the scope and spirit as set forth. Therefore, many modifications may be made to adapt a particular situation or material to the essential scope and spirit.

We claim:

1. A processor unit comprising, a memory, a logic unit coupled with a read data output of the memory, a logic unit control circuit with a control output coupled with a control input of the logic unit, a test controller, a test control register, and a signature register with an input coupled with a logic unit output, wherein the test controller is configured to manage a series of steps comprising:
replacing a logic unit control signal with a replacement logic unit control signal obtained from the test control register;
generating a test pattern;
forwarding the test pattern to an input of a first circuit, wherein the input of the first circuit is not a scan chain input;
forwarding first circuit output data from an output of the first circuit to the logic unit;
executing a logic unit operation on the first circuit output data based on the replacement logic unit control signal;
obtaining a test result from a logic unit output;
compressing the test result to obtain a signature; and
storing the signature in the signature register.

2. The processor unit of claim 1, wherein:
the first circuit includes the memory;
forwarding the test pattern to the input of the first circuit includes writing the test pattern at a first address in the memory; and
obtaining first circuit output data from the output of the first circuit includes reading the output data from the first address in the memory.

3. The processor unit of claim 2, wherein the test pattern is comprised in a series of test patterns for detecting a memory error.

4. The processor unit of claim 3, wherein the series of test patterns for detecting a memory error is based on a march algorithm.

5. The processor unit of claim 3, wherein the series of test patterns for detecting a memory error is based on one or more of a RAM sequential algorithm, a zero-one algorithm, a checkerboard algorithm, a butterfly algorithm, a sliding diagonal algorithm.

6. The processor unit of claim 1, wherein the test pattern is comprised in a series of test patterns that include pseudo-random numbers generated in a first linear-feedback shift register (LFSR) with a first length, a first feedback polynomial, and a first seed value, and wherein the test controller includes the first LFSR.

7. The processor unit of claim 6, wherein the first circuit includes a logic circuit.

8. The processor unit of claim 6, wherein:
the first circuit includes the memory;
forwarding the test pattern to the input of the first circuit includes writing the test pattern at a first address in the memory;
obtaining output data from the output of the first circuit includes reading the output data from the first address in the memory; and
the first address is determined based on an index of the test pattern in the series of test patterns.

9. The processor unit of claim 8, wherein the first address includes a one-hot address based on at least a part of bits included in the index of the test pattern in the series of test patterns.

10. The processor unit of claim 1, wherein the logic unit comprises an arithmetic logic unit (ALU) and the logic unit control circuit comprises an ALU control circuit.

11. The processor unit of claim 10, wherein a datapath includes multiple lanes of parallel data and the ALU includes a SIMD.

12. The processor unit of claim 10, wherein the test controller replaces the logic unit control signal with the replacement logic unit control signal by selecting the replacement logic unit control signal in a multiplexer with an output coupled with the logic unit control input, and with
a first input coupled with the ALU control circuit control output, and a second input coupled with the test control register.

13. The processor unit of claim 10, wherein the ALU comprises the test control register, and the test controller replaces the logic unit control signal with the replacement logic unit control signal by sending a signal to the ALU.

14. The processor unit of claim 10, wherein the ALU control circuit comprises the test control register, and the test controller replaces the logic unit control signal with the replacement logic unit control signal by sending a signal to the ALU control circuit.

15. The processor unit of claim 1, wherein the signature register comprises separate circuits for a test result compressor and a signature register.

16. The processor unit of claim 1, wherein the signature register comprises a combined circuit for test result compression and storing the signature.

17. A method to test a datapath in a processor unit, the datapath comprising a memory with a data input and a data output, a logic unit with a control input, a data input and a data output, and an intermediate bus coupling the memory data output with the logic unit data input, the method comprising:
providing a first memory test vector from a series of memory test vectors to the memory data input;
writing the first memory test vector to a first address in the memory;
reading memory output data from the first address in the memory;
forwarding the memory output data via the intermediate bus to the logic unit;
replacing a signal on the control input with a replacement logic unit control signal;
performing a logic unit operation based on the replacement logic unit control signal;
obtaining a test result from the logic unit data output;
compressing the test result to obtain a signature; and
storing the signature in a register.

18. The method of claim 17, wherein the logic unit comprises an arithmetic logic unit (ALU).

19. The method of claim 17, further comprising:
providing a first pseudo-random number from a series of pseudo-random numbers to the memory data input;
writing the first pseudo-random number to a second address in the memory; and
reading memory output data from the second address in the memory.

20. The method of claim 19, wherein the second address is determined based on an index of the first pseudo-random number in the series of pseudo-random numbers.

21. The method of claim 20, wherein the second address includes a one-hot encoded address based on at least a part of bits included in the index.

22. The method of claim 17, further comprising comparing the signature with a precompiled signature to determine a test result.
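For readers who want a concrete picture of the steps recited in claims 19 to 22, the following Python sketch generates a pseudo-random series with an LFSR, derives a one-hot address from the index of each number, writes and reads a toy memory, compresses the read data, and compares the signature with a previously computed one. It is an illustrative model only. The 16-bit width, the constant `TAPS`, the seed `0xACE1`, the pass-through logic unit, the `flip_on_read` fault injection, and all function names are assumptions made for the example and are not specified by the claims.

```python
MASK = 0xFFFF
TAPS = 0xB400  # assumed 16-bit feedback polynomial; the claims do not fix one


def lfsr_step(state):
    """Advance a 16-bit Galois LFSR by one shift."""
    lsb = state & 1
    state >>= 1
    return state ^ TAPS if lsb else state


def pseudo_random_series(seed, count):
    """Yield (index, value) pairs from the LFSR, starting one step after the seed."""
    state = seed & MASK
    for index in range(count):
        state = lfsr_step(state)
        yield index, state


def one_hot_address(index, address_lines=8):
    """One-hot address built from the low part of the pattern index."""
    return 1 << (index % address_lines)


def run_datapath_test(seed, count, flip_on_read=None):
    """Write each value, read it back, and fold the read data into a signature.

    flip_on_read maps an address to a bit mask XORed into data read from that
    address, a hypothetical way to inject a memory fault. The logic unit
    operation is taken to be a pass-through to keep the sketch short.
    """
    flip_on_read = flip_on_read or {}
    memory = {}
    signature = 0
    for index, value in pseudo_random_series(seed, count):
        address = one_hot_address(index)
        memory[address] = value                                      # write
        read_back = memory[address] ^ flip_on_read.get(address, 0)   # read
        signature = lfsr_step(signature) ^ read_back                 # MISR-style compression
    return signature


# The precompiled signature comes from a fault-free reference run.
precompiled = run_datapath_test(seed=0xACE1, count=8)
passed = run_datapath_test(seed=0xACE1, count=8) == precompiled
failed = run_datapath_test(0xACE1, 8, flip_on_read={0b1000: 0x0004}) != precompiled
print(passed, failed)
```

With eight patterns and eight address lines each address is used once, so the injected fault corrupts exactly one word. Because each compression step is an invertible linear map, a single corrupted word always changes the final signature in this model.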