




# Understanding Embedded - FPGAs (Field Programmable Gate Arrays)

Embedded - FPGAs, or Field Programmable Gate Arrays, are advanced integrated circuits that offer unparalleled flexibility and performance for digital systems. Unlike traditional fixed-function logic devices, FPGAs can be programmed and reprogrammed to execute a wide array of logical operations, enabling customized functionality tailored to specific applications. This reprogrammability allows developers to iterate designs quickly and implement complex functions without the need for custom hardware.

### **Applications of Embedded - FPGAs**

The versatility of Embedded - FPGAs makes them indispensable in numerous fields.

| Details                        |                                                                 |
|--------------------------------|-----------------------------------------------------------------|
| Product Status                 | Active                                                          |
| Number of LABs/CLBs            | 65340                                                           |
| Number of Logic Elements/Cells | 1143450                                                         |
| Total RAM Bits                 | 82329600                                                        |
| Number of I/O                  | 516                                                             |
| Number of Gates                | -                                                               |
| Voltage - Supply               | 0.825V ~ 0.876V                                                 |
| Mounting Type                  | Surface Mount                                                   |
| Operating Temperature          | -40°C ~ 100°C (TJ)                                              |
| Package / Case                 | 1156-BBGA, FCBGA                                                |
| Supplier Device Package        | 1156-FCBGA (35x35)                                              |
| Purchase URL                   | https://www.e-xfl.com/product-detail/xillinx/xcku15p-1ffva1156i |

Email: info@E-XFL.COM

Address: Room A, 16/F, Full Win Commercial Centre, 573 Nathan Road, Mongkok, Hong Kong



## **Migrating Devices**

UltraScale and UltraScale+ families provide footprint compatibility to enable users to migrate designs from one device or family to another. Any two packages with the same footprint identifier code are footprint compatible. For example, Kintex UltraScale devices in the A1156 packages are footprint compatible with Kintex UltraScale+ devices in the A1156 packages. Likewise, Virtex UltraScale devices in the B2104 packages are compatible with Virtex UltraScale+ devices and Kintex UltraScale devices in the B2104 packages. All valid device/package combinations are provided in the Device-Package Combinations and Maximum I/Os tables in this document. Refer to UG583, UltraScale Architecture PCB Design User Guide for more detail on migrating between UltraScale and UltraScale+ devices and packages.
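The footprint rule above can be sketched as a small check. This is a minimal illustration, assuming the footprint identifier code is the last letter plus the trailing number sequence of the package name (the reading given in the table notes, e.g. A1156 or B2104). The function names are hypothetical. The sketch does not check that a device/package combination is valid; that still needs the Device-Package Combinations tables and UG583.

```python
import re


def footprint_code(package: str) -> str:
    """Return the footprint identifier of a package name, e.g. 'FFVA1156' -> 'A1156'.

    Assumption: the identifier is the last letter followed by the trailing digits.
    """
    match = re.search(r"([A-Z])(\d+)$", package.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized package name: {package!r}")
    return match.group(1) + match.group(2)


def footprint_compatible(pkg_a: str, pkg_b: str) -> bool:
    """Two packages are treated as footprint compatible when their codes match."""
    return footprint_code(pkg_a) == footprint_code(pkg_b)
```

For example, `footprint_compatible("FFVB2104", "FLGB2104")` is true because both names reduce to B2104, while `FFVA2104` and `FFVB2104` differ in the letter and are not treated as compatible.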



# Virtex UltraScale Device-Package Combinations and Maximum I/Os

Table 8: Virtex UltraScale Device-Package Combinations and Maximum I/Os

| Package <sup>(1)(2)(3)</sup> | Package Dimensions (mm) | VU065              | VU080              | VU095              | VU125              | VU160              | VU190              | VU440              |
|------------------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
|                              |                    | HR, HP<br>GTH, GTY | HR, HP<br>GTH, GTY | HR, HP<br>GTH, GTY | HR, HP<br>GTH, GTY | HR, HP<br>GTH, GTY | HR, HP<br>GTH, GTY | HR, HP<br>GTH, GTY |
| FFVC1517                     | 40x40              | 52, 468<br>20, 20  | 52, 468<br>20, 20  | 52, 468<br>20, 20  |                    |                    |                    |                    |
| FFVD1517                     | 40x40              |                    | 52, 286<br>32, 32  | 52, 286<br>32, 32  |                    |                    |                    |                    |
| FLVD1517                     | 40x40              |                    |                    |                    | 52, 286<br>40, 32  |                    |                    |                    |
| FFVB1760                     | 42.5x42.5          |                    | 52, 650<br>32, 16  | 52, 650<br>32, 16  |                    |                    |                    |                    |
| FLVB1760                     | 42.5x42.5          |                    |                    |                    | 52, 650<br>36, 16  |                    |                    |                    |
| FFVA2104                     | 47.5x47.5          |                    | 52, 780<br>28, 24  | 52, 780<br>28, 24  |                    |                    |                    |                    |
| FLVA2104                     | 47.5x47.5          |                    |                    |                    | 52, 780<br>28, 24  |                    |                    |                    |
| FFVB2104                     | 47.5x47.5          |                    | 52, 650<br>32, 32  | 52, 650<br>32, 32  |                    |                    |                    |                    |
| FLVB2104                     | 47.5x47.5          |                    |                    |                    | 52, 650<br>40, 36  |                    |                    |                    |
| FLGB2104                     | 47.5x47.5          |                    |                    |                    |                    | 52, 650<br>40, 36  | 52, 650<br>40, 36  |                    |
| FFVC2104                     | 47.5x47.5          |                    |                    | 52, 364<br>32, 32  |                    |                    |                    |                    |
| FLVC2104                     | 47.5x47.5          |                    |                    |                    | 52, 364<br>40, 40  |                    |                    |                    |
| FLGC2104                     | 47.5x47.5          |                    |                    |                    |                    | 52, 364<br>52, 52  | 52, 364<br>52, 52  |                    |
| FLGB2377                     | 50x50              |                    |                    |                    |                    |                    |                    | 52, 1248<br>36, 0  |
| FLGA2577                     | 52.5x52.5          |                    |                    |                    |                    |                    | 0, 448<br>60, 60   |                    |
| FLGA2892                     | 55x55              |                    |                    |                    |                    |                    |                    | 52, 1404<br>48, 0  |

#### Notes:

- 1. Go to Ordering Information for package designation details.
- 2. All packages have 1.0mm ball pitch.
- 3. Packages with the same last letter and number sequence, e.g., A2104, are footprint compatible with all other UltraScale architecture-based devices with the same sequence. The footprint compatible devices within this family are outlined. See the UltraScale Architecture Product Selection Guide for details on inter-family migration.
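Each populated cell of Table 8 packs four counts into the layout "HR, HP" on the first line and "GTH, GTY" on the second. A minimal sketch of splitting such a cell into named fields follows. It assumes the `<br>` line separator used in this rendering of the table; the `IoCounts` type and `parse_cell` name are illustrative only.

```python
from typing import NamedTuple


class IoCounts(NamedTuple):
    hr: int   # HR I/O count
    hp: int   # HP I/O count
    gth: int  # GTH transceivers
    gty: int  # GTY transceivers


def parse_cell(cell: str) -> IoCounts:
    """Parse a cell such as '52, 468<br>20, 20' into its four counts."""
    io_part, gt_part = cell.split("<br>")
    hr, hp = (int(x) for x in io_part.split(","))
    gth, gty = (int(x) for x in gt_part.split(","))
    return IoCounts(hr, hp, gth, gty)
```

With the VU065 entry for FFVC1517, `parse_cell("52, 468<br>20, 20")` gives 52 HR and 468 HP I/Os, plus 20 GTH and 20 GTY transceivers.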



# **Zynq UltraScale+: CG Device Feature Summary**

Table 11: Zynq UltraScale+: CG Device Feature Summary

|                                         | ZU2CG        | ZU3CG                                                                                                       | ZU4CG                      | ZU5CG                            | ZU6CG                        | ZU7CG            | ZU9CG          |  |  |  |
|-----------------------------------------|--------------|-------------------------------------------------------------------------------------------------------------|----------------------------|----------------------------------|------------------------------|------------------|----------------|--|--|--|
| Application Processing Unit             | Dual-core ARM Cortex-A53 MPCore with CoreSight; NEON & Single/Double Precision Floating Point; 32KB/32KB L1 Cache, 1MB L2 Cache |  |  |  |  |  |  |  |  |  |
| Real-Time Processing Unit               | Dual-core ARM Cortex-R5 with CoreSight; Single/Double Precision Floating Point; 32KB/32KB L1 Cache, and TCM |  |  |  |  |  |  |  |  |  |
| Embedded and External<br>Memory         | 256KB On-Chip Memory w/ECC; External DDR4; DDR3; DDR3L; LPDDR4; LPDDR3; External Quad-SPI; NAND; eMMC |  |  |  |  |  |  |  |  |  |
| General Connectivity                    | 214 PS I/O; UART; CAN; USB 2.0; I2C; SPI; 32b GPIO; Real Time Clock; WatchDog Timers; Triple Timer Counters |  |  |  |  |  |  |  |  |  |
| High-Speed Connectivity                 | 4 PS-GTR; PCIe Gen1/2; Serial ATA 3.1; DisplayPort 1.2a; USB 3.0; SGMII |  |  |  |  |  |  |  |  |  |
| System Logic Cells                      | 103,320      | 154,350                                                                                                     | 192,150                    | 256,200                          | 469,446                      | 504,000          | 599,550        |  |  |  |
| CLB Flip-Flops                          | 94,464       | 141,120                                                                                                     | 175,680                    | 234,240                          | 429,208                      | 460,800          | 548,160        |  |  |  |
| CLB LUTs                                | 47,232       | 70,560                                                                                                      | 87,840                     | 117,120                          | 214,604                      | 230,400          | 274,080        |  |  |  |
| Distributed RAM (Mb)                    | 1.2          | 1.8                                                                                                         | 2.6                        | 3.5                              | 6.9                          | 6.2              | 8.8            |  |  |  |
| Block RAM Blocks                        | 150          | 216                                                                                                         | 128                        | 144                              | 714                          | 312              | 912            |  |  |  |
| Block RAM (Mb)                          | 5.3          | 7.6                                                                                                         | 4.5                        | 5.1                              | 25.1                         | 11.0             | 32.1           |  |  |  |
| UltraRAM Blocks                         | 0            | 0                                                                                                           | 48                         | 64                               | 0                            | 96               | 0              |  |  |  |
| UltraRAM (Mb)                           | 0            | 0                                                                                                           | 14.0                       | 18.0                             | 0                            | 27.0             | 0              |  |  |  |
| DSP Slices                              | 240          | 360                                                                                                         | 728                        | 1,248                            | 1,973                        | 1,728            | 2,520          |  |  |  |
| CMTs                                    | 3            | 3                                                                                                           | 4                          | 4                                | 4                            | 8                | 4              |  |  |  |
| Max. HP I/O <sup>(1)</sup>              | 156          | 156                                                                                                         | 156                        | 156                              | 208                          | 416              | 208            |  |  |  |
| Max. HD I/O <sup>(2)</sup>              | 96           | 96                                                                                                          | 96                         | 96                               | 120                          | 48               | 120            |  |  |  |
| System Monitor                          | 2            | 2                                                                                                           | 2                          | 2                                | 2                            | 2                | 2              |  |  |  |
| GTH Transceiver 16.3Gb/s <sup>(3)</sup> | 0            | 0                                                                                                           | 16                         | 16                               | 24                           | 24               | 24             |  |  |  |
| GTY Transceivers 32.75Gb/s              | 0            | 0                                                                                                           | 0                          | 0                                | 0                            | 0                | 0              |  |  |  |
| Transceiver Fractional PLLs             | 0            | 0                                                                                                           | 8                          | 8                                | 12                           | 12               | 12             |  |  |  |
| PCIe Gen3 x16 and Gen4 x8               | 0            | 0                                                                                                           | 2                          | 2                                | 0                            | 2                | 0              |  |  |  |
| 150G Interlaken                         | 0            | 0                                                                                                           | 0                          | 0                                | 0                            | 0                | 0              |  |  |  |
| 100G Ethernet w/ RS-FEC                 | 0            | 0                                                                                                           | 0                          | 0                                | 0                            | 0                | 0              |  |  |  |

#### Notes:

- 1. HP = High-performance I/O with support for I/O voltage from 1.0V to 1.8V.
- 2. HD = High-density I/O with support for I/O voltage from 1.2V to 3.3V.
- 3. GTH transceivers in the SFVC784 package support data rates up to 12.5Gb/s. See Table 12.
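Notes 1 and 2 define the I/O voltage ranges of the two bank types. A minimal sketch of checking which bank type covers a given I/O voltage is shown below. The function and constant names are hypothetical. A voltage falling inside a range does not by itself mean that a particular I/O standard is supported; the notes state only the ranges.

```python
from typing import List

# I/O voltage ranges in volts, taken from notes 1 and 2 above.
HP_RANGE = (1.0, 1.8)  # High-performance I/O
HD_RANGE = (1.2, 3.3)  # High-density I/O


def banks_supporting(io_voltage: float) -> List[str]:
    """Return the bank types whose stated voltage range includes io_voltage."""
    banks = []
    if HP_RANGE[0] <= io_voltage <= HP_RANGE[1]:
        banks.append("HP")
    if HD_RANGE[0] <= io_voltage <= HD_RANGE[1]:
        banks.append("HD")
    return banks
```

A 3.3V interface falls only in the HD range, 1.0V falls only in the HP range, and voltages from 1.2V to 1.8V are covered by both.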



## Zynq UltraScale+: CG Device-Package Combinations and Maximum I/Os

Table 12: Zynq UltraScale+: CG Device-Package Combinations and Maximum I/Os

| Package <sup>(1)(2)(3)(4)(5)</sup> | Package Dimensions (mm) | ZU2CG              | ZU3CG              | ZU4CG              | ZU5CG              | ZU6CG              | ZU7CG              | ZU9CG              |
|------------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
|                        |                    | HD, HP<br>GTH, GTY | HD, HP<br>GTH, GTY | HD, HP<br>GTH, GTY | HD, HP<br>GTH, GTY | HD, HP<br>GTH, GTY | HD, HP<br>GTH, GTY | HD, HP<br>GTH, GTY |
| SBVA484 <sup>(6)</sup> | 19x19              | 24, 58<br>0, 0     | 24, 58<br>0, 0     |                    |                    |                    |                    |                    |
| SFVA625                | 21x21              | 24, 156<br>0, 0    | 24, 156<br>0, 0    |                    |                    |                    |                    |                    |
| SFVC784 <sup>(7)</sup> | 23x23              | 96, 156<br>0, 0    | 96, 156<br>0, 0    | 96, 156<br>4, 0    | 96, 156<br>4, 0    |                    |                    |                    |
| FBVB900                | 31x31              |                    |                    | 48, 156<br>16, 0   | 48, 156<br>16, 0   |                    | 48, 156<br>16, 0   |                    |
| FFVC900                | 31x31              |                    |                    |                    |                    | 48, 156<br>16, 0   |                    | 48, 156<br>16, 0   |
| FFVB1156               | 35x35              |                    |                    |                    |                    | 120, 208<br>24, 0  |                    | 120, 208<br>24, 0  |
| FFVC1156               | 35x35              |                    |                    |                    |                    |                    | 48, 312<br>20, 0   |                    |
| FFVF1517               | 40x40              |                    |                    |                    |                    |                    | 48, 416<br>24, 0   |                    |

#### Notes:

- 1. Go to Ordering Information for package designation details.
- 2. FB/FF packages have 1.0mm ball pitch. SB/SF packages have 0.8mm ball pitch.
- 3. All device package combinations bond out 4 PS-GTR transceivers.
- 4. All device package combinations bond out 214 PS I/O except ZU2CG and ZU3CG in the SBVA484 and SFVA625 packages, which bond out 170 PS I/Os.
- 5. Packages with the same last letter and number sequence, e.g., A484, are footprint compatible with all other UltraScale architecture-based devices with the same sequence. The footprint compatible devices within this family are outlined.
- 6. All 58 HP I/O pins are powered by the same $V_{CCO}$ supply.
- 7. GTH transceivers in the SFVC784 package support data rates up to 12.5Gb/s.
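Two of the notes above are exceptions that are easy to miss when reading Table 12: the reduced PS I/O bond-out and the lower GTH data rate in the SFVC784 package. A minimal sketch encoding both rules follows. The function names are hypothetical, the 16.3Gb/s default is the GTH rate listed in Table 11, and the sketch does not check that a device/package pair is a valid combination.

```python
# Devices and packages named in the PS I/O bond-out exception for Table 12.
REDUCED_PS_IO_DEVICES = {"ZU2CG", "ZU3CG"}
REDUCED_PS_IO_PACKAGES = {"SBVA484", "SFVA625"}


def ps_io_count(device: str, package: str) -> int:
    """PS I/Os bonded out: 214, except 170 for ZU2CG/ZU3CG in SBVA484/SFVA625."""
    if device in REDUCED_PS_IO_DEVICES and package in REDUCED_PS_IO_PACKAGES:
        return 170
    return 214


def max_gth_rate_gbps(package: str) -> float:
    """Maximum GTH data rate: 12.5Gb/s in SFVC784 (note 7), otherwise 16.3Gb/s."""
    return 12.5 if package == "SFVC784" else 16.3
```

For instance, a ZU3CG in SFVA625 bonds out 170 PS I/Os, while the same device in SFVC784 bonds out all 214.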



# **Zynq UltraScale+: EG Device Feature Summary**

Table 13: Zynq UltraScale+: EG Device Feature Summary

|                                         | ZU2EG   | ZU3EG                                                                                                       | ZU4EG        | ZU5EG         | ZU6EG         | ZU7EG          | ZU9EG          | ZU11EG         | ZU15EG         | ZU17EG     | ZU19EG    |
|-----------------------------------------|---------|-------------------------------------------------------------------------------------------------------------|--------------|---------------|---------------|----------------|----------------|----------------|----------------|------------|-----------|
| Application Processing Unit             | Quad-core ARM Cortex-A53 MPCore with CoreSight; NEON & Single/Double Precision Floating Point; 32KB/32KB L1 Cache, 1MB L2 Cache |  |  |  |  |  |  |  |  |  |  |
| Real-Time Processing Unit               | Dual-core ARM Cortex-R5 with CoreSight; Single/Double Precision Floating Point; 32KB/32KB L1 Cache, and TCM |  |  |  |  |  |  |  |  |  |  |
| Embedded and External<br>Memory         | 256KB On-Chip Memory w/ECC; External DDR4; DDR3; DDR3L; LPDDR4; LPDDR3; External Quad-SPI; NAND; eMMC |  |  |  |  |  |  |  |  |  |  |
| General Connectivity                    | 214 PS I/O; UART; CAN; USB 2.0; I2C; SPI; 32b GPIO; Real Time Clock; WatchDog Timers; Triple Timer Counters |  |  |  |  |  |  |  |  |  |  |
| High-Speed Connectivity                 | 4 PS-GTR; PCIe Gen1/2; Serial ATA 3.1; DisplayPort 1.2a; USB 3.0; SGMII |  |  |  |  |  |  |  |  |  |  |
| Graphic Processing Unit                 | ARM Mali-400 MP2; 64KB L2 Cache |  |  |  |  |  |  |  |  |  |  |
| System Logic Cells                      | 103,320 | 154,350                                                                                                     | 192,150      | 256,200       | 469,446       | 504,000        | 599,550        | 653,100        | 746,550        | 926,194    | 1,143,450 |
| CLB Flip-Flops                          | 94,464  | 141,120                                                                                                     | 175,680      | 234,240       | 429,208       | 460,800        | 548,160        | 597,120        | 682,560        | 846,806    | 1,045,440 |
| CLB LUTs                                | 47,232  | 70,560                                                                                                      | 87,840       | 117,120       | 214,604       | 230,400        | 274,080        | 298,560        | 341,280        | 423,403    | 522,720   |
| Distributed RAM (Mb)                    | 1.2     | 1.8                                                                                                         | 2.6          | 3.5           | 6.9           | 6.2            | 8.8            | 9.1            | 11.3           | 8.0        | 9.8       |
| Block RAM Blocks                        | 150     | 216                                                                                                         | 128          | 144           | 714           | 312            | 912            | 600            | 744            | 796        | 984       |
| Block RAM (Mb)                          | 5.3     | 7.6                                                                                                         | 4.5          | 5.1           | 25.1          | 11.0           | 32.1           | 21.1           | 26.2           | 28.0       | 34.6      |
| UltraRAM Blocks                         | 0       | 0                                                                                                           | 48           | 64            | 0             | 96             | 0              | 80             | 112            | 102        | 128       |
| UltraRAM (Mb)                           | 0       | 0                                                                                                           | 14.0         | 18.0          | 0             | 27.0           | 0              | 22.5           | 31.5           | 28.7       | 36.0      |
| DSP Slices                              | 240     | 360                                                                                                         | 728          | 1,248         | 1,973         | 1,728          | 2,520          | 2,928          | 3,528          | 1,590      | 1,968     |
| CMTs                                    | 3       | 3                                                                                                           | 4            | 4             | 4             | 8              | 4              | 8              | 4              | 11         | 11        |
| Max. HP I/O <sup>(1)</sup>              | 156     | 156                                                                                                         | 156          | 156           | 208           | 416            | 208            | 416            | 208            | 572        | 572       |
| Max. HD I/O <sup>(2)</sup>              | 96      | 96                                                                                                          | 96           | 96            | 120           | 48             | 120            | 96             | 120            | 96         | 96        |
| System Monitor                          | 2       | 2                                                                                                           | 2            | 2             | 2             | 2              | 2              | 2              | 2              | 2          | 2         |
| GTH Transceiver 16.3Gb/s <sup>(3)</sup> | 0       | 0                                                                                                           | 16           | 16            | 24            | 24             | 24             | 32             | 24             | 44         | 44        |
| GTY Transceivers 32.75Gb/s              | 0       | 0                                                                                                           | 0            | 0             | 0             | 0              | 0              | 16             | 0              | 28         | 28        |
| Transceiver Fractional PLLs             | 0       | 0                                                                                                           | 8            | 8             | 12            | 12             | 12             | 24             | 12             | 36         | 36        |
| PCIe Gen3 x16 and Gen4 x8               | 0       | 0                                                                                                           | 2            | 2             | 0             | 2              | 0              | 4              | 0              | 4          | 5         |
| 150G Interlaken                         | 0       | 0                                                                                                           | 0            | 0             | 0             | 0              | 0              | 1              | 0              | 2          | 4         |
| 100G Ethernet w/ RS-FEC                 | 0       | 0                                                                                                           | 0            | 0             | 0             | 0              | 0              | 2              | 0              | 2          | 4         |

#### Notes:

- 1. HP = High-performance I/O with support for I/O voltage from 1.0V to 1.8V.
- 2. HD = High-density I/O with support for I/O voltage from 1.2V to 3.3V.
- 3. GTH transceivers in the SFVC784 package support data rates up to 12.5Gb/s. See Table 14.
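The logic-resource rows of Table 13 appear to follow two simple ratios: each device lists twice as many CLB flip-flops as CLB LUTs, and the system logic cell count matches the LUT count multiplied by 2.1875 (35/16) and rounded down. This is an observation drawn from the numbers above, not a definition stated in this document. The sketch below checks it on a few rows copied from the table; the names are hypothetical.

```python
# (system logic cells, CLB flip-flops, CLB LUTs) copied from Table 13.
TABLE_13_SAMPLE = {
    "ZU2EG": (103_320, 94_464, 47_232),
    "ZU6EG": (469_446, 429_208, 214_604),
    "ZU17EG": (926_194, 846_806, 423_403),
    "ZU19EG": (1_143_450, 1_045_440, 522_720),
}


def ratios_hold(logic_cells: int, flip_flops: int, luts: int) -> bool:
    """Check the observed pattern: FFs = 2 x LUTs, cells = floor(LUTs x 35/16)."""
    return flip_flops == 2 * luts and logic_cells == (luts * 35) // 16


for device, row in TABLE_13_SAMPLE.items():
    print(device, ratios_hold(*row))
```

A check like this is mainly useful for catching transcription errors when table values are copied into scripts or spreadsheets.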



# **Zynq UltraScale+: EV Device Feature Summary**

Table 15: Zynq UltraScale+: EV Device Feature Summary

|                                         | ZU4EV                                                                                                       | ZU5EV                                                                                                                           | ZU7EV               |  |  |  |  |  |  |
|-----------------------------------------|-------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------|---------------------|--|--|--|--|--|--|
| Application Processing Unit             | Quad-core ARM Cortex-A53 MPCore with CoreSight; NEON & Single/Double Precision Floating Point; 32KB/32KB L1 Cache, 1MB L2 Cache |  |  |  |  |  |  |  |  |
| Real-Time Processing Unit               | Dual-core ARM Cortex-R5 with CoreSight; Single/Double Precision Floating Point; 32KB/32KB L1 Cache, and TCM |  |  |  |  |  |  |  |  |
| Embedded and External<br>Memory         | 256KB On-Chip Memory w/ECC; External DDR4; DDR3; DDR3L; LPDDR4; LPDDR3; External Quad-SPI; NAND; eMMC |  |  |  |  |  |  |  |  |
| General Connectivity                    | 214 PS I/O; UART; CAN; USB 2.0; I2C; SPI; 32b GPIO; Real Time Clock; WatchDog Timers; Triple Timer Counters |  |  |  |  |  |  |  |  |
| High-Speed Connectivity                 | 4 PS-GTR; PCIe Gen1/2; Serial ATA 3.1; DisplayPort 1.2a; USB 3.0; SGMII |  |  |  |  |  |  |  |  |
| Graphic Processing Unit                 | ARM Mali-400 MP2; 64KB L2 Cache |  |  |  |  |  |  |  |  |
| Video Codec                             | 1                                                                                                           | 1                                                                                                                               | 1                   |  |  |  |  |  |  |
| System Logic Cells                      | 192,150                                                                                                     | 256,200                                                                                                                         | 504,000             |  |  |  |  |  |  |
| CLB Flip-Flops                          | 175,680                                                                                                     | 234,240                                                                                                                         | 460,800             |  |  |  |  |  |  |
| CLB LUTs                                | 87,840                                                                                                      | 117,120                                                                                                                         | 230,400             |  |  |  |  |  |  |
| Distributed RAM (Mb)                    | 2.6                                                                                                         | 3.5                                                                                                                             | 6.2                 |  |  |  |  |  |  |
| Block RAM Blocks                        | 128                                                                                                         | 144                                                                                                                             | 312                 |  |  |  |  |  |  |
| Block RAM (Mb)                          | 4.5                                                                                                         | 5.1                                                                                                                             | 11.0                |  |  |  |  |  |  |
| UltraRAM Blocks                         | 48                                                                                                          | 64                                                                                                                              | 96                  |  |  |  |  |  |  |
| UltraRAM (Mb)                           | 14.0                                                                                                        | 18.0                                                                                                                            | 27.0                |  |  |  |  |  |  |
| DSP Slices                              | 728                                                                                                         | 1,248                                                                                                                           | 1,728               |  |  |  |  |  |  |
| CMTs                                    | 4                                                                                                           | 4                                                                                                                               | 8                   |  |  |  |  |  |  |
| Max. HP I/O <sup>(1)</sup>              | 156                                                                                                         | 156                                                                                                                             | 416                 |  |  |  |  |  |  |
| Max. HD I/O <sup>(2)</sup>              | 96                                                                                                          | 96                                                                                                                              | 48                  |  |  |  |  |  |  |
| System Monitor                          | 2                                                                                                           | 2                                                                                                                               | 2                   |  |  |  |  |  |  |
| GTH Transceiver 16.3Gb/s <sup>(3)</sup> | 16                                                                                                          | 16                                                                                                                              | 24                  |  |  |  |  |  |  |
| GTY Transceivers 32.75Gb/s              | 0                                                                                                           | 0                                                                                                                               | 0                   |  |  |  |  |  |  |
| Transceiver Fractional PLLs             | 8                                                                                                           | 8                                                                                                                               | 12                  |  |  |  |  |  |  |
| PCIe Gen3 x16 and Gen4 x8               | 2                                                                                                           | 2                                                                                                                               | 2                   |  |  |  |  |  |  |
| 150G Interlaken                         | 0                                                                                                           | 0                                                                                                                               | 0                   |  |  |  |  |  |  |
| 100G Ethernet w/ RS-FEC                 | 0                                                                                                           | 0                                                                                                                               | 0                   |  |  |  |  |  |  |

#### Notes

1. HP = High-performance I/O with support for I/O voltage from 1.0V to 1.8V.
2. HD = High-density I/O with support for I/O voltage from 1.2V to 3.3V.
3. GTH transceivers in the SFVC784 package support data rates up to 12.5Gb/s. See Table 16.



contains vertical and horizontal clock routing that span its full height and width. These horizontal and vertical clock routes can be segmented at the clock region boundary to provide a flexible, high-performance, low-power clock distribution architecture. Figure 2 is a representation of an FPGA divided into regions.



Figure 2: Column-Based FPGA Divided into Clock Regions

# **Processing System (PS)**

Zynq UltraScale+ MPSoCs consist of a PS coupled with programmable logic. The contents of the PS vary between the different Zynq UltraScale+ devices. All devices contain an APU, an RPU, and many peripherals for connecting the multiple processing engines to external components. The EG and EV devices contain a GPU, and the EV devices also contain a video codec unit (VCU). The components of the PS are connected to each other and to the PL through a multi-layered ARM AMBA AXI non-blocking interconnect that supports multiple simultaneous master-slave transactions. Traffic through the interconnect can be regulated by the quality of service (QoS) block in the interconnect. Twelve dedicated AXI 32-bit, 64-bit, or 128-bit ports connect the PL to the high-speed interconnect and DDR in the PS via a FIFO interface.

There are four independently controllable power domains: the PL plus three within the PS (full power, lower power, and battery power domains). Additionally, many peripherals support clock gating and power gating to further reduce dynamic and static power consumption.

## **Application Processing Unit (APU)**

The APU has a feature-rich dual-core or quad-core ARM Cortex-A53 processor. Cortex-A53 cores are 32-bit/64-bit application processors based on the ARMv8-A architecture, offering the best performance-to-power ratio. The ARMv8 architecture supports hardware virtualization. Each of the Cortex-A53 cores has: 32KB instruction and data L1 caches, with parity and ECC protection respectively; a NEON SIMD engine; and a single- and double-precision floating point unit. In addition to these blocks, the APU contains a snoop control unit and a 1MB L2 cache with ECC protection to enhance system-level performance. The snoop control unit keeps the L1 caches coherent, eliminating the need to spend software bandwidth on coherency. The APU also has a built-in interrupt controller supporting virtual interrupts. The APU communicates with the rest of the PS through a 128-bit AXI Coherency Extensions (ACE) port via the Cache Coherent Interconnect (CCI) block, using the System Memory Management Unit (SMMU). The APU is also connected to the Programmable Logic (PL), through the 128-bit accelerator coherency port



## **General Connectivity**

There are many peripherals in the PS for connecting to external devices over industry standard protocols, including CAN2.0B, USB, Ethernet, I2C, and UART. Many of the peripherals support clock gating and power gating modes to reduce dynamic and static power consumption.

### USB 3.0/2.0

The pair of USB controllers can be configured as host, device, or On-The-Go (OTG). The core is compliant with the USB 3.0 specification and supports super, high, full, and low speed modes in all configurations. In host mode, the USB controller is compliant with the Intel xHCI specification. In device mode, it supports up to 12 endpoints. While operating in USB 3.0 mode, the controller uses the serial transceiver and operates at up to 5.0Gb/s. In USB 2.0 mode, the UTMI+ Low Pin Interface (ULPI) is used to connect the controller to an external PHY operating at up to 480Mb/s. The ULPI is also connected in USB 3.0 mode to support high-speed operations.

### **Ethernet MAC**

The four tri-speed Ethernet MACs support 10Mb/s, 100Mb/s, and 1Gb/s operation. The MACs support jumbo frames and time stamping through interfaces based on IEEE Std 1588v2. The Ethernet MACs can be connected through the serial transceivers (SGMII), the MIO (RGMII), or the EMIO (GMII). The GMII interface can be converted to a different interface within the PL.

## **High-Speed Connectivity**

The PS includes four PS-GTR transceivers (transmit and receive), which support data rates up to 6.0Gb/s and can interface to the peripherals for communication over PCIe, SATA, USB 3.0, SGMII, and DisplayPort.

### **PCIe**

The integrated block for PCIe is compliant with the PCI Express Base Specification 2.1 and supports x1, x2, and x4 configurations as Root Complex or Endpoint, following the transaction ordering rules in both configurations. It has built-in DMA, supports one virtual channel, and provides fully configurable base address registers.

### SATA

Users can connect up to two external devices using the two SATA host port interfaces compliant to the SATA 3.1 specification. The SATA interfaces can operate at 1.5Gb/s, 3.0Gb/s, or 6.0Gb/s data rates and are compliant with advanced host controller interface (AHCI) version 1.3 supporting partial and slumber power modes.

### DisplayPort

The DisplayPort controller supports up to two lanes of source-only DisplayPort, compliant with the VESA DisplayPort v1.2a specification, at 1.62Gb/s, 2.7Gb/s, and 5.4Gb/s data rates. The controller supports single stream transport (SST); video resolution up to 4Kx2K at a 30Hz frame rate; video formats Y-only, YCbCr444, YCbCr422, YCbCr420, RGB, YUV444, YUV422, and xvYCC; and pixel color depths of 6, 8, 10, and 12 bits per color component.



## **Graphics Processing Unit (GPU)**

The dedicated ARM Mali-400 MP2 GPU in the PS supports 2D and 3D graphics acceleration up to 1080p resolution. The Mali-400 supports OpenGL ES 1.1 and 2.0 for 3D graphics and the OpenVG 1.1 standard for 2D vector graphics. It has a geometry processor (GP) and two pixel processors to perform tile rendering operations in parallel. It has dedicated memory management units for the GP and pixel processors, which support a 4KB page size. The GPU also has a 64KB level-2 (L2) read-only cache. It supports 4X and 16X full-scene anti-aliasing (FSAA). It is fully autonomous, enabling maximum parallelization between the APU and GPU. It has built-in hardware texture decompression, allowing textures to remain compressed (in ETC format) in graphics hardware while the required samples are decompressed on the fly. It also supports efficient alpha blending of multiple layers in hardware without additional bandwidth consumption. It has a pixel fill rate of 2Mpixel/sec/MHz and a triangle rate of 0.1Mvertex/sec/MHz. The GPU supports extensive texture formats for RGBA 8888, 565, and 1556 in Mono 8, 16, and YUV formats. For power-sensitive applications, the GPU supports clock and power gating for each of the GP, the pixel processors, and the L2 cache. During power gating, the GPU does not consume any static or dynamic power; during clock gating, it consumes only static power.

### Video Codec Unit (VCU)

The video codec unit (VCU) provides multi-standard video encoding and decoding capabilities, including: High Efficiency Video Coding (HEVC), i.e., H.265; and Advanced Video Coding (AVC), i.e., H.264 standards. The VCU is capable of simultaneous encode and decode at rates up to 4Kx2K at 60 frames per second (fps) (approx. 600Mpixel/sec) or 8Kx4K at a reduced frame rate (~15fps).
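A quick pixel-rate comparison helps explain why the 8Kx4K mode runs at a reduced frame rate. The sketch below assumes that 4Kx2K means 3840x2160 and 8Kx4K means 7680x4320 (the text does not give exact frame dimensions); the function name is illustrative.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Return the raw pixel throughput in pixels per second."""
    return width * height * fps

# Assumed frame sizes; the text only says "4Kx2K" and "8Kx4K".
rate_4k60 = pixel_rate(3840, 2160, 60)
rate_8k15 = pixel_rate(7680, 4320, 15)

print(f"4Kx2K @ 60fps: {rate_4k60 / 1e6:.1f} Mpixel/s")
print(f"8Kx4K @ 15fps: {rate_8k15 / 1e6:.1f} Mpixel/s")
```

Under these assumptions both modes need the same pixel throughput, which is somewhat below the approximate 600Mpixel/sec figure quoted above.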

# Input/Output

All UltraScale devices, whether FPGA or MPSoC, have I/O pins for communicating to external components. In addition, in the MPSoC's PS, there are another 78 I/Os that the I/O peripherals use to communicate to external components, referred to as multiplexed I/O (MIO). If more than 78 pins are required by the I/O peripherals, the I/O pins in the PL can be used to extend the MPSoC interfacing capability, referred to as extended MIO (EMIO).

The number of I/O pins in UltraScale FPGAs and in the programmable logic of UltraScale+ MPSoCs varies depending on device and package. Each I/O is configurable and can comply with a large number of I/O standards. The I/Os are classed as high-range (HR), high-performance (HP), or high-density (HD). The HR I/Os offer the widest range of voltage support, from 1.2V to 3.3V. The HP I/Os are optimized for highest performance operation, from 1.0V to 1.8V. The HD I/Os are reduced-feature I/Os organized in banks of 24, providing voltage support from 1.2V to 3.3V.

All I/O pins are organized in banks, with 52 HP or HR pins per bank or 24 HD pins per bank. Each bank has one common  $V_{CCO}$  output buffer power supply, which also powers certain input buffers. In addition, HR banks can be split into two half-banks, each with their own  $V_{CCO}$  supply. Some single-ended input buffers require an internally generated or an externally applied reference voltage ( $V_{REF}$ ).  $V_{REF}$  pins can be driven directly from the PCB or internally generated using the internal  $V_{REF}$  generator circuitry present in each bank.
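The bank sizes above make it easy to estimate how many banks a given pin count occupies. The following is a minimal sketch; the helper name and the example pin counts are illustrative, and it ignores the fact that a package may not bond out every pin of a bank.

```python
# Pins per bank, as stated in the text above.
PINS_PER_BANK = {"HP": 52, "HR": 52, "HD": 24}

def banks_needed(io_type: str, pins: int) -> int:
    """Return the number of banks required to provide `pins` I/Os of one type."""
    per_bank = PINS_PER_BANK[io_type]
    return -(-pins // per_bank)  # ceiling division

for io_type, pins in [("HP", 156), ("HP", 416), ("HD", 96)]:
    print(f"{pins} {io_type} pins -> {banks_needed(io_type, pins)} bank(s)")
```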



## I/O Electrical Characteristics

Single-ended outputs use a conventional CMOS push/pull output structure driving High towards  $V_{CCO}$  or Low towards ground, and can be put into a high-Z state. The system designer can specify the slew rate and the output strength. The input is always active but is usually ignored while the output is active. Each pin can optionally have a weak pull-up or a weak pull-down resistor.

Most signal pin pairs can be configured as differential input pairs or output pairs. Differential input pin pairs can optionally be terminated with a  $100\Omega$  internal resistor. All UltraScale devices support differential standards beyond LVDS, including RSDS, BLVDS, differential SSTL, and differential HSTL. Each of the I/Os supports memory I/O standards, such as single-ended and differential HSTL as well as single-ended and differential SSTL. UltraScale+ families add support for MIPI with a dedicated D-PHY in the I/O bank.

### 3-State Digitally Controlled Impedance and Low Power I/O Features

The 3-state Digitally Controlled Impedance (T\_DCI) can control the output drive impedance (series termination) or can provide parallel termination of an input signal to  $V_{CCO}$  or split (Thevenin) termination to  $V_{CCO}/2$ . This allows users to eliminate off-chip termination for signals using T\_DCI. In addition to board space savings, the termination automatically turns off when in output mode or when 3-stated, saving considerable power compared to off-chip termination. The I/Os also have low power modes for IBUF and IDELAY to provide further power savings, especially when used to implement memory interfaces.

## I/O Logic

### Input and Output Delay

All inputs and outputs can be configured as either combinatorial or registered. Double data rate (DDR) is supported by all inputs and outputs. Any input or output can be individually delayed by up to 1,250ps with a resolution of 5–15ps. Such delays are implemented as IDELAY and ODELAY. The number of delay steps can be set by configuration and can also be incremented or decremented while in use. The IDELAY and ODELAY can be cascaded together to double the amount of delay in a single direction.

### **ISERDES** and **OSERDES**

Many applications combine high-speed, bit-serial I/O with slower parallel operation inside the device. This requires a serializer and deserializer (SerDes) inside the I/O logic. Each I/O pin possesses an IOSERDES (ISERDES and OSERDES) capable of performing serial-to-parallel or parallel-to-serial conversions with programmable widths of 2, 4, or 8 bits. These I/O logic features enable high-performance interfaces, such as Gigabit Ethernet/1000BaseX/SGMII, to be moved from the transceivers to the SelectIO interface.
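The serial-to-parallel conversion described above can be modeled in a few lines. This is a behavioral sketch only, not a description of the hardware: the function names are hypothetical and the bit ordering (first received bit becomes the most significant bit) is an assumption.

```python
ALLOWED_WIDTHS = (2, 4, 8)  # programmable widths named in the text

def deserialize(bits, width):
    """Group a serial bit stream into parallel words of `width` bits."""
    if width not in ALLOWED_WIDTHS:
        raise ValueError("width must be 2, 4, or 8")
    words = []
    usable = len(bits) - len(bits) % width  # drop an incomplete trailing word
    for i in range(0, usable, width):
        word = 0
        for bit in bits[i:i + width]:
            word = (word << 1) | bit  # assumed ordering: first bit is the MSB
        words.append(word)
    return words

def serialize(words, width):
    """Inverse of deserialize: emit each word MSB first."""
    if width not in ALLOWED_WIDTHS:
        raise ValueError("width must be 2, 4, or 8")
    return [(w >> s) & 1 for w in words for s in range(width - 1, -1, -1)]

def parallel_clock_mhz(line_rate_mbps, width):
    """Word rate inside the device for a given serial line rate."""
    return line_rate_mbps / width

stream = [1, 0, 1, 1, 0, 0, 1, 0]
print(deserialize(stream, 4))
print(parallel_clock_mhz(1250, 8))
```

For example, a 1.25Gb/s serial stream deserialized to 8 bits only needs a 156.25MHz word clock in the fabric, which is the point of moving such interfaces into the I/O logic.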



# **High-Speed Serial Transceivers**

Serial data transmission between devices on the same PCB, over backplanes, and across even longer distances is becoming increasingly important for scaling to 100Gb/s and 400Gb/s line cards. Specialized dedicated on-chip circuitry and differential I/O capable of coping with the signal integrity issues are required at these high data rates.

Three types of transceivers are used in the UltraScale architecture: GTH and GTY in FPGAs and MPSoC PL, and PS-GTR in the MPSoC PS. All transceivers are arranged in groups of four, known as a transceiver Quad. Each serial transceiver is a combined transmitter and receiver. Table 17 compares the available transceivers.

Table 17: Transceiver Information

|                | Kintex UltraScale | Kintex UltraScale | Kintex UltraScale+ | Kintex UltraScale+ | Virtex UltraScale | Virtex UltraScale | Virtex UltraScale+ | Zynq UltraScale+ | Zynq UltraScale+ | Zynq UltraScale+ |
|----------------|-------------------|-------------------|--------------------|--------------------|-------------------|-------------------|--------------------|------------------|------------------|------------------|
| Type           | GTH | GTY | GTH | GTY | GTH | GTY | GTY | PS-GTR | GTH | GTY |
| Qty            | 16–64 | 0–32 | 20–60 | 0–60 | 20–60 | 0–60 | 40–128 | 4 | 0–44 | 0–28 |
| Max. Data Rate | 16.3Gb/s | 16.3Gb/s | 16.3Gb/s | 32.75Gb/s | 16.3Gb/s | 30.5Gb/s | 32.75Gb/s | 6.0Gb/s | 16.3Gb/s | 32.75Gb/s |
| Min. Data Rate | 0.5Gb/s | 0.5Gb/s | 0.5Gb/s | 0.5Gb/s | 0.5Gb/s | 0.5Gb/s | 0.5Gb/s | 1.25Gb/s | 0.5Gb/s | 0.5Gb/s |
| Key Apps       | • Backplane<br>• PCIe Gen4<br>• HMC | • Backplane<br>• PCIe Gen4<br>• HMC | • Backplane<br>• PCIe Gen4<br>• HMC | • 100G+ Optics<br>• Chip-to-Chip<br>• 25G+ Backplane<br>• HMC | • Backplane<br>• PCIe Gen4<br>• HMC | • 100G+ Optics<br>• Chip-to-Chip<br>• 25G+ Backplane<br>• HMC | • 100G+ Optics<br>• Chip-to-Chip<br>• 25G+ Backplane<br>• HMC | • PCIe Gen2<br>• USB<br>• Ethernet | • Backplane<br>• PCIe Gen4<br>• HMC | • 100G+ Optics<br>• Chip-to-Chip<br>• 25G+ Backplane<br>• HMC |

The following information in this section pertains to the GTH and GTY only.

The serial transmitter and receiver are independent circuits that use an advanced phase-locked loop (PLL) architecture to multiply the reference frequency input by certain programmable numbers between 4 and 25 to become the bit-serial data clock. Each transceiver has a large number of user-definable features and parameters. All of these can be defined during device configuration, and many can also be modified during operation.



# **Integrated Interface Blocks for PCI Express Designs**

The UltraScale architecture includes integrated blocks for PCIe technology that can be configured as an Endpoint or Root Port. UltraScale devices are compliant to the PCI Express Base Specification Revision 3.0. UltraScale+ devices are compliant to the PCI Express Base Specification Revision 3.1 for Gen3 and lower data rates, and compatible with the PCI Express Base Specification Revision 4.0 (rev 0.5) for Gen4 data rates.

The Root Port can be used to build the basis for a compatible Root Complex, to allow custom chip-to-chip communication via the PCI Express protocol, and to attach ASSP Endpoint devices, such as Ethernet Controllers or Fibre Channel HBAs, to the FPGA or MPSoC.

This block is highly configurable to system design requirements and can operate up to the maximum lane widths and data rates listed in Table 18.

Table 18: PCIe Maximum Configurations

|                              | Kintex<br>UltraScale | Kintex<br>UltraScale+ | Virtex<br>UltraScale | Virtex<br>UltraScale+ | Zynq<br>UltraScale+ |
|------------------------------|----------------------|-----------------------|----------------------|-----------------------|---------------------|
| Gen1 (2.5Gb/s)               | x8                   | x16                   | x8                   | x16                   | x16                 |
| Gen2 (5Gb/s)                 | x8                   | x16                   | x8                   | x16                   | x16                 |
| Gen3 (8Gb/s)                 | x8                   | x16                   | x8                   | x16                   | x16                 |
| Gen4 (16Gb/s) <sup>(1)</sup> |                      | x8                    |                      | x8                    | x8                  |

#### Notes:

1. Transceivers in Kintex UltraScale and Virtex UltraScale devices are capable of operating at Gen4 data rates.

For high-performance applications, advanced buffering techniques of the block offer a flexible maximum payload size of up to 1,024 bytes. The integrated block interfaces to the integrated high-speed transceivers for serial connectivity and to block RAMs for data buffering. Combined, these elements implement the Physical Layer, Data Link Layer, and Transaction Layer of the PCI Express protocol.

Xilinx provides a light-weight, configurable, easy-to-use LogiCORE™ IP wrapper that ties the various building blocks (the integrated block for PCIe, the transceivers, block RAM, and clocking resources) into an Endpoint or Root Port solution. The system designer has control over many configurable parameters: link width and speed, maximum payload size, FPGA or MPSoC logic interface speeds, reference clock frequency, and base address register decoding and filtering.
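To compare the configurations in Table 18, the raw link bandwidth can be estimated from the per-lane rate and the lane count. The sketch below is illustrative: the line-code efficiencies (8b/10b for Gen1 and Gen2, 128b/130b for Gen3 and Gen4) are standard PCI Express properties that are not stated in this document, and the result is per direction before packet and protocol overhead.

```python
# Per-lane signaling rate in Gb/s (Table 18) and line-code efficiency.
# The encoding efficiencies are an assumption taken from the PCIe standard.
GENERATIONS = {
    "Gen1": (2.5, 8 / 10),
    "Gen2": (5.0, 8 / 10),
    "Gen3": (8.0, 128 / 130),
    "Gen4": (16.0, 128 / 130),
}

def link_bandwidth_gbps(gen: str, lanes: int) -> float:
    """Return usable bandwidth per direction in Gb/s after line coding."""
    rate, efficiency = GENERATIONS[gen]
    return rate * efficiency * lanes

for gen, lanes in [("Gen3", 8), ("Gen3", 16), ("Gen4", 8)]:
    print(f"{gen} x{lanes}: {link_bandwidth_gbps(gen, lanes):.1f} Gb/s")
```

Under these assumptions a Gen4 x8 link carries the same bandwidth as a Gen3 x16 link, which is consistent with the table listing x8 as the maximum Gen4 width where Gen3 reaches x16.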



# Cache Coherent Interconnect for Accelerators (CCIX)

CCIX is a chip-to-chip interconnect operating at data rates up to 25Gb/s that allows two or more devices to share memory in a cache coherent manner. Using PCIe for the transport layer, CCIX can operate at several standard data rates (2.5, 5, 8, and 16Gb/s) with an additional high-speed 25Gb/s option. The specification employs a subset of full coherency protocols and ensures that FPGAs used as accelerators can coherently share data with processors using different instruction set architectures.

Virtex UltraScale+ HBM devices support CCIX data rates up to 16Gb/s and contain four CCIX ports and at least four integrated blocks for PCIe. Each CCIX port requires the use of one integrated block for PCIe. If not used with a CCIX port, the integrated blocks for PCIe can still be used for PCIe communication.

# **Integrated Block for Interlaken**

Some UltraScale architecture-based devices include integrated blocks for Interlaken. Interlaken is a scalable chip-to-chip interconnect protocol designed to enable transmission speeds from 10Gb/s to 150Gb/s. The Interlaken integrated block in the UltraScale architecture is compliant to revision 1.2 of the Interlaken specification with data striping and de-striping across 1 to 12 lanes. Permitted configurations are: 1 to 12 lanes at up to 12.5Gb/s and 1 to 6 lanes at up to 25.78125Gb/s, enabling flexible support for up to 150Gb/s per integrated block. With multiple Interlaken blocks, certain UltraScale devices enable easy, reliable Interlaken switches and bridges.
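The permitted lane configurations above can be written as a small validity check with an aggregate-rate calculation. This is a sketch with hypothetical function names; the text gives only upper limits for the lane rate, so no lower limit is enforced here.

```python
def is_permitted(lanes: int, lane_rate_gbps: float) -> bool:
    """Check a lane count and lane rate against the limits stated in the text."""
    if lane_rate_gbps <= 0:
        return False
    if lane_rate_gbps <= 12.5:
        return 1 <= lanes <= 12
    if lane_rate_gbps <= 25.78125:
        return 1 <= lanes <= 6
    return False

def aggregate_gbps(lanes: int, lane_rate_gbps: float) -> float:
    """Return the raw aggregate line rate of one Interlaken block."""
    if not is_permitted(lanes, lane_rate_gbps):
        raise ValueError("configuration not permitted")
    return lanes * lane_rate_gbps

print(aggregate_gbps(12, 12.5))      # 12 lanes at 12.5Gb/s
print(aggregate_gbps(6, 25.78125))   # 6 lanes at 25.78125Gb/s
```

Both maximum configurations reach at least 150Gb/s of raw line rate, which matches the per-block figure quoted above.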

# **Integrated Block for 100G Ethernet**

Compliant to the IEEE Std 802.3ba, the 100G Ethernet integrated blocks in the UltraScale architecture provide low latency 100Gb/s Ethernet ports with a wide range of user customization and statistics gathering. With support for 10 x 10.3125Gb/s (CAUI) and 4 x 25.78125Gb/s (CAUI-4) configurations, the integrated block includes both the 100G MAC and PCS logic with support for IEEE Std 1588v2 1-step and 2-step hardware timestamping.

In UltraScale+ devices, the 100G Ethernet blocks contain a Reed Solomon Forward Error Correction (RS-FEC) block, compliant to IEEE Std 802.3bj, that can be used with the Ethernet block or stand alone in user applications. These families also support OTN mapping mode in which the PCS can be operated without using the MAC.
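The CAUI and CAUI-4 lane rates described above may look oddly specific; they follow from the line coding. The sketch below assumes 64b/66b encoding, which is the standard PCS coding for 100G Ethernet but is not stated in this document.

```python
ENCODING_EFFICIENCY = 64 / 66  # assumed 64b/66b line code

def payload_rate_gbps(lanes: int, lane_rate_gbps: float) -> float:
    """Return the data rate left after removing line-coding overhead."""
    return lanes * lane_rate_gbps * ENCODING_EFFICIENCY

caui = payload_rate_gbps(10, 10.3125)    # 10 x 10.3125Gb/s
caui4 = payload_rate_gbps(4, 25.78125)   # 4 x 25.78125Gb/s

print(f"CAUI:   {caui:.3f} Gb/s")
print(f"CAUI-4: {caui4:.3f} Gb/s")
```

Both configurations carry the same 103.125Gb/s of raw line rate, which works out to 100Gb/s of data under this assumption.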



# Stacked Silicon Interconnect (SSI) Technology

Many challenges associated with creating high-capacity devices are addressed by Xilinx with the second generation of the pioneering 3D SSI technology. SSI technology enables multiple super-logic regions (SLRs) to be combined on a passive interposer layer, using proven manufacturing and assembly techniques from industry leaders, to create a single device with more than 20,000 low-power inter-SLR connections. Dedicated interface tiles within the SLRs provide ultra-high bandwidth, low latency connectivity to other SLRs. Table 19 shows the number of SLRs in devices that use SSI technology and their dimensions.

Table 19: UltraScale and UltraScale+ 3D IC SLR Count and Dimensions

|                         | Kintex<br>UltraScale | Kintex<br>UltraScale | Virtex<br>UltraScale | Virtex<br>UltraScale | Virtex<br>UltraScale | Virtex<br>UltraScale | Virtex<br>UltraScale+ | Virtex<br>UltraScale+ | Virtex<br>UltraScale+ | Virtex<br>UltraScale+ | Virtex<br>UltraScale+ | Virtex<br>UltraScale+ | Virtex<br>UltraScale+ | Virtex<br>UltraScale+ | Virtex<br>UltraScale+ |
|-------------------------|-------|-------|-------|-------|-------|-------|------|------|------|-------|-------|-------|-------|-------|-------|
| Device                  | KU085 | KU115 | VU125 | VU160 | VU190 | VU440 | VU5P | VU7P | VU9P | VU11P | VU13P | VU31P | VU33P | VU35P | VU37P |
| # SLRs                  | 2     | 2     | 2     | 3     | 3     | 3     | 2    | 2    | 3    | 3     | 4     | 1     | 1     | 2     | 3     |
| SLR Width (in regions)  | 6     | 6     | 6     | 6     | 6     | 9     | 6    | 6    | 6    | 8     | 8     | 8     | 8     | 8     | 8     |
| SLR Height (in regions) | 5     | 5     | 5     | 5     | 5     | 5     | 5    | 5    | 5    | 4     | 4     | 4     | 4     | 4     | 4     |
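The SLR count and dimensions in Table 19 can be combined into a total clock-region count per device. This is a rough sketch: it assumes every SLR in a device has the dimensions listed in the table, and the dictionary and function names are illustrative.

```python
# (number of SLRs, SLR width in regions, SLR height in regions), from Table 19.
SLR_TABLE = {
    "KU085": (2, 6, 5), "KU115": (2, 6, 5),
    "VU125": (2, 6, 5), "VU160": (3, 6, 5), "VU190": (3, 6, 5), "VU440": (3, 9, 5),
    "VU5P": (2, 6, 5), "VU7P": (2, 6, 5), "VU9P": (3, 6, 5),
    "VU11P": (3, 8, 4), "VU13P": (4, 8, 4),
    "VU31P": (1, 8, 4), "VU33P": (1, 8, 4), "VU35P": (2, 8, 4), "VU37P": (3, 8, 4),
}

def total_clock_regions(device: str) -> int:
    """Return SLR count x width x height, assuming identical SLRs."""
    slrs, width, height = SLR_TABLE[device]
    return slrs * width * height

for device in ("KU115", "VU440", "VU13P"):
    print(device, total_clock_regions(device))
```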

## **Clock Management**

The clock generation and distribution components in UltraScale devices are located adjacent to the columns that contain the memory interface and input and output circuitry. This tight coupling of clocking and I/O provides low-latency clocking to the I/O for memory interfaces and other I/O protocols. Within every clock management tile (CMT) resides one mixed-mode clock manager (MMCM), two PLLs, clock distribution buffers and routing, and dedicated circuitry for implementing external memory interfaces.

## Mixed-Mode Clock Manager

The mixed-mode clock manager (MMCM) can serve as a frequency synthesizer for a wide range of frequencies and as a jitter filter for incoming clocks. At the center of the MMCM is a voltage-controlled oscillator (VCO), which speeds up and slows down depending on the input voltage it receives from the phase frequency detector (PFD).

There are three sets of frequency dividers (D, M, and O), which can be programmed by configuration and during normal operation via the Dynamic Reconfiguration Port (DRP). The pre-divider D reduces the input frequency and feeds one input of the phase/frequency comparator. The feedback divider M acts as a multiplier because it divides the VCO output frequency before feeding the other input of the phase/frequency comparator. D and M must be chosen appropriately to keep the VCO within its specified frequency range. The VCO has eight equally-spaced output phases (0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315°). Each phase can be selected to drive one of the output dividers, and each divider is programmable by configuration to divide by any integer from 1 to 128.
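The divider relationships above reduce to two formulas: the VCO runs at the input frequency times M/D, and each output is the VCO frequency divided by O. The sketch below illustrates this; the function name is hypothetical, and the VCO range passed in the example is a placeholder, since the real limits come from the device data sheet.

```python
def mmcm_output(f_in_mhz, d, m, o, vco_range_mhz):
    """Return (f_vco, f_out) in MHz for pre-divider D, feedback divider M,
    and output divider O."""
    if not 1 <= o <= 128:
        raise ValueError("output divider must be an integer from 1 to 128")
    f_pfd = f_in_mhz / d   # pre-divider D reduces the input frequency
    f_vco = f_pfd * m      # feedback divider M acts as a multiplier
    low, high = vco_range_mhz
    if not low <= f_vco <= high:
        raise ValueError(f"VCO at {f_vco} MHz is outside {low}-{high} MHz")
    return f_vco, f_vco / o

# Hypothetical example: 100 MHz input and an assumed 800-1600 MHz VCO range.
f_vco, f_out = mmcm_output(100.0, d=1, m=12, o=8, vco_range_mhz=(800.0, 1600.0))
print(f"VCO: {f_vco} MHz, output: {f_out} MHz")
```

The range check mirrors the requirement that D and M keep the VCO within its specified frequency range.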

The MMCM has three input-jitter filter options: low bandwidth, high bandwidth, or optimized mode. Low-Bandwidth mode has the best jitter attenuation. High-Bandwidth mode has the best phase offset. Optimized mode allows the tools to find the best setting.



The MMCM can have a fractional counter in either the feedback path (acting as a multiplier) or in one output path. Fractional counters allow non-integer increments of 1/8 and can thus increase frequency synthesis capabilities by a factor of 8. The MMCM can also provide fixed or dynamic phase shift in small increments that depend on the VCO frequency. At 1,600MHz, the phase-shift timing increment is 11.2ps.
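Both numbers in the paragraph above can be reproduced with a short calculation. The sketch below assumes that the phase-shift increment is 1/56 of the VCO period; that factor is not stated in this text and is used only because it agrees with the quoted 11.2ps at 1,600MHz. The function names are illustrative.

```python
def phase_shift_step_ps(f_vco_mhz, steps_per_vco_period=56):
    """Return the phase-shift increment in ps (1/56 of a VCO period is assumed)."""
    period_ps = 1e6 / f_vco_mhz
    return period_ps / steps_per_vco_period

def fractional_values(low, high):
    """List divider values from `low` to `high` in the 1/8 steps the text describes."""
    return [k / 8 for k in range(low * 8, high * 8 + 1)]

print(f"{phase_shift_step_ps(1600):.1f} ps")
print(fractional_values(5, 6))
```

Between two adjacent integer settings the fractional counter offers seven additional values, which is where the factor-of-8 gain in synthesis resolution comes from.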

### **PLL**

With fewer features than the MMCM, the two PLLs in a clock management tile are primarily present to provide the necessary clocks to the dedicated memory interface circuitry. The circuit at the center of the PLLs is similar to the MMCM, with PFD feeding a VCO and programmable M, D, and O counters. There are two divided outputs to the device fabric per PLL as well as one clock plus one enable signal to the memory interface circuitry.

UltraScale+ MPSoCs are equipped with five additional PLLs in the PS for independently configuring the four primary clock domains within the PS: the APU, the RPU, the DDR controller, and the I/O peripherals.

## **Clock Distribution**

Clocks are distributed throughout UltraScale devices via buffers that drive a number of vertical and horizontal tracks. There are 24 horizontal clock routes per clock region and 24 vertical clock routes per clock region with 24 additional vertical clock routes adjacent to the MMCM and PLL. Within a clock region, clock signals are routed to the device logic (CLBs, etc.) via 16 gateable leaf clocks.

Several types of clock buffers are available. The BUFGCE and BUFCE\_LEAF buffers provide clock gating at the global and leaf levels, respectively. BUFGCTRL provides glitchless clock muxing and gating capability. BUFGCE\_DIV has clock gating capability and can divide a clock by 1 to 8. BUFG\_GT performs clock division from 1 to 8 for the transceiver clocks. In MPSoCs, clocks can be transferred from the PS to the PL using dedicated buffers.

# **Memory Interfaces**

Memory interface data rates continue to increase, driving the need for dedicated circuitry that enables high-performance, reliable interfacing to current and next-generation memory technologies. Every UltraScale device includes dedicated physical interface (PHY) blocks located between the CMT and I/O columns that support implementation of high-performance PHY blocks to external memories such as DDR4, DDR3, QDRII+, and RLDRAM3. The PHY blocks in each I/O bank generate the address/control and data bus signaling protocols as well as the precision clock/data alignment required to reliably communicate with a variety of high-performance memory standards. Multiple I/O banks can be used to create wider memory interfaces.

In addition to external parallel memory interfaces, UltraScale FPGAs and MPSoCs can communicate with external serial memories, such as Hybrid Memory Cube (HMC), via the high-speed serial transceivers. All transceivers in the UltraScale architecture support the HMC protocol at line rates up to 15Gb/s. UltraScale devices support the highest bandwidth HMC configuration of 64 lanes with a single FPGA.



## **Block RAM**

Every UltraScale architecture-based device contains a number of 36 Kb block RAMs, each with two completely independent ports that share only the stored data. Each block RAM can be configured as one 36Kb RAM or two independent 18Kb RAMs. Each memory access, read or write, is controlled by the clock. Connections in every block RAM column enable signals to be cascaded between vertically adjacent block RAMs, providing an easy method to create large, fast memory arrays, and FIFOs with greatly reduced power consumption.

All inputs (data, address, clock enables, and write enables) are registered. The input address is always clocked (unless address latching is turned off), retaining data until the next operation. An optional output data pipeline register allows higher clock rates at the cost of an extra cycle of latency. During a write operation, the data output can reflect either the previously stored data or the newly written data, or it can remain unchanged. Block RAM sites that remain unused in the user design are automatically powered down to reduce total power consumption. There is an additional pin on every block RAM to control the dynamic power gating feature.

### **Programmable Data Width**

Each port can be configured as $32K \times 1$; $16K \times 2$; $8K \times 4$; $4K \times 9$ (or 8); $2K \times 18$ (or 16); $1K \times 36$ (or 32); or $512 \times 72$ (or 64). Whether configured as block RAM or FIFO, the two ports can have different aspect ratios without any constraints. Each block RAM can be divided into two completely independent 18Kb block RAMs that can each be configured to any aspect ratio from $16K \times 1$ to $512 \times 36$. Everything described previously for the full 36Kb block RAM also applies to each of the smaller 18Kb block RAMs. Only in simple dual-port (SDP) mode can data widths of greater than 18 bits (18Kb RAM) or 36 bits (36Kb RAM) be accessed. In this mode, one port is dedicated to read operation, the other to write operation. In SDP mode, one side (read or write) can be variable, while the other is fixed to 32/36 or 64/72. Both sides of the dual-port 36Kb RAM can be of variable width.
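The aspect ratios listed above all address the same physical array. The short check below, a sketch using only the depths and widths quoted in the text, shows that every configuration exposes 32Kb of data and that the wider configurations add parity bits to reach the full 36Kb. The names `CONFIGS_36K` and `capacity_bits` are illustrative.

```python
# (depth, width including parity, width excluding parity) for one port of a
# 36Kb block RAM, taken from the list of aspect ratios in the text.
CONFIGS_36K = [
    (32 * 1024, 1, 1),
    (16 * 1024, 2, 2),
    (8 * 1024, 4, 4),
    (4 * 1024, 9, 8),
    (2 * 1024, 18, 16),
    (1024, 36, 32),
    (512, 72, 64),
]

def capacity_bits(depth, width):
    """Total number of bits addressed by a depth x width configuration."""
    return depth * width

for depth, w_parity, w_data in CONFIGS_36K:
    total_kb = capacity_bits(depth, w_parity) // 1024
    data_kb = capacity_bits(depth, w_data) // 1024
    print(f"{depth:>5} x {w_parity:<2} -> {total_kb} Kb total, {data_kb} Kb data")
```

The three narrowest configurations (x1, x2, x4) have no parity bits, so they address 32Kb rather than 36Kb.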

### **Error Detection and Correction**

Each 64-bit-wide block RAM can generate, store, and utilize eight additional Hamming code bits and perform single-bit error correction and double-bit error detection (ECC) during the read process. The ECC logic can also be used when writing to or reading from external 64- to 72-bit-wide memories.
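To illustrate how eight check bits can protect 64 data bits, the following sketch implements a textbook extended Hamming (72,64) code: seven Hamming check bits plus one overall parity bit. The bit layout, function names, and status strings are assumptions made for illustration; the actual parity matrix used by the block RAM ECC logic is not described here.

```python
# Textbook extended Hamming (72,64): 7 Hamming check bits + 1 overall parity bit.
# The bit layout is an assumption for illustration, not the device's ECC matrix.
N = 72
DATA_POSITIONS = [p for p in range(1, N) if p & (p - 1)]  # 64 non-power-of-two slots

def encode(data):
    """Encode a 64-bit integer into a list of 72 code bits."""
    code = [0] * N
    for i, p in enumerate(DATA_POSITIONS):
        code[p] = (data >> i) & 1
    for k in range(7):                   # check bits sit at positions 1, 2, 4, ..., 64
        mask = 1 << k
        code[mask] = sum(code[p] for p in range(1, N) if p & mask and p != mask) & 1
    code[0] = sum(code[1:]) & 1          # overall parity makes the total weight even
    return code

def decode(code):
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)                    # leave the caller's list untouched
    syndrome = 0
    for p in range(1, N):
        if code[p]:
            syndrome ^= p                # XOR of the positions of all set bits
    if sum(code) & 1:                    # odd weight: treat as a single-bit error
        if syndrome >= N:
            return None, "uncorrectable"
        code[syndrome] ^= 1              # syndrome 0 means the parity bit itself flipped
        status = "corrected"
    elif syndrome:                       # even weight but nonzero syndrome: two bits flipped
        return None, "uncorrectable"
    else:
        status = "ok"
    data = sum(code[p] << i for i, p in enumerate(DATA_POSITIONS))
    return data, status

word = 0xDEADBEEF01234567
stored = encode(word)
stored[37] ^= 1                          # inject a single-bit upset
print(decode(stored) == (word, "corrected"))
```

A single flipped bit is located by the syndrome and corrected. Two flipped bits leave the overall parity even while producing a nonzero syndrome, which is how the double-bit case is detected but not corrected.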

### **FIFO Controller**

Each block RAM can be configured as a 36Kb FIFO or an 18Kb FIFO. The built-in FIFO controller for single-clock (synchronous) or dual-clock (asynchronous or multirate) operation increments the internal addresses and provides four handshaking flags: full, empty, programmable full, and programmable empty. The programmable flags allow the user to specify the FIFO counter values that make these flags go active. The FIFO width and depth are programmable with support for different read port and write port widths on a single FIFO. A dedicated cascade path allows for easy creation of deeper FIFOs.
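The behavior of the four handshaking flags can be modeled in software as shown below. This is a simplified single-clock sketch: the class name, the threshold parameters, and the exact comparison used for the programmable flags (`>=` and `<=`) are assumptions, and flag latency and dual-clock synchronization are not modeled.

```python
from collections import deque

class FifoModel:
    """Behavioral model of a FIFO with full, empty, and programmable flags."""

    def __init__(self, depth, prog_full_thresh, prog_empty_thresh):
        self.depth = depth
        self.prog_full_thresh = prog_full_thresh    # assumed: flag set at or above this count
        self.prog_empty_thresh = prog_empty_thresh  # assumed: flag set at or below this count
        self._words = deque()

    @property
    def full(self):
        return len(self._words) == self.depth

    @property
    def empty(self):
        return len(self._words) == 0

    @property
    def prog_full(self):
        return len(self._words) >= self.prog_full_thresh

    @property
    def prog_empty(self):
        return len(self._words) <= self.prog_empty_thresh

    def write(self, word):
        """Store a word; return False (and drop nothing) if the FIFO is full."""
        if self.full:
            return False
        self._words.append(word)
        return True

    def read(self):
        """Return the oldest word, or None if the FIFO is empty."""
        if self.empty:
            return None
        return self._words.popleft()

fifo = FifoModel(depth=8, prog_full_thresh=6, prog_empty_thresh=2)
for value in range(6):
    fifo.write(value)
print(fifo.prog_full, fifo.full)   # True False
```

The programmable flags give the producer or consumer early warning before the hard full or empty condition is reached.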



## **UltraRAM**

UltraRAM is a high-density, dual-port, synchronous memory block available in UltraScale+ devices. Both of the ports share the same clock and can address all of the 4K x 72 bits. Each port can independently read from or write to the memory array. UltraRAM supports two types of write enable schemes. The first mode is consistent with the block RAM byte write enable mode. The second mode allows gating the data and parity byte writes separately. UltraRAM blocks can be connected together to create larger memory arrays. Dedicated routing in the UltraRAM column enables the entire column height to be connected together. If additional density is required, all the UltraRAM columns in an SLR can be connected together with a few fabric resources to create single instances of RAM approximately 100Mb in size. This makes UltraRAM an ideal solution for replacing external memories such as SRAM. Cascadable anywhere from 288Kb to 100Mb, UltraRAM provides the flexibility to fulfill many different memory requirements.
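The capacity figures quoted above follow directly from the 4K x 72 organization. The sketch below assumes binary units (1Kb = 1,024 bits, 1Mb = 1,024Kb) and uses a hypothetical helper, `uram_blocks_needed`, to estimate how many cascaded blocks a target capacity requires.

```python
URAM_DEPTH = 4 * 1024                 # 4K addresses per block
URAM_WIDTH = 72                       # bits per word
URAM_BITS = URAM_DEPTH * URAM_WIDTH   # 294,912 bits per block

def uram_blocks_needed(target_bits):
    """Number of cascaded UltraRAM blocks needed to hold target_bits (rounded up)."""
    return -(-target_bits // URAM_BITS)   # integer ceiling division

print(URAM_BITS // 1024)                       # 288 (Kb per block)
print(uram_blocks_needed(100 * 1024 * 1024))   # 356 blocks for 100Mb
```

Under these assumptions, an array of roughly 100Mb corresponds to a few hundred cascaded blocks.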

### **Error Detection and Correction**

Each 64-bit-wide UltraRAM can generate, store and utilize eight additional Hamming code bits and perform single-bit error correction and double-bit error detection (ECC) during the read process.

# **High Bandwidth Memory (HBM)**

Virtex UltraScale+ HBM devices incorporate 4GB HBM stacks adjacent to the FPGA die. Using stacked silicon interconnect technology, the FPGA communicates with the HBM stacks through memory controllers that connect to dedicated low-inductance interconnect in the silicon interposer. Each Virtex UltraScale+ HBM FPGA contains one or two HBM stacks, resulting in up to 8GB of HBM per FPGA.

The FPGA has 32 HBM AXI interfaces used to communicate with the HBM. Through a built-in switch mechanism, any of the 32 HBM AXI interfaces can access any memory address on either one or both of the HBM stacks due to the flexible addressing feature. This flexible connection between the FPGA and the HBM stacks results in easy floorplanning and timing closure. The memory controllers perform read and write reordering to improve bus efficiency. Data integrity is ensured through error checking and correction (ECC) circuitry.

# **Configurable Logic Block**

Every Configurable Logic Block (CLB) in the UltraScale architecture contains 8 LUTs and 16 flip-flops. The LUTs can be configured as either one 6-input LUT with one output, or as two 5-input LUTs with separate outputs but common inputs. Each LUT can optionally be registered in a flip-flop. In addition to the LUTs and flip-flops, the CLB contains arithmetic carry logic and multiplexers to create wider logic functions.
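The two LUT modes can be modeled by treating the LUT contents as a 64-bit truth table. In the sketch below, the single 6-input function indexes all 64 bits, while the dual 5-input mode reads the lower and upper 32-bit halves with the same five inputs. The bit ordering, the function names, and the `o5`/`o6` output labels are assumptions chosen for illustration.

```python
def truth_table(func, n_inputs):
    """Pack func(bits) for every input combination into an integer truth table."""
    table = 0
    for index in range(1 << n_inputs):
        bits = [(index >> n) & 1 for n in range(n_inputs)]
        table |= (func(bits) & 1) << index
    return table

def lut6(init, inputs):
    """One 6-input function: the six input bits select one of 64 table bits."""
    index = sum(bit << n for n, bit in enumerate(inputs))
    return (init >> index) & 1

def dual_lut5(init, inputs):
    """Two 5-input functions with common inputs: lower half -> o5, upper half -> o6."""
    index = sum(bit << n for n, bit in enumerate(inputs))
    o5 = (init >> index) & 1
    o6 = (init >> (32 + index)) & 1
    return o5, o6

# Example: lower half holds a 5-input AND, upper half a 5-input XOR.
and5 = truth_table(lambda b: int(all(b)), 5)
xor5 = truth_table(lambda b: sum(b) & 1, 5)
init = (xor5 << 32) | and5

print(dual_lut5(init, (1, 1, 1, 1, 1)))   # (1, 1)
print(dual_lut5(init, (1, 0, 0, 0, 0)))   # (0, 1)
```

In this model, the sixth input of the single-function mode simply selects between the two halves that the dual mode exposes in parallel.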

Each CLB contains one slice. There are two types of slices: SLICEL and SLICEM. LUTs in the SLICEM can be configured as 64-bit RAM, as 32-bit shift registers (SRL32), or as two SRL16s. CLBs in the UltraScale architecture have increased routing and connectivity compared to CLBs in previous-generation Xilinx devices. They also have additional control signals to enable superior register packing, resulting in overall higher device utilization.



Zynq UltraScale+ MPSoCs contain an additional System Monitor block in the PS. See Table 20.

Table 20: Key System Monitor Features

|            | Kintex UltraScale<br>Virtex UltraScale | Kintex UltraScale+<br>Virtex UltraScale+<br>Zynq UltraScale+ MPSoC PL | Zynq UltraScale+ MPSoC PS |
|------------|----------------------------------------|-----------------------------------------------------------------------|---------------------------|
| ADC        | 10-bit 200kSPS                         | 10-bit 200kSPS                                                        | 10-bit 1MSPS              |
| Interfaces | JTAG, I2C, DRP                         | JTAG, I2C, DRP, PMBus                                                 | APB                       |

In FPGAs and the MPSoC PL, sensor outputs and up to 17 user-allocated external analog inputs are digitized using a 10-bit 200 kilo-sample-per-second (kSPS) ADC, and the measurements are stored in registers that can be accessed via internal FPGA (DRP), JTAG, PMBus, or I2C interfaces. The I2C interface and PMBus allow the on-chip monitoring to be easily accessed by the System Manager/Host before and after device configuration.

The System Monitor in the MPSoC PS uses a 10-bit, 1 mega-sample-per-second (MSPS) ADC to digitize the sensor outputs. The measurements are stored in registers and are accessed via the Advanced Peripheral Bus (APB) interface by the processors and the platform management unit (PMU) in the PS.

# **Configuration**

The UltraScale architecture-based devices store their customized configuration in SRAM-type internal latches. The configuration storage is volatile and must be reloaded whenever the device is powered up. This storage can also be reloaded at any time. Several methods and data formats for loading configuration are available, determined by the mode pins, with more dedicated configuration datapath pins to simplify the configuration process.

UltraScale architecture-based devices support secure and non-secure boot with optional Advanced Encryption Standard - Galois/Counter Mode (AES-GCM) decryption and authentication logic. If only authentication is required, the UltraScale architecture provides an alternative form of authentication based on RSA algorithms. For RSA authentication support in the Kintex UltraScale and Virtex UltraScale families, see UG570, UltraScale Architecture Configuration User Guide.

UltraScale architecture-based devices also have the ability to select between multiple configurations, and support robust field-update methodologies. This is especially useful for updates to a design after the end product has been shipped. Designers can release their product with an early version of the design, thus getting their product to market faster. This feature allows designers to keep their customers current with the most up-to-date design while the product is already deployed in the field.

### **Booting MPSoCs**

Zynq UltraScale+ MPSoCs use a multi-stage boot process that supports both a non-secure and a secure boot. The PS is the master of the boot and configuration process. For a secure boot, the AES-GCM, SHA-3/384 decryption/authentication, and 4096-bit RSA blocks decrypt and authenticate the image.

Upon reset, the device mode pins are read to determine the primary boot device to be used: NAND, Quad-SPI, SD, eMMC, or JTAG. JTAG can only be used as a non-secure boot source and is intended for debugging purposes. One of the CPUs, Cortex-A53 or Cortex-R5, executes code out of on-chip ROM and copies the first stage boot loader (FSBL) from the boot device to the on-chip memory (OCM).



The ordering information shown in Figure 4 applies to all packages in the Kintex UltraScale+ and Virtex UltraScale+ FPGAs, and Figure 5 applies to Zynq UltraScale+ MPSoCs.

The -1L and -2L speed grades in the UltraScale+ families can run at one of two different  $V_{CCINT}$  operating voltages. At 0.72V, they operate at similar performance to the Kintex UltraScale and Virtex UltraScale devices with up to 30% reduction in power consumption. At 0.85V, they consume similar power to the Kintex UltraScale and Virtex UltraScale devices, but operate over 30% faster.

For UltraScale+ devices, the information in this document is pre-release, provided ahead of silicon ordering availability. Please contact your Xilinx sales representative for more information on Early Access Programs.



Figure 4: UltraScale+ FPGA Ordering Information



Figure 5: Zynq UltraScale+ Ordering Information



| Date       | Version | Description of Revisions                                                            |
|------------|---------|-------------------------------------------------------------------------------------|
| 02/06/2014 | 1.1     | Updated PCIe information in Table 1 and Table 3. Added FFVJ1924 package to Table 8. |
| 12/10/2013 | 1.0     | Initial Xilinx release.                                                             |