| Details | |
|--------------------------------|---------------------------------------------------------|
| Product Status | Obsolete |
| Number of LABs/CLBs | 1057 |
| Number of Logic Elements/Cells | 10570 |
| Total RAM Bits | 920448 |
| Number of I/O | 335 |
| Number of Gates | - |
| Voltage - Supply | 1.425V ~ 1.575V |
| Mounting Type | Surface Mount |
| Operating Temperature | 0°C ~ 85°C (TJ) |
| Package / Case | 484-BBGA, FCBGA |
| Supplier Device Package | 484-FBGA (23x23) |
| Purchase URL | https://www.e-xfl.com/product-detail/intel/ep1s10f484c7 |

Figure 2-5. Stratix LE

Each LE's programmable register can be configured for D, T, JK, or SR operation. Each register has data, true asynchronous load data, clock, clock enable, clear, and asynchronous load/preset inputs. Global signals, general-purpose I/O pins, or any internal logic can drive the register's clock and clear control signals. Either general-purpose I/O pins or internal logic can drive the clock enable, preset, asynchronous load, and asynchronous data. The asynchronous load data input comes from the data3 input of the LE.
For combinatorial functions, the register is bypassed and the output of the LUT drives directly to the outputs of the LE. Each LE has three outputs that drive the local, row, and column routing resources. The LUT or register output can drive these three outputs independently. Two LE outputs drive column or row and direct link routing connections and one drives local interconnect resources. This allows the LUT to drive one output while the register drives another output. This feature, called register packing, improves device utilization because the device can use the register and the LUT for unrelated functions. Another special packing mode allows the register output to feed back into the LUT of the same LE so that the register is packed with its own fan-out LUT. This provides another mechanism for improved fitting. The LE can also drive out registered and unregistered versions of the LUT output.

## **LUT Chain & Register Chain**

In addition to the three general routing outputs, the LEs within an LAB have LUT chain and register chain outputs. LUT chain connections allow LUTs within the same LAB to cascade together for wide input functions. Register chain outputs allow registers within the same LAB to cascade together. The register chain output allows an LAB to use LUTs for a single combinatorial function and the registers to be used for an unrelated shift register implementation. These resources speed up connections between LABs while saving local interconnect resources. See "MultiTrack Interconnect" on page 2–14 for more information on LUT chain and register chain connections.

## addnsub Signal

The LE's dynamic adder/subtractor feature saves logic resources by using one set of LEs to implement both an adder and a subtractor. This feature is controlled by the LAB-wide control signal addnsub. The addnsub signal sets the LAB to perform either A+B or A-B. The LUT computes addition, and subtraction is computed by adding the two's complement of the subtrahend (B).
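As a rough illustration of the addnsub behavior, the Python sketch below models a fixed-width adder that subtracts by inverting B and injecting a carry-in of 1, which is what adding the two's complement amounts to. The function name `add_or_sub`, the `width` parameter, and the signal polarity (a true `addnsub` selects A+B) are assumptions made for this example and are not taken from the data sheet.

```python
def add_or_sub(a: int, b: int, addnsub: bool, width: int = 8) -> int:
    """Model a width-bit adder/subtractor built from a single adder.

    addnsub=True  computes A + B.
    addnsub=False computes A - B as A + (~B) + 1 (two's complement).
    The polarity of addnsub is an assumption for illustration.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    if addnsub:
        operand, carry_in = b, 0
    else:
        # Invert every B bit and add one through the carry-in of the LSB.
        operand, carry_in = ~b & mask, 1
    # The result wraps modulo 2**width, as a hardware adder of that width would.
    return (a + operand + carry_in) & mask


if __name__ == "__main__":
    print(add_or_sub(5, 3, addnsub=True))   # 5 + 3
    print(add_or_sub(5, 3, addnsub=False))  # 5 - 3
    print(add_or_sub(3, 5, addnsub=False))  # 3 - 5, wrapped to 8 bits
```

The same adder hardware serves both operations; only the B inversion and the carry-in change, which is why one set of LEs can implement both functions.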
The LAB-wide signal converts to two's complement by inverting the B bits within the LAB and setting carry-in = 1 to add one to the least significant bit (LSB). The LSB of an adder/subtractor must be placed in the first LE of the LAB, where the LAB-wide addnsub signal automatically sets the carry-in to 1. The Quartus II Compiler automatically places and uses the adder/subtractor feature when using adder/subtractor parameterized functions.

# **LE Operating Modes**

The Stratix LE can operate in one of the following modes:

- Normal mode
- Dynamic arithmetic mode

Each mode uses LE resources differently. In each mode, eight available inputs to the LE (the four data inputs from the LAB local interconnect; carry-in0 and carry-in1 from the previous LE; the LAB carry-in from the previous carry-chain LAB; and the register chain connection) are directed to different destinations to implement the desired logic function. LAB-wide signals provide clock, asynchronous clear,

Figure 2-10. LUT Chain & Register Chain Interconnects

The C4 interconnects span four LABs, M512, or M4K blocks up or down from a source LAB. Every LAB has its own set of C4 interconnects to drive either up or down. Figure 2–11 shows the C4 interconnect connections from an LAB in a column. The C4 interconnects can drive and be driven by all types of architecture blocks, including DSP blocks, TriMatrix memory blocks, and vertical IOEs. For LAB interconnection, a primary LAB or its LAB neighbor can drive a given C4 interconnect. C4 interconnects can drive each other to extend their range as well as drive row interconnects for column-to-column connections.

Figure 2–11. C4 Interconnect Connections Note (1)

*Note to Figure 2–11:*

(1) Each C4 interconnect can drive either up or down four rows.

Figure 2-16. M512 RAM Block LAB Row Interface

### M4K RAM Blocks

The M4K RAM block includes support for true dual-port RAM.
The M4K RAM block is used to implement buffers for a wide variety of applications such as storing processor code, implementing lookup schemes, and implementing larger memory applications. Each block contains 4,608 RAM bits (including parity bits). M4K RAM blocks can be configured in the following modes:

- True dual-port RAM
- Simple dual-port RAM
- Single-port RAM
- FIFO
- ROM
- Shift register

When configured as RAM or ROM, you can use an initialization file to pre-load the memory contents.

Figure 2–20. EP1S60 Device with M-RAM Interface Locations Note (1)

*Note to Figure 2–20:*

(1) Device shown is an EP1S60 device. The number and position of M-RAM blocks varies in other devices.

The M-RAM block local interconnect is driven by the R4, R8, C4, C8, and direct link interconnects from adjacent LABs. For independent M-RAM blocks, up to 10 direct link address and control signal input connections to the M-RAM block are possible from the left adjacent LABs for M-RAM

Table 2–13 shows the number of DSP blocks in each Stratix device.

Table 2–13. DSP Blocks in Stratix Devices Notes (1), (2)

| Device | DSP Blocks | Total 9 × 9 Multipliers | Total 18 × 18 Multipliers | Total 36 × 36 Multipliers |
|--------|------------|-------------------------|---------------------------|---------------------------|
| EP1S10 | 6 | 48 | 24 | 6 |
| EP1S20 | 10 | 80 | 40 | 10 |
| EP1S25 | 10 | 80 | 40 | 10 |
| EP1S30 | 12 | 96 | 48 | 12 |
| EP1S40 | 14 | 112 | 56 | 14 |
| EP1S60 | 18 | 144 | 72 | 18 |
| EP1S80 | 22 | 176 | 88 | 22 |

#### *Notes to Table 2–13:*

- (1) Each device has either the number of 9 × 9-, 18 × 18-, or 36 × 36-bit multipliers shown. The total number of multipliers for each device is not the sum of all the multipliers.
- (2) The number of supported multiply functions shown is based on signed/signed or unsigned/unsigned implementations.

DSP block multipliers can optionally feed an adder/subtractor or accumulator within the block depending on the configuration. This makes routing to LEs easier, saves LE routing resources, and increases performance, because all connections and blocks are within the DSP block. Additionally, the DSP block input registers can efficiently implement shift registers for FIR filter applications.

Figure 2–30 shows the top-level diagram of the DSP block configured for 18 × 18-bit multiplier mode. Figure 2–31 shows the 9 × 9-bit multiplier configuration of the DSP block.

Figure 2–34. Adder/Output Blocks Note (1)

(Diagram labels: inputs Result A, Result B, Result C, and Result D; Adder/Subtractor/Accumulator1 and Adder/Subtractor/Accumulator2, each with accumulator feedback; Summation; Output Selection Multiplexer; Output Register Block; control signals accum\_sload0 (2), accum\_sload1 (2), addnsub1 (2), addnsub3 (2), signa (2), and signb (2); flags overflow0 and overflow1.)

## Notes to Figure 2–34:

- (1) Adder/output block shown in Figure 2–34 is in 18 × 18-bit mode. In 9 × 9-bit mode, there are four adder/subtractor blocks and two summation blocks.
- (2) These signals are either not registered, registered once, or registered twice to match the data path pipeline.

## Output Selection Multiplexer

The outputs from the various elements of the adder/output block are routed through an output selection multiplexer. Based on the DSP block operational mode and user settings, the multiplexer selects whether the output from the multiplier, the adder/subtractor/accumulator, or summation block feeds to the output.

#### Output Registers

Optional output registers for the DSP block outputs are controlled by four sets of control signals: clock[3..0], aclr[3..0], and ena[3..0]. Output registers can be used in any mode.
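The per-device totals in Table 2–13 are consistent with a fixed number of multipliers per DSP block: eight 9 × 9, four 18 × 18, or one 36 × 36. The Python sketch below reproduces the table from the block counts under that reading. The names `DSP_BLOCKS`, `MULTIPLIERS_PER_BLOCK`, and `multiplier_totals` are illustrative and do not come from the data sheet.

```python
# DSP block counts per device, copied from Table 2-13.
DSP_BLOCKS = {
    "EP1S10": 6,
    "EP1S20": 10,
    "EP1S25": 10,
    "EP1S30": 12,
    "EP1S40": 14,
    "EP1S60": 18,
    "EP1S80": 22,
}

# Multipliers available per DSP block for each width (inferred from the table rows).
MULTIPLIERS_PER_BLOCK = {"9x9": 8, "18x18": 4, "36x36": 1}


def multiplier_totals(device: str) -> dict:
    """Return the total multipliers of each width if every block used that width.

    The totals are alternatives, not a sum: per note (1), a device offers
    one of these counts, depending on how the blocks are configured.
    """
    blocks = DSP_BLOCKS[device]
    return {width: blocks * per_block
            for width, per_block in MULTIPLIERS_PER_BLOCK.items()}


if __name__ == "__main__":
    for name in DSP_BLOCKS:
        print(name, multiplier_totals(name))
```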
## **Modes of Operation**

The adder, subtractor, and accumulate functions of a DSP block have four modes of operation:

- Simple multiplier
- Multiply-accumulator
- Two-multipliers adder
- Four-multipliers adder

Each DSP block can support only one mode. Mixed modes in the same DSP block are not supported.

## Simple Multiplier Mode

In simple multiplier mode, the DSP block drives the multiplier sub-block result directly to the output with or without an output register. Up to four 18 × 18-bit multipliers or eight 9 × 9-bit multipliers can drive their results directly out of one DSP block. See Figure 2–35.

provide general purpose clocking with multiplication and phase shifting as well as high-speed outputs for high-speed differential I/O support. Enhanced and fast PLLs work together with the Stratix high-speed I/O and advanced clock architecture to provide significant improvements in system performance and bandwidth.

The Quartus II software enables the PLLs and their features without requiring any external devices. Table 2–18 shows the PLLs available for each Stratix device.

Table 2–18. Stratix Device PLL Availability

| Device | Fast PLL 1 | Fast PLL 2 | Fast PLL 3 | Fast PLL 4 | Fast PLL 7 | Fast PLL 8 | Fast PLL 9 | Fast PLL 10 | Enhanced PLL 5 (1) | Enhanced PLL 6 (1) | Enhanced PLL 11 (2) | Enhanced PLL 12 (2) |
|--------|---|---|---|---|---|---|---|---|---|---|---|---|
| EP1S10 | ✓ | ✓ | ✓ | ✓ | | | | | ✓ | ✓ | | |
| EP1S20 | ✓ | ✓ | ✓ | ✓ | | | | | ✓ | ✓ | | |
| EP1S25 | ✓ | ✓ | ✓ | ✓ | | | | | ✓ | ✓ | | |
| EP1S30 | ✓ | ✓ | ✓ | ✓ | ✓ (3) | ✓ (3) | ✓ (3) | ✓ (3) | ✓ | ✓ | | |
| EP1S40 | ✓ | ✓ | ✓ | ✓ | ✓ (3) | ✓ (3) | ✓ (3) | ✓ (3) | ✓ | ✓ | ✓ (3) | ✓ (3) |
| EP1S60 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| EP1S80 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

#### Notes to Table 2–18:

- (1) PLLs 5 and 6 each have eight single-ended outputs or four differential outputs.
- (2) PLLs 11 and 12 each have one single-ended output.
- (3) EP1S30 and EP1S40 devices do not support these PLLs in the 780-pin FineLine BGA® package.

#### Clock Multiplication & Division

Each Stratix device enhanced PLL provides clock synthesis for PLL output ports using $m/(n \times \text{post-scale counter})$ scaling factors. The input clock is divided by a pre-scale divider, *n*, and is then multiplied by the *m* feedback factor. The control loop drives the VCO to match $f_{IN} \times (m/n)$. Each output port has a unique post-scale counter that divides down the high-frequency VCO.
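The scaling relationship above can be written out as a short calculation. This Python sketch assumes the VCO runs at f_IN × m / n and that each output is the VCO frequency divided by that port's post-scale counter. The function names and the example values (a 33-MHz input with m = 10 and n = 1) are hypothetical. The counter limits checked are the ones stated in this section (1 to 512 for *m* and *n*, 1 to 1024 for a post-scale counter at 50% duty cycle), and no VCO frequency range is checked.

```python
from fractions import Fraction


def vco_frequency(f_in_mhz: int, m: int, n: int) -> Fraction:
    """VCO frequency in MHz: the input divided by n, then multiplied by m."""
    for name, value in (("m", m), ("n", n)):
        if not 1 <= value <= 512:
            raise ValueError(f"{name} must be in 1..512, got {value}")
    return Fraction(f_in_mhz) * m / n


def output_frequency(f_in_mhz: int, m: int, n: int, post_scale: int) -> Fraction:
    """Output port frequency in MHz: the VCO divided by a post-scale counter."""
    if not 1 <= post_scale <= 1024:
        raise ValueError(f"post_scale must be in 1..1024, got {post_scale}")
    return vco_frequency(f_in_mhz, m, n) / post_scale


if __name__ == "__main__":
    # Hypothetical setup: 33-MHz reference, VCO at 330 MHz.
    print(vco_frequency(33, m=10, n=1))
    print(output_frequency(33, m=10, n=1, post_scale=10))
    print(output_frequency(33, m=10, n=1, post_scale=5))
```

Using `Fraction` keeps the ratios exact, so non-integer results such as 100/3 MHz are not rounded.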
For multiple PLL outputs with different frequencies, the VCO is set to the least common multiple of the output frequencies that meets its frequency specifications. Then, the post-scale dividers scale down the output frequency for each output port. For example, if output frequencies required from one PLL are 33 and 66 MHz, set the VCO to 330 MHz (the least common multiple in the VCO's range).

There is one pre-scale counter, *n*, and one multiply counter, *m*, per PLL, with a range of 1 to 512 on each. There are two post-scale counters (*l*) for regional clock output ports, four counters (*g*) for global clock output ports, and up to four counters (*e*) for external clock outputs, all ranging from 1 to 1024 with a 50% duty cycle setting. The post-scale counters range from 1 to 512 with any non-50% duty cycle setting. The Quartus II software automatically chooses the appropriate scaling factors according to the input frequency, multiplication, and division values entered.

#### Clock Switchover

To effectively develop high-reliability network systems, clocking schemes must support multiple clocks to provide redundancy. For this reason, Stratix device enhanced PLLs support a flexible clock switchover capability. Figure 2–53 shows a block diagram of the switchover circuit. The switchover circuit is configurable, so you can define how to implement it. Clock-sense circuitry automatically switches from the primary to secondary clock for PLL reference when the primary clock signal is not present.

- RapidIO
- HyperTransport

## **Dedicated Circuitry**

Stratix devices support source-synchronous interfacing with LVDS, LVPECL, 3.3-V PCML, or HyperTransport signaling at up to 840 Mbps. Stratix devices can transmit or receive serial channels along with a low-speed or high-speed clock. The receiving device PLL multiplies the clock by an integer factor W (W = 1 through 32).
For example, a HyperTransport application where the data rate is 800 Mbps and the clock rate is 400 MHz would require that W be set to 2. The SERDES factor J determines the parallel data width to deserialize from receivers or to serialize for transmitters. The SERDES factor J can be set to 4, 7, 8, or 10 and does not have to equal the PLL clock-multiplication W value. For a J factor of 1, the Stratix device bypasses the SERDES block. For a J factor of 2, the Stratix device bypasses the SERDES block, and the DDR input and output registers are used in the IOE. See Figure 2–73.

Figure 2-73. High-Speed Differential I/O Receiver / Transmitter Interface Example

(Diagram labels: Dedicated Receiver Interface and Dedicated Transmitter Interface, each at 840 Mbps; 8-bit Data; Local Interconnect; R4, R8, and R24 Interconnect; Fast PLL with 8× outputs, 105 MHz, rx\_load\_en, and tx\_load\_en; Regional or global clock.)

An external pin or global or regional clock can drive the fast PLLs, which can output up to three clocks: two multiplied high-speed differential I/O clocks to drive the SERDES block and/or external pin, and a low-speed clock to drive the logic array.

Table 4–9. Overshoot Input Voltage with Respect to Duty Cycle (Part 2 of 2)

| Vin (V) | Maximum Duty Cycle (%) |
|---------|------------------------|
| 4.3 | 30 |
| 4.4 | 17 |
| 4.5 | 10 |

Figures 4–1 and 4–2 show receiver input and transmitter output waveforms, respectively, for all differential I/O standards (LVDS, 3.3-V PCML, LVPECL, and HyperTransport technology).

Figure 4-1. Receiver Input Waveforms for Differential I/O Standards (panels: Single-Ended Waveform, Differential Waveform)

| Symbol | Parameter | Min | Typ | Max | Unit |
|--------|-----------|-----|-----|-----|------|
| t<sub>SKEW</sub> | Clock skew between two external clock outputs driven by the different counters with the same settings | | ±75 | | ps |
| f<sub>SS</sub> | Spread spectrum modulation frequency | 30 | | 150 | kHz |
| % spread | Percentage spread for spread spectrum frequency (10) | 0.4 | 0.5 | 0.6 | % |
| t<sub>ARESET</sub> | Minimum pulse width on areset signal | 10 | | | ns |
| t<sub>ARESET_RECONFIG</sub> | Minimum pulse width on the areset signal when using PLL reconfiguration. Reset the PLL after scandataout goes high. | 500 | | | ns |

| Symbol | Parameter | Min | Typ | Max | Unit |
|--------|-----------|-----|-----|-----|------|
| f<sub>IN</sub> | Input clock frequency | 3 (1), (2) | | 650 | MHz |
| f<sub>INPFD</sub> | Input frequency to PFD | 3 | | 420 | MHz |
| f<sub>INDUTY</sub> | Input clock duty cycle | 40 | | 60 | % |
| f<sub>EINDUTY</sub> | External feedback clock input duty cycle | 40 | | 60 | % |
| t<sub>INJITTER</sub> | Input clock period jitter | | | ±200 (3) | ps |
| t<sub>EINJITTER</sub> | External feedback clock period jitter | | | ±200 (3) | ps |
| t<sub>FCOMP</sub> | External feedback clock compensation time (4) | | | 6 | ns |
| f<sub>OUT</sub> | Output frequency for internal global or regional clock | 0.3 | | 450 | MHz |
| f<sub>OUT_EXT</sub> | Output frequency for external clock (3) | 0.3 | | 500 | MHz |
| t<sub>OUTDUTY</sub> | Duty cycle for external clock output (when set to 50%) | 45 | | 55 | % |
| t<sub>JITTER</sub> | Period jitter for external clock output (6) | | | ±100 ps for >200-MHz outclk; ±20 mUI for <200-MHz outclk | ps or mUI |
| t<sub>CONFIG5,6</sub> | Time required to reconfigure the scan chains for PLLs 5 and 6 | | | 289/f<sub>SCANCLK</sub> | |
| t<sub>CONFIG11,12</sub> | Time required to reconfigure the scan chains for PLLs 11 and 12 | | | 193/f<sub>SCANCLK</sub> | |
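The two scan-chain reconfiguration entries are given as a cycle count divided by the scan clock frequency (289/f<sub>SCANCLK</sub> and 193/f<sub>SCANCLK</sub>), so the time follows directly once a scan clock is chosen. A small Python sketch of that arithmetic is shown below. The function name, the dictionary keys, and the 20-MHz example value are assumptions for illustration, and any data sheet limit on f<sub>SCANCLK</sub> is not checked here.

```python
# Scan-chain lengths in scan clock cycles, from the t_CONFIG rows of the table.
SCAN_CHAIN_CYCLES = {"PLL5_6": 289, "PLL11_12": 193}


def reconfig_time_us(pll_group: str, f_scanclk_mhz: float) -> float:
    """Return the scan-chain reconfiguration time in microseconds.

    cycles / (frequency in MHz) gives microseconds directly.
    """
    if f_scanclk_mhz <= 0:
        raise ValueError("f_scanclk_mhz must be positive")
    return SCAN_CHAIN_CYCLES[pll_group] / f_scanclk_mhz


if __name__ == "__main__":
    # Hypothetical 20-MHz scan clock.
    for group in SCAN_CHAIN_CYCLES:
        print(group, reconfig_time_us(group, 20.0), "us")
```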