NXFEE INNOVATION
SEMICONDUCTOR IP & PRODUCT DEVELOPMENT COMPANY
NXFEE Innovation (Semiconductor IP & VLSI IEEE Transaction & Product Development)
#45, Vivekananda street, Dhevan kandappa Mudaliarnagar, Nainarmandapam, Pondicherry-4
Web: www.nxfee.com Email: nxfee.innovation@gmail.com Ph: +91 9789443203, +91 9677783735.
NXFEE - VLSI IEEE TRANSACTION - 2018
PROJECT TITLES FOR VLSI
LOW POWER
VLSI_IEEE_01
(BACK-END)
SOFTWARE :
TANNER EDA
STUDENT COST
MRP:
RS. 12000/-
TOPIC : A 128-Tap Highly Tunable CMOS IF Finite Impulse Response Filter for Pulsed
Radar Applications
Abstract : A configurable-bandwidth (BW) filter is presented in this paper for pulsed
radar applications. To eliminate dispersion effects in the received waveform, a finite
impulse response (FIR) topology is proposed, which has a measured standard deviation
of an in-band group delay of 11 ns that is primarily dominated by the inherent, fully
predictable delay introduced by the sample-and-hold. The filter operates at an IF of 20
MHz, and is tunable in BW from 1.5 to 15 MHz, which makes it optimal to be used with
varying pulse widths in the radar. Employing a total of 128 taps, the FIR filter provides
greater than 50-dB sharp attenuation in the stop band in order to minimize all out-of-
band noise in the low signal-to-noise received radar signal. Fabricated in a 0.18-µm
silicon on insulator CMOS process, the proposed filter consumes approximately 3.5
mW/tap with a 1.8-V supply. A 20-MHz two-tone measurement with 200-kHz tone
separation shows IIP3 greater than 8.5 dBm.
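As a rough illustration of why an FIR topology avoids dispersion, the sketch below implements a direct-form FIR filter in software and reports the group delay of a symmetric tap set. It relies on the textbook property that symmetric taps give a constant group delay of (N - 1)/2 samples. The function names and the tiny tap set are hypothetical; the abstract does not give the tap values or state that the fabricated filter uses symmetric taps.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n-k], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def is_symmetric(taps, tol=1e-12):
    """True when taps[k] == taps[N-1-k], the usual linear-phase condition."""
    return all(abs(a - b) <= tol for a, b in zip(taps, reversed(taps)))


def group_delay_samples(taps):
    """Constant group delay (in samples) of a symmetric FIR filter."""
    if not is_symmetric(taps):
        raise ValueError("group delay is only constant for symmetric taps")
    return (len(taps) - 1) / 2.0


# Illustrative 3-tap smoothing filter (not the 128-tap design from the abstract).
impulse_response = fir_filter([1.0, 0.0, 0.0, 0.0], [0.25, 0.5, 0.25])
```

Feeding an impulse through the filter returns the taps themselves, which is a quick sanity check on any FIR implementation.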
VLSI_IEEE_02
(BACK-END)
SOFTWARE :
TANNER EDA
STUDENT COST
MRP:
RS. 8000/-
TOPIC : A Closed-Form Expression for Minimum Operating Voltage of CMOS D Flip-Flop
Abstract : In this paper, a closed-form expression for estimating the minimum operating
voltage (VDDmin) of D flip-flops (FFs) is proposed. VDDmin is defined as the minimum
supply voltage at which the FFs are functional without errors. The proposed expression
indicates that VDDmin of FFs is a linear function of the square root of logarithm of the
number of FFs, and its slope depends on the within-die variation of the threshold
voltage (VTH) and its intercept depends on the balance between nMOS and pMOS,
which is mainly due to the die-to-die VTH variation. The proposed expression of VDDmin
is validated by the simulation results as well as the silicon measurements. Finally, we
discuss the dependence of VDDmin on the device parameters.
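The relationship described above can be written schematically as follows. The symbols a, b, and N are placeholders introduced here; the abstract does not give the actual coefficients, and the natural logarithm is an assumption (a different base only rescales a).

```latex
% N : number of flip-flops
% a : slope, set by the within-die V_TH variation
% b : intercept, set by the nMOS/pMOS balance (die-to-die V_TH variation)
V_{DD\min}(N) \approx a \sqrt{\ln N} + b
```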
VLSI_IEEE_05
(BACK-END)
SOFTWARE :
TANNER EDA
STUDENT COST
MRP:
RS. 8000/-
TOPIC : Design of Temperature-Aware Low-Voltage 8T SRAM in SOI Technology for High-
Temperature Operation (25 °C–300 °C)
Abstract : A temperature-aware low-voltage 8T static random access memory (SRAM)
for high-temperature operations is presented. A dedicated read port with virtual ground
and optimal body bias improves sensing margin under very high temperature (up to 300
°C). Bit line offset voltage for data “0” caused by the virtual ground scheme is also
compensated by a replica bit line. The independent body bias control feature of the
employed silicon-on-insulator (SOI) technology allows the write margin to be enhanced
significantly without using any write-assist circuitry. Test chips were fabricated in a 1-µm
SOI technology with tungsten interconnect for reliability at high temperature and lesser
process variation. Measurement results demonstrate that the proposed SRAM operates
successfully up to 300 °C with the supply voltage range of 2–5 V. At the minimum
performance variation point (VDD = 2.5 V), the SRAM consumes 1.48 mW and shows the
access time of 156 ns and the maximum clock frequency of 14.38 MHz at 300 °C.
VLSI_IEEE_09
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 12000/-
TOPIC : Design of an Area-Efficient Million-Bit Integer Multiplier Using Double Modulus
NTT
Abstract : This brief proposes a double modulus number theoretical transform (NTT)
method for million-bit integer multiplication in fully homomorphic encryption. In our
method, each NTT point is processed simultaneously under two moduli, and the final
result is generated through the Chinese remainder theorem. The employment of double
modulus enlarges the permitted NTT sample size from 24 to 32 bits and thus improves
the transform efficiency. Based on the proposed double modulus method, we
accomplish a VLSI design of million-bit integer multiplier with the Schönhage–Strassen
algorithm. Implementation results on Altera Stratix-V FPGA show that this brief is able
to compute a product of two 1024k-bit integers every 4.9 ms at the cost of only 7.9k
ALUTs and 3.6k registers, which is more area-efficient when compared with the current
competitors.
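The recombination step mentioned above can be sketched in a few lines: a value computed separately under two coprime moduli is recovered with the Chinese remainder theorem. The moduli 97 and 193 are small illustrative primes chosen here, not the moduli used in the brief, and the function name is hypothetical.

```python
def crt_pair(r1, m1, r2, m2):
    """Recover x mod (m1*m2) from r1 = x mod m1 and r2 = x mod m2.

    Assumes m1 and m2 are coprime. Uses the three-argument pow with a
    negative exponent (Python 3.8+) to get the modular inverse.
    """
    inv = pow(m1, -1, m2)          # m1^{-1} modulo m2
    t = ((r2 - r1) * inv) % m2     # choose t so that r1 + m1*t == r2 (mod m2)
    return r1 + m1 * t             # result lies in [0, m1*m2)


# Illustrative use: a product coefficient known only through its two residues.
M1, M2 = 97, 193                   # hypothetical small moduli
value = 12345                      # must be smaller than M1 * M2 = 18721
recovered = crt_pair(value % M1, M1, value % M2, M2)
```

Working under two moduli widens the range of values that can be represented, which is consistent with the larger permitted sample size the abstract reports.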
VLSI_IEEE_10
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 12000/-
TOPIC : A Fast and Low-Complexity Operator for the Computation of the Arctangent of a
Complex Number
Abstract : The computation of the arctangent of a complex number, i.e., the atan2
function, is frequently needed in hardware systems that could profit from an optimized
operator. In this brief, we present a novel method to compute the atan2 function and a
hardware architecture for its implementation. The method is based on a first stage that
performs a coarse approximation of the atan2 function and a second stage that
improves the output accuracy by means of a lookup table. We present results for fixed-
point implementations in a field-programmable gate array device, all of them
guaranteeing last-bit accuracy, which provide an advantage in latency, speed, and use of
resources, when compared with well-established fixed-point options.
VLSI_IEEE_11
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 12000/-
TOPIC : A Reconfigurable LDPC Decoder Optimized for 802.11n/ac Applications
Abstract : This paper presents a high data-rate low-density parity-check (LDPC) decoder,
suitable for the 802.11n/ac (WiFi) standard. The innovative features of the proposed
decoder relate to the decoding algorithms and the interconnection between the
processing elements. The reduction of the hardware complexity of decoders based on
the min-sum (MS) algorithms comes at the cost of performance degradation, especially
at high-noise regions. We introduce more accurate approximations of the log sum-product
algorithm that also operate well for low signal-to-noise ratio values.
Telecommunication standards, including WiFi, support more than one quasi-cyclic LDPC
codes of different characteristics, such as codeword length and code rate. A proposed
design technique derives networks, capable of supporting a variety of codes and
efficiently realizing connectivity between a variable number of processing units, with a
relatively small hardware overhead over the single-code case. As a demonstration of the
proposed technique, we implemented a reconfigurable network based on barrel
rotators, suitable for LDPC decoders compatible with WiFi standard. Our approach
achieves low complexity and high clock frequency, compared with related prior works. A
90-nm application-specific integrated circuit implementation of the proposed
high-parallel WiFi decoder occupies 4.88 mm² and achieves an information throughput rate
of 4.5 Gbit/s at a clock frequency of 555 MHz.
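To make the complexity/accuracy tradeoff concrete, the sketch below compares the standard min-sum check-node update with the exact sum-product update it approximates. These are the textbook rules only; the refined approximations proposed in the paper are not reproduced here, and the function names are illustrative.

```python
import math


def check_node_min_sum(llrs):
    """Min-sum rule: each outgoing message is the product of the signs and
    the minimum magnitude of all the OTHER incoming log-likelihood ratios."""
    out = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]
        sign = 1.0
        for v in others:
            if v < 0:
                sign = -sign
        out.append(sign * min(abs(v) for v in others))
    return out


def check_node_sum_product(llrs):
    """Exact rule: 2 * atanh(product of tanh(v / 2)) over the other inputs."""
    out = []
    for i in range(len(llrs)):
        prod = 1.0
        for j, v in enumerate(llrs):
            if j != i:
                prod *= math.tanh(v / 2.0)
        out.append(2.0 * math.atanh(prod))
    return out


incoming = [2.0, -3.0, 4.0]
approx = check_node_min_sum(incoming)
exact = check_node_sum_product(incoming)
```

Min-sum needs only comparisons and sign logic, but it overestimates the message magnitude, which is one source of the performance loss at high noise that the abstract mentions.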
VLSI_IEEE_13
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : Approximate Sum-of-Products Designs Based on Distributed Arithmetic
Abstract : Approximate circuits provide high performance and require low power. Sum-
of-products (SOP) units are key elements in many digital signal processing applications.
In this brief, three approximate SOP (ASOP) models which are based on the distributed
arithmetic are proposed. They are designed for different levels of accuracy. First model
of ASOP achieves an improvement up to 64% on area and 70% on power, when
compared with conventional unit. Other two models provide an improvement of 32%
and 48% on area and 54% and 58% on power, respectively, with a reduced error rate
compared with the first model. Third model achieves the mean relative error and
normalized error distance as low as 0.05% and 0.009%, respectively. Performance of
approximate units is evaluated with a noisy image smoothing application, where the
proposed models are capable of achieving higher peak signal-to-noise ratio than the
existing state-of-the-art techniques. It is shown that the proposed approximate models
achieve higher processing accuracy than existing works but with significant
improvements in power and performance.
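For readers unfamiliar with distributed arithmetic, the sketch below computes a sum of products bit-serially from a lookup table of coefficient sums. The `drop_lsbs` parameter is a hypothetical approximation knob added here for illustration (it skips the least significant bit slices); it is not one of the three ASOP models from the brief. Inputs are assumed to be unsigned.

```python
def build_da_lut(coeffs):
    """LUT entry at address a holds the sum of coeffs[k] for every set bit k of a."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_sum_of_products(coeffs, inputs, bits=8, drop_lsbs=0):
    """Compute sum(coeffs[k] * inputs[k]) one bit slice at a time.

    inputs are unsigned integers below 2**bits. Setting drop_lsbs > 0 skips
    the lowest bit slices, trading accuracy for fewer LUT accesses.
    """
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(drop_lsbs, bits):
        addr = 0
        for k, x in enumerate(inputs):
            addr |= ((x >> b) & 1) << k      # gather bit b of every input
        acc += lut[addr] << b                # weight the partial sum by 2**b
    return acc


exact = da_sum_of_products([3, -2, 5], [10, 20, 7])
approximate = da_sum_of_products([3, -2, 5], [10, 20, 7], drop_lsbs=2)
```

The LUT replaces the multipliers entirely, so each skipped bit slice saves one lookup and one addition at the cost of a bounded truncation error.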
VLSI_IEEE_25
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : Vector Processing-Aware Advanced Clock-Gating Techniques for Low-Power
Fused Multiply-Add
Abstract : The need for power efficiency is driving a rethink of design decisions in
processor architectures. While vector processors succeeded in the high-performance
market in the past, they need a retailoring for the mobile market that they are entering
now. Floating-point (FP) fused multiply-add (FMA), being a functional unit with high
power consumption, deserves special attention. Although clock gating is a well-known
method to reduce switching power in synchronous designs, there are unexplored
opportunities for its application to vector processors, especially when considering active
operating mode. In this research, we comprehensively identify, propose, and evaluate
the most suitable clock-gating techniques for vector FMA units (VFUs). These techniques
ensure power savings without jeopardizing the timing. We evaluate the proposed
techniques using both synthetic and “real-world” application-based benchmarking.
Using vector masking and vector multilane-aware clock gating, we report power
reductions of up to 52%, assuming active VFU operating at the peak performance.
Among other findings, we observe that vector instruction-based clock-gating techniques
achieve power savings for all vector FP instructions. Finally, when evaluating all
techniques together, using “real-world” benchmarking, the power reductions are up to
80%. Additionally, in accordance with processor design trends, we perform this research
in a fully parameterizable and automated fashion.
VLSI_IEEE_29
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 12000/-
TOPIC : A Flexible Wildcard-Pattern Matching Accelerator via Simultaneous Discrete
Finite Automata
Abstract : Regular expression matching has become an indispensable element of Internet of
Things network security. However, traditional ternary content addressable memory
(TCAM) search engine is unable to handle patterns with wildcards, as it precisely tracks
only one active state with single transition. This paper proposes a promising
simultaneous pattern matching methodology for wildcard patterns by two separated
engines to represent discrete finite automata. A key preprocessing to encode possible
postfix pattern by a unique key ensures that follow-up patterns can accurately traverse
all possible matches with limited hardware resources. This approach is practical and
scalable for achieving good performance and low space consumption in network
security, and it can be applicable to any regular expressions even with multi-wildcard
patterns. The experimental results demonstrate that this scheme can efficiently and
accurately recognize wildcard patterns by simultaneously tracking only two active
states. By adopting SRAM TCAM in the proposed architecture, the energy consumption
is reduced to around 39%, compared with the energy consumption using a computing
system that contains a large memory lookup and comparison overhead.
VLSI_IEEE_30
(BACK-END)
SOFTWARE :
TANNER EDA
STUDENT COST
MRP:
RS. 8000/-
TOPIC : Low-Power and Fast Full Adder by Exploring New XOR and XNOR Gates
Abstract : In this paper, novel circuits for XOR/XNOR and simultaneous XOR–XNOR
functions are proposed. The proposed circuits are highly optimized in terms of the
power consumption and delay, which are due to low output capacitance and low short-
circuit power dissipation. We also propose six new hybrid 1-bit full-adder (FA) circuits
based on the novel full-swing XOR–XNOR or XOR/XNOR gates. Each of the proposed
circuits has its own merits in terms of speed, power consumption, power-delay product
(PDP), driving ability, and so on. To investigate the performance of the proposed
designs, extensive HSPICE and Cadence Virtuoso simulations are performed. The
simulation results, based on the 65-nm CMOS process technology model, indicate that
the proposed designs have superior speed and power against other FA designs. A new
transistor sizing method is presented to optimize the PDP of the circuits. In the
proposed method, the numerical computation particle swarm optimization algorithm is
used to achieve the desired value for optimum PDP with fewer iterations. The proposed
circuits are investigated in terms of variations of the supply and threshold voltages,
output capacitance, input noise immunity, and the size of transistors.
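At the logic level, a full adder built around a simultaneous XOR/XNOR stage can be sketched as below: the sum is selected between XOR and XNOR by the carry-in, and the carry-out is selected between the carry-in and one operand. This is a generic behavioral model with hypothetical function names; it does not represent any of the six transistor-level circuits proposed in the paper.

```python
def xor_xnor(a, b):
    """Return (a XOR b, a XNOR b) for single-bit inputs 0 or 1."""
    x = a ^ b
    return x, x ^ 1


def full_adder(a, b, cin):
    """1-bit full adder using multiplexer-style selection.

    sum  = XNOR(a, b) when cin is 1, else XOR(a, b)
    cout = cin when a and b differ, else a
    """
    x, xn = xor_xnor(a, b)
    s = xn if cin else x
    cout = cin if x else a
    return s, cout


# Truth table as (a, b, cin) -> (sum, cout)
truth_table = {(a, b, c): full_adder(a, b, c)
               for a in (0, 1) for b in (0, 1) for c in (0, 1)}
```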
VLSI_IEEE_31
(BACK-END)
SOFTWARE :
TANNER EDA
STUDENT COST
MRP:
RS. 8000/-
TOPIC : A 0.9-V 12-bit 100-MS/s 14.6-fJ/Conversion-Step SAR ADC in 40-nm CMOS
Abstract : This paper presents a low-power 12-bit 100-MS/s asynchronous successive
approximation register analog-to-digital converter (SAR ADC). Several techniques are
developed to enhance the ADC performance. The nonbinary capacitor array with small
digital-to-analog converter (DAC) capacitors (total 394 fF) allows for reducing DAC
settling time and power consumption while maintaining extremely high hardware
utilization. The proposed nonlinear capacitance correction method solves the nonlinear
capacitance problems of the comparator when the small unit capacitor is used. The
latch output glitch removal method ensures the speed and accuracy of the comparator
at the low supply voltage. Furthermore, the proposed high-speed SAR logic and timing
sequence improved SAR logic’s operating speed by 75% compared with traditional SAR
logic. The prototype was fabricated using a 40-nm CMOS technology. At a 0.9-V supply
and 100-MS/s sampling rate, the ADC achieves a signal-to-noise distortion ratio of 67.3
dB and consumes 2.6 mW, resulting in a figure of merit of 14.6 fJ/conversion-step. The
ADC core occupies an active area of only 50 × 280 µm².
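The successive-approximation principle behind this converter can be illustrated with an idealized behavioral model. The sketch assumes a plain binary-weighted DAC and a perfect comparator, whereas the paper uses a nonbinary capacitor array with correction techniques that are not modeled here. The function name and the 1.0-V reference are illustrative.

```python
def sar_convert(vin, vref=1.0, bits=12):
    """Idealized binary-search SAR conversion of vin in [0, vref).

    Each cycle tentatively sets one bit (MSB first), compares the input with
    the resulting DAC level, and keeps the bit only if the input is not below it.
    """
    code = 0
    for i in reversed(range(bits)):
        trial = code | (1 << i)
        vdac = vref * trial / (1 << bits)
        if vin >= vdac:
            code = trial
    return code


mid_scale = sar_convert(0.5)       # exactly half of the reference
near_full = sar_convert(0.999999)  # just below full scale
```

A 12-bit conversion takes 12 comparison cycles in this model, which is why DAC settling time and comparator speed dominate the achievable sampling rate.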
VLSI_IEEE_36
(BACK-END)
SOFTWARE :
TANNER EDA
STUDENT COST
MRP:
RS. 8000/-
TOPIC : SRAM Circuits for True Random Number Generation Using Intrinsic Bit Instability
Abstract : This paper describes a novel approach to a true random number generator
(TRNG) using SRAM circuits. The principles of operation are described in the context of
past work on integrated circuit TRNGs. The required modifications to standard SRAM
arrays are minor and have little impact on the area. Experimental results from large 1-
Mbit SRAM arrays fabricated on a 55-nm process using the foundry supplied SRAM cell
layouts show good results. Simple helper functions, suitable for very small hardware
implementation, allow improvement, including the ability for the resulting binary strings
to pass all of the National Institute of Standards randomness tests. We describe the
circuits, their principle of operation and statistical behavior, as well as the underlying
physical mechanisms providing the entropy.
VLSI_IEEE_44
(BACK-END)
SOFTWARE :
TANNER EDA
STUDENT COST
MRP:
RS. 10000/-
TOPIC : Improving Error Correction Codes for Multiple-Cell Upsets in Space Applications
Abstract : Currently, faults suffered by SRAM memory systems have increased due to
the aggressive CMOS integration density. Thus, the probability of occurrence of single-
cell upsets (SCUs) or multiple-cell upsets (MCUs) augments. One of the main causes of
MCUs in space applications is cosmic radiation. A common solution is the use of error
correction codes (ECCs). Nevertheless, when using ECCs in space applications, they must
achieve a good balance between error coverage and redundancy, and their
encoding/decoding circuits must be efficient in terms of area, power, and delay.
Different codes have been proposed to tolerate MCUs. For instance, Matrix codes use
Hamming codes and parity checks in a bi-dimensional layout to correct and detect some
patterns of MCUs. Recently presented, column–line–code (CLC) has been designed to
tolerate MCUs in space applications. CLC is a modified Matrix code, based on extended
Hamming codes and parity checks. Nevertheless, a common property of these codes is
the high redundancy introduced. In this paper, we present a series of new low
redundant ECCs able to correct MCUs with reduced area, power, and delay overheads.
Also, these new codes maintain, or even improve, memory error coverage with respect
to Matrix and CLC codes.
HIGH SPEED DATA TRANSMISSION
VLSI_IEEE_04
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : Approximate Error Detection With Stochastic Checkers
Abstract : Designing reliable systems, while eschewing the high overheads of
conventional fault tolerance techniques, is a critical challenge in the deeply scaled
CMOS and post CMOS era. To address this challenge, we leverage the intrinsic resilience
of application domains such as multimedia, recognition, mining, search, and analytics
where acceptable outputs are produced despite occasional approximate computations.
We propose stochastic checkers (checkers designed using stochastic logic) as a new
approach to performing error checking in an approximate manner at greatly reduced
overheads. Stochastic checkers are inherently inaccurate and require long latencies for
computation. To limit the loss in error coverage, as well as false positives (correct
outputs flagged as erroneous), caused due to the approximate nature of stochastic
checkers, we propose input permuted partial replicas of stochastic logic, which improves
their accuracy with minimal increase in overheads. To address the challenge of long
error detection latency, we propose progressive checking policies that provide an early
decision based on a prefix of the checker’s output bit stream. This technique is further
enhanced by employing progressively accurate binary-to-stochastic converters. Across a
suite of error-resilient applications, we observe that stochastic checkers lead to greatly
reduced overheads (29.5% area and 21.5% power, on average) compared with
traditional fault tolerance techniques while maintaining high coverage and very low
false positives.
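The latency and accuracy issues described above follow from how stochastic logic represents numbers. In the minimal sketch below, a value in [0, 1] is encoded as the fraction of ones in a random bit stream, multiplication reduces to a bitwise AND, and an early estimate can be read from a prefix of the stream. The stream length, seed, and function names are illustrative assumptions; the input-permuted partial replicas and progressive converters from the paper are not modeled.

```python
import random


def to_stream(p, length, rng):
    """Encode probability p as a bit stream whose mean approximates p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def stream_value(bits):
    """Decode a stream (or a prefix of it) as the fraction of ones."""
    return sum(bits) / len(bits)


def stochastic_multiply_stream(p, q, length=4096, seed=0):
    """Multiply two probabilities by ANDing independent bit streams."""
    rng = random.Random(seed)       # fixed seed keeps the sketch repeatable
    a = to_stream(p, length, rng)
    b = to_stream(q, length, rng)
    return [x & y for x, y in zip(a, b)]


product_stream = stochastic_multiply_stream(0.5, 0.4)
full_estimate = stream_value(product_stream)          # uses all 4096 bits
early_estimate = stream_value(product_stream[:256])   # prefix-based decision
```

The full-length estimate is close to 0.2, while the prefix gives a quicker but noisier answer, which is the tradeoff that progressive checking exploits.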
VLSI_IEEE_06
(BACK-END)
SOFTWARE :
TANNER EDA
STUDENT COST
MRP:
RS. 8000/-
TOPIC : A 0.65-V, 500-MHz Integrated Dynamic and Static RAM for Error Tolerant
Applications
Abstract : The diminishing returns provided by voltage scaling have led to a recent
paradigm shift toward so-called “approximate computing,” where computation accuracy
is traded off for cost in error-tolerant applications. In this paper, a novel approach to
achieving the power–performance–area versus data integrity tradeoff is proposed by
integrating robust static memory cells and error-prone dynamic cells within a single
array. In addition, the resulting integrated dynamic and static random access memory
(iD-SRAM) provides the ability to trade off power consumption and accuracy on-the-fly
according to the current conditions and operating mode. A 4-kB iD-SRAM array was
implemented in a low-power, 65-nm CMOS technology, providing as much as an 80%
power reduction and a 20% area reduction as compared with standard approaches,
when applied to a video decoder at 500 MHz.
VLSI_IEEE_07
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 12000/-
TOPIC : Efficient FPGA Mapping of Pipeline SDF FFT Cores
Abstract : In this paper, an efficient mapping of the pipeline single-path delay feedback
(SDF) fast Fourier transform (FFT) architecture to field-programmable gate arrays
(FPGAs) is proposed. By considering the architectural features of the target FPGA,
significantly better implementation results are obtained. This is illustrated by mapping
an R2²SDF 1024-point FFT core toward both Xilinx Virtex-4 and Virtex-6 devices. The
optimized FPGA mapping is explored in detail. Algorithmic transformations that allow a
better mapping are proposed, resulting in implementation achievements that by far
outperform earlier published work. For Virtex-4, the results show a 350% increase in
throughput per slice and 25% reduction in block RAM (BRAM) use, with the same
amount of DSP48 resources, compared with the best earlier published result. The
resulting Virtex-6 design sees even larger increases in throughput per slice compared
with Xilinx FFT IP core, using half as many DSP48E1 blocks and less BRAM resources. The
results clearly show that the FPGA mapping is crucial, not only the architecture and
algorithm choices.
VLSI_IEEE_08
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : Algorithm and Architecture Design of Adaptive Filters With Error Nonlinearities
Abstract : This paper presents a framework based on the logarithmic number system to
implement adaptive filters with error nonlinearities in hardware. The framework is
demonstrated through pipelined implementations of two recently proposed adaptive
filtering algorithms based on logarithmic cost, namely, least mean logarithmic square
(LMLS) and least logarithmic absolute difference (LLAD). To the best of our knowledge,
the proposed architectures are the first attempts to implement both LMLS and LLAD
algorithms in hardware. We derive error computing algorithms to realize the nonlinear
error functions for LMLS and LLAD and map them onto hardware. We also propose a
novel variable-α scheme to enhance the original LMLS algorithm and prove its
robustness and suitability for VLSI implementations in practical applications. Detailed bit
width and error analysis are carried out for the proposed VLSI fixed point
implementations. Post layout implementation results show that with an additional
multiplier over conventional least mean square (LMS), 7-dB improvement in steady-
state mean square deviation performance can be achieved and with the proposed
variable-α scheme, 12-dB improvement can be achieved without compromising the
convergence. We will show that LMLS can potentially replace LMS in practical
applications, by demonstrating a proof-of-concept by extending the framework to
transform domain adaptive filters.
VLSI_IEEE_17
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 16000/-
TOPIC : Design and FPGA Implementation of a Reconfigurable Digital Down Converter for
Wideband Applications
Abstract : This brief presents a field-programmable gate array-based implementation
of a reconfigurable digital down converter (DDC) that can process input bandwidth of
up to 3.6 GHz and provide a flexible down-converted output. The proposed DDC
consists of a mixer and a re-sampling filter. The re-sampling filter can work at much
higher clock rate. The reason is that all the single-cycle recursive loops in the
re-sampling filter are pipelined by using either real/imaginary part-time multiplexing or
parallel processing technique. With features like arbitrary sampling rate conversion,
and dynamic configuration, the proposed design is highly flexible, so that it can
generate a down-converted output with sampling rate, selectable within the range of
1 kS/s–225 MS/s. Moreover, the flexibility is further improved by being able to
specify the output sampling rate and center frequency to a resolution of less than 1
S/s. The experimental results show that the proposed design can achieve the same
functionality as the existing work but with fewer hardware resources.
VLSI_IEEE_18
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 16000/-
TOPIC : The Implementation of the Improved OMP for AIC Reconstruction Based on
Parallel Index Selection
Abstract : Sparse signal recovery becomes extremely challenging for a variety of real-
time applications. In this paper, we improve the orthogonal matching pursuit (OMP)
algorithm based on parallel correlation indices selection mechanism in each iteration
and Goldschmidt algorithm. Simulation results show that the improved OMP algorithm
with a reduced number of iterations and low hardware complexity of matrix operations
has higher success rate and recovery signal-to-noise-ratio (RSNR) for sparse signal
recovery. This paper presents an efficient complex valued system hardware architecture
of the recovery algorithm for analog-to-information structure based on compressive
sensing. The proposed architecture is implemented and validated on the Xilinx Virtex6
field-programmable gate array (FPGA) for signal reconstruction with N = 1024, K = 36,
and M = 256. The implementation results showed that the improved OMP algorithm
achieved a higher RSNR of 31.04 dB compared with the original OMP algorithm. This
synthesized design consumes a few percentages of the hardware resources of the FPGA
chip with the clock frequency of 135.4 MHz and reconstruction time of 170 µs, which is
faster than the existing design.
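The Goldschmidt algorithm named above is an iterative division method that uses only multiplications and subtractions, which makes it attractive in hardware. The floating-point sketch below shows the basic iteration. How the paper applies it inside OMP is not stated in the abstract, so the function name, iteration count, and scaling step are assumptions made for illustration.

```python
def goldschmidt_divide(n, d, iterations=5):
    """Approximate n / d for d > 0 using Goldschmidt iteration.

    Both operands are first scaled by powers of two so that d lies in
    [0.5, 1). Each step multiplies numerator and denominator by
    f = 2 - d, driving d toward 1 and n toward the quotient.
    """
    if d <= 0:
        raise ValueError("this sketch only handles positive divisors")
    while d >= 1.0:        # scale down (a right shift in hardware)
        d /= 2.0
        n /= 2.0
    while d < 0.5:         # scale up (a left shift in hardware)
        d *= 2.0
        n *= 2.0
    for _ in range(iterations):
        f = 2.0 - d
        n *= f
        d *= f
    return n


quotient = goldschmidt_divide(7.0, 3.0)
```

The error in the denominator squares on every step, so a handful of iterations is enough for single-precision accuracy once the divisor has been normalized.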
VLSI_IEEE_20
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 12000/-
TOPIC : Approximate Hybrid High Radix Encoding for Energy-Efficient Inexact Multipliers
Abstract : Approximate computing forms a design alternative that exploits the intrinsic
error resilience of various applications and produces energy-efficient circuits with small
accuracy loss. In this paper, we propose an approximate hybrid high radix encoding for
generating the partial products in signed multiplications that encodes the most
significant bits with the accurate radix-4 encoding and the least significant bits with an
approximate higher radix encoding. The approximations are performed by rounding the
high radix values to their nearest power of two. The proposed technique can be
configured to achieve the desired energy–accuracy tradeoffs. Compared with the
accurate radix-4 multiplier, the proposed multipliers deliver up to 56% energy and 55%
area savings, when operating at the same frequency, while the imposed error is
bounded by a Gaussian distribution with near-zero average. Moreover, the proposed
multipliers are compared with state-of-the-art inexact multipliers, outperforming them
by up to 40% in energy consumption, for similar error values. Finally, we demonstrate
the scalability of our technique.
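The rounding step described above turns a multiplication by a high-radix digit into a single shift. The sketch below shows that idea in isolation; the tie-breaking rule (round up) and the function names are assumptions, and the full hybrid encoder from the paper is not reproduced.

```python
def round_to_power_of_two(v):
    """Round an integer to the nearest power of two, keeping its sign.

    Zero stays zero. Ties (for example 3, halfway between 2 and 4) round up
    in magnitude, which is an assumption made for this sketch.
    """
    if v == 0:
        return 0
    sign = -1 if v < 0 else 1
    m = abs(v)
    lo = 1 << (m.bit_length() - 1)   # largest power of two <= m
    hi = lo << 1                     # next power of two
    return sign * (lo if (m - lo) < (hi - m) else hi)


def approx_partial_product(multiplicand, digit):
    """Approximate multiplicand * digit with one shift and an optional negation."""
    r = round_to_power_of_two(digit)
    if r == 0:
        return 0
    shift = abs(r).bit_length() - 1
    pp = multiplicand << shift
    return -pp if r < 0 else pp


rounded = [round_to_power_of_two(v) for v in (0, 1, 3, 5, -6, 7)]
```

Because every approximate digit is a signed power of two, the partial-product generator needs no adders for the low-order part, which is where the energy and area savings would come from.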
VLSI_IEEE_23
(BACK-END)
SOFTWARE :
TANNER EDA
STUDENT COST
MRP:
RS. 8000/-
TOPIC : Low Phase Noise Ku-Band VCO With Optimal Switched-Capacitor Bank Design
Abstract : In this brief, a low phase noise Ku-band voltage-controlled oscillator (VCO)
fabricated in a 130-nm BiCMOS process is presented. The phase noise mechanism of the
switched-capacitor bank is analyzed, an optimum bank design to reduce phase noise is
proposed, and a tradeoff with tuning range is discussed. The prototype 12.2–13.1-GHz
VCO achieves a measured phase noise of −120.6 dBc/Hz at 1-MHz offset when running
at 12.67 GHz. The VCO core consumes a power of 17.7 mW and attains a figure of merit
of 190.
VLSI_IEEE_24
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : A High-Accuracy Programmable Pulse Generator With a 10-ps Timing Resolution
Abstract : Automatic test equipment must have high-precision and low-power pulse
generators (PGs) for testing memory and device-under-test ICs. This paper describes a
high-accuracy and wide-data-rate-range PG with a 10-ps time resolution. The PG
comprises an edge combiner (EC) and a multiphase clock generator (MPCG). The EC can
produce an arbitrary waveform through 32 phase outputs of the MPCG. The EC adopts a
one/zero detector and phase selection logic to define an operational data rate range
and a timing resolution, respectively. Therefore, the EC uses the phase selection logic to
combine the period window of the one/zero detector with the MPCG output phases.
The EC also uses a countdown counter for a wide operational range. In the MPCG, a
multiphase oscillator (MPO) adopts a ring oscillator scheme with sub feedback loops to
extend its maximum operational frequency. The MPO also uses a phase error corrector
to reduce the output phase error resulting from process and layout mismatches. Thus,
the PG can obtain high accuracy waveforms owing to small phase errors. The test chip
was implemented using a 0.13-µm CMOS process. The core area and power
consumption of the PG were measured to be 250 × 300 µm² and 18.7 mW, respectively.
The data rate range of the PG was determined to be from 3.2 kHz to 893 MHz. The time
resolution and average accuracy of the PG were measured to be 10 ps and ±0.3 LSB,
respectively.
VLSI_IEEE_32
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 20000/-
TOPIC : A Variable-Size FFT Hardware Accelerator Based on Matrix Transposition
Abstract : Fast Fourier transform (FFT) is the kernel and the most time-consuming
algorithm in the domain of digital signal processing, and the FFT sizes of different
applications are very different. Therefore, this paper proposes a variable-size FFT
hardware accelerator, which fully supports the IEEE-754 single-precision floating-point
standard and the FFT calculation with a wide size range from 2 to $2^{20}$ points. First, a
parallel Cooley–Tukey FFT algorithm based on matrix transposition (MT) is proposed,
which can efficiently divide a large size FFT into several small size FFTs that can be
executed in parallel. Second, guided by this algorithm, the FFT hardware accelerator is
designed, and several FFT performance optimization techniques such as hybrid twiddle
factor generation, multibank data memory, block MT, and token-based task scheduling
are proposed. Third, its VLSI implementation is detailed, showing that it can work at 1
GHz with the area of 2.4 mm² and the power consumption of 91.3 mW at 25 °C, 0.9 V.
Finally, several experiments are carried out to evaluate the proposal’s performance in
terms of FFT execution time, resource utilization, and power consumption. Comparative
experiments show that our FFT hardware accelerator achieves at most 18.89× speedups
in comparison to two software-only solutions and two hardware dedicated solutions.
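The decomposition of one large FFT into small FFTs around a transposition can be sketched as follows. This is a minimal illustration of the generic Cooley–Tukey factorization N = n1 · n2 (often called the four-step FFT), not the accelerator's actual dataflow; the function names `dft` and `fft_by_transposition` are hypothetical, and a naive DFT stands in for the small sub-transforms.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT, used here for the small sub-transforms and as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft_by_transposition(x, n1, n2):
    """Length n1*n2 DFT built from n2 DFTs of length n1 and n1 DFTs of length n2.

    Input index j = j1 * n2 + j2, output index k = k1 + n1 * k2.
    """
    n = n1 * n2
    assert len(x) == n
    # View x as an n1-by-n2 matrix (row j1, column j2) and transform each column.
    cols = [dft([x[j1 * n2 + j2] for j1 in range(n1)]) for j2 in range(n2)]
    # Multiply entry (j2, k1) by the twiddle factor W_n^(j2 * k1).
    for j2 in range(n2):
        for k1 in range(n1):
            cols[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)
    # Transpose so that each k1 owns one row of length n2, then transform the rows.
    rows = [dft([cols[j2][k1] for j2 in range(n2)]) for k1 in range(n1)]
    # rows[k1][k2] holds output sample X[k1 + n1 * k2].
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2]
    return out
```

The column transforms are independent of each other, as are the row transforms, which is what makes this form of the algorithm suitable for parallel execution.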
VLSI_IEEE_33
(BACK-END)
SOFTWARE :
TANNER EDA
STUDENT COST
MRP:
RS. 8000/-
TOPIC : A 12-bit 40-MS/s SAR ADC With a Fast-Binary-Window DAC Switching Scheme
Abstract : This paper presents a 12-bit 40-MS/s successive approximation register
analog-to-digital converter (ADC) for ultrasound imaging systems. By incorporating a
fast binary window digital-to-analog converter (DAC) switching technique, the
problematic most significant bit transition glitch was removed to improve linearity
without increasing the input capacitance or using a calibration scheme. A hybrid DAC
was also developed to overcome the yield problem that occurs when a tiny unit
capacitance is used in the DAC. Moreover, a reference buffer was used to accelerate the
DAC settling to achieve high speed conversion. The prototype ADC was fabricated using
a 130-nm CMOS technology. The ADC core occupied an active area of 0.1 mm² and
consumed a total power of 1.32 mW when a 1.2-V supply was used at a conversion rate
of 40 MS/s. The measured peak signal-to-noise-and-distortion ratio and spurious-free
dynamic range were 64 and 77.5 dB, respectively. The peak effective number of bits was
10.33, which is equivalent to a Walden figure-of-merit of 25.6 fJ/conversion step.
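The relation between the reported figures can be checked with a short calculation. The sketch below assumes the usual definitions ENOB = (SNDR − 1.76) / 6.02 and Walden FoM = P / (2^ENOB · fs); the function names are illustrative.

```python
def enob(sndr_db):
    """Effective number of bits from the SNDR in dB (ideal-quantizer relation)."""
    return (sndr_db - 1.76) / 6.02

def walden_fom_fj(power_w, enob_bits, fs_hz):
    """Walden figure of merit in femtojoules per conversion step."""
    return power_w / (2 ** enob_bits * fs_hz) * 1e15

# Figures quoted in the abstract: 1.32 mW, ENOB 10.33, 40 MS/s, SNDR about 64 dB.
print(enob(64.0))
print(walden_fom_fj(1.32e-3, 10.33, 40e6))
```

With the quoted power, ENOB, and sampling rate, the second value comes out at about 25.6 fJ per conversion step. The first is roughly 10.3 bits; the small gap to 10.33 is consistent with the SNDR being rounded to 64 dB.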
VLSI_IEEE_34
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : Combating Data Leakage Trojans in Commercial and ASIC Applications With
Time-Division Multiplexing and Random Encoding
Abstract : Globalization of microchip fabrication opens the possibility for an attacker to
insert hardware Trojans into a chip during the manufacturing process. While most
defensive methods focus on detection or prevention, a recent method, called
Randomized Encoding of Combinational Logic for Resistance to Data Leakage (RECORD),
uses data randomization to prevent hardware Trojans from leaking meaningful
information even when the entire design is known to the attacker. Both RECORD and its
sequential variant require significant area and power overhead. In this paper, a Time-
Division Multiplexed version of the RECORD design process is proposed which reduces
area overhead by 63% and power by 56%. This time-division multiplexing (TDM) concept
is further refined to allow commercial off-the-shelf (COTS) products and IP cores to be
safely operated from a separate chip. These new methods trade off latency (5.3× for
TDM and 3.9× for COTS) and energy use to accomplish area and power savings and
achieve greater security than the original RECORD process.
VLSI_IEEE_35
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : A 3.2-GHz Supply Noise-Insensitive PLL Using a Gate-Voltage-Boosted Source-
Follower Regulator and Residual Noise Cancellation
Abstract : In this brief, we propose a supply noise-insensitive charge pump phase-
locked loop (PLL) using a source-follower (SF) regulator and noise cancellation. In order
to minimize the voltage drop of the SF regulator while improving supply rejection, a
gate-voltage-boosting technique and the body-controlled noise cancellation are
proposed. To suppress the phase noise from the ring oscillator, a reference multiplier is
employed to maximize the PLL loop bandwidth. Implemented in 65-nm CMOS, a
prototype PLL at 3.2 GHz achieves supply noise spur of less than −33 dBc for a 50-mVpp
supply noise around the loop bandwidth while consuming 3.12 mW from a 1-V supply.
VLSI_IEEE_37
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 12000/-
TOPIC : Low-Complexity VLSI Design of Large Integer Multipliers for Fully Homomorphic
Encryption
Abstract : Large integer multiplication has been widely used in fully homomorphic
encryption (FHE). Implementing feasible large integer multiplication hardware is thus
critical for accelerating the FHE evaluation process. In this paper, a novel and efficient
operand reduction scheme is proposed to reduce the area requirement of radix-r
butterfly units. We also extend the single-port, merged-bank memory structure to the
design of number theoretic transform (NTT) and inverse NTT (INTT) for further area
minimization. In addition, an efficient memory addressing scheme is developed to
support both NTT/INTT and resolving carries computations. Experimental results reveal
that significant area reductions can be achieved for the targeted 786 432- and 1 179
648-bit NTT-based multipliers designed using the proposed schemes in comparison with
the related works. Moreover, the two multiplications can be accomplished in 0.196 and
2.21 ms, respectively, based on 90-nm CMOS technology. The low-complexity feature of
the proposed large integer multiplier designs is thus obtained without sacrificing the
time performance.
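The two phases named in the abstract, the NTT/INTT and the resolving of carries, can be illustrated with a small software sketch. It assumes a common NTT-friendly prime (998244353 with primitive root 3) and 8-bit digits, none of which come from the paper; its operand reduction scheme, radix-r butterflies, and memory structure are not reproduced here.

```python
P = 998244353   # assumed NTT-friendly prime: P - 1 is divisible by 2^23
G = 3           # primitive root modulo P

def ntt(a, invert=False):
    """Recursive radix-2 number theoretic transform over Z_P (length a power of two)."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], invert)
    odd = ntt(a[1::2], invert)
    w_n = pow(G, (P - 1) // n, P)       # primitive n-th root of unity
    if invert:
        w_n = pow(w_n, P - 2, P)        # its modular inverse
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % P
        out[k] = (even[k] + t) % P
        out[k + n // 2] = (even[k] - t) % P
        w = w * w_n % P
    return out

def multiply(x, y, digit_bits=8):
    """Multiply non-negative integers via NTT convolution of their digit vectors.

    With 8-bit digits every convolution sum stays below P as long as the
    operands have fewer than about 15000 digits each.
    """
    base = 1 << digit_bits

    def digits(v):
        d = []
        while v:
            d.append(v & (base - 1))
            v >>= digit_bits
        return d or [0]

    a, b = digits(x), digits(y)
    n = 1
    while n < len(a) + len(b):
        n *= 2
    fa = ntt(a + [0] * (n - len(a)))
    fb = ntt(b + [0] * (n - len(b)))
    prod = ntt([u * v % P for u, v in zip(fa, fb)], invert=True)
    n_inv = pow(n, P - 2, P)
    coeffs = [c * n_inv % P for c in prod]
    # Resolve carries: reduce each coefficient to one digit, pushing the excess upward.
    result, carry = 0, 0
    for i, c in enumerate(coeffs):
        carry += c
        result += (carry % base) << (i * digit_bits)
        carry //= base
    return result + (carry << (n * digit_bits))
```

The pointwise product in the transform domain replaces the quadratic digit-by-digit multiplication, and the final loop is the only place where digits interact through carries.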
VLSI_IEEE_38
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : Algorithm and VLSI Architecture Design of Proportionate-Type LMS Adaptive
Filters for Sparse System Identification
Abstract : The proportionate-type normalized LMS (Pt-NLMS) family of adaptive filtering
algorithms for sparse system identification poses significant implementation challenges
due to its high computational complexity, especially for real-time applications like
network echo cancelation. In this paper, we make the first attempt to implement Pt-NLMS
algorithms in hardware. Several reformulations are proposed to simplify the
original Pt-NLMS algorithms, thereby making them amenable to real-time VLSI
implementations. The reformulated algorithms, referred to as the delayed µ-law
proportionate LMS (DMPLMS) algorithm for white input and the delayed wavelet MPLMS
(DWMPLMS) algorithm for colored input, are then implemented in hardware. Simulation studies
demonstrate that the performance loss is very small for the proposed reformulations.
We implemented the proposed designs considering 16-bit fixed point representation in
hardware, and synthesis results show that the DMPLMS architecture with ≈30% increase
in hardware over the state-of-the-art conventional delayed LMS architecture achieves
3× improvement in convergence rate for white input and the DWMPLMS architecture
with ≈70% increase in hardware achieves 10× improvement in convergence rate for
correlated input conditions.
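For orientation, the textbook µ-law proportionate NLMS update that these algorithms start from can be sketched in floating point. This is not the paper's delayed, reformulated DMPLMS architecture; the parameter values (`mu`, `rho`, `delta_p`, `beta`) and function names are illustrative assumptions.

```python
import math
import random

def mu_law_gains(w, mu=1000.0, rho=0.01, delta_p=0.01):
    """Per-tap step-size gains: larger taps get larger gains, normalized to mean 1."""
    f = [math.log(1.0 + mu * abs(wl)) for wl in w]
    floor = rho * max(delta_p, max(f))      # keeps small taps adapting
    gamma = [max(floor, fl) for fl in f]
    mean = sum(gamma) / len(gamma)
    return [g / mean for g in gamma]

def mpnlms_identify(h, n_iter=3000, beta=0.5, delta=1e-6, seed=0):
    """Identify the sparse FIR system h from white Gaussian input; returns the taps."""
    rng = random.Random(seed)
    length = len(h)
    w = [0.0] * length
    x = [0.0] * length                      # tap-delay line, newest sample first
    for _ in range(n_iter):
        x = [rng.gauss(0.0, 1.0)] + x[:-1]
        d = sum(hl * xl for hl, xl in zip(h, x))          # noise-free desired signal
        e = d - sum(wl * xl for wl, xl in zip(w, x))      # a priori error
        g = mu_law_gains(w)
        norm = sum(gl * xl * xl for gl, xl in zip(g, x)) + delta
        w = [wl + beta * gl * xl * e / norm for wl, gl, xl in zip(w, g, x)]
    return w
```

The logarithms, the division by the mean gain, and the input-dependent normalization in every iteration are representative of the computational cost the abstract refers to.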
VLSI_IEEE_41
(BACK-END)
SOFTWARE :
TANNER EDA
STUDENT COST
MRP:
RS. 10000/-
TOPIC : A Fast-Locking, Low-Jitter Pulse width Control Loop for High-Speed ADC
Abstract : A fast-locking, high-precision, and low-jitter pulse width control loop (PWCL)
for high-speed high-resolution analog-to-digital converter is presented. Only through
controlling the delay of rising edge to adjust duty cycle, the clock jitter could be
suppressed greatly. An improved charge pump with a follower circuit and self-biased
loop was designed to decrease the voltage ripples for higher accuracy and lower jitter. A
startup circuit was adopted to enable the pulse width control loop to lock rapidly. With the
SMIC 0.18 µm 3.3 V CMOS process, the simulation and measured results show that
within 180 ns the PWCL can lock the clock duty cycles for the accuracy of 50 ± 1% with
10%∼90% input duty cycle from 50 to 550 MHz. The rms-jitter is 73 fs at 250 MHz. The
active area is about 0.023 mm².
AREA EFFICIENT/ TIMING & DELAY REDUCTION
VLSI_IEEE_03
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : A Residue-to-Binary Converter for the Extended Four-Moduli Set
$\{2^n - 1, 2^n + 1, 2^{2n} + 1, 2^{2n+p}\}$
Abstract : This brief presents a residue-to-binary converter for the moduli set
$\{2^n - 1, 2^n + 1, 2^{2n} + 1, 2^{2n+p}\}$, where n is a positive integer and 0 ≤ p ≤ n − 2. The converter
consists of three simplified 4n-bit carry-save adders (CSAs) along with a modulo $(2^{4n} - 1)$
adder. The main contribution of this brief is reducing the requirements of the proposed
CSA network, which has impacted the area, delay, power and energy. Compared with
four-moduli and five-moduli sets that have the dynamic range $2^v (2^{4n} - 1)$, where v = n
or 2n, the proposed converter resulted in the average area, delay, power, and energy
reductions of 22.7%, 9.2%, 17.8%, and 24.5%, respectively. Moreover, the throughput
rate per unit area has been improved by an average of 48.7%.
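What a residue-to-binary converter computes can be shown with a generic Chinese remainder theorem reconstruction. The sketch below is a software reference model under that assumption, not the CSA-based hardware of the brief; the function names are hypothetical, and `pow(x, -1, m)` requires Python 3.8 or later.

```python
def moduli_set(n, p):
    """The four moduli 2^n - 1, 2^n + 1, 2^(2n) + 1, 2^(2n+p) for 0 <= p <= n - 2."""
    assert n >= 2 and 0 <= p <= n - 2
    return [2 ** n - 1, 2 ** n + 1, 2 ** (2 * n) + 1, 2 ** (2 * n + p)]

def to_residues(x, moduli):
    """Forward conversion: one residue per modulus."""
    return [x % m for m in moduli]

def to_binary(residues, moduli):
    """Reverse conversion by the Chinese remainder theorem (moduli pairwise coprime)."""
    big_m = 1
    for m in moduli:
        big_m *= m
    x = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        x += r * m_i * pow(m_i, -1, m)   # m_i times its inverse is 1 mod m, 0 mod the others
    return x % big_m
```

For n = 2 and p = 0 the moduli are 3, 5, 17, and 16, and every integer from 0 to 4079 is recovered exactly from its four residues.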
VLSI_IEEE_12
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : An Efficient Fault-Tolerance Design for Integer Parallel Matrix–Vector
Multiplications
Abstract : Parallel matrix processing is a typical operation in many systems, and in
particular matrix–vector multiplication (MVM) is one of the most common operations in
the modern digital signal processing and digital communication systems. This paper
proposes a fault tolerant design for integer parallel MVMs. The scheme combines ideas
from error correction codes with the self-checking capability of MVM. Field-
programmable gate array evaluation shows that the proposed scheme can significantly
reduce the overheads compared to the protection of each MVM on its own. Therefore,
the proposed technique can be used to reduce the cost of providing fault tolerance in
practical implementations.
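The self-checking property of an integer matrix–vector product can be illustrated with the classic checksum test: the sum of the outputs must equal the product of the column-sum row with the input vector. This is a generic sketch with hypothetical function names; how the paper combines this property with error correction codes across parallel MVMs is not reproduced.

```python
def matvec(a, x):
    """Integer matrix-vector product y = A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def checksum_row(a):
    """Column sums of A, i.e. the extra row that would be appended for checking."""
    return [sum(col) for col in zip(*a)]

def outputs_consistent(a, x, y):
    """True if sum(y) matches the checksum row applied to x (exact in integer arithmetic)."""
    expected = sum(c * xj for c, xj in zip(checksum_row(a), x))
    return sum(y) == expected
```

An error in a single output element always changes the sum and is therefore detected; locating or correcting it needs additional redundancy.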
VLSI_IEEE_15
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : Extending 3-bit Burst Error-Correction Codes With Quadruple Adjacent Error
Correction
Abstract : The use of error-correction codes (ECCs) with advanced correction capability
is a common system-level strategy to harden the memory against multiple bit upsets
(MBUs). Therefore, the construction of ECCs with advanced error correction and low
redundancy has become an important problem, especially for adjacent ECCs. Existing
codes for mitigating MBUs mainly focus on the correction of up to 3-bit burst errors. As
the technology scales and cell interval distance decreases, the number of affected bits
can easily extend to more than 3 bits. The previous methods are therefore not enough to
satisfy the reliability requirement of the applications in harsh environments. In this
paper, a technique to extend 3-bit burst error-correction (BEC) codes with quadruple
adjacent error correction (QAEC) is presented. First, the design rules are specified and
then a searching algorithm is developed to find the codes that comply with those rules.
The H matrices of the 3-bit BEC with QAEC obtained are presented. They do not require
additional parity check bits compared with a 3-bit BEC code. By applying the new
algorithm to previous 3-bit BEC codes, the performance of 3-bit BEC is also remarkably
improved. The encoding and decoding procedure of the proposed codes is illustrated
with an example. Then, the encoders and decoders are implemented using a 65-nm
library and the results show that our codes have moderate total area and delay
overhead to achieve the correction ability extension.
VLSI_IEEE_19
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : A 588-Gb/s LDPC Decoder Based on Finite-Alphabet Message Passing
Abstract : An ultrahigh throughput low-density parity-check (LDPC) decoder with an
unrolled full-parallel architecture is proposed, which achieves the highest decoding
throughput compared to previously reported LDPC decoders in the literature. The
decoder benefits from a serial message-transfer approach between the decoding stages
to alleviate the well-known routing congestion problem in parallel LDPC decoders.
Furthermore, a finite-alphabet message passing algorithm is employed to replace the
VN update rule of the standard min-sum (MS) decoder with lookup tables, which are
designed in a way that maximizes the mutual information between decoding messages.
The proposed algorithm results in an architecture with reduced bit-width messages,
leading to a significantly higher decoding throughput and to a lower area compared to
an MS decoder when serial message transfer is used. The architecture is placed and
routed for the standard MS reference decoder and for the proposed finite-alphabet
decoder using a custom pseudo hierarchical backend design strategy to further alleviate
routing congestions and to handle the large design. Post layout results show that the
finite-alphabet decoder with the serial message transfer architecture achieves a
throughput as large as 588 Gb/s with an area of 16.2 mm² and dissipates an average
power of 22.7 pJ per decoded bit in a 28-nm fully depleted silicon on insulator library.
Compared to the reference MS decoder, this corresponds to 3.1 times smaller area and
2 times better energy efficiency.
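For reference, the update rules of the standard min-sum decoder mentioned above can be sketched as follows. This shows the baseline algorithm only; the finite-alphabet decoder replaces the variable node rule with lookup tables whose contents are not given in the abstract. Function names are illustrative, and messages are log-likelihood ratios.

```python
def check_node_update(msgs):
    """Min-sum check node (CN): each output is the sign product and the minimum
    magnitude of all *other* incoming messages (needs at least two inputs)."""
    out = []
    for i in range(len(msgs)):
        others = msgs[:i] + msgs[i + 1:]
        sign = 1
        for m in others:
            if m < 0:
                sign = -sign
        out.append(sign * min(abs(m) for m in others))
    return out

def variable_node_update(channel_llr, msgs):
    """Min-sum variable node (VN): channel LLR plus all incoming CN messages
    except the one arriving on the edge being updated."""
    total = channel_llr + sum(msgs)
    return [total - m for m in msgs]
```

The sum in the variable node update widens the message bit-width, which is one reason a table-based rule with reduced bit-width messages is attractive.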
VLSI_IEEE_21
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : Basic-Set Trellis Min–Max Decoder Architecture for Nonbinary LDPC Codes With
High-Order Galois Fields
Abstract : Nonbinary low-density parity-check (NB-LDPC) codes outperform their
binary counterparts in terms of error correction performance. However, the drawback
of NB-LDPC decoders is high complexity, especially for the check node unit (CNU), and
the complexity increases considerably when increasing the Galois-field (GF) order. In this
paper, a novel basic-set trellis min–max algorithm is proposed to greatly reduce not only
the CNU complexity but also the number of messages exchanged between the check
node and the variable node compared with previous studies, which is highly efficient for
higher order GFs. In addition, the proposed CNU is designed to compute the messages in
a parallel way. Layered decoder architectures based on the proposed algorithm were
implemented for the (837, 726) NB-LDPC code over GF(32) and the (1512, 1323) code
over GF(64) using 90-nm CMOS technology, and obtained a reduction in the complexity
by 30% and 37% for the CNU, and 40% and 37.4% for the whole decoder, respectively.
Moreover, the proposed decoder achieves a higher throughput at 1.67 Gbit/s and 1.4
Gbit/s compared with the other state-of-the-art high-rate NB-LDPC decoders with high-
order GFs.
VLSI_IEEE_22
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 12000/-
TOPIC : Analysis and Design of Cost-Effective, High-Throughput LDPC Decoders
Abstract : This paper introduces a new approach to cost-effective, high-throughput
hardware designs for low-density parity-check (LDPC) decoders. The proposed
approach, called nonsurjective finite alphabet iterative decoders (NS-FAIDs), exploits
the robustness of message-passing LDPC decoders to inaccuracies in the calculation of
exchanged messages, and it is shown to provide a unified framework for several designs
previously proposed in the literature. NS-FAIDs are optimized by density evolution for
regular and irregular LDPC codes, and are shown to provide different tradeoffs between
hardware complexity and decoding performance. Two hardware architectures targeting
high-throughput applications are also proposed, integrating both Min-Sum (MS) and NS-
FAID decoding kernels. ASIC post-synthesis implementation results on 65-nm CMOS
technology show that NS-FAIDs yield significant improvements in the throughput to area
ratio, by up to 58.75% with respect to the MS decoder, with even better or only slightly
degraded error correction performance.
VLSI_IEEE_26
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 12000/-
TOPIC : ULV-Turbo Cache for an Instantaneous Performance Boost on Asymmetric
Architectures
Abstract : An asymmetric architecture is commonly used in modern embedded systems
to reduce energy consumption. The systems tend to execute more applications in the
energy-efficient core, which typically employs ultralow voltage (ULV) to save energy.
However, caches become a reliability and performance barrier that limits the minimum
operating voltage and blocks system performance in the ULV environment. The poor
performance of an ultralow-voltage core causes most workload requirements to awaken
and then execute on the host core, leading to high energy consumption. In this paper,
we propose a ULV-Turbo cache based on a ULV-selective-ally 8T static random access
memory (SRAM) that is able to perform reliable ultralow-voltage operation and provide
the speedup function of SRAM rows ally. The system is able to speed up the ULV core
instantaneously and execute more applications with the ULV-Turbo cache. In our
system-wide evaluation based on a real attitude and heading reference system
workload on an asymmetric wearable system, the ULV-Turbo cache reduces the energy
consumption of the entire system by approximately 36%.
VLSI_IEEE_27
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 12000/-
TOPIC : Low-Complexity Methodology for Complex Square-Root Computation
Abstract : In this brief, we propose a low-complexity methodology to compute a
complex square root using only a circular coordinate rotation digital computer (CORDIC)
as opposed to the state-of-the-art techniques that need both circular as well as
hyperbolic CORDICs. Subsequently, an architecture has been designed based on the
proposed methodology and implemented on the ASIC platform using the UMC 180-nm
technology node with 1.0 V at 5 MHz. Field programmable gate array (FPGA) prototyping
using Xilinx Virtex-6 (XC6VLX240T) has also been carried out. After thorough theoretical
analysis and experimental validations, it can be inferred that the proposed methodology
reduces slice lookup table usage by 21.15% (on the FPGA platform), reduces silicon area
overhead by 20.25%, and decreases power consumption by 19% (on the ASIC platform) when compared
with the state-of-the-art method without compromising the computational speed,
throughput, and accuracy.
VLSI_IEEE_28
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : Securing the PRESENT Block Cipher Against Combined Side-Channel Analysis and
Fault Attacks
Abstract : In this paper, we present and evaluate a hardware implementation of the
PRESENT block cipher secured against both side-channel analysis and fault attacks (FAs).
The side-channel security is provided by the first-order threshold implementation
masking scheme of the serialized PRESENT proposed by Poschmann et al. For the FA
resistance, we employ the Private Circuits II countermeasure presented by Ishai et al. at
Eurocrypt 2006, which we tailor to resist arbitrary 1-bit faults. We perform a side-
channel evaluation using the state-of-the-art leakage detection tests, quantify the
resource overhead of the Private Circuits II countermeasure, subject the
implementation to established differential FAs against the PRESENT block cipher, and
reflect on the structural resistance of the countermeasure. This paper provides
the detailed instructions on how to successfully achieve a secure Private Circuits II
implementation for the data path as well as the control logic.
VLSI_IEEE_39
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : Multilevel Half-Rate Phase Detector for Clock and Data Recovery Circuits
Abstract : In this brief, a half-rate (HR) bang-bang (BB) phase detector (PD) with multiple
decision levels is proposed for clock and data recovery (CDR) circuits. The combination
allows the oscillator to run at half the input data rate while providing information about
the sign and magnitude of the phase shift between the PD inputs. This allows a finer
control of the frequency of the oscillator in the phase-locked loop (PLL) of the CDR
circuit, which results in up to 30% less output clock jitter than with a conventional two-level
HR BB PD. Thanks to this, the bit error rate can be decreased by up to 5× in a 5-Gb/s
CDR circuit. The proposed topology was implemented in a 28-nm FDSOI CMOS
technology providing average power consumption below 76 µW with a supply voltage of
1 V. Although multilevel (ML) BB PDs have already been proposed in some PLL-based
CDR with very interesting results, a specific design of the PD has to be implemented for
an HR system. This brief provides the first ML-HR-BBPD.
VLSI_IEEE_40
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 12000/-
TOPIC : Fast Neural Network Training on FPGA Using Quasi-Newton Optimization
Method
Abstract : In this brief, a customized and pipelined hardware implementation of the
quasi-Newton (QN) method on field-programmable gate array (FPGA) is proposed for
fast onsite training of artificial neural networks, targeting embedded applications.
The architecture is scalable to cope with different neural network sizes while it supports
batch-mode training. Experimental results demonstrate the superior performance and
power efficiency of the proposed implementation over CPU, graphics processing unit,
and FPGA QN implementations.
VLSI_IEEE_42
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : Feedback-Based Low-Power Soft-Error-Tolerant Design for Dual-Modular
Redundancy
Abstract : Triple-modular redundancy (TMR), which consists of three identical modules
and a voting circuit, is a common architecture for soft-error tolerance. However, the
original TMR suffers from two major drawbacks: the large area overhead and the
vulnerability of the voter. In order to overcome these drawbacks, we propose a new
complementary dual-modular redundancy (CDMR) scheme for mitigating the effect of
soft errors. Inspired by the Markov random field (MRF) theory, a two-stage voting
system is implemented in CDMR, including a first stage optimal MRF structure and a
second-stage high-performance merging unit. The CDMR scheme can reduce the voting
circuit area by 20% while saving the area of one redundant module, achieving at least
26% error-rate reduction at an ultralow supply voltage of 0.25 V with 8.33% faster
timing compared to previous voter designs.
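The conventional TMR voter that serves as the baseline here is a 2-of-3 majority function, sketched below on integer bit vectors. This shows only the classic voter; the two-stage MRF-based voting of CDMR is not reproduced, and the function name is illustrative.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote over three module outputs."""
    return (a & b) | (b & c) | (a & c)

# One module (c) is upset in two bit positions; the vote still returns the correct word.
print(bin(majority(0b1010, 0b1010, 0b0110)))
```

The vote masks any error confined to a single module, but a fault inside the voter itself goes uncorrected, which is the vulnerability the abstract points out.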
VLSI_IEEE_43
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 10000/-
TOPIC : A Simple Yet Efficient Accuracy Configurable Adder Design
Abstract : Approximate computing is a promising approach for low-power IC design and
has recently received considerable research attention. To accommodate dynamic levels
of approximation, a few accuracy-configurable adder (ACA) designs have been
developed in the past. However, these designs tend to incur large area overheads as
they rely on either redundant computing or complicated carry prediction. Some of these
designs include error detection and correction circuitry, which further increase the area.
In this paper, we investigate a simple ACA design that contains no redundancy or error
detection/correction circuitry and uses very simple carry prediction. The simulation
results show that our design dominates the latest previous work on accuracy-delay-
power tradeoff while using 39% lower area. In the best case, the iso-delay power of our
design is only 16% of that of an accurate adder, regardless of degradation in accuracy. One variant
of this design provides finer-grained and larger tunability than that of the previous
works. Moreover, we propose a delay adaptive self-configuration technique to further
improve the accuracy-delay-power tradeoff. The advantages of our method are
confirmed by the applications in multiplication and discrete cosine transform
computing.
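The general idea of trading accuracy for a shorter carry chain can be sketched with a block-based adder model. This is a generic illustration under stated assumptions, not the design proposed in the paper: in approximate mode the carry into each block is predicted from the previous block alone, and `accurate=True` restores the true carry chain. The function name and the default block size are hypothetical, and `width` is assumed to be a multiple of `block`.

```python
def block_add(a, b, width=16, block=4, accurate=False):
    """Add two unsigned integers modulo 2^width using fixed-size blocks.

    accurate=False: carry-in of a block is the carry generated by the previous
                    block with that block's own carry-in taken as 0.
    accurate=True : carry-in is the true carry, giving the exact sum.
    """
    mask = (1 << block) - 1
    result = 0
    true_carry = 0   # carry of the exact addition so far
    pred_carry = 0   # carry predicted from the previous block only
    for lo in range(0, width, block):
        xa = (a >> lo) & mask
        xb = (b >> lo) & mask
        cin = true_carry if accurate else pred_carry
        result |= ((xa + xb + cin) & mask) << lo
        true_carry = (xa + xb + true_carry) >> block
        pred_carry = (xa + xb) >> block
    return result
```

The approximate result is wrong only when a carry has to propagate through more than one block boundary, for example 0x0FFF + 0x0001, which this model evaluates to 0x0F00 instead of 0x1000.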
Audio, Image and Video Processing
VLSI_IEEE_14
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 20000/-
TOPIC : An Energy-Efficient Programmable Manycore Accelerator for Personalized
Biomedical Applications
Abstract : Wearable personalized health monitoring systems can offer a cost-effective
solution for human health care. These systems must constantly monitor patients’
physiological signals and provide highly accurate and quick processing and delivery of
the vast amount of data within a limited power and area footprint. These personalized
biomedical applications require sampling and processing multiple streams of
physiological signals with a varying number of channels and sampling rates. The
processing typically consists of feature extraction, data fusion, and classification stages
that require a large number of digital signal processing (DSP) and machine learning (ML)
kernels. In response to these requirements, in this paper, a tiny, energy-efficient, and
domain-specific manycore accelerator referred to as power-efficient nano clusters
(PENC) is proposed to map and execute the kernels of these applications. Simulation
results show that the PENC is able to reduce energy consumption by up to 80% and 25%
for DSP and ML kernels, respectively, when optimally parallelized. In addition, we fully
implemented three compute-intensive personalized biomedical applications, namely,
multichannel seizure detection, multiphysiological stress detection, and standalone
tongue drive system (sTDS), to evaluate the proposed manycore performance relative
to commodity embedded CPU, graphical processing unit (GPU), and field programmable
gate array (FPGA)-based implementations. For these three case studies, the energy
consumption and the performance of the proposed PENC manycore, when acting as an
accelerator along with an Intel Atom processor as a host, are compared with the existing
commercial off-the-shelf general purpose, customizable, and programmable embedded
platforms, including Intel Atom, Xilinx Artix-7 FPGA, and NVIDIA TK1 advanced RISC
machine (ARM)-A15 and K1 GPU system on a chip. For these applications, the PENC manycore
is able to significantly improve throughput and energy efficiency by up to 1872× and
276×, respectively. For the most computationally intensive application of seizure
detection, the PENC manycore is able to achieve a throughput of 15.22 giga-operations-per-second
(GOPs), which is a 14× improvement in throughput over custom FPGA
solution. For stress detection, the PENC achieves a throughput of 21.36 GOPs and an
energy efficiency of 4.23 GOP/J, which is 14.87× and 2.28× better over FPGA
implementation, respectively. For the sTDS application, the PENC improves throughput
by 5.45× and energy efficiency by 2.37× over the FPGA implementation.
VLSI_IEEE_16
(FRONT-END)
SOFTWARE:
MODELSIM
&
XILINX
STUDENT COST
MRP:
RS. 18000/-
TOPIC : VLSI Design of an ML-Based Power-Efficient Motion Estimation Controller for
Intelligent Mobile Systems
Abstract : In this paper, a machine learning (ML)-based power-efficient motion
estimation (ME) controller algorithm and VLSI architecture incorporating coding
bandwidth and rate distortion (R-D) cost using convex optimization are proposed to
effectuate a smart and bandwidth-efficient ME design for intelligent mobile systems. To
be smart and adapt to time altering coding bandwidth using intelligent power-
management techniques in modern application processor systems, we first propose an
ML-based bandwidth-on-demand ME controller algorithm based on the convex
optimization method to resolve the lack of an awareness of coding bandwidth in prior
ME designs. Then, a hardware-friendly and power-efficient VLSI architecture is
developed to implement an intelligent, high-performance, and low-power ME controller
design that can be combined with prior ME designs to satisfy the bandwidth-efficient
ME design target under bandwidth constraints. The final implementation results show
that the proposed smart ME controller architecture using our proposed bandwidth
control scheme costs 0.816K gate counts, consumes 0.873 mW of power at a working
frequency of 1.1 GHz with Taiwan Semiconductor Manufacturing Company (TSMC) 90-nm
CMOS technology, and achieves an average bandwidth reduction of 56.08% compared
with previous non-bandwidth-on-demand ME designs for high-definition (HD) videos.

VLSI IEEE Transaction 2018 - IEEE Transaction

  • 1.
  • 2. NXFEE INNOVATION SEMICONDUCTOR IP & PRODUCT DEVELOPMENT COMPANY NXFEE Innovation (Semiconductor IP & VLSI IEEE Transaction & Product Development) #45, Vivekananda street, Dhevan kandappa Mudaliarnagar, Nainarmandapam, Pondicherry-4 Web: www.nxfee.com Email: nxfee.innovation@gmail.com Ph: +91 9789443203, +91 9677783735. NXFEE - VLSI IEEE TRANSACTION - 2018 PROJECT TITLE TITLE FOR VLSI LOW POWER VLSI_IEEE_01 (BACK-END) SOFTWARE : TANNER EDA STUDENT COST MRP: RS. 12000/- TOPIC : A 128-Tap Highly Tunable CMOS IF Finite Impulse Response Filter for Pulsed Radar Applications Abstract : A configurable-bandwidth (BW) filter is presented in this paper for pulsed radar applications. To eliminate dispersion effects in the received waveform, a finite impulse response (FIR) topology is proposed, which has a measured standard deviation of an in-band group delay of 11 ns that is primarily dominated by the inherent, fully predictable delay introduced by the sample-and-hold. The filter operates at an IF of 20 MHz, and is tunable in BW from 1.5 to 15 MHz, which makes it optimal to be used with varying pulse widths in the radar. Employing a total of 128 taps, the FIR filter provides greater than 50-dB sharp attenuation in the stop band in order to minimize all out-of-band noise in the low signal-to-noise received radar signal. Fabricated in a 0.18-µm silicon-on-insulator CMOS process, the proposed filter consumes approximately 3.5 mW/tap with a 1.8-V supply. A 20-MHz two-tone measurement with 200-kHz tone separation shows IIP3 greater than 8.5 dBm. VLSI_IEEE_02 (BACK-END) SOFTWARE : TANNER EDA STUDENT COST MRP: RS. 8000/- TOPIC : A Closed-Form Expression for Minimum Operating Voltage of CMOS D Flip-Flop Abstract : In this paper, a closed-form expression for estimating the minimum operating voltage (VDDmin) of D flip-flops (FFs) is proposed. VDDmin is defined as the minimum supply voltage at which the FFs are functional without errors.
The proposed expression indicates that VDDmin of FFs is a linear function of the square root of the logarithm of the number of FFs, and its slope depends on the within-die variation of the threshold voltage (VTH) and its intercept depends on the balance between nMOS and pMOS, which is mainly due to the die-to-die VTH variation. The proposed expression of VDDmin is validated by the simulation results as well as the silicon measurements. Finally, we discuss the dependence of VDDmin on the device parameters. VLSI_IEEE_05 (BACK-END) SOFTWARE : TANNER EDA STUDENT COST MRP: RS. 8000/- TOPIC : Design of Temperature-Aware Low-Voltage 8T SRAM in SOI Technology for High-Temperature Operation (25 °C–300 °C) Abstract : A temperature-aware low-voltage 8T static random access memory (SRAM) for high-temperature operations is presented. A dedicated read port with virtual ground and optimal body bias improves sensing margin under very high temperature (up to 300 °C). Bit line offset voltage for data “0” caused by the virtual ground scheme is also compensated by a replica bit line. The independent body bias control feature of the employed silicon-on-insulator (SOI) technology allows the write margin to be enhanced significantly without using any write-assist circuitry. Test chips were fabricated in a 1-µm SOI technology with tungsten interconnect for reliability at high temperature and lesser
  • 3. process variation. Measurement results demonstrate that the proposed SRAM operates successfully up to 300 °C with the supply voltage range of 2–5 V. At the minimum performance variation point (VDD = 2.5 V), the SRAM consumes 1.48 mW and shows the access time of 156 ns and the maximum clock frequency of 14.38 MHz at 300 °C. VLSI_IEEE_09 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 12000/- TOPIC : Design of an Area-Efficient Million-Bit Integer Multiplier Using Double Modulus NTT Abstract : This brief proposes a double modulus number theoretical transform (NTT) method for million-bit integer multiplication in fully homomorphic encryption. In our method, each NTT point is processed simultaneously under two moduli, and the final result is generated through the Chinese remainder theorem. The employment of double modulus enlarges the permitted NTT sample size from 24 to 32 bits and thus improves the transform efficiency. Based on the proposed double modulus method, we accomplish a VLSI design of million-bit integer multiplier with the Schönhage–Strassen algorithm. Implementation results on Altera Stratix-V FPGA show that this brief is able to compute a product of two 1024k-bit integers every 4.9 ms at the cost of only 7.9k ALUTs and 3.6k registers, which is more area-efficient when compared with the current competitors. VLSI_IEEE_10 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS.
12000/- TOPIC : A Fast and Low-Complexity Operator for the Computation of the Arctangent of a Complex Number Abstract : The computation of the arctangent of a complex number, i.e., the atan2 function, is frequently needed in hardware systems that could profit from an optimized operator. In this brief, we present a novel method to compute the atan2 function and a hardware architecture for its implementation. The method is based on a first stage that performs a coarse approximation of the atan2 function and a second stage that improves the output accuracy by means of a lookup table. We present results for fixed-point implementations in a field-programmable gate array device, all of them guaranteeing last-bit accuracy, which provide an advantage in latency, speed, and use of resources, when compared with well-established fixed-point options. VLSI_IEEE_11 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 12000/- TOPIC : A Reconfigurable LDPC Decoder Optimized for 802.11n/ac Applications Abstract : This paper presents a high data-rate low-density parity-check (LDPC) decoder, suitable for the 802.11n/ac (WiFi) standard. The innovative features of the proposed decoder relate to the decoding algorithms and the interconnection between the processing elements. The reduction of the hardware complexity of decoders based on the min-sum (MS) algorithms comes at the cost of performance degradation, especially at high-noise regions. We introduce more accurate approximations of the log-sum-product algorithm that also operate well for low signal-to-noise ratio values. Telecommunication standards, including WiFi, support more than one quasi-cyclic LDPC code of different characteristics, such as codeword length and code rate. A proposed design technique derives networks, capable of supporting a variety of codes and
  • 4. efficiently realizing connectivity between a variable number of processing units, with a relatively small hardware overhead over the single-code case. As a demonstration of the proposed technique, we implemented a reconfigurable network based on barrel rotators, suitable for LDPC decoders compatible with WiFi standard. Our approach achieves low complexity and high clock frequency, compared with related prior works. A 90-nm application-specific integrated circuit implementation of the proposed high-parallel WiFi decoder occupies 4.88 mm² and achieves an information throughput rate of 4.5 Gbit/s at a clock frequency of 555 MHz. VLSI_IEEE_13 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/- TOPIC : Approximate Sum-of-Products Designs Based on Distributed Arithmetic Abstract : Approximate circuits provide high performance and require low power. Sum-of-products (SOP) units are key elements in many digital signal processing applications. In this brief, three approximate SOP (ASOP) models which are based on the distributed arithmetic are proposed. They are designed for different levels of accuracy. First model of ASOP achieves an improvement up to 64% on area and 70% on power, when compared with conventional unit. Other two models provide an improvement of 32% and 48% on area and 54% and 58% on power, respectively, with a reduced error rate compared with the first model. Third model achieves the mean relative error and normalized error distance as low as 0.05% and 0.009%, respectively.
Performance of approximate units is evaluated with a noisy image smoothing application, where the proposed models are capable of achieving higher peak signal-to-noise ratio than the existing state-of-the-art techniques. It is shown that the proposed approximate models achieve higher processing accuracy than existing works but with significant improvements in power and performance. VLSI_IEEE_25 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/- TOPIC : Vector Processing-Aware Advanced Clock-Gating Techniques for Low-Power Fused Multiply-Add Abstract : The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a retailoring for the mobile market that they are entering now. Floating-point (FP) fused multiply-add (FMA), being a functional unit with high power consumption, deserves special attention. Although clock gating is a well-known method to reduce switching power in synchronous designs, there are unexplored opportunities for its application to vector processors, especially when considering active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector FMA units (VFUs). These techniques ensure power savings without jeopardizing the timing. We evaluate the proposed techniques using both synthetic and “real-world” application-based benchmarking. Using vector masking and vector multilane-aware clock gating, we report power reductions of up to 52%, assuming active VFU operating at the peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector FP instructions. Finally, when evaluating all techniques together, using “real-world” benchmarking, the power reductions are up to
  • 5. 80%. Additionally, in accordance with processor design trends, we perform this research in a fully parameterizable and automated fashion. VLSI_IEEE_29 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 12000/- TOPIC : A Flexible Wildcard-Pattern Matching Accelerator via Simultaneous Discrete Finite Automata Abstract : Regular expression matching has become an indispensable element of Internet of Things network security. However, traditional ternary content addressable memory (TCAM) search engine is unable to handle patterns with wildcards, as it precisely tracks only one active state with single transition. This paper proposes a promising simultaneous pattern matching methodology for wildcard patterns by two separated engines to represent discrete finite automata. A key preprocessing to encode possible postfix pattern by a unique key ensures that follow-up patterns can accurately traverse all possible matches with limited hardware resources. This approach is practical and scalable for achieving good performance and low space consumption in network security, and it can be applicable to any regular expressions even with multi-wildcard patterns. The experimental results demonstrate that this scheme can efficiently and accurately recognize wildcard patterns by simultaneously tracking only two active states. By adopting SRAM TCAM in the proposed architecture, the energy consumption is reduced to around 39%, compared with the energy consumption using a computing system that contains a large memory lookup and comparison overhead. VLSI_IEEE_30 (BACK-END) SOFTWARE : TANNER EDA STUDENT COST MRP: RS.
8000/- TOPIC : Low-Power and Fast Full Adder by Exploring New XOR and XNOR Gates Abstract : In this paper, novel circuits for XOR/XNOR and simultaneous XOR–XNOR functions are proposed. The proposed circuits are highly optimized in terms of the power consumption and delay, which are due to low output capacitance and low short-circuit power dissipation. We also propose six new hybrid 1-bit full-adder (FA) circuits based on the novel full-swing XOR–XNOR or XOR/XNOR gates. Each of the proposed circuits has its own merits in terms of speed, power consumption, power-delay product (PDP), driving ability, and so on. To investigate the performance of the proposed designs, extensive HSPICE and Cadence Virtuoso simulations are performed. The simulation results, based on the 65-nm CMOS process technology model, indicate that the proposed designs have superior speed and power against other FA designs. A new transistor sizing method is presented to optimize the PDP of the circuits. In the proposed method, the numerical computation particle swarm optimization algorithm is used to achieve the desired value for optimum PDP with fewer iterations. The proposed circuits are investigated in terms of variations of the supply and threshold voltages, output capacitance, input noise immunity, and the size of transistors.
  • 6. VLSI_IEEE_31 (BACK-END) SOFTWARE : TANNER EDA STUDENT COST MRP: RS. 8000/- TOPIC : A 0.9-V 12-bit 100-MS/s 14.6-fJ/Conversion-Step SAR ADC in 40-nm CMOS Abstract : This paper presents a low-power 12-bit 100-MS/s asynchronous successive approximation register analog-to-digital converter (SAR ADC). Several techniques are developed to enhance the ADC performance. The nonbinary capacitor array with small digital-to-analog converter (DAC) capacitors (total 394 fF) allows for reducing DAC settling time and power consumption while maintaining extremely high hardware utilization. The proposed nonlinear capacitance correction method solves the nonlinear capacitance problems of the comparator when the small unit capacitor is used. The latch output glitch removal method ensures the speed and accuracy of the comparator at the low supply voltage. Furthermore, the proposed high-speed SAR logic and timing sequence improved SAR logic’s operating speed by 75% compared with traditional SAR logic. The prototype was fabricated using a 40-nm CMOS technology. At a 0.9-V supply and 100-MS/s sampling rate, the ADC achieves a signal-to-noise-and-distortion ratio of 67.3 dB and consumes 2.6 mW, resulting in a figure of merit of 14.6 fJ/conversion-step. The ADC core occupies an active area of only 50 × 280 µm². VLSI_IEEE_36 (BACK-END) SOFTWARE : TANNER EDA STUDENT COST MRP: RS. 8000/- TOPIC : SRAM Circuits for True Random Number Generation Using Intrinsic Bit Instability Abstract : This paper describes a novel approach to a true random number generator (TRNG) using SRAM circuits.
The principles of operation are described in the context of past work on integrated circuit TRNGs. The required modifications to standard SRAM arrays are minor and have little impact on the area. Experimental results from large 1-Mbit SRAM arrays fabricated on a 55-nm process using the foundry supplied SRAM cell layouts show good results. Simple helper functions, suitable for very small hardware implementation, allow improvement, including the ability for the resulting binary strings to pass all of the National Institute of Standards and Technology randomness tests. We describe the circuits, their principle of operation and statistical behavior, as well as the underlying physical mechanisms providing the entropy. VLSI_IEEE_44 (BACK-END) SOFTWARE : TANNER EDA STUDENT COST MRP: RS. 10000/- TOPIC : Improving Error Correction Codes for Multiple-Cell Upsets in Space Applications Abstract : Currently, faults suffered by SRAM memory systems have increased due to the aggressive CMOS integration density. Thus, the probability of occurrence of single-cell upsets (SCUs) or multiple-cell upsets (MCUs) augments. One of the main causes of MCUs in space applications is cosmic radiation. A common solution is the use of error correction codes (ECCs). Nevertheless, when using ECCs in space applications, they must achieve a good balance between error coverage and redundancy, and their encoding/decoding circuits must be efficient in terms of area, power, and delay. Different codes have been proposed to tolerate MCUs. For instance, Matrix codes use Hamming codes and parity checks in a bi-dimensional layout to correct and detect some patterns of MCUs. Recently presented, column–line–code (CLC) has been designed to tolerate MCUs in space applications. CLC is a modified Matrix code, based on extended Hamming codes and parity checks. Nevertheless, a common property of these codes is the high redundancy introduced. In this paper, we present a series of new low
  • 7. redundant ECCs able to correct MCUs with reduced area, power, and delay overheads. Also, these new codes maintain, or even improve, memory error coverage with respect to Matrix and CLC codes. HIGH SPEED DATA TRANSMISSION VLSI_IEEE_04 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/- TOPIC : Approximate Error Detection With Stochastic Checkers Abstract : Designing reliable systems, while eschewing the high overheads of conventional fault tolerance techniques, is a critical challenge in the deeply scaled CMOS and post CMOS era. To address this challenge, we leverage the intrinsic resilience of application domains such as multimedia, recognition, mining, search, and analytics where acceptable outputs are produced despite occasional approximate computations. We propose stochastic checkers (checkers designed using stochastic logic) as a new approach to performing error checking in an approximate manner at greatly reduced overheads. Stochastic checkers are inherently inaccurate and require long latencies for computation. To limit the loss in error coverage, as well as false positives (correct outputs flagged as erroneous), caused due to the approximate nature of stochastic checkers, we propose input permuted partial replicas of stochastic logic, which improves their accuracy with minimal increase in overheads. To address the challenge of long error detection latency, we propose progressive checking policies that provide an early decision based on a prefix of the checker’s output bit stream. This technique is further enhanced by employing progressively accurate binary-to-stochastic converters.
Across a suite of error-resilient applications, we observe that stochastic checkers lead to greatly reduced overheads (29.5% area and 21.5% power, on average) compared with traditional fault tolerance techniques while maintaining high coverage and very low false positives. VLSI_IEEE_06 (BACK-END) SOFTWARE : TANNER EDA STUDENT COST MRP: RS. 8000/- TOPIC : A 0.65-V, 500-MHz Integrated Dynamic and Static RAM for Error Tolerant Applications Abstract : The diminishing returns provided by voltage scaling have led to a recent paradigm shift toward so-called “approximate computing,” where computation accuracy is traded off for cost in error-tolerant applications. In this paper, a novel approach to achieving the power–performance–area versus data integrity tradeoff is proposed by integrating robust static memory cells and error-prone dynamic cells within a single array. In addition, the resulting integrated dynamic and static random access memory (iD-SRAM) provides the ability to trade off power consumption and accuracy on-the-fly according to the current conditions and operating mode. A 4-kB iD-SRAM array was implemented in a low-power, 65-nm CMOS technology, providing as much as an 80% power reduction and a 20% area reduction as compared with standard approaches, when applied to a video decoder at 500 MHz.
  • 8. VLSI_IEEE_07 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 12000/- TOPIC : Efficient FPGA Mapping of Pipeline SDF FFT Cores Abstract : In this paper, an efficient mapping of the pipeline single-path delay feedback (SDF) fast Fourier transform (FFT) architecture to field-programmable gate arrays (FPGAs) is proposed. By considering the architectural features of the target FPGA, significantly better implementation results are obtained. This is illustrated by mapping an R2²SDF 1024-point FFT core toward both Xilinx Virtex-4 and Virtex-6 devices. The optimized FPGA mapping is explored in detail. Algorithmic transformations that allow a better mapping are proposed, resulting in implementation achievements that by far outperform earlier published work. For Virtex-4, the results show a 350% increase in throughput per slice and 25% reduction in block RAM (BRAM) use, with the same amount of DSP48 resources, compared with the best earlier published result. The resulting Virtex-6 design sees even larger increases in throughput per slice compared with Xilinx FFT IP core, using half as many DSP48E1 blocks and fewer BRAM resources. The results clearly show that the FPGA mapping is crucial, not only the architecture and algorithm choices. VLSI_IEEE_08 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/- TOPIC : Algorithm and Architecture Design of Adaptive Filters With Error Nonlinearities Abstract : This paper presents a framework based on the logarithmic number system to implement adaptive filters with error nonlinearities in hardware.
The framework is demonstrated through pipelined implementations of two recently proposed adaptive filtering algorithms based on logarithmic cost, namely, least mean logarithmic square (LMLS) and least logarithmic absolute difference (LLAD). To the best of our knowledge, the proposed architectures are the first attempts to implement both LMLS and LLAD algorithms in hardware. We derive error computing algorithms to realize the nonlinear error functions for LMLS and LLAD and map them onto hardware. We also propose a novel variable-α scheme to enhance the original LMLS algorithm and prove its robustness and suitability for VLSI implementations in practical applications. Detailed bit width and error analysis are carried out for the proposed VLSI fixed-point implementations. Post-layout implementation results show that with an additional multiplier over conventional least mean square (LMS), 7-dB improvement in steady-state mean square deviation performance can be achieved and with the proposed variable-α scheme, 12-dB improvement can be achieved without compromising the convergence. We will show that LMLS can potentially replace LMS in practical applications, by demonstrating a proof-of-concept by extending the framework to transform domain adaptive filters.
  • 9. VLSI_IEEE_17 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 16000/- TOPIC : Design and FPGA Implementation of a Reconfigurable Digital Down Converter for Wideband Applications Abstract : This brief presents a field-programmable gate array-based implementation of a reconfigurable digital down converter (DDC) that can process input bandwidth of up to 3.6 GHz and provide a flexible down-converted output. The proposed DDC consists of a mixer and a re-sampling filter. The re-sampling filter can work at much higher clock rate. The reason is that all the single-cycle recursive loops in the re-sampling filter are pipelined by using either real/imaginary part-time multiplexing or parallel processing technique. With features like arbitrary sampling rate conversion, and dynamic configuration, the proposed design is highly flexible, so that it can generate a down-converted output with sampling rate, selectable within the range of 1 kS/s–225 MS/s. Moreover, the flexibility is further improved by being able to specify the output sampling rate and center frequency to a resolution of less than 1 S/s. The experimental results show that the proposed design can achieve the same functionality as the existing work but with fewer hardware resources. VLSI_IEEE_18 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 16000/- TOPIC : The Implementation of the Improved OMP for AIC Reconstruction Based on Parallel Index Selection Abstract : Sparse signal recovery becomes extremely challenging for a variety of real-time applications.
In this paper, we improve the orthogonal matching pursuit (OMP) algorithm based on parallel correlation indices selection mechanism in each iteration and Goldschmidt algorithm. Simulation results show that the improved OMP algorithm with a reduced number of iterations and low hardware complexity of matrix operations has higher success rate and recovery signal-to-noise-ratio (RSNR) for sparse signal recovery. This paper presents an efficient complex valued system hardware architecture of the recovery algorithm for analog-to-information structure based on compressive sensing. The proposed architecture is implemented and validated on the Xilinx Virtex-6 field-programmable gate array (FPGA) for signal reconstruction with N = 1024, K = 36, and M = 256. The implementation results showed that the improved OMP algorithm achieved a higher RSNR of 31.04 dB compared with the original OMP algorithm. This synthesized design consumes a few percent of the hardware resources of the FPGA chip with the clock frequency of 135.4 MHz and reconstruction time of 170 µs, which is faster than the existing design.
  • 10. VLSI_IEEE_20 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 12000/- TOPIC : Approximate Hybrid High Radix Encoding for Energy-Efficient Inexact Multipliers Abstract : Approximate computing forms a design alternative that exploits the intrinsic error resilience of various applications and produces energy-efficient circuits with small accuracy loss. In this paper, we propose an approximate hybrid high radix encoding for generating the partial products in signed multiplications that encodes the most significant bits with the accurate radix-4 encoding and the least significant bits with an approximate higher radix encoding. The approximations are performed by rounding the high radix values to their nearest power of two. The proposed technique can be configured to achieve the desired energy–accuracy tradeoffs. Compared with the accurate radix-4 multiplier, the proposed multipliers deliver up to 56% energy and 55% area savings, when operating at the same frequency, while the imposed error is bounded by a Gaussian distribution with near-zero average. Moreover, the proposed multipliers are compared with state-of-the-art inexact multipliers, outperforming them by up to 40% in energy consumption, for similar error values. Finally, we demonstrate the scalability of our technique. VLSI_IEEE_23 (BACK-END) SOFTWARE : TANNER EDA STUDENT COST MRP: RS. 8000/- TOPIC : Low Phase Noise Ku-Band VCO With Optimal Switched-Capacitor Bank Design Abstract : In this brief, a low phase noise Ku-band voltage-controlled oscillator (VCO) fabricated in a 130-nm BiCMOS process is presented.
The phase noise mechanism of the switched-capacitor bank is analyzed, an optimum bank design to reduce phase noise is proposed, and a tradeoff with tuning range is discussed. The prototype 12.2–13.1-GHz VCO achieves a measured phase noise of −120.6 dBc/Hz at 1-MHz offset when running at 12.67 GHz. The VCO core consumes a power of 17.7 mW and attains a figure of merit of 190. VLSI_IEEE_24 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/- TOPIC : A High-Accuracy Programmable Pulse Generator With a 10-ps Timing Resolution Abstract : Automatic test equipment must have high-precision and low-power pulse generators (PGs) for testing memory and device-under-test ICs. This paper describes a high-accuracy and wide-data-rate-range PG with a 10-ps time resolution. The PG comprises an edge combiner (EC) and a multiphase clock generator (MPCG). The EC can produce an arbitrary waveform through 32 phase outputs of the MPCG. The EC adopts a one/zero detector and phase selection logic to define an operational data rate range and a timing resolution, respectively. Therefore, the EC uses the phase selection logic to combine the period window of the one/zero detector with the MPCG output phases. The EC also uses a countdown counter for a wide operational range. In the MPCG, a multiphase oscillator (MPO) adopts a ring oscillator scheme with sub feedback loops to extend its maximum operational frequency. The MPO also uses a phase error corrector to reduce the output phase error resulting from process and layout mismatches. Thus, the PG can obtain high accuracy waveforms owing to small phase errors. The test chip was implemented using a 0.13-µm CMOS process. The core area and power consumption of the PG were measured to be 250 × 300 µm² and 18.7 mW, respectively. The data rate range of the PG was determined to be from 3.2 kHz to 893 MHz. The time
  • 11. resolution and average accuracy of the PG were measured to be 10 ps and ±0.3 LSB, respectively. VLSI_IEEE_32 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 20000/- TOPIC : A Variable-Size FFT Hardware Accelerator Based on Matrix Transposition Abstract : Fast Fourier transform (FFT) is the kernel and the most time-consuming algorithm in the domain of digital signal processing, and the FFT sizes of different applications are very different. Therefore, this paper proposes a variable-size FFT hardware accelerator, which fully supports the IEEE-754 single-precision floating-point standard and the FFT calculation with a wide size range from 2 to 2^20 points. First, a parallel Cooley–Tukey FFT algorithm based on matrix transposition (MT) is proposed, which can efficiently divide a large size FFT into several small size FFTs that can be executed in parallel. Second, guided by this algorithm, the FFT hardware accelerator is designed, and several FFT performance optimization techniques such as hybrid twiddle factor generation, multibank data memory, block MT, and token-based task scheduling are proposed. Third, its VLSI implementation is detailed, showing that it can work at 1 GHz with the area of 2.4 mm² and the power consumption of 91.3 mW at 25 °C, 0.9 V. Finally, several experiments are carried out to evaluate the proposal’s performance in terms of FFT execution time, resource utilization, and power consumption. Comparative experiments show that our FFT hardware accelerator achieves at most 18.89× speedups in comparison to two software-only solutions and two hardware dedicated solutions.
VLSI_IEEE_33 (BACK-END) SOFTWARE : TANNER EDA STUDENT COST MRP: RS. 8000/-
TOPIC : A 12-bit 40-MS/s SAR ADC With a Fast-Binary-Window DAC Switching Scheme
Abstract : This paper presents a 12-bit 40-MS/s successive approximation register analog-to-digital converter (ADC) for ultrasound imaging systems. By incorporating a fast binary window digital-to-analog converter (DAC) switching technique, the problematic most significant bit transition glitch was removed to improve linearity without increasing the input capacitance or using a calibration scheme. A hybrid DAC was also developed to overcome the yield problem that occurs when a tiny unit capacitance is used in the DAC. Moreover, a reference buffer was used to accelerate the DAC settling to achieve high-speed conversion. The prototype ADC was fabricated using a 130-nm CMOS technology. The ADC core occupied an active area of 0.1 mm² and consumed a total power of 1.32 mW when a 1.2-V supply was used at a conversion rate of 40 MS/s. The measured peak signal-to-noise-and-distortion ratio and spurious-free dynamic range were 64 and 77.5 dB, respectively. The peak effective number of bits was 10.33, which is equivalent to a Walden figure-of-merit of 25.6 fJ/conversion step.
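The figures quoted for VLSI_IEEE_33 can be cross-checked with the standard definitions ENOB = (SNDR − 1.76) / 6.02 and Walden FoM = P / (2^ENOB × fs). The short Python sketch below does that arithmetic; the function names are chosen here for illustration, and the input values are the ones stated in the abstract.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (standard ideal-ADC relation)."""
    return (sndr_db - 1.76) / 6.02

def walden_fom(power_w, enob, fs_hz):
    """Walden figure of merit in joules per conversion step."""
    return power_w / (2 ** enob * fs_hz)

if __name__ == "__main__":
    enob = enob_from_sndr(64.0)               # SNDR quoted in the abstract
    fom = walden_fom(1.32e-3, 10.33, 40e6)    # power, ENOB, sample rate quoted
    print(f"ENOB from 64 dB SNDR: {enob:.2f} bits")
    print(f"Walden FoM: {fom * 1e15:.1f} fJ/conversion-step")
```

With the quoted 1.32 mW, 10.33 bits, and 40 MS/s, the result rounds to the 25.6 fJ/conversion step given in the abstract.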
VLSI_IEEE_34 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/-
TOPIC : Combating Data Leakage Trojans in Commercial and ASIC Applications With Time-Division Multiplexing and Random Encoding
Abstract : Globalization of microchip fabrication opens the possibility for an attacker to insert hardware Trojans into a chip during the manufacturing process. While most defensive methods focus on detection or prevention, a recent method, called Randomized Encoding of Combinational Logic for Resistance to Data Leakage (RECORD), uses data randomization to prevent hardware Trojans from leaking meaningful information even when the entire design is known to the attacker. Both RECORD and its sequential variant require significant area and power overhead. In this paper, a Time-Division Multiplexed version of the RECORD design process is proposed which reduces area overhead by 63% and power by 56%. This time-division multiplexing (TDM) concept is further refined to allow commercial off-the-shelf (COTS) products and IP cores to be safely operated from a separate chip. These new methods trade off latency (5.3× for TDM and 3.9× for COTS) and energy use to accomplish area and power savings and achieve greater security than the original RECORD process.
VLSI_IEEE_35 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/-
TOPIC : A 3.2-GHz Supply Noise-Insensitive PLL Using a Gate-Voltage-Boosted Source-Follower Regulator and Residual Noise Cancellation
Abstract : In this brief, we propose a supply noise-insensitive charge pump phase-locked loop (PLL) using a source-follower (SF) regulator and noise cancellation.
In order to minimize the voltage drop of the SF regulator while improving supply rejection, a gate-voltage-boosting technique and the body-controlled noise cancellation are proposed. To suppress the phase noise from the ring oscillator, a reference multiplier is employed to maximize the PLL loop bandwidth. Implemented in 65-nm CMOS, a prototype PLL at 3.2 GHz achieves supply noise spur of less than −33 dBc for a 50-mVpp supply noise around the loop bandwidth while consuming 3.12 mW from a 1-V supply.
VLSI_IEEE_37 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 12000/-
TOPIC : Low-Complexity VLSI Design of Large Integer Multipliers for Fully Homomorphic Encryption
Abstract : Large integer multiplication has been widely used in fully homomorphic encryption (FHE). Implementing feasible large integer multiplication hardware is thus critical for accelerating the FHE evaluation process. In this paper, a novel and efficient operand reduction scheme is proposed to reduce the area requirement of radix-r butterfly units. We also extend the single-port, merged-bank memory structure to the design of number theoretic transform (NTT) and inverse NTT (INTT) for further area minimization. In addition, an efficient memory addressing scheme is developed to support both NTT/INTT and resolving carries computations. Experimental results reveal that significant area reductions can be achieved for the targeted 786 432- and 1 179 648-bit NTT-based multipliers designed using the proposed schemes in comparison with the related works. Moreover, the two multiplications can be accomplished in 0.196 and 2.21 ms, respectively, based on 90-nm CMOS technology. The low-complexity feature of the proposed large integer multiplier designs is thus obtained without sacrificing the
time performance.
VLSI_IEEE_38 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/-
TOPIC : Algorithm and VLSI Architecture Design of Proportionate-Type LMS Adaptive Filters for Sparse System Identification
Abstract : Proportionate-type normalized LMS (Pt-NLMS) family of adaptive filtering algorithms for sparse system identification pose significant implementation challenges due to their high computational complexity, especially for real-time applications like network echo cancelation. In this paper, we make the first attempt to implement Pt-NLMS algorithms in hardware. Several reformulations are proposed to simplify the original Pt-NLMS algorithms, thereby making them amenable to real-time VLSI implementations, and the reformulated algorithms, referred to as the delayed µ-law proportionate LMS (DMPLMS) algorithm for white input and the delayed wavelet MPLMS (DWMPLMS) algorithm for colored input, are then implemented in hardware. Simulation studies demonstrate that the performance loss is very small for the proposed reformulations. We implemented the proposed designs considering 16-bit fixed point representation in hardware, and synthesis results show that the DMPLMS architecture with ≈30% increase in hardware over the state-of-the-art conventional delayed LMS architecture achieves 3× improvement in convergence rate for white input and the DWMPLMS architecture with ≈70% increase in hardware achieves 10× improvement in convergence rate for correlated input conditions.
VLSI_IEEE_41 (BACK-END) SOFTWARE : TANNER EDA STUDENT COST MRP: RS.
10000/-
TOPIC : A Fast-Locking, Low-Jitter Pulse width Control Loop for High-Speed ADC
Abstract : A fast-locking, high-precision, and low-jitter pulse width control loop (PWCL) for high-speed high-resolution analog-to-digital converter is presented. By adjusting the duty cycle solely through the delay of the rising edge, the clock jitter can be greatly suppressed. An improved charge pump with a follower circuit and self-biased loop was designed to decrease the voltage ripples for higher accuracy and lower jitter. A startup circuit was adopted to enable the pulse width control loop to lock rapidly. With the SMIC 0.18 µm 3.3 V CMOS process, the simulation and measured results show that within 180 ns the PWCL can lock the clock duty cycle to an accuracy of 50 ± 1% with 10%–90% input duty cycle from 50 to 550 MHz. The rms-jitter is 73 fs at 250 MHz. The active area is about 0.023 mm².
AREA EFFICIENT/ TIMING & DELAY REDUCTION
VLSI_IEEE_03 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/-
TOPIC : A Residue-to-Binary Converter for the Extended Four-Moduli Set {2^n − 1, 2^n + 1, 2^(2n) + 1, 2^(2n+p)}
Abstract : This brief presents a residue-to-binary converter for the moduli set {2^n − 1, 2^n + 1, 2^(2n) + 1, 2^(2n+p)}, where n is a positive integer and 0 ≤ p ≤ n − 2. The converter consists of three simplified 4n-bit carry-save adders (CSAs) along with a modulo (2^(4n) − 1) adder. The main contribution of this brief is reducing the requirements of the proposed CSA network, which has impacted the area, delay, power and energy. Compared with four-moduli and five-moduli sets that have the dynamic range 2^v(2^(4n) − 1), where v = n or 2n, the proposed converter resulted in the average area, delay, power, and energy reductions of 22.7%, 9.2%, 17.8%, and 24.5%, respectively. Moreover, the throughput rate per unit area has been improved by an average of 48.7%.
VLSI_IEEE_12 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/-
TOPIC : An Efficient Fault-Tolerance Design for Integer Parallel Matrix–Vector Multiplications
Abstract : Parallel matrix processing is a typical operation in many systems, and in particular matrix–vector multiplication (MVM) is one of the most common operations in the modern digital signal processing and digital communication systems. This paper proposes a fault-tolerant design for integer parallel MVMs. The scheme combines ideas from error correction codes with the self-checking capability of MVM.
Field-programmable gate array evaluation shows that the proposed scheme can significantly reduce the overheads compared to the protection of each MVM on its own. Therefore, the proposed technique can be used to reduce the cost of providing fault tolerance in practical implementations.
VLSI_IEEE_15 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/-
TOPIC : Extending 3-bit Burst Error-Correction Codes With Quadruple Adjacent Error Correction
Abstract : The use of error-correction codes (ECCs) with advanced correction capability is a common system-level strategy to harden the memory against multiple bit upsets (MBUs). Therefore, the construction of ECCs with advanced error correction and low redundancy has become an important problem, especially for adjacent ECCs. Existing codes for mitigating MBUs mainly focus on the correction of up to 3-bit burst errors. As the technology scales and cell interval distance decreases, the number of affected bits can easily extend to more than 3 bits. The previous methods are therefore not enough to satisfy the reliability requirement of the applications in harsh environments. In this paper, a technique to extend 3-bit burst error-correction (BEC) codes with quadruple adjacent error correction (QAEC) is presented. First, the design rules are specified and then a searching algorithm is developed to find the codes that comply with those rules. The H matrices of the 3-bit BEC with QAEC obtained are presented. They do not require additional parity check bits compared with a 3-bit BEC code. By applying the new algorithm to previous 3-bit BEC codes, the performance of 3-bit BEC is also remarkably improved. The encoding and decoding procedure of the proposed codes is illustrated
with an example. Then, the encoders and decoders are implemented using a 65-nm library and the results show that our codes have moderate total area and delay overhead to achieve the correction ability extension.
VLSI_IEEE_19 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/-
TOPIC : A 588-Gb/s LDPC Decoder Based on Finite-Alphabet Message Passing
Abstract : An ultrahigh throughput low-density parity-check (LDPC) decoder with an unrolled full-parallel architecture is proposed, which achieves the highest decoding throughput compared to previously reported LDPC decoders in the literature. The decoder benefits from a serial message-transfer approach between the decoding stages to alleviate the well-known routing congestion problem in parallel LDPC decoders. Furthermore, a finite-alphabet message passing algorithm is employed to replace the VN update rule of the standard min-sum (MS) decoder with lookup tables, which are designed in a way that maximizes the mutual information between decoding messages. The proposed algorithm results in an architecture with reduced bit-width messages, leading to a significantly higher decoding throughput and to a lower area compared to an MS decoder when serial message transfer is used. The architecture is placed and routed for the standard MS reference decoder and for the proposed finite-alphabet decoder using a custom pseudo hierarchical backend design strategy to further alleviate routing congestion and to handle the large design.
Post layout results show that the finite-alphabet decoder with the serial message transfer architecture achieves a throughput as large as 588 Gb/s with an area of 16.2 mm² and dissipates an average power of 22.7 pJ per decoded bit in a 28-nm fully depleted silicon on insulator library. Compared to the reference MS decoder, this corresponds to 3.1 times smaller area and 2 times better energy efficiency.
VLSI_IEEE_21 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/-
TOPIC : Basic-Set Trellis Min–Max Decoder Architecture for Nonbinary LDPC Codes With High-Order Galois Fields
Abstract : Nonbinary low-density parity-check (NB-LDPC) codes outperform their binary counterparts in terms of error correction performance. However, the drawback of NB-LDPC decoders is high complexity, especially for the check node unit (CNU), and the complexity increases considerably when increasing the Galois-field (GF) order. In this paper, a novel basic-set trellis min–max algorithm is proposed to greatly reduce not only the CNU complexity but also the number of messages exchanged between the check node and the variable node compared with previous studies, which is highly efficient for higher order GFs. In addition, the proposed CNU is designed to compute the messages in a parallel way. Layered decoder architectures based on the proposed algorithm were implemented for the (837, 726) NB-LDPC code over GF(32) and the (1512, 1323) code over GF(64) using 90-nm CMOS technology, and obtained a reduction in the complexity by 30% and 37% for the CNU, and 40% and 37.4% for the whole decoder, respectively. Moreover, the proposed decoder achieves a higher throughput at 1.67 Gbit/s and 1.4 Gbit/s compared with the other state-of-the-art high-rate NB-LDPC decoders with high-order GFs.
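To give a feel for why the check node unit in VLSI_IEEE_21 grows expensive with the field order, here is a brute-force Python sketch of the plain min–max check-node rule. It is not the basic-set trellis algorithm of the paper. It rests on several simplifying assumptions: symbols of GF(2^p) are represented as integers so that field addition is bitwise XOR, all parity-check coefficients are taken as 1, and a message is a list of reliabilities in which a smaller value means a more likely symbol. The function names are hypothetical.

```python
def minmax_combine(m1, m2):
    """Elementary min-max step: out[c] = min over a ^ b == c of max(m1[a], m2[b])."""
    q = len(m1)
    assert len(m2) == q and q & (q - 1) == 0  # field size must be a power of two
    out = [float("inf")] * q
    for a in range(q):
        for b in range(q):
            c = a ^ b  # GF(2^p) addition under the integer representation
            out[c] = min(out[c], max(m1[a], m2[b]))
    return out

def check_node_output(inputs):
    """Fold the elementary step over all incoming messages except the target edge."""
    acc = inputs[0]
    for m in inputs[1:]:
        acc = minmax_combine(acc, m)
    return acc

if __name__ == "__main__":
    # Two incoming messages over GF(4); index = symbol, value = reliability.
    print(check_node_output([[0, 1, 2, 3], [0, 2, 1, 3]]))
```

Each elementary step visits all q² symbol pairs, so the cost rises quickly from GF(32) to GF(64); that growth is the motivation the abstract gives for a reduced-complexity check node.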
VLSI_IEEE_22 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 12000/-
TOPIC : Analysis and Design of Cost-Effective, High-Throughput LDPC Decoders
Abstract : This paper introduces a new approach to cost-effective, high-throughput hardware designs for low-density parity-check (LDPC) decoders. The proposed approach, called nonsurjective finite alphabet iterative decoders (NS-FAIDs), exploits the robustness of message-passing LDPC decoders to inaccuracies in the calculation of exchanged messages, and it is shown to provide a unified framework for several designs previously proposed in the literature. NS-FAIDs are optimized by density evolution for regular and irregular LDPC codes, and are shown to provide different tradeoffs between hardware complexity and decoding performance. Two hardware architectures targeting high-throughput applications are also proposed, integrating both Min-Sum (MS) and NS-FAID decoding kernels. ASIC post-synthesis implementation results on 65-nm CMOS technology show that NS-FAIDs yield significant improvements in the throughput to area ratio, by up to 58.75% with respect to the MS decoder, with even better or only slightly degraded error correction performance.
VLSI_IEEE_26 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 12000/-
TOPIC : ULV-Turbo Cache for an Instantaneous Performance Boost on Asymmetric Architectures
Abstract : An asymmetric architecture is commonly used in modern embedded systems to reduce energy consumption. The systems tend to execute more applications in the energy-efficient core, which typically employs ultralow voltage (ULV) to save energy.
However, caches become a reliability and performance barrier that limits the minimum operating voltage and blocks system performance in the ULV environment. The poor performance of an ultralow-voltage core causes most workload requirements to awaken and then execute on the host core, leading to high energy consumption. In this paper, we propose a ULV-Turbo cache based on a ULV-selective-ally 8T static random access memory (SRAM) that is able to perform reliable ultralow-voltage operation and provide the speedup function of SRAM rows ally. The system is able to speed up the ULV core instantaneously and execute more applications with the ULV-Turbo cache. In our system-wide evaluation based on a real attitude and heading reference system workload on an asymmetric wearable system, the ULV-Turbo cache reduces the energy consumption of the entire system by approximately 36%.
VLSI_IEEE_27 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 12000/-
TOPIC : Low-Complexity Methodology for Complex Square-Root Computation
Abstract : In this brief, we propose a low-complexity methodology to compute a complex square root using only a circular coordinate rotation digital computer (CORDIC) as opposed to the state-of-the-art techniques that need both circular as well as hyperbolic CORDICs. Subsequently, an architecture has been designed based on the proposed methodology and implemented on the ASIC platform using the UMC 180-nm Technology node with 1.0 V at 5 MHz. Field programmable gate array (FPGA) prototyping using Xilinx Virtex-6 (XC6VLX240T) has also been carried out. After thorough theoretical analysis and experimental validations, it can be inferred that the proposed methodology reduces slice lookup tables by 21.15% (on the FPGA platform), saves 20.25% silicon area overhead, and decreases power consumption by 19% (on the ASIC platform) when compared with the state-of-the-art method without compromising the computational speed, throughput, and accuracy.
VLSI_IEEE_28 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/-
TOPIC : Securing the PRESENT Block Cipher Against Combined Side-Channel Analysis and Fault Attacks
Abstract : In this paper, we present and evaluate a hardware implementation of the PRESENT block cipher secured against both side-channel analysis and fault attacks (FAs). The side-channel security is provided by the first-order threshold implementation masking scheme of the serialized PRESENT proposed by Poschmann et al.
For the FA resistance, we employ the Private Circuits II countermeasure presented by Ishai et al. at Eurocrypt 2006, which we tailor to resist arbitrary 1-bit faults. We perform a side-channel evaluation using the state-of-the-art leakage detection tests, quantify the resource overhead of the Private Circuits II countermeasure, subject the implementation to established differential FAs against the PRESENT block cipher, and contemplate on the structural resistance of the countermeasure. This paper provides the detailed instructions on how to successfully achieve a secure Private Circuits II implementation for the data path as well as the control logic.
VLSI_IEEE_39 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/-
TOPIC : Multilevel Half-Rate Phase Detector for Clock and Data Recovery Circuits
Abstract : In this brief, a half-rate (HR) bang-bang (BB) phase detector (PD) with multiple decision levels is proposed for clock and data recovery (CDR) circuits. The combination allows the oscillator to run at half the input data rate while providing information about the sign and magnitude of the phase shift between the PD inputs. This allows a finer control of the frequency of the oscillator in the phase-locked loop (PLL) of the CDR circuit, which results in up to 30% less output clock jitter than with a conventional two-level HR BB PD. Thanks to this, the bit error rate can be decreased by up to 5× in a 5-Gb/s CDR circuit. The proposed topology was implemented in a 28-nm FDSOI CMOS technology providing average power consumption below 76 µW with a supply voltage of 1 V. Although multilevel (ML) BB PDs have already been proposed in some PLL-based CDR with very interesting results, a specific design of the PD has to be implemented for an HR system. This brief provides the first ML-HR-BBPD.
VLSI_IEEE_40 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 12000/-
TOPIC : Fast Neural Network Training on FPGA Using Quasi-Newton Optimization Method
Abstract : In this brief, a customized and pipelined hardware implementation of the quasi-Newton (QN) method on field-programmable gate array (FPGA) is proposed for fast onsite training of artificial neural networks, targeting embedded applications. The architecture is scalable to cope with different neural network sizes while it supports batch-mode training. Experimental results demonstrate the superior performance and power efficiency of the proposed implementation over CPU, graphics processing unit, and FPGA QN implementations.
VLSI_IEEE_42 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/-
TOPIC : Feedback-Based Low-Power Soft-Error-Tolerant Design for Dual-Modular Redundancy
Abstract : Triple-modular redundancy (TMR), which consists of three identical modules and a voting circuit, is a common architecture for soft-error tolerance. However, the original TMR suffers from two major drawbacks: the large area overhead and the vulnerability of the voter. In order to overcome these drawbacks, we propose a new complementary dual-modular redundancy (CDMR) scheme for mitigating the effect of soft errors. Inspired by the Markov random field (MRF) theory, a two-stage voting system is implemented in CDMR, including a first-stage optimal MRF structure and a second-stage high-performance merging unit.
The CDMR scheme can reduce the voting circuit area by 20% while saving the area of one redundant module, achieving at least 26% error-rate reduction at an ultralow supply voltage of 0.25 V with 8.33% faster timing compared to previous voter designs.
VLSI_IEEE_43 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 10000/-
TOPIC : A Simple Yet Efficient Accuracy Configurable Adder Design
Abstract : Approximate computing is a promising approach for low-power IC design and has recently received considerable research attention. To accommodate dynamic levels of approximation, a few accuracy-configurable adder (ACA) designs have been developed in the past. However, these designs tend to incur large area overheads as they rely on either redundant computing or complicated carry prediction. Some of these designs include error detection and correction circuitry, which further increase the area. In this paper, we investigate a simple ACA design that contains no redundancy or error detection/correction circuitry and uses very simple carry prediction. The simulation results show that our design dominates the latest previous work on accuracy-delay-power tradeoff while using 39% lower area. In the best case, the iso-delay power of our design is only 16% of that of an accurate adder, regardless of degradation in accuracy. One variant of this design provides finer-grained and larger tunability than that of the previous works. Moreover, we propose a delay-adaptive self-configuration technique to further improve the accuracy-delay-power tradeoff. The advantages of our method are confirmed by the applications in multiplication and discrete cosine transform computing.
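As a rough illustration of what "accuracy configurable" with "very simple carry prediction" can mean in VLSI_IEEE_43, the Python sketch below models a segmented adder in which each segment boundary either passes the true carry or a cheap predicted carry. This is a behavioural toy, not the design from the paper: the segment width, the `exact_mask` parameter, and the prediction rule (AND of the two most significant bits of the previous segment) are all assumptions made here.

```python
def segmented_add(a, b, width=16, seg=4, exact_mask=0):
    """Add two unsigned integers segment by segment, modulo 2**width.

    Bit i of exact_mask selects the carry leaving segment i:
    1 = true carry (accurate), 0 = predicted carry (approximate, shorter path).
    """
    mask = (1 << seg) - 1
    result, carry = 0, 0
    for i in range(width // seg):
        sa = (a >> (i * seg)) & mask
        sb = (b >> (i * seg)) & mask
        s = sa + sb + carry
        result |= (s & mask) << (i * seg)
        true_carry = s >> seg
        # Hypothetical predictor: carry is guessed from the segment's MSBs only.
        predicted = (sa >> (seg - 1)) & (sb >> (seg - 1)) & 1
        carry = true_carry if (exact_mask >> i) & 1 else predicted
    return result

if __name__ == "__main__":
    a, b = 0x00FF, 0x0001
    print(hex(segmented_add(a, b, exact_mask=0b1111)))  # fully accurate mode
    print(hex(segmented_add(a, b, exact_mask=0b0000)))  # fully approximate mode
```

Setting more bits in `exact_mask` lengthens the carry chain and raises accuracy; clearing them shortens the critical path at the cost of occasional errors, which is the accuracy-delay-power tradeoff the abstract refers to.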
Audio, Image and Video Processing
VLSI_IEEE_14 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 20000/-
TOPIC : An Energy-Efficient Programmable Manycore Accelerator for Personalized Biomedical Applications
Abstract : Wearable personalized health monitoring systems can offer a cost-effective solution for human health care. These systems must constantly monitor patients’ physiological signals and provide highly accurate and quick processing and delivery of the vast amount of data within a limited power and area footprint. These personalized biomedical applications require sampling and processing multiple streams of physiological signals with a varying number of channels and sampling rates. The processing typically consists of feature extraction, data fusion, and classification stages that require a large number of digital signal processing (DSP) and machine learning (ML) kernels. In response to these requirements, in this paper, a tiny, energy-efficient, and domain-specific manycore accelerator referred to as power-efficient nanoclusters (PENC) is proposed to map and execute the kernels of these applications. Simulation results show that the PENC is able to reduce energy consumption by up to 80% and 25% for DSP and ML kernels, respectively, when optimally parallelized.
In addition, we fully implemented three compute-intensive personalized biomedical applications, namely, multichannel seizure detection, multiphysiological stress detection, and standalone tongue drive system (sTDS), to evaluate the proposed manycore performance relative to commodity embedded CPU, graphical processing unit (GPU), and field programmable gate array (FPGA)-based implementations. For these three case studies, the energy consumption and the performance of the proposed PENC manycore, when acting as an accelerator along with an Intel Atom processor as a host, are compared with the existing commercial off-the-shelf general purpose, customizable, and programmable embedded platforms, including Intel Atom, Xilinx Artix-7 FPGA, and NVIDIA TK1 ARM-A15 and K1 GPU system on a chip. For these applications, the PENC manycore is able to significantly improve throughput and energy efficiency by up to 1872× and 276×, respectively. For the most computationally intensive application of seizure detection, the PENC manycore is able to achieve a throughput of 15.22 giga-operations-per-second (GOPs), which is a 14× improvement in throughput over a custom FPGA solution. For stress detection, the PENC achieves a throughput of 21.36 GOPs and an energy efficiency of 4.23 GOP/J, which is 14.87× and 2.28× better over FPGA implementation, respectively. For the sTDS application, the PENC improves throughput by 5.45× and energy efficiency by 2.37× over FPGA implementation.
VLSI_IEEE_16 (FRONT-END) SOFTWARE: MODELSIM & XILINX STUDENT COST MRP: RS. 18000/-
TOPIC : VLSI Design of an ML-Based Power-Efficient Motion Estimation Controller for Intelligent Mobile Systems
Abstract : In this paper, a machine learning (ML)-based power-efficient motion estimation (ME) controller algorithm and VLSI architecture incorporating coding bandwidth and rate distortion (R-D) cost using convex optimization are proposed to effectuate a smart and bandwidth-efficient ME design for intelligent mobile systems. To be smart and adapt to time-altering coding bandwidth using intelligent power-management techniques in modern application processor systems, we first propose an ML-based bandwidth-on-demand ME controller algorithm based on the convex optimization method to resolve the lack of an awareness of coding bandwidth in prior ME designs. Then, a hardware-friendly and power-efficient VLSI architecture is developed to implement an intelligent, high-performance, and low-power ME controller design that can be combined with prior ME designs to satisfy the bandwidth-efficient ME design target under bandwidth constraints. The final implementation results show that the proposed smart ME controller architecture using our proposed bandwidth control scheme costs 0.816K gate counts, consumes 0.873 mW of power at a working frequency of 1.1 GHz with Taiwan Semiconductor Manufacturing Company (TSMC) 90-nm CMOS technology, and achieves an average bandwidth reduction of 56.08% compared with previous non-bandwidth-on-demand ME designs for high-definition (HD) videos.