panoramictech.com

EUVL Modeling With Panoramic Technology

Panoramic Technology has been the leader in EUVL modeling since 1999. Both HyperLith and EM-Suite are capable of simulating many aspects of EUVL.

Several Important Features Make Panoramic the Most Powerful

  • Fourier Boundary Condition
  • Ultra-large FDTD simulation domains (>2 billion cells, >100GB memory)
  • Distributed Computing
  • Hardware Acceleration
  • EUV Defect Generator
  • Calibrated EUV Resist Models
  • Advanced Resist Modeling Infrastructure (ARMI) for developing new EUV resist models (including stochastic models)
  • Experience & Support

What Panoramic Technology Offers

  • Simulation Software (EM-Suite, HyperLith, TEMPESTpr2, SimRunner/PSS/HSS, Resist, ARMI, SOAPI)
  • Consulting Services
  • Combination of Consulting and Software


We're smarter!

Cool


We're faster.

Our in-house programmers are amazing. They can implement and deploy new features in a flash, and they know all aspects of the code inside and out.


We're more motivated.

We are a small, employee-owned company.  Our employees are highly motivated and success-oriented.


We're more focused.

We're not some small part of a bigger company.  We're not trying to sell machines or EDA software.  We're focused on selling simulation software for advanced lithography research - and that's it!


We're smaller and more efficient.

(a nice way to say that we're frugal!)


We have a better business model.

We feel that being a small, independent company is the way to go. We spend less on advertising and more on development. We're not after a quick buck - we prefer slow and steady growth. We're not rapidly raising prices - we want our customers to be accustomed to our steady, reasonable prices. We offer a better simulator at a lower price.


Panoramic Technology Inc. has the following positions open:

Photoresist Modeling/Applications Engineer

Requirements

  • Photoresist modeling experience required.
  • Lithography simulation experience required.
  • Candidate should have published papers in the field of resist simulation and modeling at conferences such as SPIE Advanced Lithography and SPIE Photomask.

Position will involve

  • developing photoresist models
  • tuning resist parameter sets
  • working with customers on photoresist modeling and litho simulation in general
  • photoresist and lithography simulation research

Benefits

  • extremely exciting small-company atmosphere
  • ability to work from your home (even if you live in Texas, for example!)
  • or, working in Berkeley, CA
  • benefits:  retirement plan, health insurance
  • salary will be competitive, with significant bonus potential

At Panoramic Technology we have been continuously improving our lithography simulation software for over ten years. We're always at the forefront of the technology, implementing new features before our competitors (EUV, polarization issues in immersion, distributed computing, wafer topography, etc.).

We continue to maintain our vigorous pace of development and we continue to raise the bar for lithography research simulators.

Near Term Plans

  • Advanced Resist Modeling Infrastructure (ARMI) - gives the user the ability to do advanced resist model development "in-house".
    • PanTune - a general purpose "tuner" that can be used for resist model parameter calibration.
    • User-Written Resist Models (UWRM) - user can program their own custom resist models and insert them into the EM-Suite/HyperLith simulation infrastructure - as a direct peer to the existing resist models.
  • Continued resist modeling research - we are working with several customers on tuning resist parameter sets, and solving modeling issues with EUV and DUV resists.
  • Extend SOAPI (our MATLAB(TM)/Java(TM) API) to HyperLith
  • Improve Gazillion and PanOPC integration into HyperLith
  • Application notes, examples, training-videos, documentation
  • Wafer-topography/Double-Patterning research, modeling, GUI improvements
  • Develop "quasi-rigorous" mask models for EUV
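The UWRM idea above can be made concrete with a short sketch. Everything here is a hypothetical illustration: the MackResistModel class and develop_rate method are names invented for this example, not the actual EM-Suite/HyperLith UWRM API; only the development-rate expression (the standard Mack model) is an established formula.

```python
# Hypothetical illustration of a user-written resist model.
# The class/method names are invented for this sketch and are NOT
# the actual EM-Suite/HyperLith UWRM interface; the rate expression
# is the standard Mack development-rate model.

class MackResistModel:
    def __init__(self, r_max=100.0, r_min=0.1, m_th=0.5, n=2):
        self.r_max = r_max  # development rate of fully exposed resist (nm/s)
        self.r_min = r_min  # development rate of unexposed resist (nm/s)
        self.m_th = m_th    # threshold inhibitor concentration
        self.n = n          # dissolution selectivity (n > 1)

    def develop_rate(self, m):
        """Mack development rate for relative inhibitor concentration m in [0, 1]."""
        a = (self.n + 1) / (self.n - 1) * (1 - self.m_th) ** self.n
        return (self.r_max * (a + 1) * (1 - m) ** self.n
                / (a + (1 - m) ** self.n) + self.r_min)

model = MackResistModel()
print(model.develop_rate(0.0))  # fully exposed: rate approaches r_max
print(model.develop_rate(1.0))  # unexposed: rate falls to r_min
```

A real user-written model would presumably be evaluated by the simulator on the post-exposure grid; the point of the infrastructure is that such a model runs as a direct peer of the built-in resist models.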

Long Term Plans

  • Maintain the lead in power and flexibility
  • Incorporate new features as the technology advances
  • Continuously improve speed, and ability to simulate larger areas
  • Larger area (but not full-chip) OPC correction that is better suited to manufacturing
  • Continuous GUI improvements - ease of use without sacrificing power and flexibility is always a priority
  • Source-Mask-Optimization & Inverse Algorithms

Ultimately, we plan to become the dominant lithography "research" simulator. We feel this will happen because we deliver the most powerful, most flexible lithography simulator at a price that cannot be matched by our competitors. (see How and Why Can Panoramic Technology Offer the Best Lithography Research Simulator for Such an Amazing Low Price?)


The goal of this page is not to demonstrate the raw speed of the simulator, but to demonstrate the speed-up that can be obtained by using multiple cores, processors, and GPUs in different ways. You can run these simulations on your own machine (they are based on the examples that ship with the software) and see how your hardware compares to the machines we've tested.
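The speed-ups quoted in the comments below are simply ratios of effective cycle times against the single-core baseline. Here is a minimal Python sketch using the A-series numbers from the table (the cycle times come from the rows below; the script itself is only an illustration):

```python
# Effective cycle times (s/cycle) from benchmark rows A1-A4 below.
BASELINE = 187.0  # A1: one single-threaded PSS, no SimRunner

configs = {
    "A2 (1x 4-threaded PSS)": 95.0,
    "A3 (2x 1-threaded PSS)": 104.0,
    "A4 (2x 2-threaded PSS)": 61.0,
}

# Speedup = baseline cycle time / configuration cycle time.
speedups = {name: BASELINE / t for name, t in configs.items()}
for name, s in speedups.items():
    print(f"{name}: {s:.2f}X speedup")  # e.g. A4 comes out just over 3X
```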

Each benchmark entry below lists the simulation, the hardware, the PSS/HSS configuration (with the PSS/HSS license requirement in parentheses), the effective cycle time* in seconds per cycle, and a comment.

A1
  Simulation: Elbow.sim, 3GB, 3D EUV with Fourier Boundary Condition, non-complex
  Hardware: Box #1: 2x Opteron 285, 16GB DDR400
  Config: 1x 1-threaded-PSS (PSS/HSS licenses: 1/0)
  Effective cycle time*: 187 s/cycle
  Comment: Single-core (i.e. no SimRunner).

A2
  Simulation: same as A1
  Hardware: same as A1
  Config: 1x 4-threaded-PSS (PSS/HSS licenses: 4/0)
  Effective cycle time*: 95 s/cycle
  Comment: 4 cores give a 2X speedup with multi-threading (4 cores working on one job).

A3
  Simulation: same as A1
  Hardware: same as A1
  Config: 2x 1-threaded-PSS (PSS/HSS licenses: 2/0)
  Effective cycle time*: 104 s/cycle
  Comment: 2 cores give almost a 2X speedup with job distribution (2 cores working on two jobs independently). This is always more efficient than multi-threading, but requires more memory.

A4
  Simulation: same as A1
  Hardware: same as A1
  Config: 2x 2-threaded-PSS (PSS/HSS licenses: 4/0)
  Effective cycle time*: 61 s/cycle
  Comment: The combination of multi-threading and job distribution seems optimal: 4 cores give a 3X speedup, at the cost of memory for two simulations. This seems reasonable on the AMD dual-core architecture, where each processor (pair of cores) has its own memory controller and "close" memory.

A5
  Simulation: same as A1
  Hardware: same as A1
  Config: 1x SuperPSS{4x 1-threaded-PSS} (PSS/HSS licenses: 4/0)
  Effective cycle time*: 64 s/cycle
  Comment: Almost a 3X speedup with 4 cores, but uses less memory than A4. Much faster than A2.

A6
  Simulation: same as A1
  Hardware: Box #1: 2x Opteron 280, 16GB DDR400; Box #2: 2x Opteron 270, 16GB DDR400
  Config: 1x SuperPSS{8x 1-threaded-PSS} (PSS/HSS licenses: 8/0)
  Effective cycle time*: 60 s/cycle
  Comment: Not much faster than A4 or A5. Uses less memory per machine than A4.

A7
  Simulation: same as A1
  Hardware: same as A6
  Config: 2x SuperPSS{2x 2-threaded-PSS} (PSS/HSS licenses: 8/0)
  Effective cycle time*: 49 s/cycle
  Comment: The Opteron 270 machine is slower; if both machines were Opteron 285's, we would expect double the performance of A4.

A8
  Simulation: same as A1
  Hardware: Box #1: 2x Tesla C870
  Config: 1x 2-GPU-HSS (PSS/HSS licenses: 0/2)
  Effective cycle time*: 18 s/cycle
  Comment: Simulation fits entirely within two cards.

A9
  Simulation: same as A1
  Hardware: Box #1: 1x Tesla C870
  Config: 1x 1-GPU-HSS (PSS/HSS licenses: 0/1)
  Effective cycle time*: 29 s/cycle
  Comment: More than 2X faster than A4 (1 HSS license vs. 4 PSS licenses).

A10
  Simulation: same as A1
  Hardware: Box #1: 2x Tesla C870; Box #2: 2x Tesla C870
  Config: 2x 2-GPU-HSS (PSS/HSS licenses: 0/4)
  Effective cycle time*: 9 s/cycle
  Comment: Double the performance of A8 (running two cases at once).

A11
  Simulation: same as A1
  Hardware: same as A10
  Config: 1x SuperPSS{2x 2-GPU-HSS} (PSS/HSS licenses: 0/4)
  Effective cycle time*: 43 s/cycle
  Comment: Poor performance because of the communication overhead of SuperPSS.

B1
  Simulation: AltPSM_Contacts with pitch=2.2 (9.1GB)
  Hardware: Box #1: 2x Opteron 285, 16GB DDR366, 2x C870
  Config: 1x 1-GPU-HSS (PSS/HSS licenses: 0/1)
  Effective cycle time*: 162 s/cycle
  Comment: The DDR400 memory was slowed to 366MHz.

B2
  Simulation: same as B1
  Hardware: Box #2: 2x Intel 5440, 32GB DDR2 667, 1x C870
  Config: 1x 1-GPU-HSS (PSS/HSS licenses: 0/1)
  Effective cycle time*: 135 s/cycle
  Comment: This machine has faster memory than B1.

B3
  Simulation: same as B1
  Hardware: same as B2
  Config: 2x SuperPSS{4x 1-threaded-PSS} (PSS/HSS licenses: 8/0)
  Effective cycle time*: 288 s/cycle
  Comment: Using all 8 cores is slower than 1x Tesla C870 on the same machine (see B2).

B4
  Simulation: same as B1
  Hardware: Box #1: 2x Opteron 285, 16GB DDR366, 2x C870
  Config: 1x 2-GPU-HSS (PSS/HSS licenses: 0/2)
  Effective cycle time*: 101 s/cycle
  Comment: Using 2 C870's instead of 1 improves the cycle time from 162s to 101s. That is not the 2X speedup one might expect, but it is still a decent one (162/101 = 1.6X).

B5
  Simulation: same as B1
  Hardware: Box #2: 2x Intel 5440, 32GB DDR2 667, 1x C870
  Config: 1x 8-threaded-PSS (PSS/HSS licenses: 8/0)
  Effective cycle time*: 326 s/cycle
  Comment: See B3.

B6
  Simulation: same as B1
  Hardware: same as B5
  Config: 2x SuperPSS{2x 2-threaded-PSS} (PSS/HSS licenses: 8/0)
  Effective cycle time*: 314 s/cycle
  Comment: See B5 & B3.

B7
  Simulation: same as B1
  Hardware: same as B5
  Config: 1x SuperPSS{1x 8-threaded-PSS, 1x 1-GPU-HSS} (PSS/HSS licenses: 8/1)
  Effective cycle time*: 287 s/cycle
  Comment: Better to use the HSS alone; the PSS's can't help it, they only slow it down. See B2.

C1
  Simulation: AltPSM_Contacts, pitch=0.3 (169MB)
  Hardware: Box #2: 2x Intel 5440, 32GB DDR2 667, 1x 8800 GT-OC
  Config: 1x 1-GPU-HSS (PSS/HSS licenses: 0/1)
  Effective cycle time*: 1.57 s/cycle
  Comment: This is just a graphics card (8800 GT-OC) with 512MB of GDDR3 memory. The card was driving video during the simulation (it might be slightly faster without video).

C2
  Simulation: same as C1
  Hardware: Box #2: 2x Intel 5440, 32GB DDR2 667, 1x C870
  Config: 1x 1-GPU-HSS (PSS/HSS licenses: 0/1)
  Effective cycle time*: 1.29 s/cycle
  Comment: Compare to C1: the Tesla C870 beats the less expensive 8800 GT-OC even for a small simulation that fits entirely within the card's memory.

C3
  Simulation: same as C1
  Hardware: Box #2: 2x Intel 5440, 32GB DDR2 667
  Config: 1x 1-threaded-PSS (PSS/HSS licenses: 1/0)
  Effective cycle time*: 9.90 s/cycle
  Comment: The Tesla C870 is 7.67X faster than a single core of the Intel 5440; the 8800 GT-OC is only 6.3X faster.

C4
  Simulation: same as C1
  Hardware: Box #1: 2x Opteron 285, 16GB DDR366, 2x C870
  Config: 1x 1-threaded-PSS (PSS/HSS licenses: 1/0)
  Effective cycle time*: 9.90 s/cycle
  Comment: The older Opteron 285 is the same speed as the newer Intel 5440!?

C5
  Simulation: same as C1
  Hardware: same as C4
  Config: 1x 1-GPU-HSS (PSS/HSS licenses: 0/1)
  Effective cycle time*: 1.35 s/cycle
  Comment: The Tesla C870 on the Opteron 285 with 366MHz DDR is slower than on the Intel 5440 with DDR2 667MHz (as expected).

D1
  Simulation: AltPSM_Contacts, pitch=0.8 (1.2GB)
  Hardware: Box #1: 2x Opteron 285, 16GB DDR366, 2x C870
  Config: 1x 1-GPU-HSS (PSS/HSS licenses: 0/1)
  Effective cycle time*: 9.9 s/cycle
  Comment: Simulation fits entirely within the Tesla C870's 1.5GB memory.

D2
  Simulation: same as D1
  Hardware: same as D1
  Config: 1x 1-threaded-PSS (PSS/HSS licenses: 1/0)
  Effective cycle time*: 143 s/cycle
  Comment: Compare to D1: here the Tesla C870 is 14X faster than the Opteron 285 processor. This is the "sweet spot" for the C870 because the simulation is large but still fits inside the card.

D3
  Simulation: same as D1
  Hardware: Box #2: 2x Intel 5440, 32GB DDR2 667
  Config: 1x 1-threaded-PSS (PSS/HSS licenses: 1/0)
  Effective cycle time*: 83 s/cycle
  Comment: Here the newer Intel 5440 with DDR2 667MHz beats the older Opteron 285 with DDR 366MHz (as expected).

D4
  Simulation: same as D1
  Hardware: Box #2: 2x Intel 5440, 32GB DDR2 667, 1x C870
  Config: 1x 1-GPU-HSS (PSS/HSS licenses: 0/1)
  Effective cycle time*: 8.7 s/cycle
  Comment: Here we see a 9.5X speedup compared to the late-model Intel 5440 processor. Note that this cycle time is faster than the C870 on the older Opteron machine (D1), so the host system does matter.

E1
  Simulation: AltPSM_Contacts, pitch=0.3 (169MB)
  Hardware: Box #1: 2x Opteron 285, 16GB DDR366, 2x C870
  Config: 1x 1-GPU-HSS (PSS/HSS licenses: 0/1)
  Effective cycle time*: 1.35 s/cycle
  Comment: Compare with E1a.

E1a
  Simulation: same as E1
  Hardware: same as E1, but with 2x C1060
  Config: same as E1
  Effective cycle time*: 0.67 s/cycle
  Comment: Compare with E1: the C1060 has 2X the processing power of the C870.

E2
  Simulation: same as E1
  Hardware: same as E1 (2x C870)
  Config: 1x 2-GPU-HSS (PSS/HSS licenses: 0/2)
  Effective cycle time*: 1.42 s/cycle
  Comment: As expected, no improvement from using more cards on a small simulation that fits within one card (compare to E1).

E2a
  Simulation: same as E1
  Hardware: same as E1, but with 2x C1060
  Config: same as E2
  Effective cycle time*: 0.65 s/cycle
  Comment: Basically the same as E1a.

E3
  Simulation: same as E1
  Hardware: same as E1 (2x C870)
  Config: 2x 1-GPU-HSS (PSS/HSS licenses: 0/2)
  Effective cycle time*: 0.68 s/cycle
  Comment: Running two simulations at the same time; compare to E1.

E3a
  Simulation: same as E1
  Hardware: same as E1, but with 2x C1060
  Config: same as E3
  Effective cycle time*: 0.36 s/cycle
  Comment: Same, but compare to E1a.

E4
  Simulation: Elbow.sim, 3GB, 3D EUV with Fourier Boundary Condition, non-complex
  Hardware: same as E1 (2x C870)
  Config: 2x 1-GPU-HSS (PSS/HSS licenses: 0/2)
  Effective cycle time*: 17.5 s/cycle
  Comment: Compare with E4a.

E4a
  Simulation: same as E4
  Hardware: same as E1, but with 2x C1060
  Config: same as E4
  Effective cycle time*: 8.25 s/cycle
  Comment: The C1060 is more than 2X faster than the C870; compare with E4.

E5
  Simulation: same as E4
  Hardware: same as E1 (2x C870)
  Config: 1x 2-GPU-HSS (PSS/HSS licenses: 0/2)
  Effective cycle time*: 18.4 s/cycle
  Comment: Compare with E5a.

E5a
  Simulation: same as E4
  Hardware: same as E1, but with 2x C1060
  Config: same as E5
  Effective cycle time*: 14.8 s/cycle
  Comment: The modest improvement over E5 is expected, because the 2nd card is not utilized at all: the simulation fits within the first C1060. In E5 both C870's run at the same time; here only one card runs while the other sits idle.

E6
  Simulation: Elbow.sim with 6-degree incidence (complex simulation) and pitch=76nm, 10GB, 3D EUV with Fourier Boundary Condition
  Hardware: same as E1 (2x C870)
  Config: 1x 2-GPU-HSS (PSS/HSS licenses: 0/2)
  Effective cycle time*: 92 s/cycle
  Comment: The domain is divided into 7 parts: the first 6 run in simultaneous pairs, and the 7th runs on one card while the other remains idle. Card utilization is 7/8 = 87.5% (excluding CPU memory-transfer overhead).

E6a
  Simulation: same as E6
  Hardware: same as E1, but with 2x C1060
  Config: same as E6
  Effective cycle time*: 67 s/cycle
  Comment: The domain is divided into 3 parts: the first 2 run simultaneously, and the 3rd runs on one card while the other remains idle. Card utilization is 3/4 = 75% (excluding CPU memory-transfer overhead). The speedup over E6 falls short of 2X because GPU utilization is lower and the CPU transfer overhead may be large, especially since this box has DDR366 (not even DDR2) and only PCI Express x16 gen 1 (not gen 2). With PCI Express x16 gen 2 and DDR2-800, the improvement would probably be closer to 2X.
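The card-utilization percentages quoted for these multi-part runs follow from simple round-robin scheduling arithmetic; here is a minimal sketch (the function name is ours, not part of the software):

```python
import math

def card_utilization(num_parts, num_cards=2):
    # A domain split into num_parts pieces runs in rounds of up to
    # num_cards simultaneous pieces; the last round may leave cards idle.
    rounds = math.ceil(num_parts / num_cards)
    return num_parts / (rounds * num_cards)

print(card_utilization(7))  # 7 parts on 2 cards -> 0.875 (87.5%, as in E6)
print(card_utilization(3))  # 3 parts on 2 cards -> 0.75  (75%, as in E6a)
```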

*Note: "Effective" cycle time is the total cycle time divided by the number of cases running. For example, if you have 5 PSS's running 5 different simulations (of the same size) and each has a cycle time of 10s, then the effective cycle time is 10s/5 = 2s. A "cycle" is the amount of time TEMPESTpr2 takes to propagate the fields one wavelength.
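The footnote's definition in code form, using its own worked example (a trivial sketch; the function name is ours):

```python
def effective_cycle_time(cycle_time_s, num_cases):
    # Total cycle time divided by the number of cases running at once.
    return cycle_time_s / num_cases

# Footnote example: 5 PSS's, each running a different same-size
# simulation with a 10s cycle time.
print(effective_cycle_time(10.0, 5))  # -> 2.0 s/cycle effective
```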