Investigating the Methodology of Papers at the International
Symposium on Computer Architecture (ISCA)
ISCA is considered by some to be the "Top" conference in Computer Architecture.
In this field, publications in conference proceedings are considered more
important than journal articles, so the papers published at ISCA should be
the best in the world. However, upon reading the papers involved, it is
puzzling how some of them made it in.
Allegations of peer-review
irregularities (involving issues of methodology which possibly led to
the death of a student)
have led us to investigate the methodology found in past
ISCA conferences.
The methodology in computer architecture has always been a bit strained:
due to the rapid progress in the field it is often not practical to
build actual hardware. Building an actual CPU that implements your idea
may take years and millions of dollars, and by the time an academic
manages to produce results the industry has likely moved on considerably.
Methodology at ISCA has been under question for a long time, enough so
that there is the
Annual Workshop on Duplicating, Deconstructing, and Debunking (WDDD)
dedicated to examining methodology problems in ISCA papers.
However, this workshop is completely toothless: despite exposing many
issues with methodology, not a single paper has ever been withdrawn, and
as far as anyone knows it is not actually possible to have a past paper
withdrawn from ISCA even if a major glaring issue is found with it.
Methodology Challenge One -- "Cycle Accurate" Simulators
To get around the cost of building hardware, academics typically use
"cycle-accurate" simulators that attempt to model CPU behavior well
enough that experiments can be done without constructing the actual
hardware. The problem is that these simulators are of varying levels of
accuracy and are rarely validated against real hardware.
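To make the term concrete, here is a toy, self-contained Python sketch of
what "cycle-accurate" means; the five-stage pipeline below is invented for
illustration and is not the structure of any real simulator.

    STAGES = ["fetch", "decode", "execute", "memory", "writeback"]

    def simulate(program):
        """Push instructions through a toy in-order 5-stage pipeline,
        one clock cycle at a time, and return the total cycle count."""
        in_flight = {stage: None for stage in STAGES}
        pc, cycle, retired = 0, 0, 0
        while retired < len(program):
            cycle += 1
            # Advance every stage by one cycle, back to front.  (A real model
            # also has to track stalls, forwarding, branch mispredictions, and
            # cache misses -- which is where the inaccuracies creep in.)
            for dst, src in zip(reversed(STAGES), reversed(STAGES[:-1])):
                in_flight[dst] = in_flight[src]
            in_flight["fetch"] = program[pc] if pc < len(program) else None
            pc = min(pc + 1, len(program))
            # An instruction that reaches writeback completes in this cycle.
            if in_flight["writeback"] is not None:
                retired += 1
        return cycle

    print(simulate(["add", "load", "mul", "store"]))  # 8: 4 instructions + 4 fill cycles

A real "cycle-accurate" model has to get every one of those ignored details
right, which is why accuracy varies so much from simulator to simulator.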
Often out-dated, mis-configured, or hastily-thrown-together-before-a-deadline
simulators are used, and any changes made are rarely released. This means
validation can be a challenge, as anyone attempting it must start from
scratch with only a vague set of parameters, so few bother.
Some investigation has been done of the error introduced by
"cycle-accurate" simulation; for example, see the
2008 Workshop on Duplicating, Deconstructing, and Debunking
(WDDD) (co-located with ISCA'08) paper
Are Cycle Accurate Simulators a Waste of Time by Weaver and McKee.
This led to a panel discussion at the conference, but did not lead to
any changes in methodology.
Methodology Challenge Two -- Slow Benchmarks
One primary problem with simulation is that it is slow.
"Cycle-accurate" simulation can be thousands of times slower than
running on a real machine, which means a benchmark that takes a few
minutes to run might take weeks to months in simulation.
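A quick back-of-the-envelope calculation shows the scale of the problem;
the 10,000x slowdown figure below is an assumption, though it is in the
range commonly quoted for detailed simulators.

    native_minutes = 5        # a benchmark that takes a few minutes natively
    slowdown = 10_000         # assumed "cycle-accurate" simulation slowdown

    simulated_days = native_minutes * slowdown / (60 * 24)
    print(f"{simulated_days:.1f} days")   # ~34.7 days for one full simulated run
    # A 10-minute benchmark or a 20,000x slowdown pushes this past two months.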
Most computer architects are not willing to wait months for results,
so shortcuts are taken to speed up simulation. However, these shortcuts
introduce errors, which are rarely accounted for in papers.
There has been work on using statistical methods to reduce this error,
for example the
Simpoint work by Calder. However, generating Simpoints can be somewhat
complicated, so most researchers do not bother.
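For the curious, the core idea works roughly like the sketch below (a
simplification for illustration, not the actual SimPoint tool): split the
execution into fixed-length intervals, describe each interval by a
basic-block vector (BBV), cluster the vectors, and then simulate in detail
only one representative interval per cluster, weighting its results by the
cluster's size.

    import numpy as np
    from sklearn.cluster import KMeans

    def pick_simpoints(bbvs, k=10):
        """bbvs: (num_intervals, num_basic_blocks) array of per-interval
        basic-block execution counts."""
        # Normalize each interval so intervals are compared by the shape of
        # their behavior rather than by raw instruction counts.
        norm = bbvs / bbvs.sum(axis=1, keepdims=True)
        km = KMeans(n_clusters=k, n_init=10).fit(norm)
        simpoints, weights = [], []
        for c in range(k):
            members = np.flatnonzero(km.labels_ == c)
            # The interval closest to the cluster centroid is that cluster's
            # representative "simpoint".
            dists = np.linalg.norm(norm[members] - km.cluster_centers_[c], axis=1)
            simpoints.append(int(members[np.argmin(dists)]))
            weights.append(len(members) / len(bbvs))
        return simpoints, weights

    # The whole-program estimate is then the weighted sum of the detailed
    # results from each chosen interval.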
By far the most common way of speeding up benchmarks is to pick an
arbitrary number of instructions to "skip", on the assumption that
program startup (initialization) is not indicative of full-program
behavior. Often somewhere between a million and a billion instructions
(less than a second of run-time on a modern multi-GHz processor) is skipped.
Then detailed simulation is run for a few billion instructions (a second
or two of run-time). These results are then presented in the paper as if
they correspond to the results from a full run.
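In simulator-driver terms the shortcut looks something like this; the
interface below (fast_forward, run_detailed, and so on) is hypothetical
and not taken from any real simulator.

    SKIP_INSTRUCTIONS     = 1_000_000_000   # fast-forward past "initialization"
    DETAILED_INSTRUCTIONS = 2_000_000_000   # then simulate "in detail"

    def run_shortened(sim):
        # Fast-forward: execute instructions functionally, with no timing
        # model, so this region is cheap but contributes nothing to the results.
        sim.fast_forward(SKIP_INSTRUCTIONS)
        # Reset statistics so the skipped region is invisible, then switch to
        # detailed (cycle-level) simulation for a second or two's worth of
        # real execution.
        sim.reset_stats()
        sim.run_detailed(DETAILED_INSTRUCTIONS)
        # These statistics are what often end up in the paper, presented as
        # if they described the whole benchmark.
        return sim.get_stats()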
A look at the error introduced by various ways of shortening runs
can be seen in the paper
Using Dynamic Binary Instrumentation to Generate Multi-Platform Simpoints: Methodology and Accuracy
by Weaver and McKee.
Methodology Challenge Three -- Simulator Configuration
Often the results from a single run on a single simulator are used as
the source of results for a paper. Again, this is often due to a time
crunch: running a wide variety of input parameters on a wide variety of
benchmarks is time consuming, especially if your grad student is still
actively hacking on the simulator just days before the deadline.
Ideally the simulator configuration would match some real-world processor,
or a near-future processor if you are looking ahead. The problem is that
companies are secretive about their processor internals, so when configuring
a simulator you have to make some guesses. This might be OK if the
simulator were then validated against real hardware, but that is hard to
do and rarely done. Often odd choices are made, such as unrealistically
fast caches, impossibly fast DRAM, branch predictor settings not
seen on real computers, etc. This can mean the results found in a
paper might be due to weird configuration choices rather than the
actual novel idea you thought you were modeling.
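As an illustration, here is a hypothetical simulator configuration (the
parameter names are made up, not from any real simulator) contrasting the
kind of optimistic settings that show up in papers with values closer to
real hardware; the "realistic" numbers are approximations, not measurements
of any specific chip.

    # Settings of the sort that slip into papers:
    optimistic_config = {
        "l1d_latency_cycles": 1,      # 1-cycle L1: not seen on real chips in decades
        "l2_latency_cycles": 6,
        "dram_latency_ns": 20,        # far faster than real DRAM
        "branch_predictor": "perfect",
    }

    # Values closer to measured hardware:
    realistic_config = {
        "l1d_latency_cycles": 4,      # typical load-to-use latency on recent cores
        "l2_latency_cycles": 12,
        "dram_latency_ns": 70,        # in the range of measured DRAM access times
        "branch_predictor": "tage",   # a real predictor family, still not perfect
    }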
There are other challenges with academic simulators: they are often
out of date or hard to use. Sometimes extremely old simulators are used
simply because researchers are familiar with them; sim-alpha (simulating
the DEC Alpha processor) was used in computer architecture conferences
for years, long after DEC and the Alpha were both distant memories.
Methodology Challenge Four -- Measurement Bias
In the previous section we mentioned that often only a limited set of
parameters is investigated when doing a study. It has been shown, though,
that there is enough randomness in systems that very small changes can
change the output, sometimes by amounts large enough to overwhelm the
actual experiment you were trying to perform.
This is looked at in the paper
Producing Wrong Data Without Doing Anything Obviously Wrong!
by Mytkowicz, Diwan, Hauswirth and Sweeney.
They look at measurement bias, where small changes in setup (for example,
the link order of object files or the size of the UNIX environment) can
lead to wrong conclusions, and they find this is common in computer
architecture papers.
Ideally, to avoid measurement bias, results would be gathered on a variety
of simulators with a variety of configurations, but apparently it is rare
for those publishing at ISCA to have enough time to do that.
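One simple way to see measurement bias for yourself on real hardware is
sketched below (our own illustration, not code from the Mytkowicz et al.
paper): run the same benchmark while changing only the size of the UNIX
environment, which shifts the program's memory layout, and watch how much
the runtime moves. The benchmark binary and the PADDING variable are
placeholders.

    import os, subprocess, time

    BENCHMARK = ["./my_benchmark"]          # placeholder benchmark binary

    for padding in range(0, 4096 + 1, 512):
        env = dict(os.environ)
        env["PADDING"] = "x" * padding      # the only change: environment size
        start = time.perf_counter()
        subprocess.run(BENCHMARK, env=env, check=True,
                       stdout=subprocess.DEVNULL)
        elapsed = time.perf_counter() - start
        print(f"env padding {padding:4d} bytes: {elapsed:.3f} s")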
Methodology Challenge Five -- One-Cycle L1 Caches
An alarming number of ISCA papers specify simulation parameters with
unrealistically fast 1-cycle L1 caches, something that has not existed
in real processors in decades. See
this page for more on that issue.
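A rough average-memory-access-time calculation shows why this matters;
the miss rates and latencies below are illustrative assumptions, not
measurements of any particular processor.

    def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, dram):
        # Average memory access time, in cycles, for two cache levels plus DRAM.
        return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * dram)

    realistic  = amat(l1_hit=4, l1_miss_rate=0.05, l2_hit=12, l2_miss_rate=0.30, dram=200)
    optimistic = amat(l1_hit=1, l1_miss_rate=0.05, l2_hit=12, l2_miss_rate=0.30, dram=200)
    print(f"{realistic:.1f} vs {optimistic:.1f} cycles per access")
    # ~7.6 vs ~4.6: the 1-cycle L1 hides roughly 40% of the memory latency here.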
Detailed look at papers being investigated
Back to our ISCA investigation page