Investigating the Methodology of papers at the International Symposium on Computer Architecture (ISCA)



ISCA is considered by some to be the "top" conference in Computer Architecture. In this field, conference publications are valued more highly than journal articles, so the papers published at ISCA should be among the best in the world. However, upon reading the papers involved, it is somewhat puzzling how some of them made it in. Allegations of peer-review irregularities (involving issues of methodology which possibly led to the death of a student) have led us to investigate the methodology found in past ISCA conferences.

Methodology in computer architecture has always been a bit strained: due to the rapid progress in the field, it is often not practical to build actual hardware. Building a real CPU that implements your idea may take years and millions of dollars, and by the time an academic manages to produce results the industry has possibly moved on extensively.

Methodology at ISCA has been under question for a long time, enough so that there is an entire workshop, the Annual Workshop on Duplicating, Deconstructing, and Debunking (WDDD), dedicated to looking at methodology problems in ISCA papers. However, this workshop is completely toothless: despite exposing many methodology issues, not a single paper has ever been withdrawn, and as far as anyone knows it is not actually possible to have a past paper withdrawn from ISCA even if a major, glaring issue is found with it.

Methodology Challenge One -- "Cycle Accurate" Simulators

To get around this, academics typically use "cycle-accurate" simulators that attempt to model CPU behavior well enough that experiments can be done without constructing actual hardware. The problem is that these simulators vary widely in accuracy and are rarely validated against real hardware. Often outdated, misconfigured, or hastily-thrown-together-before-a-deadline simulators are used, and any changes made are rarely released. This makes validation a challenge, as anyone attempting it must start from scratch with only a vague set of parameters, so few bother.
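
To make the discussion concrete, here is a deliberately toy sketch (in Python, with entirely made-up structure and numbers) of what a cycle-level simulator conceptually does: step a machine model forward and charge each instruction a latency based on the modeled microarchitecture. Real simulators such as gem5 model pipelines, caches, and DRAM in far more detail, and the accuracy of the final cycle count is only as good as those models.

    # Toy sketch of a cycle-level timing model (illustrative only).
    def simulate(program, l1_hit_cycles=4, dram_cycles=200, miss_every=20):
        """program: list of (opcode, touches_memory) tuples."""
        total_cycles = 0
        memory_accesses = 0
        for opcode, touches_memory in program:
            cycles = 1                           # base one-cycle execute cost
            if touches_memory:
                memory_accesses += 1
                cycles += l1_hit_cycles          # L1 access latency
                if memory_accesses % miss_every == 0:
                    cycles += dram_cycles        # crude 1-in-20 (5%) miss model
            total_cycles += cycles
        return total_cycles

    program = [("load", True) if i % 4 == 0 else ("add", False) for i in range(1000)]
    print("modeled cycles:", simulate(program))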

Some investigation has been done on the error introduced by "cycle-accurate" simulation; for example, see the paper Are Cycle Accurate Simulators a Waste of Time by Weaver and McKee, presented at WDDD 2008 (co-located with ISCA'08). This led to a panel discussion at the conference, but did not lead to any changes in methodology.

Methodology Challenge Two -- Slow Benchmarks

A primary problem with simulation is that it is slow. "Cycle-accurate" simulation can be thousands of times slower than running on a real machine, which means a benchmark that takes a few minutes natively might take weeks to months in simulation. Most computer architects are not willing to wait months for results, so shortcuts are taken to speed things up. However, these shortcuts introduce errors which are rarely accounted for in papers.
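
For a sense of scale, here is the back-of-the-envelope arithmetic, assuming a 10,000x slowdown (the actual factor varies widely by simulator and configuration):

    # Rough simulation-time arithmetic (assumed numbers).
    native_minutes = 5                       # benchmark takes a few minutes natively
    slowdown = 10_000                        # assumed "cycle-accurate" slowdown
    sim_minutes = native_minutes * slowdown
    print(sim_minutes / (60 * 24), "days")   # about 35 days for a single run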

There has been work on using statistical methods to reduce this error, for example the SimPoint work by Calder. However, generating SimPoints is mildly complicated, so most researchers do not bother.
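
The general idea behind SimPoint-style sampling (roughly sketched below; this is not the actual SimPoint tool) is to split execution into fixed-size intervals, describe each interval by a vector of basic-block execution counts, cluster similar intervals, and then simulate only one representative interval per cluster, weighting its results by the cluster's size.

    # Rough sketch of SimPoint-style phase selection (illustrative, fake data).
    import numpy as np
    from sklearn.cluster import KMeans

    # bbv[i] = basic-block vector for interval i: how often each block ran.
    rng = np.random.default_rng(0)
    bbv = rng.poisson(5.0, size=(200, 64)).astype(float)   # stand-in profile data

    k = 5
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(bbv)

    # The interval closest to each cluster center becomes the representative,
    # weighted by the fraction of execution its cluster covers.
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(bbv[members] - km.cluster_centers_[c], axis=1)
        rep = members[np.argmin(dists)]
        print(f"simulate interval {rep} with weight {len(members) / len(bbv):.2f}")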

By far the most common way of speeding up benchmarks is to pick an arbitrary number of instructions to "skip", on the assumption that program startup (initialization) is not representative of whole-program behavior. Often somewhere between a million and a billion instructions (less than a second of run time on a modern multi-GHz processor) is skipped. Detailed simulation then happens for a few billion instructions (a second or two of native execution), and these results are presented in the paper as if they correspond to a full run.
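
In rough pseudocode form, the shortcut looks like the sketch below, written against a hypothetical simulator interface (the method names, stub class, and instruction counts are all illustrative):

    # Sketch of the "skip then measure" shortcut (hypothetical interface).
    class FakeSim:
        """Stand-in for a simulator with fast functional and slow detailed modes."""
        def __init__(self): self.detailed = 0
        def functional_step(self): pass              # no timing, architectural state only
        def detailed_step(self): self.detailed += 1  # full timing model would go here
        def reset_stats(self): self.detailed = 0
        def stats(self): return {"detailed_instructions": self.detailed}

    def run_shortened(sim, skip_insns, detail_insns):
        for _ in range(skip_insns):
            sim.functional_step()    # fast-forward past "initialization"
        sim.reset_stats()            # discard warm-up statistics
        for _ in range(detail_insns):
            sim.detailed_step()      # detailed window that ends up in the paper
        return sim.stats()           # reported as if it were the whole program

    # Papers typically skip millions to billions and measure a few billion
    # instructions; tiny counts here just so the sketch runs instantly.
    print(run_shortened(FakeSim(), skip_insns=1_000, detail_insns=2_000))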

A look at the error introduced by various ways of shortening runs can be found in the paper Using Dynamic Binary Instrumentation to Generate Multi-Platform Simpoints: Methodology and Accuracy by Weaver and McKee.

Methodology Challenge Three -- Simulator Configuration

Often the results of a single run of a single simulator are used as the source of all the results in a paper. Again, this is usually due to a time crunch: running a wide variety of input parameters on a wide variety of benchmarks is time consuming, especially if your grad student is still actively hacking on the simulator just days before the deadline.

Ideally the simulator configuration would match some real-world processor, or a near-future processor if you are looking ahead. The problem is that companies are secretive about their processor internals, so when configuring a simulator you have to make some guesses. This might be acceptable if the simulator were then validated against real hardware, but that is hard to do and rarely done. Often odd choices are made, such as unrealistically fast caches, impossibly fast DRAM, or branch predictor settings not seen in real processors. This can mean that the results found in a paper are due to weird configuration choices rather than the novel idea you thought you were modeling.
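
As a purely hypothetical illustration, a parameter table like the one below would not look obviously wrong in a paper, yet the comments flag the kinds of optimistic values that slip through (none of these numbers come from any specific paper):

    # Hypothetical simulator configuration; comments flag suspicious choices.
    config = {
        "core": {"issue_width": 4, "rob_entries": 192, "freq_ghz": 3.0},
        "l1d":  {"size_kb": 64, "assoc": 8, "latency_cycles": 1},   # 1-cycle L1: unrealistically fast
        "l2":   {"size_kb": 512, "assoc": 8, "latency_cycles": 6},  # optimistic; real L2s are ~12+ cycles
        "dram": {"latency_ns": 20},                                 # real DRAM is closer to 60-100 ns
        "branch_predictor": {"type": "perfect"},                    # no real hardware has this
    }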

There are other challenges with academic simulators, in that they are often out of date or hard to use. Sometimes extremely old simulators are used simply because researchers are familiar with them; sim-alpha (simulating the DEC Alpha processor) appeared in computer architecture conferences for years, long after DEC and the Alpha were both distant memories.

Methodology Challenge Four -- Measurement Bias

In the previous section we mentioned that often only a limited set of parameters is investigated in a study. It has been shown, though, that there is enough inherent variation in systems that very small changes can shift the output, sometimes by amounts large enough to overwhelm the effect of the experiment you were actually trying to perform. This is examined in the paper Producing Wrong Data Without Doing Anything Obviously Wrong! by Mytkowicz, Diwan, Hauswirth, and Sweeney. They look at measurement bias, where small changes in setup (such as link order or the size of the UNIX environment) can lead to wrong conclusions, and they find this is common in computer architecture papers.
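
One of the factors they study is the size of the UNIX environment: padding it shifts the program's initial stack placement, and that alone can measurably change run time. A rough sketch of that experiment, assuming a hypothetical ./benchmark binary (real studies use many more trials and proper statistics):

    # Sketch of the environment-size experiment (hypothetical ./benchmark).
    import os, subprocess, time

    def timed_run(pad_bytes):
        env = dict(os.environ)
        env["PADDING"] = "x" * pad_bytes              # the only thing that changes
        start = time.perf_counter()
        subprocess.run(["./benchmark"], env=env, check=True,
                       stdout=subprocess.DEVNULL)
        return time.perf_counter() - start

    for pad in range(0, 4097, 512):
        print(f"padding={pad:5d} bytes  time={timed_run(pad):.3f}s")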

Ideally, to avoid measurement bias, results would be gathered on a variety of simulators with a variety of configurations, but apparently few of those publishing at ISCA have enough time to do that.

Methodology Challenge Five -- One-Cycle L1 Caches

An alarming number of ISCA papers list simulation parameters with unrealistically fast 1-cycle L1 caches, something that has not existed in real processors in decades. See this page for more on that issue.
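
As a back-of-the-envelope illustration of why this matters (made-up but plausible numbers, not taken from any paper or processor), changing only the L1 hit latency from 1 cycle to a more realistic 4 cycles noticeably shifts the modeled average memory access time (AMAT), which can easily be larger than the modest improvements a paper is claiming:

    # AMAT = L1 hit time + L1 miss rate * (L2 hit time + L2 miss rate * DRAM time)
    def amat(l1_hit, l1_miss_rate=0.05, l2_hit=12, l2_miss_rate=0.2, dram=200):
        return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * dram)

    print("1-cycle L1:", amat(1), "cycles")   # 1 + 0.05 * (12 + 40) = 3.6
    print("4-cycle L1:", amat(4), "cycles")   # 4 + 0.05 * (12 + 40) = 6.6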

Detailed look at papers being investigated


Back to our ISCA investigation page