Investigating the Methodology of Papers at the 2020 International
Symposium on Computer Architecture
(ISCA2020)
We are gradually going through some of the papers published at ISCA2020
and reviewing the methodology used, to see whether the papers themselves
represent repeatable scientific research. Some details on the methodology
challenges found in architecture papers can be found
here.
More papers will be added as we have time; properly examining a paper
can be a lot of work, and the papers themselves are often a bit
vague about the methodology used.
Commutative Data Reordering:
A New Technique to Reduce Data Movement
Energy on Sparse Inference Workloads
- Paper by Ben Feinberg (Sandia); Benjamin C. Heyman, Darya Mikhailenko,
Ryan Wong, An Ho, and Engin Ipek (Rochester)
- Proposes Commutative Data Reordering, which reorders the transfer of
weight-matrix values to GPGPUs running neural-network workloads. Because
the underlying computation is commutative, the values can be sent in an
order that reduces the number of 1s driven on the bus, cutting DRAM I/O
energy usage by 53% over the data bus inversion (DBI) coding used in DDR4.
(A rough sketch of the bus-energy idea appears at the end of this entry.)
- Methodology: uses a heavily modified GPGPU-Sim
- Methodology Questions asked of the authors:
- Can you provide the configuration files and source code
used by your simulator?
- Author response:
- e-mail sent 29 June 2020: no reply
- e-mail sent 12 November 2020: no reply
- e-mail sent 10 December 2020, cc-ing the dean and VPR at Rochester.
Received a response saying the faculty advisor had left the
University in May without leaving a forwarding e-mail;
a co-author provided forwarding info.
Finally, on 21 December 2020, a co-author replied, admitting that
"Unfortunately, we do not have the time or the manpower to
comment the code in detail and prepare a further
documentation that would make it appropriate for
release by others."
They follow up by saying:
"Similarly to the vast majority of authors in our community,
we expect that you will be able to implement the proposed
concepts in your own simulation infrastructure for comparison
against other alternatives."
They finally remark that
"some of the authors current employers are acutely
sensitive to potential violations of export control law."
In any case, it sounds as if their research is not reproducible
and the paper should be withdrawn.
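To make the bus-energy argument concrete, here is a minimal Python sketch of
the general idea, written against a toy cost model in which bus energy is
proportional to bit toggles between consecutive 8-bit transfers. It is not the
authors' algorithm or energy model (which remain unavailable); the names and
the greedy heuristic are our own illustration, and the real paper models DRAM
I/O energy in far more detail and interacts with DDR4's DBI coding.

    # Toy sketch of reordering a commutative data stream to cut bus energy.
    # NOT the paper's algorithm: cost here is just the number of bit toggles
    # between consecutive 8-bit transfers.

    def toggles(a, b):
        """Bits that change between two consecutive 8-bit bus transfers."""
        return bin((a ^ b) & 0xFF).count("1")

    def bus_cost(words):
        return sum(toggles(a, b) for a, b in zip(words, words[1:]))

    def greedy_reorder(words):
        """The downstream dot product is commutative, so the values may be
        sent in any order; greedily pick the next word with fewest toggles."""
        remaining = list(words)
        order = [remaining.pop(0)]
        while remaining:
            nxt = min(remaining, key=lambda w: toggles(order[-1], w))
            remaining.remove(nxt)
            order.append(nxt)
        return order

    weights = [0x0F, 0xF0, 0x33, 0xCC, 0x55, 0xAA]
    print("original order cost:", bus_cost(weights))                  # 32 toggles
    print("greedy reorder cost:", bus_cost(greedy_reorder(weights)))  # 24 toggles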
Tailored Page Sizes
- Paper by
Faruk Guvenilir (UT Austin/Microsoft); Yale Patt (UT Austin)
- Proposes tailored page sizes, where a page of any power-of-two size can be
used by an application. Claims this can remove 98% of page-walk accesses
and 97% of L1 TLB misses on SPEC17. (A rough sketch of the page-size idea
appears at the end of this entry.)
- Uses a Pin-based OS-allocator and virtual-memory simulator,
with a modeled TLB hierarchy and MMU caches.
- The background section is great, but the results section is just a huge,
confusing mess: it doesn't say which graphs were generated by which
simulator or by performance counters. It also throws around CoLT and RMM
without describing them at all.
- Methodology Questions asked of the authors:
- Why did you use Linux 3.10, a kernel released in 2013?
- Can you provide the source to your Pin-based simulator?
- Can you provide the configuration used with Zsim?
- Did you take any actions to avoid measurement bias?
- Author response:
- e-mail sent 29 June 2020: no reply
- e-mail sent 12 November 2020: no reply
- e-mail sent 10 December 2020, cc-ing the dean and VPR at UT: no reply
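For illustration, the following minimal Python sketch covers a memory region
with the largest power-of-two pages that its alignment allows, which is the
general idea behind tailored page sizes. It is a hypothetical helper of our
own, not the authors' Pin-based allocator, and it assumes the region starts on
a reasonable alignment and its size is a multiple of the minimum page size.

    # Toy sketch: cover [vaddr, vaddr+size) with arbitrary power-of-two pages
    # so far fewer TLB entries are needed than with fixed 4 KiB pages.
    # NOT the authors' allocator; assumes size is a multiple of min_page.

    def tailored_pages(vaddr, size, min_page=4096):
        """Yield (virtual address, page size) pairs covering the region."""
        end = vaddr + size
        while vaddr < end:
            # Largest power of two permitted by the current alignment...
            align = vaddr & -vaddr if vaddr else end - vaddr
            # ...and by the space remaining, but never below the minimum page.
            fit = 1 << ((end - vaddr).bit_length() - 1)
            page = max(min_page, min(align, fit))
            yield vaddr, page
            vaddr += page

    # A 6 MiB region starting at a 2 MiB boundary needs only two tailored
    # pages (2 MiB + 4 MiB) instead of 1536 fixed 4 KiB pages.
    for va, sz in tailored_pages(0x200000, 6 * 1024 * 1024):
        print(hex(va), sz // 1024, "KiB")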
BabelFish: Fusing Address Translations for Containers
- Paper by Skarlatos, Darbaz, Gopireddy, Kim, and Torrellas, all
at the University of Illinois Urbana-Champaign
- One author (Torrellas) is Chair of IEEE TCCA, the organization that
co-sponsors ISCA
- Supported by NSF Grants
CNS 17-63658, CNS 17-05047, and CCF 16-29431.
- Paper proposes BabelFish, a way to share TLB entries and page tables
across multiple Docker containers on a server (a rough sketch of the
sharing idea appears at the end of this entry).
- Paper claims it can reduce execution time by 10-55%.
- Methodology is a complex simulator setup using Simics, SST, and CACTI.
Benchmarks involve arbitrary fast-forwarding and truncated run times,
and are a mix of existing and custom benchmarks.
- Methodology Questions asked of the authors:
- Author response:
- e-mail sent 29 June 2020: no reply
- e-mail sent 12 November 2020: no reply
- e-mail sent 10 December 2020, cc-ing the dean and VPR at UIUC: no reply
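As a rough illustration of the general sharing idea only (this is not
BabelFish's actual hardware design, and the names below are made up), the
Python sketch tags TLB entries with a container-group ID shared by containers
built from the same image, rather than a per-process ID, so one container's
translations of shared library and code pages hit for another.

    # Toy sketch: a TLB keyed by (group ID, virtual page) so containers in
    # the same group share translations.  NOT BabelFish's real design.

    class SharedTlb:
        def __init__(self):
            self.entries = {}                  # (group_id, vpage) -> ppage
            self.hits = self.misses = 0

        def translate(self, group_id, vpage, page_table):
            key = (group_id, vpage)
            if key in self.entries:
                self.hits += 1
            else:
                self.misses += 1
                self.entries[key] = page_table[vpage]   # simulated page walk
            return self.entries[key]

    # Two containers from the same image share group 7, so the second
    # container hits on the pages the first one already translated.
    page_table = {0x1000: 0x80000, 0x2000: 0x90000}
    tlb = SharedTlb()
    for container in ("A", "B"):
        for vpage in (0x1000, 0x2000):
            tlb.translate(group_id=7, vpage=vpage, page_table=page_table)
    print("hits:", tlb.hits, "misses:", tlb.misses)     # hits: 2 misses: 2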
Efficiently Supporting Dynamic Task Parallelism on Heterogeneous Cache-Coherent Systems
- Paper by Moyang Wang, Tuan Ta, Lin Cheng, and Christopher Batten,
all at Cornell University
- Supported by DARPA and Intel
- Proposes "direct task stealing" using user-level interrupts to bypass
the memory system on big-LITTLE systems. Find either a 7x or 1.4x
speedup, or maybe 21% speedup.
- Methodology is a simulator based on RISC-V gem5.
While they hope for a larger system, simulator limitations mean they only
simulate a 64-core system.
- Uses hand-ported Cilk and Ligra benchmarks with "moderate" input sizes,
run for only a few hundred million instructions.
Do they use x86 benchmark results with a RISC-V simulator?
- Methodology Questions asked of the authors:
- Are you running RISC-V binaries, or are you running x86 binaries
through a RISC-V simulator?
- The 1-cycle latency on the 64kB cache seems a bit optimistic.
Do you think this is possible?
- What frequency are you assuming the processor is running at?
- Can you provide the simulator config files used in your evaluation?
- Can you provide the source code for your modified cilk and ligra
benchmarks?
- How did you choose the "moderate" sizes for the runs of your benchmarks?
Were these statistically chosen (like SimPoint) or are they
arbitrary?
- Author response:
- e-mail sent 29 June 2020: no reply
- e-mail sent 12 November 2020: no reply
- e-mail sent 10 December 2020, cc-ing the dean and VPR at Cornell: no reply
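For context, here is a minimal Python sketch of conventional memory-polling
work stealing, i.e. the baseline that "direct task stealing" is meant to
bypass with user-level interrupts. It is an illustration only, not the
authors' runtime; the function and worker structure are made up, and the
coherence traffic is only implied by the polling of other workers' deques.

    # Toy sketch of memory-based work stealing: each worker owns a deque and,
    # when idle, polls another worker's deque through shared memory.  This
    # polling/stealing path is what the paper replaces with user-level
    # interrupts.  NOT the authors' runtime.

    import random
    from collections import deque

    def work_stealing_run(task_lists):
        deques = [deque(ts) for ts in task_lists]
        executed, steals = [0] * len(deques), 0
        while any(deques):
            for wid, dq in enumerate(deques):
                if dq:
                    dq.pop()()                  # run a local task (LIFO end)
                    executed[wid] += 1
                else:                           # idle: poll a random victim
                    victim = random.choice([d for d in deques if d is not dq])
                    if victim:
                        victim.popleft()()      # steal from the FIFO end
                        executed[wid] += 1
                        steals += 1
        return executed, steals

    # Four workers with unbalanced work; idle workers steal from busy ones.
    tasks = [[lambda: None] * n for n in (8, 2, 0, 0)]
    print(work_stealing_run(tasks))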
Back to the ISCA methodology overview