Investigating the Methodology of Papers at the 2020 International
Symposium on Computer Architecture
(ISCA2020)
We are gradually going through some of the papers published at ISCA2020
and reviewing the methodology used, to see whether the papers themselves
represent repeatable scientific research. Some details on the methodology
challenges found in architecture papers can be found
here.
More papers will be added as we have time; properly examining a paper
can be a lot of work, and the papers themselves are often a bit
vague about the methodology used.
Commutative Data Reordering:
A New Technique to Reduce Data Movement
Energy on Sparse Inference Workloads
- Paper by Ben Feinberg (Sandia); Benjamin C. Heyman, Darya Mikhailenko,
Ryan Wong, An Ho, and Engin Ipek (Rochester)
- Proposes Commutative Data Reordering, which reorders the transfer of
weight-matrix values to GPGPUs running neural-network workloads. Because
the underlying computation is commutative, the values can be sent in an
order that reduces the number of 1s driven on the bus, cutting DRAM I/O
energy usage by 53% over the data bus inversion (DBI) coding used in DDR4.
(A rough sketch of the bus-energy idea appears at the end of this entry.)
- Methodology: uses a heavily modified GPGPU-Sim
- Methodology Questions asked of the authors:
- Can you provide the configuration files and source code
used by your simulator?
- Author response:
- e-mail sent 29 June 2020: no reply
- e-mail sent 12 November 2020: no reply
- e-mail sent 10 December 2020, cc-ing the dean and VPR at Rochester.
Received a response saying the faculty advisor had left the
University in May without leaving a forwarding e-mail;
a co-author provided forwarding info.
Finally, on 21 December 2020, a co-author replied, admitting that
"Unfortunately, we do not have the time or the manpower to
comment the code in detail and prepare a further
documentation that would make it appropriate for
release by others."
They follow up by saying:
"Similarly to the vast majority of authors in our community,
we expect that you will be able to implement the proposed
concepts in your own simulation infrastructure for comparison
against other alternatives."
They finally remark that
"some of the authors current employers are acutely
sensitive to potential violations of export control law."
In any case, it sounds as if their research is not reproducible
and the paper should be withdrawn.
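To make the bus-energy argument concrete, here is a minimal Python sketch of
the general idea, written against a toy cost model in which bus energy is
proportional to bit toggles between consecutive 8-bit transfers. It is not the
authors' algorithm or energy model (which remain unavailable); the names and
the greedy heuristic are our own illustration, and the real paper models DRAM
I/O energy in far more detail and interacts with DDR4's DBI coding.

    # Toy sketch of reordering a commutative data stream to cut bus energy.
    # NOT the paper's algorithm: cost here is just the number of bit toggles
    # between consecutive 8-bit transfers.

    def toggles(a, b):
        """Bits that change between two consecutive 8-bit bus transfers."""
        return bin((a ^ b) & 0xFF).count("1")

    def bus_cost(words):
        return sum(toggles(a, b) for a, b in zip(words, words[1:]))

    def greedy_reorder(words):
        """The downstream dot product is commutative, so the values may be
        sent in any order; greedily pick the next word with fewest toggles."""
        remaining = list(words)
        order = [remaining.pop(0)]
        while remaining:
            nxt = min(remaining, key=lambda w: toggles(order[-1], w))
            remaining.remove(nxt)
            order.append(nxt)
        return order

    weights = [0x0F, 0xF0, 0x33, 0xCC, 0x55, 0xAA]
    print("original order cost:", bus_cost(weights))                  # 32 toggles
    print("greedy reorder cost:", bus_cost(greedy_reorder(weights)))  # 24 toggles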
Tailored Page Sizes
- Paper by
Faruk Guvenilir (UT Austin/Microsoft); Yale Patt (UT Austin)
- Proposes tailored page sizes, where a page of any power-of-two size can be
used by an application. Claims this can remove 98% of page-walk accesses
and 97% of L1 TLB misses on SPEC17. (A rough sketch of the page-size idea
appears at the end of this entry.)
- Uses a Pin-based OS-allocator and virtual-memory simulator,
with a modeled TLB hierarchy and MMU caches.
- The background section is great, but the results section is just a huge,
confusing mess: it doesn't say which graphs were generated by which
simulator or by performance counters. It also throws around CoLT and RMM
without describing them at all.
- Methodology Questions asked of the authors:
- Why did you use Linux 3.10, a kernel released in 2013?
- Can you provide the source to your Pin-based simulator?
- Can you provide the configuration used with Zsim?
- Did you take any actions to avoid measurement bias?
- Author response:
- e-mail sent 29 June 2020: no reply
- e-mail sent 12 November 2020: no reply
- e-mail sent 10 December 2020, cc-ing the dean and VPR at UT: no reply
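For illustration, the following minimal Python sketch covers a memory region
with the largest power-of-two pages that its alignment allows, which is the
general idea behind tailored page sizes. It is a hypothetical helper of our
own, not the authors' Pin-based allocator, and it assumes the region starts on
a reasonable alignment and its size is a multiple of the minimum page size.

    # Toy sketch: cover [vaddr, vaddr+size) with arbitrary power-of-two pages
    # so far fewer TLB entries are needed than with fixed 4 KiB pages.
    # NOT the authors' allocator; assumes size is a multiple of min_page.

    def tailored_pages(vaddr, size, min_page=4096):
        """Yield (virtual address, page size) pairs covering the region."""
        end = vaddr + size
        while vaddr < end:
            # Largest power of two permitted by the current alignment...
            align = vaddr & -vaddr if vaddr else end - vaddr
            # ...and by the space remaining, but never below the minimum page.
            fit = 1 << ((end - vaddr).bit_length() - 1)
            page = max(min_page, min(align, fit))
            yield vaddr, page
            vaddr += page

    # A 6 MiB region starting at a 2 MiB boundary needs only two tailored
    # pages (2 MiB + 4 MiB) instead of 1536 fixed 4 KiB pages.
    for va, sz in tailored_pages(0x200000, 6 * 1024 * 1024):
        print(hex(va), sz // 1024, "KiB")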
BabelFish: Fusing Address Translations for Containers
- Paper by Skarlatos, Darbaz, Gopireddy, Kim, and Torrellas, all
at the University of Illinois Urbana-Champaign
- One author (Torrellas) is Chair of IEEE TCCA, the organization that
co-sponsors ISCA
- Supported by NSF Grants
CNS 17-63658, CNS 17-05047, and CCF 16-29431.
- Paper proposes BabelFish, a way to share TLB entries and page tables
across multiple Docker containers on a server (a rough sketch of the
sharing idea appears at the end of this entry).
- Paper claims it can reduce execution time by 10-55%.
- Methodology is a complex simulator setup using Simics, SST, and CACTI.
Benchmarks involve arbitrary fast-forwarding and truncated run times,
and are a mix of existing and custom benchmarks.
- Methodology Questions asked of the authors:
- Author response:
- e-mail sent 29 June 2020: no reply
- e-mail sent 12 November 2020: no reply
- e-mail sent 10 December 2020, cc-ing the dean and VPR at UIUC: no reply
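As a rough illustration of the general sharing idea only (this is not
BabelFish's actual hardware design, and the names below are made up), the
Python sketch tags TLB entries with a container-group ID shared by containers
built from the same image, rather than a per-process ID, so one container's
translations of shared library and code pages hit for another.

    # Toy sketch: a TLB keyed by (group ID, virtual page) so containers in
    # the same group share translations.  NOT BabelFish's real design.

    class SharedTlb:
        def __init__(self):
            self.entries = {}                  # (group_id, vpage) -> ppage
            self.hits = self.misses = 0

        def translate(self, group_id, vpage, page_table):
            key = (group_id, vpage)
            if key in self.entries:
                self.hits += 1
            else:
                self.misses += 1
                self.entries[key] = page_table[vpage]   # simulated page walk
            return self.entries[key]

    # Two containers from the same image share group 7, so the second
    # container hits on the pages the first one already translated.
    page_table = {0x1000: 0x80000, 0x2000: 0x90000}
    tlb = SharedTlb()
    for container in ("A", "B"):
        for vpage in (0x1000, 0x2000):
            tlb.translate(group_id=7, vpage=vpage, page_table=page_table)
    print("hits:", tlb.hits, "misses:", tlb.misses)     # hits: 2 misses: 2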
Efficiently Supporting Dynamic Task Parallelism on Heterogeneous Cache-Coherent Systems
- Paper by Moyang Wang, Tuan Ta, Lin Cheng, and Christopher Batten,
all at Cornell University
- Supported by DARPA and Intel
- Proposes "direct task stealing" using user-level interrupts to bypass
the memory system on big-LITTLE systems. Find either a 7x or 1.4x
speedup, or maybe 21% speedup.
- Methodology is a simulator based on RISC-V gem5.
While they hope for a larger system, simulator limitations mean they only
simulate a 64-core system.
- Uses hand-ported Cilk and Ligra benchmarks with "moderate" input sizes,
run for only a few hundred million instructions.
Do they use x86 benchmark results with a RISC-V simulator?
- Methodology Questions asked of the authors:
- Are you running RISC-V binaries, or are you running x86 binaries
through a RISC-V simulator?
- The 1-cycle latency on the 64kB cache seems a bit optimistic.
Do you think this is possible?
- What frequency are you assuming the processor is running at?
- Can you provide the simulator config files used in your evaluation?
- Can you provide the source code for your modified cilk and ligra
benchmarks?
- How did you choose the "moderate" sizes for the runs of your benchmarks?
Were these statistically chosen (like SimPoint) or are they
arbitrary?
- Author response:
- e-mail sent 29 June 2020: no reply
- e-mail sent 12 November 2020: no reply
- e-mail sent 10 December 2020, cc-ing the dean and VPR at Cornell: no reply
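For context, here is a minimal Python sketch of conventional memory-polling
work stealing, i.e. the baseline that "direct task stealing" is meant to
bypass with user-level interrupts. It is an illustration only, not the
authors' runtime; the function and worker structure are made up, and the
coherence traffic is only implied by the polling of other workers' deques.

    # Toy sketch of memory-based work stealing: each worker owns a deque and,
    # when idle, polls another worker's deque through shared memory.  This
    # polling/stealing path is what the paper replaces with user-level
    # interrupts.  NOT the authors' runtime.

    import random
    from collections import deque

    def work_stealing_run(task_lists):
        deques = [deque(ts) for ts in task_lists]
        executed, steals = [0] * len(deques), 0
        while any(deques):
            for wid, dq in enumerate(deques):
                if dq:
                    dq.pop()()                  # run a local task (LIFO end)
                    executed[wid] += 1
                else:                           # idle: poll a random victim
                    victim = random.choice([d for d in deques if d is not dq])
                    if victim:
                        victim.popleft()()      # steal from the FIFO end
                        executed[wid] += 1
                        steals += 1
        return executed, steals

    # Four workers with unbalanced work; idle workers steal from busy ones.
    tasks = [[lambda: None] * n for n in (8, 2, 0, 0)]
    print(work_stealing_run(tasks))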
Back to the ISCA methodology overview