Classifying Edits to Variability in Source Code
This is the replication package for our paper “Classifying Edits to Variability in Source Code”, accepted at the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022).
This replication package consists of four parts:
- DiffDetective: For our validation, we built DiffDetective, a Java library and command-line tool to classify edits to variability in git histories of preprocessor-based software product lines.
- Appendix: The appendix of our paper is given in PDF format in the file appendix.pdf.
- Haskell Formalization: We provide an extended formalization in the Haskell programming language as described in our appendix. Its implementation can be found in the Haskell project in the proofs directory.
- Dataset Overview: We provide an overview of the 44 inspected datasets with updated links to their repositories in the file docs/datasets/all.md.
1. DiffDetective
DiffDetective is a Java library and command-line tool to parse and classify edits to variability in git histories of preprocessor-based software product lines by creating variation diffs and operating on them.
We offer a Docker setup to easily replicate the validation performed in our paper. In the following, we provide a quickstart guide for running the replication. You can find detailed information on how to install Docker and build the container in the INSTALL file, including detailed descriptions of each step and troubleshooting advice.
Prerequisite
All following commands assume that the working directory of your terminal is the esecfse22 directory. If this is not the case, switch to it:
cd DiffDetective/replication/esecfse22
1.1 Build the Docker container
Start the Docker daemon.
Clone this repository.
Open a terminal and navigate to the root directory of this repository.
To build the Docker container, run the build script corresponding to your operating system.
Windows:
.\build.bat
Linux/Mac (bash):
./build.sh
1.2 Start the replication
To execute the replication, run the execute script corresponding to your operating system with “replication” as the first argument.
Windows:
.\execute.bat replication
Linux/Mac (bash):
./execute.sh replication
WARNING! The replication requires at least an hour and might take up to a day, depending on your system. Therefore, we offer a short verification (5-10 minutes), which runs DiffDetective on only four of the datasets. You can run it by providing “verification” instead of “replication” as the argument:
Windows:
.\execute.bat verification
Linux/Mac (bash):
./execute.sh verification
If you want to stop the execution, run the provided script for stopping the container in a separate terminal. When restarted, the execution continues by restarting the last unfinished repository.
Windows:
.\stop-execution.bat
Linux/Mac (bash):
./stop-execution.sh
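The resume behaviour described above can be pictured with the following minimal Java sketch. It is not DiffDetective's implementation: the class ResumableRun, the per-repository subdirectory of the results directory, and the marker file name DONE are hypothetical assumptions. The idea is that a repository only counts as finished once a marker has been written after its last commit was processed, so a repository interrupted midway is processed again from its start.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not DiffDetective's code: a repository counts as finished
// once a marker file (assumed name "DONE") exists in its results subdirectory.
class ResumableRun {
    static final String MARKER = "DONE"; // assumed marker file name

    /** Returns the repositories that still need processing, keeping their original order. */
    static List<String> unfinished(List<String> repositories, Path resultsDir) {
        List<String> todo = new ArrayList<>();
        for (String repo : repositories) {
            if (!Files.exists(resultsDir.resolve(repo).resolve(MARKER))) {
                todo.add(repo);
            }
        }
        return todo;
    }

    /** Writes the marker; call this only after all commits of the repository were processed. */
    static void markFinished(String repo, Path resultsDir) {
        Path dir = resultsDir.resolve(repo);
        try {
            Files.createDirectories(dir);
            Files.write(dir.resolve(MARKER), new byte[0]); // creates or truncates the marker
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path results = Path.of(System.getProperty("java.io.tmpdir"), "resume-sketch-" + System.nanoTime());
        List<String> repos = List.of("busybox", "marlin", "libssh");
        markFinished("busybox", results);
        System.out.println(unfinished(repos, results)); // [marlin, libssh]
    }
}
```

Writing the marker last, instead of tracking progress per commit, keeps the sketch simple at the cost of redoing a partially processed repository.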
You might see warnings or errors reported by SLF4J, such as Failed to load class "org.slf4j.impl.StaticLoggerBinder", which you can safely ignore.
Further troubleshooting advice can be found at the bottom of the INSTALL file.
1.3 View the results in the results directory
All raw results are stored in the results directory. The aggregated results can be found in the following files. (Note that these files only exist after running the replication or verification.)
- speed statistics: contains information about the total runtime, median runtime, mean runtime, and more.
- classification results: contains information about how often each class was found, and more.
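The summary values in the speed statistics follow the usual definitions. As a hedged illustration, the following Java sketch computes total, mean, and median runtime; the class SpeedStatistics and the input of per-commit runtimes in milliseconds are assumptions, not the format of DiffDetective's result files. For an even number of samples, the median is taken as the mean of the two middle values.

```java
import java.util.Arrays;

// Hypothetical sketch: aggregate per-commit runtimes (in milliseconds) into summary statistics.
class SpeedStatistics {
    final long total;
    final double mean;
    final double median;

    SpeedStatistics(long[] runtimesMs) {
        if (runtimesMs.length == 0) {
            throw new IllegalArgumentException("no runtimes given");
        }
        long[] sorted = runtimesMs.clone(); // do not reorder the caller's array
        Arrays.sort(sorted);
        long sum = 0;
        for (long t : sorted) {
            sum += t;
        }
        total = sum;
        mean = (double) sum / sorted.length;
        int mid = sorted.length / 2;
        median = sorted.length % 2 == 1
                ? sorted[mid]                               // odd count: the middle element
                : (sorted[mid - 1] + sorted[mid]) / 2.0;    // even count: mean of the two middle elements
    }

    public static void main(String[] args) {
        SpeedStatistics s = new SpeedStatistics(new long[] {3, 1, 2, 10});
        System.out.println(s.total + " " + s.mean + " " + s.median); // 16 4.0 2.5
    }
}
```

The example also shows why both values are reported: a single slow commit (10 ms) pulls the mean (4.0) well above the median (2.5).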
Moreover, the results comprise the (LaTeX) tables that are part of our paper and appendix.
Documentation
DiffDetective is documented with Javadoc. The documentation can be accessed on this website. Notable classes of our library are:
- DiffTree and DiffNode implement variation diffs from our paper. A variation diff is represented by an instance of the DiffTree class. It stores the root node of the diff and offers various methods to parse, traverse, and analyze variation diffs. DiffNodes represent individual nodes within a variation diff.
- EditClassValidation contains the main method for our validation.
- ProposedEditClasses holds the catalog of the nine edit classes we proposed in our paper. It implements the interface EditClassCatalogue, which allows you to define custom edit classifications.
- BooleanAbstraction contains data and methods for Boolean abstraction of higher-order logic formulas. We use this for macro parsing.
- GitDiffer parses the history of a git repository into variation diffs.
- The datasets package contains various classes for describing and loading datasets.
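To give an intuition for what a catalog of edit classes such as ProposedEditClasses decides, the following Java sketch classifies a single artifact of a variation diff into one of nine edit classes. It is a simplified, hypothetical illustration and not DiffDetective's API: the types EditClassSketch, Artifact, and PC are invented here, presence conditions are plain predicates, and implication is decided by enumerating all assignments of a small feature list instead of using a SAT solver. Added and removed artifacts are distinguished by whether their enclosing annotation was edited together with them; for artifacts that exist before and after the edit, the presence conditions before and after are compared.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Simplified, hypothetical sketch of edit classification; not DiffDetective's API.
class EditClassSketch {
    enum DiffType { ADD, REM, NON }

    enum EditClass {
        ADD_TO_PC, ADD_WITH_MAPPING, REM_FROM_PC, REM_WITH_MAPPING,
        SPECIALIZATION, GENERALIZATION, RECONFIGURATION, REFACTORING, UNTOUCHED
    }

    /** A presence condition, modelled as a predicate over feature assignments. */
    interface PC extends Predicate<Map<String, Boolean>> {}

    /**
     * A hypothetical artifact node of a variation diff.
     * parentEdited: for ADD/REM, whether the enclosing annotation was added/removed together with it.
     * annotationsChanged: for NON, whether any enclosing annotation was edited.
     */
    record Artifact(DiffType type, boolean parentEdited, boolean annotationsChanged,
                    PC before, PC after) {}

    /** Decides whether a implies b by checking all 2^n assignments (only feasible for tiny feature sets). */
    static boolean implies(PC a, PC b, List<String> features) {
        int n = features.size();
        for (long bits = 0; bits < (1L << n); bits++) {
            Map<String, Boolean> assignment = new HashMap<>();
            for (int i = 0; i < n; i++) {
                assignment.put(features.get(i), ((bits >> i) & 1L) == 1L);
            }
            if (a.test(assignment) && !b.test(assignment)) {
                return false; // counterexample: a holds but b does not
            }
        }
        return true;
    }

    static EditClass classify(Artifact a, List<String> features) {
        switch (a.type()) {
            case ADD:
                return a.parentEdited() ? EditClass.ADD_WITH_MAPPING : EditClass.ADD_TO_PC;
            case REM:
                return a.parentEdited() ? EditClass.REM_WITH_MAPPING : EditClass.REM_FROM_PC;
            default: // NON: the artifact exists before and after the edit
                if (!a.annotationsChanged()) {
                    return EditClass.UNTOUCHED;
                }
                boolean afterImpliesBefore = implies(a.after(), a.before(), features);
                boolean beforeImpliesAfter = implies(a.before(), a.after(), features);
                if (afterImpliesBefore && beforeImpliesAfter) return EditClass.REFACTORING; // equivalent
                if (afterImpliesBefore) return EditClass.SPECIALIZATION;  // PC became more specific
                if (beforeImpliesAfter) return EditClass.GENERALIZATION;  // PC became more general
                return EditClass.RECONFIGURATION;                         // neither implies the other
        }
    }

    public static void main(String[] args) {
        List<String> features = List.of("A", "B");
        PC a = m -> m.get("A");
        PC aAndB = m -> m.get("A") && m.get("B");
        Artifact wrapped = new Artifact(DiffType.NON, false, true, a, aAndB);
        System.out.println(classify(wrapped, features)); // SPECIALIZATION
    }
}
```

In the example, code that was guarded by A is additionally guarded by B after the edit, so its presence condition becomes more specific. Brute-force enumeration keeps the sketch self-contained; for real preprocessor conditions with many macros, a SAT solver would be needed.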
2. Appendix
Our appendix consists of:
- An extended formalization of our concepts in the Haskell programming language. The corresponding source code is also part of this replication package (see below).
- The proofs for (a) the completeness of variation diffs to represent edits to variation trees, and (b) the completeness and unambiguity of our edit classes.
- An inspection of edit patterns from related work to show that existing patterns are either composite patterns built from our edit classes or similar to one of our edit classes. The diffs used for these patterns can also be found in docs/compositepatterns.
- The complete results of our validation for all 44 datasets.
3. Haskell Formalization
The extended formalization is a Haskell library in the proofs subdirectory.
Since the proofs library is its own software project, we provide separate documentation of requirements and installation instructions within the project's subdirectory.
Requirements and instructions for setting up the build environment (Stack) are given in proofs/REQUIREMENTS.md.
Instructions for building our library and running the example are given in proofs/INSTALL.md.
4. Dataset Overview
4.1 Open-Source Repositories
We provide an overview of the 44 open-source preprocessor-based software product lines we used in the docs/datasets/all.md file. As described in our paper in Section 5.1, this list contains all systems that were studied by Liebig et al., extended by four new subject systems (Busybox, Marlin, LibSSH, Godot). We provide updated links for each system’s repository.
4.2 Forked Repositories for Replication
To guarantee the exact replication of our validation, we created forks of all 44 open-source repositories at the state at which we performed the validation for our paper.
The forked repositories are listed in the replication datasets and are located at the GitHub user profile DiffDetective.
These repositories are used when running the replication as described in Section 1.2 and in the INSTALL file.
5. Running DiffDetective on Custom Datasets
You can also run DiffDetective on other datasets by providing the path to the dataset file as the first argument to the execution script:
Windows:
.\execute.bat path\to\custom\dataset.md
Linux/Mac (bash):
./execute.sh path/to/custom/dataset.md
The input file must have the same format as the other dataset files (i.e., repositories are listed in a Markdown table). You can find dataset files in the docs/datasets folder.
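Since each repository is one row of a Markdown table, a dataset file can be read row by row. The following Java sketch shows one hypothetical way to do this, producing one map per row keyed by the header cells. The class DatasetTable and the column names in the example are assumptions; check the files in docs/datasets for the actual columns. The sketch assumes a single table per file and does not handle escaped pipe characters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: read the rows of a single Markdown table into maps keyed by the header cells.
class DatasetTable {
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        List<String> header = null;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("|")) {
                continue; // skip headings and prose around the table
            }
            List<String> cells = splitRow(trimmed);
            if (header == null) {
                header = cells; // the first table line declares the column names
                continue;
            }
            if (cells.stream().allMatch(c -> c.matches(":?-+:?"))) {
                continue; // separator row such as |---|:---:|
            }
            Map<String, String> row = new LinkedHashMap<>(); // keeps the column order
            for (int i = 0; i < header.size() && i < cells.size(); i++) {
                row.put(header.get(i), cells.get(i));
            }
            rows.add(row);
        }
        return rows;
    }

    /** Strips the outer pipes and splits the remaining text into trimmed cells. */
    private static List<String> splitRow(String line) {
        int end = line.endsWith("|") && line.length() > 1 ? line.length() - 1 : line.length();
        List<String> cells = new ArrayList<>();
        for (String cell : line.substring(1, end).split("\\|", -1)) {
            cells.add(cell.trim());
        }
        return cells;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "| Name | Clone URL |",
                "|------|-----------|",
                "| my-project | <clone-url> |");
        System.out.println(parse(lines)); // [{Name=my-project, Clone URL=<clone-url>}]
    }
}
```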