DiffDetective - Variability-Aware Source Code Differencing
DiffDetective is an open-source Java library for variability-aware source code differencing and the analysis of version histories of software product lines. This means that DiffDetective can turn a generic differencer into a variability-aware differencer by means of pre- or post-processing. DiffDetective is centered around formally verified data structures for variability (variation trees) and variability-aware diffs (variation diffs). These data structures are generic, and DiffDetective currently supports parsing C preprocessor annotations when they are used to implement variability. The picture below depicts the process of variability-aware differencing.
Given two states of a C-preprocessor-annotated source code file (left), for example before and after a commit, DiffDetective constructs a variability-aware diff (right) that distinguishes changes to source code from changes to variability annotations. DiffDetective can construct such a variation diff either by first using a generic differencer and then separating the information (center path), or by first parsing both input versions to an abstract representation, a variation tree (center top and bottom), and then constructing a variation diff using a tree differencing algorithm.
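To give a rough intuition for the first step of the center path, the following hypothetical sketch (not DiffDetective's actual API or parser) takes the lines of a unified-diff hunk and separates changes to conditional-compilation directives from changes to ordinary source code. It deliberately ignores line continuations, comments, and the nesting of annotations, all of which a real parser has to handle; the class, record, and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: splits the changed lines of a unified-diff hunk into
 *  changes to C preprocessor annotations and changes to ordinary source code. */
final class AnnotationChangeSplitter {
    // Matches conditional-compilation directives, allowing whitespace around '#'.
    private static final Pattern CONDITIONAL_DIRECTIVE =
        Pattern.compile("^\\s*#\\s*(if|ifdef|ifndef|elif|else|endif)\\b");

    /** A changed line: kind is '+' (added) or '-' (removed). */
    record Change(char kind, String text, boolean isAnnotation) {}

    static List<Change> split(List<String> hunkLines) {
        List<Change> changes = new ArrayList<>();
        for (String line : hunkLines) {
            if (line.isEmpty()) continue;
            char kind = line.charAt(0);
            // Context lines (prefixed with ' ') are unchanged and skipped here.
            if (kind != '+' && kind != '-') continue;
            String text = line.substring(1);
            boolean isAnnotation = CONDITIONAL_DIRECTIVE.matcher(text).find();
            changes.add(new Change(kind, text, isAnnotation));
        }
        return changes;
    }
}
```

A real variation diff additionally records which annotation each changed code line is nested under, before and after the edit; this sketch only performs the coarse separation.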
Additionally, DiffDetective offers a flexible framework for large-scale empirical analyses of git version histories of statically configurable software. In multiple studies, DiffDetective was successfully employed to study the commit histories of up to 44 open-source git repositories, including the Linux kernel, GCC, Vim, Emacs, and the Godot game engine.
Setup
DiffDetective is a Java Maven library. While DiffDetective depends on some custom libraries (FeatureIDE library, Sat4j, Functjonal), these are prepackaged with DiffDetective. So all you need is Java ≥ 16 and either Maven or Nix. In the following, we explain the setup with Java and Maven, as well as via Nix.
Cloning the Repository
Clone this repository and navigate inside it:
git clone https://github.com/VariantSync/DiffDetective
cd DiffDetective
In case you are using Nix Flakes, you may skip cloning the repository.
Building and Installing
You can build and install DiffDetective with Maven such that it can be used in your own project. Alternatively, you can use a jar which includes all necessary dependencies. Such a jar can be built either manually with Maven or with Nix.
Building and Installing With Maven
First, Maven needs to be installed. Either provide it yourself (e.g., using a system package manager or, on Windows, by downloading it from the Maven website) or, if you have Nix installed, run nix-shell (stable Nix) or nix develop (Nix Flakes) to provide all necessary build tools.
Next, build DiffDetective and install it on your system so that you can access it from your own projects:
mvn install
To add DiffDetective as a dependency to your own project, add the following snippet to the pom.xml of your Maven project. Make sure to pick the right version number; the current version number can be obtained by running scripts/version.sh.
<dependency>
<groupId>org.variantsync</groupId>
<artifactId>DiffDetective</artifactId>
<version>2.2.0</version>
</dependency>
If you prefer to just use a jar file, you can find a jar file with all dependencies at DiffDetective/target/diffdetective-2.2.0-jar-with-dependencies.jar (again, the version number might be different).
You can (re-)produce this jar file by running either mvn package or mvn install within your local clone of DiffDetective.
Disclaimer: The setup was tested with Maven version 3.6.3.
Building with Nix
As an alternative to building manually with Maven, you can use Nix. Both a flake.nix and a default.nix are provided. Hence, you can build DiffDetective using
nix-build # stable version
# or
nix build # Flake version
In case you are using Nix Flakes, you can skip cloning the repository and build directly from GitHub:
nix build github:VariantSync/DiffDetective#
Afterward, the result symlink points to the Javadoc, the DiffDetective jar and a simple script for executing a DiffDetective main class provided as argument (e.g., evaluations used in previous research, see below under ‘Publications’).
How to Get Started
For a demonstration of how to get started using the library, we have prepared a demo repository. You may clone it as a template and example for including the library into your own projects. Additionally, there is a screencast available on YouTube that guides you through the demo's setup and source code.
Publications
Variability-Aware Differencing with DiffDetective (FSE 2024, ⭐ Best Demo Paper ⭐)
P. M. Bittner, A. Schultheiß, B. Moosherr, T. Kehrer, T. Thüm. Variability-Aware Differencing with DiffDetective. Demonstrations at International Conference on the Foundations of Software Engineering 2024, ACM, New York, NY, July 2024
This paper gives an overview of DiffDetective, its design, features, use-cases, and past case studies. We recommend reading this paper if you are interested in the design of DiffDetective or if you consider using it for your own projects or research. The paper is accompanied by a demo project as well as a screencast (also see How to Get Started above).
Classifying Edits to Variability in Source Code (ESEC/FSE 2022)
P. M. Bittner, C. Tinnes, A. Schultheiß, S. Viegener, T. Kehrer, T. Thüm. Classifying Edits to Variability in Source Code. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022), ACM, New York, NY, November 2022
This was the initial work, introducing DiffDetective as a means to conduct an empirical evaluation of a classification of edits. In particular, we used DiffDetective to classify the effect of edits on the variability of the edited source code in the change histories of 44 open-source C-preprocessor-based software projects.
The classification is implemented within the org.variantsync.diffdetective.editclass package.
The empirical evaluation of the classification, including a respective main method, is implemented in the org.variantsync.diffdetective.experiments.esecfse22 package.
The original replication package can be found on the esecfse branch or via the DOI 10.5281/zenodo.7110095. The replication is also available for the most recent version of DiffDetective with various improvements, which will likely lead to slightly different results than the initial study. The updated replication package can be found in the replication/esecfse22 subdirectory with its own README.
Views on Edits to Variational Software (SPLC 2023)
P. M. Bittner, A. Schultheiß, S. Greiner, B. Moosherr, S. Krieter, C. Tinnes, T. Kehrer, T. Thüm. Views on Edits to Variational Software. In Proceedings of the 27th ACM International Systems and Software Product Line Conference (SPLC 2023), ACM, New York, NY, August 2023
In this work, we used DiffDetective for a feasibility study of creating views on edits to C-preprocessor-based software. The idea of a view is to act as a filter on relevant parts of a system. For instance, a piece of source code may be deemed relevant if it implements a certain feature. A view on an edit thus is a simplified form of an edit that, for example, contains only changes to a certain feature. From a mathematical perspective, creating such views is in fact a lifting of operations on single revisions of variational systems to operations on diffs of variational systems.
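The idea of lifting can be illustrated on a heavily simplified, hypothetical model (not DiffDetective's view API): a diff is a flat list of lines, each tagged as added, removed, or unchanged and annotated with a fixed set of features. A view on a single revision keeps only relevant lines; the lifted view applies the same filter to the diff. All names below are made up for illustration.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical, heavily simplified model of a line-based variability-aware diff. */
final class ViewSketch {
    enum DiffType { ADD, REM, NON }

    /** A line, the features in its (conjunctive) presence condition, and its diff type. */
    record Line(String text, Set<String> features, DiffType type) {}

    /** Projects a diff to the revision before (after == false) or after (after == true) the edit. */
    static List<Line> project(List<Line> diff, boolean after) {
        // Added lines do not exist before the edit; removed lines do not exist after it.
        DiffType absent = after ? DiffType.REM : DiffType.ADD;
        return diff.stream().filter(l -> l.type() != absent).toList();
    }

    /** A view on a single revision: keep only the relevant lines. */
    static List<Line> view(List<Line> revision, Predicate<Line> relevant) {
        return revision.stream().filter(relevant).toList();
    }

    /** The same operation lifted to diffs: keep only the relevant changes. */
    static List<Line> viewOnDiff(List<Line> diff, Predicate<Line> relevant) {
        return diff.stream().filter(relevant).toList();
    }

    /** Relevance criterion: the line implements the given feature. */
    static Predicate<Line> implementsFeature(String feature) {
        return l -> l.features().contains(feature);
    }
}
```

Because relevance in this model depends only on a line's own, unchanging annotation, viewing the diff and then projecting yields the same result as projecting and then viewing. In actual variation diffs, which are trees whose nodes may have different presence conditions before and after an edit, establishing this correspondence is where the lifting becomes non-trivial.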
Views are implemented within the org.variantsync.diffdetective.variation.tree.view and org.variantsync.diffdetective.variation.diff.view packages for variation trees and diffs, respectively.
The empirical evaluation of the view algorithms, including a respective main method, is implemented in the org.variantsync.diffdetective.experiments.views package.
The original replication package can be found on the splc23-views branch within the directory replication/splc23-views or via the DOI 10.5281/zenodo.8027920. The replication is also available for the most recent version of DiffDetective, which will likely lead to slightly different results than the initial study. The updated replication package can be found in the replication/splc23-views subdirectory with its own README.
Explaining Edits to Variability Annotations in Evolving Software Product Lines (VaMoS 2024)
L. Güthing, P. M. Bittner, I. Schaefer, T. Thüm. Explaining Edits to Variability Annotations in Evolving Software Product Lines. In Proceedings of the 18th International Working Conference on Variability Modelling of Software-Intensive Systems (VaMoS 2024), ACM, New York, NY, February 2024
In this work, we formalized an extension of variation diffs with a typing for edges and pair-wise relations for variability annotations (i.e., mapping nodes in variation diffs). Such edge-typed variation diffs show, for example, that two annotations exclude or imply each other. Such edge-typed diffs might help to better explain or analyze edits in the future.
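To make relations such as "imply" and "exclude" concrete, the following hypothetical sketch (not the implementation in the fork) classifies the relation between two annotation formulas by enumerating all assignments of a small set of features. Formulas are modeled as predicates over a feature assignment; the class, enum, and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch: relates two annotation formulas via truth-table enumeration. */
final class AnnotationRelations {
    enum Relation { EQUIVALENT, IMPLIES, IMPLIED_BY, EXCLUDES, NONE }

    static Relation relate(Predicate<Map<String, Boolean>> a,
                           Predicate<Map<String, Boolean>> b,
                           List<String> features) {
        boolean aImpliesB = true, bImpliesA = true, bothPossible = false;
        int n = features.size();
        // Enumerate all 2^n assignments of the features.
        for (long bits = 0; bits < (1L << n); bits++) {
            Map<String, Boolean> assignment = new HashMap<>();
            for (int i = 0; i < n; i++) {
                assignment.put(features.get(i), ((bits >> i) & 1) == 1);
            }
            boolean va = a.test(assignment), vb = b.test(assignment);
            if (va && !vb) aImpliesB = false;
            if (vb && !va) bImpliesA = false;
            if (va && vb) bothPossible = true;
        }
        // Note: an unsatisfiable formula implies every formula; implication is reported first.
        if (aImpliesB && bImpliesA) return Relation.EQUIVALENT;
        if (aImpliesB) return Relation.IMPLIES;
        if (bImpliesA) return Relation.IMPLIED_BY;
        if (!bothPossible) return Relation.EXCLUDES;
        return Relation.NONE;
    }
}
```

For example, A && B implies A, while A and !A exclude each other. The enumeration only scales to a handful of features; a practical implementation would rather delegate such queries to a SAT solver.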
Edge-typed variation diffs and the replication package are implemented in a fork of DiffDetective (https://github.com/guethilu/DiffDetective). The replication package is archived under the DOI 10.5281/zenodo.10286851.
Student Theses
DiffDetective was extended and used within bachelor’s and master’s theses:
- Constructing Variation Diffs Using Tree Diffing Algorithms, Benjamin Moosherr, Bachelor's Thesis, 2023, DOI 10.18725/OPARU-50108: Benjamin added support for tree differencing and integrated the GumTree differencer. In his thesis, Benjamin also reviewed a range of quality metrics for tree diffs with a focus on their applicability for rating variability-aware diffs. The org.variantsync.diffdetective.experiments.thesis_bm package implements the corresponding empirical study and may serve as an example of how to use tree differencing.
- Reverse Engineering Feature-Aware Commits From Software Product-Line Repositories, Lukas Bormann, Bachelor's Thesis, 2023, DOI 10.18725/OPARU-47892: Lukas implemented an algorithm for feature-based commit untangling, which turns a variation diff into a series of smaller diffs, each of which contains an edit to a single feature or feature formula. This work was later refined in our publication Views on Edits to Variational Software described above.
- Inspecting the Evolution of Feature Annotations in Configurable Software, Lukas Güthing, Master's Thesis, 2023: Lukas implemented different edge types for associating variability annotations within variation diffs. He later published his work at VaMoS 2024 under the title Explaining Edits to Variability Annotations in Evolving Software Product Lines, described above.
- Empirical Evaluation of Feature Trace Recording on the Edit History of Marlin, Sören Viegener, Bachelor's Thesis, 2021, DOI 10.18725/OPARU-38603: In his thesis, Sören started the DiffDetective project and implemented the first version of an algorithm that parses text-based diffs of C-preprocessor-annotated files into variation diffs. He also came up with an initial classification of edits, which we wanted to reuse to evaluate Feature Trace Recording, a method for deriving variability annotations from annotated patches.