Class Analysis
java.lang.Object
org.variantsync.diffdetective.analysis.Analysis
Encapsulates the state and control flow during an analysis of the commit history of multiple repositories using VariationDiffs. Repositories are processed sequentially, but the commits of each repository can be processed in parallel.
For thread safety, each thread receives its own instance of Analysis. The getters provide access to the current state of the analysis in one thread. Depending on the current phase, only a subset of the state accessible via getters may be valid.
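To illustrate what phase-dependent validity means, here is a minimal, self-contained sketch. It does not use the DiffDetective API: the class PhaseState, its fields, and the strings standing in for commits and patches are all hypothetical. It only shows the general idea that a getter returns meaningful data while its phase is active and nothing useful outside of it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for per-thread analysis state; not the DiffDetective class.
class PhaseState {
    private String currentCommit; // set only during the commit phase
    private String currentPatch;  // set only during the patch phase
    final List<String> log = new ArrayList<>();

    String getCurrentCommit() { return currentCommit; }
    String getCurrentPatch() { return currentPatch; }

    // Processes one batch: commit ids mapped to the patches (file names) they contain.
    void processBatch(Map<String, List<String>> batch) {
        for (Map.Entry<String, List<String>> commit : batch.entrySet()) {
            currentCommit = commit.getKey();      // commit phase begins
            for (String patch : commit.getValue()) {
                currentPatch = patch;             // patch phase begins
                // A hook called here may read both getters.
                log.add(getCurrentCommit() + ":" + getCurrentPatch());
                currentPatch = null;              // patch phase ends
            }
            currentCommit = null;                 // commit phase ends
        }
    }
}
```

Because every thread would own its own PhaseState in such a design, no synchronization is needed around the fields.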
- Author:
- Paul Bittner, Benjamin Moosherr
-
Nested Class Summary
Modifier and Type | Class | Description
static interface | Analysis.Hooks | Hooks for analyzing commits using VariationDiffs.
static final class | | The effective runtime in seconds that we have when using multithreading.
static final class | | The total number of commits in the observed history of the given repository.
-
Field Summary
Modifier and Type | Field | Description
static final int | COMMITS_TO_PROCESS_PER_THREAD_DEFAULT | Default value for commitsToProcessPerThread
protected org.eclipse.jgit.revwalk.RevCommit | currentCommit |
protected CommitDiff | currentCommitDiff |
protected PatchDiff | currentPatch |
protected VariationDiff<DiffLinesLabel> | currentVariationDiff |
protected GitDiffer | differ |
static final String | EXTENSION | File extension that is used when writing AnalysisResults to disk.
protected final List<Analysis.Hooks> | hooks |
protected final Path | outputDir |
protected Path | outputFile |
protected final Repository | repository |
protected final AnalysisResult | result |
static final String | TOTAL_RESULTS_FILE_NAME | File name that is used to store the analysis results for each repository.
-
Constructor Summary
Constructor | Description
Analysis(String taskName, List<Analysis.Hooks> hooks, Repository repository, Path outputDir) | Constructs the state used during an analysis.
-
Method Summary
Modifier and Type | Method | Description
<T extends Metadata<T>> void | append(AnalysisResult.ResultKey<T> resultKey, T value) | Convenience function for AnalysisResult.append(AnalysisResult.ResultKey<T>, T) on getResult().
static <T> void | exportMetadata(Path outputDir, Metadata<T> metadata) | Exports the given metadata object to a file named according to TOTAL_RESULTS_FILE_NAME in the given directory.
static <T> void | exportMetadataToFile(Path outputFile, Metadata<T> metadata) | Exports the given metadata object to the given file.
static AnalysisResult | forEachCommit(Supplier<Analysis> analysis) | Same as forEachCommit(Supplier, int, int).
static AnalysisResult | forEachCommit(Supplier<Analysis> analysisFactory, int commitsToProcessPerThread, int nThreads) | Runs the analysis for the repository given in Analysis(String, List<Analysis.Hooks>, Repository, Path).
static void | forEachRepository(List<Repository> repositoriesToAnalyze, Path outputDir, BiConsumer<Repository, Path> analyzeRepository) | Runs analyzeRepository on each repository, skipping repositories where an analysis was already run.
static AnalysisResult | forSingleCommit(String commitHash, Analysis analysis) | Runs the analysis for the repository given in Analysis(String, List<Analysis.Hooks>, Repository, Path) on the given commit only.
static void | forSinglePatch(String commitHash, String fileName, Analysis analysis) | Runs the analysis for the repository given in Analysis(String, List<Analysis.Hooks>, Repository, Path) on the given patch only.
<T extends Metadata<T>> T | get(AnalysisResult.ResultKey<T> resultKey) | Convenience getter for AnalysisResult.get(AnalysisResult.ResultKey<T>) on getResult().
org.eclipse.jgit.revwalk.RevCommit | getCurrentCommit() | The currently processed commit.
 | getCurrentCommitDiff() | The currently processed commit diff.
 | getCurrentPatch() | The currently processed patch.
 | getCurrentVariationDiff() | The currently processed patch.
 | getOutputDir() | The destination for results which are written to disk.
 | getOutputFile() | The destination for results which are written to disk and specific to the currently processed commit batch.
 | getRepository() | The repository this analysis is run on.
 | getResult() | The results of the analysis.
protected void | processCommit |
protected void | processCommitBatch(List<org.eclipse.jgit.revwalk.RevCommit> commits) | Sequential analysis of all commits as one batch.
protected void | processPatch |
protected <Hook> boolean | runFilterHook(ListIterator<Hook> hook, org.apache.commons.lang3.function.FailableBiFunction<Hook, Analysis, Boolean, Exception> callHook) |
protected <Hook> void | runHook(ListIterator<Hook> hook, org.apache.commons.lang3.function.FailableBiConsumer<Hook, Analysis, Exception> callHook) |
protected <Hook> void | runReverseHook(ListIterator<Hook> hook, org.apache.commons.lang3.function.FailableBiConsumer<Hook, Analysis, Exception> callHook) |
-
Field Details
-
EXTENSION
File extension that is used when writing AnalysisResults to disk.
-
TOTAL_RESULTS_FILE_NAME
File name that is used to store the analysis results for each repository.
-
COMMITS_TO_PROCESS_PER_THREAD_DEFAULT
public static final int COMMITS_TO_PROCESS_PER_THREAD_DEFAULT
Default value for commitsToProcessPerThread
-
hooks
-
repository
-
differ
-
currentCommit
protected org.eclipse.jgit.revwalk.RevCommit currentCommit
-
currentCommitDiff
-
currentPatch
-
currentVariationDiff
-
outputDir
-
outputFile
-
result
-
-
Constructor Details
-
Analysis
Constructs the state used during an analysis.
- Parameters:
taskName - the name of the overall analysis task
hooks - the hooks to be run for analysis
repository - the repository to analyze
outputDir - the directory where all results are saved
-
-
Method Details
-
getRepository
The repository this analysis is run on. Always valid.
-
getCurrentCommit
public org.eclipse.jgit.revwalk.RevCommit getCurrentCommit()
The currently processed commit. Valid during the commit phase.
-
getCurrentCommitDiff
The currently processed commit diff. Valid from the call to Analysis.Hooks.onParsedCommit(Analysis) until the end of the commit phase.
-
getCurrentPatch
The currently processed patch. Valid during the patch phase.
-
getCurrentVariationDiff
The currently processed patch. Valid only during Analysis.Hooks.analyzeVariationDiff(Analysis).
-
getOutputDir
The destination for results which are written to disk. Always valid.
-
getOutputFile
The destination for results which are written to disk and specific to the currently processed commit batch. Valid during the batch phase.
-
getResult
The results of the analysis. This may be modified by any hook and should be initialized in Analysis.Hooks.initializeResults(Analysis) (e.g. by using append(AnalysisResult.ResultKey<T>, T)). Always valid.
-
get
Convenience getter for AnalysisResult.get(AnalysisResult.ResultKey<T>) on getResult(). Always valid.
-
append
Convenience function for AnalysisResult.append(AnalysisResult.ResultKey<T>, T) on getResult(). Always valid.
-
forEachRepository
public static void forEachRepository(List<Repository> repositoriesToAnalyze, Path outputDir, BiConsumer<Repository, Path> analyzeRepository)
Runs analyzeRepository on each repository, skipping repositories where an analysis was already run. This skipping mechanism doesn't distinguish between different analyses as it only checks for the existence of TOTAL_RESULTS_FILE_NAME. Delete this file to rerun the analysis.
For each repository, a directory in outputDir is passed to analyzeRepository, where the results of the given repository should be written.
- Parameters:
repositoriesToAnalyze - the repositories for which analyzeRepository is run
outputDir - the directory where all repositories will save their results
analyzeRepository - the callback which is invoked for each repository
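The skipping mechanism described above can be sketched with plain java.nio calls. This is an illustration under assumptions, not the DiffDetective implementation: repositories are reduced to their names, the class SkipExisting is hypothetical, and MARKER_FILE is a placeholder because this page does not show the actual value of TOTAL_RESULTS_FILE_NAME.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch of the skipping mechanism; not the DiffDetective implementation.
class SkipExisting {
    // Placeholder name; the real value of TOTAL_RESULTS_FILE_NAME is not given on this page.
    static final String MARKER_FILE = "results.marker";

    // Runs the callback for each repository name unless its marker file already exists.
    // Returns the names that were actually analyzed.
    static List<String> forEachRepository(List<String> repoNames, Path outputDir,
                                          BiConsumer<String, Path> analyze) throws IOException {
        List<String> analyzed = new ArrayList<>();
        for (String name : repoNames) {
            Path repoOutputDir = outputDir.resolve(name);
            if (Files.exists(repoOutputDir.resolve(MARKER_FILE))) {
                continue; // an earlier run (of any analysis) left its marker here
            }
            Files.createDirectories(repoOutputDir);
            analyze.accept(name, repoOutputDir);
            analyzed.add(name);
        }
        return analyzed;
    }
}
```

Since only the marker file is checked, a run of a different analysis into the same output directory would be skipped as well, which matches the caveat in the description.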
-
forSingleCommit
Runs the analysis for the repository given in Analysis(String, List<Analysis.Hooks>, Repository, Path) on the given commit only. The Analysis.Hooks passed to Analysis(String, List<Analysis.Hooks>, Repository, Path) are the main customization point for executing different analyses.
- Parameters:
commitHash - the commit to analyze relative to its first parent
analysis - the analysis to run
-
forSinglePatch
Runs the analysis for the repository given in Analysis(String, List<Analysis.Hooks>, Repository, Path) on the given patch only. The Analysis.Hooks passed to Analysis(String, List<Analysis.Hooks>, Repository, Path) are the main customization point for executing different analyses. The list of hooks is modified temporarily: a new hook for patch filtering is inserted as the first hook for as long as the analysis runs and is removed afterwards. It is assumed that this hook remains in the same position and is not manipulated by the user.
- Parameters:
commitHash - the commit to analyze relative to its first parent
fileName - the name of the file that was edited in the given commit
analysis - the analysis to run
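The temporary insertion of a filter hook can be pictured with the following sketch. It is an assumption-laden illustration, not the library's code: hooks are modelled as plain Predicate<String> objects over file names, and the class TemporaryFirstHook and its methods are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; hooks are modelled as plain predicates over file names.
class TemporaryFirstHook {
    // Runs `analysis` with a file-name filter installed as the first hook,
    // then removes that hook again, even if the analysis throws.
    static void withPatchFilter(List<Predicate<String>> hooks, String fileName, Runnable analysis) {
        Predicate<String> filter = fileName::equals;
        hooks.add(0, filter);   // the filter becomes the first hook
        try {
            analysis.run();
        } finally {
            hooks.remove(0);    // assumes nobody moved or replaced the first hook meanwhile
        }
    }

    // Returns true if every hook accepts the given file.
    static boolean accepts(List<Predicate<String>> hooks, String file) {
        return hooks.stream().allMatch(h -> h.test(file));
    }
}
```

Removing by index is what makes the stated assumption necessary: if user code reorders the list while the analysis runs, the wrong hook would be removed.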
-
forEachCommit
Same as forEachCommit(Supplier, int, int). Defaults to COMMITS_TO_PROCESS_PER_THREAD_DEFAULT commits per batch and a machine-dependent thread count of Diagnostics.getNumberOfAvailableProcessors().
forEachCommit
public static AnalysisResult forEachCommit(Supplier<Analysis> analysisFactory, int commitsToProcessPerThread, int nThreads)
Runs the analysis for the repository given in Analysis(String, List<Analysis.Hooks>, Repository, Path). The repository history is processed in batches of commitsToProcessPerThread commits on nThreads threads in parallel. The Analysis.Hooks passed to Analysis(String, List<Analysis.Hooks>, Repository, Path) are the main customization point for executing different analyses. By default, only the total number of commits and the total runtime with multithreading of the VariationDiff parsing are recorded.
- Parameters:
analysisFactory - creates independent (at least thread safe) instances of the analysis state
commitsToProcessPerThread - the commit batch size
nThreads - the number of parallel processed commit batches
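A rough sketch of batch-parallel processing with a state factory follows. It is not the DiffDetective implementation: commits are plain strings, Counter is a hypothetical stand-in for the per-thread analysis state, and the way results are merged (a simple sum) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical sketch of batch-parallel processing; not the DiffDetective implementation.
class BatchedHistory {
    // Stand-in for per-thread analysis state: counts the commits it has seen.
    static class Counter {
        int commits = 0;
        void processCommitBatch(List<String> batch) { commits += batch.size(); }
    }

    // Splits `items` into consecutive batches of at most `batchSize` elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    // Processes each batch with a fresh state object and sums the per-batch results.
    static int forEachCommit(List<String> commits, Supplier<Counter> factory,
                             int commitsToProcessPerThread, int nThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<String> batch : partition(commits, commitsToProcessPerThread)) {
                futures.add(pool.submit(() -> {
                    Counter state = factory.get(); // no state is shared between batches
                    state.processCommitBatch(batch);
                    return state.commits;
                }));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // waits for the batch and rethrows its failure, if any
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Passing a Supplier rather than a single state object is what lets each batch work on its own instance, so the state class itself does not need to be synchronized.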
-
processCommitBatch
protected void processCommitBatch(List<org.eclipse.jgit.revwalk.RevCommit> commits) throws Exception
Sequential analysis of all commits as one batch.
- Parameters:
commits - the commit batch to be processed
- Throws:
Exception
-
processCommit
- Throws:
Exception
-
processPatch
- Throws:
Exception
-
runHook
protected <Hook> void runHook(ListIterator<Hook> hook, org.apache.commons.lang3.function.FailableBiConsumer<Hook, Analysis, Exception> callHook) throws Exception
- Throws:
Exception
-
runFilterHook
protected <Hook> boolean runFilterHook(ListIterator<Hook> hook, org.apache.commons.lang3.function.FailableBiFunction<Hook, Analysis, Boolean, Exception> callHook) throws Exception
- Throws:
Exception
-
runReverseHook
protected <Hook> void runReverseHook(ListIterator<Hook> hook, org.apache.commons.lang3.function.FailableBiConsumer<Hook, Analysis, Exception> callHook) throws Exception
- Throws:
Exception
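This page gives only the signatures of runHook, runFilterHook, and runReverseHook. One plausible reading of the ListIterator parameter is that hooks are called in list order, that a filter hook can stop the iteration, and that the same iterator can then be walked backwards to call only the hooks already visited. The sketch below illustrates that reading; it is an assumption, not documented behavior. HookRunner and its two functional interfaces are hypothetical local stand-ins for the Apache Commons Lang types named in the signatures, and the Analysis argument is omitted.

```java
import java.util.ListIterator;

// Hypothetical sketch; the nested interfaces are local stand-ins for the
// Apache Commons Lang functional interfaces named in the signatures.
class HookRunner {
    interface FailableConsumer<H> { void accept(H hook) throws Exception; }
    interface FailableFunction<H> { boolean apply(H hook) throws Exception; }

    // Calls every remaining hook in list order.
    static <H> void runHook(ListIterator<H> hooks, FailableConsumer<H> call) throws Exception {
        while (hooks.hasNext()) {
            call.accept(hooks.next());
        }
    }

    // Calls hooks in list order and stops at the first one that returns false.
    static <H> boolean runFilterHook(ListIterator<H> hooks, FailableFunction<H> call) throws Exception {
        while (hooks.hasNext()) {
            if (!call.apply(hooks.next())) {
                return false;
            }
        }
        return true;
    }

    // Walks the iterator backwards, so only hooks already visited are called, last one first.
    static <H> void runReverseHook(ListIterator<H> hooks, FailableConsumer<H> call) throws Exception {
        while (hooks.hasPrevious()) {
            call.accept(hooks.previous());
        }
    }
}
```

Under this reading, sharing one iterator between a forward pass and a reverse pass means a "begin" style callback and its matching "end" style callback are paired even when a filter aborts the forward pass early.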
-
exportMetadata
Exports the given metadata object to a file named according to TOTAL_RESULTS_FILE_NAME in the given directory.
- Type Parameters:
T - Type of the metadata.
- Parameters:
outputDir - The directory into which the metadata object file should be written.
metadata - The metadata to serialize
-
exportMetadataToFile
Exports the given metadata object to the given file. Overwrites existing files.
- Type Parameters:
T - Type of the metadata.
- Parameters:
outputFile - The file to write.
metadata - The metadata to serialize
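How the two export methods relate, and what overwriting means in practice, can be sketched with java.nio. This is a hedged illustration only: the class MetadataExport is hypothetical, the metadata is reduced to an already serialized string, and RESULTS_FILE_NAME is a placeholder because the value of TOTAL_RESULTS_FILE_NAME is not shown on this page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; the serialized form is a plain string here.
class MetadataExport {
    // Placeholder name; the actual value of TOTAL_RESULTS_FILE_NAME is not shown on this page.
    static final String RESULTS_FILE_NAME = "results.txt";

    // Writes into the fixed-name file inside the given directory.
    static void exportMetadata(Path outputDir, String serialized) throws IOException {
        exportMetadataToFile(outputDir.resolve(RESULTS_FILE_NAME), serialized);
    }

    // Files.writeString without explicit open options creates the file
    // or truncates an existing one, so earlier content is overwritten.
    static void exportMetadataToFile(Path outputFile, String serialized) throws IOException {
        Files.writeString(outputFile, serialized, StandardCharsets.UTF_8);
    }
}
```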
-