Class DatasetFactory

java.lang.Object
org.variantsync.diffdetective.datasets.DatasetFactory

public class DatasetFactory extends Object
The DatasetFactory loads datasets and provides default values for DiffFilters and parse options. In particular, this class turns DatasetDescription objects into Repository objects.
Author:
Paul Bittner
  • Field Details

    • MARLIN

      public static final String MARLIN
      Name of Marlin.
    • LINUX

      public static final String LINUX
      Name of Linux.
    • PHP

      public static final String PHP
      Name of PHP.
    • DEFAULT_DIFF_FILTER

      public static final DiffFilter DEFAULT_DIFF_FILTER
      Default value for diff filters. It excludes merge commits, considers only patches that modified files, and accepts only source files of C/C++ projects (extensions "h", "hpp", "c", "cpp").
    • cloneDirectory

      private final Path cloneDirectory
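
The three conditions described for DEFAULT_DIFF_FILTER above can be illustrated with a small, self-contained sketch. It does not use DiffDetective's DiffFilter API: the class DefaultFilterSketch, the enum ChangeType, and the method accepts are hypothetical stand-ins, and the case-sensitive extension match is an assumption.

```java
import java.util.Set;

// Hypothetical stand-in for illustration only; not DiffDetective's DiffFilter.
class DefaultFilterSketch {
    // File extensions named in the description of DEFAULT_DIFF_FILTER.
    static final Set<String> ALLOWED_EXTENSIONS = Set.of("h", "hpp", "c", "cpp");

    // Assumed kinds of file changes a patch can represent.
    enum ChangeType { ADD, MODIFY, DELETE, RENAME }

    // Returns the text after the last dot of the file name, or "" if there is none.
    static String extensionOf(String path) {
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        boolean dotInFileName = dot > slash && dot < path.length() - 1;
        return dotInFileName ? path.substring(dot + 1) : "";
    }

    // A patch passes only if all three conditions hold (case-sensitive match assumed).
    static boolean accepts(boolean isMergeCommit, ChangeType type, String path) {
        return !isMergeCommit
                && type == ChangeType.MODIFY
                && ALLOWED_EXTENSIONS.contains(extensionOf(path));
    }
}
```

Under these assumptions, a modification of src/main.cpp in a non-merge commit passes, while the same change in a merge commit, an added file, or a change to README.md is rejected.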
  • Constructor Details

    • DatasetFactory

      public DatasetFactory(Path cloneDirectory)
      Creates a new DatasetFactory that will clone any loaded datasets to the given directory.
      Parameters:
      cloneDirectory - Directory to clone remote repositories to upon dataset loading.
  • Method Details

    • getDefaultDiffFilterFor

      public static DiffFilter getDefaultDiffFilterFor(String repositoryName)
      Returns the default DiffFilter for the repository with the given name. For Marlin, this applies the same DiffFilter as Stanciulescu et al. did in their ICSME paper.
    • getParseOptionsFor

      private static PatchDiffParseOptions getParseOptionsFor(String repositoryName)
      Returns the default parse options for the repository with the given name. For Marlin, this uses Marlin.ANNOTATION_PARSER.
    • create

      public Repository create(DatasetDescription dataset)
      Loads the repository of the given dataset description. This will load the repository with the DiffFilter and ParseOptions provided by getDefaultDiffFilterFor(java.lang.String) and getParseOptionsFor(java.lang.String), respectively.
      Parameters:
      dataset - The dataset to load.
      Returns:
      A repository referencing the loaded dataset.
    • createAll

      public List<Repository> createAll(Collection<DatasetDescription> datasets, boolean preload, boolean pull)
      Runs create(org.variantsync.diffdetective.datasets.DatasetDescription) for each of the given dataset descriptions. Optionally, also preloads each repository, which means that the repository is cloned if it is remote or unzipped if it is a zip archive. Optionally, also runs git pull on all repositories to update them.
      Parameters:
      datasets - Datasets to load.
      preload - Set to true iff the repositories should be cloned / unzipped in case they are not locally available already.
      pull - Set to true iff git pull should be run on all repositories before returning.
      Returns:
      Repository references for all dataset descriptions in the same order.
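
The contract described for createAll (one repository per description, in the same order, with optional preload and pull steps) can be sketched as follows. This is not the library's implementation: CreateAllSketch, Repo, and the boolean flags that stand in for cloning, unzipping, and git pull are hypothetical, and dataset descriptions are reduced to plain names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for illustration only; not DiffDetective's DatasetFactory.
class CreateAllSketch {
    // Minimal placeholder for a repository reference.
    static class Repo {
        final String name;
        boolean preloaded = false; // stands in for "cloned or unzipped"
        boolean pulled = false;    // stands in for "git pull was run"

        Repo(String name) {
            this.name = name;
        }
    }

    // Placeholder for create(DatasetDescription): one repository per description.
    static Repo create(String datasetName) {
        return new Repo(datasetName);
    }

    // Creates repositories in iteration order and applies the optional steps to each.
    static List<Repo> createAll(Collection<String> datasetNames, boolean preload, boolean pull) {
        List<Repo> repos = new ArrayList<>(datasetNames.size());
        for (String name : datasetNames) {
            Repo repo = create(name);
            if (preload) {
                repo.preloaded = true;
            }
            if (pull) {
                repo.pulled = true;
            }
            repos.add(repo);
        }
        return repos;
    }
}
```

Because the result list is filled while iterating over the input, the order of the returned references follows the iteration order of the given collection. Passing a List therefore keeps the i-th repository aligned with the i-th description.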