Class DatasetFactory

java.lang.Object
org.variantsync.diffdetective.datasets.DatasetFactory

public class DatasetFactory extends Object
The DatasetFactory loads datasets and provides default values for DiffFilters and parse options. In particular, this class turns DatasetDescription objects into Repository objects.
Author:
Paul Bittner
  • Constructor Details

    • DatasetFactory

      public DatasetFactory(Path cloneDirectory)
      Creates a new DatasetFactory that will clone any loaded datasets to the given directory.
      Parameters:
      cloneDirectory - Directory to clone remote repositories to upon dataset loading.
  • Method Details

    • getDefaultDiffFilterFor

      public static DiffFilter getDefaultDiffFilterFor(String repositoryName)
      Returns the default DiffFilter for the repository with the given name. For Marlin, this applies the same DiffFilter as Stanciulescu et al. did in their ICSME paper.
    • getParseOptionsFor

      private static PatchDiffParseOptions getParseOptionsFor(String repositoryName)
      Returns the default parse options for the repository with the given name. For Marlin, this uses Marlin.ANNOTATION_PARSER.
    • create

      public Repository create(DatasetDescription dataset)
      Loads the repository of the given dataset description. This will load the repository with the DiffFilter and parse options provided by getDefaultDiffFilterFor(java.lang.String) and getParseOptionsFor(java.lang.String), respectively.
      Parameters:
      dataset - The dataset to load.
      Returns:
      A repository referencing the loaded dataset.
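
      The name-based lookup of defaults described for create can be illustrated with a minimal, self-contained sketch. It does not use the DiffDetective classes: FilterStub, ParseOptionsStub, RepositoryStub, and the string labels are hypothetical stand-ins for DiffFilter, PatchDiffParseOptions, and Repository, and matching on the exact repository name "Marlin" is an assumption. Only the overall shape (one default, one Marlin-specific special case, both attached to the created repository) follows the description above.

```java
import java.util.Map;

public class DefaultsLookupSketch {
    // Hypothetical stand-ins for DiffFilter, PatchDiffParseOptions and Repository.
    record FilterStub(String label) {}
    record ParseOptionsStub(String annotationParser) {}
    record RepositoryStub(String name, FilterStub filter, ParseOptionsStub parseOptions) {}

    static final FilterStub DEFAULT_FILTER = new FilterStub("default");
    static final ParseOptionsStub DEFAULT_OPTIONS = new ParseOptionsStub("default");

    // Assumption: special cases are keyed by the exact repository name.
    static final Map<String, FilterStub> FILTERS =
            Map.of("Marlin", new FilterStub("marlin-specific"));
    static final Map<String, ParseOptionsStub> OPTIONS =
            Map.of("Marlin", new ParseOptionsStub("Marlin.ANNOTATION_PARSER"));

    // Returns the special-case filter if one is registered, else the default.
    static FilterStub getDefaultDiffFilterFor(String repositoryName) {
        return FILTERS.getOrDefault(repositoryName, DEFAULT_FILTER);
    }

    // Returns the special-case parse options if registered, else the default.
    static ParseOptionsStub getParseOptionsFor(String repositoryName) {
        return OPTIONS.getOrDefault(repositoryName, DEFAULT_OPTIONS);
    }

    // Mirrors the documented behaviour of create: both defaults are looked up
    // by repository name and attached to the resulting repository.
    static RepositoryStub create(String repositoryName) {
        return new RepositoryStub(
                repositoryName,
                getDefaultDiffFilterFor(repositoryName),
                getParseOptionsFor(repositoryName));
    }

    public static void main(String[] args) {
        System.out.println(create("Marlin"));
        System.out.println(create("SomeOtherRepo"));
    }
}
```

      In this sketch, a repository named "Marlin" receives the Marlin-specific stand-ins, while any other name falls back to the defaults.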
    • createAll

      public List<Repository> createAll(Collection<DatasetDescription> datasets, boolean preload, boolean pull)
      Runs create(org.variantsync.diffdetective.datasets.DatasetDescription) for all given dataset descriptions. If preload is set, each repository is also preloaded, meaning that it is cloned if it is remote or unzipped if it is a zip archive. If pull is set, git pull is also run on all repositories to update them.
      Parameters:
      datasets - Datasets to load.
      preload - Set to true iff the repositories should be cloned / unzipped in case they are not locally available already.
      pull - Set to true iff git pull should be run on all repositories before returning.
      Returns:
      Repository references for all dataset descriptions in the same order.
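
      The contract of createAll (one repository per description, in input order, with optional preload and pull steps) can be sketched as follows. This is a self-contained illustration under stated assumptions, not the DiffDetective implementation: DescriptionStub, RepositoryStub, and FactorySketch are hypothetical stand-ins for DatasetDescription, Repository, and DatasetFactory, and the preload and pull steps only set flags instead of cloning, unzipping, or running git.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class CreateAllSketch {
    // Hypothetical stand-in for DatasetDescription.
    record DescriptionStub(String name, String remoteUrl) {}

    // Hypothetical stand-in for Repository; the flags only record which
    // optional steps were requested.
    static final class RepositoryStub {
        final String name;
        final Path localPath;
        boolean preloaded = false;
        boolean pulled = false;

        RepositoryStub(String name, Path localPath) {
            this.name = name;
            this.localPath = localPath;
        }
    }

    // Hypothetical stand-in for DatasetFactory.
    static final class FactorySketch {
        private final Path cloneDirectory;

        FactorySketch(Path cloneDirectory) {
            this.cloneDirectory = cloneDirectory;
        }

        // Assumption: each dataset lives in a subdirectory named after it.
        RepositoryStub create(DescriptionStub dataset) {
            return new RepositoryStub(dataset.name(), cloneDirectory.resolve(dataset.name()));
        }

        List<RepositoryStub> createAll(
                Collection<DescriptionStub> datasets, boolean preload, boolean pull) {
            List<RepositoryStub> repositories = new ArrayList<>(datasets.size());
            // Iterating the collection keeps results in the input's iteration order.
            for (DescriptionStub dataset : datasets) {
                RepositoryStub repository = create(dataset);
                if (preload) {
                    repository.preloaded = true; // real code would clone or unzip here
                }
                if (pull) {
                    repository.pulled = true; // real code would run git pull here
                }
                repositories.add(repository);
            }
            return repositories;
        }
    }

    public static void main(String[] args) {
        FactorySketch factory = new FactorySketch(Path.of("repos"));
        List<DescriptionStub> datasets = List.of(
                new DescriptionStub("Marlin", "https://example.org/marlin.git"),
                new DescriptionStub("Other", "https://example.org/other.git"));
        for (RepositoryStub repository : factory.createAll(datasets, true, false)) {
            System.out.println(repository.name + " -> " + repository.localPath
                    + " (preloaded=" + repository.preloaded
                    + ", pulled=" + repository.pulled + ")");
        }
    }
}
```

      The URLs and dataset names in main are placeholders. With the real class, the corresponding call would be createAll(datasets, true, false) on a DatasetFactory constructed with the desired clone directory.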