    Introducing versioned artifact cache · c82de8e0
    Aleksandar Mastilovic authored
    * Changes to importing configured callable functions
    
    * Import a callable function only if it's present as a string in config
    
    * Added missing return type annotations
    
    * Better mechanism to import symbols defined in YAML config
    
    * Docstrings, better logging
    
    * cache_key_fn loadable from YAML, additional refactoring
    
    * Cache key functions moved to workflow_utils.artifact.cache module
    level
    * Cache key functions renamed
    * "current" and "tstamped" cache key functions removed from
    workflow_utils.artifact.cache module
    * Removed CacheKeyUtil class
    * cache_key_fn now loadable from YAML configuration
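
One way a function named as a string in configuration could be resolved to a callable is sketched below. This is only an illustration under stated assumptions: `load_callable`, the dotted-path format, and the `config` mapping (a stand-in for parsed YAML) are hypothetical and are not taken from workflow_utils.

```python
import importlib
from typing import Any, Callable, Optional


def load_callable(dotted_path: Optional[str]) -> Optional[Callable[..., Any]]:
    """Resolve 'package.module.name' to a callable; return None if unset."""
    # Import only when the config actually provides a string.
    if dotted_path is None:
        return None
    if not isinstance(dotted_path, str):
        raise TypeError(f"expected a dotted path string, got {type(dotted_path)!r}")

    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"not a dotted path: {dotted_path!r}")

    module = importlib.import_module(module_name)
    obj = getattr(module, attr_name)
    if not callable(obj):
        raise TypeError(f"{dotted_path!r} does not refer to a callable")
    return obj


# Hypothetical parsed YAML; a stdlib function stands in for a cache key function.
config = {"cache_key_fn": "posixpath.basename"}
cache_key_fn = load_callable(config.get("cache_key_fn"))
```

Returning `None` for a missing key leaves the choice of default to the caller, so nothing is imported unless the config asks for it.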
    
    * Refactoring of ArtifactCache classes
    
    * Removed cache_key method from FsArtifactCache and replaced it with a
    function as an argument to the constructor.
    * Created utility class CacheKeyUtil for different cache key function
    implementations.
    * Removed FsVersionedArtifactCache class, replaced it with regular
    FsArtifactCache with custom cache key functions.
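
The design described above, a cache whose output path is decided by a function passed to the constructor, might look roughly like this. It is a sketch only: `SketchCache`, `flat_key`, and the method names are hypothetical, and `shutil`/`pathlib` stand in for the fsspec-backed implementation, whose real signatures are not shown in this message.

```python
import shutil
from pathlib import Path
from typing import Callable


def flat_key(artifact_name: str) -> str:
    """Default key: place the artifact directly under the cache root."""
    return artifact_name


class SketchCache:
    """Local-filesystem stand-in for a cache with a pluggable cache key."""

    def __init__(self, root: Path, cache_key_fn: Callable[[str], str] = flat_key) -> None:
        self.root = Path(root)
        self.cache_key_fn = cache_key_fn

    def path_for(self, artifact_name: str) -> Path:
        # The key function alone decides the path relative to the cache root.
        return self.root / self.cache_key_fn(artifact_name)

    def put(self, source: Path) -> Path:
        target = self.path_for(source.name)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        return target
```

With this shape, a versioned layout needs no subclass: a key function that prepends a version component is enough.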
    
    * README.md update, docstrings updated, put method removed from FsArtifactSource
    
    * README.md updated to reflect the refactored classes
    * Missing docstrings updated for methods in source.py and cache.py
    * put(artifact) method removed from FsArtifactSource
    * MavenArtifactSource constructor now checks that base_uri is not None
    
    * Recursive cache delete
    
    * Enabled recursive cache artifact delete
    * Added a relevant unit test
    
    * Refactoring work done
    
    * Since we have no plans to support underlying FS libraries other
    than fsspec, the abstract base classes ArtifactCache and ArtifactSource
    have been removed
    * Added URI validity check in cache and source constructors
    * FsVersionedArtifactCache refactored to accept a callable argument
    that provides the final component of the cache output path, instead of
    automatically creating "current" and tstamped paths
    * Switched to using fsspec.core.url_to_fs function to get a handle to
    the underlying filesystem
    
    * Better tests, pendulum removed, performance improvement
    
    * tests/test_artifact.py improvements: hard-coded string for file
    content put into a shared variable, better fixture naming, cleaner
    comparison of source and cached folders/files
    * Removed dependency on pendulum library, switched to simple datetime
    * In the versioned artifact cache, "current" is now copied from the
    "tstamped" directory to improve performance
    
    * Another linter bug
    
    * Fix linting issue with regular expression backslash
    
    * Introducing versioned artifact cache
    
    In this MR we introduce a new kind of cache: a versioned cache.
    
    This cache stores copies of artifacts in two separate directories.
    One directory is named with the timestamp of the moment of caching, in
    the YYYYMMDDHHmmss format, and the other directory is named "current".
    
    So, where a normal cache would take an artifact like this:
    
    rootfs://artifact_root_dir
    |
    +-- artifact.file
    
    and place it in cache like this:
    
    cachefs://cache_root_dir
    |
    +-- artifact.file
    
    the versioned cache will produce a directory layout as follows:
    
    cachefs://cache_root_dir
    |
    +-- 20241004121212
    |   |
    |   +-- artifact.file
    |
    +-- current
        |
        +-- artifact.file
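
The layout above can be reproduced with a small local-filesystem sketch. The names `timestamp_key` and `cache_versioned` are hypothetical, and `shutil` is used here as a stand-in for the library's fsspec calls; the sketch only assumes the behaviour described in this message: a timestamped directory is written first, and "current" is then copied from it.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamp_key(now: Optional[datetime] = None) -> str:
    """Directory name in the YYYYMMDDHHmmss format, using plain datetime."""
    return (now or datetime.now()).strftime("%Y%m%d%H%M%S")


def cache_versioned(source: Path, cache_root: Path, now: Optional[datetime] = None) -> Path:
    """Copy a single file into <root>/<timestamp>/ and mirror it to <root>/current/."""
    stamped = cache_root / timestamp_key(now)
    current = cache_root / "current"

    stamped.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, stamped / source.name)

    # Refresh "current" from the timestamped copy, not from the source,
    # so the source is read only once.
    if current.exists():
        shutil.rmtree(current)
    shutil.copytree(stamped, current)
    return stamped
```

Calling `cache_versioned(Path("artifact.file"), Path("cache_root_dir"))` would leave one timestamped directory per call and a single "current" directory that mirrors the latest one.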
    
    This MR also changes how the library uses the fsspec API: instead of
    opening a stream to write to caches, the library now uses fsspec's
    `copy` method, which works on both files and directories.