    Migrate Refine from systemd to Airflow · 7357f779
    Aqu authored
    * Bump refinery source for Refine to Hive hourly DAG
    
    * Fix tests following change in app name
    
    * Cleanup config store mock
    
    * Revert change on Yarn Spark app name
    
    * Cleaner code
    
    * Rename RefineConfiguration class
    
    * Remove commented lines
    
    * Linting
    
    * Remove type test remnants
    
    * Remove deprecation warning
    
    * Remove deprecation
    
    * More pyspark-extension on HDFS
    
    * Set a static name to Refine Spark job for Datahub
    
    * Add comments about Spark conf job target
    
    * Rename xcom key var to store Refine job params
    
    * Linting
    
    * Fixes following review
    
    * Convert evolve table to SparkSubmitOperator
    
    * Rename a method
    
    * Clean up
    
    * Remove Refine to Iceberg (keep it for later)
    
    * Unclutter the test fixture
    
    * Better logging in Refine to Hive hourly
    
    * Rename a variable
    
    * Linting
    
    * Customize yarn app names in Refine
    
    * Remove useless mocker contextManager
    
    * Refine eventlogging_NavigationTiming in test cluster
    
    * Add stating warning about new Refine dag in production
    
    * Linting
    
    * Add meaningful task index to canary events dag
    
    * Fix catchup in delayed hourly timetable
    
    * Fix ordering in diff script
    
    * Add templated names for Refine tasks
    
    * Propagate email for variables to the Hive refine factory
    
    * Fix plugin initialization
    
    * Revert "Proposition to fix plugins initialization"
    
    This reverts commit f66bf1b3.
    
    * Proposition to fix plugins initialization
    
    * Add failing test about DAG serialization with DelayedHourlyTimetable
    
    * Linting
    
    * Add 2 hours delay for Refine to Hive DAG
    
    * Order by more columns before diffing DFs
    
    As meta.dt may not be enough (truncated to ms).
    
    * Fix call following the jar removal from the files param
    
    * Bundle the jar into the archive
    
    To avoid putting the jar both in the --jar param and in the --files param.
    
    * Update spark logger in output_diff
    
    * Switch output_diff script to Python logger
    
    * Perform diffing on alphabetically rearranged DFs
    
    * Increase Refine driver memory size for small jobs
    
    * Update evolve table parameter
    
    * Update RefineHiveDataset module path
    
    * Fix configuration
    
    * Update RefineHiveDataset CLI params following refactoring in scala code
    
    * Linting
    
    * Add refine dag for analytics_test with a factory
    
    * mypy fix
    
    * linting fix
    
    * Fix unit tests
    
    * Force skein launch of pyspark app
    
    * Add map_index to yarn app name
    
    * Add logger to output_diff
    
    * Adjustment with refinery-source
    
    * Output mypy version in GitLab CI
    
    * Formatting
    
    * black check
    
    * Isort
    
    * Read refine conf from ESC + Add output diff
    
    * Evolve and refine Hive & Iceberg tables
    
    * Add auto evolve Iceberg tables in Refine Iceberg DAG.
    * Add a new DAG to Refine to Hive tables
       - evolve Hive tables according to json schemas
       - Refine to Hive tables from HDFS Gobblin output
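
The diffing notes above (order by more columns because meta.dt is truncated to ms, and diff on alphabetically rearranged DFs) can be illustrated with a minimal plain-Python sketch. This is not the output_diff script itself, which works on Spark DataFrames and is not shown in this message; the function names, the dict-based rows, and the "dt"/"id" column names are hypothetical stand-ins.

```python
from itertools import zip_longest
from typing import Any

Row = dict[str, Any]  # hypothetical stand-in for one DataFrame row


def normalize(rows: list[Row], sort_keys: list[str]) -> list[tuple]:
    """Arrange columns alphabetically and sort rows deterministically."""
    columns = sorted({name for row in rows for name in row})
    # Sort on the requested keys first, then on every remaining column,
    # so rows sharing the same (truncated) timestamp still get a stable order.
    order = sort_keys + [name for name in columns if name not in sort_keys]
    ordered = sorted(rows, key=lambda row: tuple(repr(row.get(name)) for name in order))
    return [tuple((name, row.get(name)) for name in columns) for row in ordered]


def diff_rows(expected: list[Row], actual: list[Row], sort_keys: list[str]) -> list[tuple]:
    """Compare two row sets position by position after normalization."""
    pairs = zip_longest(normalize(expected, sort_keys), normalize(actual, sort_keys))
    return [(left, right) for left, right in pairs if left != right]
```

With only the timestamp as sort key, two events sharing the same millisecond could come out in a different order on each side and show up as a spurious difference in a positional comparison; the extra tie-breaking columns are what avoid that in this sketch.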
    
    Bug: T356762