    T277776 add found on wiki (#13) · 0fe8e0ee
    Gmodena authored
    * Extract a list of wikis from the note column.
    
    * Fix missing note record mock
    
    * Store imagerec_prod as Parquet
    
    * Add found_on column to prod dataset
    
    * Remove whitespace from found_on entries
    
    * Fix: reformat code style
    
    * Add validation and EDA on found_on column
    
    * Store the output of Hive locally.
    
    `hive -f` output contains Parquet log noise that is written to
    stdout, and that noise was redirected into the dataset along with
    the query results.
    
    The export query and dataset generation logic have been modified
    to save data locally, instead of redirecting the query result set
    from stdout.
    
    * Gracefully stop the Spark session before ETL scripts exit.
    
    * Fix: remove post-merge clutter from notebook JSON
    
    * Fix metrics notebook and merge with main.
    
    * Clear notebook output
    
    * Fix duplicated field in DDL
    
    * Add EOL to Hive queries
    
    * Add statement termination after CREATE DDL
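The found_on extraction listed above (a list of wikis taken from the note column, with whitespace removed from each entry) can be sketched in plain Python. This is a minimal sketch, not the code in this commit: it assumes, hypothetically, that the note column holds wiki names separated by commas, and the function name `parse_found_on` is illustrative only.

```python
from typing import List, Optional


def parse_found_on(note: Optional[str]) -> List[str]:
    """Split a note string into a list of wiki names.

    Hypothetical input format (an assumption, not confirmed by this
    commit): wiki names separated by commas, possibly padded with
    whitespace, e.g. "enwiki, dewiki".
    """
    if not note:
        # Missing or empty notes yield an empty found_on list.
        return []
    # Strip whitespace around each entry and drop empty fragments.
    return [wiki.strip() for wiki in note.split(",") if wiki.strip()]


# Example usage
print(parse_found_on(" enwiki, dewiki ,itwiki"))  # ['enwiki', 'dewiki', 'itwiki']
```

In a Spark job the same per-row logic would typically be expressed with built-in column functions rather than a Python function, but the splitting and trimming steps are the same.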
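Stopping the Spark session before an ETL script exits is commonly done with `try`/`finally`, so the session is released even when the job raises. A minimal sketch, assuming only that the session object exposes a `stop()` method (as PySpark's `SparkSession` does); `run_etl` and `FakeSession` are hypothetical names used so the example runs without a cluster.

```python
from typing import Callable, Protocol


class Stoppable(Protocol):
    """Anything with a stop() method, e.g. pyspark.sql.SparkSession."""

    def stop(self) -> None: ...


def run_etl(session: Stoppable, job: Callable[[Stoppable], None]) -> None:
    """Run an ETL job and always stop the session afterwards."""
    try:
        job(session)
    finally:
        # Runs on success and on error, so the session is never left open.
        session.stop()


class FakeSession:
    """Hypothetical stand-in for a SparkSession, for illustration only."""

    def __init__(self) -> None:
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True


session = FakeSession()
run_etl(session, lambda s: None)
print(session.stopped)  # True
```

In a real script, `session` would be the object returned by `SparkSession.builder.getOrCreate()`.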