    T275685 generate production datasets (#7) · 05888e6a
    Gmodena authored
    * Add script to generate and export production datasets
    
    * Move hql script to ddl
    
    * Document publish.sh
    
    * Add some crude metrics reporting
    
    * Store artifacts and metrics by run identifier
    
    * Fix variable names
    
    * Adjust var names, record timestamps in metrics
    
    * Enable dynamic partitioning
    
    * Add snapshot partition to production dataset
    
    * Fix dir name
    
    * Update publish.sh doc
    
    * Make virtual env before activating
    
    * Fix: confidence_rating to source mapping
    
    * Add export data summary
    
    * Update validation notebook with regression cases
    
    * Add test for confidence mapping
    
    * Fix: call uuid4 for default dataset_id
    
    * Fix missing comma in column list
    
    * Export NULL values as empty strings
    
    * Generate data for all languages
    
    * Update data export changelog
    
    * Update data export changelog: set month to March
    
    * Clean up validation notebook
    
    * Load validation data from hive
    
    * Fix character escaping