1. 31 Mar, 2021 3 commits
  2. 30 Mar, 2021 1 commit
  3. 22 Mar, 2021 2 commits
    • Implement parsing of “instance of” fields in ImageMatching production datasets (#9) · 7712d9f4
      Clarakosi authored
      * Update transform.py to parse "instance of" json blob
      
      * Update tests and fix transform.py schema changes
      
      * Simplify parsing logic, add metrics, and update tests
      
      * Updates based on code review
    • T277552 project jdata store as parquet (#10) · 292c864a
      Gmodena authored
      * Project instanceof in model output
      
      * Upload raw model output to HDFS as parquet
      
      * Add elt to PYTHONPATH when running pytest
      
      * Copy raw data to HDFS and convert it to parquet
      
      * Update doc
      
      * Add instance of to imagerec and store content as parquet
      
      * Fix: append to PYTHONPATH
      
      * Add placeholder instanceof column in mocks
  4. 18 Mar, 2021 1 commit
  5. 17 Mar, 2021 1 commit
    • T275165 dataset metrics (#8) · e4163f38
      Clarakosi authored
      * Add initial dataset metrics
      
      * Update draft dataset metrics with updated datasets
      
      * Add dataset metrics script and comparison of intermediate & final data
      
      * Changes based on code review
      
      * Update dataset_metrics_runner
  6. 16 Mar, 2021 1 commit
    • T275685 generate production datasets (#7) · 05888e6a
      Gmodena authored
      * Add script to generate and export production datasets
      
      * Move hql script to ddl
      
      * Document publish.sh
      
      * Add some crude metrics reporting
      
      * Store artifacts and metrics by run identifier
      
      * Fix variable names
      
      * Adjust var names, record timestamps in metrics
      
      * Enable dynamic partitioning
      
      * Add snapshot partition to production dataset
      
      * Fix dir name
      
      * Update publish.sh doc
      
      * Make virtual env before activating
      
      * Fix: confidence_rating to source mapping
      
      * Add export data summary
      
      * Update validation notebook with regression cases
      
      * Add test for confidence mapping
      
      * Fix: call uuid4 for default dataset_id
      
      * Fix missing comma in column list
      
      * Export NULL values as empty strings.
      
      * Generate data for all languages
      
      * Update data export changelog
      
      * Update data export changelog: set month to March
      
      * Clean up validation notebook
      
      * Load validation data from hive
      
      * Fix character escaping
  7. 04 Mar, 2021 3 commits
  8. 02 Mar, 2021 9 commits
  9. 01 Mar, 2021 2 commits
  10. 26 Feb, 2021 2 commits
  11. 24 Feb, 2021 1 commit
  12. 23 Feb, 2021 6 commits
  13. 22 Feb, 2021 3 commits
  14. 17 Feb, 2021 2 commits
  15. 16 Feb, 2021 3 commits