- 22 Mar, 2021 2 commits
-
-
Clarakosi authored
* Update transform.py to parse "instance of" json blob
* Update tests and fix transform.py schema changes
* Simplify parsing logic, add metrics, and update tests
* Updates based on code review
-
Gmodena authored
* Project instanceof in model output
* Upload raw model output to HDFS as parquet
* Add elt to PYTHONPATH when running pytest
* Copy raw data to HDFS and convert it to parquet
* Update doc
* Add instance of to imagerec and store content as parquet
* Fix. append to PYTHONPATH
* Add placeholder instanceof column in mocks
-
- 18 Mar, 2021 1 commit
-
-
Clarakosi authored
-
- 17 Mar, 2021 1 commit
-
-
Clarakosi authored
* Add initial dataset metrics
* Update draft dataset metrics with updated datasets
* Add dataset metrics script and comparison of intermediate & final data
* Changes based on code review
* Update dataset_metrics_runner
-
- 16 Mar, 2021 1 commit
-
-
Gmodena authored
* Add script to generate and export production datasets
* Move hql script to ddl
* Document publish.sh
* Add some crude metrics reporting
* Store artifacts and metrics by run identifier
* Fix variable names
* Adjust var names, record timestamps in metrics
* Enable dynamic partitioning
* Add snapshot partition to production dataset
* Fix dir name
* Update publish.sh doc
* Make virtual env before activating
* Fix: confidence_rating to source mapping
* Add export data summary
* Update validation notebook with regression cases
* Add test for confidence mapping
* Fix. call uuid4 for default dataset_id
* Fix missing comma in column list
* Export NULL values as empty strings.
* Generate data for all languages
* Update data export changelog
* Update data export changelog: set month to March
* Clean up validation notebook
* Load validation data from hive
* Fix character escaping
-
- 04 Mar, 2021 3 commits
-
-
Miriam Redi authored
T275685 automate pytest
-
Miriam Redi authored
T275162 enable spark metrics collection
-
Gmodena authored
-
- 02 Mar, 2021 9 commits
- 01 Mar, 2021 2 commits
- 26 Feb, 2021 2 commits
- 24 Feb, 2021 1 commit
-
-
Gabriele Modena authored
-
- 23 Feb, 2021 6 commits
-
-
Gabriele Modena authored
-
Gmodena authored
-
Gmodena authored
-
Gmodena authored
-
Gmodena authored
-
Miriam Redi authored
T274798 include all unillustrated articles
-
- 22 Feb, 2021 3 commits
- 17 Feb, 2021 2 commits
- 16 Feb, 2021 4 commits
-
-
Gmodena authored
The semantics of the algorithm output have changed to include all unillustrated articles, including those with no matching images.
-
Gmodena authored
-
Miriam Redi authored
Production data ETL
-
Miriam Redi authored
-
- 15 Feb, 2021 2 commits
-
-
Miriam Redi authored
Automate generation of .tsv files
-
Gmodena authored
-
- 10 Feb, 2021 1 commit
-
-
Gmodena authored
-