Article Quality merge requests
https://gitlab.wikimedia.org/repos/research/article-quality/-/merge_requests
Updated: 2022-03-17T13:01:38Z

Add testing infrastructure files
https://gitlab.wikimedia.org/repos/research/article-quality/-/merge_requests/1
Bmansurov, 2022-03-17T13:01:38Z

Convert notebook to scripts
https://gitlab.wikimedia.org/repos/research/article-quality/-/merge_requests/2
Bmansurov, 2022-04-04T10:53:19Z
Issue: #1

Return quality scores as a dataframe
https://gitlab.wikimedia.org/repos/research/article-quality/-/merge_requests/3
Bmansurov, 2022-04-15T15:36:22Z
Issue: #1

Add column alias
https://gitlab.wikimedia.org/repos/research/article-quality/-/merge_requests/4
Bmansurov, 2022-04-21T16:42:22Z

Get rid of wmfdata
https://gitlab.wikimedia.org/repos/research/article-quality/-/merge_requests/5
Bmansurov, 2022-05-10T02:13:11Z

Introduce running mode
https://gitlab.wikimedia.org/repos/research/article-quality/-/merge_requests/6
Bmansurov, 2022-05-31T11:55:34Z
Quality scores can now be computed in either production or development mode.
Closes #5

Add CI
https://gitlab.wikimedia.org/repos/research/article-quality/-/merge_requests/7
Bmansurov, 2022-06-12T05:00:04Z
Issue: #3

Allow running the scripts multiple times without overwriting the existing data
https://gitlab.wikimedia.org/repos/research/article-quality/-/merge_requests/8
Bmansurov, 2022-06-15T15:04:45Z

Ship development.sql when installing the package
https://gitlab.wikimedia.org/repos/research/article-quality/-/merge_requests/9
Bmansurov, 2022-06-22T09:16:13Z
This is needed for the Airflow job that creates the development data.
Issue: #2

Calculate quality scores for all revisions
https://gitlab.wikimedia.org/repos/research/article-quality/-/merge_requests/10
Bmansurov, 2022-09-06T12:41:15Z
Closes #6

Bump versions for spark3
https://gitlab.wikimedia.org/repos/research/article-quality/-/merge_requests/11
Fabian Kaelin, 2023-02-09T20:44:29Z

Article quality for historical revisions
https://gitlab.wikimedia.org/repos/research/article-quality/-/merge_requests/12
Fabian Kaelin, 2023-03-14T17:43:24Z
The changes in this MR make it possible to compute article quality scores for all historical revisions:
- add the revision timestamp to the output schemas (features and scores)
- optimize the SQL queries to avoid joins and wikitext shuffling, and filter out data before joins
- restructure the main application arguments to simplify configuration
The production output tables for the article quality DAG are in the [article_quality](https://hue.wikimedia.org/hue/metastore/tables/article_quality?source_type=hive) database.

Incremental article features
https://gitlab.wikimedia.org/repos/research/article-quality/-/merge_requests/13
Fabian Kaelin, 2023-05-04T00:54:12Z
- split the article quality job into separate feature and score jobs
- optimize the query plan (avoid joins and shuffles on wikitext)
- add a CI job that builds a conda env from a branch

Fix for writing to external hive table
https://gitlab.wikimedia.org/repos/research/article-quality/-/merge_requests/14
Fabian Kaelin, 2023-05-04T18:54:54Z
Write using the Hive format for the external table.