Commit 458c29e7 authored by Gmodena

Add a sqlite make target.

This commit adds a new `sqlite` target to the Makefile.
When invoked, it loads IMA data into a SQLite database.
parent c05cb8ea
......@@ -11,5 +11,8 @@ data: download
tar xvjf ${dataset_archive} -C ${dataset}
cat ${dataset}/prod* | shuf > ${dataset}/matches.tsv
rm ${dataset}/prod*
test -d ${dataset} || make data
sqlite3 < ddl/imagerec.sqlite ${dataset}/imagerec.db
rm -r ${dataset} ${dataset_archive}
......@@ -23,3 +23,9 @@ $ docker-compose <up|down> [--build] cassandra-load-imagerec
Rows not imported will be stored under `ingestion_status/import_imagerec_matches.err`.
# Other targets
`make sqlite`
to load IMA data into a sqlite database under `imagerec_prod/matches.db`.
CREATE TABLE matches(page_id INTEGER,
page_title TEXT,
image_id TEXT,
confidence_rating TEXT,
source TEXT,
dataset_id TEXT,
insertion_ts REAL,
wiki TEXT,
found_on TEXT);
UPDATE matches SET image_id = NULL WHERE image_id = '';
UPDATE matches SET confidence_rating = NULL WHERE confidence_rating = '';
UPDATE matches SET source = NULL WHERE source = '';
UPDATE matches SET found_on = NULL WHERE found_on = '';
DROP INDEX IF EXISTS matches_wiki_page_id;
CREATE INDEX matches_wiki_page_id ON matches(wiki, page_id);
.mode ascii
.separator "\t" "\n"
.timer on
.import imagerec_prod/matches.tsv matches
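
For reviewers who want to check the load logic without running `make sqlite`, here is a minimal Python sketch of the same steps using the standard `sqlite3` module. It is not part of this commit. The sample rows, the `load` helper, and the in-memory database are hypothetical. It also assumes the empty-string-to-NULL updates are meant to run after the rows are imported, which is an assumption about the intended order rather than something the listing confirms.

```python
import csv
import io
import sqlite3

# Hypothetical two-row sample in the same tab-separated layout as matches.tsv.
SAMPLE_TSV = (
    "1\tExample page\timg_a.jpg\thigh\twikidata\tds-1\t1.0\tenwiki\tdewiki\n"
    "2\tOther page\t\t\t\tds-1\t2.0\tenwiki\t\n"
)

# Same columns as the matches table in the DDL script.
DDL = """
CREATE TABLE matches(page_id INTEGER,
                     page_title TEXT,
                     image_id TEXT,
                     confidence_rating TEXT,
                     source TEXT,
                     dataset_id TEXT,
                     insertion_ts REAL,
                     wiki TEXT,
                     found_on TEXT)
"""

# Columns whose empty strings are rewritten to NULL by the script.
NULLABLE = ("image_id", "confidence_rating", "source", "found_on")


def load(conn, tsv_text):
    """Create the table, import TSV rows, normalize blanks, build the index."""
    conn.execute(DDL)
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                      quoting=csv.QUOTE_NONE)
    conn.executemany("INSERT INTO matches VALUES (?,?,?,?,?,?,?,?,?)", rows)
    for col in NULLABLE:
        conn.execute(f"UPDATE matches SET {col} = NULL WHERE {col} = ''")
    conn.execute("DROP INDEX IF EXISTS matches_wiki_page_id")
    conn.execute("CREATE INDEX matches_wiki_page_id ON matches(wiki, page_id)")
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, SAMPLE_TSV)

# Rows whose image_id was blank in the TSV are now NULL.
null_rows = conn.execute(
    "SELECT page_id FROM matches WHERE image_id IS NULL").fetchall()

# Ask SQLite how it would run a lookup by (wiki, page_id).
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM matches WHERE wiki = 'enwiki' AND page_id = 1").fetchall()
```

Inspecting `plan` is a quick way to confirm that lookups by `wiki` and `page_id` can use the `matches_wiki_page_id` index.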