# wmf-cassandra-imagematching
A Docker Compose configuration for testing/developing Cassandra ingestion of IMA data.

# Requirements

You will need Docker Engine and Docker Compose. On non-Linux systems, you will also need to install
`coreutils`, which provides the required `shuf` command.
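To check the `shuf` dependency before preparing data, a minimal sketch along these lines may help. The `find_shuf` helper is hypothetical and not part of this repository. It also looks for `gshuf`, because some package managers (Homebrew, for example) install GNU coreutils with a `g` prefix; whether the Makefile picks up that name is not confirmed here.

```shell
# Hypothetical helper: print the first available shuffle command, if any.
find_shuf() {
  for candidate in shuf gshuf; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

if shuf_cmd=$(find_shuf); then
  echo "found: $shuf_cmd"
else
  echo "no shuf found; install coreutils" >&2
fi
```

If only `gshuf` is found, you may need to add the unprefixed coreutils binaries to your `PATH`.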

# Data preparation

Run
```
$ make data
```

The command downloads the latest available `imagerec_prod` tarball, combines the wiki files into a single dataset,
and shuffles the records. Output will be available under `imagerec_prod`.
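The combine-and-shuffle step can be pictured roughly as follows. This is an illustrative sketch, not the Makefile's actual recipe: the `combine_and_shuffle` helper and the file names in the comment are hypothetical, and it assumes one record per line with no header rows.

```shell
# Hypothetical helper: concatenate per-wiki files and shuffle the records.
# Assumes one record per line and no header rows; the real recipe may differ.
# Example call (file names are made up):
#   combine_and_shuffle combined.tsv enwiki.tsv dewiki.tsv
combine_and_shuffle() {
  out="$1"
  shift
  cat "$@" | shuf > "$out"
}
```

Write the output to a path that is not among the input files, otherwise the output file is truncated before it is read.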

# Running
```
$ docker-compose <up|down> [--build] cassandra-load-imagerec
```

Rows not imported will be stored under `ingestion_status/import_imagerec_matches.err`.
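After a load, a quick way to see whether any rows were rejected is to count the lines in that file. The sketch below assumes the error file holds one rejected row per line, which is not confirmed here; `count_rejected` is a hypothetical helper.

```shell
# Hypothetical helper: print the number of lines in an error file,
# or 0 if the file is missing or empty.
count_rejected() {
  if [ -s "$1" ]; then
    wc -l < "$1" | tr -d ' '
  else
    echo 0
  fi
}

echo "rejected rows: $(count_rejected ingestion_status/import_imagerec_matches.err)"
```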

# Other targets
Run `make sqlite` to load IMA data into a SQLite database under `imagerec_prod/matches.db`.
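To inspect the resulting database, something like the following should work, assuming the `sqlite3` command-line shell is installed. The table names are not documented here, so the sketch only lists whatever tables exist; `list_tables` is a hypothetical helper.

```shell
# Hypothetical helper: list the tables in a SQLite database file.
# Requires the sqlite3 command-line shell.
list_tables() {
  sqlite3 "$1" ".tables"
}

if [ -f imagerec_prod/matches.db ]; then
  list_tables imagerec_prod/matches.db
else
  echo "imagerec_prod/matches.db not found; run 'make sqlite' first" >&2
fi
```

Running `sqlite3 imagerec_prod/matches.db ".schema"` shows the column definitions in the same way.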