Dr0ptp4kt (7048cbc7) at 13 Feb 19:09
Introduce error investigation helper scripts
Adds a pair of scripts to assist with error investigation. One is specially tailored to the logs being emitted by the CirrusSearch Saneitizer. The other is a generic pyspark helper for collecting events related to some producer error we are investigating.
Bug: T356655
Just informational, I don't know that action is required: I notice this doesn't have info in its prop field, but I don't see resultant fields with info being processed in the file anyway.
Dr0ptp4kt (917e3ec6) at 13 Feb 16:31
Introduce error investigation helper scripts
... and 1 more commit
Dr0ptp4kt (a98fb6ac) at 07 Feb 19:33
Merge branch 'T349512-wdqs-sampling' into 'main'
... and 1 more commit
Related task: T349512
Main edits:
Other edits:
.gitignore file that ignores .ipynb_checkpoints
You know what, this is the default behavior of Jupyter HTML output now that I look. I'm going to merge this thing, and we can deal with its <script> behavior in a different venue.
Maybe add a note that the HTML files pull in stuff from Cloudflare. I'm not any more concerned about this than I am about the web at large, but it's probably good for folks to understand. An alternative would be to PDF the outputs.
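To make a note like that concrete, here is a minimal sketch of how one might list the remote scripts and stylesheets an exported HTML file pulls in. It uses only the Python standard library. The function name `external_resources` and the sample markup are hypothetical and not part of this MR; the sample URL is only an illustration of a CDN-hosted script.

```python
from html.parser import HTMLParser


class ExternalResourceFinder(HTMLParser):
    """Collect URLs of scripts and stylesheets loaded from other hosts."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Only <script src=...> and <link href=...> are inspected here.
        attr = {"script": "src", "link": "href"}.get(tag)
        if attr is None:
            return
        url = dict(attrs).get(attr) or ""
        # Absolute and protocol-relative URLs point at a remote host.
        if url.startswith(("http://", "https://", "//")):
            self.urls.append(url)


def external_resources(html_text):
    """Return remote script/stylesheet URLs in document order."""
    finder = ExternalResourceFinder()
    finder.feed(html_text)
    finder.close()
    return finder.urls


# Hypothetical sample markup, standing in for an exported notebook.
SAMPLE_HTML = """<html><head>
<script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.1.10/require.min.js"></script>
<link rel="stylesheet" href="local.css">
</head><body><script>var inline = 1;</script></body></html>"""

if __name__ == "__main__":
    for url in external_resources(SAMPLE_HTML):
        print(url)
```

Running it against each exported file would give a list that could be pasted into the README note. It does not catch resources fetched dynamically by the scripts themselves.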
Dr0ptp4kt (27856c1d) at 29 Jan 08:13
Dr0ptp4kt (27856c1d) at 26 Jan 21:25
Update some dependencies and note flaky tests
Yeah. I'm thinking after I touch up this MR and we merge, IFF we need to modify further (and y'know, it wouldn't be totally surprising if we did), then we switch over to the develop branch, and who knows, maybe we switch to developing against upstream directly?
The CSV utility you wrote, for example, would be useful for the broader community, especially performance engineers.
It's possible. I run the VM with 2 cores and 8 GB of allocated RAM, so there could be synchronization issues. It's also possible there's some sort of timing problem with ports not freeing or something - I've also seen things where VMs fiddle with TCP/IP in ways that defy what one is accustomed to in a host OS (personal favorite is sequence numbers not being randomized in a VM when they're always randomized in a host OS; that can defy assumptions in all kinds of code, sometimes detrimentally).
Right, will remove, keeping it more on the minimalist side.
Yeah, at least in the foreseeable future! Will remove.
Right, no other tests in here - I was trying to comment out only where there were multiple asserts within a test. Will uncomment and move the reason up to the @Ignore.
Configure SAST in .gitlab-ci.yml using the GitLab managed template. You can add variable overrides to customize SAST settings.
Dr0ptp4kt (55cd28cc) at 23 Jan 22:59
Update some dependencies and note flaky tests
For a basic check that the build works, it looks okay on stat1006 with 10 seconds of a warmup query and 10 seconds of a sample query.
dr0ptp4kt@stat1006:~/iguanaland$ bash start-iguana.sh wdqs-split-test.yml
dr0ptp4kt@stat1006:~/iguanaland$ /usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -cp iguana-3.3.3.jar org.aksw.iguana.rp.analysis.TabularTransform -q result.nt
dr0ptp4kt@stat1006:~/iguanaland$ cut -f1,4,5,6,7,8,9,10,11,12 -d"," result.csv | cut -f1,6,7,8,9,10 -d"," | sed 's/,/\t\t/g'
endpointLabel        resultSize  unknownException  wrongCodes  qps                 penalizedQPS
baseline             1           0                 0           46.34994206257242   46.34994206257242
baseline             212         0                 0           19.952114924181963  19.952114924181963
baseline             2           0                 0           20.967730662510398  20.967730662510398
scholarly_article    1           0                 0           56.92908262849706   56.92908262849706
scholarly_article    0           0                 0           50.76356867887813   50.76356867887813
scholarly_article    0           0                 0           45.57608167233836   45.57608167233836
wikidata_main_graph  1           0                 0           57.22624395547799   57.22624395547799
wikidata_main_graph  212         0                 0           19.054878048780488  19.054878048780488
wikidata_main_graph  2           0                 0           23.862741510829707  23.862741510829707