AINL 2016: Shavrina, Selegey
1. Are the results of your corpus research really reliable?
Getting automatic result analysis on GICR.
Tatiana Shavrina, Daniil Selegey
AINL FRUCT, SPb, 12.11.2016
2. Big Corpora Problem:
1. Billions of words, mostly coming from social media
2. IPM (instances per million) figures and search results in KWIC (keyword in context) format alone don't tell you whether the results are biased
3. Many metatext attributes (URLs, doc IDs, author IDs, region, gender, genre, etc.) are all potential sources of bias
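To illustrate why a single IPM figure can hide bias, here is a minimal sketch that compares the overall IPM with the IPM inside each segment of a corpus. The segment names and all counts are hypothetical and are not taken from GICR; the `ipm` helper is an assumption made for this example.

```python
def ipm(hits: int, corpus_size: int) -> float:
    """Instances per million tokens: hits scaled to a corpus of 1,000,000 tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return hits * 1_000_000 / corpus_size


# Hypothetical (hits, tokens) pairs for two segments of an imaginary corpus.
segments = {
    "blogs": (9_000, 10_000_000),
    "news": (1_000, 90_000_000),
}

total_hits = sum(hits for hits, _ in segments.values())
total_tokens = sum(tokens for _, tokens in segments.values())

# One number for the whole corpus ...
overall = ipm(total_hits, total_tokens)
# ... versus one number per segment.
per_segment = {name: ipm(hits, tokens) for name, (hits, tokens) in segments.items()}

print(f"overall: {overall:.1f} IPM")
for name, value in per_segment.items():
    print(f"{name}: {value:.1f} IPM")
```

With these made-up counts the overall figure is 100 IPM, yet almost all hits come from the small "blogs" segment (900 IPM against roughly 11 IPM in "news"), which the overall number alone does not reveal.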
Users need corpus tools that show the full statistics of the search
results, so they can check whether the results are homogeneous with
the whole corpus.
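One possible way to check such homogeneity is a Pearson chi-square goodness-of-fit statistic that compares how a metatext attribute is distributed in the search results with its distribution in the whole corpus. The sketch below is an assumption about how this could be done, not the method implemented in GICR; the function name, the region labels, and all counts are hypothetical.

```python
from collections import Counter


def chi_square_homogeneity(sample_counts, corpus_counts):
    """Pearson chi-square statistic of sample counts against corpus proportions.

    Returns (statistic, degrees_of_freedom). The corpus proportions are
    treated as fixed expected proportions for the sample.
    """
    unknown = set(sample_counts) - set(corpus_counts)
    if unknown:
        raise ValueError(f"values missing from corpus counts: {sorted(unknown)}")
    corpus_total = sum(corpus_counts.values())
    sample_total = sum(sample_counts.values())
    if corpus_total == 0 or sample_total == 0:
        raise ValueError("both distributions must be non-empty")

    statistic = 0.0
    categories = 0
    for value, corpus_n in corpus_counts.items():
        if corpus_n == 0:
            continue  # no expected hits for a value the corpus never contains
        expected = sample_total * corpus_n / corpus_total
        observed = sample_counts.get(value, 0)
        statistic += (observed - expected) ** 2 / expected
        categories += 1
    return statistic, categories - 1


# Hypothetical region attribute: document counts in the whole corpus ...
corpus_regions = {"Moscow": 500_000, "SPb": 300_000, "Other": 200_000}
# ... and regions of the documents returned by one search query.
hit_regions = Counter(["Moscow"] * 80 + ["SPb"] * 15 + ["Other"] * 5)

stat, df = chi_square_homogeneity(hit_regions, corpus_regions)

# 5.991 is the chi-square critical value for df = 2 at the 0.05 level.
CRITICAL_DF2_ALPHA_05 = 5.991
biased = stat > CRITICAL_DF2_ALPHA_05
print(f"chi-square = {stat:.2f}, df = {df}, biased = {biased}")
```

Here the search results over-represent one region compared with the corpus, so the statistic exceeds the critical value and the result would be flagged as potentially biased. With very large samples even tiny deviations become significant, so a practical tool would probably report the per-value deviations as well.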