ParlaMint-SI: additional metadata files for sentiment? #897
Comments
I think it is possible to add more columns into […]. But the question is whether it belongs to […]. So, do we want to add another format?
I have no strong opinion on that (yet).
Yes, this was exactly my thinking, i.e. we introduce one more set of TSV files, called e.g. […]. The files should have a header row, and I'd suggest these are the columns:
Although most of the scripts for adding sentiment and topic to the corpora have already been made, this issue has not been addressed yet. I guess either @matyaskopp or I should write the script if we have the definitive list of columns for the files. What I did, for now, is add the s-level sentiment score directly to the CoNLL-U files. It doesn't hurt and, in fact, with this we don't, strictly speaking, even need the envisaged extra TSVs, as the info is in CoNLL-U. Right now only s-level sentiment is encoded, but I guess (for SI) u-level could be added in the same way. The format is like this:
I had a look, and I guess I should do it, given that I made the parlamint2meta.xsl script, and this will be similar.
The planned new version of the ParlaMint-SI corpus will, in addition to sentence-level sentiment, also include sentiment annotations for whole utterances (i.e. speech-level sentiment).
This could be included in our metadata files (*-meta.tsv). However, since SI will be the only corpus containing this additional information, the other corpora would lack it in their metadata files (resulting in columns that would be empty in the other 28 corpora).
Would it be possible to add new metadata files focussing on the sentiment (e.g. ID, annotated element (u or s), sentiment class and numeric value for the sentiment)? This would in turn allow easier (pre-)processing of the corpus for further analyses/research, as the sentiment would be included as metadata and would eliminate the need to extract it from TEI.ana.
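To make the proposal a bit more concrete, here is a minimal Python sketch of what writing and reading such a sentiment file could look like. The column names, the `SentimentRow` type, the helper functions, and the example IDs and class labels are all hypothetical, chosen only for illustration; the definitive column list would still have to be agreed on.

```python
import csv
import io
from dataclasses import astuple, dataclass

# Hypothetical column names: ID, annotated element (u or s),
# sentiment class, and numeric sentiment value.
COLUMNS = ["ID", "Element", "Sentiment_class", "Sentiment_score"]


@dataclass
class SentimentRow:
    id: str               # ID of the annotated element
    element: str          # "u" for utterance, "s" for sentence
    sentiment_class: str  # categorical label (illustrative values below)
    score: float          # numeric sentiment value


def write_sentiment_tsv(rows, stream):
    """Write a header row followed by one tab-separated line per annotation."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        if row.element not in ("u", "s"):
            raise ValueError(f"unexpected element type: {row.element!r}")
        writer.writerow(astuple(row))


def read_sentiment_tsv(stream):
    """Read the file back as a list of dicts keyed by the header row."""
    return list(csv.DictReader(stream, delimiter="\t"))


if __name__ == "__main__":
    rows = [
        SentimentRow("u1", "u", "negative", -1.5),
        SentimentRow("u1.s1", "s", "positive", 0.42),
    ]
    buffer = io.StringIO()
    write_sentiment_tsv(rows, buffer)
    print(buffer.getvalue(), end="")
```

Keeping utterance-level and sentence-level rows in one file, distinguished by the element column, means a downstream user can filter on that column instead of parsing TEI.ana.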