Term | POS Tags | Entity Tags |
---|---|---|
VOA | NNP | B-ORG |
's | POS | O |
Mil | NNP | B-PER |
Arcega | NNP | I-PER |
reports | VBZ | O |
. | . | O |
Mr. | NNP | B-PER |
Blair | NNP | B-PER |
left | VBD | O |
for | IN | O |
Turkey | NNP | B-GEO |
Friday | NNP | B-TIM |
from | IN | I-TIM |
Brussels | NNP | I-TIM |
, | , | O |
where | WRB | O |
he | PRP | O |
was | VBD | O |
attending | VBG | O |
a | DT | O |
European | NNP | B-ORG |
union | NNP | I-ORG |
summit | NN | O |
. | . | O |
Indian | JJ | B-GPE |
Foreign | NNP | O |
secretary | NNP | O |
shyam | NNP | B-PER |
Saran | NNP | I-PER |
and | CC | O |
U.S. | NNP | B-ORG |
Undersecretary | NNP | I-ORG |
of | IN | I-ORG |
State | NNP | I-ORG |
Nicholas | NNP | I-ORG |
Burns | NNP | I-ORG |
are | VBP | O |
leading | VBG | O |
the | DT | O |
negotiations | NNS | O |
. | . | O |

Attribute | Value |
---|---|
Format | IOB format |
License | CDLA-Sharing |
Domain | Natural Language Processing |
Number of Records | 1,314,115 terms |
Size | 10 MB |
Origin | University of Groningen |
Dataset Version Update | Version 2: May 14, 2020; Version 1: December 19, 2019 |
Dataset Coverage | The dataset contains only documents authored by Voice of America (VOA), documents from the MASC dataset, and the CIA World Factbook. |
Business Use Case | Linguistics |

Feature | Description |
---|---|
Term | Word |
POS Tags | Indicates the part-of-speech tag of each term. For example, 'NNP' marks a singular proper noun, 'VBZ' a third-person singular present-tense verb, and 'IN' a preposition or subordinating conjunction. This is only a showcase of a few POS tags; for more details, please refer to the alphabetical list of Penn Treebank POS tags from UPenn. |
Entity Tags | The entity tags cover 8 types of named entities: persons (PER), locations (GEO), organizations (ORG), geo-political entities (GPE), artifacts (ART), events (EVE), natural objects (NAT), and time (TIM), as well as an 'O' tag for 'no entity'. Each entity type may be tagged with either a 'B-' or an 'I-' prefix. A 'B-' tag marks the first term of a new entity (or the only term of a single-term entity), while subsequent terms in the same entity carry an 'I-' tag. For example, 'New York' would be tagged as ['B-GEO', 'I-GEO'], while 'London' would be tagged as 'B-GEO'. |
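
The 'B-'/'I-' scheme described above can be decoded back into whole entities with a short loop. The following is a minimal sketch, not part of the dataset's tooling: the function name `iob_to_spans` is hypothetical, and it assumes the terms and tags of one sentence are already available as two parallel lists. It also assumes that a stray 'I-' tag with no matching open entity should be treated like 'O'; other conventions are possible.

```python
from typing import List, Optional, Tuple


def iob_to_spans(terms: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group parallel term/tag lists into (entity_text, entity_type) pairs.

    Hypothetical helper for illustration only.
    """
    spans: List[Tuple[str, str]] = []
    current_terms: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any, and reset the state.
        nonlocal current_terms, current_type
        if current_terms and current_type is not None:
            spans.append((" ".join(current_terms), current_type))
        current_terms, current_type = [], None

    for term, tag in zip(terms, tags):
        if tag.startswith("B-"):
            # 'B-' always starts a new entity, closing any open one.
            flush()
            current_terms, current_type = [term], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # 'I-' continues the open entity of the same type.
            current_terms.append(term)
        else:
            # 'O', or an 'I-' tag without a matching open entity
            # (assumption: treat it as 'no entity').
            flush()
    flush()
    return spans


# First sentence of the sample table above.
terms = ["VOA", "'s", "Mil", "Arcega", "reports", "."]
tags = ["B-ORG", "O", "B-PER", "I-PER", "O", "O"]
print(iob_to_spans(terms, tags))
# → [('VOA', 'ORG'), ('Mil Arcega', 'PER')]
```

Joining terms with a single space is a simplification; tokens such as "'s" or punctuation would need detokenization rules if the original surface text matters.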