Skip to content

Datasets Benchmarks

NER datasets

conll2003-NER

Test Splits: 1 [231 File(s) / split] | Overall Tested Ratio: 16.58% [231/1393 Files]

Embedding Model Micro F1 Macro F1 Support
answerdotai/ModernBERT-large 89.59 88.07 5,648
google-bert/bert-base-cased 90.95 89.28 5,648
google-bert/bert-large-cased 91.51 89.81 5,648
FacebookAI/xlm-roberta-large 92.75 91.26 5,648
FacebookAI/roberta-large 93.05 91.45 5,648
google/t5-v1_1-xl 93.39 91.85 5,648
google/flan-t5-xl 93.57 92.38 5,648
google/mt5-xl 93.70 92.32 5,648

Best Embedding Model: google/mt5-xl

NER_tag precision recall f1_score support
LOC 94.93 94.30 94.62 1,668
ORG 91.72 93.32 92.51 1,661
PER 98.09 98.33 98.21 1,617
MISC 83.71 84.19 83.95 702
micro_avg 93.49 93.91 93.70 5,648
macro_avg 92.11 92.54 92.32 5,648

================================================================================

litbank

Test Splits: 10 [10 File(s) / split] | Overall Tested Ratio: 100.00% [100/100 Files]

Embedding Model Micro F1 Macro F1 Support
answerdotai/ModernBERT-large 86.09 58.90 29,103
google-bert/bert-base-cased 87.62 62.08 29,103
google-bert/bert-large-cased 87.93 64.56 29,103
google/t5-v1_1-xl 88.65 66.27 29,103
FacebookAI/xlm-roberta-large 88.70 66.69 29,103
FacebookAI/roberta-large 88.85 67.02 29,103
google/flan-t5-xl 88.96 66.24 29,103
google/mt5-xl 89.00 65.68 29,103

Best Embedding Model: google/mt5-xl

NER_tag precision recall f1_score support
PER 93.56 93.55 93.55 24,180
FAC 68.70 65.67 67.15 2,330
LOC 66.84 61.13 63.86 1,289
GPE 78.60 72.05 75.18 948
VEH 73.48 64.25 68.56 207
ORG 55.56 16.78 25.77 149
micro_avg 89.56 88.58 89.00 29,103
macro_avg 72.79 62.24 65.68 29,103

================================================================================

litbank-fr

Test Splits: 29 [1 File(s) / split] | Overall Tested Ratio: 100.00% [29/29 Files]

Embedding Model Micro F1 Macro F1 Support
almanach/moderncamembert-cv2-base 84.43 54.84 38,630
almanach/moderncamembert-base 85.31 57.20 38,630
FacebookAI/xlm-roberta-large 87.70 60.39 38,630
almanach/camembert-base 87.76 61.43 38,630
google/mt5-xl 88.16 61.62 38,630
almanach/camembert-large 88.19 62.43 38,630

Best Embedding Model: almanach/camembert-large

NER_tag precision recall f1_score support
PER 92.22 93.68 92.94 32,349
FAC 69.26 72.01 70.61 2,297
TIME 58.45 59.18 58.81 1,683
GPE 76.00 76.61 76.31 868
LOC 62.18 46.73 53.36 781
VEH 65.49 47.95 55.36 463
ORG 40.74 23.28 29.63 189
micro_avg 87.84 88.66 88.19 38,630
macro_avg 66.33 59.92 62.43 38,630

================================================================================

long-litbank-fr-PER-only

Test Splits: 32 [1 File(s) / split] | Overall Tested Ratio: 100.00% [32/32 Files]

Embedding Model Micro F1 Macro F1 Support
almanach/moderncamembert-cv2-base 91.94 91.94 71,883
almanach/moderncamembert-base 92.74 92.74 71,883
FacebookAI/xlm-roberta-large 94.45 94.45 71,883
almanach/camembert-base 94.57 94.57 71,883
almanach/camembert-large 94.74 94.74 71,883
google/mt5-xl 94.74 94.74 71,883

Best Embedding Model: google/mt5-xl

NER_tag precision recall f1_score support
PER 94.96 94.53 94.74 71,883
micro_avg 94.96 94.53 94.74 71,883
macro_avg 94.96 94.53 94.74 71,883

================================================================================

ontonotes5_english-NER

Test Splits: 1 [207 File(s) / split] | Overall Tested Ratio: 5.69% [207/3637 Files]

Embedding Model Micro F1 Macro F1 Support
answerdotai/ModernBERT-large 88.66 78.53 11,257
google-bert/bert-large-cased 89.40 79.53 11,257
google-bert/bert-base-cased 89.42 80.20 11,257
google/flan-t5-xl 90.52 81.81 11,257
FacebookAI/xlm-roberta-large 90.59 81.32 11,257
google/mt5-xl 90.66 82.30 11,257
google/t5-v1_1-xl 90.74 82.05 11,257
FacebookAI/roberta-large 90.84 81.96 11,257

Best Embedding Model: FacebookAI/roberta-large

NER_tag precision recall f1_score support
GPE 97.75 97.01 97.38 2,240
PERSON 96.45 95.62 96.03 1,988
ORG 91.82 92.59 92.21 1,795
DATE 87.25 89.21 88.22 1,603
CARDINAL 85.15 81.05 83.05 934
NORP 94.43 94.77 94.60 841
PERCENT 89.46 89.97 89.71 349
MONEY 87.42 88.54 87.97 314
TIME 66.67 63.21 64.89 212
ORDINAL 85.00 87.18 86.08 195
LOC 75.00 80.45 77.63 179
WORK_OF_ART 78.43 72.29 75.24 166
FAC 80.67 71.11 75.59 135
QUANTITY 80.18 84.76 82.41 105
PRODUCT 75.90 82.89 79.25 76
EVENT 71.23 82.54 76.47 63
LAW 70.97 55.00 61.97 40
LANGUAGE 85.71 54.55 66.67 22
micro_avg 91.01 90.73 90.84 11,257
macro_avg 83.31 81.26 81.96 11,257

================================================================================

conll2003-NER Mention Spans Detection (test set)

OntoNotes 5 - NER Mention Spans Detection (test set)

litbank-en We evaluated PROPP’s NER pipeline using multiple transformer-based embedding models. (test sets)

Coreference Resolution datasets

litbank (GOLD mentions)

Test Splits: 1 [10 File(s) / split] | Overall Tested Ratio: 10.00% [10/100 Files]

embedding_model avg_tokens MUC_f1 B3_f1 CEAFe_f1 CONLL_f1
answerdotai/ModernBERT-large 2,091 86.07 70.49 74.47 77.01
FacebookAI/xlm-roberta-large 2,091 87.12 74.06 76.15 79.11
google-bert/bert-large-cased 2,091 87.92 73.26 77.20 79.46
google/mt5-xl 2,091 88.73 77.53 77.23 81.17
FacebookAI/roberta-large 2,091 89.03 78.31 77.96 81.77
google/flan-t5-xl 2,091 89.25 78.66 78.76 82.22

litbank-fr (GOLD mentions)

Test Splits: 29 [1 File(s) / split] | Overall Tested Ratio: 100.00% [29/29 Files]

embedding_model avg_tokens MUC_f1 B3_f1 CEAFe_f1 CONLL_f1
almanach/moderncamembert-cv2-base 9,551 85.62 56.20 64.64 68.82
almanach/moderncamembert-base 9,551 87.12 59.31 67.00 71.14
almanach/camembert-base 9,551 87.95 63.31 65.66 72.31
FacebookAI/xlm-roberta-large 9,551 90.05 68.46 69.43 75.98
almanach/camembert-large 9,551 90.60 69.79 71.29 77.23

long-litbank-fr-PER-only (GOLD mentions)

Test Splits: 32 [1 File(s) / split] | Overall Tested Ratio: 100.00% [32/32 Files]

embedding_model avg_tokens MUC_f1 B3_f1 CEAFe_f1 CONLL_f1
almanach/moderncamembert-cv2-base 17,378 85.07 45.69 40.78 57.18
almanach/moderncamembert-base 17,378 85.38 47.17 42.46 58.34
almanach/camembert-base 17,378 89.05 54.92 46.14 63.37
FacebookAI/xlm-roberta-large 17,378 91.29 62.87 52.59 68.91
almanach/camembert-large 17,378 91.99 65.07 56.52 71.19
google/mt5-xl 17,378 92.86 69.54 57.58 73.32