
Training a new Pipeline

Training a new coreference resolution pipeline from scratch

Dataset Preparation

First, you need an annotated dataset.

This dataset should contain:

  • The raw text files
  • The annotations, each with at least the following information:
    • The start and end of the annotation (character indexes in the raw text)
    • The label of the annotation (type of entity)
    • The ID of the coreference chain the annotation belongs to
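
As an illustration, the minimal annotation fields listed above could be represented as follows. This is only a sketch, not the format expected by the pipeline: the `Annotation` class, the helper functions, the `"PER"` label and the convention that the end index is exclusive are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical container for the minimal fields of one annotation."""
    start: int     # character index of the first character in the raw text
    end: int       # character index just past the last character (assumed exclusive)
    label: str     # type of entity, e.g. "PER" (illustrative label)
    chain_id: int  # ID shared by all mentions of the same coreference chain


def mention_text(text: str, ann: Annotation) -> str:
    """Return the span of raw text covered by an annotation."""
    if not 0 <= ann.start < ann.end <= len(text):
        raise ValueError(f"offsets out of range for this text: {ann}")
    return text[ann.start:ann.end]


def group_by_chain(text: str, annotations: list[Annotation]) -> dict[int, list[str]]:
    """Group mention strings by coreference chain ID, in order of appearance."""
    chains = defaultdict(list)
    for ann in sorted(annotations, key=lambda a: a.start):
        chains[ann.chain_id].append(mention_text(text, ann))
    return dict(chains)


raw_text = "Alice met Bob. She greeted him."
annotations = [
    Annotation(0, 5, "PER", 1),    # "Alice"
    Annotation(10, 13, "PER", 2),  # "Bob"
    Annotation(15, 18, "PER", 1),  # "She"
    Annotation(27, 30, "PER", 2),  # "him"
]
chains = group_by_chain(raw_text, annotations)
print(chains)
```

Checking that every offset pair falls inside the raw text, as `mention_text` does here, is a cheap way to catch annotations that have drifted out of sync with the text files before training starts.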

Ready-to-use annotated datasets can be downloaded directly from the datasets section.