
Add Multiple EntityRuler Components With spaCy (ValueError: 'entity_ruler' already exists in pipeline)

The following link shows how to add a custom entity ruler where the entities span more than one token. The code to do that is below: import spacy from spacy.pipeline import EntityRule

Solution 1:

You can add another custom entity ruler to your pipeline by changing its name (to avoid name collision). Here is some code to illustrate, but please read the remark below:

# Note: this listing uses the spaCy v2 API, where nlp.add_pipe() receives the component object.
import spacy
from spacy.pipeline import EntityRuler

nlp = spacy.load('en_core_web_sm', disable=['ner'])

# One ruler per category; each pattern is a phrase that may span several tokens.
rulerPlants = EntityRuler(nlp, overwrite_ents=True)
flowers = ["rose", "tulip", "african daisy"]
for f in flowers:
    rulerPlants.add_patterns([{"label": "flower", "pattern": f}])

animals = ["cat", "dog", "arctic fox"]
rulerAnimals = EntityRuler(nlp, overwrite_ents=True)
for a in animals:
    rulerAnimals.add_patterns([{"label": "animal", "pattern": a}])

# Give each ruler a unique name so the second add_pipe() call does not collide.
rulerPlants.name = 'rulerPlants'
rulerAnimals.name = 'rulerAnimals'
nlp.add_pipe(rulerPlants)
nlp.add_pipe(rulerAnimals)

doc = nlp("cat and arctic fox, plant african daisy")
for ent in doc.ents:
    print(ent.text, '->', ent.label_)

# Output:
# cat -> animal
# arctic fox -> animal
# african daisy -> flower

We can verify that the pipeline does contain both entity rulers:

print(nlp.pipe_names)
# ['tagger', 'parser', 'rulerPlants', 'rulerAnimals']
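
The listing above passes component objects to nlp.add_pipe, which is the spaCy v2 calling convention. On spaCy v3 and later, add_pipe expects the registered factory name as a string, and the unique name is supplied through the name argument. The following is only a sketch of the same idea under that assumption. It uses a blank English pipeline (so no model download is needed), and the names ruler_plants and ruler_animals are illustrative, not from the original question:

```python
import spacy

# Assumption: spaCy v3 or later is installed. A blank pipeline keeps the example
# independent of any downloaded model.
nlp = spacy.blank("en")

# Each ruler is created from the "entity_ruler" factory under its own name.
ruler_plants = nlp.add_pipe("entity_ruler", name="ruler_plants")
ruler_plants.add_patterns(
    [{"label": "flower", "pattern": f} for f in ["rose", "tulip", "african daisy"]]
)

ruler_animals = nlp.add_pipe("entity_ruler", name="ruler_animals")
ruler_animals.add_patterns(
    [{"label": "animal", "pattern": a} for a in ["cat", "dog", "arctic fox"]]
)

# Reusing a name that is already in the pipeline raises the ValueError
# mentioned in the title of the question.
collision_message = None
try:
    nlp.add_pipe("entity_ruler", name="ruler_plants")
except ValueError as err:
    collision_message = str(err)

doc = nlp("cat and arctic fox, plant african daisy")
entities = [(ent.text, ent.label_) for ent in doc.ents]

print(nlp.pipe_names)
print(entities)
print(collision_message)
```

The patterns here do not overlap, so the second ruler simply adds its entities to those found by the first one. If patterns from different rulers could overlap, the overwrite_ents setting of the later ruler would decide which entity wins.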

Remark: I would suggest using the simpler and more natural approach of making a new entity ruler which contains the rules of both entity rulers:

rulerAll = EntityRuler(nlp)
rulerAll.add_patterns(rulerAnimals.patterns)
rulerAll.add_patterns(rulerPlants.patterns)

Finally, concerning your question about best practices for entity labels: it is common practice to use capitalized abbreviations (see the spaCy NER documentation), for example ORG, LOC, PERSON, etc.

Edits following questions:

1) If you do not need spaCy's default Named Entity Recognition (NER), I would suggest disabling it, as that will speed up computation and avoid interference (see discussion about this here). Disabling NER will not cause unexpected downstream results (your document just won't be tagged with the default entities LOC, ORG, PERSON, etc.).

2) There is this idea in programming that "Simple is better than complex." (see here). There can be some subjectivity as to what constitutes a simpler solution. I would think that a processing pipeline with fewer components is simpler (i.e. the pipeline containing both entity rulers seems more complex to me). However, depending on your needs in terms of profiling, adjustability, etc., it might be simpler for you to have several different entity rulers, as described in the first part of this solution. It would be nice to get the authors of spaCy to give their view on these two design choices.

3) Naturally, the single entity ruler above can be directly created as follows:

rulerAll = EntityRuler(nlp, overwrite_ents=True)
for f in flowers:
    rulerAll.add_patterns([{"label": "flower", "pattern": f}])
for a in animals:
    rulerAll.add_patterns([{"label": "animal", "pattern": a}])

The earlier code that builds rulerAll from rulerAnimals.patterns and rulerPlants.patterns is meant to illustrate how we can query an entity ruler for the list of patterns that have been added to it. In practice we would construct rulerAll directly, without first constructing rulerPlants and rulerAnimals, unless we wanted to test and profile those two rulers individually.
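
For readers on spaCy v3 or later, the single-ruler approach might look roughly like the sketch below. It assumes a blank English pipeline and the "entity_ruler" factory. The variable names (patterns, ruler_all, stored_labels) are illustrative, and all patterns are built up front and added in one call instead of one call per pattern:

```python
import spacy

flowers = ["rose", "tulip", "african daisy"]
animals = ["cat", "dog", "arctic fox"]

# Build the full pattern list first, then hand it to a single ruler.
patterns = [{"label": "flower", "pattern": f} for f in flowers]
patterns += [{"label": "animal", "pattern": a} for a in animals]

# Assumption: spaCy v3 or later; a blank pipeline needs no model download.
nlp = spacy.blank("en")
ruler_all = nlp.add_pipe("entity_ruler", config={"overwrite_ents": True})
ruler_all.add_patterns(patterns)

# The ruler can be queried for the patterns it currently holds.
stored_labels = sorted({p["label"] for p in ruler_all.patterns})

doc = nlp("cat and arctic fox, plant african daisy")
entities = [(ent.text, ent.label_) for ent in doc.ents]

print(len(ruler_all.patterns), stored_labels)
print(entities)
```

With a single ruler there is only one component name in the pipeline, so the name collision from the question cannot occur.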
