Is There A Method Of Rule Based Matching Of Spacy To Match Patterns?
I want to use rule-based matching. I have texts in which each word is tagged with its POS:
text1 = 'it_PRON is_AUX a_DET beautiful_ADJ apple_NOUN'
text2 = 'it_PRON is_AUX a_DET beautiful_ADJ and_C
Solution 1:
As I understand it, you have texts = ["it is a beautiful apple", "it is a beautiful and big apple"] and plan to define a couple of Matcher patterns to extract certain POS sequences from those texts.
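Because your input already carries POS tags in word_POS form, it may help to see what a token-level pattern such as [{'POS': 'ADJ'}, {'POS': 'NOUN'}] actually checks: a run of consecutive tokens whose attributes equal the per-token dicts, in order. The following is a minimal pure-Python sketch of that idea, not spaCy's Matcher (which also supports operators such as 'OP', other attributes, and callbacks). The names parse_tagged, match_pos_patterns and PATTERNS are hypothetical, and only exact equality on the 'POS' key is handled.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch; this is NOT spaCy's Matcher implementation.
# Each pattern is a list of per-token dicts, mirroring the Matcher patterns
# in this answer. Only exact equality on the 'POS' key is supported here.
PATTERNS: List[List[Dict[str, str]]] = [
    [{"POS": "ADJ"}, {"POS": "NOUN"}],
    [{"POS": "ADJ"}, {"POS": "CCONJ"}, {"POS": "ADJ"}, {"POS": "NOUN"}],
    [{"POS": "ADJ"}, {"POS": "PUNCT"}, {"POS": "ADJ"}, {"POS": "NOUN"}],
]


def parse_tagged(text: str) -> List[Tuple[str, str]]:
    """Split 'word_POS word_POS ...' into (word, POS) pairs."""
    pairs = []
    for token in text.split():
        # Split on the last underscore; everything after it is the tag.
        word, _, pos = token.rpartition("_")
        pairs.append((word, pos))
    return pairs


def match_pos_patterns(
    tokens: List[Tuple[str, str]], patterns: List[List[Dict[str, str]]]
) -> List[Tuple[int, int]]:
    """Return (start, end) spans where a pattern matches consecutive tokens."""
    spans = []
    for start in range(len(tokens)):
        for pattern in patterns:
            end = start + len(pattern)
            if end > len(tokens):
                continue  # pattern would run past the end of the text
            if all(tokens[start + i][1] == spec["POS"] for i, spec in enumerate(pattern)):
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    tagged = "it_PRON is_AUX a_DET beautiful_ADJ and_CCONJ big_ADJ apple_NOUN"
    tokens = parse_tagged(tagged)
    for start, end in match_pos_patterns(tokens, PATTERNS):
        print(" ".join(word for word, _ in tokens[start:end]))
    # prints "beautiful and big apple", then "big apple"
```

Like the Matcher, this reports overlapping matches ("big apple" inside "beautiful and big apple"); filtering them is left to the caller.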
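You can define the desired patterns as a list of lists (one list of token dicts per pattern) and pass that list to matcher.add. In spaCy v2 the call was matcher.add("process_1", None, *patterns), with an on_match callback (here None) second and the patterns unpacked after it; in spaCy v3 the list of patterns itself is the second argument: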
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")  # the pipeline's tagger supplies token.pos_
# validate=True checks each pattern against spaCy's pattern schema when it is added
matcher = Matcher(nlp.vocab, validate=True)
patterns = [
    [{'POS': 'ADJ'}, {'POS': 'NOUN'}],
    [{'POS': 'ADJ'}, {'POS': 'CCONJ'}, {'POS': 'ADJ'}, {'POS': 'NOUN'}],
    [{'POS': 'ADJ'}, {'POS': 'PUNCT'}, {'POS': 'ADJ'}, {'POS': 'NOUN'}]
]
matcher.add("process_1", patterns)  # spaCy v3: the list of patterns is the second argument
texts = ["it is a beautiful apple", "it is a beautiful and big apple"]
for text in texts:
    doc = nlp(text)
    matches = matcher(doc)
    for _, start, end in matches:
        print(doc[start:end].text)
# => beautiful apple
#    beautiful and big apple
#    big apple
Solution 2:
I don't know spaCy, but here's a solution using the re module from the standard library:
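import re
# ADJ, then zero or more "(CCONJ|PUNCT) ADJ" pairs, then a NOUN.
# \S+? (rather than \w+) lets punctuation tokens such as ",_PUNCT" act as the joiner.
REGEX = re.compile(r"\w+_ADJ +(?:\S+?(?:_CCONJ|_PUNCT) +\w+_ADJ +)*\w+_NOUN")
def extract(s):
    try:
        # Unpacking raises ValueError on zero matches (and also on more than one)
        [extracted] = REGEX.findall(s)
    except ValueError:
        return []
    else:
        return extracted.split()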
>>> extract("it_PRON is_AUX a_DET beautiful_ADJ and_CCONJ big_ADJ apple_NOUN")
['beautiful_ADJ', 'and_CCONJ', 'big_ADJ', 'apple_NOUN']
>>> extract("it_PRON is_AUX a_DET beautiful_ADJ apple_NOUN")
['beautiful_ADJ', 'apple_NOUN']
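If one sentence can contain more than one such phrase, extract returns [] because unpacking findall into a single name raises ValueError. A possible variant (the name extract_all is hypothetical) collects every match with Pattern.finditer instead. The regex is repeated so the snippet is self-contained, and it uses \S+? for the joining token so that punctuation tokens such as ,_PUNCT are accepted.

```python
import re
from typing import List

# ADJ, then zero or more "(CCONJ|PUNCT) ADJ" pairs, then a NOUN.
REGEX = re.compile(r"\w+_ADJ +(?:\S+?(?:_CCONJ|_PUNCT) +\w+_ADJ +)*\w+_NOUN")


def extract_all(s: str) -> List[List[str]]:
    """Return every ADJ ... NOUN phrase in s, each as a list of word_POS tokens."""
    return [m.group().split() for m in REGEX.finditer(s)]


print(extract_all("it_PRON is_AUX a_DET beautiful_ADJ apple_NOUN and_CCONJ a_DET red_ADJ car_NOUN"))
# → [['beautiful_ADJ', 'apple_NOUN'], ['red_ADJ', 'car_NOUN']]
```

finditer scans left to right and never returns overlapping matches, so unlike spaCy's Matcher it yields "beautiful and big apple" but not the nested "big apple".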