Fine-tuning vs. training spaCy NER from scratch #9233
-
I am having difficulty understanding whether my model is being fine-tuned or trained from scratch. My objective is to fine-tune an NER model on my data with the PER, GPE, and ORG labels. My config is as follows:
My data consisted of 12 texts, which were initially in the following format after annotation:
This was then converted into spaCy's binary format like this:
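For reference, a common way to build that binary training file uses `DocBin`, assuming the annotations are `(text, {"entities": [(start, end, label)]})` tuples with character offsets. The sample sentence and file name below are illustrative, not from the original post:

```python
import spacy
from spacy.tokens import DocBin

# Illustrative annotations in the common (text, {"entities": ...}) format;
# offsets are character-based (start, end, label)
TRAIN_DATA = [
    ("Apple was founded by Steve Jobs in California.",
     {"entities": [(0, 5, "ORG"), (21, 31, "PER"), (35, 45, "GPE")]}),
]

nlp = spacy.blank("en")  # blank pipeline, used only for tokenization
db = DocBin()
for text, annotations in TRAIN_DATA:
    doc = nlp.make_doc(text)
    ents = []
    for start, end, label in annotations["entities"]:
        span = doc.char_span(start, end, label=label)
        if span is not None:  # skip spans that don't align to token boundaries
            ents.append(span)
    doc.ents = ents
    db.add(doc)
db.to_disk("./train.spacy")
```

Note that `char_span` returns `None` for offsets that don't line up with token boundaries, which silently drops those entities; it's worth logging such cases when converting real data.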
Then, to fine-tune, I ran the following in my terminal:
This brings up several questions on my part:
Replies: 2 comments 1 reply
-
In most cases, if a component has a `factory`, it's being trained from scratch. If it has a `source` instead, it's being loaded from an existing pipeline. This is a little confusing with transformers, since even with a `factory` they load a pretrained transformer model, but it's true for most other components.

You have a `factory`, so the NER model is being trained from scratch here. To resume training, change the contents of this block to `source = "en_core_web_trf"`, and remove the other `components.ner` blocks.

However, note that in general, re-training models like this is tricky due to catastrophic forgetting. You'll typically get better performance by training from a full dataset.
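Concretely, the difference between the two kinds of component block looks like this in a spaCy `config.cfg` (a minimal sketch; the surrounding config sections are assumed):

```ini
# Trained from scratch: the component is created by its factory
# [components.ner]
# factory = "ner"

# Resuming from a pretrained pipeline: the component and its weights
# are loaded from an installed package
[components.ner]
source = "en_core_web_trf"
```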
-
Hello. I'm facing an issue related to the use of the
To fine-tune, I run this in a Jupyter notebook:
However, it does not work properly. This is the output message I get:
I have been searching for information about this. I'm using Python 3.12.9. Also, these are the versions of the different packages I'm using in my virtual environment:
I have created my training dataset in the same way the user who opened this thread did, but the entities I have in the dataset are
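When debugging environment issues like this, it can help to report the interpreter and package versions programmatically rather than from memory. A small sketch; the package list is illustrative and should be adjusted to your environment:

```python
import sys
import importlib.metadata as md

# Confirm the interpreter version (e.g. Python 3.12.x)
print(sys.version)

# Report installed versions of packages relevant to the pipeline;
# the names here are illustrative, not the poster's actual list
for pkg in ("spacy", "thinc", "numpy"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")
```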