ihk committed
Commit 3dd32cd
1 Parent(s): 5f8a486

Update README.md

Files changed (1)
  1. README.md +50 -20
README.md CHANGED
@@ -1,36 +1,59 @@
  ---
  base_model: jjzha/jobbert-base-cased
- tags:
- - generated_from_trainer
  model-index:
  - name: jobbert-base-cased-compdecs
    results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # jobbert-base-cased-compdecs

- This model is a fine-tuned version of [jjzha/jobbert-base-cased](https://huggingface.co/jjzha/jobbert-base-cased) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.4622

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

  The following hyperparameters were used during training:
  - learning_rate: 2e-05
@@ -41,13 +64,20 @@ The following hyperparameters were used during training:
  - lr_scheduler_type: linear
  - num_epochs: 10

- ### Training results
-

- ### Framework versions

  - Transformers 4.32.0
  - Pytorch 2.0.1+cu118
  - Datasets 2.14.4
- - Tokenizers 0.13.3
 
  ---
  base_model: jjzha/jobbert-base-cased
  model-index:
  - name: jobbert-base-cased-compdecs
    results: []
+ license: mit
+ language:
+ - en
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
  ---

+ ## 🖊️ Model description

+ This model is a fine-tuned version of [jjzha/jobbert-base-cased](https://huggingface.co/jjzha/jobbert-base-cased). JobBERT is a bert-base-cased checkpoint continuously pre-trained on ~3.2M sentences from job postings.

+ It has been fine-tuned with a classification head to perform binary classification of job advert sentences: `company description` or not.
 
 
+ The model was trained on **486 labelled company description sentences** and **1,000 non-company-description sentences less than 250 characters in length**.

+ It achieves the following results on a held-out test set of 147 sentences:
+ - Accuracy: 0.92157

+ | Label | Precision | Recall | F1-score | Support |
+ | --- | --- | --- | --- | --- |
+ | not company description | 0.930693 | 0.959184 | 0.944724 | 98 |
+ | company description | 0.913043 | 0.857143 | 0.884211 | 49 |
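The per-class breakdown above has the shape of scikit-learn's `classification_report`. As an illustration only (the `y_true`/`y_pred` values below are hypothetical placeholders, not the actual held-out predictions), such a table can be produced like this:

```
from sklearn.metrics import classification_report

# Hypothetical placeholders: gold labels and model predictions for the test sentences.
# 0 = not company description, 1 = company description
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 0, 1]

print(classification_report(
    y_true,
    y_pred,
    target_names=["not company description", "company description"],
))
```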
 
+ ## 🖨️ Use

+ To use the model:

+ ```
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ from transformers import pipeline

+ model = AutoModelForSequenceClassification.from_pretrained("ihk/jobbert-base-cased-compdecs")
+ tokenizer = AutoTokenizer.from_pretrained("ihk/jobbert-base-cased-compdecs")

+ comp_classifier = pipeline('text-classification', model=model, tokenizer=tokenizer)
+ ```
+ An example use is as follows:

+ ```
+ job_sent = "Would you like to join a major manufacturing company?"
+ comp_classifier(job_sent)

+ >> [{'label': 'LABEL_1', 'score': 0.9953641891479492}]
+ ```
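The pipeline returns the default `LABEL_0`/`LABEL_1` names. Judging from the example above, `LABEL_1` appears to correspond to `company description`, but the card does not state this mapping explicitly; the sketch below assumes it and reuses the `comp_classifier` defined earlier.

```
# Assumed mapping -- not confirmed by the model card.
id2label = {"LABEL_0": "not company description", "LABEL_1": "company description"}

result = comp_classifier("Would you like to join a major manufacturing company?")[0]
print({"label": id2label[result["label"]], "score": result["score"]})
# e.g. {'label': 'company description', 'score': 0.9953...}
```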

+ The intended use of this model is to extract company descriptions from online job adverts for use in downstream tasks such as mapping to [Standardised Industrial Classification (SIC)](https://www.gov.uk/government/publications/standard-industrial-classification-of-economic-activities-sic) codes.

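To illustrate that intended use, here is a minimal sketch of filtering an advert down to its company-description sentences. The advert text is invented, the naive sentence split is for illustration only, and it again assumes `LABEL_1` means `company description`.

```
from transformers import pipeline

comp_classifier = pipeline("text-classification", model="ihk/jobbert-base-cased-compdecs")

# Invented example advert for illustration.
advert = (
    "Would you like to join a major manufacturing company? "
    "We supply packaging to retailers across the UK. "
    "You will operate a forklift on rotating shifts."
)

# Naive split on sentence-ending punctuation; a proper sentence tokenizer could be used instead.
sentences = [s.strip() for s in advert.replace("?", ".").split(".") if s.strip()]

# Keep sentences the classifier labels as a company description (assumed to be LABEL_1).
company_description = [s for s in sentences if comp_classifier(s)[0]["label"] == "LABEL_1"]
print(company_description)
```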

+ ### ⚖️ Training hyperparameters

  The following hyperparameters were used during training:
  - learning_rate: 2e-05

  - lr_scheduler_type: linear
  - num_epochs: 10
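For orientation only, a minimal sketch of how these values might map onto a Hugging Face `TrainingArguments` object; this hunk does not show the remaining hyperparameters, so the batch size and other settings below are placeholders rather than the values actually used.

```
from transformers import TrainingArguments

# learning_rate, lr_scheduler_type and num_epochs come from the card;
# everything else is a placeholder for illustration.
training_args = TrainingArguments(
    output_dir="jobbert-base-cased-compdecs",
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    per_device_train_batch_size=16,  # placeholder, not taken from the card
    evaluation_strategy="epoch",     # placeholder, not taken from the card
)
```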

+ ### ⚖️ Training results

+ The fine-tuning metrics are as follows:
+ - eval_loss: 0.462236
+ - eval_runtime: 0.6293
+ - eval_samples_per_second: 233.582
+ - eval_steps_per_second: 15.89
+ - epoch: 10
+ - perplexity: 1.59
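The reported perplexity appears to be the exponential of the evaluation loss (an assumption, since the card does not say how it was computed):

```
import math

print(round(math.exp(0.462236), 2))  # 1.59
```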

+ ### ⚖️ Framework versions

  - Transformers 4.32.0
  - Pytorch 2.0.1+cu118
  - Datasets 2.14.4
+ - Tokenizers 0.13.3