dhuynh95 committed on
Commit
f8720c8
•
1 Parent(s): 30962e5

Update app.py

Files changed (1)
  1. app.py +2 -0
app.py CHANGED
@@ -137,6 +137,8 @@ To raise awareness of this issue, we show in this demo how much [StarCoder](http
  We found that **StarCoder memorized at least 8% of the training samples** we used, which highlights the high risks of LLMs exposing the training set. We provide a notebook to reproduce our results [here](https://colab.research.google.com/drive/1YaaPOXzodEAc4JXboa12gN5zdlzy5XaR?usp=sharing). 👈
 
  To evaluate memorization of the training set, we can prompt StarCoder with the first tokens of an example from the training set. If StarCoder completes the prompt with an output that looks very similar to the original sample, we will consider this sample to be memorized by the LLM. 💾
+
+ ⚠️Non responsiveness: We use Hugging Face Pro Inference solution to query StarCoder, which might be not available. If the demo does not work, please try later.
  """
 
  memorization_definition = """
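
The memorization check described in the app text (prompt with the first tokens of a training sample, then compare the completion to the original) could be sketched roughly as follows. This is not the code from the demo or the linked notebook. The whitespace tokenization, the `difflib` similarity ratio, the 0.9 threshold, and the `generate` callable (a hypothetical stand-in for whatever queries StarCoder) are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Callable


def is_memorized(
    sample: str,
    generate: Callable[[str], str],  # hypothetical: maps a prompt to the model's completion
    prefix_tokens: int = 50,         # assumption: how many leading tokens form the prompt
    threshold: float = 0.9,          # assumption: cut-off for "looks very similar"
) -> bool:
    """Return True if the completion of the sample's prefix closely matches the remainder."""
    tokens = sample.split()  # assumption: whitespace tokens, not the model's tokenizer
    if len(tokens) <= prefix_tokens:
        return False  # nothing left after the prompt to compare against

    prompt = " ".join(tokens[:prefix_tokens])
    reference = " ".join(tokens[prefix_tokens:])

    completion = generate(prompt)
    # Compare only as many completion tokens as the reference contains.
    candidate = " ".join(completion.split()[: len(tokens) - prefix_tokens])

    similarity = SequenceMatcher(None, reference, candidate).ratio()
    return similarity >= threshold
```

With a callable that wraps an actual model query, running `is_memorized` over a set of training samples and averaging the results would give a memorization rate under these assumed definitions. A different tokenizer or similarity metric would likely yield a different figure.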