---
license: mit
datasets:
- WhereIsAI/github-issue-similarity
language:
- en
library_name: sentence-transformers
pipeline_tag: feature-extraction
---
# WhereIsAI/UAE-Code-Large-V1
📢 `WhereIsAI/UAE-Code-Large-V1` **is licensed under MIT. Feel free to use it in any scenario.**
If you use it in an academic paper, we would greatly appreciate a citation. 👉 [citation info](#citation).

This model is trained on the [GIS: GitHub Issue Similarity](https://huggingface.co/datasets/WhereIsAI/github-issue-similarity) dataset using the [AnglE](https://github.com/SeanLee97/AnglE) loss ([arXiv:2309.12871](https://arxiv.org/abs/2309.12871)).
It can be used to measure **code/issue similarity**.

Results (test set):
- Spearman correlation: 71.19
- Accuracy: 84.37
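
For readers unfamiliar with the first metric, here is a minimal sketch of Spearman rank correlation in plain Python. It assumes the score compares predicted cosine similarities against gold similarity labels, as is conventional for similarity benchmarks; this README does not state the exact evaluation procedure, and the toy scores and labels below are hypothetical.

```python
import math

def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical data: predicted similarities and gold labels (illustrative only).
predicted = [0.91, 0.35, 0.72, 0.20]
gold = [1, 0, 1, 0]
print("Spearman:", round(spearman(predicted, gold), 4))
```

The function returns a value in [-1, 1]; the figures listed above appear to be on a 0 to 100 scale.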
## Usage
### 1. angle-emb
You can use it via `angle-emb` as follows:
install:
```
python -m pip install -U angle-emb
```
example:
```python
from scipy import spatial
from angle_emb import AnglE
# .cuda() moves the model to the GPU and requires a CUDA device
model = AnglE.from_pretrained('WhereIsAI/UAE-Code-Large-V1').cuda()
quick_sort = '''# Approach 2: Quicksort using list comprehension
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)
# Example usage
arr = [1, 7, 4, 1, 10, 9, -2]
sorted_arr = quicksort(arr)
print("Sorted Array in Ascending Order:")
print(sorted_arr)'''
bubble_sort = '''def bubblesort(elements):
    # Looping from size of array from last index[-1] to index [0]
    for n in range(len(elements)-1, 0, -1):
        swapped = False
        for i in range(n):
            if elements[i] > elements[i + 1]:
                swapped = True
                # swapping data if the element is less than next element in the array
                elements[i], elements[i + 1] = elements[i + 1], elements[i]
        if not swapped:
            # exiting the function if we didn't make a single swap
            # meaning that the array is already sorted.
            return
elements = [39, 12, 18, 85, 72, 10, 2, 18]
print("Unsorted list is,")
print(elements)
bubblesort(elements)
print("Sorted Array is, ")
print(elements)'''
vecs = model.encode([
'def echo(): print("hello world")',
quick_sort,
bubble_sort
])
print('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))
print('cos sim (0, 2):', 1 - spatial.distance.cosine(vecs[0], vecs[2]))
print('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))
```
output:
```
cos sim (0, 1): 0.34329649806022644
cos sim (0, 2): 0.3627094626426697
cos sim (1, 2): 0.6972219347953796
```
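
The expression `1 - spatial.distance.cosine(a, b)` used in the example is the cosine similarity of the two embeddings, since SciPy's cosine distance is defined as one minus that value. As a minimal sketch of what is being computed, here is a dependency-free version; the function names and the toy 3-dimensional vectors are illustrative and not part of `angle-emb`.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def pairwise_similarities(vectors):
    """Return {(i, j): similarity} for every pair with i < j."""
    return {
        (i, j): cosine_similarity(vectors[i], vectors[j])
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    }

# Toy vectors standing in for real embeddings (illustrative only).
toy_vecs = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
for pair, sim in pairwise_similarities(toy_vecs).items():
    print(pair, round(sim, 4))
```

Passing the `vecs` array returned by `model.encode` in place of `toy_vecs` should give the three pairwise scores in one call.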
### 2. sentence-transformers
You can also use it via `sentence-transformers`:
```python
from scipy import spatial
from sentence_transformers import SentenceTransformer
# .cuda() moves the model to the GPU and requires a CUDA device
model = SentenceTransformer('WhereIsAI/UAE-Code-Large-V1').cuda()
quick_sort = '''# Approach 2: Quicksort using list comprehension
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)
# Example usage
arr = [1, 7, 4, 1, 10, 9, -2]
sorted_arr = quicksort(arr)
print("Sorted Array in Ascending Order:")
print(sorted_arr)'''
bubble_sort = '''def bubblesort(elements):
    # Looping from size of array from last index[-1] to index [0]
    for n in range(len(elements)-1, 0, -1):
        swapped = False
        for i in range(n):
            if elements[i] > elements[i + 1]:
                swapped = True
                # swapping data if the element is less than next element in the array
                elements[i], elements[i + 1] = elements[i + 1], elements[i]
        if not swapped:
            # exiting the function if we didn't make a single swap
            # meaning that the array is already sorted.
            return
elements = [39, 12, 18, 85, 72, 10, 2, 18]
print("Unsorted list is,")
print(elements)
bubblesort(elements)
print("Sorted Array is, ")
print(elements)'''
vecs = model.encode([
'def echo(): print("hello world")',
quick_sort,
bubble_sort
])
print('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))
print('cos sim (0, 2):', 1 - spatial.distance.cosine(vecs[0], vecs[2]))
print('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))
```
output:
```
cos sim (0, 1): 0.34329649806022644
cos sim (0, 2): 0.3627094626426697
cos sim (1, 2): 0.6972219347953796
```
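
Because the model measures code/issue similarity, one possible use is ranking a set of candidate snippets or issues against a query. The sketch below is one way to do that, assuming the embeddings have already been computed: it L2-normalizes each vector so that a plain dot product equals cosine similarity. The helper names `l2_normalize` and `rank_candidates` and the 2-dimensional toy vectors are hypothetical and not part of either library.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def rank_candidates(query_vec, candidate_vecs, top_k=3):
    """Return up to top_k (index, score) pairs, highest cosine similarity first."""
    query = l2_normalize(query_vec)
    scored = []
    for idx, vec in enumerate(candidate_vecs):
        unit = l2_normalize(vec)
        # For unit vectors, the dot product is the cosine similarity.
        score = sum(q * c for q, c in zip(query, unit))
        scored.append((idx, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors standing in for real embeddings (illustrative only).
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
for idx, score in rank_candidates(query, candidates, top_k=2):
    print(idx, round(score, 4))
```

With real data, `query` would be one row of `model.encode([...])` and `candidates` the remaining rows.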
## Citation
```bibtex
@article{li2023angle,
title={AnglE-optimized Text Embeddings},
author={Li, Xianming and Li, Jing},
journal={arXiv preprint arXiv:2309.12871},
year={2023}
}
```