---
license: mit
datasets:
- WhereIsAI/github-issue-similarity
language:
- en
library_name: sentence-transformers
pipeline_tag: feature-extraction
---

# WhereIsAI/UAE-Code-Large-V1

📢 `WhereIsAI/UAE-Code-Large-V1` **is licensed under MIT. Feel free to use it in any scenario.**
If you use it for academic papers, we would greatly appreciate it if you could cite us. 👉 [citation info](#citation).

This model is trained on the [GIS: Github Issue Similarity](https://huggingface.co/datasets/WhereIsAI/github-issue-similarity) dataset using [AnglE](https://github.com/SeanLee97/AnglE) loss (https://arxiv.org/abs/2309.12871).
It can be used to measure **code/issue similarity**.

Results (test set):

- Spearman correlation: 71.19
- Accuracy: 84.37
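
The Spearman figure is a rank correlation between two lists of scores. The evaluation script is not shown here, so the following is only a minimal sketch of the metric itself, assuming the model's cosine similarities are compared against gold similarity labels. The `predicted` and `gold` values are made up for illustration, and the helper names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    std_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (std_x * std_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up example: predicted cosine similarities vs. gold labels (assumption).
predicted = [0.91, 0.15, 0.62, 0.40, 0.77]
gold = [1, 0, 1, 0, 1]
rho = spearman(predicted, gold)
print(f"Spearman correlation: {rho:.4f}")
```

If SciPy is available, `scipy.stats.spearmanr(predicted, gold)` computes the same statistic.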

## Usage

### 1. angle-emb

You can use it via `angle-emb` as follows:

Install:

```
python -m pip install -U angle-emb
```

Example:

```python
from scipy import spatial
from angle_emb import AnglE

# Load the model and move it to the GPU (a CUDA-capable device is required).
model = AnglE.from_pretrained('WhereIsAI/UAE-Code-Large-V1').cuda()

quick_sort = '''# Approach 2: Quicksort using list comprehension

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)

# Example usage
arr = [1, 7, 4, 1, 10, 9, -2]
sorted_arr = quicksort(arr)
print("Sorted Array in Ascending Order:")
print(sorted_arr)'''


bubble_sort = '''def bubblesort(elements):
    # Looping from size of array from last index[-1] to index [0]
    for n in range(len(elements)-1, 0, -1):
        swapped = False
        for i in range(n):
            if elements[i] > elements[i + 1]:
                swapped = True
                # swapping data if the element is less than next element in the array
                elements[i], elements[i + 1] = elements[i + 1], elements[i]
        if not swapped:
            # exiting the function if we didn't make a single swap
            # meaning that the array is already sorted.
            return

elements = [39, 12, 18, 85, 72, 10, 2, 18]

print("Unsorted list is,")
print(elements)
bubblesort(elements)
print("Sorted Array is, ")
print(elements)'''

# Encode the three snippets into one embedding vector each.
vecs = model.encode([
    'def echo(): print("hello world")',
    quick_sort,
    bubble_sort
])

# Cosine similarity is 1 minus SciPy's cosine distance.
print('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))
print('cos sim (0, 2)', 1 - spatial.distance.cosine(vecs[0], vecs[2]))
print('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))
```

Output:

```
cos sim (0, 1): 0.34329649806022644
cos sim (0, 2) 0.3627094626426697
cos sim (1, 2): 0.6972219347953796
```
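
The scores above are cosine similarities, obtained as 1 minus SciPy's cosine distance. As a minimal sketch of what that quantity is, the snippet below computes it by hand on toy vectors. The `cosine_similarity` helper and the `toy_vecs` values are illustrative assumptions, not real model embeddings.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for embeddings (assumption).
toy_vecs = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
]

# Print the similarity of every distinct pair, mirroring the example above.
for i in range(len(toy_vecs)):
    for j in range(i + 1, len(toy_vecs)):
        score = cosine_similarity(toy_vecs[i], toy_vecs[j])
        print(f"cos sim ({i}, {j}): {score:.4f}")
```

A score near 1 means the two vectors point in nearly the same direction, and a score near 0 means they are close to orthogonal. This matches the pattern above, where the two sorting implementations score higher with each other than with the unrelated `echo` function.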

### 2. sentence-transformers

You can also use it via `sentence-transformers`:

```python
from scipy import spatial
from sentence_transformers import SentenceTransformer

# Load the model and move it to the GPU (a CUDA-capable device is required).
model = SentenceTransformer('WhereIsAI/UAE-Code-Large-V1').cuda()

quick_sort = '''# Approach 2: Quicksort using list comprehension

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)

# Example usage
arr = [1, 7, 4, 1, 10, 9, -2]
sorted_arr = quicksort(arr)
print("Sorted Array in Ascending Order:")
print(sorted_arr)'''


bubble_sort = '''def bubblesort(elements):
    # Looping from size of array from last index[-1] to index [0]
    for n in range(len(elements)-1, 0, -1):
        swapped = False
        for i in range(n):
            if elements[i] > elements[i + 1]:
                swapped = True
                # swapping data if the element is less than next element in the array
                elements[i], elements[i + 1] = elements[i + 1], elements[i]
        if not swapped:
            # exiting the function if we didn't make a single swap
            # meaning that the array is already sorted.
            return

elements = [39, 12, 18, 85, 72, 10, 2, 18]

print("Unsorted list is,")
print(elements)
bubblesort(elements)
print("Sorted Array is, ")
print(elements)'''

# Encode the three snippets into one embedding vector each.
vecs = model.encode([
    'def echo(): print("hello world")',
    quick_sort,
    bubble_sort
])

# Cosine similarity is 1 minus SciPy's cosine distance.
print('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))
print('cos sim (0, 2)', 1 - spatial.distance.cosine(vecs[0], vecs[2]))
print('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))
```

Output:

```
cos sim (0, 1): 0.34329649806022644
cos sim (0, 2) 0.3627094626426697
cos sim (1, 2): 0.6972219347953796
```
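
One way such similarity scores might be put to use is ranking candidate snippets or issues against a query. The following is a rough sketch under stated assumptions: `rank_by_similarity` is a hypothetical helper, and the `query` and `candidates` vectors are toy values standing in for the output of `model.encode(...)`.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidate_vecs):
    """Return (candidate index, score) pairs, most similar first."""
    scored = [
        (idx, cosine_similarity(query_vec, vec))
        for idx, vec in enumerate(candidate_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption).
query = [0.9, 0.1, 0.0]
candidates = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]

ranking = rank_by_similarity(query, candidates)
for idx, score in ranking:
    print(f"candidate {idx}: {score:.4f}")
```

With real embeddings, `query` and `candidates` would be replaced by rows of the array returned by `model.encode(...)`.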

## Citation

```bibtex
@article{li2023angle,
  title={AnglE-optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}
```