---
license: mit
datasets:
- WhereIsAI/github-issue-similarity
language:
- en
library_name: sentence-transformers
pipeline_tag: feature-extraction
---

# WhereIsAI/UAE-Code-Large-V1

📢 `WhereIsAI/UAE-Code-Large-V1` **is licensed under MIT. Feel free to use it in any scenario.**
If you use it in academic work, we would greatly appreciate a citation. 👉 [citation info](#citation).

This model builds upon [WhereIsAI/UAE-Large-V1](https://huggingface.co/WhereIsAI/UAE-Large-V1) and is fine-tuned on the [GIS: Github Issue Similarity](https://huggingface.co/datasets/WhereIsAI/github-issue-similarity) dataset using the [AnglE](https://github.com/SeanLee97/AnglE) loss ([arXiv:2309.12871](https://arxiv.org/abs/2309.12871)).
It can be used to measure **code/issue similarity**.

Results (test set):

- Spearman correlation: 71.19
- Accuracy: 84.37
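
The model card does not include the evaluation script, so the following is only a minimal sketch of how a Spearman correlation between predicted similarities and gold labels can be computed. The helper names (`average_ranks`, `pearson`, `spearman`) and the toy scores are hypothetical and are not part of `angle-emb`; the figure reported above is presumably this coefficient scaled by 100.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (assumed, not taken from the GIS test set):
# cosine similarities predicted for four pairs, and their 0/1 labels.
predicted = [0.91, 0.15, 0.64, 0.40]
labels = [1, 0, 1, 0]
print("spearman:", spearman(predicted, labels))
```

In practice `scipy.stats.spearmanr` computes the same coefficient; the hand-written version is shown only to make the ranking step explicit.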


## Usage

### 1. angle-emb

You can use it via `angle-emb` as follows:

install:

```bash
python -m pip install -U angle-emb
```

example:

```python
from scipy import spatial
from angle_emb import AnglE

model = AnglE.from_pretrained('WhereIsAI/UAE-Code-Large-V1').cuda()

quick_sort = '''# Approach 2: Quicksort using list comprehension

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)
 
# Example usage
arr = [1, 7, 4, 1, 10, 9, -2]
sorted_arr = quicksort(arr)
print("Sorted Array in Ascending Order:")
print(sorted_arr)'''


bubble_sort = '''def bubblesort(elements):
    # Looping from size of array from last index[-1] to index [0]
    for n in range(len(elements)-1, 0, -1):
        swapped = False
        for i in range(n):
            if elements[i] > elements[i + 1]:
                swapped = True
                # swapping data if the element is less than next element in the array
                elements[i], elements[i + 1] = elements[i + 1], elements[i]
        if not swapped:
            # exiting the function if we didn't make a single swap
            # meaning that the array is already sorted.
            return

elements = [39, 12, 18, 85, 72, 10, 2, 18]

print("Unsorted list is,")
print(elements)
bubblesort(elements)
print("Sorted Array is, ")
print(elements)'''

vecs = model.encode([
    'def echo(): print("hello world")',
    quick_sort,
    bubble_sort
])


print('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))
print('cos sim (0, 2):', 1 - spatial.distance.cosine(vecs[0], vecs[2]))
print('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))
```

output:

```
cos sim (0, 1): 0.34329649806022644
cos sim (0, 2): 0.3627094626426697
cos sim (1, 2): 0.6972219347953796
```

### 2. sentence-transformers

You can also use it via `sentence-transformers`:

```python
from scipy import spatial
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('WhereIsAI/UAE-Code-Large-V1').cuda()

quick_sort = '''# Approach 2: Quicksort using list comprehension

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)
 
# Example usage
arr = [1, 7, 4, 1, 10, 9, -2]
sorted_arr = quicksort(arr)
print("Sorted Array in Ascending Order:")
print(sorted_arr)'''


bubble_sort = '''def bubblesort(elements):
    # Looping from size of array from last index[-1] to index [0]
    for n in range(len(elements)-1, 0, -1):
        swapped = False
        for i in range(n):
            if elements[i] > elements[i + 1]:
                swapped = True
                # swapping data if the element is less than next element in the array
                elements[i], elements[i + 1] = elements[i + 1], elements[i]
        if not swapped:
            # exiting the function if we didn't make a single swap
            # meaning that the array is already sorted.
            return

elements = [39, 12, 18, 85, 72, 10, 2, 18]

print("Unsorted list is,")
print(elements)
bubblesort(elements)
print("Sorted Array is, ")
print(elements)'''

vecs = model.encode([
    'def echo(): print("hello world")',
    quick_sort,
    bubble_sort
])


print('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))
print('cos sim (0, 2):', 1 - spatial.distance.cosine(vecs[0], vecs[2]))
print('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))
```

output:

```
cos sim (0, 1): 0.34329649806022644
cos sim (0, 2): 0.3627094626426697
cos sim (1, 2): 0.6972219347953796
```
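
Both examples above score pairs with `1 - spatial.distance.cosine(a, b)`, which is the cosine similarity of the two embeddings. As a rough sketch of how that score can be used to rank candidate snippets against a query, the block below uses small hand-written vectors in place of real model output; `cosine_similarity`, `rank_by_similarity`, and the toy vectors are illustrative assumptions, not part of either library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (||a|| * ||b||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return (candidate index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for model.encode(...) output.
toy_query = [1.0, 0.0, 1.0]
toy_candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query, different length
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

ranking = rank_by_similarity(toy_query, toy_candidates)
for index, score in ranking:
    print(f"candidate {index}: cos sim = {score:.4f}")
```

With real embeddings, `toy_query` and `toy_candidates` would be replaced by rows of the array returned by `model.encode`. Cosine similarity ignores vector length, which is why the second candidate scores highest despite being twice as long as the query.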

## Citation

```bibtex
@article{li2023angle,
  title={AnglE-optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}
```