---
license: mit
datasets:
- WhereIsAI/github-issue-similarity
language:
- en
---

# SeanLee97/UAE-GIS-Large-V1


This model was trained on the [github-issue-similarity](https://huggingface.co/datasets/WhereIsAI/github-issue-similarity) dataset using [AnglE](https://github.com/SeanLee97/AnglE) and is intended for measuring code similarity.

Results:

- Spearman correlation: 71.19
- Accuracy: 84.37
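The card does not say how these two scores were computed. As a rough illustration only, the sketch below shows how pair-level Spearman correlation and accuracy are commonly derived from cosine similarities. It assumes binary gold labels (1 = similar, 0 = not similar) and a hypothetical fixed decision threshold of 0.5. The helper names `average_ranks`, `spearman` and `accuracy` are illustrative and not part of AnglE, and the scores and labels are toy values.

```python
import numpy as np

def average_ranks(x):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # extend j over a run of equal values
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        # sorted positions i..j all receive the average 1-based rank
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(scores, labels):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(scores), average_ranks(labels))[0, 1])

def accuracy(scores, labels, threshold=0.5):
    """Share of pairs where (score >= threshold) agrees with the 0/1 label.

    The 0.5 threshold is a hypothetical choice for illustration.
    """
    preds = np.asarray(scores, dtype=float) >= threshold
    return float(np.mean(preds == np.asarray(labels, dtype=bool)))

# Toy cosine similarities for four code pairs and hypothetical gold labels
scores = [0.91, 0.12, 0.75, 0.30]
labels = [1, 0, 1, 0]
print(f"Spearman: {spearman(scores, labels) * 100:.2f}")  # -> Spearman: 89.44
print(f"Accuracy: {accuracy(scores, labels) * 100:.2f}")  # -> Accuracy: 100.00
```

Spearman correlation ignores any threshold. It only checks whether higher similarity scores line up with positive labels. Accuracy depends on where the threshold is placed, which is why it is often tuned on held-out data.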


## Usage

### 1. Install

```bash
python -m pip install -U angle-emb
```

### 2. Example

```python
from scipy import spatial
from angle_emb import AnglE

# .cuda() moves the model to a CUDA GPU (requires a CUDA-capable device)
model = AnglE.from_pretrained('SeanLee97/UAE-GIS-Large-V1').cuda()

quick_sort = '''# Approach 2: Quicksort using list comprehension

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)
 
# Example usage
arr = [1, 7, 4, 1, 10, 9, -2]
sorted_arr = quicksort(arr)
print("Sorted Array in Ascending Order:")
print(sorted_arr)'''


bubble_sort = '''def bubblesort(elements):
    # Looping from size of array from last index[-1] to index [0]
    for n in range(len(elements)-1, 0, -1):
        swapped = False
        for i in range(n):
            if elements[i] > elements[i + 1]:
                swapped = True
                # swapping data if the element is less than next element in the array
                elements[i], elements[i + 1] = elements[i + 1], elements[i]
        if not swapped:
            # exiting the function if we didn't make a single swap
            # meaning that the array is already sorted.
            return

elements = [39, 12, 18, 85, 72, 10, 2, 18]

print("Unsorted list is,")
print(elements)
bubblesort(elements)
print("Sorted Array is, ")
print(elements)'''

vecs = model.encode([
    'def echo(): hello world',
    quick_sort,
    bubble_sort
])


print('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))
print('cos sim (0, 2):', 1 - spatial.distance.cosine(vecs[0], vecs[2]))
print('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))
```

Output:

```
cos sim (0, 1): 0.3169282078742981
cos sim (0, 2): 0.3370905816555023
cos sim (1, 2): 0.6972219347953796
```
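The example above scores each pair with a separate `spatial.distance.cosine` call. When comparing many snippets, a common alternative is to L2-normalise the embeddings once and take a single matrix product. This sketch assumes `model.encode` returns a 2-D NumPy array with one row per input, which is consistent with the `vecs[0]` indexing above. The helpers `cosine_matrix` and `most_similar` are hypothetical names, and the toy 4-dimensional vectors stand in for real embeddings so the code runs without downloading the model.

```python
import numpy as np

def cosine_matrix(vecs):
    """Return the (n, n) matrix of pairwise cosine similarities between rows."""
    vecs = np.asarray(vecs, dtype=float)
    # L2-normalise each row so that a dot product equals the cosine similarity
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    return unit @ unit.T

def most_similar(query_vec, candidate_vecs):
    """Return (index, score) of the candidate row closest to query_vec."""
    q = np.asarray(query_vec, dtype=float)
    c = np.asarray(candidate_vecs, dtype=float)
    scores = c @ q / (np.linalg.norm(c, axis=1) * np.linalg.norm(q))
    best = int(np.argmax(scores))
    return best, float(scores[best])

# Toy stand-ins for model.encode([...]) output: 3 snippets, 4-dim embeddings
vecs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
])
sims = cosine_matrix(vecs)
print(np.round(sims, 4))

# Query with row 1 against rows 0 and 2; the best match is candidate 1 (row 2)
print(most_similar(vecs[1], vecs[[0, 2]]))
```

With real embeddings, replace the toy `vecs` with the output of `model.encode([...])`. Entry `sims[i, j]` then corresponds to the `cos sim (i, j)` values printed above.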

## Citation

```bibtex
@article{li2023angle,
  title={AnglE-optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}
```