SeanLee97 committed on
Commit 1e476b6 · verified · 1 parent: 49276d1

Create README.md

Files changed (1)
  1. README.md +86 -37
README.md CHANGED
@@ -1,57 +1,106 @@
  ---
  license: mit
- tags:
- - generated_from_trainer
- base_model: WhereIsAI/UAE-Large-V1
- model-index:
- - name: GIS-Large-V1
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # GIS-Large-V1

- This model is a fine-tuned version of [WhereIsAI/UAE-Large-V1](https://huggingface.co/WhereIsAI/UAE-Large-V1) on the None dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 5e-06
- - train_batch_size: 4
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 3
- - total_train_batch_size: 12
- - total_eval_batch_size: 24
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 200
- - num_epochs: 1

- ### Training results

- ### Framework versions

- - Transformers 4.39.3
- - Pytorch 2.0.1
- - Datasets 2.16.0
- - Tokenizers 0.15.0
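The removed card's batch sizes are related by simple arithmetic: each total is the per-device size times the 3 devices (4 × 3 = 12 for training, 8 × 3 = 24 for evaluation), assuming no gradient accumulation, which the card does not list. The `linear` scheduler with 200 warmup steps can be sketched as below; it follows the shape of `get_linear_schedule_with_warmup` in Hugging Face Transformers (linear ramp up, then linear decay to zero), but `TOTAL_STEPS = 1000` is a hypothetical value because the card does not give the dataset size or step count.

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Ramp linearly from 0 to base_lr over warmup_steps, then decay linearly to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Values taken from the removed model card
BASE_LR = 5e-06
WARMUP_STEPS = 200
# Hypothetical: the card does not state the number of training steps
TOTAL_STEPS = 1000

print(effective_batch_size(4, 3))  # 12
print(effective_batch_size(8, 3))  # 24
for step in (0, 100, 200, 600, 1000):
    print(step, linear_warmup_lr(step, BASE_LR, WARMUP_STEPS, TOTAL_STEPS))
```

The learning rate peaks at exactly `BASE_LR` at step 200 and reaches zero at the final step, whatever the real total turns out to be.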

---
license: mit
datasets:
- WhereIsAI/github-issue-similarity
language:
- en
---

# SeanLee97/GIS-Large-V1

This model is trained on the [github-issue-similarity](https://huggingface.co/datasets/WhereIsAI/github-issue-similarity) dataset using [AnglE](https://github.com/SeanLee97/AnglE) and is used for measuring code similarity.

Results:

- Spearman correlation: 71.19
- Accuracy: 84.37
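The README does not say how these two scores were computed; the figures appear to be scaled by 100. A common protocol for pair-similarity benchmarks, assumed here rather than taken from AnglE's evaluation code, is Spearman's rank correlation between predicted cosine similarities and gold labels, plus accuracy after thresholding the similarity. The sketch below implements both in plain Python; the scores, labels, and the 0.5 threshold are made-up illustration values, not data from github-issue-similarity.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def threshold_accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of pairs where (score >= threshold) agrees with the binary label."""
    hits = sum((s >= threshold) == bool(l) for s, l in zip(scores, labels))
    return hits / len(labels)


# Hypothetical predicted cosine similarities and binary gold labels
scores = [0.91, 0.12, 0.78, 0.40, 0.66]
labels = [1, 0, 1, 0, 0]

print(f"Spearman x100: {100 * spearman(scores, labels):.2f}")
print(f"Accuracy x100: {100 * threshold_accuracy(scores, labels):.2f}")
```

Spearman only looks at ranks, so it rewards placing similar pairs above dissimilar ones regardless of the absolute cosine values, while accuracy depends on where the threshold is set.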

## Usage

### 1. Install

```
python -m pip install -U angle-emb
```

### 2. Example

```python
from scipy import spatial
from angle_emb import AnglE

# Load the model from the Hugging Face Hub and move it to a CUDA GPU
model = AnglE.from_pretrained('SeanLee97/UAE-GIS-Large-V1').cuda()

quick_sort = '''# Approach 2: Quicksort using list comprehension

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)

# Example usage
arr = [1, 7, 4, 1, 10, 9, -2]
sorted_arr = quicksort(arr)
print("Sorted Array in Ascending Order:")
print(sorted_arr)'''


bubble_sort = '''def bubblesort(elements):
    # Looping from size of array from last index[-1] to index [0]
    for n in range(len(elements)-1, 0, -1):
        swapped = False
        for i in range(n):
            if elements[i] > elements[i + 1]:
                swapped = True
                # swapping data if the element is less than next element in the array
                elements[i], elements[i + 1] = elements[i + 1], elements[i]
        if not swapped:
            # exiting the function if we didn't make a single swap
            # meaning that the array is already sorted.
            return

elements = [39, 12, 18, 85, 72, 10, 2, 18]

print("Unsorted list is,")
print(elements)
bubblesort(elements)
print("Sorted Array is, ")
print(elements)'''

# Encode a trivial snippet and the two sorting implementations in one batch
vecs = model.encode([
    'def echo(): hello world',
    quick_sort,
    bubble_sort
])

# Cosine similarity is 1 minus SciPy's cosine distance
print('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))
print('cos sim (0, 2):', 1 - spatial.distance.cosine(vecs[0], vecs[2]))
print('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))
```

Output:

```
cos sim (0, 1): 0.3169282078742981
cos sim (0, 2): 0.3370905816555023
cos sim (1, 2): 0.6972219347953796
```

The two sorting implementations score much closer to each other (about 0.70) than either does to the unrelated `echo` snippet (about 0.32 and 0.34).
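The example scores each pair with `1 - spatial.distance.cosine(u, v)`, which equals the cosine similarity u·v / (‖u‖‖v‖). A minimal sketch of the same computation with NumPy, extended to a full pairwise matrix so every pair comes out of one call, is shown below. It uses small hypothetical vectors in place of real `model.encode(...)` output so it runs without downloading the model, and it assumes the embeddings form a 2-D array with one row per input.

```python
import numpy as np
from scipy import spatial


def cosine_similarity_matrix(vecs: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities for the rows of a 2-D array."""
    vecs = np.asarray(vecs, dtype=np.float64)
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    unit = vecs / norms  # L2-normalise each row
    return unit @ unit.T


# Hypothetical stand-ins for model.encode(...) output: 3 vectors of dimension 4
vecs = np.array([
    [1.0, 0.0, 0.0, 1.0],
    [0.5, 1.0, 0.0, 0.5],
    [0.4, 1.0, 0.2, 0.4],
])

sims = cosine_similarity_matrix(vecs)
print(np.round(sims, 4))

# Each off-diagonal entry matches the per-pair SciPy call used in the README
for i, j in [(0, 1), (0, 2), (1, 2)]:
    assert np.isclose(sims[i, j], 1 - spatial.distance.cosine(vecs[i], vecs[j]))

# Rank the other inputs by similarity to input 1, most similar first
order = [int(k) for k in np.argsort(-sims[1]) if k != 1]
print("neighbours of 1:", order)
```

Normalising each row once and taking a single matrix product avoids recomputing norms for every pair, which matters when comparing many snippets at once.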

## Citation

```bibtex
@article{li2023angle,
  title={AnglE-optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}
```