SentenceTransformer based on microsoft/graphcodebert-base

This is a sentence-transformers model fine-tuned from microsoft/graphcodebert-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/graphcodebert-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
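
The Pooling module above has pooling_mode_mean_tokens set to True, so each embedding is the average of the token vectors produced by the Transformer module. The following is a minimal pure-Python sketch of that idea, assuming padding positions are excluded through an attention mask. The function name mean_pool and the toy vectors are illustrative only and are not part of the library.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    count = max(count, 1)
    return [total / count for total in summed]


# Toy input: three 2-dimensional token vectors; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

In the real model the vectors have 768 dimensions and the pooling is applied per input in a batch.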

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("buelfhood/SOCO-Java-GraphCodeBERT-ST")
# Run inference
sentences = [
    'import java.io.*;\nimport java.net.*;\nimport java.net.HttpURLConnection;\nimport javax.net.*;\nimport java.security.cert.*;\n\npublic class Dictionary\n{\n\tpublic static void main(String[] args)\n\t{\n\t\tBufferedReader in = null;\n\t\tboolean found = true;\n\t\tString word = null;\n\t\tString cmd = null;\n\t\tRuntime run = Runtime.getRuntime();\n\t\tProcess pro = null;\n\t\tBufferedReader inLine = null;\n\n\n\n\t\tString str = null;\n\t\tURLConnection connection = null;\n\n\t\ttry\n\t\t{\n\t\t\tFileReader reader = new FileReader("words");\n\t\t\tin = new BufferedReader(reader);\n\t\t\tSystem.out.println(" cracking....");\n\t\t\t\n\t\t\t{\n\t\t\t\tfound = true;\n\t\t\t\tword = new String(in.readLine());\n\n\t\t\t\tcmd = "wget --http-user= --http-passwd="+word +" http://sec-crack.cs.rmit.edu./SEC/2/index.php";\n\n\t\t\t\tpro = run.exec(cmd);\n\t\t\t\tinLine = new BufferedReader(new InputStreamReader(pro.getErrorStream()));\n\n\n\t\t\t\tif((str=inLine.readLine())!=null)\n\t\t\t\t{\n\n\t\t\t\t\twhile ((str=inLine.readLine())!=null)\n\t\t\t\t\t{\n\t\t\t\t\t\tif (str.endsWith("Required"))\n\t\t\t\t\t\t{\n\n\t\t\t\t\t\t\tfound = false;\n\t\t\t\t\t\t}\n\n\t\t\t\t\t}\n\t\t\t\t}\n\n\n\n\n\n\n\t\t\t\trun.gc();\n\t\t\t}\n\t\t\twhile (!found);\n\n\n\n\n\n\t\t}\n\t\tcatch (FileNotFoundException exc)\n\t\t{\n\t\t\tSystem.out.println(exc);\n\t\t}\n\t\tcatch (IOException exc)\n\t\t{\n\t\t\tSystem.out.println(exc);\n\t\t}\n        catch (NullPointerException ex)\n        {\n            System.out.println(word);\n        }\n\t\tfinally\n\t\t{\n\t\t\ttry\n\t\t\t{\n\t\t\t\tif (in!= null)\n\t\t\t\t{\n\t\t\t\t\tin.print();\n\t\t\t\t}\n\t\t\t}\n\t\t\tcatch (IOException e) {}\n\t\t}\n\t\tif (found == true)\n\t\t\tSystem.out.println("The password is :" + word);\n        else\n            System.out.println("NOT FOUND!");\n\t}\n}',
    '\nimport java.net.*;\nimport java.io.*;\nimport java.misc.*;\nimport java.io.BufferedInputStream;\nimport java.awt.*;\nimport java.awt.event.*;\n\npublic class WriteFile\n{\n   String url;\n   String fileName;\n   int flag;\n   private PrintWriter out2;\n   private TextArea response;\n   int status;\n   int mailFlag;\n\n   public WriteFile (String newUrl, String newFileName, int newFlag)\n   {\n       url = newUrl;\n       fileName = newFileName;\n       PrintWriter printW = null;\n       FileOutputStream fout;\n       flag = newFlag;\n       status = 0;\n       mailFlag = 0;\n\n       \n       File file = new File(fileName);\n       file.delete();\n\n       try\n       {\n          fout = new FileOutputStream(fileName,true);\n          printW = new PrintWriter(fout);\n       }\n       catch (IOException ioe)\n       {\n          System.out.println("IO Error : " + ioe);\n       }\n\n\n       URL u;\n       URLConnection uc;\n\n       try\n       {\n          u = new URL(url);\n          try\n          {\n             \n             uc = u.openConnection();\n\n             InputStream content = (InputStream)uc.getInputStream();\n             BufferedReader in = new BufferedReader (new InputStreamReader(content));\n\n             String line;\n\n             \n             while ((line = in.readLine()) != null)\n             {\n                \n                printW.println(line);\n\n             }\n          }\n          catch (Exception e)\n          {\n             System.out.println("Error: " + e);\n          }\n       }\n       catch (MalformedURLException e)\n       {\n          System.out.println(url + " is not a parseable URL");\n       }\n       \n       printW.print();\n\n\n       if(flag == 1)\n       {\n          \n           compareDiff("@.rmit.edu.");\n       }\n   }\n\n  String loadStream(InputStream in) throws IOException\n  {\n        int ptr = 0;\n        in = new BufferedInputStream(in);\n        StringBuffer buffer = new StringBuffer();\n\n 
       while( (ptr = in.next()) != -1 )\n        {\n            status++;\n            \n            buffer.append((char)ptr);\n            mailFlag++;\n            \n        }\n        return buffer.toString();\n   }\n\n    public void compareDiff(String emailAdd)\n    {\n       String cmds = "diff test1.txt test2.txt";\n       PrintWriter printW2 = null;\n       FileOutputStream fout2;\n       \n       File file = new File("diff.txt");\n       file.delete();\n       String ;\n\n       try\n       {\n          fout2 = new FileOutputStream("diff.txt",true);\n          printW2 = new PrintWriter(fout2);\n       }\n       catch (IOException ioe)\n       {\n          System.out.println("IO Error : " + ioe);\n       }\n\n       try\n       {\n\n\n          \n          Process ps = Runtime.getRuntime().exec(cmds);\n          PrintWriter out = new PrintWriter(new OutputStreamWriter(ps.getOutputStream()));\n\n          printW2.println(loadStream(ps.getInputStream())+"\\n");\n          printW2.print();\n\n\n          if(mailFlag != 0)\n          {\n             FileReader fRead2;\n             BufferedReader buf2;\n\n             try\n             {\n                fRead2 = new FileReader("diff.txt");\n                buf2 = new BufferedReader(fRead2);\n                String line2;\n                int i=0;\n\n                line = new String("  some changes  the web  as followed: \\n");\n                \n                Socket s = new Socket("wombat.cs.rmit.edu.", 25);\n                out2 = new PrintWriter(s.getOutputStream());\n\n                send(null);\n                send("HELO cs.rmit.edu.");\n                send("MAIL FROM: @.rmit.edu.");\n                \n                send("RCPT : @.rmit.edu.");\n                send("DATA");\n                \n\n                while( (line2 = buf2.readLine()) != null)\n                {\n                   \n line= new String(""+line2+"\\n");\n                   \n                   \n\n                }\n           
     \n                \n                \n                out2.print();\n                send(".");\n                s.print();\n             }\n             catch(FileNotFoundException e)\n             {\n                System.out.println("File not found");\n             }\n             catch(IOException ioe)\n             {\n                System.out.println("IO Error " + ioe);\n             }\n          }\n\n          System.out.println(loadStream(ps.getInputStream()));\n          \n          System.err.print(loadStream(ps.getErrorStream()));\n        }\n        catch(IOException ioe)\n        {\n            ioe.printStackTrace();\n        }\n    }\n\n    public void send(String s) throws IOException\n    {\n    \tresponse = new TextArea();\n      \tif(s != null)\n      \t{\n            response.append(s + "\\n");\n            out2.println(s);\n\t    out2.flush();\n\t}\n    }\n\n   public int getStatus()\n   {\n      return status;\n   }\n}',
    'import java.io.*;\nimport java.net.*;\nimport java.text.*;\nimport java.util.*;\n\nclass Dictionary {\n\n    private String password="";\n\n    private int num=401;\n\n\n    public static void main(String[] args) {\n\n\n      Dictionary URLcon;\n\n      int length = 0;\n\n      String passwd="";\n\n       int t0,t1;\n\n      String line ="";\n      \n      if (args.length == 0) {\n      \t\n      System.err.println (\n      \t\t\n      \t\t"Usage : java BruteForce <username>");\n      return;\n      \t\n      }\n      \n      String username = args[0];\n      \n      \n      t0=System.currentTimeMillis();\n      \n      System.out.println ("  " + new Date());\n      System.out.println ("Using Dictionary method  attack "+username+"\'s password.  Please waiting.......");\n\n      try{ BufferedReader in = new BufferedReader(new FileReader("/usr/share/lib/dict/words"));\n\n           while ((passwd=in.readLine())!=null) {\n\n           \t URLcon = new Dictionary (passwd,username);\n\n             if ((URLcon.num)!=401) {\n\n             \tt1=System.currentTimeMillis();\n\n                System.out.println("The password: "+ passwd);\n\n             \tdouble dt =t1-t0;\n\n             \tSystem.out.println("It took "+DecimalFormat.getInstance().format(dt/1000)+ " seconds");\n                \n                System.out.println ("Finish  " + new Date());\n                \n             \treturn;\n\n             }\n\n\n           \t}\n\n      }catch (FileNotFoundException e){\n      \tSystem.out.println(e);\n      }catch (IOException e){\n      \tSystem.out.println(e);\n      }\n\n\n       System.out.println(" not find the password");\n\n\n}\n\n   public  Dictionary  (String password,String username) {\n\n  \t  String urlString =  "http://sec-crack.cs.rmit.edu./SEC/2/" ;\n\n      \n      try {\n\n        String userPassword = username+":"+password ;\n\n        String encoding = new userPassword.misc.BASE64Encoder().encode (userPassword.getBytes());\n\n        URL url 
= new URL (urlString);\n\n        HttpURLConnection uc = (HttpURLConnection) url.openConnection();\n\n        uc.setRequestProperty ("Authorization", " " + encoding);\n\n         url = uc.getResponseCode();\n\n\n       }\n        catch(MalformedURLException e){\n       \t  System.out.println(e);\n       }catch(IOException e){\n          System.out.println(e);\n       }\n\n\n   }\n}',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
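
The model card lists cosine similarity as the similarity function, so the 3 x 3 matrix above holds pairwise cosine scores. As a rough illustration of what such a matrix contains, here is a pure-Python sketch on toy 2-dimensional vectors. The helper names and the toy data are assumptions made for this example; this is not the library's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings):
    """Pairwise cosine similarity between every pair of rows."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


# Toy stand-ins for the 768-dimensional embeddings.
toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
matrix = similarity_matrix(toy)
for row in matrix:
    print([round(score, 4) for score in row])
```

The diagonal is 1 because every vector is perfectly similar to itself, and the matrix is symmetric.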

Training Details

Training Dataset

Unnamed Dataset

  • Size: 33,411 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    column      type    details
    sentence_0  string  min: 61 tokens, mean: 478.83 tokens, max: 512 tokens
    sentence_1  string  min: 61 tokens, mean: 490.04 tokens, max: 512 tokens
    label       int     0: ~99.80%, 1: ~0.20%
  • Samples:
    sentence_0 sentence_1 label


    import java.net.*;
    import java.io.*;

    public class sendMail {

    public void sendMail(String mailServer, String recipient, String result) {
    try {
    Socket s = new Socket(mailServer, 25);
    BufferedReader in = new BufferedReader
    (new InputStreamReader(s.getInputStream(), "8859_1"));
    BufferedWriter out = new BufferedWriter
    (new OutputStreamWriter(s.getOutputStream(), "8859_1"));

    send(in, out, "HELO client");

    send(in, out, "MAIL FROM: ");
    send(in, out, "RCPT : " + recipient);
    send(in, out, "DATA");
    send(out, "Subject: ");
    send(out, "From: Admin ");
    send (out, "\n");

    send(out, result);
    send(out, "\n.\n");
    send(in, out, "QUIT");

    }
    catch (Exception e) {
    e.printStackTrace();
    }
    }

    public void send(BufferedReader in, BufferedWriter out, String s) {
    try {
    out.write(s + "\n");
    out.flush();
    Sys...



    import java.io.*;
    import java.util.*;
    import java.*;
    import java.net.*;

    public class WatchDog
    {

    static Process p = null;
    static Process qproc = null;

    static BufferedReader bf = null;
    static StringTokenizer tok = null;

    static String Path = null;
    static String str = null;
    static String urlStr=null;
    static boolean changed = false;

    static File indexfile = new File("index.html");
    static File tmpfile = new File("tmpindex.html");
    static File mdfile = new File("md5file.txt");
    static File tmpmdfile = new File("tmpmd5file.txt");
    static PrintWriter mailwriter = null;


    public static void main(String[] args)
    {

    urlStr = "http://www.cs.rmit.edu./";

    try
    {

    mailwriter = new PrintWriter(new BufferedWriter(new FileWriter("tomail.txt", false)));

    getLatest(urlStr);
    parseFile();

    mailwriter.read();

    if(changed)
    {
    System.out.println("Sending Mail");
    ...
    0

    import java.io.*;
    import java.net.*;

    public class BruteForce
    {
    private String myUsername = "";
    private String urlToCrack = "http://sec-crack.cs.rmit.edu./SEC/2";
    private int NUM_CHARS = 52;


    public static void main(String args[])
    {
    BruteForce bf = new BruteForce();
    }


    public BruteForce()
    {
    generatePassword();
    }




    public void generatePassword()
    {
    int index1 = 0, index2, index3;

    char passwordChars[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I',
    'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R',
    'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
    'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r',
    's', 't', 'u', 'v', 'w', 'x', 'y', 'z' };


    while(index1 < NUM_CHARS)
    {
    index2 = 0;

    while(index2 < NUM_CHARS)
    {
    ...




    public class HoldSharedData
    {
    private int numOfConnections = 0;
    private int startTime;
    private int totalTime = 0;
    private String[] password;
    private int pwdCount;

    public HoldSharedData( int time, String[] pwd, int count )
    {
    startTime = time;

    password = pwd;
    pwdCount = count;
    }

    public int getPwdCount()
    {
    return pwdCount;
    }

    public void setNumOfConnections( )
    {
    numOfConnections ++;
    }

    public int getNumOfConnections()
    {
    return numOfConnections;
    }

    public int getStartTime()
    {
    return startTime;
    }

    public void setTotalTime( int newTotalTime )
    {
    totalTime = newTotalTime;
    }

    public int getTotalTime()
    {
    return totalTime;
    }

    public String getPasswordAt( int index )
    {
    return password[index];
    }
    }
    0


    import java.net.*;
    import java.io.*;

    public class sendMail {

    public void sendMail(String mailServer, String recipient, String result) {
    try {
    Socket s = new Socket(mailServer, 25);
    BufferedReader in = new BufferedReader
    (new InputStreamReader(s.getInputStream(), "8859_1"));
    BufferedWriter out = new BufferedWriter
    (new OutputStreamWriter(s.getOutputStream(), "8859_1"));

    send(in, out, "HELO client");

    send(in, out, "MAIL FROM: ");
    send(in, out, "RCPT : " + recipient);
    send(in, out, "DATA");
    send(out, "Subject: ");
    send(out, "From: Admin ");
    send (out, "\n");

    send(out, result);
    send(out, "\n.\n");
    send(in, out, "QUIT");

    }
    catch (Exception e) {
    e.printStackTrace();
    }
    }

    public void send(BufferedReader in, BufferedWriter out, String s) {
    try {
    out.write(s + "\n");
    out.flush();
    Sys...


    import java.net.*;
    import java.io.*;

    public class Base64Encoder
    {
    private final static char base64Array [] = {
    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
    'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P',
    'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
    'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f',
    'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
    'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
    'w', 'x', 'y', 'z', '0', '1', '2', '3',
    '4', '5', '6', '7', '8', '9', '+', '/'
    };

    public static String encode (String string)
    {
    String encodedString = "";
    byte bytes [] = string.getBytes ();
    int i = 0;
    int pad = 0;
    while (i < bytes.length)
    {
    byte b1 = bytes [i++];
    byte b2;
    byte b3;
    if (i >= bytes.length)
    {
    b2 = 0;
    b3 = 0;
    pad = 2;
    }
    else
    {
    b2 = bytes [i++];
    if (i >= bytes.length)
    ...
    0
  • Loss: BatchAllTripletLoss
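
To give an intuition for the loss named above, here is a small pure-Python sketch of the batch-all triplet idea from Hermans et al. (cited below): every valid (anchor, positive, negative) combination in a batch contributes a hinge term. This sketch assumes Euclidean distance, treats the label column as a class id, and averages over triplets with non-zero loss. The margin values and toy points are hypothetical and are not taken from the training run.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_all_triplet_loss(embeddings, labels, margin=1.0):
    """Average hinge loss over all valid triplets that violate the margin."""
    losses = []
    n = len(labels)
    for a in range(n):
        for p in range(n):
            # A positive shares the anchor's label but is a different sample.
            if p == a or labels[p] != labels[a]:
                continue
            for neg in range(n):
                # A negative has a different label from the anchor.
                if labels[neg] == labels[a]:
                    continue
                loss = (
                    euclidean(embeddings[a], embeddings[p])
                    - euclidean(embeddings[a], embeddings[neg])
                    + margin
                )
                if loss > 0:
                    losses.append(loss)
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: two samples of class 0 and one sample of class 1.
toy_embeddings = [[0.0], [1.0], [3.0]]
toy_labels = [0, 0, 1]
print(batch_all_triplet_loss(toy_embeddings, toy_labels, margin=1.0))
print(batch_all_triplet_loss(toy_embeddings, toy_labels, margin=2.0))
```

With the smaller margin the classes are already separated well enough, so no triplet contributes; raising the margin makes one triplet active.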

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
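
The list above combines learning_rate 5e-05, lr_scheduler_type linear, and warmup_steps 0. A linear schedule of this kind is commonly understood to decay the learning rate from its base value to zero over training. The sketch below illustrates that shape in pure Python. The function linear_lr is hypothetical, and total_steps = 2089 is an assumption derived from 33,411 samples at batch size 16 on a single device for one epoch.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumed total number of optimizer steps for one epoch (see lead-in).
total_steps = 2089
for step in (0, 500, 1000, 1500, 2000, 2089):
    print(step, linear_lr(step, total_steps))
```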

Training Logs

Epoch   Step  Training Loss
0.2393  500   0.1754
0.4787  1000  0.1994
0.7180  1500  0.2090
0.9574  2000  0.1941
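
The Epoch column can be related to the Step column with a short calculation. This sketch assumes a single training device, uses the reported batch size of 16 and gradient_accumulation_steps of 1, and keeps the last partial batch (dataloader_drop_last is False). The variable names are illustrative.

```python
import math

num_samples = 33411   # training set size reported above
batch_size = 16       # per_device_train_batch_size, assuming one device

# The last partial batch is kept, so round up.
steps_per_epoch = math.ceil(num_samples / batch_size)
print("steps per epoch:", steps_per_epoch)

# Fraction of the epoch completed at each logged step.
epoch_fractions = {
    step: round(step / steps_per_epoch, 4) for step in (500, 1000, 1500, 2000)
}
for step, fraction in epoch_fractions.items():
    print(step, fraction)
```

Under these assumptions the computed fractions agree with the Epoch values in the log.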

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

BatchAllTripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
Model size: 125M params (Safetensors, tensor type F32)