SentenceTransformer based on Salesforce/codet5-small
This is a sentence-transformers model finetuned from Salesforce/codet5-small. It maps sentences & paragraphs to a 512-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Salesforce/codet5-small
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 512 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: T5EncoderModel
(1): Pooling({'word_embedding_dimension': 512, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
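The Pooling module in this architecture uses mean pooling (pooling_mode_mean_tokens is True). A sentence embedding is therefore the average of the encoder's 512-dimensional token embeddings, with padding positions excluded through the attention mask. The following is a minimal plain-Python sketch of that step. The mean_pool helper and the toy 3-dimensional vectors are hypothetical, and the library itself works on PyTorch tensors rather than lists.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1; padding is ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
            count += 1
    # Guard against division by zero when every position is padding.
    count = max(count, 1)
    return [s / count for s in summed]


# Two real tokens followed by one padding position (mask 0), which must not affect the mean.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [100.0, 100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0, 4.0]
```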
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("buelfhood/SOCO-Java-CodeT5Small-ST")
# Run inference
sentences = [
'\n\n\n\n\n\nimport java.io.*;\nimport java.net.*;\n\n\n\npublic class Dictionary\n{\n public static void main (String args[]) throws IOException,\n MalformedURLException\n {\n final String username = "";\n final String fullurl = "http://sec-crack.cs.rmit.edu./SEC/2/";\n final String dictfile = "/usr/share/lib/dict/words";\n String temppass;\n String password = "";\n URL url = new URL(fullurl);\n boolean cracked = false;\n\n startTime = System.currentTimeMillis();\n\n \n BufferedReader r = new BufferedReader(new FileReader(dictfile));\n\n while((temppass = r.readLine()) != null && !cracked)\n { \n \n if(temppass.length() <= 3)\n {\n \n if(isAlpha(temppass))\n {\n \n Authenticator.setDefault(new MyAuthenticator(username,temppass));\n try{\n BufferedReader x = new BufferedReader(new InputStreamReader(\n url.openStream()));\n cracked = true;\n password = temppass;\n } catch(Exception e){}\n }\n }\n }\n\n stopTime = System.currentTimeMillis();\n \n if(!cracked)\n System.out.println("Sorry, couldnt find the password");\n else\n System.out.println("Password found: "+password);\n System.out.println("Time taken: "+(stopTime-startTime));\n }\n\n public static boolean isAlpha(String s)\n {\n boolean v = true;\n for(int i=0; i<s.length(); i++)\n {\n if(!Character.isLetter(s.charAt(i)))\n v = false;\n }\n return ;\n }\n}\n\n',
'\n\nimport java.net.*;\nimport java.text.*; \nimport java.util.*; \nimport java.io.*;\n\npublic class WatchDog {\n\n public WatchDog() {\n\n StringBuffer stringBuffer1 = new StringBuffer();\n StringBuffer stringBuffer2 = new StringBuffer();\n int i,j = 0;\n\n try{\n\n URL yahoo = new URL("http://www.cs.rmit.edu./students/"); \n BufferedReader in = new BufferedReader(new InputStreamReader(yahoo.openStream()));\n\n String inputLine = "";\n String inputLine1 = "";\n String changedtext= "";\n String changedflag= "";\n\n\n Thread.sleep(180);\n\n BufferedReader in1 = new BufferedReader(new InputStreamReader(yahoo.openStream()));\n\n\n while ((inputLine = in.readLine()) != null) {\n inputLine1 = in1.readLine();\n if (inputLine.equals(inputLine1)) {\n System.out.println("equal");\n }\n else {\n System.out.println("Detected a Change");\n System.out.println("Line Before the change:" + inputLine);\n System.out.println("Line After the change:" + inputLine1);\n changedtext = changedtext + inputLine + inputLine1;\n changedflag = "Y";\n }\n \n }\n\n if (in1.readLine() != null ) {\n System.out.println("Detected a Change");\n System.out.println("New Lines Added ");\n changedtext = changedtext + "New Lines added";\n changedflag = "Y";\n }\n\n in.print();\n in1.print();\n\n if (changedflag.equals("Y")) {\n String smtphost ="smtp.mail.rmit.edu." ; \n String from = "@rmit.edu."; \n String = "janaka1@optusnet.." ; \n }\n\n\n }\n catch(Exception e){ System.out.println("exception:" + e);}\n\t \n}\n\t\t\n public static void main (String[] args) throws Exception {\n\t\tWatchDog u = new WatchDog();\n }\n}\n',
'\n\n\n\nimport java.util.*;\nimport java.net.*;\nimport java.io.*;\nimport javax.swing.*;\n\npublic class PasswordCombination\n{\n private int pwdCounter = 0;\n private int startTime;\n private String str1,str2,str3;\n private String url = "http://sec-crack.cs.rmit.edu./SEC/2/";\n private String loginPwd;\n private String[] password;\n private HoldSharedData data;\n private char[] chars = {\'A\',\'B\',\'C\',\'D\',\'E\',\'F\',\'G\',\'H\',\'I\',\'J\',\'K\',\'L\',\'M\',\n \'N\',\'O\',\'P\',\'Q\',\'R\',\'S\',\'T\',\'U\',\'V\',\'W\',\'X\',\'Y\',\'Z\',\n \'a\',\'b\',\'c\',\'d\',\'e\',\'f\',\'g\',\'h\',\'i\',\'j\',\'k\',\'l\',\'m\',\n \'n\',\'o\',\'p\',\'q\',\'r\',\'s\',\'t\',\'u\',\'v\',\'w\',\'x\',\'y\',\'z\'};\n\n public PasswordCombination()\n {\n System.out.println("Programmed by for INTE1070 Assignment 2");\n\n String input = JOptionPane.showInputDialog( "Enter number of threads" );\n if( input == null )\n System.exit(0);\n\n int numOfConnections = Integer.parseInt( input );\n startTime = System.currentTimeMillis();\n int pwdCounter = 52*52*52 + 52*52 + 52;\n password = new String[pwdCounter];\n\n\n loadPasswords();\n System.out.println( "Total Number of Passwords: " + pwdCounter );\n createConnectionThread( numOfConnections );\n }\n\n private void doPwdCombination()\n {\n for( int i = 0; i < 52; i ++ )\n {\n str1 = "" + chars[i];\n password[pwdCounter++] = "" + chars[i];\n System.err.print( str1 + " | " );\n\n for( int j = 0; j < 52; j ++ )\n {\n str2 = str1 + chars[j];\n password[pwdCounter++] = str1 + chars[j];\n\n for( int k = 0; k < 52; k ++ )\n {\n str3 = str2 + chars[k];\n password[pwdCounter++] = str2 + chars[k];\n }\n }\n }\n }\n\n private void loadPasswords( )\n {\n FileReader fRead;\n BufferedReader buf;\n String line = null;\n String fileName = "words";\n\n try\n {\n fRead = new FileReader( fileName );\n buf = new BufferedReader(fRead);\n\n while((line = buf.readLine( )) != null)\n {\n password[pwdCounter++] = line;\n }\n }\n catch(FileNotFoundException e)\n {\n System.err.println("File not found: " + fileName);\n }\n catch(IOException ioe)\n {\n System.err.println("IO Error " + ioe);\n }\n }\n\n private void createConnectionThread( int input )\n {\n data = new HoldSharedData( startTime, password, pwdCounter );\n\n int numOfThreads = input;\n int batch = pwdCounter/numOfThreads + 1;\n numOfThreads = pwdCounter/batch + 1;\n System.out.println("Number of Connection Threads Used=" + numOfThreads);\n ConnectionThread[] connThread = new ConnectionThread[numOfThreads];\n\n for( int index = 0; index < numOfThreads; index ++ )\n {\n connThread[index] = new ConnectionThread( url, index, batch, data );\n connThread[index].conn();\n }\n }\n} ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 512]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
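The model's similarity function is cosine similarity. Each entry of the [3, 3] result should therefore be the cosine of the angle between two embeddings, with the diagonal close to 1.0. The following is a minimal plain-Python sketch of that computation. The cosine_matrix helper is hypothetical, and toy 2-dimensional vectors stand in for the real 512-dimensional embeddings.

```python
import math
from typing import List


def cosine_matrix(vectors: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarity: dot(a, b) / (|a| * |b|)."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(x * y for x, y in zip(a, b)) / (na * nb) for b, nb in zip(vectors, norms)]
        for a, na in zip(vectors, norms)
    ]


# Toy stand-ins for three embeddings.
toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
for row in cosine_matrix(toy):
    print([round(score, 4) for score in row])
# [1.0, 0.0, 0.7071]
# [0.0, 1.0, 0.7071]
# [0.7071, 0.7071, 1.0]
```

Cosine similarity ignores vector length. That is why [0.0, 2.0] still scores 1.0 against itself and 0.7071 against [3.0, 3.0].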
Training Details
Training Dataset
Unnamed Dataset
- Size: 33,411 training samples
- Columns: sentence_0, sentence_1, and label
- Approximate statistics based on the first 1000 samples:
Property | sentence_0 | sentence_1 | label |
---|---|---|---|
type | string | string | int |
details | min: 52 tokens, mean: 444.58 tokens, max: 512 tokens | min: 52 tokens, mean: 470.35 tokens, max: 512 tokens | 0: ~99.80%, 1: ~0.20% |
- Samples (sentence_0, sentence_1, label; long snippets are truncated):
Sample 1 (label: 0)
sentence_0:
import java.util.*;
import java.io.*;
public class MyTimer
{
public static void main(String args[])
{
Watchdog watch = new Watchdog();
Timer time = new Timer();
time.schedule(watch,864000000,864000000);
}
}
sentence_1:
import java.io.*;
import java.;
import java.net.*;
import java.util.*;
public class Dictionary {
public static void main (String[] args) throws IOException {
BufferedReader stdin = new BufferedReader (new InputStreamReader(System.in));
d = new Date().getTime();
FileReader fr = new FileReader("/usr/share/lib/dict/words");
BufferedReader bufr = new BufferedReader(fr);
String word = bufr.readLine();
int total = 960;
String[] pws = new String[total];
int count = 0;
while (word!=null){
if (word.length()<=3) { pws[count] = word; count++;}
word = bufr.readLine();
}
int i=0;
int response = 0;
for (i=0;i String uname = "";
String userinfo = uname + ":" + pws[i];
try{
String encoding = new bf.misc.BASE64Encoder().encode (userinfo.getBytes());
URL url = new URL("http://sec-crack.cs.rmit.edu./SEC/2/");
HttpURLConn...
Sample 2 (label: 0)
sentence_0:
import java.io.*;
import java.util.*;
class BruteForce{
public static void main(String args[]){
String pass,s;
char a,b,c;
int z=0;
int attempt=0;
Process p;
char password[]={'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q',
'R','S','T','U','V','W','X','Y','Z','a','b','c','d','e','f','g','h',
'i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'};
z = System.currentTimeMillis();
int at=0;
for(int i=0;i for(int j=0;j for(int k=0;k pass=String.valueOf(password[i])+String.valueOf(password[j])+String.valueOf(password[k]);
try {
System.out.println("Trying crack using: "+pass);
at++;
p = Runtime.getRuntime().exec("wget --http-user= --http-passwd="+pass+" http://sec-crack.cs.rmit.edu./SEC/2/index.php");
try{
p.waitFor();
}
catch(Exception q){}
z = p.exitValue();
...
sentence_1:
import java.io.*;
import java.util.Vector;
import java.util.Date;
interface UnaryPredicate {
boolean execute(Object obj);
}
public class DiffPrint {
static String outFile="";
public static abstract class Base {
protected Base(Object[] a,Object[] b) {
try
{
outfile = new PrintWriter(new FileWriter(outFile));
}
catch (Exception e)
{
e.printStackTrace();
}
file0 = a;
file1 = b;
}
protected UnaryPredicate ignore = null;
protected Object[] file0, file1;
public void print_script(Diff.change script) {
Diff.change next = script;
while (next != null)
{
Diff.change t, end;
t = next;
end = hunkfun(next);
next = end;
end = null;
print_hunk(t);
end = next;
}
outfile.flush();
}
protected Diff.change hunkfun(Diff.change hunk) {
...
Sample 3 (label: 0)
sentence_0:
package java.httputils;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStream;
public class WatchDog
{
protected final int MILLIS_IN_HOUR = (60 * 60 * 1000);
protected int interval = 24;
protected String URL = "http://www.cs.rmit.edu./students/";
protected String fileName = "WatchDogContent.html";
protected String command = "./alert_mail.sh";
protected String savedContent;
protected String retrievedContent;
public WatchDog()
{
super();
}
public void run() throws Exception
{
HttpRequestClient client = null;
System.out.println(getClass().getName() +
"Retrieving baseline copy of: " + getURL());
client = new HttpRequestClie...
sentence_1:
import java.;
import java.io.*;
import java.util.*;
public class Dictionary
{
public String[] passwds;
public int passwdNum;
public static void main(String[] args) throws IOException
{
Dictionary dic=new Dictionary();
dic.doDictionary();
System.exit(1);
}
void doDictionary() throws IOException
{
Runtime rt=Runtime.getRuntime();
passwds=new String[32768];
passwdNum=0;
time1=new Date().getTime();
try
{
File f = new File ("words");
FileReader fin = new FileReader (f);
BufferedReader buf = new BufferedReader(fin);
passwds[0]="00";
System.out.println(" loading words....");
{
passwds[passwdNum]=buf.readLine();
passwdNum++;
}while(passwds[passwdNum-1]!=null);
System.out.println("Finish loading words.");
} catch (FileNotFoundException exc) {
System.out.println ("File Not Found");
} catch (IOException exc) {
System.out.println ("IOException 1");
} catch (NullPointerException exc) {
System.out.println ("NullPointerEx...
- Loss: BatchAllTripletLoss
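BatchAllTripletLoss follows the "batch all" strategy of Hermans et al. (2017). Within a batch, it considers every (anchor, positive, negative) triplet in which the positive shares the anchor's label and the negative does not. Each such triplet contributes max(d(a, p) - d(a, n) + margin, 0), and the loss is the mean over the triplets whose value is positive. The sketch below illustrates that computation with Euclidean distance on toy 2-dimensional embeddings. The helper names and margin values are assumptions for illustration. The sentence-transformers class exposes its own distance function and margin as parameters and works on batched tensors.

```python
import math
from itertools import permutations
from typing import List


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_all_triplet_loss(embeddings: List[List[float]], labels: List[int], margin: float) -> float:
    """Mean hinge loss over all valid triplets whose loss is > 0 ("batch all" strategy)."""
    losses = []
    n = len(embeddings)
    for a, p in permutations(range(n), 2):
        if labels[a] != labels[p]:
            continue  # the positive must share the anchor's label
        for neg in range(n):
            if labels[neg] == labels[a]:
                continue  # the negative must have a different label
            d_ap = euclidean(embeddings[a], embeddings[p])
            d_an = euclidean(embeddings[a], embeddings[neg])
            losses.append(max(d_ap - d_an + margin, 0.0))
    # Average only over triplets that still violate the margin.
    positive = [loss for loss in losses if loss > 1e-16]
    return sum(positive) / len(positive) if positive else 0.0


# Two toy classes: points near x=0 carry label 0, points near x=3 carry label 1.
emb = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
labels = [0, 0, 1, 1]
print(round(batch_all_triplet_loss(emb, labels, margin=5.0), 4))  # 2.9189
print(batch_all_triplet_loss(emb, labels, margin=1.0))  # 0.0
```

With margin 5, no negative is yet far enough from its anchor, so every triplet contributes. With margin 1, each negative is already at least 2 units farther than the positive, so no triplet is active and the loss is 0.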
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 1
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
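The scheduler settings listed here are lr_scheduler_type linear, warmup_steps 0, warmup_ratio 0.0 and learning_rate 5e-05. They imply that the learning rate decays linearly from 5e-05 toward 0 over the run. The sketch below models that schedule under three assumptions: one epoch on a single device, gradient_accumulation_steps of 1, and the last partial batch kept (dataloader_drop_last is False). That gives ceil(33,411 / 16) optimizer steps. The linear_lr function is a hypothetical helper written from the usual linear-with-warmup formula, not the Transformers implementation itself.

```python
import math


def linear_lr(step: int, total_steps: int, base_lr: float = 5e-05, warmup_steps: int = 0) -> float:
    """Linear warmup to base_lr, then linear decay to 0 at total_steps (warmup is 0 in this run)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# One epoch over 33,411 samples at 16 per batch; the final partial batch is kept.
total_steps = math.ceil(33_411 / 16)
print(total_steps)  # 2089
for step in (0, 500, 1000, 1500, 2000):
    print(step, f"{linear_lr(step, total_steps):.3e}")
```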
Training Logs
Epoch | Step | Training Loss |
---|---|---|
0.2393 | 500 | 0.2122 |
0.4787 | 1000 | 0.1686 |
0.7180 | 1500 | 0.2193 |
0.9574 | 2000 | 0.2084 |
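The Epoch column is the fraction of the single training epoch completed at each logged step. Assume one optimizer step per batch of 16 on a single device (gradient_accumulation_steps is 1). An epoch is then ceil(33,411 / 16) = 2,089 steps, and dividing the logged step numbers by that count reproduces the column. This is a consistency check under that assumption, not a record taken from the training run.

```python
import math

# Number of batches in one epoch; the last batch is partial and still counts as a step.
steps_per_epoch = math.ceil(33_411 / 16)
for step in (500, 1000, 1500, 2000):
    print(step, round(step / steps_per_epoch, 4))
# 500 0.2393
# 1000 0.4787
# 1500 0.718
# 2000 0.9574
```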
Framework Versions
- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
BatchAllTripletLoss
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}