tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:21376
  - loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
  - source_sentence: Explain the purpose of source authentication in EPIC.
    sentences:
      - >-
        Golden file testing is integral to the SCION testing suite, ensuring
        consistency in test outputs. The -update flag is utilized across all
        packages containing golden file tests, allowing for systematic updates.
        To update all golden files, the command `go test ./... -update` is
        executed, while a specific package can be updated using `go test
        ./path/to/package -update`. The update mechanism is implemented via a
        package global variable: `var update = xtest.UpdateGoldenFiles()`.


        For tests involving non-deterministic elements, such as private keys and
        certificates, a separate flag, -update-non-deterministic, is employed.
        This allows for the updating of non-deterministic golden files with the
        command `go test ./... -update-non-deterministic` or for a specific
        package using `go test ./path/to/package -update-non-deterministic`. The
        corresponding global variable for this functionality is defined as `var
        updateNonDeterministic = xtest.UpdateNonDeterminsticGoldenFiles()`. This
        structured approach facilitates the management of both deterministic and
        non-deterministic golden files within the SCION architecture.
      - >-
        EPIC introduces a family of cryptographic data-plane protocols designed
        to enhance security in path-aware Internet architectures by addressing
        the security-efficiency dilemma. The protocols facilitate source
        authentication, path validation, and path authorization while
        maintaining low communication overhead. EPIC employs short per-hop
        authentication fields to minimize overhead, ensuring that even if an
        attacker forges an authenticator, they cannot exploit it to launch
        volumetric DoS attacks. The design binds authenticators to specific
        packets, preventing further malicious packet transmission. Additionally,
        EPIC utilizes a longer, unforgeable authentication field for the
        destination, allowing detection of any deceptive packets that may have
        bypassed intermediate routers. The proposed attacker model combines a
        localized Dolev–Yao adversary with a cryptographic oracle, demonstrating
        EPIC's resilience against powerful attackers. EPIC's communication
        overhead is 3–5 times smaller than existing solutions like OPT and ICING
        for realistic path lengths. Implementation using Intel’s Data Plane
        Development Kit (DPDK) shows that EPIC can saturate a 40 Gbps link on
        commodity hardware with only four processing cores. The focus is on
        securing inter-domain data-plane communication while assuming a secure
        control plane for key distribution and path construction.
      - >-
        The document chunk provides a comprehensive index of SCION Internet
        Architecture, detailing key components and concepts. It includes
        references to various types of certificates such as AS, CA, root, and
        voting certificates, essential for the SCION Control-plane PKI (CP-PKI).
        The control plane and data plane are delineated, with specific focus on
        control-plane extensions, including hidden paths and time
        synchronization, and data-plane extensions like end-to-end (E2E) and
        hop-by-hop (HBH) mechanisms. The COLIBRI framework is highlighted,
        encompassing control and data plane functionalities, end-to-end
        reservations (EER), and security analyses. Cryptographic algorithms are
        categorized, emphasizing agility, asymmetric, symmetric, and
        post-quantum methods, alongside cryptographic hash functions. Deployment
        scenarios are outlined, addressing customer site, end host, and ISP core
        network configurations. The document also references the Discovery
        service and DNSSEC, noting their relevance to SCION's operational
        integrity. Overall, the index encapsulates the architectural components,
        algorithms, and deployment strategies critical to SCION's design and
        implementation.
  - source_sentence: >-
      How does SCION's deployment model differ from traditional overlay
      networks?
    sentences:
      - >-
        The document discusses the deployment and scalability of the SCION
        Internet architecture, emphasizing its path-diversity-based path
        construction algorithm for core beaconing as the SCIONLab network
        expands. It contrasts SCION with existing testbeds like VINI, GENI, and
        FABRIC, which have not yet facilitated SCION's production network,
        highlighting SCION's unique design that avoids overlay complexities. The
        section on related work outlines various Internet architecture
        proposals, including Trotsky, which advocates for a backward-compatible
        framework for new inter-domain protocols, and RON, an application-level
        overlay network that lacks the guarantees of native architectures.
        SCION's architecture distinctly separates inter-domain communication
        (core-path segments) from intra-domain communication (down- and up-path
        segments), paralleling concepts in Plutarch and HLP, which employs a
        hybrid routing approach. SCION's beaconing mechanism enhances
        scalability compared to BGP, while maintaining a hierarchical network
        partitioning akin to HLP's model. The document also references XIA's
        principal-centric networking and the Framework for Internet Innovation's
        clean-slate redesign, noting SCION's alignment with these innovative
        paradigms while addressing deployment challenges.
      - >-
        The document discusses the limitations of the Border Gateway Protocol
        (BGP) within the context of internet infrastructure, emphasizing its
        lack of guarantees for data delivery and path specification. BGP
        facilitates the exchange of reachability information among Autonomous
        Systems (ASes) through IP prefix advertisements, enabling ASes to make
        routing decisions based on various factors, including cost and load
        balancing. However, BGP's design, originating in 1989 as a temporary
        solution, did not prioritize security, leading to vulnerabilities such
        as route hijacking. These deficiencies impact both Quality of Service
        (QoS) and Quality of Experience (QoE), as users remain unaware of the
        paths taken by their data. The document highlights the need for improved
        protocols that address these security and performance issues,
        particularly in the context of SCION architecture, which aims to provide
        more reliable and secure routing mechanisms.
      - >-
        The document chunk details the packet processing logic within the SCION
        Data Plane, specifically focusing on the handling of Hop Fields and
        Accumulator values during packet traversal. It outlines checks for link
        types to prevent valley usage and ensures that timestamps in the Info
        Field are valid. The processing steps vary based on the Construction
        Direction flag (C) and the Peering flag (P). 


        For packets traveling in construction direction (C = "1"), the process
        continues to the next step. If traveling against construction direction
        (C = "0"), three cases are considered: 


        1. **No Peering Hop Field (P = "0")**: The ingress border router
        computes the Accumulator (Acc) using the formula Acc = Acc_(i+1) XOR
        MAC_i, updates the Acc field, verifies the MAC, and increments path
        metadata if at the last Hop Field.


        2. **Peering Hop Field Present but Not Current (P = "1")**: Similar to
        Case 1, but without incrementing path metadata.


        3. **Current Hop Field is Peering (P = "1")**: The router computes
        MAC^Peer_i, verifies it against the current MAC, and increments path
        metadata if applicable. 


        These steps ensure integrity and proper routing within the SCION
        architecture.
  - source_sentence: >-
      What considerations are taken into account for Assigned SCION Protocol
      Numbers?
    sentences:
      - >-
        Redundant transmission in SCION enhances the reliability of media
        streams, specifically for video calls, by employing multiple relay
        connections. The application allows for configurable redundant and
        multi-redundant transmission of audio and video streams. Each media
        stream can utilize more than one relay connection, with outgoing packets
        sent identically across these connections. Received packets are
        forwarded to WebRTC, which utilizes RTP and RTCP protocols for
        connection management, including sequence numbering and deduplication. 


        Overlap path processors facilitate the selection of paths for redundant
        relay connections. The process begins with sorting the relay connections
        arbitrarily, followed by the first connection utilizing the standard
        root path processor. Subsequent connections employ an overlap path
        processor that references the path of the preceding connection, ensuring
        that all paths are considered as reference paths for redundancy. This
        design aims to maintain Quality of Service (QoS) by ensuring that at
        least one relay connection successfully delivers packets, mitigating
        packet loss.


        However, challenges arise from latency discrepancies between redundant
        connections, which can lead to a primary connection (lead) and a
        secondary connection (backup) scenario, potentially affecting the
        overall performance of the media stream.
      - >-
        The document chunk addresses the availability guarantees within the
        SCION Internet Architecture, detailing the adversary model and
        availability properties. It outlines defense systems, including basic
        fault and attack isolation, protection mechanisms for data-plane
        traffic, and control-plane services. Key components include secure path
        discovery and dissemination, data delivery mechanisms, packet
        authentication, and filtering strategies. The section on traffic
        prioritization introduces traffic classes, Setup-Less Neighbor-Based
        Communication (SNC), traffic marking, and priority processing through
        queuing disciplines. Additionally, it discusses protected DRKey
        bootstrapping, emphasizing assumptions, the protection of COLIBRI SegR
        setup, and the Telescoped Reservation Setup (TRS) with a corresponding
        security analysis. The protection of control-plane services is further
        elaborated, focusing on criticality criteria for interactions, filtering
        at the control service, and the attack resilience of the control
        service. The chapter concludes with a discussion on AS certification,
        reinforcing the importance of these mechanisms in ensuring the
        robustness and reliability of SCION's architecture against various
        threats.
      - >-
        The document chunk outlines the SCION Data Plane architecture, detailing
        the SCION Header Specification, which includes various header formats
        such as Address Header, SCION Path Types (SCION, EmptyPath, OneHopPath,
        EPIC-HP), and the Pseudo Header for Upper-Layer Checksum. It specifies
        the SCION Extension Header, encompassing Hop-by-Hop and End-to-End
        Options Headers, along with TLV-encoded Options. The SCMP Specification
        is introduced, covering its general format, error messages, and
        informational messages. The SCION Packet Authenticator Option is
        defined, detailing its format, absolute time, DRKey selection,
        authenticated data, and associated algorithms. Additionally, the
        document addresses BFD (Bidirectional Forwarding Detection) on top of
        SCION, including protocol specifics and its implementation in SCION
        routers. It also lists assigned SCION Protocol Numbers, highlighting
        considerations for their assignment, and concludes with an overview of
        the SCION Protocol Stack, which integrates these components into a
        cohesive architecture.
  - source_sentence: What happens when a local path service cannot find a path to a remote AS?
    sentences:
      - >-
        The document chunk details the verification of Go programs within the
        SCION architecture, specifically focusing on the `UnmarshalText`
        function for parsing Autonomous System (AS) identifiers. The
        `UnmarshalText` function, defined in the `addr` package, takes a byte
        slice `text`, converts it to a string, and invokes `ASFromString` to
        derive a valid AS identifier. The function employs preconditions and
        postconditions to ensure memory safety and validity of the AS
        identifier. The `validAS` function is a pure function that checks the
        validity of the AS identifier, returning true only for valid inputs. The
        use of the `old` keyword in postconditions guarantees that the state of
        the AS variable remains unchanged if the input string is invalid,
        ensuring that the function adheres to the principles of functional
        programming by avoiding side effects. The implementation details
        emphasize the importance of error handling, where `ASFromString` returns
        a non-nil error for invalid identifiers, thus maintaining the integrity
        of the AS state. This snippet is part of the SCION codebase,
        specifically from `github.com/scionproto/scion/go/lib/addr/isdas.go`.
      - >-
        The document chunk discusses extensions for the SCION data plane,
        focusing on the COLIBRI system's attack-resistant resource-reservation
        mechanisms. It identifies challenges such as per-flow state management,
        path stability, and authentication overhead, alongside corresponding
        enabling technologies. Key solutions include packet-carried state for
        fast path management, path choice strategies to mitigate on-path
        adversaries, and symmetric-key authentication to address signature
        overhead. The architecture leverages Isolation Domains (ISDs) and
        segment types to enhance scalability by decomposing reservations.
        Reservation protection is emphasized, requiring data packets to carry
        cryptographically protected information for validation and attribution.
        Efficient cryptographic mechanisms are critical, particularly for
        per-packet MACs to authenticate data packets at each on-path Autonomous
        System (AS). The DRKey framework underpins the efficient authentication
        of control-plane packets, mitigating risks such as signature flooding
        and denial-of-capability (DoC) attacks. Overall, the integration of
        these components within SCION's architecture enhances the robustness and
        efficiency of resource reservations in adversarial environments.
      - >-
        The document discusses the optimization of path registration and
        propagation in SCION Autonomous Systems (ASes) through beacon services,
        utilizing information encoded in Path Control Blocks (PCBs). Beacon
        services can optimize paths based on various quality metrics, such as
        latency and bandwidth, either jointly or in parallel, depending on local
        implementations. For path lookup, a host within a SCION AS must obtain
        the destination host's ISD number, AS number, and local address,
        resulting in tuples of the form ⟨ISD number, AS number, local address⟩.
        The host queries its path service for a registered or cached path to the
        destination. If unavailable, it escalates the request to a core AS's
        path service within its ISD. If the destination is within the same ISD
        or is a core AS in another ISD, the core path service provides a path
        segment. If not, the local core path service queries the remote core
        path service of the destination's ISD, which returns a down-segment. The
        local core path service then returns both the core-segment and
        down-segment to facilitate routing. The structure of a PCB is detailed,
        highlighting fields for AS hop metadata and extensions for additional
        information dissemination.
  - source_sentence: >-
      How does SCION's time synchronization support duplicate detection and
      traffic monitoring?
    sentences:
      - >-
        The document chunk details the COLIBRI system for bandwidth reservations
        within the SCION architecture, emphasizing its efficiency in managing
        end-to-end reservations (EERs) across on-path Autonomous Systems (ASes).
        COLIBRI employs symmetric cryptography for stateless verification of
        reservations, mitigating the need for per-source state. To counter
        replay attacks, a duplicate suppression mechanism is essential, ensuring
        that authenticated packets cannot be maliciously reused to exceed
        allocated bandwidth. Monitoring and policing systems are necessary to
        enforce bandwidth limits, allowing ASes to identify and manage
        misbehaving flows. While stateful flow monitoring, such as the
        token-bucket algorithm, provides precise measurements, it is limited to
        edge deployments due to state requirements. In contrast, probabilistic
        monitoring techniques like LOFT can operate effectively within the
        Internet core, accommodating a vast number of flows. Time
        synchronization is critical for coordinating bandwidth reservations
        across AS boundaries, facilitating accurate duplicate detection and
        traffic monitoring. SCION’s time synchronization mechanism ensures
        adequate synchronization levels for these operations. Overall, COLIBRI
        enables efficient short-term bandwidth reservations, crucial for
        maintaining the integrity of end-to-end communications in a highly
        dynamic Internet environment.
      - >-
        SCION is a next-generation Internet architecture designed to enhance
        security, availability, isolation, and scalability. It enables end hosts
        to utilize multiple authenticated inter-domain paths to any destination,
        with each packet carrying its own specified forwarding path determined
        by the sender. SCION architecture incorporates isolation domains (ISDs),
        which consist of multiple Autonomous Systems (ASes) that agree on a
        trust root configuration (TRC) defining the roots of trust for
        validating bindings between names and public keys or addresses. Core
        ASes govern each ISD, providing inter-ISD connectivity and managing
        trust roots. The architecture delineates three types of AS
        relationships: core, provider-customer, and peer-peer, with core
        relations existing solely among core ASes. SCION addressing is
        structured as a tuple ⟨ISD number, AS number, local address⟩, where the
        ISD number identifies the ISD of the end host, the AS number identifies
        the host's AS, and the local address serves as the host's identifier
        within its AS, not utilized for inter-domain routing or forwarding.
        Path-construction, path-registration, and path-lookup procedures are
        integral to SCION's operational framework, facilitating efficient packet
        forwarding based on the specified paths.
      - >-
        The document introduces a formal verification framework for path-aware
        data plane protocols, addressing the critical security property of path
        authorization in Internet architectures. It utilizes Isabelle/HOL to
        develop a parameterized model that first establishes path authorization
        without an attacker, then refines it by incorporating an attacker and
        cryptographic validation fields. The framework is parameterized by the
        protocol's authentication mechanism and relies on five verification
        conditions sufficient for proving the refinement of the abstract model.
        The authors validate the framework against several existing protocols,
        demonstrating compliance with the verification conditions and ensuring
        path authorization without requiring invariant proofs. This approach
        supports low-effort security proofs applicable to arbitrary network
        topologies and authorized paths, surpassing the capabilities of current
        automated security protocol verifiers. The document emphasizes the
        importance of formal verification in enhancing the security and
        reliability of future Internet architectures, particularly in light of
        the limitations of the existing Border Gateway Protocol (BGP) and the
        need for robust, scalable solutions. Path-aware architectures empower
        end hosts to select forwarding paths while ensuring adherence to
        autonomous systems' routing policies, thereby mitigating risks from
        malicious sources.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dev evaluation
          type: dev-evaluation
        metrics:
          - type: cosine_accuracy@1
            value: 0.2032690695725063
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.44090528080469404
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.5368818105616094
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.662615255658005
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.2032690695725063
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.146968426934898
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.10737636211232188
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0662615255658005
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.2032690695725063
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.44090528080469404
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.5368818105616094
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.662615255658005
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.4209375313714088
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.3449352372969312
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.35689195077195485
            name: Cosine Map@100

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
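
The Pooling and Normalize stages listed above can be illustrated with a small NumPy sketch. It assumes token embeddings of shape (batch, seq_len, 384) and an attention mask as stand-ins for what the Transformer module produces. The function name `mean_pool_and_normalize` is hypothetical, and this is an illustration of mean pooling followed by L2 normalization rather than the library's own code.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalize."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

# Random stand-in data: 2 sentences, 5 token positions, 384 dimensions.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 384))
mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])  # first sentence has 2 padding tokens
sentence_vecs = mean_pool_and_normalize(tokens, mask)
print(sentence_vecs.shape)
```

Because the outputs have unit length, cosine similarity between two sentence vectors reduces to a plain dot product.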

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tjohn327/scion-all-MiniLM-L6-v2")
# Run inference
sentences = [
    "How does SCION's time synchronization support duplicate detection and traffic monitoring?",
    'The document chunk details the COLIBRI system for bandwidth reservations within the SCION architecture, emphasizing its efficiency in managing end-to-end reservations (EERs) across on-path Autonomous Systems (ASes). COLIBRI employs symmetric cryptography for stateless verification of reservations, mitigating the need for per-source state. To counter replay attacks, a duplicate suppression mechanism is essential, ensuring that authenticated packets cannot be maliciously reused to exceed allocated bandwidth. Monitoring and policing systems are necessary to enforce bandwidth limits, allowing ASes to identify and manage misbehaving flows. While stateful flow monitoring, such as the token-bucket algorithm, provides precise measurements, it is limited to edge deployments due to state requirements. In contrast, probabilistic monitoring techniques like LOFT can operate effectively within the Internet core, accommodating a vast number of flows. Time synchronization is critical for coordinating bandwidth reservations across AS boundaries, facilitating accurate duplicate detection and traffic monitoring. SCION’s time synchronization mechanism ensures adequate synchronization levels for these operations. Overall, COLIBRI enables efficient short-term bandwidth reservations, crucial for maintaining the integrity of end-to-end communications in a highly dynamic Internet environment.',
    "SCION is a next-generation Internet architecture designed to enhance security, availability, isolation, and scalability. It enables end hosts to utilize multiple authenticated inter-domain paths to any destination, with each packet carrying its own specified forwarding path determined by the sender. SCION architecture incorporates isolation domains (ISDs), which consist of multiple Autonomous Systems (ASes) that agree on a trust root configuration (TRC) defining the roots of trust for validating bindings between names and public keys or addresses. Core ASes govern each ISD, providing inter-ISD connectivity and managing trust roots. The architecture delineates three types of AS relationships: core, provider-customer, and peer-peer, with core relations existing solely among core ASes. SCION addressing is structured as a tuple ⟨ISD number, AS number, local address⟩, where the ISD number identifies the ISD of the end host, the AS number identifies the host's AS, and the local address serves as the host's identifier within its AS, not utilized for inter-domain routing or forwarding. Path-construction, path-registration, and path-lookup procedures are integral to SCION's operational framework, facilitating efficient packet forwarding based on the specified paths.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
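
For retrieval-style use, the similarity scores are usually reduced to a ranking of passages for each query. The following is a minimal sketch of that step; `rank_passages` and the toy 4-dimensional vectors are hypothetical stand-ins for the 384-dimensional outputs of `model.encode`, and the function normalizes its inputs so it does not depend on them already having unit length.

```python
import numpy as np

def rank_passages(query_vec, passage_vecs, top_k=3):
    """Return (index, score) pairs for the top_k passages by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q                      # cosine similarity per passage
    order = np.argsort(-scores)[:top_k]  # highest score first
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors standing in for real embeddings.
query = np.array([1.0, 0.0, 0.0, 0.0])
passages = np.array([
    [0.0, 1.0, 0.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0, 0.0],   # close to the query
    [0.5, 0.5, 0.5, 0.5],   # partially aligned
])
ranking = rank_passages(query, passages)
print([i for i, _ in ranking])
```

With real data, `query` would be the embedding of a question and `passages` the embeddings of candidate document chunks.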

Evaluation

Metrics

Information Retrieval

| Metric              | Value  |
|:--------------------|:-------|
| cosine_accuracy@1   | 0.2033 |
| cosine_accuracy@3   | 0.4409 |
| cosine_accuracy@5   | 0.5369 |
| cosine_accuracy@10  | 0.6626 |
| cosine_precision@1  | 0.2033 |
| cosine_precision@3  | 0.147  |
| cosine_precision@5  | 0.1074 |
| cosine_precision@10 | 0.0663 |
| cosine_recall@1     | 0.2033 |
| cosine_recall@3     | 0.4409 |
| cosine_recall@5     | 0.5369 |
| cosine_recall@10    | 0.6626 |
| cosine_ndcg@10      | 0.4209 |
| cosine_mrr@10       | 0.3449 |
| cosine_map@100      | 0.3569 |
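
In these results, recall@k equals accuracy@k, and precision@k equals accuracy@k divided by k. That pattern is what one would expect if each dev query has exactly one relevant passage; this is an inference from the numbers, not something the card states. Under that assumption, the metrics can be sketched from the 1-based rank of the relevant passage for each query. The function `ir_metrics` and the sample ranks are hypothetical.

```python
def ir_metrics(ranks, k):
    """Compute accuracy@k, precision@k, recall@k and MRR@k.

    ranks holds, per query, the 1-based rank of its single relevant passage,
    or None if that passage was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    precision = accuracy / k   # at most one relevant item among k results
    recall = accuracy          # exactly one relevant item per query
    mrr = sum(1.0 / r for r, hit in zip(ranks, hits) if hit) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "mrr": mrr}

# Five hypothetical queries: one miss, one hit outside the top 10.
ranks = [1, 3, None, 2, 12]
print(ir_metrics(ranks, k=10))
```

As a check against the table, 0.6626 / 10 gives the reported precision@10 of 0.0663.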

Training Details

Training Dataset

Unnamed Dataset

  • Size: 21,376 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    |         | sentence_0 | sentence_1 | label |
    |:--------|:-----------|:-----------|:------|
    | type    | string | string | float |
    | details | min: 7 tokens; mean: 17.45 tokens; max: 30 tokens | min: 4 tokens; mean: 231.74 tokens; max: 256 tokens | min: 1.0; mean: 1.0; max: 1.0 |
  • Samples:
    | sentence_0 | sentence_1 | label |
    |:-----------|:-----------|:------|
    | What are the advantages of using flow-volume targets over cash transfers in SCION agreements? | The document discusses optimization strategies for interconnection agreements in SCION, focusing on flow-volume targets and cash compensation mechanisms. The optimization problem involves determining the maximum customer demand for new path segments, denoted as ∆fmax P, and adjusting flow allowances f(a) and additional traffic ∆f(a) accordingly. Flow-volume targets provide predictability and enforceable limits, enhancing the likelihood of positive utility outcomes. Conversely, cash compensation agreements, defined by a payment π between Autonomous Systems (ASes), offer flexibility and can be concluded even when flow-volume targets yield zero solutions. The Nash Bargaining Solution is employed to determine the cash transfer that balances utilities uD(a) and uE(a) of the negotiating parties. Both methods face challenges due to private information regarding costs and pricing, which can lead to inefficiencies in negotiations. The document introduces a mechanism-assisted negotiation approac... | 1.0 |
    | What is the significance of the variable 'auth⇄a' in packet forwarding? | The document chunk presents a formal model for secure packet forwarding protocols within the SCION architecture, detailing the dispatch and handling of packets across internal and external channels. The functions dispatch-inta, dispatch-intc, dispatch-exta, and dispatch-extc manage the addition of packets to internal and external send queues based on their authorization status and historical context. The model defines packet structures (PKTa) that encapsulate the future path (fut), past path (past), and historical path (hist), with authorization checks against the set of authorized paths (autha) and their fragment closures (auth⇄a). The environment parameters include the set of compromised nodes (Nattr), which influences the security model by introducing potential adversarial actions. The state representation employs asynchronous channels for packet transmission, with the initial state having all channels empty. The model emphasizes the importance of maintaining i... | 1.0 |
    | How can SNC and COLIBRI SegRs be combined to enhance DRKey bootstrapping security? | The document chunk discusses the Protected DRKey Bootstrapping mechanism within the SCION architecture, emphasizing the integration of Service Network Controllers (SNC) and COLIBRI Segments (SegRs) to enhance the security of DRKey bootstrapping. This approach mitigates the denial-of-DRKey attack surface by ensuring that pre-shared DRKeys are securely established. The adversary model considers off-path adversaries capable of modifying, dropping, or injecting packets, while maintaining that the reservation path remains uncontested by adversaries. The document highlights the necessity of path stability in the underlying network architecture for effective bandwidth-reservation protocols. Various queuing disciplines, such as distributed weighted fair queuing (DWFQ) and rate-controlled priority queuing (PQ), are discussed for traffic isolation, with specific mechanisms to prevent starvation of lower-priority traffic. Control-plane traffic is rate-limited at infrastructure services and border... | 1.0 |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
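To make the two parameters concrete, here is a minimal pure-Python sketch of the in-batch ranking objective that MultipleNegativesRankingLoss is generally described as computing: cosine similarities between every anchor and every positive are multiplied by `scale`, and a cross-entropy is taken with the matching pair as the target. The helper names (`cos_sim`, `mnr_loss`) and the toy vectors are illustrative assumptions, not the library's implementation.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over scaled cosine scores.

    The positive for anchor i sits at column i; every other column in the
    batch acts as a negative for that anchor.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the row of logits.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)

# Toy 2-D embeddings (hypothetical, for illustration only).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each positive points towards its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]  # positives paired with the wrong anchors

loss_aligned = mnr_loss(anchors, aligned)
loss_swapped = mnr_loss(anchors, swapped)
```

With a scale of 20, small differences in cosine similarity become large differences in logits, so the aligned batch yields a loss close to zero while the swapped batch is penalized heavily.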
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • num_train_epochs: 2
  • fp16: True
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
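The combination of `learning_rate: 5e-05`, `lr_scheduler_type: linear`, and `warmup_steps: 0` implies a learning rate that starts at its peak and decays linearly. The sketch below illustrates that shape under two assumptions: that the schedule decays to zero at the last step, as linear schedules in Transformers usually do, and that the total number of optimizer steps is 168, the final step reported in the Training Logs. The function name `linear_lr` is hypothetical.

```python
def linear_lr(step, base_lr=5e-5, total_steps=168, warmup_steps=0):
    """Learning rate after `step` updates: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

lr_start = linear_lr(0)     # first update uses the full base rate
lr_epoch1 = linear_lr(84)   # halfway point (end of epoch 1 in the logs)
lr_end = linear_lr(168)     # schedule has decayed to zero
```

Under these assumptions the second epoch runs at half the initial learning rate or less, which is worth keeping in mind when comparing the per-epoch evaluation scores.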

Training Logs

Epoch  Step  dev-evaluation_cosine_ndcg@10
1.0    84    0.4114
2.0    168   0.4209
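The logged metric name suggests NDCG@10 computed over rankings by cosine similarity. As a rough guide to reading the numbers, the following sketch computes NDCG@k with binary relevance, which is an assumption about how relevance is graded here; the function name `ndcg_at_k` and the chunk ids are hypothetical.

```python
import math

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """NDCG@k with binary relevance: DCG of the ranking over DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 uses log2(2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical query with a single relevant chunk, "c7".
top_hit = ndcg_at_k(["c7", "c2", "c9"], {"c7"})    # relevant chunk ranked first
third_hit = ndcg_at_k(["c2", "c9", "c7"], {"c7"})  # relevant chunk ranked third
miss = ndcg_at_k(["c2", "c9", "c4"], {"c7"})       # relevant chunk not retrieved
```

A score is 1.0 only when the relevant chunk is ranked first and falls off logarithmically with rank, so an average around 0.42 is a blend of top-ranked hits, lower-ranked hits, and misses within the top 10.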

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 3.4.1
  • Transformers: 4.49.0
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.4.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}