Upload from huggingface_hub

Files changed (13)

.gitattributes +1 -0
1_Pooling/config.json +10 -0
README.md +609 -0
config.json +49 -0
config_sentence_transformers.json +14 -0
configuration.py +145 -0
model.safetensors +3 -0
modeling.py +1418 -0
modules.json +20 -0
sentence_bert_config.json +4 -0
special_tokens_map.json +51 -0
tokenizer.json +3 -0
tokenizer_config.json +55 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+tokenizer.json filter=lfs diff=lfs merge=lfs -text

1_Pooling/config.json ADDED

@@ -0,0 +1,10 @@
+{
+    "word_embedding_dimension": 768,
+    "pooling_mode_cls_token": true,
+    "pooling_mode_mean_tokens": false,
+    "pooling_mode_max_tokens": false,
+    "pooling_mode_mean_sqrt_len_tokens": false,
+    "pooling_mode_weightedmean_tokens": false,
+    "pooling_mode_lasttoken": false,
+    "include_prompt": true
+}
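
Reviewer note: the pooling config above enables only `pooling_mode_cls_token`, so the sentence vector is the hidden state of the first token. The following is a minimal NumPy sketch of that step, followed by the L2 normalization that the `Normalize()` module in the README's architecture listing implies. The function name `cls_pool_and_normalize` and the random input are illustrative assumptions, not code from this commit.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Map (batch, seq_len, dim) token states to (batch, dim) unit vectors."""
    # CLS pooling: keep only the first token's hidden state per sequence.
    cls_vectors = token_embeddings[:, 0, :]
    # L2-normalize so that dot products equal cosine similarities.
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)

# Hypothetical stand-in for transformer output: 2 sequences, 5 tokens, 768 dims.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 768))
sentence_vectors = cls_pool_and_normalize(tokens)
print(sentence_vectors.shape)
```

Because the other pooling modes are all `false`, padding tokens and sequence length do not enter the pooled vector in this sketch.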

README.md ADDED

@@ -0,0 +1,609 @@

+---
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- dense
+- generated_from_trainer
+- dataset_size:2864
+- loss:MultipleNegativesRankingLoss
+base_model: Alibaba-NLP/gte-multilingual-base
+widget:
+- source_sentence: Điều kiện hưởng chính sách hỗ trợ chi phí học tập đối với sinh
+    viên là người dân tộc thiểu số là gì?
+  sentences:
+  - "1. Cơ sở giáo dục có vốn đầu tư nước ngoài được thành lập và hoạt động dưới hình\
+    \ thức trường hoặc trung tâm và được đặt tên theo quy định sau:\t\ta) Đối với\
+    \ trường, tên phải bao gồm các yếu tố cấu thành được sắp xếp theo trật tự: “Trường”,\
+    \ “Cấp học hoặc trình độ đào tạo” và tên riêng;\t\tb) Đối với cơ sở đào tạo, bồi\
+    \ dưỡng ngắn hạn, tên phải bao gồm các yếu tố cấu thành được sắp xếp theo trật\
+    \ tự: “Trung tâm giáo dục hoặc đào tạo”, “Ngành hoặc nhóm ngành đào tạo chính”\
+    \ và tên riêng;\t\tc) Đối với phân hiệu của cơ sở giáo dục đại học có vốn đầu\
+    \ tư nước ngoài tại Việt Nam thì tên phải bao gồm các yếu tố cấu thành được sắp\
+    \ xếp theo trật tự: “Phân hiệu”, “Tên cơ sở giáo dục đại học có vốn đầu tư nước\
+    \ ngoài” và “tại tỉnh, thành phố”.\n\n2. Tên riêng của cơ sở giáo dục có vốn đầu\
+    \ tư nước ngoài không được đặt trùng hoặc gây nhầm lẫn với tên của cơ sở giáo\
+    \ dục đã đăng ký, với tên của doanh nghiệp thực hiện dự án đầu tư; không sử dụng\
+    \ từ ngữ, ký hiệu vi phạm truyền thống lịch sử, văn hóa, đạo đức và thuần phong\
+    \ mỹ tục của dân tộc Việt Nam.\n\n3. Cơ sở giáo dục có vốn đầu tư nước ngoài có\
+    \ tên giao dịch bằng tiếng Việt và tên giao dịch quốc tế bằng tiếng Anh (hoặc\
+    \ bằng một ngôn ngữ nước ngoài thông dụng khác) với nội dung tương đương.\n\n\
+    4. Bộ trưởng Bộ Giáo dục và Đào tạo xem xét, quyết định việc đặt tên một số cơ\
+    \ sở giáo dục mang tính đặc thù."
+  - 'Để được hưởng chính sách hỗ trợ chi phí học tập, sinh viên phải đáp ứng đủ các
+    điều kiện sau:
+    1. Sinh viên là người dân tộc thiểu số thuộc hộ nghèo và hộ cận nghèo theo quy
+    định của Thủ tướng Chính phủ phê duyệt theo từng thời kỳ.
+    2. Thi đỗ vào học đại học, cao đẳng hệ chính quy tại các cơ sở giáo dục đại học,
+    bao gồm: đại học, học viện, trường đại học, trường cao đẳng tại năm tham dự tuyển
+    sinh theo đúng quy định của cơ quan Nhà nước có thẩm quyền về quy chế tuyển sinh
+    đại học, cao đẳng.'
+  - "1. Phá hoại các trang thiết bị điện, thiết bị đo đếm điện và công trình điện\
+    \ lực.\n\n2. Hoạt động điện lực không có giấy phép theo quy định của Luật này.\n\
+    \n3. Đóng, cắt điện trái quy định của pháp luật.\n\n4. Vi phạm các quy định về\
+    \ an toàn trong phát điện, truyền tải điện, phân phối điện và sử dụng điện.\n\n\
+    5. Cản trở việc kiểm tra hoạt động điện lực và sử dụng điện.\n\n6. Trộm cắp điện.\n\
+    \n7. Sử dụng điện để bẫy, bắt động vật hoặc làm phương tiện bảo vệ, trừ trường\
+    \ hợp được quy định tại Điều 59 của Luật này.\n\n8. Vi phạm các quy định về bảo\
+    \ vệ hành lang an toàn lưới điện, khoảng cách an toàn của đường dây và trạm điện.\
+    \ \n\n9. Cung cấp thông tin không trung thực làm tổn hại đến quyền và lợi ích\
+    \ hợp pháp của tổ chức, cá nhân hoạt động điện lực và sử dụng điện.\n\n10. Lợi\
+    \ dụng chức vụ, quyền hạn để gây sách nhiễu, phiền hà, thu lợi bất chính trong\
+    \ hoạt động điện lực và sử dụng điện. \n\n11. Các hành vi khác vi phạm quy định\
+    \ của pháp luật về điện lực."
+- source_sentence: 2. Vi phạm quy định về tổ chức, hoạt động của trung tâm tư vấn,
+    hỗ trợ hôn nhân và gia đình có yếu tố nước ngoài bị xử phạt như thế nào?
+  sentences:
+  - '1. Hàng năm, Bộ Nông nghiệp và Phát triển nông thôn tổ chức xây dựng, phê duyệt
+    kế hoạch kiểm tra định kỳ hoặc đột xuất để kiểm tra, đánh giá việc tổ chức thực
+    hiện, tiến độ, kết quả, hiệu quả dự án khuyến nông trung ương. Tần suất kiểm tra
+    được thực hiện ít nhất một lần cho mỗi dự án trong thời gian thực hiện.
+    2. Trước khi tiến hành kiểm tra (tối thiểu 05 ngày làm việc), đoàn kiểm tra thông
+    báo thời gian, địa điểm kiểm tra tới tổ chức chủ trì, chủ nhiệm dự án và các thành
+    viên đoàn kiểm tra. Tổ chức chủ trì, chủ nhiệm dự án làm báo cáo tiến độ thực
+    hiện dự án gửi đoàn kiểm tra để phục vụ công tác kiểm tra.
+    3. Trong quá trình thực hiện dự án, tổ chức chủ trì tự tổ chức kiểm tra dự án
+    và gửi báo cáo kết quả kiểm tra về Bộ Nông nghiệp và Phát triển nông thôn trước
+    ngày 15 tháng 6 và 15 tháng 12 hàng năm hoặc đột xuất khi có yêu cầu.'
+  - '1. Chủ sở hữu nhà ở có quyền thế chấp nhà ở đang cho thuê nhưng phải thông báo
+    bằng văn bản cho bên thuê nhà ở biết trước về việc thế chấp. Bên thuê nhà ở được
+    tiếp tục thuê đến hết hạn hợp đồng thuê nhà ở.
+    2. Trường hợp nhà ở đang thuê bị xử lý để thực hiện nghĩa vụ của bên thế chấp
+    nhà ở thì bên thuê nhà ở được tiếp tục thuê đến hết hạn hợp đồng, trừ trường hợp
+    bên thuê nhà ở vi phạm các quy định tại khoản 2 Điều 132 của Luật này hoặc các
+    bên có thỏa thuận khác.'
+  - "1. Phạt tiền từ 1.000.000 đồng đến 3.000.000 đồng đối với hành vi không thông\
+    \ báo cho cơ quan nhà nước có thẩm quyền khi có sự thay đổi về tên gọi, trụ sở,\
+    \ người đứng đầu.\n2. Phạt tiền từ 3.000.000 đồng đến 5.000.000 đồng đối với một\
+    \ trong các hành vi sau:\t\ta) Tẩy xoá, sửa chữa làm sai lệch nội dung giấy tờ,\
+    \ văn bản do cơ quan, tổ chức, người có thẩm quyền cấp trong hồ sơ đăng ký hoạt\
+    \ động hoặc thay đổi nội dung đăng ký hoạt động;\t\tb) Tẩy xoá, sửa chữa làm sai\
+    \ lệch nội dung giấy đăng ký hoạt động;\t\tc) Không thực hiện chế độ báo cáo;\
+    \ không lập, quản lý hoặc sử dụng không đúng sổ sách, biểu mẫu;\t\td) Không tư\
+    \ vấn, hỗ trợ cho người có yêu cầu tư vấn, hỗ trợ về hôn nhân và gia đình có yếu\
+    \ tố nước ngoài;\t\tđ) Không cấp giấy xác nhận của trung tâm theo yêu cầu của\
+    \ người được tư vấn, hỗ trợ;\t\te) Không công bố công khai mức thù lao theo quy\
+    \ định.\n3. Phạt tiền từ 5.000.000 đồng đến 7.000.000 đồng đối với một trong các\
+    \ hành vi sau:\t\ta) Khai không đúng sự thật trong tờ khai đăng ký hoạt động;\t\
+    \tb) Cung cấp giấy tờ chứng minh về trụ sở không đúng sự thật;\t\tc) Thay đổi\
+    \ tên gọi, trụ sở, người đứng đầu hoặc nội dung đăng ký hoạt động mà không có\
+    \ xác nhận của cơ quan có thẩm quyền.\n4. Phạt tiền từ 7.000.000 đồng đến 10.000.000\
+    \ đồng đối với một trong các hành vi sau:\t\ta) Hoạt động trong trường hợp không\
+    \ còn đủ điều kiện hoạt động;\t\tb) Hoạt động vượt quá phạm vi, nội dung ghi trong\
+    \ giấy đăng ký hoạt động;\t\tc) Đòi hỏi tiền hoặc lợi ích khác ngoài thù lao theo\
+    \ quy định khi thực hiện tư vấn, hỗ trợ về hôn nhân và gia đình có yếu tố nước\
+    \ ngoài;\t\td) Không giữ bí mật các thông tin, tư liệu về đời sống riêng tư, bí\
+    \ mật cá nhân, bí mật gia đình của các bên theo quy định của pháp luật.\n5. Phạt\
+    \ tiền từ 10.000.000 đồng đến 20.000.000 đồng đối với một trong các hành vi sau:\t\
+    \ta) Thực hiện các hoạt động tư vấn, hỗ trợ hôn nhân và gia đình có yếu tố nước\
+    \ ngoài khi chưa có giấy đăng ký hoạt động;\t\tb) Lợi dụng việc tư vấn, hỗ trợ\
+    \ hôn nhân và gia đình có yếu tố nước ngoài nhằm mua bán người, bóc lột sức lao\
+    \ động, xâm phạm tình dục hoặc vì mục đích trục lợi khác."
+- source_sentence: Tiền thưởng thêm từ hoàn thành tốt hợp đồng có cần phải xuất hóa
+    đơn không?
+  sentences:
+  - "1. Người nào cướp giật tài sản của người khác, thì bị phạt tù từ 01 năm đến 05\
+    \ năm.\n\n2. Phạm tội thuộc một trong các trường hợp sau đây, thì bị phạt tù từ\
+    \ 03 năm đến 10 năm:\t\ta) Có tổ chức;\t\tb) Có tính chất chuyên nghiệp;\t\tc)\
+    \ Chiếm đoạt tài sản trị giá từ 50.000.000 đồng đến dưới 200.000.000 đồng;\t\t\
+    d) Dùng thủ đoạn nguy hiểm;\t\tđ) Hành hung để tẩu thoát;\t\te) Gây thương tích\
+    \ hoặc gây tổn hại cho sức khỏe của người khác mà tỷ lệ tổn thương cơ thể từ 11%\
+    \ đến 30%;\t\tg) Phạm tội đối với người dưới 16 tuổi, phụ nữ mà biết là có thai,\
+    \ người già yếu hoặc người không có khả năng tự vệ;\t\th) Gây ảnh hưởng xấu đến\
+    \ an ninh, trật tự, an toàn xã hội;\t\ti) Tái phạm nguy hiểm.\n\n3. Phạm tội thuộc\
+    \ một trong các trường hợp sau đây, thì bị phạt tù từ 07 năm đến 15 năm:\t\ta)\
+    \ Chiếm đoạt tài sản trị giá từ 200.000.000 đồng đến dưới 500.000.000 đồng;\t\t\
+    b) Gây thương tích hoặc gây tổn hại cho sức khỏe của người khác mà tỷ lệ tổn thương\
+    \ cơ thể từ 31% đến 60%;\t\tc) Lợi dụng thiên tai, dịch bệnh.\n\n4. Phạm tội thuộc\
+    \ một trong các trường hợp sau đây, thì bị phạt tù từ 12 năm đến 20 năm hoặc tù\
+    \ chung thân:\t\ta) Chiếm đoạt tài sản trị giá 500.000.000 đồng trở lên;\t\tb)\
+    \ Gây thương tích hoặc gây tổn hại cho sức khỏe của 01 người mà tỷ lệ tổn thương\
+    \ cơ thể 61% trở lên hoặc gây thương tích hoặc gây tổn hại cho sức khỏe của 02\
+    \ người trở lên mà tỷ lệ tổn thương cơ thể của mỗi người 31% trở lên;\t\tc) Làm\
+    \ chết người;\t\td) Lợi dụng hoàn cảnh chiến tranh, tình trạng khẩn cấp.\n\n5.\
+    \ Người phạm tội còn có thể bị phạt tiền từ 10.000.000 đồng đến 100.000.000 đồng."
+  - 'Bắt buộc phải trưng cầu giám định khi cần xác định:
+    1. Tình trạng tâm thần của người bị buộc tội khi có sự nghi ngờ về năng lực trách
+    nhiệm hình sự của họ; tình trạng tâm thần của người làm chứng hoặc bị hại khi
+    có sự nghi ngờ về khả năng nhận thức, khả năng khai báo đúng đắn về những tình
+    tiết của vụ án;
+    2. Tuổi của bị can, bị cáo, bị hại nếu việc đó có ý nghĩa đối với việc giải quyết
+    vụ án và không có tài liệu để xác định chính xác tuổi của họ hoặc có nghi ngờ
+    về tính xác thực của những tài liệu đó;
+    3. Nguyên nhân chết người;
+    4. Tính chất thương tích, mức độ tổn hại sức khoẻ hoặc khả năng lao động;
+    5. Chất ma tuý, vũ khí quân dụng, vật liệu nổ, chất cháy, chất độc, chất phóng
+    xạ, tiền giả, vàng, bạc, kim khí quý, đá quý, đồ cổ;
+    6. Mức độ ô nhiễm môi trường.'
+  - '1. Tổ chức, cá nhân nhận các khoản thu về bồi thường bằng tiền (bao gồm cả tiền
+    bồi thường về đất và tài sản trên đất khi bị thu hồi đất theo quyết định của cơ
+    quan Nhà nước có thẩm quyền), tiền thưởng, tiền hỗ trợ, tiền chuyển nhượng quyền
+    phát thải và các khoản thu tài chính khác.Cơ sở kinh doanh khi nhận khoản tiền
+    thu về bồi thường, tiền thưởng, tiền hỗ trợ nhận được, tiền chuyển nhượng quyền
+    phát thải và các khoản thu tài chính khác thì lập chứng từ thu theo quy định.
+    Đối với cơ sở kinh doanh chi tiền, căn cứ mục đích chi để lập chứng từ chi tiền.Trường
+    hợp bồi thường bằng hàng hóa, dịch vụ, cơ sở bồi thường phải lập hóa đơn và kê
+    khai, tính, nộp thuế GTGT như đối với bán hàng hóa, dịch vụ; cơ sở nhận bồi thường
+    kê khai, khấu trừ theo quy định.Trường hợp cơ sở kinh doanh nhận tiền của tổ chức,
+    cá nhân để thực hiện dịch vụ cho tổ chức, cá nhân như sửa chữa, bảo hành, khuyến
+    mại, quảng cáo thì phải kê khai, nộp thuế theo quy định.Ví dụ 10: Công ty TNHH
+    P&C nhận được khoản tiền lãi từ việc mua trái phiếu và tiền cổ tức từ việc mua
+    cổ phiếu của các doanh nghiệp khác. Công ty TNHH P&C không phải kê khai, nộp thuế
+    GTGT đối với khoản tiền lãi từ việc mua trái phiếu và tiền cổ tức nhận được.Ví
+    dụ 11: Doanh nghiệp A nhận được khoản bồi thường thiệt hại do bị hủy hợp đồng
+    từ doanh nghiệp B là 50 triệu đồng thì doanh nghiệp A lập chứng từ thu và không
+    phải kê khai, nộp thuế GTGT đối với khoản tiền trên.Ví dụ 12: Doanh nghiệp X mua
+    hàng của doanh nghiệp Y, doanh nghiệp X có ứng trước cho doanh nghiệp Y một khoản
+    tiền và được doanh nghiệp Y trả lãi cho khoản tiền ứng trước đó thì doanh nghiệp
+    X không phải kê khai, nộp thuế GTGT đối với khoản lãi nhận được.Ví dụ 13: Doanh
+    nghiệp X bán hàng cho doanh nghiệp Z, tổng giá thanh toán là 440 triệu đồng. Theo
+    hợp đồng, doanh nghiệp Z thanh toán trả chậm trong vòng 3 tháng, lãi suất trả
+    chậm là 1%/tháng/tổng giá thanh toán của hợp đồng. Sau 3 tháng, doanh nghiệp X
+    nhận được từ doanh nghiệp Z tổng giá trị thanh toán của hợp đồng là 440 triệu
+    đồng và số tiền lãi chậm trả là 13,2 triệu đồng (440 triệu đồng x 1% x 3 tháng)
+    thì doanh nghiệp X không phải kê khai, nộp thuế GTGT đối với khoản tiền 13,2 triệu
+    đồng này.Ví dụ 14: Doanh nghiệp bảo hiểm A và Công ty B ký hợp đồng bảo hiểm với
+    hình thức bảo hiểm bằng tiền. Khi có rủi ro bảo hiểm xảy ra, doanh nghiệp bảo
+    hiểm A bồi thường cho Công ty B bằng tiền theo quy định của pháp luật về bảo hiểm.
+    Công ty B không phải kê khai, nộp thuế GTGT đối với số tiền bồi thường bảo hiểm
+    nhận được.Ví dụ 15: Công ty cổ phần Sữa ABC có chi tiền cho các nhà phân phối
+    (là tổ chức, cá nhân kinh doanh) để thực hiện chương trình khuyến mại (theo quy
+    định của pháp luật về hoạt động xúc tiến thương mại), tiếp thị, trưng bày sản
+    phẩm cho Công ty (nhà phân phối nhận tiền này để thực hiện dịch vụ cho Công ty)
+    thì khi nhận tiền, trường hợp nhà phân phối là người nộp thuế GTGT theo phương
+    pháp khấu trừ lập hóa đơn GTGT và tính thuế GTGT theo thuế suất 10%, trường hợp
+    nhà phân phối là người nộp thuế GTGT theo phương pháp trực tiếp thì sử dụng hóa
+    đơn bán hàng và xác định số thuế phải nộp theo tỷ lệ (%) trên doanh thu theo quy
+    định.'
+- source_sentence: Công trình trong đô thị cần đáp ứng điều kiện nào để được cấp giấy
+    phép xây dựng?
+  sentences:
+  - Mức phạt tiền của mỗi hành vi vi phạm hành chính quy định tại Nghị định này là
+    mức phạt áp dụng đối với cá nhân; trường hợp có cùng một hành vi vi phạm hành
+    chính thì mức phạt tiền đối với tổ chức bằng 02 lần mức phạt tiền đối với cá nhân.
+  - "1. Việc đánh giá viên chức được xem xét theo các nội dung sau:\t\ta) Kết quả\
+    \ thực hiện công việc hoặc nhiệm vụ theo hợp đồng làm việc đã ký kết;\t\tb) Việc\
+    \ thực hiện quy định về đạo đức nghề nghiệp;\t\tc) Tinh thần trách nhiệm, thái\
+    \ độ phục vụ nhân dân, tinh thần hợp tác với đồng nghiệp và việc thực hiện quy\
+    \ tắc ứng xử của viên chức;\t\td) Việc thực hiện các nghĩa vụ khác của viên chức.\n\
+    \n2. Việc đánh giá viên chức quản lý được xem xét theo các nội dung quy định tại\
+    \ khoản 1 Điều này và các nội dung sau:\t\ta) Năng lực lãnh đạo, quản lý, điều\
+    \ hành và tổ chức thực hiện nhiệm vụ;\t\tb) Kết quả hoạt động của đơn vị được\
+    \ giao quản lý, phụ trách.\n\n3. Việc đánh giá viên chức được thực hiện hàng năm;\
+    \ khi kết thúc thời gian tập sự; trước khi ký tiếp hợp đồng làm việc; thay đổi\
+    \ vị trí việc làm; xét khen thưởng, kỷ luật, bổ nhiệm, bổ nhiệm lại, quy hoạch,\
+    \ đào tạo, bồi đường."
+  - "1. Đơn đề nghị cấp giấy phép di dời công trình.\n\n2. Bản sao giấy tờ chứng minh\
+    \ quyền sử dụng đất nơi công trình sẽ di dời đến và giấy tờ hợp pháp về sở hữu\
+    \ công trình theo quy định của pháp luật.\n\n3. Bản vẽ hoàn công công trình (nếu\
+    \ có) hoặc bản vẽ thiết kế mô tả thực trạng công trình được di dời, gồm mặt bằng,\
+    \ mặt cắt móng và bản vẽ kết cấu chịu lực chính; bản vẽ tổng mặt bằng địa điểm\
+    \ công trình sẽ được di dời tới; bản vẽ mặt bằng, mặt cắt móng tại địa điểm công\
+    \ trình sẽ di dời đến.\n\n4. Báo cáo kết quả khảo sát đánh giá chất lượng hiện\
+    \ trạng của công trình do tổ chức, cá nhân có đủ điều kiện năng lực thực hiện.\n\
+    \n5. Phương án di dời do tổ chức, cá nhân có đủ điều kiện năng lực thực hiện gồm:\t\
+    \ta) Phần thuyết minh về hiện trạng công trình và khu vực công trình sẽ được di\
+    \ dời đến; giải pháp di dời, phương án bố trí sử dụng phương tiện, thiết bị, nhân\
+    \ lực; giải pháp bảo đảm an toàn cho công trình, người, máy móc, thiết bị và công\
+    \ trình lân cận; bảo đảm vệ sinh môi trường; tiến độ di dời; tổ chức, cá nhân\
+    \ thực hiện di dời công trình;\t\tb) Phần bản vẽ biện pháp thi công di dời công\
+    \ trình."
+- source_sentence: Trách nhiệm của Trưởng phòng Tiền tệ - Kho quỹ Ngân hàng Nhà nước
+    chi nhánh trong việc quản lý tiền mặt, tài sản quý, giấy tờ có giá là gì?
+  sentences:
+  - '1. Hướng dẫn, kiểm tra nghiệp vụ quản lý an toàn kho quỹ; tổ chức việc thu, chi
+    (xuất, nhập), bảo quản, vận chuyển tiền mặt, tài sản quý, giấy tờ có giá theo
+    quy định.
+    2. Hướng dẫn, kiểm tra việc mở và ghi chép sổ sách của thủ quỹ, thủ kho tiền.
+    3. Tham gia kiểm tra, kiểm kê, bàn giao tiền mặt, tài sản quý, giấy tờ có giá.'
+  - "1. Người có đủ các điều kiện sau đây không phân biệt dân tộc, nam nữ, thành phần\
+    \ xã hội, tín ngưỡng, tôn giáo được đăng ký dự tuyển viên chức: \t\ta) Có quốc\
+    \ tịch Việt Nam và cư trú tại Việt Nam;\t\tb) Từ đủ 18 tuổi trở lên. Đối với một\
+    \ số lĩnh vực hoạt động văn hóa, nghệ thuật, thể dục, thể thao, tuổi dự tuyển\
+    \ có thể thấp hơn theo quy định của pháp luật; đồng thời, phải có sự đồng ý bằng\
+    \ văn bản của người đại diện theo pháp luật;\t\tc) Có đơn đăng ký dự tuyển;\t\t\
+    d) Có lý lịch rõ ràng;\t\tđ) Có văn bằng, chứng chỉ đào tạo, chứng chỉ hành nghề\
+    \ hoặc có năng khiếu kỹ năng phù hợp với vị trí việc làm;\t\te) Đủ sức khoẻ để\
+    \ thực hiện công việc hoặc nhiệm vụ;\t\tg) Đáp ứng các điều kiện khác theo yêu\
+    \ cầu của vị trí việc làm do đơn vị sự nghiệp công lập xác định nhưng không được\
+    \ trái với quy định của pháp luật.\n\n2. Những người sau đây không được đăng ký\
+    \ dự tuyển viên chức:\t\ta) Mất năng lực hành vi dân sự hoặc bị hạn chế năng lực\
+    \ hành vi dân sự;\t\tb) Đang bị truy cứu trách nhiệm hình sự; đang chấp hành bản\
+    \ án, quyết định về hình sự của Tòa án; đang bị áp dụng biện pháp xử lý hành chính\
+    \ đưa vào cơ sở chữa bệnh, cơ sở giáo dục, trường giáo dưỡng."
+  - "1. Người nộp thuế phát hiện hồ sơ khai thuế đã nộp cho cơ quan thuế có sai, sót\
+    \ thì được khai bổ sung hồ sơ khai thuế trong thời hạn 10 năm kể từ ngày hết thời\
+    \ hạn nộp hồ sơ khai thuế của kỳ tính thuế có sai, sót nhưng trước khi cơ quan\
+    \ thuế, cơ quan có thẩm quyền công bố quyết định thanh tra, kiểm tra.\n\n2. Khi\
+    \ cơ quan thuế, cơ quan có thẩm quyền đã công bố quyết định thanh tra, kiểm tra\
+    \ thuế tại trụ sở của người nộp thuế thì người nộp thuế vẫn được khai bổ sung\
+    \ hồ sơ khai thuế; cơ quan thuế thực hiện xử phạt vi phạm hành chính về quản lý\
+    \ thuế đối với hành vi quy định tại Điều 142 và Điều 143 của Luật này.\n\n3. Sau\
+    \ khi cơ quan thuế, cơ quan có thẩm quyền đã ban hành kết luận, quyết định xử\
+    \ lý về thuế sau thanh tra, kiểm tra tại trụ sở của người nộp thuế thì việc khai\
+    \ bổ sung hồ sơ khai thuế được quy định như sau:\t\ta) Người nộp thuế được khai\
+    \ bổ sung hồ sơ khai thuế đối với trường hợp làm tăng số tiền thuế phải nộp, giảm\
+    \ số tiền thuế được khấu trừ hoặc giảm số tiền thuế được miễn, giảm, hoàn và bị\
+    \ xử phạt vi phạm hành chính về quản lý thuế đối với hành vi quy định tại Điều\
+    \ 142 và Điều 143 của Luật này;\t\tb) Trường hợp người nộp thuế phát hiện hồ sơ\
+    \ khai thuế có sai, sót nếu khai bổ sung làm giảm số tiền thuế phải nộp hoặc làm\
+    \ tăng số tiền thuế được khấu trừ, tăng số tiền thuế được miễn, giảm, hoàn thì\
+    \ thực hiện theo quy định về giải quyết khiếu nại về thuế.\n\n4. Hồ sơ khai bổ\
+    \ sung hồ sơ khai thuế bao gồm:\t\ta) Tờ khai bổ sung;\t\tb) Bản giải trình khai\
+    \ bổ sung và các tài liệu có liên quan.\n\n5. Đối với hàng hóa xuất khẩu, nhập\
+    \ khẩu, việc khai bổ sung hồ sơ khai thuế thực hiện theo quy định của pháp luật\
+    \ về hải quan."
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+---
+# SentenceTransformer based on Alibaba-NLP/gte-multilingual-base
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+## Model Details
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-base) <!-- at revision 9bbca17d9273fd0d03d5725c7a4b0f6b45142062 -->
+- **Maximum Sequence Length:** 512 tokens
+- **Output Dimensionality:** 768 dimensions
+- **Similarity Function:** Cosine Similarity
+<!-- - **Training Dataset:** Unknown -->
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+### Model Sources
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+### Full Model Architecture
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'NewModel'})
+  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+  (2): Normalize()
+)
+```
+## Usage
+### Direct Usage (Sentence Transformers)
+First install the Sentence Transformers library:
+```bash
+pip install -U sentence-transformers
+```
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
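+# Download from the 🤗 Hub. "sentence_transformers_model_id" is a placeholder for this repository's ID.
+# This commit ships custom modeling code (modeling.py, configuration.py), so loading
+# likely requires trust_remote_code=True, as with the base gte-multilingual-base model.
+model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)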
+# Run inference
+sentences = [
+    'Trách nhiệm của Trưởng phòng Tiền tệ - Kho quỹ Ngân hàng Nhà nước chi nhánh trong việc quản lý tiền mặt, tài sản quý, giấy tờ có giá là gì?',
+    '1. Hướng dẫn, kiểm tra nghiệp vụ quản lý an toàn kho quỹ; tổ chức việc thu, chi (xuất, nhập), bảo quản, vận chuyển tiền mặt, tài sản quý, giấy tờ có giá theo quy định.\n\n2. Hướng dẫn, kiểm tra việc mở và ghi chép sổ sách của thủ quỹ, thủ kho tiền.\n\n3. Tham gia kiểm tra, kiểm kê, bàn giao tiền mặt, tài sản quý, giấy tờ có giá.',
+    '1. Người có đủ các điều kiện sau đây không phân biệt dân tộc, nam nữ, thành phần xã hội, tín ngưỡng, tôn giáo được đăng ký dự tuyển viên chức: \t\ta) Có quốc tịch Việt Nam và cư trú tại Việt Nam;\t\tb) Từ đủ 18 tuổi trở lên. Đối với một số lĩnh vực hoạt động văn hóa, nghệ thuật, thể dục, thể thao, tuổi dự tuyển có thể thấp hơn theo quy định của pháp luật; đồng thời, phải có sự đồng ý bằng văn bản của người đại diện theo pháp luật;\t\tc) Có đơn đăng ký dự tuyển;\t\td) Có lý lịch rõ ràng;\t\tđ) Có văn bằng, chứng chỉ đào tạo, chứng chỉ hành nghề hoặc có năng khiếu kỹ năng phù hợp với vị trí việc làm;\t\te) Đủ sức khoẻ để thực hiện công việc hoặc nhiệm vụ;\t\tg) Đáp ứng các điều kiện khác theo yêu cầu của vị trí việc làm do đơn vị sự nghiệp công lập xác định nhưng không được trái với quy định của pháp luật.\n\n2. Những người sau đây không được đăng ký dự tuyển viên chức:\t\ta) Mất năng lực hành vi dân sự hoặc bị hạn chế năng lực hành vi dân sự;\t\tb) Đang bị truy cứu trách nhiệm hình sự; đang chấp hành bản án, quyết định về hình sự của Tòa án; đang bị áp dụng biện pháp xử lý hành chính đưa vào cơ sở chữa bệnh, cơ sở giáo dục, trường giáo dưỡng.',
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# (3, 768)
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities)
+# tensor([[ 1.0000,  0.8502, -0.0614],
+#         [ 0.8502,  1.0000, -0.0440],
+#         [-0.0614, -0.0440,  1.0000]])
+```
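
Reviewer note: for retrieval, the similarity matrix above is usually reduced to a ranking of passages per query. This small sketch reuses the two query-to-passage scores printed in the README example (row 0, excluding the self-similarity); the passage labels are hypothetical shorthand, not text from the model card.

```python
import numpy as np

# Cosine similarities between the query (sentence 0) and the two passages,
# copied from the README's example output.
passage_scores = np.array([0.8502, -0.0614])

# Hypothetical short labels for the two passages in the example.
passages = ["treasury-department duties passage", "public-employee recruitment passage"]

# Sort passage indices by descending similarity.
ranking = np.argsort(-passage_scores)
best_passage = passages[ranking[0]]
print(ranking.tolist(), best_passage)
```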
+<!--
+### Direct Usage (Transformers)
+<details><summary>Click to see the direct usage in Transformers</summary>
+</details>
+-->
+<!--
+### Downstream Usage (Sentence Transformers)
+You can finetune this model on your own dataset.
+<details><summary>Click to expand</summary>
+</details>
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Dataset
+#### Unnamed Dataset
+* Size: 2,864 training samples
+* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
+* Approximate statistics based on the first 1000 samples:
+  |         | sentence_0                                                                        | sentence_1                                                                           | label                                                         |
+  |:--------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:--------------------------------------------------------------|
+  | type    | string                                                                            | string                                                                               | float                                                         |
+  | details | <ul><li>min: 14 tokens</li><li>mean: 23.6 tokens</li><li>max: 55 tokens</li></ul> | <ul><li>min: 15 tokens</li><li>mean: 299.02 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 1.0</li><li>mean: 1.0</li><li>max: 1.0</li></ul> |
+* Samples:
+  | sentence_0                                                                                                         | sentence_1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | label            |
+  |:-------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
+  | <code>Điều kiện để được cấp chứng chỉ bồi dưỡng của cán bộ, công chức, viên chức được quy định như thế nào?</code> | <code>Chứng chỉ chương trình<br>bồi dưỡng được cấp cho học viên có đủ các điều kiện sau đây:1. Tham gia học tập đầy<br>đủ theo quy định của chương trình bồi dưỡng.<br><br>2. Có đủ các bài kiểm<br>tra, viết thu hoạch, tiểu luận, đề án theo quy định của chương trình bồi dưỡng;<br>các bài kiểm tra, viết thu hoạch, tiểu luận, đề án phải đạt từ 50% số điểm trở<br>lên theo thang điểm quy định của chương trình.<br><br>3. Chấp hành đúng, đầy<br>đủ các quy định của pháp luật về bồi dưỡng cán bộ, công chức, viên chức; quy<br>chế, nội quy học tập của Học viện Hành chính Quốc gia; cơ sở đào tạo, bồi<br>dưỡng; cơ sở đào tạo, nghiên cứu.</code> | <code>1.0</code> |
+  | <code>Vi bằng phải đảm bảo nội dung và hình thức thế nào?</code>                                                   | <code>1. Vi bằng được lập bằng văn bản tiếng Việt, có nội dung chủ yếu sau đây:		a) Tên, địa chỉ Văn phòng Thừa phát lại; họ, tên Thừa phát lại lập vi bằng;		b) Địa điểm, thời gian lập vi bằng;		c) Họ, tên, địa chỉ người yêu cầu lập vi bằng;		d) Họ, tên người tham gia khác (nếu có);		đ) Nội dung yêu cầu lập vi bằng; nội dung cụ thể của sự kiện, hành vi được ghi nhận;		e) Lời cam đoan của Thừa phát lại về tính trung thực và khách quan trong việc lập vi bằng;		g) Chữ ký của Thừa phát lại, dấu Văn phòng Thừa phát lại, chữ ký hoặc dấu điểm chỉ của người yêu cầu, người tham gia khác (nếu có) và người có hành vi bị lập vi bằng (nếu họ có yêu cầu).Vi bằng có từ 02 trang trở lên thì từng trang phải được đánh số thứ tự; vi bằng có từ 02 tờ trở lên phải được đóng dấu giáp lai giữa các tờ; số lượng bản chính của mỗi vi bằng do các bên tự thỏa thuận.<br><br>2. Kèm theo vi bằng có thể có các tài liệu chứng minh; trường hợp tài liệu chứng minh do Thừa phát lại lập thì phải phù hợp với thẩm quyền, phạm vi theo q...</code>       | <code>1.0</code> |
+  | <code>Người cai nghiện tại cơ sở cai nghiện bắt buộc được thăm gặp người thân mấy lần mỗi tuần?</code>             | <code>1. Người cai nghiện được thăm gặp người thân tại phòng thăm gặp của cơ sở cai nghiện bắt buộc, một tuần một lần, mỗi lần không quá 02 giờ và tối đa không quá 03 thân nhân. Trường hợp gặp lâu hơn phải được Giám đốc cơ sở cai nghiện bắt buộc đồng ý và tối đa không quá 04 giờ.Người cai nghiện có vợ hoặc chồng, được Giám đốc cơ sở cai nghiện bắt buộc xem xét, cho phép thăm gặp tại phòng riêng của cơ sở cai nghiện bắt buộc một lần trong tháng và tối đa không quá 48 giờ cho một lần gặp. Căn cứ quy mô và điều kiện, cơ sở cai nghiện xây dựng, tổ chức phòng riêng để học viên thăm gặp vợ hoặc chồng.<br><br>2. Cơ sở cai nghiện xây dựng quy chế thăm gặp theo theo hướng dẫn của Bộ trưởng Bộ Lao động - Thương binh và Xã hội.<br><br>3. Người cai nghiện được nhận và gửi thư, nhận tiền, quà (trừ rượu, bia, thuốc lá, các chất kích thích, đồ vật và các loại văn hóa phẩm bị cấm). Cơ sở cai nghiện có trách nhiệm kiểm tra thư và các loại quà trước khi trao cho người cai nghiện. Riêng tiền hoặc giấy tờ có giá, người cai...</code> | <code>1.0</code> |
+* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+  ```json
+  {
+      "scale": 20.0,
+      "similarity_fct": "cos_sim"
+  }
+  ```
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+- `per_device_train_batch_size`: 24
+- `per_device_eval_batch_size`: 24
+- `num_train_epochs`: 30
+- `fp16`: True
+- `multi_dataset_batch_sampler`: round_robin
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+- `overwrite_output_dir`: False
+- `do_predict`: False
+- `eval_strategy`: no
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 24
+- `per_device_eval_batch_size`: 24
+- `per_gpu_train_batch_size`: None
+- `per_gpu_eval_batch_size`: None
+- `gradient_accumulation_steps`: 1
+- `eval_accumulation_steps`: None
+- `torch_empty_cache_steps`: None
+- `learning_rate`: 5e-05
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1
+- `num_train_epochs`: 30
+- `max_steps`: -1
+- `lr_scheduler_type`: linear
+- `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.0
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `save_safetensors`: True
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `no_cuda`: False
+- `use_cpu`: False
+- `use_mps_device`: False
+- `seed`: 42
+- `data_seed`: None
+- `jit_mode_eval`: False
+- `use_ipex`: False
+- `bf16`: False
+- `fp16`: True
+- `fp16_opt_level`: O1
+- `half_precision_backend`: auto
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: None
+- `local_rank`: 0
+- `ddp_backend`: None
+- `tpu_num_cores`: None
+- `tpu_metrics_debug`: False
+- `debug`: []
+- `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
+- `past_index`: -1
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: False
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_min_num_params`: 0
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `fsdp_transformer_layer_cls_to_wrap`: None
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch
+- `optim_args`: None
+- `adafactor`: False
+- `group_by_length`: False
+- `length_column_name`: length
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `use_legacy_prediction_loop`: False
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: None
+- `hub_always_push`: False
+- `hub_revision`: None
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_inputs_for_metrics`: False
+- `include_for_metrics`: []
+- `eval_do_concat_batches`: True
+- `fp16_backend`: auto
+- `push_to_hub_model_id`: None
+- `push_to_hub_organization`: None
+- `mp_parameters`:
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `torchdynamo`: None
+- `ray_scope`: last
+- `ddp_timeout`: 1800
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `include_tokens_per_second`: False
+- `include_num_input_tokens_seen`: False
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `eval_on_start`: False
+- `use_liger_kernel`: False
+- `liger_kernel_config`: None
+- `eval_use_gather_object`: False
+- `average_tokens_across_devices`: False
+- `prompts`: None
+- `batch_sampler`: batch_sampler
+- `multi_dataset_batch_sampler`: round_robin
+- `router_mapping`: {}
+- `learning_rate_mapping`: {}
+</details>
+### Training Logs
+| Epoch   | Step | Training Loss |
+|:-------:|:----:|:-------------:|
+| 8.3333  | 500  | 0.0984        |
+| 16.6667 | 1000 | 0.0354        |
+| 25.0    | 1500 | 0.0288        |
+### Framework Versions
+- Python: 3.10.12
+- Sentence Transformers: 5.0.0
+- Transformers: 4.54.0
+- PyTorch: 2.7.1+cu126
+- Accelerate: 1.9.0
+- Datasets: 4.0.0
+- Tokenizers: 0.21.2
+## Citation
+### BibTeX
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+#### MultipleNegativesRankingLoss
+```bibtex
+@misc{henderson2017efficient,
+    title={Efficient Natural Language Response Suggestion for Smart Reply},
+    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+    year={2017},
+    eprint={1705.00652},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
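
Note on the loss configuration in the README above: `MultipleNegativesRankingLoss` is listed with `scale: 20.0` and `similarity_fct: cos_sim`. The sketch below is a plain-Python illustration of that idea, not the Sentence Transformers implementation. Each question is scored against every answer in the batch, the scores are multiplied by the scale, and cross-entropy is taken with the matching answer as the target. The helper names and the toy 2-d vectors are made up for illustration.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(queries, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the match for queries[i]
    and every other positive in the batch acts as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [scale * cos_sim(q, p) for p in positives]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)

# Toy embeddings (hypothetical): two questions and their answers.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # answer i is close to question i
swapped = [[0.1, 0.9], [0.9, 0.1]]   # answers deliberately mismatched

loss_aligned = mnr_loss(queries, aligned)
loss_swapped = mnr_loss(queries, swapped)
print(loss_aligned, loss_swapped)
```

With a scale of 20, a small gap in cosine similarity becomes a large gap in logits, so the loss is near zero once the pairs are well separated. That is consistent with the small training-loss values in the log table, though the table does not by itself tell us how well the model generalizes.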

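Note on the training log in the README above: the epoch and step columns are enough to estimate the size of the training set. The arithmetic below assumes a single device (the README does not say how many were used), together with the listed `per_device_train_batch_size` of 24, `gradient_accumulation_steps` of 1 and `dataloader_drop_last` of False.

```python
# (epoch, step) pairs copied from the Training Logs table.
logged = [(8.3333, 500), (16.6667, 1000), (25.0, 1500)]
batch_size = 24          # per_device_train_batch_size
num_train_epochs = 30    # num_train_epochs

# Every logged row should imply the same number of optimizer steps per epoch.
per_row = [round(step / epoch) for epoch, step in logged]
steps_per_epoch = per_row[0]
consistent = all(value == steps_per_epoch for value in per_row)

# Total steps over the full run, and the range of dataset sizes that would
# produce this many batches when the last partial batch is kept.
total_steps = steps_per_epoch * num_train_epochs
max_examples = steps_per_epoch * batch_size
min_examples = (steps_per_epoch - 1) * batch_size + 1

print(steps_per_epoch, total_steps, min_examples, max_examples)
```

Under those assumptions the run covers a fairly small dataset for 30 epochs, which is worth keeping in mind when reading the low final loss values.
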
config.json ADDED

	@@ -0,0 +1,49 @@

+{
+  "architectures": [
+    "NewModel"
+  ],
+  "attention_probs_dropout_prob": 0.0,
+  "auto_map": {
+    "AutoConfig": "configuration.NewConfig",
+    "AutoModel": "modeling.NewModel",
+    "AutoModelForMaskedLM": "Alibaba-NLP/new-impl--modeling.NewForMaskedLM",
+    "AutoModelForMultipleChoice": "Alibaba-NLP/new-impl--modeling.NewForMultipleChoice",
+    "AutoModelForQuestionAnswering": "Alibaba-NLP/new-impl--modeling.NewForQuestionAnswering",
+    "AutoModelForSequenceClassification": "Alibaba-NLP/new-impl--modeling.NewForSequenceClassification",
+    "AutoModelForTokenClassification": "Alibaba-NLP/new-impl--modeling.NewForTokenClassification"
+  },
+  "classifier_dropout": 0.0,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "LABEL_0"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "label2id": {
+    "LABEL_0": 0
+  },
+  "layer_norm_eps": 1e-12,
+  "layer_norm_type": "layer_norm",
+  "logn_attention_clip1": false,
+  "logn_attention_scale": false,
+  "max_position_embeddings": 8192,
+  "model_type": "new",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pack_qkv": true,
+  "pad_token_id": 1,
+  "position_embedding_type": "rope",
+  "rope_scaling": {
+    "factor": 8.0,
+    "type": "ntk"
+  },
+  "rope_theta": 20000,
+  "torch_dtype": "float32",
+  "transformers_version": "4.54.0",
+  "type_vocab_size": 1,
+  "unpad_inputs": false,
+  "use_memory_efficient_attention": false,
+  "vocab_size": 250048
+}
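
Note on the RoPE settings in config.json: a few derived quantities may help when reading `modeling.py` later in this commit. The sketch below copies only the relevant keys and mirrors, in plain Python, what `NTKScalingRotaryEmbedding._set_cos_sin_cache` appears to do when `mixed_b` is not set: multiply the base by the scaling factor and divide the inverse frequencies by `factor ** (2 / dim)`. It is a reading aid, not the model code.

```python
import json

# Subset of config.json from this commit.
config = json.loads("""
{
  "hidden_size": 768,
  "num_attention_heads": 12,
  "max_position_embeddings": 8192,
  "rope_theta": 20000,
  "rope_scaling": {"factor": 8.0, "type": "ntk"}
}
""")

head_dim = config["hidden_size"] // config["num_attention_heads"]
factor = config["rope_scaling"]["factor"]

# Length of the cos/sin cache built in NTKScalingRotaryEmbedding.__init__.
cache_len = int(config["max_position_embeddings"] * factor)

def inv_freq(base, dim):
    """Inverse RoPE frequencies, one per pair of head dimensions."""
    return [1.0 / base ** (i / dim) for i in range(0, dim, 2)]

plain = inv_freq(config["rope_theta"], head_dim)
scaled_base = config["rope_theta"] * factor
ntk = [f / factor ** (2 / head_dim) for f in inv_freq(scaled_base, head_dim)]

print(head_dim, cache_len, plain[0], ntk[0])
```

Every NTK-scaled frequency is lower than its unscaled counterpart, which stretches the rotation period so that longer sequences fit in the positional range.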

config_sentence_transformers.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "model_type": "SentenceTransformer",
+  "__version__": {
+    "sentence_transformers": "5.0.0",
+    "transformers": "4.54.0",
+    "pytorch": "2.7.1+cu126"
+  },
+  "prompts": {
+    "query": "",
+    "document": ""
+  },
+  "default_prompt_name": null,
+  "similarity_fn_name": "cosine"
+}
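
Note on pooling and similarity: `1_Pooling/config.json` enables only `pooling_mode_cls_token`, and the file above sets `similarity_fn_name` to `cosine` with empty prompts. Assuming the usual Sentence Transformers pipeline, a sentence vector is the first token's hidden state and two sentences are compared by cosine similarity. The plain-Python sketch below illustrates that with made-up 3-d token embeddings. The helper names are hypothetical.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector, as CLS pooling does."""
    return token_embeddings[0]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy token embeddings, shape [tokens][dim]; only row 0 matters for CLS pooling.
doc_a = [[1.0, 0.0, 0.0], [0.2, 0.9, 0.1]]
doc_b = [[2.0, 0.0, 0.0], [0.0, 0.0, 5.0]]
doc_c = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

sim_ab = cosine(cls_pool(doc_a), cls_pool(doc_b))  # same direction
sim_ac = cosine(cls_pool(doc_a), cls_pool(doc_c))  # orthogonal
print(sim_ab, sim_ac)
```

Cosine similarity ignores vector length, so `doc_a` and `doc_b` score as identical even though their first-token vectors differ in magnitude.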

configuration.py ADDED

	@@ -0,0 +1,145 @@

+# coding=utf-8
+# Copyright 2024 The GTE Team Authors and Alibaba Group.
+# Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+""" NEW model configuration"""
+from transformers.configuration_utils import PretrainedConfig
+from transformers.utils import logging
+logger = logging.get_logger(__name__)
+class NewConfig(PretrainedConfig):
+    r"""
+    This is the configuration class to store the configuration of a [`NewModel`] or a [`TFNewModel`]. It is used to
+    instantiate a NEW model according to the specified arguments, defining the model architecture. Instantiating a
+    configuration with the defaults will yield a similar configuration to that of the NEW
+    [izhx/new-base-en](https://huggingface.co/izhx/new-base-en) architecture.
+    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+    documentation from [`PretrainedConfig`] for more information.
+    Args:
+        vocab_size (`int`, *optional*, defaults to 30528):
+            Vocabulary size of the NEW model. Defines the number of different tokens that can be represented by the
+            `input_ids` passed when calling [`NewModel`] or [`TFNewModel`].
+        hidden_size (`int`, *optional*, defaults to 768):
+            Dimensionality of the encoder layers and the pooler layer.
+        num_hidden_layers (`int`, *optional*, defaults to 12):
+            Number of hidden layers in the Transformer encoder.
+        num_attention_heads (`int`, *optional*, defaults to 12):
+            Number of attention heads for each attention layer in the Transformer encoder.
+        intermediate_size (`int`, *optional*, defaults to 3072):
+            Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
+        hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
+            The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
+            `"relu"`, `"silu"` and `"gelu_new"` are supported.
+        hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
+            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
+        attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
+            The dropout ratio for the attention probabilities.
+        max_position_embeddings (`int`, *optional*, defaults to 2048):
+            The maximum sequence length that this model might ever be used with. Typically set this to something large
+            just in case (e.g., 512 or 1024 or 2048).
+        type_vocab_size (`int`, *optional*, defaults to 1):
+            The vocabulary size of the `token_type_ids` passed when calling [`NewModel`] or [`TFNewModel`].
+        initializer_range (`float`, *optional*, defaults to 0.02):
+            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+        layer_norm_eps (`float`, *optional*, defaults to 1e-12):
+            The epsilon used by the layer normalization layers.
+        position_embedding_type (`str`, *optional*, defaults to `"rope"`):
+            Type of position embedding. Choose one of `"absolute"`, `"rope"`.
+        rope_theta (`float`, *optional*, defaults to 10000.0):
+            The base period of the RoPE embeddings.
+        rope_scaling (`Dict`, *optional*):
+            Dictionary containing the scaling configuration for the RoPE embeddings. The accompanying `modeling.py`
+            currently accepts only the `"ntk"` strategy (optionally with a `mixed_b` entry); any other type raises a
+            `ValueError`. The scaling factor must be a float greater than 1. The expected format is
+            `{"type": strategy name, "factor": scaling factor}`. When using this flag, don't update
+            `max_position_embeddings` to the expected new maximum. See the following thread for more information on how
+            these scaling strategies behave:
+            https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases/. This is an
+            experimental feature, subject to breaking API changes in future versions.
+        classifier_dropout (`float`, *optional*):
+            The dropout ratio for the classification head.
+    Examples:
+    ```python
+    >>> from transformers import NewConfig, NewModel
+    >>> # Initializing a NEW izhx/new-base-en style configuration
+    >>> configuration = NewConfig()
+    >>> # Initializing a model (with random weights) from the izhx/new-base-en style configuration
+    >>> model = NewModel(configuration)
+    >>> # Accessing the model configuration
+    >>> configuration = model.config
+    ```"""
+    model_type = "new"
+    def __init__(
+        self,
+        vocab_size=30528,
+        hidden_size=768,
+        num_hidden_layers=12,
+        num_attention_heads=12,
+        intermediate_size=3072,
+        hidden_act="gelu",
+        hidden_dropout_prob=0.1,
+        attention_probs_dropout_prob=0.0,
+        max_position_embeddings=2048,
+        type_vocab_size=1,
+        initializer_range=0.02,
+        layer_norm_type='layer_norm',
+        layer_norm_eps=1e-12,
+        # pad_token_id=0,
+        position_embedding_type="rope",
+        rope_theta=10000.0,
+        rope_scaling=None,
+        classifier_dropout=None,
+        pack_qkv=True,
+        unpad_inputs=False,
+        use_memory_efficient_attention=False,
+        logn_attention_scale=False,
+        logn_attention_clip1=False,
+        **kwargs,
+    ):
+        super().__init__(**kwargs)
+        self.vocab_size = vocab_size
+        self.hidden_size = hidden_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.hidden_act = hidden_act
+        self.intermediate_size = intermediate_size
+        self.hidden_dropout_prob = hidden_dropout_prob
+        self.attention_probs_dropout_prob = attention_probs_dropout_prob
+        self.max_position_embeddings = max_position_embeddings
+        self.type_vocab_size = type_vocab_size
+        self.initializer_range = initializer_range
+        self.layer_norm_type = layer_norm_type
+        self.layer_norm_eps = layer_norm_eps
+        self.position_embedding_type = position_embedding_type
+        self.rope_theta = rope_theta
+        self.rope_scaling = rope_scaling
+        self.classifier_dropout = classifier_dropout
+        self.pack_qkv = pack_qkv
+        self.unpad_inputs = unpad_inputs
+        self.use_memory_efficient_attention = use_memory_efficient_attention
+        self.logn_attention_scale = logn_attention_scale
+        self.logn_attention_clip1 = logn_attention_clip1
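
Note on defaults versus the shipped config: the `NewConfig.__init__` defaults above are not the values this checkpoint uses, because `config.json` in the same commit overrides several of them. The sketch below compares a hand-copied subset of both using plain dictionaries. It does not import the real class, and keys not listed here were not checked.

```python
# Defaults copied from NewConfig.__init__ in configuration.py (subset).
defaults = {
    "vocab_size": 30528,
    "hidden_size": 768,
    "max_position_embeddings": 2048,
    "type_vocab_size": 1,
    "attention_probs_dropout_prob": 0.0,
    "rope_theta": 10000.0,
    "rope_scaling": None,
    "classifier_dropout": None,
}

# Values copied from config.json in this commit (same keys).
shipped = {
    "vocab_size": 250048,
    "hidden_size": 768,
    "max_position_embeddings": 8192,
    "type_vocab_size": 1,
    "attention_probs_dropout_prob": 0.0,
    "rope_theta": 20000,
    "rope_scaling": {"factor": 8.0, "type": "ntk"},
    "classifier_dropout": 0.0,
}

# Keep only the keys whose shipped value differs from the class default.
overridden = {
    key: (defaults[key], shipped[key])
    for key in defaults
    if defaults[key] != shipped[key]
}

for key, (default, actual) in sorted(overridden.items()):
    print(f"{key}: default={default!r} shipped={actual!r}")
```

The larger vocabulary, longer position range and NTK scaling all come from the JSON file, so instantiating `NewConfig()` with no arguments would not reproduce this model's architecture.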

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4f0d0e530ea4cbccece8e1b744037b4f90de1e4e374707180b184052b04d641e
+size 1221487872
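
Note on the LFS pointer: the three lines above are a Git LFS pointer file, not the weights themselves. Its `size` field allows a rough parameter estimate. The sketch below parses the pointer and divides by 4 bytes per value, assuming all tensors are stored as float32 (config.json lists `torch_dtype` as `float32`). The result is an upper bound, since the safetensors file also contains a small JSON header.

```python
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:4f0d0e530ea4cbccece8e1b744037b4f90de1e4e374707180b184052b04d641e
size 1221487872
"""

# Each pointer line is "<key> <value>"; split on the first space only.
fields = dict(line.split(" ", 1) for line in pointer.strip().splitlines())

size_bytes = int(fields["size"])
bytes_per_param = 4  # assumption: float32 storage for every tensor
approx_params = size_bytes // bytes_per_param

print(fields["oid"])
print(f"{size_bytes} bytes, at most about {approx_params / 1e6:.0f}M parameters")
```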

modeling.py ADDED

	@@ -0,0 +1,1418 @@

+# coding=utf-8
+# Copyright 2024 The GTE Team Authors and Alibaba Group.
+# Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""PyTorch NEW model."""
+import math
+from dataclasses import dataclass
+from typing import List, Optional, Tuple, Union
+import torch
+import torch.utils.checkpoint
+from torch import nn
+from transformers.activations import ACT2FN
+from transformers.modeling_outputs import (
+    BaseModelOutput,
+    BaseModelOutputWithPooling,
+    MaskedLMOutput,
+    MultipleChoiceModelOutput,
+    QuestionAnsweringModelOutput,
+    SequenceClassifierOutput,
+    ModelOutput,
+)
+from transformers.modeling_utils import PreTrainedModel
+from transformers.utils import logging
+try:
+    # xformers is optional; it is only needed for memory-efficient attention.
+    import xformers.ops as xops
+except ImportError:
+    xops = None
+from .configuration import NewConfig
+logger = logging.get_logger(__name__)
+# Adapted from https://github.com/HazyResearch/flash-attention/blob/main/flash_attn/bert_padding.py
+# Which was adapted from https://github.com/mlcommons/training_results_v1.1/blob/main/NVIDIA/benchmarks/bert/implementations/pytorch/padding.py
+class IndexFirstAxis(torch.autograd.Function):
+    @staticmethod
+    def forward(ctx, input, indices):
+        ctx.save_for_backward(indices)
+        assert input.ndim >= 2
+        ctx.first_axis_dim, other_shape = input.shape[0], input.shape[1:]
+        second_dim = other_shape.numel()
+        # TD [2022-03-04] For some reason torch.gather is a bit faster than indexing.
+        # return input[indices]
+        # return torch.gather(
+        #     rearrange(input, "b ... -> b (...)"), 0, repeat(indices, "z -> z d", d=second_dim)
+        # ).reshape(-1, *other_shape)
+        return torch.gather(
+            input.view(ctx.first_axis_dim, second_dim),
+            0,
+            indices.unsqueeze(-1).expand(indices.size(0), second_dim)
+        ).reshape(-1, *other_shape)
+    @staticmethod
+    def backward(ctx, grad_output):
+        (indices,) = ctx.saved_tensors
+        assert grad_output.ndim >= 2
+        other_shape = grad_output.shape[1:]
+        # grad_output = rearrange(grad_output, "b ... -> b (...)")
+        grad_output = grad_output.view(grad_output.size(0), other_shape.numel())
+        grad_input = torch.zeros(
+            [ctx.first_axis_dim, grad_output.shape[1]],
+            device=grad_output.device,
+            dtype=grad_output.dtype,
+        )
+        # TD [2022-03-04] For some reason torch.scatter is a bit faster than indexing.
+        # grad_input[indices] = grad_output
+        # grad_input.scatter_(0, repeat(indices, "z -> z d", d=grad_output.shape[1]), grad_output)
+        grad_input.scatter_(
+            0, indices.unsqueeze(-1).expand(indices.size(0), grad_output.size(1)), grad_output
+        )
+        return grad_input.reshape(ctx.first_axis_dim, *other_shape), None
+index_first_axis = IndexFirstAxis.apply
+def unpad_input(hidden_states, attention_mask=None, indices=None):
+    """
+    Arguments:
+        hidden_states: (batch, seqlen, ...)
+        attention_mask: (batch, seqlen), bool / int, 1 means valid and 0 means not valid.
+        indices: (total_nnz), the indices of non-masked tokens from the flattened input sequence.
+    Return:
+        hidden_states: (total_nnz, ...), where total_nnz = number of tokens selected in attention_mask.
+    """
+    if indices is None:
+        assert attention_mask is not None
+        indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
+    # TD [2022-03-04] We don't want to index with a bool mask, because PyTorch will expand the
+    # bool mask, then call nonzero to get the indices, then index with those. The index tensor is @dim
+    # times larger than it needs to be, wasting memory. It's faster and more memory-efficient to
+    # index with integer indices. Moreover, torch's index is a bit slower than it needs to be,
+    # so we write custom forward and backward to make it a bit faster.
+    hidden_states = hidden_states.view(-1, *hidden_states.shape[2:])
+    return index_first_axis(hidden_states, indices)
+class IndexPutFirstAxis(torch.autograd.Function):
+    @staticmethod
+    def forward(
+        ctx,
+        values: torch.Tensor,
+        indices: torch.Tensor,
+        first_axis_dim
+    ) -> torch.Tensor:
+        ctx.save_for_backward(indices)
+        assert indices.ndim == 1
+        assert values.ndim >= 2
+        output = torch.zeros(
+            first_axis_dim, *values.shape[1:], device=values.device, dtype=values.dtype
+        )
+        output[indices] = values
+        return output
+    @staticmethod
+    def backward(ctx, grad_output: torch.Tensor) -> Tuple[torch.Tensor, None, None]:
+        indices, = ctx.saved_tensors
+        grad_values = grad_output[indices]
+        return grad_values, None, None
+index_put_first_axis = IndexPutFirstAxis.apply
+def pad_input(inputs: torch.Tensor, indices: torch.Tensor, batch: int, seqlen: int) -> torch.Tensor:
+    """Add padding to sequences.
+    Arguments:
+        inputs: (total_nnz, ...), where total_nnz = number of tokens selected in attention_mask.
+        indices: (total_nnz), `indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()`
+        batch: int batch_size
+        seqlen: int max sequence length
+    Returns:
+        inputs: (batch, seqlen, ...)
+    """
+    output = index_put_first_axis(inputs, indices, batch * seqlen)
+    return output.view(batch, seqlen, *inputs.shape[1:])
+def rotate_half(x):
+    """Rotates half the hidden dims of the input."""
+    x1 = x[..., : x.shape[-1] // 2]
+    x2 = x[..., x.shape[-1] // 2 :]
+    return torch.cat((-x2, x1), dim=-1)
+def apply_rotary_pos_emb(q, k, cos, sin):
+    """Applies Rotary Position Embedding to the query and key tensors.
+    Args:
+        q (`torch.Tensor`): The query tensor.
+        k (`torch.Tensor`): The key tensor.
+        cos (`torch.Tensor`): The cosine part of the rotary embedding.
+        sin (`torch.Tensor`): The sine part of the rotary embedding.
+    Returns:
+        `tuple(torch.Tensor)` comprising the query and key tensors rotated using the Rotary Position Embedding.
+    """
+    cos, sin = cos.to(q.dtype), sin.to(q.dtype)
+    q_embed = (q * cos) + (rotate_half(q) * sin)
+    k_embed = (k * cos) + (rotate_half(k) * sin)
+    return q_embed, k_embed
+class RotaryEmbedding(torch.nn.Module):
+    def __init__(self, dim, max_position_embeddings=512, base=10000.0, device=None):
+        super().__init__()
+        self.dim = dim
+        self.max_position_embeddings = max_position_embeddings
+        self.base = base
+        inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))
+        self.register_buffer("inv_freq", inv_freq, persistent=False)
+        # Build here to make `torch.jit.trace` work.
+        self._set_cos_sin_cache(
+            seq_len=max_position_embeddings, device=self.inv_freq.device, dtype=torch.get_default_dtype()
+        )
+    def _set_cos_sin_cache(self, seq_len, device, dtype):
+        self.max_seq_len_cached = seq_len
+        t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.float32)
+        freqs = torch.einsum("i,j->ij", t, self.inv_freq)
+        # Different from paper, but it uses a different permutation in order to obtain the same calculation
+        emb = torch.cat((freqs, freqs), dim=-1)
+        self.register_buffer("cos_cached", emb.cos().to(dtype), persistent=False)
+        self.register_buffer("sin_cached", emb.sin().to(dtype), persistent=False)
+    def forward(self, x, seq_len=None):
+        # x: [bs, num_attention_heads, seq_len, head_size]
+        if seq_len > self.max_seq_len_cached:
+            self._set_cos_sin_cache(seq_len=seq_len, device=x.device, dtype=x.dtype)
+        return (
+            self.cos_cached[:seq_len, ...].to(dtype=x.dtype),
+            self.sin_cached[:seq_len, ...].to(dtype=x.dtype),
+        )
+class NTKScalingRotaryEmbedding(RotaryEmbedding):
+    """RotaryEmbedding extended with fixed and mixed NTK scaling. https://kexue.fm/archives/9706 """
+    def __init__(self, dim, max_position_embeddings=512, base=10000, device=None, scaling_factor=1.0, mixed_b=None):
+        self.scaling_factor = scaling_factor
+        self.mixed_b = mixed_b
+        super().__init__(dim, max_position_embeddings, base, device)
+        max_position_embeddings = max_position_embeddings * self.scaling_factor
+        self._set_cos_sin_cache(max_position_embeddings, self.inv_freq.device, torch.get_default_dtype())
+    def _set_cos_sin_cache(self, seq_len, device, dtype):
+        self.max_seq_len_cached = seq_len
+        if seq_len > self.max_position_embeddings:
+            base = self.base * (self.scaling_factor if self.mixed_b is None else 1)
+            inv_freq = 1.0 / (base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))
+            if self.mixed_b is None:
+                inv_freq = inv_freq / self.scaling_factor ** (2 / self.dim)  # (6)
+            else:
+                a = torch.tensor(self.scaling_factor).log() / (self.dim / 2) ** self.mixed_b  # (13)
+                lambda_1_m = (a * torch.arange(1, self.dim // 2 + 1).float().to(device) ** self.mixed_b).exp()  # (12)
+                inv_freq = inv_freq / lambda_1_m  # (10)
+            self.register_buffer("inv_freq", inv_freq, persistent=False)
+        t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.float32)
+        freqs = torch.einsum("i,j->ij", t, self.inv_freq)
+        # Different from paper, but it uses a different permutation in order to obtain the same calculation
+        emb = torch.cat((freqs, freqs), dim=-1)
+        self.register_buffer("cos_cached", emb.cos().to(dtype), persistent=False)
+        self.register_buffer("sin_cached", emb.sin().to(dtype), persistent=False)
+class RMSNorm(nn.Module):
+    def __init__(self, hidden_size, eps=1e-6):
+        """
+        RMSNorm is equivalent to T5LayerNorm
+        """
+        super().__init__()
+        self.weight = nn.Parameter(torch.ones(hidden_size))
+        self.variance_epsilon = eps
+    def forward(self, hidden_states):
+        input_dtype = hidden_states.dtype
+        hidden_states = hidden_states.to(torch.float32)
+        variance = hidden_states.pow(2).mean(-1, keepdim=True)
+        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
+        return self.weight * hidden_states.to(input_dtype)
+LAYER_NORM = {
+    'layer_norm': nn.LayerNorm,
+    'rms_norm': RMSNorm
+}
+class NewEmbeddings(nn.Module):
+    """
+    Embedding and Unpadding.
+    """
+    def __init__(self, config: NewConfig):
+        super().__init__()
+        self.padding_idx = config.pad_token_id
+        self.word_embeddings = nn.Embedding(
+            config.vocab_size, config.hidden_size, padding_idx=self.padding_idx
+        )
+        self.position_embedding_type = config.position_embedding_type
+        if self.position_embedding_type == 'absolute':
+            self.position_embeddings = nn.Embedding(
+                config.max_position_embeddings, config.hidden_size, padding_idx=self.padding_idx
+            )
+        elif self.position_embedding_type == 'rope':
+            self._init_rope(config)
+        else:
+            raise ValueError(f"Unknown position_embedding_type {self.position_embedding_type}")
+        self.type_vocab_size = config.type_vocab_size
+        if self.type_vocab_size > 0:
+            self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
+        # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
+        # any TensorFlow checkpoint file
+        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
+        self.dropout = nn.Dropout(config.hidden_dropout_prob)
+        # position_ids is contiguous in memory and excluded when serialized
+        self.register_buffer(
+            "position_ids", torch.arange(config.max_position_embeddings), persistent=False
+        )
+    def _init_rope(self, config):
+        kwargs = dict(
+            dim=int(config.hidden_size / config.num_attention_heads),
+            max_position_embeddings=config.max_position_embeddings,
+            base=config.rope_theta
+        )
+        if config.rope_scaling is None:
+            self.rotary_emb = RotaryEmbedding(**kwargs)
+        else:
+            kwargs.update(scaling_factor=config.rope_scaling["factor"])
+            scaling_type = config.rope_scaling["type"]
+            if scaling_type == 'ntk':
+                kwargs.update(mixed_b=config.rope_scaling.get('mixed_b', None))
+                self.rotary_emb = NTKScalingRotaryEmbedding(**kwargs)
+            # elif scaling_type == "linear":
+            #     self.rotary_emb = LinearScalingRotaryEmbedding(**kwargs)
+            # elif scaling_type == "dynamic":
+            #     self.rotary_emb = DynamicNTKScalingRotaryEmbedding(**kwargs)
+            else:
+                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+    def forward(
+        self,
+        unpad_inputs: bool,
+        input_ids: Optional[torch.Tensor] = None,
+        attention_mask: Optional[torch.Tensor] = None,
+        length: Optional[List[int]] = None,
+        token_type_ids: Optional[torch.Tensor] = None,
+        position_ids: Optional[torch.Tensor] = None,
+        inputs_embeds: Optional[torch.Tensor] = None,
+    ) -> Tuple[torch.Tensor, torch.Tensor, Optional[Tuple], Optional[List[int]]]:
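+        """
+        Embed the inputs (optionally removing padded positions first) and return
+        `(embeddings, attention_mask, rope_embeds, length)`.
+        """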
+        if inputs_embeds is None:
+            device, input_shape = input_ids.device, input_ids.shape
+        else:
+            device, input_shape = inputs_embeds.device, inputs_embeds.shape[:2]
+        batch_size, seq_length = input_shape
+        # Set attention_mask if it's None
+        if attention_mask is None:
+            attention_mask = torch.ones(input_shape, device=device)
+            if length is not None:
+                for i, l in enumerate(length):
+                    attention_mask[i, l:] = 0
+        # Set attention_mask_bool for unpadding
+        if unpad_inputs:
+            attention_mask_bool = attention_mask.bool()
+            if length is None:
+                length = attention_mask.sum(-1).tolist()
+        # Get word embeddings
+        if inputs_embeds is None:
+            if unpad_inputs:
+                input_ids = input_ids[attention_mask_bool].unsqueeze(0)
+            inputs_embeds = self.word_embeddings(input_ids)
+        else:
+            if unpad_inputs:
+                inputs_embeds = inputs_embeds[attention_mask_bool].unsqueeze(0)
+        embeddings = inputs_embeds
+        # Set and unpad position_ids
+        if position_ids is None:
+            if seq_length > self.position_ids.size(0):
+                self.register_buffer(
+                    "position_ids", torch.arange(seq_length, device=embeddings.device), persistent=False
+                )
+            if unpad_inputs:
+                # [1, cumsum_seq_len]
+                position_ids = torch.cat([self.position_ids[:l] for l in length]).unsqueeze(0)
+            else:
+                # [bs, seq_len]
+                position_ids = self.position_ids[:seq_length].expand(batch_size, -1)
+        elif unpad_inputs:
+            position_ids = position_ids[attention_mask_bool].unsqueeze(0)  # [1, cumsum_seq_len]
+        # Compute rotary embedding
+        if self.position_embedding_type == 'rope':
+            rope_cos, rope_sin = self.rotary_emb(inputs_embeds, seq_len=seq_length)
+            rope_cos = rope_cos[position_ids].unsqueeze(2)  # [bs, seq_len, 1, dim]
+            rope_sin = rope_sin[position_ids].unsqueeze(2)  # [bs, seq_len, 1, dim]
+            rope_embeds = rope_cos, rope_sin
+        else:
+            rope_embeds = None
+        if self.type_vocab_size > 0:
+            if token_type_ids is None:
+                token_type_ids = position_ids.mul(0)
+            else:
+                if self.type_vocab_size < 2:
+                    token_type_ids.mul_(0)
+                if unpad_inputs:
+                    token_type_ids = token_type_ids[attention_mask_bool].unsqueeze(0)
+            token_type_embeddings = self.token_type_embeddings(token_type_ids)
+            embeddings = embeddings + token_type_embeddings
+        # BERT position
+        if self.position_embedding_type == "absolute":
+            position_embeddings = self.position_embeddings(position_ids)
+            embeddings = embeddings + position_embeddings
+        embeddings = self.LayerNorm(embeddings)
+        embeddings = self.dropout(embeddings)
+        return embeddings, attention_mask, rope_embeds, length
+class NewAttention(nn.Module):
+    def __init__(self, config: NewConfig, pack_qkv=None, use_memory_efficient_attention=None):
+        super().__init__()
+        self.config = config
+        if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
+            raise ValueError(
+                f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention "
+                f"heads ({config.num_attention_heads})"
+            )
+        self.hidden_size = config.hidden_size
+        self.num_attention_heads = config.num_attention_heads
+        self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
+        self.all_head_size = self.num_attention_heads * self.attention_head_size
+        if pack_qkv is None:
+            pack_qkv = config.pack_qkv
+        self.pack_qkv = pack_qkv
+        if self.pack_qkv:
+            self.qkv_proj = nn.Linear(config.hidden_size, self.all_head_size * 3, bias=True)
+        else:
+            self.q_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
+            self.k_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
+            self.v_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
+        self.dropout = nn.Dropout(config.attention_probs_dropout_prob)
+        self.o_proj = nn.Linear(config.hidden_size, config.hidden_size, bias=True)
+        if use_memory_efficient_attention is None:
+            use_memory_efficient_attention = self.config.use_memory_efficient_attention
+        self.use_memory_efficient_attention = use_memory_efficient_attention
+        self.memory_efficient_attention = None if xops is None else xops.memory_efficient_attention
+        if self.use_memory_efficient_attention:
+            assert self.memory_efficient_attention is not None, 'please install xformers'
+    def forward(
+        self,
+        hidden_states: torch.Tensor,
+        attention_bias: torch.FloatTensor,
+        rope_embeds: Optional[Tuple[torch.FloatTensor, torch.FloatTensor]] = None,
+        padding_inputs: Optional[Tuple] = None,  # indices, batch, seqlen
+        attention_scale: Optional[torch.FloatTensor] = None,
+        head_mask: Optional[torch.FloatTensor] = None,
+        output_attentions: Optional[bool] = False,
+        qkv_inputs: Optional[Tuple] = None,  # For RetroMAE
+    ) -> Tuple[torch.Tensor, ...]:
+        shape_hd = (self.num_attention_heads, self.attention_head_size)
+        # qkv
+        if self.pack_qkv and qkv_inputs is None:
+            qkv_pack = self.qkv_proj(hidden_states).split(self.all_head_size, dim=-1)
+        else:
+            if qkv_inputs is None:
+                qkv_inputs = (hidden_states, hidden_states, hidden_states)
+            qkv_pack = [
+                getattr(self, n + '_proj')(s) for s, n in zip(qkv_inputs, 'qkv')
+            ]
+        query_states, key_states, value_states = [t.view(t.shape[:-1] + shape_hd) for t in qkv_pack]
+        if self.config.position_embedding_type == 'rope':
+            query_states, key_states = apply_rotary_pos_emb(query_states, key_states, *rope_embeds)
+        dtype = query_states.dtype
+        if self.config.logn_attention_scale and attention_scale is not None:
+            # https://kexue.fm/archives/8823
+            query_states = query_states * attention_scale.to(dtype)
+        if padding_inputs is not None:
+            # Squeeze only the packed batch dim so other size-1 dims are preserved
+            query_states = pad_input(query_states.squeeze(0), *padding_inputs)
+            key_states = pad_input(key_states.squeeze(0), *padding_inputs)
+            value_states = pad_input(value_states.squeeze(0), *padding_inputs)
+        if self.use_memory_efficient_attention:
+            assert self.memory_efficient_attention is not None, "xformers is not loaded"
+            assert output_attentions is False, "memory_efficient_attention does not output attentions"
+            assert head_mask is None, "head_mask is not supported yet"
+            attention_probs = None
+            if torch.is_tensor(attention_bias):
+                attention_bias = attention_bias.to(dtype)
+            context_layer = self.memory_efficient_attention(
+                query_states,
+                key_states,
+                value_states,
+                attn_bias=attention_bias,
+                p=self.dropout.p if self.training else 0.0,  # no attention dropout at eval time
+            )
+        else:
+            if output_attentions and isinstance(self, NewSdpaAttention):
+                raise RuntimeError("SDPA does not output attentions")
+            context_layer, attention_probs = self._attention(
+                query_states, key_states, value_states, attention_bias, head_mask
+            )
+        if padding_inputs is not None:
+            context_layer = unpad_input(context_layer, indices=padding_inputs[0])
+        new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
+        context_layer = context_layer.view(new_context_layer_shape)
+        # output proj
+        attn_output = self.o_proj(context_layer)
+        # add attentions if we output them
+        outputs = (attn_output, attention_probs) if output_attentions else (attn_output,)
+        return outputs
+    def _attention(self, query_states, key_states, value_states, attention_bias, head_mask):
+        """
+        Args:
+            q/k/v: (B, L, n_head, head_dim),
+        Returns:
+            attn_output: (B, L, n_head, head_dim)
+        """
+        query_states = query_states.transpose(1, 2)
+        key_states = key_states.transpose(1, 2)
+        value_states = value_states.transpose(1, 2)
+        # Take the dot product between "query" and "key" to get the raw attention scores.
+        attention_scores = torch.matmul(query_states, key_states.transpose(-1, -2))
+        attention_scores = attention_scores / math.sqrt(self.attention_head_size)
+        if attention_bias is not None:
+            # Apply the attention mask (precomputed for all layers in NewModel.forward())
+            attention_scores = attention_scores + attention_bias
+        # Normalize the attention scores to probabilities.
+        attention_probs = nn.functional.softmax(attention_scores, dim=-1)
+        # This is actually dropping out entire tokens to attend to, which might
+        # seem a bit unusual, but is taken from the original Transformer paper.
+        if self.dropout.p > 0:
+            attention_probs = self.dropout(attention_probs)
+        # Mask heads if we want to
+        if head_mask is not None:
+            attention_probs = attention_probs * head_mask
+        context_layer = torch.matmul(attention_probs, value_states)
+        context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
+        return context_layer, attention_probs
+class NewSdpaAttention(NewAttention):
+    """
+    New attention module using torch.nn.functional.scaled_dot_product_attention. This module inherits from
+    `NewAttention` as the weights of the module stay untouched. The only changes are in the forward pass, to adapt
+    to the SDPA API.
+    """
+    def __init__(self, config: NewConfig, **kwargs):
+        super().__init__(config, **kwargs)
+        # torch.backends.cuda.enable_mem_efficient_sdp(False)
+        # logger.warning(
+        #     "Disable memory efficient attention kernel for `NewSdpaAttention`, you can set "
+        #     "`use_memory_efficient_attention=True` if it expected to use."
+        # )
+    def _attention(self, query_states, key_states, value_states, attention_bias, head_mask):
+        attn_output = torch.nn.functional.scaled_dot_product_attention(
+            query_states.transpose(1, 2),
+            key_states.transpose(1, 2),
+            value_states.transpose(1, 2),
+            attn_mask=attention_bias,
+            dropout_p=self.dropout.p if self.training else 0.0,
+        )
+        attn_output = attn_output.permute(0, 2, 1, 3).contiguous()
+        return attn_output, None
+NEW_ATTENTION_CLASSES = {
+    "eager": NewAttention,
+    # "flash_attention_2": ,  # TODO
+    "sdpa": NewSdpaAttention,
+}
+class NewGatedMLP(nn.Module):
+    """
+    Gated MLP, following "GLU Variants Improve Transformer" (Shazeer, 2020).
+    """
+    def __init__(self, config: NewConfig):
+        super().__init__()
+        self.intermediate_size = config.intermediate_size
+        self.up_gate_proj = nn.Linear(config.hidden_size, self.intermediate_size * 2, bias=False)
+        self.down_proj = nn.Linear(self.intermediate_size, config.hidden_size, bias=True)
+        self.act_fn = ACT2FN[config.hidden_act]
+        if config.hidden_dropout_prob > 0:
+            self.hidden_dropout = nn.Dropout(config.hidden_dropout_prob)
+        else:
+            self.hidden_dropout = None
+    def forward(self, hidden_states):
+        up_gate = self.up_gate_proj(hidden_states)
+        up_states, gate = torch.split(up_gate, self.intermediate_size, dim=-1)
+        gate = self.act_fn(gate)
+        gated_states = gate * up_states
+        if self.hidden_dropout is not None:
+            gated_states = self.hidden_dropout(gated_states)
+        down_states = self.down_proj(gated_states)
+        return down_states
+class NewLayer(nn.Module):
+    def __init__(
+        self,
+        config: NewConfig,
+        pack_qkv=None,
+        use_memory_efficient_attention=None,
+        attn_implementation=None
+    ):
+        super().__init__()
+        if attn_implementation is None:
+            attn_implementation = config._attn_implementation
+        if use_memory_efficient_attention is None:
+            use_memory_efficient_attention = config.use_memory_efficient_attention
+        if use_memory_efficient_attention:
+            if attn_implementation != 'eager':
+                logger.warning_once(f"Override {attn_implementation=} to 'eager' as {use_memory_efficient_attention=}")
+                attn_implementation = 'eager'  # Since it will be SDPA by default for torch>=2.1.1
+        self.attention = NEW_ATTENTION_CLASSES[attn_implementation](
+            config, pack_qkv=pack_qkv, use_memory_efficient_attention=use_memory_efficient_attention
+        )
+        self.mlp = NewGatedMLP(config)
+        ln_class = LAYER_NORM[config.layer_norm_type]
+        self.attn_ln = ln_class(config.hidden_size, eps=config.layer_norm_eps)
+        self.mlp_ln = ln_class(config.hidden_size, eps=config.layer_norm_eps)
+        if config.hidden_dropout_prob > 0:
+            self.hidden_dropout = nn.Dropout(config.hidden_dropout_prob)
+        else:
+            self.hidden_dropout = None
+    def forward(
+        self,
+        hidden_states: torch.Tensor,
+        attention_bias: torch.FloatTensor,
+        rope_embeds: Optional[Tuple[torch.FloatTensor, torch.FloatTensor]] = None,
+        padding_inputs: Optional[Tuple] = None,  # indices, batch, seqlen
+        attention_scale: Optional[torch.FloatTensor] = None,
+        subset_indices: Optional[torch.LongTensor] = None,
+        head_mask: Optional[torch.FloatTensor] = None,
+        output_attentions: Optional[bool] = False,
+        qkv_inputs: Optional[Tuple] = None,  # For RetroMAE
+    ) -> Tuple[torch.Tensor, ...]:
+        # Multi head self attention
+        residual = hidden_states if qkv_inputs is None else qkv_inputs[0]
+        attention_outputs = self.attention(
+            hidden_states,
+            attention_bias,
+            rope_embeds,
+            padding_inputs,
+            attention_scale,
+            head_mask,
+            output_attentions=output_attentions,
+            qkv_inputs=qkv_inputs,
+        )
+        hidden_states = attention_outputs[0]
+        if self.hidden_dropout is not None:
+            hidden_states = self.hidden_dropout(hidden_states)
+        hidden_states = residual + hidden_states
+        # In pretraining, after the attention of the last layer, we only need the masked tokens.
+        if subset_indices is not None:
+            hidden_states = hidden_states[subset_indices]
+        hidden_states = self.attn_ln(hidden_states)
+        # Fully Connected
+        residual = hidden_states
+        hidden_states = self.mlp(hidden_states)
+        if self.hidden_dropout is not None:
+            hidden_states = self.hidden_dropout(hidden_states)
+        hidden_states = residual + hidden_states
+        hidden_states = self.mlp_ln(hidden_states)
+        # add self attentions if we output attention weights
+        outputs = (hidden_states,) + attention_outputs[1:]
+        return outputs
+class NewEncoder(nn.Module):
+    def __init__(self, config):
+        super().__init__()
+        self.config = config
+        self.layer = nn.ModuleList([NewLayer(config) for _ in range(config.num_hidden_layers)])
+        self.gradient_checkpointing = False
+    def forward(
+        self,
+        hidden_states: torch.Tensor,
+        attention_bias: Optional[torch.FloatTensor] = None,
+        rope_embeds: Optional[Tuple[torch.FloatTensor, torch.FloatTensor]] = None,
+        padding_inputs: Optional[Tuple] = None,  # indices, batch, seqlen
+        attention_scale: Optional[torch.FloatTensor] = None,
+        subset_indices: Optional[torch.LongTensor] = None,
+        head_mask: Optional[torch.FloatTensor] = None,
+        output_attentions: Optional[bool] = False,
+        output_hidden_states: Optional[bool] = False,
+        return_dict: Optional[bool] = True,
+    ) -> Union[Tuple[torch.Tensor], BaseModelOutput]:
+        all_hidden_states = () if output_hidden_states else None
+        all_self_attentions = () if output_attentions else None
+        for i, layer_module in enumerate(self.layer):
+            if output_hidden_states:
+                all_hidden_states = all_hidden_states + (hidden_states,)
+            if i >= len(self.layer) - 1:
+                layer_subset_indices = subset_indices
+            else:
+                layer_subset_indices = None
+            layer_head_mask = head_mask[i] if head_mask is not None else None
+            if self.gradient_checkpointing and self.training:
+                layer_outputs = self._gradient_checkpointing_func(
+                    layer_module.__call__,
+                    hidden_states,
+                    attention_bias,
+                    rope_embeds,
+                    padding_inputs,
+                    attention_scale,
+                    layer_subset_indices,
+                    layer_head_mask,
+                )
+            else:
+                layer_outputs = layer_module(
+                    hidden_states,
+                    attention_bias,
+                    rope_embeds,
+                    padding_inputs,
+                    attention_scale,
+                    layer_subset_indices,
+                    layer_head_mask,
+                    output_attentions,
+                )
+            hidden_states = layer_outputs[0]
+            if output_attentions:
+                all_self_attentions = all_self_attentions + (layer_outputs[1],)
+        if output_hidden_states:
+            all_hidden_states = all_hidden_states + (hidden_states,)
+        if not return_dict:
+            return tuple(
+                v
+                for v in [
+                    hidden_states,
+                    all_hidden_states,
+                    all_self_attentions,
+                ]
+                if v is not None
+            )
+        return BaseModelOutput(
+            last_hidden_state=hidden_states,
+            hidden_states=all_hidden_states,
+            attentions=all_self_attentions,
+        )
+# Copied from transformers.models.bert.modeling_bert.BertPooler with Bert->New
+class NewPooler(nn.Module):
+    def __init__(self, config):
+        super().__init__()
+        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
+        self.activation = nn.Tanh()
+    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
+        # We "pool" the model by simply taking the hidden state corresponding
+        # to the first token.
+        first_token_tensor = hidden_states[:, 0]
+        pooled_output = self.dense(first_token_tensor)
+        pooled_output = self.activation(pooled_output)
+        return pooled_output
+class NewPreTrainedModel(PreTrainedModel):
+    """
+    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
+    models.
+    """
+    config_class = NewConfig
+    base_model_prefix = "new"
+    supports_gradient_checkpointing = True
+    _supports_sdpa = True
+    def _init_weights(self, module):
+        """Initialize the weights"""
+        if isinstance(module, nn.Linear):
+            # Slightly different from the TF version which uses truncated_normal for initialization
+            # cf https://github.com/pytorch/pytorch/pull/5617
+            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
+            if module.bias is not None:
+                module.bias.data.zero_()
+        elif isinstance(module, nn.Embedding):
+            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
+            if module.padding_idx is not None:
+                module.weight.data[module.padding_idx].zero_()
+        elif isinstance(module, nn.LayerNorm):
+            module.bias.data.zero_()
+            module.weight.data.fill_(1.0)
+class NewModel(NewPreTrainedModel):
+    """
+    The bare New Model transformer outputting raw hidden-states without any specific head on top.
+    """
+    def __init__(self, config: NewConfig, add_pooling_layer=False):
+        super().__init__(config)
+        self.config = config
+        self.embeddings = NewEmbeddings(config)
+        self.encoder = NewEncoder(config)
+        self.pooler = NewPooler(config) if add_pooling_layer else None
+        # Initialize weights and apply final processing
+        self.post_init()
+    def get_input_embeddings(self):
+        return self.embeddings.word_embeddings
+    def set_input_embeddings(self, value):
+        self.embeddings.word_embeddings = value
+    def forward(
+        self,
+        input_ids: Optional[torch.Tensor] = None,
+        attention_mask: Optional[torch.Tensor] = None,
+        length: Optional[List[int]] = None,
+        subset_indices: Optional[torch.LongTensor] = None,
+        token_type_ids: Optional[torch.Tensor] = None,
+        position_ids: Optional[torch.Tensor] = None,
+        head_mask: Optional[torch.Tensor] = None,
+        inputs_embeds: Optional[torch.Tensor] = None,
+        output_attentions: Optional[bool] = None,
+        output_hidden_states: Optional[bool] = None,
+        return_dict: Optional[bool] = None,
+        unpad_inputs: Optional[bool] = None,
+    ) -> Union[Tuple[torch.Tensor], BaseModelOutputWithPooling]:
+        r"""
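+        length  (`list` of length `batch_size`, *optional*):
+            Number of valid tokens in each sequence. If `None`, the padded `last_hidden_state` is returned.
+        subset_indices  (`torch.LongTensor`, *optional*):
+            Index or mask applied to the hidden states after the attention of the last layer, e.g. to keep
+            only the masked tokens during pretraining.
+        unpad_inputs  (`bool`, *optional*):
+            Whether to drop padding tokens and pack the batch before encoding. Defaults to
+            `config.unpad_inputs`.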
+        """
+        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+        output_hidden_states = (
+            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
+        )
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+        unpad_inputs = unpad_inputs if unpad_inputs is not None else self.config.unpad_inputs
+        output_padded = length is None
+        if input_ids is not None and inputs_embeds is not None:
+            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
+        elif input_ids is not None:
+            self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
+            input_shape = input_ids.size()
+        elif inputs_embeds is not None:
+            input_shape = inputs_embeds.size()[:-1]
+        else:
+            raise ValueError("You have to specify either input_ids or inputs_embeds")
+        # TODO: not used
+        # # Prepare head mask if needed
+        # # 1.0 in head_mask indicate we keep the head
+        # # attention_probs has shape bsz x n_heads x N x N
+        # # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
+        # # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
+        # head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
+        # Get embeddings, may unpad them
+        (embedding_output, attention_mask, rope_embeds, length) = self.embeddings(
+            unpad_inputs,
+            input_ids=input_ids,
+            attention_mask=attention_mask,
+            length=length,
+            token_type_ids=token_type_ids,
+            position_ids=position_ids,
+            inputs_embeds=inputs_embeds
+        )
+        batch_size, seq_length = input_shape
+        if unpad_inputs and self.config.use_memory_efficient_attention:
+            attention_bias = xops.fmha.attn_bias.BlockDiagonalMask.from_seqlens(length)
+        else:
+            # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
+            # ourselves in which case we just need to make it broadcastable to all heads.
+            attention_bias = self.get_extended_attention_mask(attention_mask, input_shape)
+            if self.config.use_memory_efficient_attention:
+                # xformers rejects a broadcastable bias, e.g. "Invalid shape for attention bias:
+                # torch.Size([48, 1, 1, 512]) (expected (48, 12, 512, 512))", so expand it explicitly.
+                attention_bias = attention_bias.expand(-1, self.config.num_attention_heads, seq_length, -1)
+        padding_inputs = None
+        if unpad_inputs and (output_padded or not self.config.use_memory_efficient_attention):
+            indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
+            if not self.config.use_memory_efficient_attention:
+                padding_inputs = (indices, *input_shape)
+        attention_scale = None
+        if self.config.logn_attention_scale:
+            logger.warning_once("TODO: logn_attention_scale")
+        #     # attention scale log_512(input_len)
+        #     attention_scale = attention_mask.sum(1).log() / torch.tensor(self.config.max_position_embeddings).log()
+        #     # inference-time logn scale need clip 1
+        #     if self.config.logn_attention_clip1:
+        #         attention_scale.clip_(1)
+        #     attention_scale = attention_scale[:, None, None, None]
+        # else:
+        #     attention_scale = None
+        encoder_outputs = self.encoder(
+            embedding_output,
+            attention_bias=attention_bias,
+            rope_embeds=rope_embeds,
+            padding_inputs=padding_inputs,
+            attention_scale=attention_scale,
+            subset_indices=subset_indices,
+            head_mask=head_mask,
+            output_attentions=output_attentions,
+            output_hidden_states=output_hidden_states,
+            return_dict=return_dict,
+        )
+        sequence_output = encoder_outputs[0]
+        if unpad_inputs and output_padded:
+            sequence_output = pad_input(
+                sequence_output.squeeze(0), indices, batch_size, seq_length
+            )
+        pooled_output = self.pooler(sequence_output) if self.pooler is not None else None
+        if not return_dict:
+            return (sequence_output, pooled_output) + encoder_outputs[1:]
+        return BaseModelOutputWithPooling(
+            last_hidden_state=sequence_output,
+            pooler_output=pooled_output,
+            hidden_states=encoder_outputs.hidden_states,
+            attentions=encoder_outputs.attentions,
+        )
+class NewLMPredictionHead(nn.Module):
+    def __init__(self, config):
+        super().__init__()
+        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
+        self.transform_act_fn = ACT2FN[config.hidden_act]
+        self.norm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
+        # The output weights are the same as the input embeddings, but there is
+        # an output-only bias for each token.
+        self.decoder = nn.Linear(config.hidden_size, config.vocab_size)
+    def forward(self, hidden_states):
+        hidden_states = self.dense(hidden_states)
+        hidden_states = self.transform_act_fn(hidden_states)
+        hidden_states = self.norm(hidden_states)
+        hidden_states = self.decoder(hidden_states)
+        return hidden_states
+class NewForMaskedLM(NewPreTrainedModel):
+    _tied_weights_keys = ["lm_head.decoder.bias", "lm_head.decoder.weight"]
+    def __init__(self, config: NewConfig):
+        super().__init__(config)
+        self.new = NewModel(config, add_pooling_layer=False)
+        self.lm_head = NewLMPredictionHead(config)
+        self.loss_fct = nn.CrossEntropyLoss()
+        # Initialize weights and apply final processing
+        self.post_init()
+    def get_output_embeddings(self):
+        return self.lm_head.decoder
+    def set_output_embeddings(self, new_embeddings):
+        self.lm_head.decoder = new_embeddings
+    def forward(
+        self,
+        input_ids: Optional[torch.Tensor] = None,
+        attention_mask: Optional[torch.Tensor] = None,
+        token_type_ids: Optional[torch.Tensor] = None,
+        position_ids: Optional[torch.Tensor] = None,
+        head_mask: Optional[torch.Tensor] = None,
+        inputs_embeds: Optional[torch.Tensor] = None,
+        labels: Optional[torch.Tensor] = None,
+        output_attentions: Optional[bool] = None,
+        output_hidden_states: Optional[bool] = None,
+        return_dict: Optional[bool] = None,
+        unpad_inputs: Optional[bool] = None,
+    ) -> Union[Tuple[torch.Tensor], MaskedLMOutput]:
+        r"""
+        labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
+            Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
+            config.vocab_size - 1]` (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
+            (masked); the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size - 1]`.
+        """
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+        if labels is None or not self.new.config.unpad_inputs:
+            length = None
+            subset_indices = None
+        else:
+            length = attention_mask.sum(-1).tolist()
+            labels = labels[attention_mask.bool()].unsqueeze(0)
+            subset_indices = labels > -100
+        outputs = self.new(
+            input_ids,
+            attention_mask=attention_mask,
+            length=length,
+            subset_indices=subset_indices,
+            token_type_ids=token_type_ids,
+            position_ids=position_ids,
+            head_mask=head_mask,
+            inputs_embeds=inputs_embeds,
+            output_attentions=output_attentions,
+            output_hidden_states=output_hidden_states,
+            return_dict=return_dict,
+            unpad_inputs=unpad_inputs,
+        )
+        sequence_output = outputs[0]
+        prediction_scores = self.lm_head(sequence_output)
+        masked_lm_loss = None
+        if labels is not None:
+            if subset_indices is None:
+                mask = attention_mask.bool()
+                prediction_scores = prediction_scores[mask]
+                labels = labels[mask]
+            else:
+                labels = labels[subset_indices]
+            masked_lm_loss = self.loss_fct(prediction_scores, labels)
+        if not return_dict:
+            output = (prediction_scores,) + outputs[2:]
+            return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output
+        return MaskedLMOutput(
+            loss=masked_lm_loss,
+            logits=prediction_scores,
+            hidden_states=outputs.hidden_states,
+            attentions=outputs.attentions,
+        )
+class NewForSequenceClassification(NewPreTrainedModel):
+    def __init__(self, config):
+        super().__init__(config)
+        self.num_labels = config.num_labels
+        self.config = config
+        self.new = NewModel(config, add_pooling_layer=True)
+        classifier_dropout = (
+            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
+        )
+        self.dropout = nn.Dropout(classifier_dropout)
+        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
+        # Initialize weights and apply final processing
+        self.post_init()
+    def forward(
+        self,
+        input_ids: Optional[torch.Tensor] = None,
+        attention_mask: Optional[torch.Tensor] = None,
+        token_type_ids: Optional[torch.Tensor] = None,
+        position_ids: Optional[torch.Tensor] = None,
+        head_mask: Optional[torch.Tensor] = None,
+        inputs_embeds: Optional[torch.Tensor] = None,
+        labels: Optional[torch.Tensor] = None,
+        output_attentions: Optional[bool] = None,
+        output_hidden_states: Optional[bool] = None,
+        return_dict: Optional[bool] = None,
+        unpad_inputs: Optional[bool] = None,
+    ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutput]:
+        r"""
+        labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+            Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
+            config.num_labels - 1]`. If `config.num_labels == 1`, a regression loss is computed (mean squared error);
+            if `config.num_labels > 1`, a classification loss is computed (cross-entropy).
+        """
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+        outputs = self.new(
+            input_ids,
+            attention_mask=attention_mask,
+            token_type_ids=token_type_ids,
+            position_ids=position_ids,
+            head_mask=head_mask,
+            inputs_embeds=inputs_embeds,
+            output_attentions=output_attentions,
+            output_hidden_states=output_hidden_states,
+            return_dict=return_dict,
+            unpad_inputs=unpad_inputs,
+        )
+        pooled_output = outputs[1]
+        pooled_output = self.dropout(pooled_output)
+        logits = self.classifier(pooled_output)
+        loss = None
+        if labels is not None:
+            if self.config.problem_type is None:
+                if self.num_labels == 1:
+                    self.config.problem_type = "regression"
+                elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
+                    self.config.problem_type = "single_label_classification"
+                else:
+                    self.config.problem_type = "multi_label_classification"
+            if self.config.problem_type == "regression":
+                loss_fct = nn.MSELoss()
+                if self.num_labels == 1:
+                    loss = loss_fct(logits.squeeze(), labels.squeeze())
+                else:
+                    loss = loss_fct(logits, labels)
+            elif self.config.problem_type == "single_label_classification":
+                loss_fct = nn.CrossEntropyLoss()
+                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
+            elif self.config.problem_type == "multi_label_classification":
+                loss_fct = nn.BCEWithLogitsLoss()
+                loss = loss_fct(logits, labels)
+        if not return_dict:
+            output = (logits,) + outputs[2:]
+            return ((loss,) + output) if loss is not None else output
+        return SequenceClassifierOutput(
+            loss=loss,
+            logits=logits,
+            hidden_states=outputs.hidden_states,
+            attentions=outputs.attentions,
+        )
+class NewForMultipleChoice(NewPreTrainedModel):
+    def __init__(self, config):
+        super().__init__(config)
+        self.new = NewModel(config, add_pooling_layer=True)
+        classifier_dropout = (
+            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
+        )
+        self.dropout = nn.Dropout(classifier_dropout)
+        self.classifier = nn.Linear(config.hidden_size, 1)
+        # Initialize weights and apply final processing
+        self.post_init()
+    def forward(
+        self,
+        input_ids: Optional[torch.Tensor] = None,
+        attention_mask: Optional[torch.Tensor] = None,
+        token_type_ids: Optional[torch.Tensor] = None,
+        position_ids: Optional[torch.Tensor] = None,
+        head_mask: Optional[torch.Tensor] = None,
+        inputs_embeds: Optional[torch.Tensor] = None,
+        labels: Optional[torch.Tensor] = None,
+        output_attentions: Optional[bool] = None,
+        output_hidden_states: Optional[bool] = None,
+        return_dict: Optional[bool] = None,
+        unpad_inputs: Optional[bool] = None,
+    ) -> Union[Tuple[torch.Tensor], MultipleChoiceModelOutput]:
+        r"""
+        labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+            Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
+            num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
+            `input_ids` above)
+        """
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+        num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]
+        input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None
+        attention_mask = attention_mask.view(-1, attention_mask.size(-1)) if attention_mask is not None else None
+        token_type_ids = token_type_ids.view(-1, token_type_ids.size(-1)) if token_type_ids is not None else None
+        position_ids = position_ids.view(-1, position_ids.size(-1)) if position_ids is not None else None
+        inputs_embeds = (
+            inputs_embeds.view(-1, inputs_embeds.size(-2), inputs_embeds.size(-1))
+            if inputs_embeds is not None
+            else None
+        )
+        outputs = self.new(
+            input_ids,
+            attention_mask=attention_mask,
+            token_type_ids=token_type_ids,
+            position_ids=position_ids,
+            head_mask=head_mask,
+            inputs_embeds=inputs_embeds,
+            output_attentions=output_attentions,
+            output_hidden_states=output_hidden_states,
+            return_dict=return_dict,
+            unpad_inputs=unpad_inputs,
+        )
+        pooled_output = outputs[1]
+        pooled_output = self.dropout(pooled_output)
+        logits = self.classifier(pooled_output)
+        reshaped_logits = logits.view(-1, num_choices)
+        loss = None
+        if labels is not None:
+            loss_fct = nn.CrossEntropyLoss()
+            loss = loss_fct(reshaped_logits, labels)
+        if not return_dict:
+            output = (reshaped_logits,) + outputs[2:]
+            return ((loss,) + output) if loss is not None else output
+        return MultipleChoiceModelOutput(
+            loss=loss,
+            logits=reshaped_logits,
+            hidden_states=outputs.hidden_states,
+            attentions=outputs.attentions,
+        )
+@dataclass
+class NewTokenClassifierOutput(ModelOutput):
+    loss: Optional[torch.FloatTensor] = None
+    logits: torch.FloatTensor = None
+    last_hidden_state: torch.FloatTensor = None
+    hidden_states: Optional[Tuple[torch.FloatTensor, ...]] = None
+    attentions: Optional[Tuple[torch.FloatTensor, ...]] = None
+class NewForTokenClassification(NewPreTrainedModel):
+    def __init__(self, config):
+        super().__init__(config)
+        self.num_labels = config.num_labels
+        self.new = NewModel(config, add_pooling_layer=False)
+        classifier_dropout = (
+            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
+        )
+        self.dropout = nn.Dropout(classifier_dropout)
+        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
+        # Initialize weights and apply final processing
+        self.post_init()
+    def forward(
+        self,
+        input_ids: Optional[torch.Tensor] = None,
+        attention_mask: Optional[torch.Tensor] = None,
+        token_type_ids: Optional[torch.Tensor] = None,
+        position_ids: Optional[torch.Tensor] = None,
+        head_mask: Optional[torch.Tensor] = None,
+        inputs_embeds: Optional[torch.Tensor] = None,
+        labels: Optional[torch.Tensor] = None,
+        output_attentions: Optional[bool] = None,
+        output_hidden_states: Optional[bool] = None,
+        return_dict: Optional[bool] = None,
+        unpad_inputs: Optional[bool] = None,
+    ) -> Union[Tuple[torch.Tensor], NewTokenClassifierOutput]:
+        r"""
+        labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
+            Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
+        """
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+        outputs = self.new(
+            input_ids,
+            attention_mask=attention_mask,
+            token_type_ids=token_type_ids,
+            position_ids=position_ids,
+            head_mask=head_mask,
+            inputs_embeds=inputs_embeds,
+            output_attentions=output_attentions,
+            output_hidden_states=output_hidden_states,
+            return_dict=return_dict,
+            unpad_inputs=unpad_inputs,
+        )
+        sequence_output = outputs[0]
+        sequence_output = self.dropout(sequence_output)
+        logits = self.classifier(sequence_output)
+        loss = None
+        if labels is not None:
+            loss_fct = nn.CrossEntropyLoss()
+            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
+        if not return_dict:
+            output = (logits,) + outputs[2:]
+            return ((loss,) + output) if loss is not None else output
+        return NewTokenClassifierOutput(
+            loss=loss,
+            logits=logits,
+            last_hidden_state=sequence_output,
+            hidden_states=outputs.hidden_states,
+            attentions=outputs.attentions,
+        )
+class NewForQuestionAnswering(NewPreTrainedModel):
+    def __init__(self, config):
+        super().__init__(config)
+        self.num_labels = config.num_labels
+        self.new = NewModel(config, add_pooling_layer=False)
+        self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)
+        # Initialize weights and apply final processing
+        self.post_init()
+    def forward(
+        self,
+        input_ids: Optional[torch.Tensor] = None,
+        attention_mask: Optional[torch.Tensor] = None,
+        token_type_ids: Optional[torch.Tensor] = None,
+        position_ids: Optional[torch.Tensor] = None,
+        head_mask: Optional[torch.Tensor] = None,
+        inputs_embeds: Optional[torch.Tensor] = None,
+        start_positions: Optional[torch.Tensor] = None,
+        end_positions: Optional[torch.Tensor] = None,
+        output_attentions: Optional[bool] = None,
+        output_hidden_states: Optional[bool] = None,
+        return_dict: Optional[bool] = None,
+        unpad_inputs: Optional[bool] = None,
+    ) -> Union[Tuple[torch.Tensor], QuestionAnsweringModelOutput]:
+        r"""
+        start_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+            Labels for position (index) of the start of the labelled span for computing the token classification loss.
+            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
+            are not taken into account for computing the loss.
+        end_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+            Labels for position (index) of the end of the labelled span for computing the token classification loss.
+            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
+            are not taken into account for computing the loss.
+        """
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+        outputs = self.new(
+            input_ids,
+            attention_mask=attention_mask,
+            token_type_ids=token_type_ids,
+            position_ids=position_ids,
+            head_mask=head_mask,
+            inputs_embeds=inputs_embeds,
+            output_attentions=output_attentions,
+            output_hidden_states=output_hidden_states,
+            return_dict=return_dict,
+            unpad_inputs=unpad_inputs,
+        )
+        sequence_output = outputs[0]
+        logits = self.qa_outputs(sequence_output)
+        start_logits, end_logits = logits.split(1, dim=-1)
+        start_logits = start_logits.squeeze(-1).contiguous()
+        end_logits = end_logits.squeeze(-1).contiguous()
+        total_loss = None
+        if start_positions is not None and end_positions is not None:
+            # On multi-GPU, splitting the batch can add a trailing dimension; squeeze it away
+            if len(start_positions.size()) > 1:
+                start_positions = start_positions.squeeze(-1)
+            if len(end_positions.size()) > 1:
+                end_positions = end_positions.squeeze(-1)
+            # sometimes the start/end positions are outside our model inputs, we ignore these terms
+            ignored_index = start_logits.size(1)
+            start_positions = start_positions.clamp(0, ignored_index)
+            end_positions = end_positions.clamp(0, ignored_index)
+            loss_fct = nn.CrossEntropyLoss(ignore_index=ignored_index)
+            start_loss = loss_fct(start_logits, start_positions)
+            end_loss = loss_fct(end_logits, end_positions)
+            total_loss = (start_loss + end_loss) / 2
+        if not return_dict:
+            output = (start_logits, end_logits) + outputs[2:]
+            return ((total_loss,) + output) if total_loss is not None else output
+        return QuestionAnsweringModelOutput(
+            loss=total_loss,
+            start_logits=start_logits,
+            end_logits=end_logits,
+            hidden_states=outputs.hidden_states,
+            attentions=outputs.attentions,
+        )
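
Review note on `NewForQuestionAnswering.forward`: the span loss clamps each target position to `[0, sequence_length]` and uses `sequence_length` as the ignored index, so a target that falls outside the input drops out of the average. The following is a minimal pure-Python sketch of that arithmetic, not the code in `modeling.py`. The helper name `span_loss` and the nested-list inputs are assumptions made for illustration.

```python
import math

def span_loss(logits, positions):
    """Mean cross-entropy over rows whose target lies inside the sequence.

    logits: list of per-example score lists, each of length seq_len.
    positions: list of integer target indices, possibly out of range.
    """
    seq_len = len(logits[0])
    ignored_index = seq_len  # one past the last valid token index
    total, counted = 0.0, 0
    for row, pos in zip(logits, positions):
        # Same effect as tensor.clamp(0, ignored_index) in the diff.
        pos = max(0, min(pos, ignored_index))
        if pos == ignored_index:
            continue  # out-of-range target: contributes nothing
        log_norm = math.log(sum(math.exp(x) for x in row))
        total += log_norm - row[pos]  # negative log-softmax at the target
        counted += 1
    # With every target ignored there is nothing to average.
    return total / counted if counted else float("nan")

# The second target (7) is past the end of a 2-token row, so only the
# first example is averaged.
loss = span_loss([[0.0, 0.0], [5.0, 1.0]], [0, 7])
```

The start and end losses in the diff are each computed this way and then averaged. Note that a negative position is clamped to 0 rather than ignored, so it is scored against the first token.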

modules.json ADDED Viewed

	@@ -0,0 +1,20 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]
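
Review note on `modules.json`: the three modules run in order, and `1_Pooling/config.json` enables only `pooling_mode_cls_token`, so the sentence embedding is presumably the first token's vector scaled to unit length. Below is a minimal pure-Python sketch of the last two stages under that assumption. The function names and the toy 2-dimensional vectors are hypothetical, and the real modules operate on batched tensors.

```python
import math

def cls_pool(token_embeddings):
    """Return the vector of the first token (the CLS position)."""
    return token_embeddings[0]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length, guarding against zero norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

def encode(token_embeddings):
    """Pooling followed by Normalize, as ordered in modules.json."""
    return l2_normalize(cls_pool(token_embeddings))

# Toy transformer output for one sentence: two tokens, two dimensions.
toy_output = [[3.0, 4.0], [1.0, 1.0]]
embedding = encode(toy_output)  # approximately [0.6, 0.8]
```

Because the output has unit norm, the dot product of two embeddings equals their cosine similarity.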

sentence_bert_config.json ADDED Viewed

	@@ -0,0 +1,4 @@

+{
+    "max_seq_length": 512,
+    "do_lower_case": false
+}

special_tokens_map.json ADDED Viewed

	@@ -0,0 +1,51 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:883b037111086fd4dfebbbc9b7cee11e1517b5e0c0514879478661440f137085
+size 17082987

tokenizer_config.json ADDED Viewed

	@@ -0,0 +1,55 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "250001": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "mask_token": "<mask>",
+  "model_max_length": 8192,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "tokenizer_class": "XLMRobertaTokenizerFast",
+  "unk_token": "<unk>"
+}
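
Review note on `tokenizer_config.json`: `bos_token` and `cls_token` share `<s>`, while `eos_token` and `sep_token` share `</s>`, so each pair resolves to a single id. The following sketch resolves every role to its id from a reduced copy of the config above. The helper `special_token_ids` is hypothetical, and the excerpt keeps only the fields it reads.

```python
# Excerpt of tokenizer_config.json, reduced to the fields used here.
config_excerpt = {
    "added_tokens_decoder": {
        "0": {"content": "<s>"},
        "1": {"content": "<pad>"},
        "2": {"content": "</s>"},
        "3": {"content": "<unk>"},
        "250001": {"content": "<mask>"},
    },
    "bos_token": "<s>",
    "cls_token": "<s>",
    "eos_token": "</s>",
    "mask_token": "<mask>",
    "pad_token": "<pad>",
    "sep_token": "</s>",
    "unk_token": "<unk>",
}

def special_token_ids(config):
    """Map each `*_token` role to the id declared in added_tokens_decoder."""
    content_to_id = {
        entry["content"]: int(token_id)
        for token_id, entry in config["added_tokens_decoder"].items()
    }
    return {
        role: content_to_id[content]
        for role, content in config.items()
        if role.endswith("_token")
    }

ids = special_token_ids(config_excerpt)
```

A check like this could flag a role whose content has no entry in `added_tokens_decoder`, since the lookup would raise a `KeyError`.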