--- |
|
license: afl-3.0 |
|
language: |
|
- zh |
|
- en |
|
metrics: |
|
- accuracy |
|
- mae |
|
pipeline_tag: audio-classification |
|
tags: |
|
- acoustic |
|
- ocean |
|
- underwater |
|
- UWTR |
|
- recognition |
|
datasets: |
|
- peng7554/DS3500 |
|
base_model: |
|
- microsoft/resnet-50 |
|
--- |
|
- [中文](README-zh.md) |
|
- [English](README.md) |
|
|
|
# Underwater Target Recognition and Localization Model Library |
|
|
|
## Project Overview |
|
This repository provides a collection of deep learning models for underwater target recognition and localization: the MCL/MEG series of networks designed specifically for underwater acoustic scenarios, plus general-purpose recognition models migrated from computer vision. The models classify and localize underwater targets from their acoustic signatures and can be applied to marine monitoring, underwater security, and related fields.
|
|
|
## Model Description |
|
### 1. Specialized Network Series (Recognition + Localization) |
|
| Model Name | Description | Input Features | Function |
|------------|-------------|----------------|----------|
| MCL | Basic network without mixture-of-experts | GFCC/STFT | Recognition + Localization |
| MEG | MCL with an added mixture-of-experts module (sketched below) | GFCC/STFT | Recognition + Localization |
| MEG_BLC | MEG variant with a load-balancing mechanism | GFCC/STFT | Recognition + Localization |
| MEG_MIX | MEG variant with multi-feature fusion input | Multiple fused features | Recognition + Localization |
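
For readers unfamiliar with the mixture-of-experts idea behind the MEG variants, the sketch below shows one common formulation in PyTorch: a softmax gate weights several expert heads, and a load-balancing penalty (in the spirit of MEG_BLC) discourages expert collapse. Layer sizes, names, and the gating scheme are illustrative assumptions, not the released architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEHead(nn.Module):
    """Softmax-gated mixture of expert MLPs over a shared backbone feature."""

    def __init__(self, feat_dim: int = 256, num_experts: int = 4, out_dim: int = 3):
        super().__init__()
        self.gate = nn.Linear(feat_dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor):
        w = F.softmax(self.gate(x), dim=-1)                  # (B, E) expert weights
        outs = torch.stack([e(x) for e in self.experts], 1)  # (B, E, out_dim)
        y = (w.unsqueeze(-1) * outs).sum(dim=1)              # gated combination
        # Load-balancing penalty: keep average expert usage near uniform
        # so no single expert dominates the gate.
        balance = ((w.mean(dim=0) - 1.0 / len(self.experts)) ** 2).sum()
        return y, balance
```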
|
|
|
### 2. General CV Networks (Recognition Only) |
|
Classic models migrated from the computer vision field, adapted for underwater acoustic signature recognition tasks: |
|
- DenseNet121 |
|
- MobileNetV2 |
|
- ResNet18 |
|
- ResNet50 |
|
- Swin-Transformer |
|
|
|
## Performance Metrics |
|
| Network | ACC (%) | MAE-R (km) | MAE-D (m) |
|---------|---------|------------|-----------|
| MEG (STFT) | 95.93 | **0.2011** | **20.61** |
| MCL (STFT) | **96.07** | 0.2565 | 27.68 |
| MEG (GFCC) | 95.75 | **0.1707** | **19.43** |
| MCL (GFCC) | **96.10** | 0.3384 | 35.42 |
| DenseNet121 | 86.61 | - | - |
| ResNet18 | 84.99 | - | - |
| MobileNetV2 | 83.60 | - | - |
| ResNet50 | 76.34 | - | - |
| Swin-Transformer | 63.08 | - | - |
|
|
|
*Note: ACC is recognition accuracy, MAE-R is the mean absolute error of range estimation, and MAE-D is the mean absolute error of depth estimation. Bold marks the better value within each STFT/GFCC pair.*
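
Both localization metrics are the standard mean absolute error over the $N$ test samples, where $\hat{y}_i$ is the predicted range (km) or depth (m) and $y_i$ is the ground truth:

$$
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right|
$$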
|
|
|
## Usage Instructions |
|
|
|
### 1. Model Download |
|
Model weight files can be downloaded from Hugging Face Hub or ModelScope. Complete project code is available through the following links: |
|
- Gitee: [UWTRL-MEG](https://gitee.com/open-ocean/UWTRL-MEG)

- GitHub: [UWTRL-MEG](https://github.com/Perry44001/UWTRL-MEG)
|
|
|
### 2. Model Usage |
|
Use the `--resume` argument to specify the folder containing the weight files; by default, `model.pth` is loaded from that folder:

```bash
python train_mtl.py --features stft --task_type mtl --resume './models/meg(stft)'
```
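
Conceptually, `--resume` points at a folder from which the trained weights are restored. The snippet below is a hypothetical illustration of that step in plain PyTorch; the model class name and checkpoint layout are assumptions, not the repository's exact API:

```python
import os
import torch

# Hypothetical illustration of what --resume does: restore model.pth
# from the given folder before training or inference continues.
resume_dir = "./models/meg(stft)"
state_dict = torch.load(os.path.join(resume_dir, "model.pth"), map_location="cpu")
# model = MEG(...)                    # instantiate the network first (name assumed)
# model.load_state_dict(state_dict)   # then restore the trained weights
```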
|
|
|
### 3. Input and Output |
|
- Input: acoustic features (e.g., GFCC or STFT spectrograms)

- Output: target category, range estimate, and depth estimate
|
For detailed input/output formats and training/inference code, please refer to the project repository documentation; a rough usage sketch follows.
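
A minimal end-to-end sketch, assuming a PyTorch model that maps a log-magnitude STFT spectrogram to (class logits, range, depth). The sample rate, feature shape, and three-output interface are illustrative assumptions, not the repository's exact interface:

```python
import numpy as np
import torch
from scipy.signal import stft

# Build a log-magnitude STFT feature from 1 s of (here: random) audio.
fs = 16000
signal = np.random.randn(fs).astype(np.float32)
_, _, Z = stft(signal, fs=fs, nperseg=512)               # complex spectrogram
feature = np.log1p(np.abs(Z))                            # (freq_bins, frames)
x = torch.from_numpy(feature).unsqueeze(0).unsqueeze(0)  # (1, 1, F, T) batch

# With a restored multi-task model (see the loading sketch above):
# logits, range_km, depth_m = model(x)
# pred_class = logits.argmax(dim=-1)
```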
|
|
|
## Citation Information |
|
The related research paper is under review and is expected to appear in MDPI's *Remote Sensing* in September 2025. If you use models from this project, please cite the following paper (citation to be updated after publication):
|
```
@article{uwtrl2025,
  title={Multi-Task Mixture-of-Experts Model for Underwater Target Localization and Recognition},
  author={Peng Qian and Jingyi Wang and Yining Liu and Yingxuan Chen and Pengjiu Wang and Yanfa Deng and Peng Xiao and Zhenglin Li},
  journal={Remote Sensing},
  year={2025},
  publisher={MDPI}
}
```
|
|
|
## Contact Information |
|
For questions or collaboration inquiries, please contact: [[email protected]] |
|
|
|
--- |
|
*This project is for academic research use only. For commercial use, please contact the authors for authorization.* |
|
|