Commit 8cec78f · Parent: 8c6a471

Migration to RIET-lab/moral-kg-workshop-listener
Files changed:

- .gitignore               +1   -2
- README.md                +0   -1
- SETUP.md                 +0   -116
- SPEC.md                  +0   -132
- config.yaml              +2   -4
- setup.py                 +0   -221
- setup.sh                 +0   -16
- utils/__init__.py        +0   -145
- utils/dataset_utils.py   +0   -351
- utils/log_utils.py       +0   -292
- utils/phase1_utils.py    +0   -240
- utils/setup_utils.py     +0   -200
- utils/user_utils.py      +0   -208
- utils/webhook_utils.py   +0   -340
- utils/wipe_utils.py      +0   -164
- utils/workspace_utils.py +0   -387
- wipe.py                  +0   -165
- wipe.sh                  +0   -16
.gitignore
CHANGED

```diff
@@ -1,5 +1,4 @@
 # Env file nor User config file not to be uploaded to the HF Space!
 .env
-users/
 archive/
-*__pycache__*
+*__pycache__*
```
README.md
CHANGED

```diff
@@ -21,7 +21,6 @@ Part of RIET Lab's initiative to improve AI using moral reasoning.
 **Organization**: [https://huggingface.co/RIET-lab](https://huggingface.co/RIET-lab)
 **Workshop Website**: [https://sites.google.com/view/mereworkshop](https://sites.google.com/view/mereworkshop)
 **Argdown Documentation**:[https://argdown.org/guide/](https://argdown.org/guide/)
-**Listener HF Space**: [https://huggingface.co/spaces/RIET-lab/moral-kg-workshop-listener](https://huggingface.co/spaces/RIET-lab/moral-kg-workshop-listener)
 
 - Creating your own Argilla Spaces, check the [quickstart guide](http://docs.argilla.io/latest/getting_started/quickstart/) and the [Hugging Face Spaces configuration](http://docs.argilla.io/latest/getting_started/how-to-configure-argilla-on-huggingface/) for more details.
 - Discovering the Argilla UI, sign in with your Hugging Face account!
```
SETUP.md
DELETED
@@ -1,116 +0,0 @@

# MERe Workshop Setup Guide

Setup and usage guide for the MERe Workshop dataset annotation process.
A very important note: much of this infrastructure is to avoid paying for a space - there is NO persistent storage in `moral-kg-workshop`.

## Environment

### Required Environment Variables

```bash
export ARGILLA_API_URL="your-argilla-url"
export ARGILLA_API_KEY="your-api-key"
export HF_TOKEN="your-huggingface-token"
```

### Optional Environment Variables

```bash
export SLACK_WEBHOOK_URL="your-slack-webhook-url"
# For error notifications to a Slack channel.
# Requires a custom Slack app setup with a webhook URL.
# See https://api.slack.com/messaging/webhooks
```

### Dependencies

Install required Python packages:
```bash
pip install -r requirements.txt
```

## Configuration

See `config.yaml`

## Space Setup

### Complete Setup

Run all setup operations (users, datasets, webhooks):
```bash
./setup.sh
# or
python setup.py
```

### Partial Setup

Skip specific operations:
```bash
python setup.py --skip-users       # Skip user creation
python setup.py --skip-workspaces  # Skip workspace creation (breaks dataset allocation)
python setup.py --skip-datasets    # Skip dataset creation
python setup.py --skip-webhooks    # Skip webhook creation
```

### Status Check Only

View current space status without making changes:
```bash
python setup.py --status-only
```

## Wipe Operations

### Complete Wipe

Remove everything (users, workspaces, datasets, webhooks):
```bash
./wipe.sh
# or
python3 wipe.py
```

### Selective Wipe

Remove specific components:
```bash
python wipe.py --datasets-only   # Only datasets
python wipe.py --users-only      # Only users
python wipe.py --webhooks-only   # Only webhooks
```

### Force Wipe

Skip confirmation prompts:
```bash
python wipe.py --force
```

### Status Check Only

View current space status without making changes:
```bash
python wipe.py --status-only
```

## Troubleshooting

### Debug Mode

For detailed debugging, set log level to DEBUG in `config.yaml`:
```yaml
logging:
  level: "DEBUG"
```

### Status Commands

```bash
# Check status during setup
python3 setup.py --status-only

# Check status during wipe
python3 wipe.py --status-only
```
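For reference, sending a message to a Slack incoming webhook (the optional `SLACK_WEBHOOK_URL` above) is a single JSON POST; the deleted `log_utils.py` further down uses exactly this pattern. A minimal sketch, assuming `requests` is installed and the variable is exported as in the deleted guide:

```python
import os

import requests

# Minimal sketch: post a test message to the Slack incoming webhook.
# Assumes SLACK_WEBHOOK_URL is set as described in SETUP.md above.
webhook_url = os.environ["SLACK_WEBHOOK_URL"]
response = requests.post(
    webhook_url,
    json={"text": "MERe Workshop: test notification"},
    headers={"Content-Type": "application/json"},
    timeout=10,
)
# Slack returns HTTP 200 on success.
response.raise_for_status()
```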
SPEC.md
DELETED
@@ -1,132 +0,0 @@

# Moral-kg annotation process setup
Notes on the annotation pipeline / data ETL process

## Annotation Pipeline Architecture
Annotation occurs in two phases. During **Phase 1** annotators determine which set
of claims best represents the argument of each paper. During **Phase 2** annotators
map those claims into an Argument Map.

### Setup
1. Create users and user-specific workspaces based off of `users.csv` list
2. Create Phase 1 dataset with records for each user
   - NOTE: depending on space constraints, we could do one Phase 1 dataset for
     all users as only Phase 2 is user-response-dependent.
3. Create webhooks.

### Phase 1 Argilla Dataset
Creation:
- At startup
Records Input:
- Manual batch input via HF dataset `moral-kg-sample`
Response Output:
- Real-time webhook to HF dataset `moral-kg-sample-labels`
- Real-time webhook to Argilla Phase 2 dataset records
Updates:
- Only if moral-kg-sample is updated (this is handled manually)
Fields:
- Title (Author, Year) "title_info"
- Text "text"
Metadata:
- Identifier (visible to annotators) "id"
Questions:
- TextQuestion "claims"
  - Users list the claims which best represent the argument in the paper
  - AI/ML-generated claims are proposed in a list as a suggestion
Webhooks:
- Listen if the dataset is ever published (it shouldn't be) and notify admin if
  it is.
- Response created/updated/deleted -> update `moral-kg-sample-labels`
  -> update Argilla Phase 2 records

### Phase 2 Argilla Dataset
Creation:
- When the first Phase 1 response is created
Records Input:
- Real-time webhook `response.created`/`.updated`/`.deleted` from Phase 1
Response Output:
- Real-time webhook to HF dataset `moral-kg-sample-maps`
Updates:
- When a Phase 1 response is created/updated/deleted
Fields:
- Title (Author, Year) "title_info"
- Argdown Page "argdown"
- Text "text"
Metadata:
- Identifier (visible to annotators) "id"
Questions:
- TextQuestion "argmap"
  - Users are asked to copy and paste their final Argdown input into this box
    as the solution.
Webhooks:
- Listen if the dataset is ever published (it shouldn't be) and notify admin if
  it is.
- Response created/updated/deleted -> update `moral-kg-sample-maps`

## HuggingFace Datasets
There are three huggingface datasets that will be involved in the annotation
process: `moral-kg-sample`, `moral-kg-sample-labels`, and `moral-kg-sample-maps`.

### `moral-kg-sample` (private)
Will store the data associated with each paper in the sample:
- identifier | str      | The Phil-Papers ID associated with each paper
- title      | str      | The title of the paper
- authors    | list:str | The authors attributed to the paper
- year       | str      | The publication year of the paper
- text       | str      | The paper content (in plain text or markdown)
- map        | dict     | The claim:method map that contains each claim
                          extracted from the text and its associated
                          extraction method.

### `moral-kg-sample-labels` (private)
Will store data associated with the claims annotators select for each paper in
the sample:
- identifier | str  | The Phil-Papers ID associated with each paper
- annotator  | str  | The annotator's unique Argilla UUID
- map        | dict | The claim:method map that contains each claim the
                       annotator selects as representative of the paper.
                       Claims not found in the original map are labeled
                       "annotator"

### `moral-kg-sample-maps` (private)
Will store data associated with the argument maps annotators create for each
paper in the sample:
- identifier | str  | The Phil-Papers ID associated with each paper
- annotator  | str  | The annotator's unique Argilla UUID
- argmap     | dict | The argument map (in Argdown format) that
                       represents the paper argument structure.

## Webhooks

### dataset.published
- Stretch goal: implement Slack notification. For now just log that a dataset
  was published.

### response.created
IF data.data.values contains "claims":
- This means it is a phase 1 response
ELSE IF data.data.values contains "argmap":
- This means it is a phase 2 response

### response.updated
IF data.record.questions.name contains "claims":
-
ELSE IF data.record.questions.name contains "argmap":
-

### response.deleted
IF data.record.questions.name contains "claims":
-
ELSE IF data.record.questions.name contains "argmap":
-

## Notes, Comments, and Questions
- I assume that our ultimate moral-kg dataset, that which makes up the entirety
  of the KG and will be public, will be in a separate HF dataset.
- There are no user event webhooks so we must either:
  1. batch create users or
  2. poll every second during the workshop or
  3. track OAuth sign-ins
- Should we put a link to the website pdf alongside its processed text?
- For Phase 2 argmap building: ideally we are able to extract the user text
  inputted into the iFrame, but I'm not confident we will be able to, so this
  solution suffices for now.
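The `response.*` routing in the spec above is only pseudocode. A sketch of how a listener might dispatch on it, where the payload paths `data.data.values` and `data.record.questions` follow the spec's own notation and are assumptions, not a verified Argilla webhook schema:

```python
def route_response_event(event_type: str, payload: dict) -> str:
    """Classify a webhook payload as a Phase 1 or Phase 2 response.

    Sketch only: the payload shape is taken from the pseudocode in SPEC.md
    above, not from a confirmed Argilla payload schema.
    """
    if event_type == "response.created":
        # response.created carries the submitted answer values directly.
        values = payload.get("data", {}).get("values", {})
        keys = set(values.keys())
    else:  # response.updated / response.deleted
        # Later events are matched against the record's question names.
        questions = payload.get("data", {}).get("record", {}).get("questions", [])
        keys = {q.get("name") for q in questions}

    if "claims" in keys:
        return "phase1"  # update moral-kg-sample-labels and Phase 2 records
    if "argmap" in keys:
        return "phase2"  # update moral-kg-sample-maps
    return "unknown"
```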
config.yaml
CHANGED

```diff
@@ -1,4 +1,6 @@
 # moral-kg-workshop config
+#
+# NOTE: See moral-kg-workshop-listener config for updates!
 
 # File Paths Configuration
 paths:
@@ -83,10 +85,6 @@ phase1:
 logging:
   level: "INFO"
   format: "[%(asctime)s] [%(name)s] [%(levelname)s] - %(message)s"
-  # External library log levels (set to WARNING/ERROR to reduce verbosity)
-  external_libraries:
-    httpx: "WARNING"
-    argilla.sdk: "WARNING"
 
 # Error Handling Configuration
 error_handling:
```
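Throughout the deleted utilities below, `get_config()` returns an object whose `.get()` accepts dotted paths such as `'phase1.dataset_name'`. The wrapper itself lives in `setup_utils.py`, which is truncated at the end of this diff, so here is a hedged sketch of one way such a lookup could work over the parsed `config.yaml` (the helper name `get_dotted` is hypothetical):

```python
from typing import Any

import yaml


def get_dotted(config: dict, path: str, default: Any = None) -> Any:
    """Resolve a dotted key like 'phase1.dataset_name' against nested dicts.

    Hypothetical helper: the real Config wrapper in setup_utils.py is not
    visible in this diff; this only illustrates the dotted-path convention
    behind calls like _config.get('datasets.sample').
    """
    node: Any = config
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


with open("config.yaml", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

dataset_name = get_dotted(cfg, "phase1.dataset_name", "Phase 1")
```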
setup.py
DELETED
@@ -1,221 +0,0 @@

```python
#!/usr/bin/env python3

"""
setup.py

Setup script for the MERe Workshop Argilla Hugging Face space. This is the
primary annotation pipeline. Creates users, workspaces, datasets, and webhooks.
"""

import argparse
import json
import os

from huggingface_hub import HfApi

from utils import (
    validate_env,
    log_operation_success,
    log_operation_failure,
    get_status,
    log_info,
    log_warning,
    create_users,
    create_user_workspaces,
    create_webhooks,
    create_phase1_datasets,
    list_users,
    get_config,
)


def parse_args():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description="Setup MERe Workshop Argilla Hugging Face space",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )

    parser.add_argument(
        "-u", "--skip-users",
        action="store_true",
        help="Skip user creation step"
    )

    parser.add_argument(
        "-w", "--skip-workspaces",
        action="store_true",
        help="Skip workspace creation and user assignment step"
    )

    parser.add_argument(
        "-d", "--skip-datasets",
        action="store_true",
        help="Skip dataset creation step"
    )

    parser.add_argument(
        "-l", "--skip-listener",
        action="store_true",
        help="Skip restarting the listener space (skips webhook creation step)."
    )

    parser.add_argument(
        "-s",
        "--status-only",
        action="store_true",
        help="Only show current space status, do not perform setup",
    )

    return parser.parse_args()


def restart_listener():
    """Start the RIET-lab/moral-kg-workshop-listener space."""
    try:
        api = HfApi(token=os.getenv("HF_TOKEN"))
        api.restart_space(repo_id="RIET-lab/moral-kg-workshop-listener")
        log_operation_success("restart listener space", "Space restart initiated successfully")
        return True
    except Exception as e:
        log_operation_failure("restart listener space", e)
        return False


def show_space_status():
    """Display current space status."""
    status = get_status()

    if "error" in status:
        log_operation_failure("check space status", status["error"])
        return False

    print()
    log_info("=== Current Argilla Space Status ===")
    log_info(f"Workspaces: {status['workspaces']}")
    log_info(f"Users: {status['users']}")
    log_info(f"Datasets: {status['datasets']}")
    log_info(f"Records: {status['records']}")
    log_info(f"Webhooks: {status['webhooks']}")
    print()

    return True


def track_user_info(
    filepath=None
):
    """Store Argilla user info to a file or log them if no file is provided."""
    users = list_users()

    if filepath:
        try:
            with open(filepath, 'w', encoding='utf-8') as f:
                json.dump(users, f, indent=2)
            log_info(f"User ID map written to {filepath}")
        except Exception as e:
            log_operation_failure("map user ids", e)
    else:
        log_info(f"User ID map: {users}")


def main():
    """Main setup function."""
    args = parse_args()
    config = get_config()

    # Validate environment
    try:
        validate_env()
        log_operation_success("setup validation", "Environment validated")
    except Exception as e:
        log_operation_failure("setup validation", e)
        return 1

    # Show current status
    if not show_space_status():
        return 1

    # If status-only mode, exit here
    if args.status_only:
        return 0

    # Track overall success
    operations_success = []

    # Step 1: Create users
    if not args.skip_users:
        print()
        log_info("Creating users...")
        success = create_users()
        operations_success.append(success)

        if success:
            log_info("Success: Users created successfully")
            # Track user profiles after creation so we can map users to their UUIDs
            track_user_info(config.get('paths', {}).get('users_info', None))
        else:
            log_info("Failed: Could not create users")
    else:
        log_info("Skipping user creation")

    # Step 2: Create workspaces
    if not args.skip_workspaces:
        print()
        log_info("Creating workspaces and assigning users...")
        success = create_user_workspaces()
        operations_success.append(success)

        if success:
            log_info("Success: Workspaces created and users assigned successfully")
        else:
            log_info("Failed: Could not create workspaces and assign users")
    else:
        log_info("Skipping workspace creation and user assignment")

    # Step 3: Create datasets
    if not args.skip_datasets:
        print()
        log_info("Creating datasets...")
        success = create_phase1_datasets()
        operations_success.append(success)

        if success:
            log_info("Success: Datasets created successfully")
        else:
            log_info("Failed: Could not create datasets")
    else:
        log_info("Skipping dataset creation")

    # Step 4: Restart listener to create webhooks
    if not args.skip_listener:
        print()
        log_info("Restarting RIET-lab/moral-kg-workshop-listener space...")
        success = restart_listener()
        if success:
            log_info("Success: Listener space restart initiated")
        else:
            log_info("Failed: Could not restart listener space")
        return 0 if success else 1

    # Show final status
    show_space_status()

    # Overall result
    if operations_success:
        successful_count = sum(operations_success)
        total_count = len(operations_success)

        if successful_count == total_count:
            log_operation_success("complete setup", "All operations completed successfully", send_to_slack=True)
            return 0
        else:
            log_operation_failure("complete setup", Exception("Some or all operations failed"), send_to_slack=True)
            return 1
    else:
        log_operation_success("complete setup", "No operations were required", send_to_slack=True)
        return 0


if __name__ == "__main__":
    exit(main())
```
setup.sh
DELETED
@@ -1,16 +0,0 @@

```bash
#!/bin/bash

# setup.sh
#
# Shell wrapper for the MERe Workshop setup process.

set -euo pipefail

# Get the directory where this script is located
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &> /dev/null && pwd)"

# Change to the script directory
cd "$SCRIPT_DIR"

# Run the setup script
python setup.py "$@"
```
utils/__init__.py
DELETED
@@ -1,145 +0,0 @@

```python
"""
utils package for MERe Workshop annotation pipeline

This package provides utilities for:
- Configuration management (setup_utils)
- Logging and notifications (log_utils)
- Argilla Phase 1 dataset and Hugging Face dataset management (dataset_utils)
- User management (user_utils.py)
- Argilla webhook management (webhook_utils.py)
- Argilla space wiping/status (wipe_utils)
"""

from .setup_utils import (
    get_root,
    get_config,
    get_client,
    get_hf_api,
    validate_env,
    load_users,
)
from .log_utils import (
    log_error,
    log_warning,
    log_info,
    log_operation_success,
    log_operation_failure,
    log_dataset_operation,
    log_user_operation,
    log_webhook_operation,
)
from .dataset_utils import (
    create_dataset,
    delete_datasets,
    delete_dataset,
    list_datasets,
    update_datasets,
    update_dataset,
    load_moral_kg_sample,
)
from .phase1_utils import (
    create_phase1_datasets,
    delete_phase1_datasets,
    update_phase1_datasets,
)
from .user_utils import (
    create_users,
    create_user,
    delete_users,
    delete_user,
    list_users,
)
from .workspace_utils import (
    create_workspaces,
    create_workspace,
    create_user_workspaces,
    create_user_workspace,
    delete_workspaces,
    delete_workspace,
    delete_user_workspaces,
    delete_user_workspace,
    list_workspaces,
    list_user_workspaces,
)
from .webhook_utils import (
    create_webhooks,
    create_webhook,
    delete_webhooks,
    delete_webhook,
    list_webhooks,
    list_webhook_events,
    update_webhooks,
    update_webhook,
    validate_webhooks,
    webhook_exists,
)
from .wipe_utils import (
    get_status,
    wipe_space,
    wipe_datasets_only,
    wipe_users_only,
    wipe_webhooks_only,
)

__all__ = [
    "get_root",
    "get_config",
    "get_client",
    "get_hf_api",
    "validate_env",
    "load_users",

    "log_error",
    "log_warning",
    "log_info",
    "log_operation_success",
    "log_operation_failure",
    "log_dataset_operation",
    "log_user_operation",
    "log_webhook_operation",

    "create_phase1_datasets",
    "create_dataset",
    "delete_phase1_datasets",
    "delete_datasets",
    "delete_dataset",
    "list_datasets",
    "update_phase1_datasets",
    "update_datasets",
    "update_dataset",
    "load_moral_kg_sample",

    "create_users",
    "create_user",
    "delete_users",
    "delete_user",
    "list_users",

    "create_workspaces",
    "create_workspace",
    "create_user_workspaces",
    "create_user_workspace",
    "delete_workspaces",
    "delete_workspace",
    "delete_user_workspaces",
    "delete_user_workspace",
    "list_workspaces",
    "list_user_workspaces",

    "create_webhooks",
    "create_webhook",
    "delete_webhooks",
    "delete_webhook",
    "list_webhooks",
    "list_webhook_events",
    "update_webhooks",
    "update_webhook",
    "validate_webhooks",
    "webhook_exists",

    "get_status",
    "wipe_space",
    "wipe_datasets_only",
    "wipe_users_only",
    "wipe_webhooks_only",
]
```
utils/dataset_utils.py
DELETED
@@ -1,351 +0,0 @@

```python
"""
dataset_utils.py

Helper functions for dataset creation and management in the MERe Workshop annotation pipeline.
Transformed from the create-datasets.py script to follow a proper helper function paradigm.
"""

import os
import warnings
from typing import Dict, List, Optional

import argilla as rg
from datasets import load_dataset

from .setup_utils import (
    get_config,
    get_client,
    get_hf_api
)
from .log_utils import (
    log_info,
    log_operation_success,
    log_operation_failure,
    log_dataset_operation
)


# Get config
_config = get_config()

# Get client
_client = get_client()


def load_moral_kg_sample(
) -> Optional[List[Dict]]:
    """Load the moral-kg-sample dataset from HuggingFace."""
    global _config

    dataset_name = _config.get('datasets.sample')
    if not dataset_name:
        log_operation_failure("load sample dataset", Exception("Dataset name not configured"))
        return None

    try:
        # Setup HF client to ensure authentication
        get_hf_api()

        dataset = load_dataset(dataset_name, split="train", token=os.getenv("HF_TOKEN"))

        # Convert to list of dictionaries for easier processing
        records = []
        for item in dataset:
            item = dict(item)
            records.append({
                'identifier': item.get('identifier'),
                'title': item.get('title'),
                'authors': item.get('authors'),
                'year': item.get('year'),
                'categories': item.get('categories'),
                'text': item.get('text'),
                'map': item.get('map')
            })

        log_operation_success("load moral-kg-sample dataset", f"Loaded {len(records)} records")
        return records

    except Exception as e:
        log_operation_failure("load moral-kg-sample dataset", e)
        return None


def _get_workspace_names(
) -> List[str]:
    """Get list of available workspaces."""

    try:
        global _client
        workspaces = _client.workspaces
        workspace_names = [ws.name or "" for ws in workspaces]
        return workspace_names
    except Exception as e:
        log_operation_failure("fetch workspaces", e)
        return []


def _format_title_info(
    authors: List[str],
    year: str,
    title: str
) -> str:
    """Format title info as 'Title (Author, Year)'."""
    # Take first author and add et al. if multiple authors
    authors_display = authors[0] if authors else "Unknown"
    if len(authors) > 1:
        authors_display += " et al."

    return f"{title} ({authors_display}, {year})"


def _check_dataset_exists(
    workspace_name: str,
    dataset_name: str
) -> bool:
    """Check if dataset already exists in workspace."""
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            workspace = _client.workspaces(workspace_name)

        if workspace:
            for existing_dataset in workspace.datasets:
                if existing_dataset.name == dataset_name:
                    return True
    except Exception:
        pass
    return False


def create_dataset(
    dataset_name: str,
    workspace_name: Optional[str],
    settings: rg.Settings,
    records: Optional[List[Dict]] = None,
) -> bool:
    """Create a dataset with given settings in specified workspace."""
    global _client

    try:
        dataset = rg.Dataset(
            name=dataset_name,
            workspace=workspace_name,
            settings=settings,
            client=_client,
        )
        dataset.create()
        log_dataset_operation("created", dataset_name, f"in workspace {workspace_name}")

        # Add records if provided
        if records:
            dataset.records.log(records)
            log_operation_success("load records into dataset", f"Added {len(records)} records")

        return True

    except Exception as e:
        log_operation_failure("create dataset", e)
        return False


def delete_datasets(
    dataset_names: Optional[List[str]] = None,
    workspace_name: Optional[str] = None
) -> bool:
    """Delete multiple datasets or all datasets if none specified."""
    global _client

    if dataset_names is None:
        # Delete all datasets from all workspaces or a specific workspace
        if workspace_name:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                workspace = _client.workspaces(workspace_name)
            if not workspace:
                log_operation_failure("delete datasets", Exception(f"Workspace {workspace_name} not found"))
                return False

            datasets = workspace.datasets
            dataset_names = [ds.name for ds in datasets if ds.name]

            success_count = 0
            for ds_name in dataset_names:
                if delete_dataset(workspace_name, ds_name):
                    success_count += 1

            log_operation_success("delete datasets from workspace",
                                  f"Deleted {success_count}/{len(dataset_names)} datasets from {workspace_name}")

            return success_count == len(dataset_names)
        else:
            # Get all datasets from all workspaces
            all_datasets = []
            for ws in _client.workspaces:
                ws_name = ws.name
                if ws_name:
                    datasets = ws.datasets
                    for ds in datasets:
                        if ds.name:
                            all_datasets.append((ws_name, ds.name))

            success_count = 0
            for ws_name, ds_name in all_datasets:
                if delete_dataset(ws_name, ds_name):
                    success_count += 1

            log_operation_success("delete all datasets",
                                  f"Deleted {success_count}/{len(all_datasets)} datasets")

            return success_count == len(all_datasets)
    else:
        # Delete specific datasets
        if not workspace_name:
            log_operation_failure("delete datasets", Exception("Workspace name required when specifying dataset names"))
            return False

        success_count = 0
        for dataset_name in dataset_names:
            if delete_dataset(workspace_name, dataset_name):
                success_count += 1

        log_operation_success("delete datasets",
                              f"Deleted {success_count}/{len(dataset_names)} datasets")

        return success_count == len(dataset_names)


def delete_dataset(
    workspace_name: str,
    dataset_name: str
) -> bool:
    """Delete a specific dataset from a workspace."""
    try:
        global _client
        workspace = _client.workspaces(workspace_name)

        if not workspace:
            log_operation_failure("delete dataset", Exception(f"Workspace {workspace_name} not found"))
            return False

        # Find the dataset in workspace
        dataset = None
        for ds in workspace.datasets:
            if ds.name == dataset_name:
                dataset = ds
                break

        if not dataset:
            log_operation_failure("delete dataset", Exception(f"Dataset {dataset_name} not found in workspace {workspace_name}"))
            return False

        # Delete all records first
        try:
            records = list(dataset.records)
            # Filter out None records to avoid AttributeError
            records = [r for r in records if r is not None]

            if records:
                dataset.records.delete(records=records)
                log_dataset_operation("deleted records", dataset_name, f"{len(records)} records")
            else:
                log_info(f"No records found in dataset {dataset_name}")
        except Exception as e:
            if isinstance(e, AttributeError):
                pass
            else:
                log_operation_failure("delete dataset records", e)

        # Delete the dataset
        dataset.delete()
        log_dataset_operation("deleted", dataset_name, f"from workspace {workspace_name}")

        return True

    except Exception as e:
        log_operation_failure("delete dataset", e)
        return False


def list_datasets(
) -> Dict[str, List[str]]:
    """List all datasets grouped by workspace."""
    global _client

    try:
        workspace_datasets = {}

        for workspace in _client.workspaces:
            workspace_name = workspace.name or "Unknown"
            datasets = [dataset.name for dataset in workspace.datasets if dataset.name]
            workspace_datasets[workspace_name] = datasets

            log_dataset_operation("listed", f"workspace {workspace_name}",
                                  f"Found {len(datasets)} datasets")

        return workspace_datasets

    except Exception as e:
        log_operation_failure("list datasets", e)
        return {}


def update_datasets(
    dataset_updates: List[Dict[str, str]],
    new_settings: Optional[rg.Settings] = None
) -> bool:
    """Update multiple datasets."""
    success_count = 0

    for update_info in dataset_updates:
        workspace_name = update_info.get('workspace', '')
        dataset_name = update_info.get('dataset', '')
        new_workspace = update_info.get('new_workspace')

        if update_dataset(workspace_name, dataset_name, new_settings, new_workspace):
            success_count += 1

    log_operation_success("update datasets",
                          f"Updated {success_count}/{len(dataset_updates)} datasets")

    return success_count == len(dataset_updates)


def update_dataset(
    workspace_name: str,
    dataset_name: str,
    new_settings: Optional[rg.Settings] = None,
    new_workspace: Optional[str] = None
) -> bool:
    """Update a specific dataset's settings or move to new workspace."""
    global _client

    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            workspace = _client.workspaces(workspace_name)
            dataset = workspace.datasets(dataset_name)  # type: ignore

        if not dataset:
            log_operation_failure("update dataset",
                                  Exception(f"Dataset {dataset_name} not found in workspace {workspace_name}"))
            return False

        # Update settings if provided
        if new_settings:
            # Note: Argilla may not support direct settings updates; this might require recreating the dataset
            log_operation_success("update dataset settings",
                                  f"Attempted to update {dataset_name}")

        # Move to new workspace if provided
        if new_workspace:
            # Note: This typically requires recreating the dataset in the new workspace
            log_operation_success("move dataset workspace",
                                  f"Attempted to move {dataset_name} to {new_workspace}")

        return True

    except Exception as e:
        log_operation_failure("update dataset", e)
        return False
```
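A minimal usage sketch of `create_dataset` above, composing the same `rg.Settings` / `rg.TextField` / `rg.TextQuestion` pieces the deleted code uses; the workspace and dataset names here are hypothetical:

```python
import argilla as rg

from utils import create_dataset

# Hypothetical names for illustration; real names come from config.yaml.
settings = rg.Settings(
    guidelines="Select the claims that best represent each paper.",
    fields=[rg.TextField(name="text", title="Text")],
    questions=[rg.TextQuestion(name="claims", title="Claims", required=True)],
)

created = create_dataset(
    dataset_name="Phase 1",
    workspace_name="annotator-workspace",
    settings=settings,
)
```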
utils/log_utils.py
DELETED
@@ -1,292 +0,0 @@

```python
"""
log_utils.py

Logging and notification utilities for the MERe Workshop annotation pipeline.
Handles error logging, Slack notifications, and webhook data logging.
"""

import logging
import os
import textwrap
from typing import Optional

import requests

from .setup_utils import get_config


# Get config
config = get_config()


def _setup_logging(
) -> logging.Logger:
    """Set up logging configuration."""
    global config
    log_config = config.get("logging", {})

    # Configure logging
    logging.basicConfig(
        level=getattr(logging, log_config.get("level", "INFO")),
        format=log_config.get(
            "format", "[%(asctime)s] [%(name)s] [%(levelname)s] - %(message)s"
        ),
    )

    # Configure external library log levels
    external_libs = log_config.get("external_libraries", {})
    for lib_name, log_level in external_libs.items():
        lib_logger = logging.getLogger(lib_name)
        lib_logger.setLevel(getattr(logging, log_level.upper()))

    return logging.getLogger("mere_workshop")

# Get logger
logger = _setup_logging()


def _can_print(
) -> bool:
    """
    Use to test if you can use the print() function.

    returns True: if you can _print_ in the space or locally
    returns False: if you cannot _print_ in the space or locally
    """
    global config

    return config.get("error_handling.log_data_not_slack", False)


def _can_log(
) -> bool:
    """
    Use to test if you can use any log_*() function.

    returns True: if you can _log_ in the space or locally
    returns False: if you cannot _log_ in the space or locally
    """
    global config

    return (config.get("error_handling.log_data_not_slack", False) and
            config.get("error_handling.force_slack_notifications", False))


def _send_slack_notification(
    message: str,
) -> bool:
    """Sends Slack notification if configured."""
    global config
    global logger

    slack_webhook_url = os.getenv("SLACK_WEBHOOK_URL")
    if not slack_webhook_url:
        if _can_log():
            logger.warning("SLACK_WEBHOOK_URL not configured, skipping notification")
        elif _can_print():
            print("SLACK_WEBHOOK_URL not configured, skipping notification")

        return False

    try:
        payload = {"text": message}
        response = requests.post(
            slack_webhook_url,
            json=payload,
            headers={"Content-Type": "application/json"},
            timeout=10,
        )

        if response.status_code == 200:
            if _can_log():
                logger.info(f"Slack notification sent: {message}")
            elif _can_print():
                print(f"Slack notification sent: {message}")

            return True
        else:
            if _can_log():
                logger.error(f"Failed to send Slack notification. Status code: {response.status_code}")
            elif _can_print():
                print(f"Failed to send Slack notification. Status code: {response.status_code}")

            return False

    except Exception as e:
        if _can_log():
            logger.error(f"Error sending Slack notification: {e}")
        elif _can_print():
            print(f"Error sending Slack notification: {e}")

        return False


def _send_to_slack(
    send_to_slack: bool,
    message: str
) -> bool:
    """Determine if a log should be sent as a Slack notification."""
    global config
    global logger

    try:
        if (config.get("error_handling.slack_notifications", False) and
                (send_to_slack or
                 config.get("error_handling.force_slack_notifications", False))):
            return _send_slack_notification(message)
        else:
            return True

    except Exception as e:
        logger.error(f"Error sending Slack notification: {e}")
        return False


def log_error(
    error_msg: str,
    exception: Optional[Exception] = None,
    send_to_slack: bool = False,
) -> None:
    """Log errors."""
    global config
    global logger

    if exception:
        # Format error with indented description using textwrap
        error_detail = textwrap.indent(str(exception), "    ")
        full_msg = (
            f"{error_msg}\n  Exception: {type(exception).__name__}\n{error_detail}"
        )
    else:
        full_msg = error_msg

    if _can_log():
        logger.error(full_msg)
        _send_to_slack(send_to_slack, full_msg)
    elif _can_print():
        print(f"[ERROR] {full_msg}")
        _send_to_slack(send_to_slack, full_msg)


def log_warning(
    warning_msg: str,
    send_to_slack: bool = False
) -> None:
    """Log warnings."""
    global config
    global logger

    if _can_log():
        logger.warning(warning_msg)
        _send_to_slack(send_to_slack, warning_msg)
    elif _can_print():
        print(f"[WARNING] {warning_msg}")
        _send_to_slack(send_to_slack, warning_msg)


def log_info(
    info_msg: str,
    send_to_slack: bool = False
) -> None:
    """Log information."""
    global config
    global logger

    if _can_log():
        logger.info(info_msg)
        _send_to_slack(send_to_slack, info_msg)
    elif _can_print():
        print(f"[INFO] {info_msg}")
        _send_to_slack(send_to_slack, info_msg)


def log_operation_success(
    operation: str,
    details: Optional[str] = None,
    send_to_slack: bool = False
) -> None:
    """Log successful operation."""
    global config

    msg = f"Successfully completed {operation}"
    if details:
        msg += f": {details}"

    log_info(msg)
    _send_to_slack(send_to_slack, msg)


def log_operation_failure(
    operation: str,
    error: Optional[Exception] = None,
    send_to_slack: bool = False,
) -> None:
    """Logs failed operation."""
    global config

    msg = f"Failed to {operation}"

    log_error(msg, error)

    if error:
        error_detail = textwrap.indent(str(error), "    ")
        full_msg = (
            f"{msg}\n  Exception: {type(error).__name__}\n{error_detail}"
        )
        _send_to_slack(send_to_slack, full_msg)
    else:
        _send_to_slack(send_to_slack, msg)


def log_dataset_operation(
    operation: str,
    dataset_name: str,
    details: Optional[str] = None,
    send_to_slack: bool = False,
) -> None:
    """Log dataset-related operations."""
    global config
    global logger

    msg = f"Dataset {operation} ({dataset_name})"
    if details:
        msg += f": {details}"

    logger.info(msg)
    _send_to_slack(send_to_slack, msg)


def log_user_operation(
    operation: str,
    username: str,
    details: Optional[str] = None,
    send_to_slack: bool = False,
) -> None:
    """Log user-related operations."""
    global config
    global logger

    msg = f"User {operation} ({username})"
    if details:
        msg += f": {details}"

    logger.info(msg)
    _send_to_slack(send_to_slack, msg)


def log_webhook_operation(
    operation: str,
    event: str,
    details: Optional[str] = None,
    send_to_slack: bool = False,
) -> None:
    """Log webhook-related operations."""
    global config
    global logger

    msg = f"Webhook {operation} ({event})"
    if details:
        msg += f": {details}"

    logger.info(msg)
    _send_to_slack(send_to_slack, msg)
```
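A short usage sketch of the helpers above; the operation names are hypothetical, and `send_to_slack=True` only notifies when `error_handling.slack_notifications` is enabled in `config.yaml`:

```python
from utils import log_operation_failure, log_operation_success

try:
    # ... some setup step (hypothetical) ...
    log_operation_success("create demo workspace", "1 workspace created")
except Exception as e:
    # Logs the failure and, if configured, mirrors it to Slack.
    log_operation_failure("create demo workspace", e, send_to_slack=True)
```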
utils/phase1_utils.py
DELETED
@@ -1,240 +0,0 @@
|
|
1 |
-
"""
|
2 |
-
phase1_utils.py
|
3 |
-
|
4 |
-
Helper functions for Phase 1 dataset creation and management in the MERe Workshop annotation pipeline.
|
5 |
-
"""
|
6 |
-
|
7 |
-
import json
|
8 |
-
import warnings
|
9 |
-
from typing import Dict, List, Optional
|
10 |
-
|
11 |
-
import argilla as rg
|
12 |
-
|
13 |
-
from .setup_utils import (
|
14 |
-
get_config,
|
15 |
-
get_client,
|
16 |
-
)
|
17 |
-
from .log_utils import (
|
18 |
-
log_info,
|
19 |
-
log_operation_success,
|
20 |
-
log_operation_failure,
|
21 |
-
log_dataset_operation
|
22 |
-
)
|
23 |
-
from .dataset_utils import (
|
24 |
-
load_moral_kg_sample,
|
25 |
-
_get_workspace_names,
|
26 |
-
_format_title_info,
|
27 |
-
_check_dataset_exists,
|
28 |
-
create_dataset,
|
29 |
-
delete_dataset,
|
30 |
-
update_dataset
|
31 |
-
)
|
32 |
-
|
33 |
-
|
34 |
-
# Get config and client
|
35 |
-
_config = get_config()
|
36 |
-
_client = get_client()
|
37 |
-
|
38 |
-
|
39 |
-
def _create_phase1_settings(
|
40 |
-
) -> rg.Settings:
|
41 |
-
"""Create the Phase 1 dataset settings from configuration."""
|
42 |
-
global _config
|
43 |
-
phase1_config = _config.phase1
|
44 |
-
|
45 |
-
# Build fields from config
|
46 |
-
fields = []
|
47 |
-
for field_name, field_config in phase1_config.get('fields', {}).items():
|
48 |
-
fields.append(rg.TextField(
|
49 |
-
name=field_config['name'],
|
50 |
-
title=field_config['title'],
|
51 |
-
use_markdown=field_config.get('use_markdown', False)
|
52 |
-
))
|
53 |
-
|
54 |
-
# Build metadata from config
|
55 |
-
metadata = []
|
56 |
-
for meta_name, meta_config in phase1_config.get('metadata', {}).items():
|
57 |
-
metadata.append(rg.TermsMetadataProperty(
|
58 |
-
name=meta_config['name'],
|
59 |
-
title=meta_config['title'],
|
60 |
-
visible_for_annotators=meta_config.get('visible_for_annotators', True)
|
61 |
-
))
|
62 |
-
|
63 |
-
# Build questions from config
|
64 |
-
questions = []
|
65 |
-
for question_name, question_config in phase1_config.get('questions', {}).items():
|
66 |
-
if question_config.get('type') == 'TextQuestion':
|
67 |
-
questions.append(rg.TextQuestion(
|
68 |
-
name=question_config['name'],
|
69 |
-
title=question_config['title'],
|
70 |
-
description=question_config.get('description', ''),
|
71 |
-
required=question_config.get('required', False)
|
72 |
-
))
|
73 |
-
else:
|
74 |
-
log_operation_failure("add question to Phase 1 dataset",
|
75 |
-
Exception("Haven't implemented non TextQuestions into the process."))
|
76 |
-
|
77 |
-
return rg.Settings(
|
78 |
-
guidelines=phase1_config.get('guidelines', ''),
|
79 |
-
fields=fields,
|
80 |
-
metadata=metadata,
|
81 |
-
questions=questions
|
82 |
-
)
|
83 |
-
|
84 |
-
|
85 |
-
def _create_phase1_dataset(
|
86 |
-
workspace_name: str,
|
87 |
-
records: List[Dict]
|
88 |
-
) -> bool:
|
89 |
-
"""Create Phase 1 dataset for a specific workspace."""
|
90 |
-
global _client
|
91 |
-
|
92 |
-
dataset_name = _config.get('phase1.dataset_name', 'Phase 1')
|
93 |
-
|
94 |
-
# Check if dataset already exists
|
95 |
-
    if _check_dataset_exists(workspace_name, dataset_name):
        log_dataset_operation("created", dataset_name, f"in workspace {workspace_name} (already exists)")
        # Get existing dataset for record loading
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                workspace = _client.workspaces(workspace_name)
                if workspace:
                    for existing_dataset in workspace.datasets:
                        if existing_dataset.name == dataset_name:
                            dataset = existing_dataset
                            break
        except Exception as e:
            log_operation_failure("get existing dataset", e)
            return False
    else:
        # Create new dataset
        try:
            dataset = rg.Dataset(
                name=dataset_name,
                workspace=workspace_name,
                settings=_create_phase1_settings(),
                client=_client,
            )
            dataset.create()
            log_dataset_operation("created", dataset_name, f"in workspace {workspace_name}")
        except Exception as e:
            log_operation_failure("create Phase 1 dataset", e)
            return False

    # Convert records to Argilla format and load them
    try:
        argilla_records = []
        for record in records:
            title_info = _format_title_info(
                record['authors'],
                record['year'],
                record['title']
            ).strip()
            # Parse map from JSON string back to dictionary
            map_data = json.loads(record['map']) if record['map'] else {}
            suggestions = list(map_data.keys())

            argilla_record = rg.Record(
                fields={
                    "title_info": title_info,
                    "text": record['text']
                },
                metadata={
                    "id": record['identifier'],
                    "fields": record['categories']
                },
                suggestions=[
                    rg.Suggestion(
                        question_name="claims",
                        value="\n\n".join(suggestions)
                    )
                ]
            )
            argilla_records.append(argilla_record)

        # Add records to dataset
        dataset.records.log(argilla_records)
        log_operation_success("load records into dataset", f"Added {len(argilla_records)} records")

        return True

    except Exception as e:
        log_operation_failure("load records into dataset", e)
        return False


def create_phase1_datasets(
) -> bool:
    """Create Phase 1 datasets for all available workspaces."""
    try:
        # Load client and get workspaces
        workspace_names = _get_workspace_names()

        if not workspace_names:
            log_operation_failure("create datasets", Exception("No workspaces found"))
            return False

        # Load records from HuggingFace
        records = load_moral_kg_sample()
        if not records:
            log_operation_failure("create datasets", Exception("Failed to load sample records"))
            return False

        # Create datasets for each workspace
        success_count = 0
        failed_count = 0
        for workspace_name in workspace_names:
            if _create_phase1_dataset(workspace_name, records):
                success_count += 1
            else:
                failed_count += 1

        # Use transaction-like logging
        log_info(f"Create Phase 1 datasets: {success_count}/{len(workspace_names)} succeeded, {failed_count} failed.")

        return success_count == len(workspace_names)

    except Exception as e:
        log_operation_failure("create datasets for all workspaces", e)
        return False


def delete_phase1_datasets(
) -> bool:
    """Delete all Phase 1 datasets from all workspaces."""
    global _config

    dataset_name = _config.get('phase1.dataset_name', 'Phase 1')
    workspace_names = _get_workspace_names()

    success_count = 0
    for workspace_name in workspace_names:
        if delete_dataset(workspace_name, dataset_name):
            success_count += 1

    log_operation_success("delete Phase 1 datasets",
                          f"Deleted {success_count}/{len(workspace_names)} datasets")

    return success_count == len(workspace_names)


def update_phase1_datasets(
    new_settings: Optional[rg.Settings] = None,
    new_workspace: Optional[str] = None
) -> bool:
    """Update all Phase 1 datasets with new settings or move to new workspace."""
    global _config

    dataset_name = _config.get('phase1.dataset_name', 'Phase 1')
    workspace_names = _get_workspace_names()

    success_count = 0
    for workspace_name in workspace_names:
        if update_dataset(workspace_name, dataset_name, new_settings, new_workspace):
            success_count += 1

    log_operation_success("update Phase 1 datasets",
                          f"Updated {success_count}/{len(workspace_names)} datasets")

    return success_count == len(workspace_names)
utils/setup_utils.py
DELETED
@@ -1,200 +0,0 @@
"""
setup_utils.py

Initialization utilities for the MERe Workshop annotation pipeline.
Handles setup of clients, configuration loading, and environment validation.
"""

import os
from pathlib import Path
from typing import Any, Dict

import argilla as rg
from huggingface_hub import HfApi
import rootutils
import yaml


# Setup project root
_root = rootutils.setup_root(__file__, indicator=".git", pythonpath=True)


def validate_env(
) -> bool:
    """Validate that all required environment variables are set."""
    required_vars = [
        "ARGILLA_API_URL",
        "ARGILLA_API_KEY",
        "HF_TOKEN"]

    missing_vars = [var for var in required_vars if not os.getenv(var)]

    if missing_vars:
        raise EnvironmentError(
            f"Missing required environment variables: {', '.join(missing_vars)}"
        )

    return True


class Config:
    """Configuration manager for the MERe Workshop application."""

    def __init__(
        self,
        config_path: str = "config.yaml"
    ):
        self._config_path = config_path
        self._config = self._load_config()

    def _load_config(
        self
    ) -> Dict[str, Any] | None:
        """Load configuration from YAML file."""
        if validate_env():
            config_file = _root / self._config_path

            if not config_file.exists():
                raise FileNotFoundError(f"Configuration file not found: {config_file}")

            with open(config_file, "r", encoding="utf-8") as f:
                return yaml.safe_load(f)

    def get(
        self,
        key_path: str,
        default: Any = None
    ) -> Any:
        """Get configuration value using dot notation (e.g., 'datasets.sample')."""
        keys = key_path.split(".")
        value = self._config

        for key in keys:
            if isinstance(value, dict) and key in value:
                value = value[key]
            else:
                return default

        return value

    @property
    def datasets(
        self
    ) -> Dict[str, str]:
        """Get dataset configuration."""
        return self.get("datasets", {})

    @property
    def webhook_events(
        self
    ) -> Dict[str, Any]:
        """Get webhook configuration."""
        return self.get("webhooks.events", {})

    @property
    def phase1(
        self
    ) -> Dict[str, Any]:
        """Get Phase 1 configuration."""
        return self.get("phase1", {})

    @property
    def users_config(
        self
    ) -> Dict[str, Any]:
        """Get users configuration."""
        return self.get("users", {})

    @property
    def paths(
        self
    ) -> Dict[str, str]:
        """Get file paths configuration."""
        return self.get("paths", {})


# Global config instance
_config = Config()

# Global Argilla client instance
_client = None

# Global Hugging Face API instance
_hf_api = None


def get_root(
) -> Path:
    """Get the project root directory."""
    return _root


def get_config(
) -> Config:
    """Get the configuration manager."""
    return _config


def get_client(
) -> rg.Argilla:  # type: ignore
    """Get the Argilla client."""
    global _client

    if _client is not None:
        return _client

    if validate_env():
        try:
            _client = rg.Argilla(
                api_url=os.getenv("ARGILLA_API_URL"),
                api_key=os.getenv("ARGILLA_API_KEY"),
            )
            return _client

        except Exception as e:
            if "ArgillaCredentialsError" in str(e):
                print(
                    "\n HINT: Did you wipe/restart the space? If you did, ",
                    "you need to update your Argilla API key!\n"
                )
            raise


def get_hf_api(
) -> HfApi:  # type: ignore
    """Get the HuggingFace API client."""
    global _hf_api

    if _hf_api is not None:
        return _hf_api

    if validate_env():
        _hf_api = HfApi(token=os.getenv("HF_TOKEN"))

    return _hf_api


def load_users(
) -> list[Dict[str, str]] | None:
    """Load users from CSV file specified in config."""
    config = get_config()
    csv_path = config.get("paths.users_csv", "users.csv")

    full_path = _root / csv_path
    if not full_path.exists():
        raise FileNotFoundError(f"Users CSV file not found: {full_path}")

    import csv

    users = []
    with open(full_path, "r", newline="", encoding="utf-8") as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            user_data = {key.rstrip(): value.rstrip() for key, value in row.items()}
            users.append(user_data)

    return users
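
A quick sketch of how the accessors above were meant to be used. The `phase1.dataset_name` key is just an example of the dot-notation lookup; the real keys live in config.yaml:

```python
# Illustrative sketch only. Assumes ARGILLA_API_URL, ARGILLA_API_KEY,
# and HF_TOKEN are set, as validate_env() requires.
from utils.setup_utils import get_config, get_client, get_hf_api

config = get_config()
dataset_name = config.get("phase1.dataset_name", "Phase 1")  # dot-notation lookup

client = get_client()   # cached Argilla client singleton
hf_api = get_hf_api()   # cached HfApi singleton
print(dataset_name, len(client.workspaces))
```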
utils/user_utils.py
DELETED
@@ -1,208 +0,0 @@
"""
user_utils.py

Helper functions for user management in the MERe Workshop annotation pipeline.
Transformed from create-users.py script to follow proper helper function paradigm.
"""

from typing import Dict, List, Optional

import argilla as rg

from .setup_utils import (
    get_config,
    get_client,
    load_users
)
from .log_utils import (
    log_info,
    log_operation_success,
    log_operation_failure,
    log_user_operation
)

# Get config
_config = get_config()

# Get client
_client = get_client()


def create_user(
    user_data: Dict[str, str],
) -> bool:
    """Create a single user."""
    global _config
    global _client

    username = user_data['username']

    # Check if user already exists
    try:
        for existing_user in _client.users:
            if existing_user.username == username:
                log_user_operation("created", username, f"role: {existing_user.role} (already exists)")
                log_operation_success("create user", f"{username} (already exists)")
                return True
    except Exception:
        # Continue with creation if check fails
        pass

    try:
        # Create user
        user = rg.User(
            username=username,
            first_name=user_data.get('first_name', ''),
            last_name=user_data.get('last_name', ''),
            role=user_data.get('role', _config.get('users.default_role', 'annotator')),
            password=user_data['password']
        )

        created_user = user.create()
        log_user_operation("created", username, f"role: {user.role}")

        log_operation_success("create user", username)
        return True

    except Exception as e:
        # Check if user already exists
        error_str = str(e).lower()
        if "conflict" in error_str or "already exists" in error_str or "not unique" in error_str:
            log_user_operation("created", username, "role: annotator (already exists)")
            return True
        else:
            log_operation_failure("create user", e)
            return False


def create_users(
    users_data: Optional[List[Dict[str, str]]] = None
) -> bool:
    """Create all users from the CSV file or provided list."""
    try:
        if users_data is None:
            users_data = load_users()

        if not users_data:
            log_operation_failure("create users", Exception("No users found"))
            return False

        # Create each user
        success_count = 0
        for user_data in users_data:
            if create_user(user_data):
                success_count += 1

        log_operation_success("create users",
                              f"Created {success_count}/{len(users_data)} users successfully")

        return success_count == len(users_data)

    except Exception as e:
        log_operation_failure("create users", e)
        return False


def delete_user(
    username: str,
    skip_admin: bool = True
) -> bool:
    """Delete a single user."""
    global _client

    try:
        # Find and delete user
        users = _client.users
        user_to_delete = None
        user_found = False

        for user in users:
            if user.username == username:
                user_found = True
                if skip_admin:
                    if user.role not in ["owner", "admin"]:
                        user_to_delete = user
                        break
                    else:
                        log_info(f"SKIPPED OWNER or ADMIN ({user.username})")
                        # Skipping admin/owner is considered success
                        return True
                else:
                    user_to_delete = user
                    break

        if not user_found:
            log_operation_failure("delete user", Exception(f"User {username} not found"))
            return False

        if not user_to_delete:
            log_operation_failure("delete user", Exception(f"User {username} could not be deleted"))
            return False

        # Delete user
        user_to_delete.delete()
        log_user_operation("deleted", username)

        return True

    except Exception as e:
        log_operation_failure("delete user", e)
        return False


def delete_users(
    usernames: Optional[List[str]] = None
) -> bool:
    """Delete all users or specified users."""
    try:
        global _client

        if usernames is None:
            # Delete all users
            users = _client.users
            usernames = [user.username for user in users if user.username]

        if not usernames:
            log_operation_success("delete users", "No users to delete")
            return True

        # Delete each user
        success_count = 0
        for username in usernames:
            if delete_user(username):
                success_count += 1

        log_operation_success("delete users",
                              f"Deleted {success_count}/{len(usernames)} users")

        return success_count == len(usernames)

    except Exception as e:
        log_operation_failure("delete users", e)
        return False


def list_users(
) -> List[Dict[str, str]]:
    """List all users with their details."""
    try:
        global _client
        users = _client.users
        user_list = []

        for user in users:
            user_info = {
                'username': user.username or '',
                'first_name': user.first_name or '',
                'last_name': user.last_name or '',
                'role': user.role or '',
                'id': str(user.id) if user.id else ''
            }
            user_list.append(user_info)

        log_user_operation("listed all users", f"Found {len(user_list)} users")
        return user_list

    except Exception as e:
        log_operation_failure("list users", e)
        return []
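
A minimal sketch of bulk user management with the helpers above. The users.csv columns mirror what `create_user()` reads: username, password, and optionally first_name, last_name, role; the username in the last line is hypothetical:

```python
# Illustrative sketch only -- not part of the deleted file.
from utils.user_utils import create_users, list_users, delete_user

create_users()                     # loads users.csv via load_users()
for user in list_users():
    print(user["username"], user["role"])
delete_user("example_annotator")   # owners/admins are skipped by default
```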
utils/webhook_utils.py
DELETED
@@ -1,340 +0,0 @@
"""
webhook_utils.py

Helper functions for webhook management in the MERe Workshop annotation pipeline.
Transformed from create-webhooks.py and related scripts to follow proper helper function paradigm.
"""

import os
from typing import List, Optional, Dict

import argilla as rg

from .setup_utils import (
    get_config,
    get_client
)
from .log_utils import (
    log_operation_success,
    log_operation_failure,
    log_webhook_operation
)


# Setup config
_config = get_config()

# Setup client
_client = get_client()


def create_webhook(
    event: str,
    description: str,
) -> Optional[rg.Webhook]:
    """Create a webhook for a specific event."""

    global _client

    webhook_url = os.getenv("ARGILLA_WEBHOOK_URL")
    if not webhook_url:
        log_operation_failure("create webhook",
                              Exception(f"ARGILLA_WEBHOOK_URL environment variable not set for {event}"))
        return None

    try:
        webhook = rg.Webhook(
            url=webhook_url,
            events=[event],  # type: ignore
            description=description
        )

        created_webhook = webhook.create()
        log_webhook_operation("created", event, description)
        return created_webhook  # type: ignore

    except Exception as e:
        log_operation_failure("create webhook", e)
        return None


def list_webhook_events(
) -> List[str]:
    """Return list of webhook events from configuration."""
    global _config
    return _config.get('webhooks.events', [])


def create_webhooks(
) -> bool:
    """Create webhooks for all configured events."""
    try:
        global _client
        events = list_webhook_events()

        if not events:
            log_operation_failure("create webhooks",
                                  Exception("No webhook events configured"))
            return False

        webhook_url = os.getenv("ARGILLA_WEBHOOK_URL")
        if not webhook_url:
            log_operation_failure("create webhooks",
                                  Exception("ARGILLA_WEBHOOK_URL environment variable not set"))
            return False

        # Create webhooks for each event, recreating if they already exist
        success_count = 0
        for event in events:
            # Check if webhook already exists
            if webhook_exists(event):
                log_webhook_operation("already exists", event, "recreating")
                # Delete existing webhook first
                for webhook in _client.webhooks:
                    if webhook.events and event in webhook.events:
                        webhook.delete()
                        log_webhook_operation("deleted existing", event)
                        break

            description = f"Webhook for {event} events to {webhook_url}"
            if create_webhook(event, description):
                success_count += 1

        log_operation_success("create webhooks",
                              f"Created {success_count}/{len(events)} webhooks successfully")

        return success_count == len(events)

    except Exception as e:
        log_operation_failure("create webhooks", e)
        return False


def list_webhooks(
) -> List[Dict[str, str]]:
    """List all existing webhooks."""
    try:
        global _client
        webhooks = _client.webhooks
        webhook_list = []

        for webhook in webhooks:
            webhook_info = {
                'url': webhook.url or '',
                'events': ', '.join(webhook.events) if webhook.events else '',
                'description': webhook.description or ''
            }
            webhook_list.append(webhook_info)

        log_webhook_operation("listed all webhooks", f"Found {len(webhook_list)} webhooks")
        return webhook_list

    except Exception as e:
        log_operation_failure("list webhooks", e)
        return []


def delete_webhook(
    webhook_url: str,
    webhook_events: List[str],
) -> bool:
    """Delete a specific webhook by URL and events."""
    try:
        global _client
        # Find webhook by URL and events
        webhook_to_delete = None
        for webhook in _client.webhooks:
            if (webhook.url == webhook_url and
                    webhook.events and
                    set(webhook.events) == set(webhook_events)):
                webhook_to_delete = webhook
                break

        if not webhook_to_delete:
            log_operation_failure("delete webhook",
                                  Exception(f"Webhook with URL {webhook_url} and events {webhook_events} not found"))
            return False

        # Delete webhook
        webhook_to_delete.delete()
        log_webhook_operation("deleted", f"{webhook_url} ({', '.join(webhook_events)})")

        return True

    except Exception as e:
        log_operation_failure("delete webhook", e)
        return False


def delete_webhooks(
    webhook_specs: Optional[List[Dict[str, str]]] = None
) -> bool:
    """Delete all webhooks or specified webhooks."""
    try:
        global _client

        if webhook_specs is None:
            # Delete all webhooks
            webhooks = _client.webhooks
            webhook_specs = []
            for webhook in webhooks:
                if webhook.url and webhook.events:
                    webhook_specs.append({
                        'url': webhook.url,
                        'events': ','.join(webhook.events)
                    })

        if not webhook_specs:
            log_operation_success("delete webhooks", "No webhooks to delete")
            return True

        # Delete each webhook
        success_count = 0
        for webhook_spec in webhook_specs:
            webhook_url = webhook_spec.get('url', '')
            webhook_events = webhook_spec.get('events', '').split(',') if webhook_spec.get('events') else []

            if delete_webhook(webhook_url, webhook_events):
                success_count += 1

        log_operation_success("delete webhooks",
                              f"Deleted {success_count}/{len(webhook_specs)} webhooks")

        return success_count == len(webhook_specs)

    except Exception as e:
        log_operation_failure("delete webhooks", e)
        return False


def webhook_exists(
    event: str
) -> bool:
    """Check if a webhook already exists for a specific event."""
    try:
        global _client
        webhooks = _client.webhooks

        for webhook in webhooks:
            if webhook.events and event in webhook.events:
                log_webhook_operation("found existing", event, f"webhook URL: {webhook.url}")
                return True

        return False

    except Exception as e:
        log_operation_failure("check webhook exists", e)
        return False


def validate_webhooks(
) -> bool:
    """Validate that webhook configuration is correct."""
    try:
        # Check if webhook URL is set
        webhook_url = os.getenv("ARGILLA_WEBHOOK_URL")
        if not webhook_url:
            log_operation_failure("validate webhook config", Exception("ARGILLA_WEBHOOK_URL environment variable not set"))
            return False

        # Check if events are configured
        events = list_webhook_events()
        if not events:
            log_operation_failure("validate webhook config", Exception("No webhook events configured"))
            return False

        # Check if Argilla client can be created
        try:
            get_client()
        except Exception as e:
            log_operation_failure("validate webhook config", Exception(f"Cannot create Argilla client: {str(e)}"))
            return False

        log_operation_success("validate webhook config", f"Configuration valid for {len(events)} events")
        return True

    except Exception as e:
        log_operation_failure("validate webhook config", e)
        return False


def update_webhook(
    webhook_url: str,
    webhook_events: List[str],
    new_url: Optional[str] = None,
    new_events: Optional[List[str]] = None,
    new_description: Optional[str] = None,
) -> bool:
    """Update a webhook's properties by recreating it (since Argilla doesn't support direct updates)."""
    try:
        global _client
        # Find webhook
        webhook = None
        for w in _client.webhooks:
            if (w.url == webhook_url and
                    w.events and
                    set(w.events) == set(webhook_events)):
                webhook = w
                break

        if not webhook:
            log_operation_failure("update webhook",
                                  Exception(f"Webhook with URL {webhook_url} and events {webhook_events} not found"))
            return False

        # Since Argilla doesn't support direct webhook updates, we need to recreate
        # First delete the existing webhook
        webhook.delete()
        log_webhook_operation("deleted for update", f"{webhook_url} ({', '.join(webhook_events)})")

        # Create new webhook with updated properties
        final_url = new_url if new_url else webhook_url
        final_events = new_events if new_events else webhook_events
        final_description = new_description if new_description else webhook.description

        for event in final_events:
            description = final_description or f"Webhook for {event} events to {final_url}"
            new_webhook = rg.Webhook(
                url=final_url,
                events=[event],  # type: ignore
                description=description
            )
            new_webhook.create()

        updates = []
        if new_url:
            updates.append(f"url: {new_url}")
        if new_events:
            updates.append(f"events: {', '.join(new_events)}")
        if new_description:
            updates.append(f"description: {new_description}")

        log_operation_success("update webhook", f"{webhook_url} - {', '.join(updates)}")

        return True

    except Exception as e:
        log_operation_failure("update webhook", e)
        return False


def update_webhooks(
    webhook_updates: List[Dict[str, str]]
) -> bool:
    """Update multiple webhooks."""
    success_count = 0
    for update_info in webhook_updates:
        webhook_url = update_info.get('url', '')
        webhook_events = update_info.get('events', '').split(',') if update_info.get('events') else []
        new_url = update_info.get('new_url')
        new_events = update_info.get('new_events', '').split(',') if update_info.get('new_events') else None
        new_description = update_info.get('new_description')

        if update_webhook(webhook_url, webhook_events, new_url, new_events, new_description):
            success_count += 1

    log_operation_success("update webhooks",
                          f"Updated {success_count}/{len(webhook_updates)} webhooks")

    return success_count == len(webhook_updates)
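
A minimal sketch of validating and (re)creating webhooks with the helpers above, assuming ARGILLA_WEBHOOK_URL points at the listener that consumes the events:

```python
# Illustrative sketch only -- not part of the deleted file.
from utils.webhook_utils import validate_webhooks, create_webhooks, list_webhooks

if validate_webhooks():       # env var + configured events + client check
    create_webhooks()         # recreates any webhook that already exists
    for hook in list_webhooks():
        print(hook["events"], "->", hook["url"])
```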
utils/wipe_utils.py
DELETED
@@ -1,164 +0,0 @@
"""
wipe_utils.py

Helper functions for wiping/cleaning Argilla space in the MERe Workshop annotation pipeline.
Transformed from wipe-space.py script to follow proper helper function paradigm.
"""

from .setup_utils import get_client
from .dataset_utils import delete_datasets
from .user_utils import delete_users
from .webhook_utils import delete_webhooks
from .workspace_utils import delete_workspaces
from .log_utils import (
    log_operation_success,
    log_operation_failure,
)


# Setup client
_client = get_client()


def wipe_space(
) -> bool:
    """Completely wipe the Argilla space - datasets, users, workspaces, and webhooks."""
    try:
        # Track success of each operation
        operations = [
            ("datasets", delete_datasets),
            ("webhooks", delete_webhooks),
            ("users", delete_users),
            ("workspaces", delete_workspaces)
        ]

        operations_results = {}

        # Execute each operation and continue even if one fails
        for operation_name, operation_func in operations:
            try:
                success = operation_func()
                operations_results[operation_name] = success
                if success:
                    log_operation_success(f"wipe {operation_name}", "Operation completed successfully")
                else:
                    log_operation_failure(f"wipe {operation_name}", Exception("Operation completed with some failures"))
            except Exception as e:
                operations_results[operation_name] = False
                log_operation_failure(f"wipe {operation_name}", e)

        # Calculate summary
        successful_ops = sum(1 for success in operations_results.values() if success)
        total_ops = len(operations_results)

        if successful_ops == total_ops:
            log_operation_success("wipe entire Argilla space", "All components deleted successfully")
            return True
        else:
            failed_ops = [name for name, success in operations_results.items() if not success]
            log_operation_failure("wipe entire Argilla space",
                                  Exception(f"{total_ops - successful_ops}/{total_ops} operations failed: {', '.join(failed_ops)}"))
            # Return True if at least some operations succeeded
            return successful_ops > 0

    except Exception as e:
        log_operation_failure("wipe entire Argilla space", e)
        return False


def wipe_datasets_only(
) -> bool:
    """Wipe only datasets, keeping users and workspaces."""
    try:
        success = delete_datasets()

        if success:
            log_operation_success("wipe datasets only", "All datasets deleted successfully")
        else:
            log_operation_failure("wipe datasets only", Exception("Some datasets could not be deleted"))

        return success

    except Exception as e:
        log_operation_failure("wipe datasets only", e)
        return False


def wipe_users_only(
) -> bool:
    """Wipe only users, keeping datasets and workspaces."""
    try:
        success = delete_users()

        if success:
            log_operation_success("wipe users only", "All users deleted successfully")
        else:
            log_operation_failure("wipe users only", Exception("Some users could not be deleted"))

        return success

    except Exception as e:
        log_operation_failure("wipe users only", e)
        return False


def wipe_webhooks_only(
) -> bool:
    """Wipe only webhooks, keeping everything else."""
    try:
        success = delete_webhooks()

        if success:
            log_operation_success("wipe webhooks only", "All webhooks deleted successfully")
        else:
            log_operation_failure("wipe webhooks only", Exception("Some webhooks could not be deleted"))

        return success

    except Exception as e:
        log_operation_failure("wipe webhooks only", e)
        return False


def get_status(
) -> dict:
    """Get current status of the Argilla space (counts of datasets, users, etc.)."""
    try:
        global _client

        # Count datasets across all workspaces
        total_datasets = 0
        total_records = 0
        for workspace in _client.workspaces:
            workspace_datasets = workspace.datasets
            total_datasets += len(workspace_datasets)

            for dataset in workspace_datasets:
                try:
                    records = list(dataset.records)
                    total_records += len(records)
                except Exception:
                    # Skip if can't access records
                    pass

        status = {
            'workspaces': len(_client.workspaces),
            'users': len(_client.users),
            'datasets': total_datasets,
            'records': total_records,
            'webhooks': len(_client.webhooks)
        }

        log_operation_success("get space status", f"Status retrieved: {status}")
        return status

    except Exception as e:
        log_operation_failure("get space status", e)
        return {
            'workspaces': 0,
            'users': 0,
            'datasets': 0,
            'records': 0,
            'webhooks': 0,
            'error': str(e)
        }
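
The wipe helpers above compose the per-resource `delete_*` functions; `get_status()` is a useful bracket around any wipe. A minimal sketch:

```python
# Illustrative sketch only -- not part of the deleted file.
from utils.wipe_utils import get_status, wipe_datasets_only, wipe_space

print(get_status())     # counts of workspaces/users/datasets/records/webhooks
wipe_datasets_only()    # targeted cleanup
# wipe_space()          # full teardown: datasets, webhooks, users, workspaces
print(get_status())
```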
utils/workspace_utils.py
DELETED
@@ -1,387 +0,0 @@
"""
workspace_utils.py

Helper functions for workspace management in the MERe Workshop annotation pipeline.
Handles workspace creation, deletion, user assignment, and management operations.
"""

from typing import Dict, List, Optional
import warnings

import argilla as rg

from .setup_utils import (
    get_client,
    load_users
)
from .log_utils import (
    log_operation_success,
    log_operation_failure,
    log_user_operation
)


# Setup client
_client = get_client()


def create_workspace(
    workspace_name: str,
) -> bool:
    """Create a single workspace."""
    global _client

    # Check if workspace already exists
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            existing_workspace = _client.workspaces(workspace_name)
            if existing_workspace:
                log_operation_success("create workspace", f"{workspace_name} (already exists)")
                return True
    except Exception:
        # Workspace doesn't exist, continue with creation
        pass

    try:
        workspace = rg.Workspace(name=workspace_name)
        workspace.create()
        log_operation_success("create workspace", workspace_name)
        return True

    except Exception as e:
        # Check if workspace already exists
        error_str = str(e).lower()
        if "conflict" in error_str or "already exists" in error_str or "not unique" in error_str:
            log_operation_success("create workspace", f"{workspace_name} (already exists)")
            return True
        else:
            log_operation_failure("create workspace", e)
            return False


def create_workspaces(
    workspace_names: List[str]
) -> bool:
    """Create multiple workspaces from a list of workspace names."""
    global _client

    success_count = 0
    for workspace_name in workspace_names:
        if create_workspace(workspace_name):
            success_count += 1

    log_operation_success("create workspaces",
                          f"Created {success_count}/{len(workspace_names)} workspaces")

    return success_count == len(workspace_names)


def create_user_workspace(
    username: str,
    workspace_name: str
) -> bool:
    """Add a user to a specific workspace."""
    global _client

    try:
        # Find user
        user = None
        for u in _client.users:
            if u.username == username:
                user = u
                break

        if not user:
            log_operation_failure("add user to workspace", Exception(f"User {username} not found"))
            return False

        # Find workspace
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            workspace = _client.workspaces(workspace_name)
        if not workspace:
            log_operation_failure("add user to workspace", Exception(f"Workspace {workspace_name} not found"))
            return False

        # Check if user is already in workspace
        try:
            workspace_users = list(workspace.users)
            for existing_user in workspace_users:
                if existing_user.username == username:
                    log_user_operation("added to workspace", username, f"{workspace_name} (already assigned)")
                    return True
        except Exception:
            # Continue if check fails
            pass

        # Add user to workspace
        workspace.add_user(user)  # type: ignore
        log_user_operation("added to workspace", username, workspace_name)

        return True

    except Exception as e:
        # Check if user already in workspace
        error_str = str(e).lower()
        if "conflict" in error_str or "already" in error_str:
            log_user_operation("added to workspace", username, f"{workspace_name} (already assigned)")
            return True
        else:
            log_operation_failure("add user to workspace", e)
            return False


def create_user_workspaces(
    user_workspace_map: Optional[Dict[str, List[str]]] = None
) -> bool:
    """Create workspaces for users based on mapping or CSV data."""

    if user_workspace_map is None:
        # Load from CSV and create user workspaces based on usernames
        users = load_users()
        if not users:
            log_operation_failure("create user workspaces", Exception("No users found in CSV"))
            return False

        success_count = 0
        total_count = 0

        for user_data in users:
            username = user_data['username']
            # Create workspace with username as workspace name
            total_count += 1
            if create_workspace(username):
                # Add user to their workspace
                if create_user_workspace(username, username):
                    success_count += 1

        log_operation_success("create user workspaces from CSV",
                              f"Created {success_count}/{total_count} user workspaces")

        return success_count == total_count
    else:
        # Use provided mapping
        success_count = 0
        total_count = 0

        for username, workspace_names in user_workspace_map.items():
            for workspace_name in workspace_names:
                total_count += 1
                if create_user_workspace(username, workspace_name):
                    success_count += 1

        log_operation_success("create user workspaces from mapping",
                              f"Added users to {success_count}/{total_count} workspaces")

        return success_count == total_count


def delete_workspace(
    workspace_name: str, client: Optional[rg.Argilla] = None
) -> bool:
    """Delete a single workspace."""
    global _client

    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            workspace = _client.workspaces(workspace_name)
        if not workspace:
            log_operation_failure("delete workspace", Exception(f"Workspace {workspace_name} not found"))
            return False

        # Check for remaining datasets first
        try:
            datasets = list(workspace.datasets)
            if datasets:
                dataset_names = [ds.name for ds in datasets if ds.name]
                log_operation_failure("delete workspace",
                                      Exception(f"Workspace {workspace_name} still has datasets: {', '.join(dataset_names)}. Delete datasets first."))
                return False
        except Exception as e:
            # If we can't check datasets, try to continue
            log_operation_failure("check workspace datasets", e)

        # Remove all users from workspace first
        try:
            workspace_users = list(workspace.users)
            for user in workspace_users:
                try:
                    workspace.remove_user(user)
                    log_user_operation("removed from workspace", user.username or f"User-{user.id}", workspace_name)
                except Exception as e:
                    log_operation_failure("remove user from workspace", e)
        except Exception as e:
            # Continue if user removal fails
            log_operation_failure("remove users from workspace", e)

        # Delete the workspace
        workspace.delete()
        log_operation_success("delete workspace", workspace_name)

        return True

    except Exception as e:
        # Check if it's a dependency error
        error_str = str(e).lower()
        if "has some datasets linked" in error_str or "dependency" in error_str:
            log_operation_failure("delete workspace",
                                  Exception(f"Workspace {workspace_name} cannot be deleted due to remaining dependencies"))
        else:
            log_operation_failure("delete workspace", e)
        return False


def delete_workspaces(
    workspace_names: Optional[List[str]] = None
) -> bool:
    """Delete multiple workspaces or all workspaces if none specified."""
    global _client
    if workspace_names is None:
        # Delete all workspaces
        workspaces = _client.workspaces
        workspace_names = [ws.name for ws in workspaces if ws.name]

    success_count = 0
    for workspace_name in workspace_names:
        if delete_workspace(workspace_name):
            success_count += 1

    log_operation_success("delete workspaces",
                          f"Deleted {success_count}/{len(workspace_names)} workspaces")

    return success_count == len(workspace_names)


def delete_user_workspace(
    username: str,
    workspace_name: str,
    delete_if_empty: bool = True
) -> bool:
    """Remove a user from a workspace and optionally delete workspace if empty."""
    global _client

    try:
        # Find user
        user = None
        for u in _client.users:
            if u.username == username:
                user = u
                break

        if not user:
            log_operation_failure("remove user from workspace", Exception(f"User {username} not found"))
            return False

        # Find workspace
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            workspace = _client.workspaces(workspace_name)
        if not workspace:
            log_operation_failure("remove user from workspace", Exception(f"Workspace {workspace_name} not found"))
            return False

        # Remove user from workspace
        workspace.remove_user(user)
        log_user_operation("removed from workspace", username, workspace_name)

        # Check if workspace is empty and delete if requested
        if delete_if_empty:
            remaining_users = workspace.users
            if not remaining_users:
                workspace.delete()
                log_operation_success("delete empty workspace", workspace_name)
            else:
                log_operation_success("workspace not empty", f"{workspace_name} still has {len(remaining_users)} users")

        return True

    except Exception as e:
        log_operation_failure("remove user from workspace", e)
        return False


def delete_user_workspaces(usernames: List[str]) -> bool:
    """Remove users from all their workspaces and delete empty workspaces."""

    success_count = 0
    for username in usernames:
        user_workspaces = list_user_workspaces(username)
        user_success = True

        for workspace_name in user_workspaces:
            if not delete_user_workspace(username, workspace_name, delete_if_empty=True):
                user_success = False

        if user_success:
            success_count += 1

    log_operation_success("delete user workspaces",
                          f"Processed {success_count}/{len(usernames)} users")

    return success_count == len(usernames)


def list_workspaces(
) -> List[Dict[str, str]]:
    """List all workspaces with their details."""
    global _client

    try:
        workspaces = _client.workspaces
        workspace_list = []

        for workspace in workspaces:
            workspace_info = {
                'name': workspace.name or '',
                'id': str(workspace.id) if workspace.id else '',
                'user_count': str(len(workspace.users))
            }
            workspace_list.append(workspace_info)

        log_operation_success("list workspaces", f"Found {len(workspace_list)} workspaces")
        return workspace_list

    except Exception as e:
        log_operation_failure("list workspaces", e)
        return []


def list_user_workspaces(
    username: str,
) -> List[str]:
    """Get list of workspaces a user has access to."""
    global _client

    try:
        # Find user
        user = None
        for u in _client.users:
            if u.username == username:
                user = u
                break

        if not user:
            log_operation_failure("get user workspaces", Exception(f"User {username} not found"))
            return []

        # Get workspaces the user has access to
        workspaces = []
        for workspace in _client.workspaces:
            try:
                # Check if user has access to workspace
                workspace_users = workspace.users
                if any(wu.id == user.id for wu in workspace_users):
                    workspaces.append(workspace.name or '')
            except Exception:
                # Skip workspaces we can't access
                continue

        log_user_operation("listed workspaces", username, f"Found {len(workspaces)} workspaces")
        return workspaces

    except Exception as e:
        log_operation_failure("get user workspaces", e)
        return []
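
A minimal sketch of per-annotator workspaces with the helpers above. Without a mapping, each user from users.csv gets a personal workspace named after their username; the names in the mapping example are hypothetical:

```python
# Illustrative sketch only -- not part of the deleted file.
from utils.workspace_utils import create_user_workspaces, list_user_workspaces

create_user_workspaces()   # per-user workspaces from users.csv
# With an explicit mapping, users are assigned to workspaces that must
# already exist (the mapping path does not create them):
create_user_workspaces({"alice": ["shared-ws"]})
print(list_user_workspaces("alice"))
```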
wipe.py
DELETED
@@ -1,165 +0,0 @@
#!/usr/bin/env python3

"""
wipe.py

Clean wipe script for the MERe Workshop annotation pipeline.
Removes users, workspaces, datasets, and webhooks using modular helper functions.
"""

import sys
import argparse
from pathlib import Path

from utils import (
    validate_env,
    log_operation_success,
    log_operation_failure,
    wipe_space,
    wipe_datasets_only,
    wipe_users_only,
    wipe_webhooks_only,
    get_status,
    log_info,
    log_warning
)


def parse_args():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description="Wipe MERe Workshop Argilla space",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )

    parser.add_argument(
        "-d", "--datasets-only",
        action="store_true",
        help="Only wipe datasets, keep users and workspaces",
    )

    parser.add_argument(
        "-u", "--users-only",
        action="store_true",
        help="Only wipe users, keep datasets and workspaces",
    )

    parser.add_argument(
        "-w", "--webhooks-only",
        action="store_true",
        help="Only wipe webhooks, keep everything else",
    )

    parser.add_argument(
        "-s", "--status-only",
        action="store_true",
        help="Only show current space status, do not perform wipe",
    )

    parser.add_argument("--force", action="store_true", help="Skip confirmation prompt")

    return parser.parse_args()


def show_space_status():
    """Display current space status."""
    status = get_status()

    if "error" in status:
        log_operation_failure("check space status", status["error"])
        return False

    print()
    log_info("=== Current Argilla Space Status ===")
    log_info(f"Workspaces: {status['workspaces']}")
    log_info(f"Users: {status['users']}")
    log_info(f"Datasets: {status['datasets']}")
    log_info(f"Records: {status['records']}")
    log_info(f"Webhooks: {status['webhooks']}")
    print()

    return True


def confirm_wipe(
    operation_description: str,
    force: bool = False
) -> bool:
    """Confirm wipe operation with user."""
    if force:
        return True

    log_warning(f"WARNING: This will {operation_description}")
    log_warning("This action cannot be undone!")

    log_warning("Are you sure you want to proceed? [y/N]:")
    response = input().strip().lower()
    return response in ["y", "yes"]


def main():
    """Main wipe function."""
    args = parse_args()

    # Validate environment
    try:
        validate_env()
        log_operation_success("wipe validation", "Environment validated")
    except Exception as e:
        log_operation_failure("wipe validation", e)
        return 1

    # Show current status
    if not show_space_status():
        return 1

    # If status-only mode, exit here
    if args.status_only:
        return 0

    # Determine operation and confirmation message
    if args.datasets_only:
        operation = "datasets only"
        confirmation_msg = "delete ALL DATASETS (keeping users and workspaces)"
        wipe_function = wipe_datasets_only
    elif args.users_only:
        operation = "users only"
        confirmation_msg = (
            "delete ALL USERS (keeping datasets, workspaces, and webhooks)"
        )
        wipe_function = wipe_users_only
    elif args.webhooks_only:
        operation = "webhooks only"
        confirmation_msg = "delete ALL WEBHOOKS (keeping users and datasets)"
        wipe_function = wipe_webhooks_only
    else:
        operation = "entire space"
        confirmation_msg = "DELETE EVERYTHING (users, workspaces, datasets, webhooks)"
        wipe_function = wipe_space

    # Confirm operation
    if not confirm_wipe(confirmation_msg, args.force):
        log_info("Wipe operation cancelled")
        return 0

    # Perform wipe operation
    print()
    log_info(f"Wiping {operation}...")
    success = wipe_function()

    if success:
        log_operation_success(f"wipe {operation}", "Operation completed successfully")
    else:
        log_operation_failure(f"wipe {operation}", Exception("Operation failed"))
        return 1

    # Show final status
    if not show_space_status():
        return 1

    log_operation_success("Wipe operation completed", send_to_slack=True)
    return 0


if __name__ == "__main__":
    exit(main())
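
For completeness, a sketch of driving this CLI from another script rather than through wipe.sh; the flags mirror `parse_args()` above, and the paths assume the repo root as the working directory:

```python
# Illustrative sketch only -- not part of the deleted file.
import subprocess

subprocess.run(["python", "wipe.py", "--status-only"], check=True)
subprocess.run(["python", "wipe.py", "--datasets-only", "--force"], check=True)
```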
wipe.sh
DELETED
@@ -1,16 +0,0 @@
#!/bin/bash

# wipe.sh
#
# Shell wrapper for the MERe Workshop wipe process.

set -euo pipefail

# Get the directory where this script is located
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &> /dev/null && pwd)"

# Change to the script directory
cd "$SCRIPT_DIR"

# Run the wipe script
python wipe.py "$@"