Add metadata file for sample image retrieval and description of it
- README.md +14 -1
- app.py +1 -7
- components/metadata.parquet +3 -0
README.md
CHANGED
@@ -16,6 +16,19 @@ datasets:
 
 This app is modified from the original [BioCLIP Demo](https://huggingface.co/spaces/imageomics/bioclip-demo) to run inference with [BioCLIP 2](https://huggingface.co/imageomics/bioclip-2) and uses [pybioclip](https://github.com/Imageomics/pybioclip).
 
-Due to space persistent storage limitations, embeddings are fetched from the [TreeOfLife-200M repo](https://huggingface.co/datasets/imageomics/TreeOfLife-200M)
+Due to space persistent storage limitations, embeddings are fetched from the [TreeOfLife-200M repo](https://huggingface.co/datasets/imageomics/TreeOfLife-200M). The images are retrieved from an S3 bucket, as in the original demo, described below.
 
 Note that if this space is duplicated, the sample image portion **will not work**.
+
+**bioclip-2/metadata.parquet:** metadata file for fetching [TreeOfLife-200M](https://huggingface.co/datasets/imageomics/TreeOfLife-200M) sample images (up to 3 available per taxon) from an S3 bucket.
+- `uuid`: unique identifier for the image within the TreeOfLife-200M dataset.
+- `eol_page_id`: identifier of the EOL page for the most specific taxon of the image (where available). Note that an image's association to a particular page ID may change with updates to the EOL (or image provider's) hierarchy; EOL taxon page IDs themselves are stable. "https://eol.org/pages/" + `eol_page_id` links to the page.
+- `gbif_id`: identifier used by GBIF for the most specific taxon of the image (where available). "https://gbif.org/species/" + `gbif_id` links to the page.
+- `kingdom`: kingdom to which the subject of the image belongs (all `Animalia`).
+- `phylum`: phylum to which the subject of the image belongs.
+- `class`: class to which the subject of the image belongs.
+- `order`: order to which the subject of the image belongs.
+- `family`: family to which the subject of the image belongs.
+- `genus`: genus to which the subject of the image belongs.
+- `species`: species to which the subject of the image belongs.
+- `file_path`: image filepath for fetching the image from the S3 bucket (`<folder>/<uuid>.jpg`, where the folder is the first two characters of the `uuid`).
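The `file_path` column is enough to turn a metadata row into a fetchable image URL. A minimal sketch of that lookup, assuming a hypothetical public endpoint (`S3_BASE_URL` is illustrative; the real bucket is configured in the Space and is not part of this diff):

```python
import polars as pl

# Hypothetical endpoint: the real bucket URL lives in the Space's code/config
# and is not shown in this diff.
S3_BASE_URL = "https://<bucket>.s3.amazonaws.com"

def sample_image_urls(metadata_path: str, species: str, limit: int = 3) -> list[str]:
    """Return up to `limit` image URLs for a species.

    `file_path` is `<folder>/<uuid>.jpg`, where <folder> is the first two
    characters of the uuid, so it joins directly onto the bucket URL.
    """
    df = pl.read_parquet(metadata_path, low_memory=False)
    rows = df.filter(pl.col("species") == species).head(limit)
    return [f"{S3_BASE_URL}/{p}" for p in rows["file_path"].to_list()]

# e.g. sample_image_urls("components/metadata.parquet", "Panthera leo")
```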
app.py
CHANGED
@@ -14,7 +14,6 @@ from torchvision import transforms
 
 from components.query import get_sample
 from bioclip import CustomLabelsClassifier
-from huggingface_hub import hf_hub_download
 
 log_format = "[%(asctime)s] [%(levelname)s] [%(name)s] %(message)s"
 logging.basicConfig(level=logging.INFO, format=log_format)
@@ -23,12 +22,7 @@ logger = logging.getLogger()
 hf_token = os.getenv("HF_TOKEN")
 
 # For sample images
-hf_hub_download(repo_id="imageomics/TreeOfLife-200M",
-                filename="bioclip-2/metadata.parquet",
-                repo_type="dataset",
-                local_dir = "components",
-                token = hf_token)
-METADATA_PATH = "components/bioclip-2/metadata.parquet"
+METADATA_PATH = "components/metadata.parquet"
 # Read page IDs as int
 metadata_df = pl.read_parquet(METADATA_PATH, low_memory = False)
 metadata_df = metadata_df.with_columns(pl.col(["eol_page_id", "gbif_id"]).cast(pl.Int64))
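Since the parquet is now bundled with the Space rather than downloaded at startup, a duplicate that never materializes the LFS file loses the metadata too. A hedged fallback sketch mirroring the removed `hf_hub_download` call (the in-repo filename comes from the diff above; access to the dataset repo still requires a valid token):

```python
import os

from huggingface_hub import hf_hub_download

METADATA_PATH = "components/metadata.parquet"

# Fallback sketch: if the LFS-tracked parquet was not materialized (e.g. a
# duplicated Space, or a clone made without `git lfs pull`), fetch it from the
# dataset repo the way the removed startup code did. hf_hub_download returns
# the local path of the cached file.
if not os.path.exists(METADATA_PATH):
    METADATA_PATH = hf_hub_download(
        repo_id="imageomics/TreeOfLife-200M",
        filename="bioclip-2/metadata.parquet",
        repo_type="dataset",
        token=os.getenv("HF_TOKEN"),
    )
```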
components/metadata.parquet
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6af05f1f8f08b0d447b9a4c18680c7de39551a05318f026d30c224a9bbe5283e
+size 121162891
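What this commit stores is a Git LFS pointer, not the 121 MB parquet itself; a clone made without `git lfs pull` leaves exactly these three lines on disk, and `pl.read_parquet` will fail on them. A quick sanity check, as a sketch (function name is illustrative):

```python
# Sketch: detect an unfetched LFS pointer before handing the path to polars.
# A real parquet file begins with the magic bytes b"PAR1"; an LFS pointer
# begins with the spec line shown above.
def is_lfs_pointer(path: str) -> bool:
    with open(path, "rb") as f:
        return f.read(7) == b"version"

assert not is_lfs_pointer("components/metadata.parquet"), (
    "metadata.parquet is still an LFS pointer; run `git lfs pull` first"
)
```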