Add demo image classification models and datasets #41

Inokinoki · 2024-07-18T16:05:41Z

Enable Skin Cancer detection with single label.

It takes around 4 minutes to predict the 1285 images in the test split of the marmal88/skin_cancer dataset:

rabah-khalek · 2024-07-19T12:33:53Z

giskard_vision/core/dataloaders/hf.py

+    """
+
+    def __init__(
+        self, hf_id: str, hf_config: Optional[str] = None, hf_split: str = "train", name: Optional[str] = None


Suggested change

self, hf_id: str, hf_config: Optional[str] = None, hf_split: str = "train", name: Optional[str] = None

self, hf_id: str, hf_config: Optional[str] = None, hf_split: str = "test", name: Optional[str] = None

Wouldn't this make more sense? Admittedly, some datasets only have a "train" split and no "test" split, but still.
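One way to reconcile a "test" default with datasets that only ship a "train" split is to resolve the split against the names that actually exist. The sketch below is illustrative only: `pick_split` is a hypothetical helper, not part of giskard-vision, and it assumes the caller already has the list of available splits (for a Hugging Face dataset this could come from `datasets.get_dataset_split_names`).

```python
from typing import Sequence


def pick_split(available: Sequence[str], preferred: str = "test", fallback: str = "train") -> str:
    """Return `preferred` if the dataset has it, otherwise `fallback`.

    Hypothetical helper: `available` is the list of split names the dataset
    exposes. Raises ValueError when neither split is present.
    """
    if preferred in available:
        return preferred
    if fallback in available:
        return fallback
    raise ValueError(
        f"Neither {preferred!r} nor {fallback!r} found in splits: {list(available)}"
    )


# A dataset with both splits uses "test"; a train-only dataset falls back.
print(pick_split(["train", "validation", "test"]))
print(pick_split(["train"]))
```

Whether a silent fallback is desirable is a design choice; raising instead would make the missing split explicit to the caller.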

rabah-khalek · 2024-07-19T12:35:26Z

giskard_vision/core/dataloaders/hf.py

+
+        return MetaData(
+            data=flat_meta,
+            categories=flat_meta.keys(),


I would do the opposite: no category features by default.

rabah-khalek · 2024-07-19T12:49:03Z

giskard_vision/core/dataloaders/hf.py

+    def __len__(self):
+        return len(self.ds)
+
+    def get_meta(self, idx: int) -> Optional[TypesBase.meta]:


I don't think we should put get_meta at this level. It's not very useful here, as we'd need to specify an issue group for each meta entry. I would also not keep meta_exclude_keys as an attribute, since it's hard to understand what it means.

Instead, I would just implement get_meta in each daughter class, with a custom list of excluded keys and a custom list of categories.
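A minimal sketch of that suggestion, under stated assumptions: `MetaDataSketch` is a hypothetical stand-in for the project's `MetaData` class (only the `data` and `categories` arguments seen in the diff are modelled), `flatten_dict` here is a local helper rather than the project's own function, and the column names are illustrative. Categories default to an empty list, so nothing is treated as categorical unless a loader says so.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class MetaDataSketch:
    """Hypothetical stand-in for MetaData: flat values plus opt-in categories."""
    data: Dict[str, Any]
    categories: List[str] = field(default_factory=list)  # none by default


def flatten_dict(nested: Dict[str, Any], excludes: Sequence[str] = (),
                 parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts into one level, skipping keys listed in `excludes`."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        if key in excludes:
            continue
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_dict(value, excludes, full_key, sep))
        else:
            flat[full_key] = value
    return flat


class SkinCancerLoaderSketch:
    """Hypothetical daughter loader that owns its exclude and category lists."""

    def __init__(self, rows: List[Dict[str, Any]]):
        self.rows = rows

    def get_meta(self, idx: int) -> Optional[MetaDataSketch]:
        # Assumed column names: drop the image and the label, keep the rest.
        flat = flatten_dict(self.rows[idx], excludes=["image", "dx"])
        # Only these keys are declared categorical for this dataset.
        categories = [k for k in ("sex", "localization") if k in flat]
        return MetaDataSketch(data=flat, categories=categories)
```

Keeping both lists inside each loader's `get_meta` means the base class needs no `meta_exclude_keys` attribute at all.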

rabah-khalek · 2024-07-19T13:32:05Z

giskard_vision/image_classification/models/base.py

+    def predict_image(self, image: np.ndarray) -> np.ndarray:
+        """method that takes one image as input and outputs the prediction of probabilities for each class
+
+        Args:
+            image (np.ndarray): input image
+        """
+        _raw_prediction = self.pipeline(
+            image,
+            top_k=len(self.classification_labels),  # Get probabilities for all labels
+        )
+        _prediction = {p["label"]: p["score"] for p in _raw_prediction}
+
+        return np.array([_prediction[label] for label in self.classification_labels])


Since the daughter classes predict labels, I think it's better to name this method predict_probas and leave predict_image abstract in this class.

rabah-khalek · 2024-07-19T13:32:24Z

giskard_vision/image_classification/models/wrappers.py

+        )
+
+    def predict_image(self, image) -> np.ndarray:
+        probas = super().predict_image(image)


I would instead use self.predict_probas here

Done in the single label base class
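The split discussed in this thread could look roughly like the sketch below. The class names are hypothetical, and the pipeline is replaced by a fake callable so the example runs without downloading a model. The only assumption about the real Hugging Face pipeline is what the diff already shows: it is called with `top_k` and returns a list of `{"label", "score"}` dicts.

```python
from abc import ABC, abstractmethod
from typing import Callable, Sequence

import numpy as np


class HFPipelineModelSketch(ABC):
    """Hypothetical base: exposes probabilities, leaves predict_image abstract."""

    def __init__(self, pipeline: Callable, classification_labels: Sequence[str]):
        self.pipeline = pipeline
        self.classification_labels = list(classification_labels)

    def predict_probas(self, image: np.ndarray) -> np.ndarray:
        # Ask for every label so each class gets a score.
        raw = self.pipeline(image, top_k=len(self.classification_labels))
        scores = {p["label"]: p["score"] for p in raw}
        # Reorder scores to follow classification_labels.
        return np.array([scores[label] for label in self.classification_labels])

    @abstractmethod
    def predict_image(self, image: np.ndarray):
        """Daughter classes decide how probabilities become a prediction."""


class SingleLabelModelSketch(HFPipelineModelSketch):
    def predict_image(self, image: np.ndarray) -> np.ndarray:
        probas = self.predict_probas(image)
        return np.array([np.argmax(probas)])  # index of the most likely class


def fake_pipeline(image, top_k):
    """Stand-in for a Hugging Face image-classification pipeline."""
    ranked = [
        {"label": "cat", "score": 0.7},
        {"label": "dog", "score": 0.2},
        {"label": "bird", "score": 0.1},
    ]
    return ranked[:top_k]


model = SingleLabelModelSketch(fake_pipeline, ["bird", "cat", "dog"])
image = np.zeros((2, 2, 3))
print(model.predict_probas(image))
print(model.predict_image(image))
```

With this layout, wrappers for individual checkpoints only need to supply the pipeline and the label list.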

rabah-khalek · 2024-07-19T13:33:13Z

giskard_vision/image_classification/models/wrappers.py

+        return np.array([np.argmax(probas)])
+
+
+class MicrosoftResNetImageNet50HuggingFaceModel(ImageClassificationHuggingFaceModel):


why no predict_image here?

Simplified to have a single label base class

rabah-khalek · 2024-07-19T13:33:24Z

giskard_vision/image_classification/models/wrappers.py

+        )
+
+
+class Jsli96ResNetImageNetHuggingFaceModel(ImageClassificationHuggingFaceModel):


why no predict_image here?

Simplified to have a single label base class

rabah-khalek · 2024-07-19T13:39:03Z

giskard_vision/image_classification/types.py

+    TypesBase,
+)
+
+CLASSIFICATION_LABEL_TYPE = np.ndarray  # Probabilities for each class


The classification type is not probabilities.

We need to implement a threshold for binary classification models to get the predicted classes.
We already have the argmax in place for the multi-class case.

In both cases, though, I would choose between:

int: label_id

string: label

I'm more in favour of label, as it will be more readable when we use model.predict.

I assumed an N-dimensional array so that it could hold both multiple labels for one prediction and multiple predictions.
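To make the two cases concrete, here is a minimal sketch of turning probabilities into a string label. `probas_to_label` is a hypothetical helper, and it assumes that in the binary case the positive class sits at index 1 and that a default threshold of 0.5 is acceptable; neither is confirmed by the PR.

```python
from typing import Sequence


def probas_to_label(probas: Sequence[float], labels: Sequence[str], threshold: float = 0.5) -> str:
    """Map per-class probabilities to one string label.

    Binary case: return labels[1] when its probability reaches `threshold`
    (assumes the positive class is at index 1). Otherwise: argmax.
    """
    if len(probas) != len(labels):
        raise ValueError("probas and labels must have the same length")
    if len(labels) == 2:
        return labels[1] if probas[1] >= threshold else labels[0]
    best = max(range(len(probas)), key=lambda i: probas[i])
    return labels[best]


# Lowering the threshold flips the binary decision; multi-class uses argmax.
print(probas_to_label([0.6, 0.4], ["benign", "malignant"]))
print(probas_to_label([0.6, 0.4], ["benign", "malignant"], threshold=0.3))
print(probas_to_label([0.1, 0.7, 0.2], ["bird", "cat", "dog"]))
```

Returning the string rather than the index keeps the output of a predict call readable without a separate lookup table.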

Inokinoki · 2024-07-23T10:33:01Z

I have now aligned all datasets and models to use class labels as strings.

There are two new notebooks for trying out all the models and datasets.

rabah-khalek

LGTM, good job

rabah-khalek · 2024-07-23T11:00:33Z

I would just add tests for the core parts, i.e. the general wrappers; there is no need to test the demo dataloaders or models.

Inokinoki added 8 commits July 18, 2024 18:01

Add base model for Hugging Face pipeline

48c17ac

Add skin cancer, imagenet resnet models in image classification

866e202

Add type definitions for image classification

2be1899

Add comments for image classification models based on HF pipeline

46445c6

Merge branch 'main' of github.com:Giskard-AI/giskard-vision into imag…

f9c94ae

…e-classification-hf

Add general Hugging Face dataset loader through HF datasets

0f62a9c

Add Skin Cancer dataset from Hugging Face datasets

af8c27f

Add an example for skin cancer image classification

cf766e1

Inokinoki marked this pull request as ready for review July 18, 2024 22:11

Inokinoki added 2 commits July 19, 2024 00:41

Overwrite predict method to get results

dca5a7c

Test predict method to get results in notebook

24d4715

Inokinoki requested a review from rabah-khalek July 18, 2024 22:47

rabah-khalek suggested changes Jul 19, 2024

View reviewed changes

rabah-khalek mentioned this pull request Jul 19, 2024

Scan from metadata #37

Merged

rabah-khalek and others added 16 commits July 19, 2024 16:14

Merge branch 'main' into image-classification-hf

751dcce

Merge branch 'main' into image-classification-hf

9bdf762

added accuracy metric

0a334b4

Merge branch 'main' into image-class-metrics

5d8de4a

Remove flatten_dict_exclude_wrapper and use flatten_dict with exclusi…

b169166

…on instead

pdm format

e780608

Catch failed HF dataset loading and throw Giskard error

2008e11

Move meta data to specific dataloaders

e1b6eea

Simplify dataloader classes with metadata

678ea2b

Rename ImageClassificationHuggingFaceModel to `ImageClassificationH…

2ee4287

…FModel`

Merge branch 'main' into image-class-metrics

36b4878

Merge branch 'image-classification-hf' into image-class-metrics

e469e70

Merge pull request #42 from Giskard-AI/image-class-metrics

5305e23

Image classification metrics

Improve image classification prediction and types

cfd55b2

Polish HFPipelineModelBase

b56a706

Merge branch 'image-classification-hf' of github.com:Giskard-AI/giska…

b827fdc

…rd-vision into image-classification-hf

Inokinoki added 6 commits July 22, 2024 16:32

Remove class attributes in image classification dataloaders

8759b9d

Use test split by default in HF dataset

3930acf

Fix GeirhosConflictStimuli loader to have label class as labels element

b338a2d

Exclude temporarilly landmark specific failed rate calculation

eda96a9

Add two sample notebooks for image classification

3bffa18

Format code

1657de5

Inokinoki requested a review from rabah-khalek July 23, 2024 10:33

Inokinoki and others added 3 commits July 23, 2024 12:36

Add get_meta to the300w_lp after after moving it from parent

1326807

Format code

9e3c35c

Merge branch 'main' into image-classification-hf

f3764dc

rabah-khalek approved these changes Jul 23, 2024

View reviewed changes

Inokinoki added 2 commits July 23, 2024 16:40

Add test for HF dataloader base class

8ef00d1

Add datasets as a dev dependency

be749e0

Inokinoki added the Lockfile label Jul 23, 2024

Regenerating pdm.lock

6b34e7e

github-actions bot removed the Lockfile label Jul 23, 2024

Inokinoki added 3 commits July 23, 2024 16:50

Add test for Tensorflow Datasets dataloader base class

7c44965

Format test code

f2d7770

Catch error during Tensorflow Datasets loading

ea8d98e

Inokinoki merged commit 03dee7b into main Jul 23, 2024

Inokinoki deleted the image-classification-hf branch July 23, 2024 19:05
