Skip to content

Conversation

@Inokinoki
Copy link
Member

@Inokinoki Inokinoki commented Jul 18, 2024

Enable Skin Cancer detection with single label.

It takes around 4 minutes to predict 1285 images in test split of marmal88/skin_cancer datasets:

image
@Inokinoki Inokinoki marked this pull request as ready for review July 18, 2024 22:11
@Inokinoki Inokinoki requested a review from rabah-khalek July 18, 2024 22:47
"""

def __init__(
self, hf_id: str, hf_config: Optional[str] = None, hf_split: str = "train", name: Optional[str] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self, hf_id: str, hf_config: Optional[str] = None, hf_split: str = "train", name: Optional[str] = None
self, hf_id: str, hf_config: Optional[str] = None, hf_split: str = "test", name: Optional[str] = None

wouldn't this make more sense? On the other hand, sometimes there're only "train" split and not "test", but still.


return MetaData(
data=flat_meta,
categories=flat_meta.keys(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do the opposite, no category features by default

def __len__(self):
return len(self.ds)

def get_meta(self, idx: int) -> Optional[TypesBase.meta]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should put the get_meta on this level, it's not very useful, as we'd need to specify each issuegroup per meta, I would also not have meta_exclude_keys as attribute, it's complicated to understand what it means,

instead I would just implement get_meta for each daughter class, with a custom list of exclude, and custom list of categories

Comment on lines 47 to 59
def predict_image(self, image: np.ndarray) -> np.ndarray:
"""method that takes one image as input and outputs the prediction of probabilities for each class

Args:
image (np.ndarray): input image
"""
_raw_prediction = self.pipeline(
image,
top_k=len(self.classification_labels), # Get probabilities for all labels
)
_prediction = {p["label"]: p["score"] for p in _raw_prediction}

return np.array([_prediction[label] for label in self.classification_labels])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think since the daughter classes are predicting labels, it's better to call this method predict_probas, and leave the predict_image abstract for this class

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

)

def predict_image(self, image) -> np.ndarray:
probas = super().predict_image(image)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would instead use self.predict_probas here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the single label base class

return np.array([np.argmax(probas)])


class MicrosoftResNetImageNet50HuggingFaceModel(ImageClassificationHuggingFaceModel):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why no predict_image here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified to have a single label base class

)


class Jsli96ResNetImageNetHuggingFaceModel(ImageClassificationHuggingFaceModel):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why no predict_image here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified to have a single label base class

TypesBase,
)

CLASSIFICATION_LABEL_TYPE = np.ndarray # Probabilities for each class
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The classification type is not probabilities.

We need to implement a threshold for binary classification models to get the predicted classes
We already have in place the argmax for the multi-classifcation case

In both cases though, I would choose between:

  • int: label_id
  • string: label

I'm more in favour of label as it'll be more readable when we use model.predict

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed an N-dimensional array to have both multiple labels for one prediction and multiple predictions.

@rabah-khalek rabah-khalek mentioned this pull request Jul 19, 2024
@Inokinoki
Copy link
Member Author

I aligned all datasets and models to use string of class labels now.

There are 2 new notebooks to try all models and datasets on.

@Inokinoki Inokinoki requested a review from rabah-khalek July 23, 2024 10:33
Copy link
Contributor

@rabah-khalek rabah-khalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good job

@rabah-khalek
Copy link
Contributor

I would just add tests to the core stuff (no need to test the demo dataloaders or models) rather the general wrappers

@github-actions github-actions bot removed the Lockfile label Jul 23, 2024
@Inokinoki Inokinoki merged commit 03dee7b into main Jul 23, 2024
@Inokinoki Inokinoki deleted the image-classification-hf branch July 23, 2024 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants