cleanlab/examples

cleanlab Examples

This repo contains code examples that demonstrate how to use cleanlab with real-world models/datasets, how its underlying algorithms work, how to get better results from cleanlab via more advanced functionality than is demonstrated in the quickstart tutorials, and how to train certain models used in some tutorials.

To quickly learn the basics of running cleanlab on your own data, we recommend starting with the quickstart tutorials before diving into the examples below.

Table of Contents

Example Description
1 find_label_errors_iris Find label errors introduced into the Iris classification dataset.
2 classifier_comparison Use CleanLearning to train 10 different classifiers on 4 dataset distributions with label errors.
3 hyperparameter_optimization Hyperparameter optimization to find the best settings of CleanLearning's optional parameters.
4 simplifying_confident_learning Straightforward implementation of Confident Learning algorithm with raw numpy code.
5 visualizing_confident_learning See how cleanlab estimates parameters of the label error distribution (noise matrix).
6 cnn_mnist Finding label errors in MNIST image data with a Convolutional Neural Network
7 huggingface_keras_imdb CleanLearning for text classification with Keras Model + pretrained BERT backbone and TensorFlow Dataset.
8 fasttext_amazon_reviews Finding label errors in Amazon Reviews text dataset using a cleanlab-compatible FastText model
9 multiannotator_cifar10 Iteratively improve consensus labels and trained classifier from data labeled by multiple annotators.
10 outlier_detection_cifar10 Train AutoML for image classification and use it to detect out-of-distribution images.
11 entity_recognition Train Transformer model for Named Entity Recognition and produce out-of-sample pred_probs for cleanlab.token_classification.
12 cnn_coteaching_cifar10 Train a Convolutional Neural Network on noisily labeled CIFAR-10 image data using cleanlab with co-teaching.
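
Several of the examples above revolve around the same core idea: compare each given label against out-of-sample predicted probabilities and flag the examples where the model is confidently predicting a different class. The sketch below illustrates that idea in plain Python with toy data. It is a simplified illustration, not cleanlab's implementation: the function name find_likely_label_errors and the toy arrays are hypothetical, and the library applies additional estimation and filtering steps that are omitted here.

```python
def find_likely_label_errors(labels, pred_probs):
    """Return indices of examples whose given label disagrees with a
    class the model predicts confidently (simplified sketch)."""
    num_classes = len(pred_probs[0])

    # Per-class threshold: the average predicted probability of class k
    # over the examples whose given label is k.
    thresholds = []
    for k in range(num_classes):
        probs_k = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        # A class with no labeled examples can never be predicted confidently.
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else float("inf"))

    suspects = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes whose predicted probability reaches that class's threshold.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class here
        likely = max(confident, key=lambda k: p[k])
        if likely != y:
            suspects.append(i)
    return suspects


# Toy data (hypothetical): six examples, two classes.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],  # labeled 0, but the model strongly favors class 1
    [0.25, 0.75],
    [0.10, 0.90],
    [0.30, 0.70],
]

if __name__ == "__main__":
    print(find_likely_label_errors(labels, pred_probs))
```

For real use, pred_probs should come from cross-validation or a held-out split so that each probability is out-of-sample; the notebooks listed above show how this is done with actual models.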

Instructions

To run the latest example notebooks, execute the commands below, which install the required libraries in a virtual environment.

$ python -m pip install virtualenv
$ python -m venv cleanlab-examples  # creates a new venv named cleanlab-examples
$ source cleanlab-examples/bin/activate
$ python -m pip install -r requirements.txt
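
After activating the environment, it can be useful to confirm that Python is really running from it before installing anything else. The following is a minimal sketch using only the standard library; the helper name in_virtual_environment is hypothetical and not part of this repository.

```python
import sys


def in_virtual_environment():
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("interpreter:", sys.executable)
```

If the first line reports False, the activate step probably did not run in the current shell.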

We recommend running the examples with the latest stable cleanlab release (pip install cleanlab). However, the notebooks in the master branch of this repository are assumed to correspond to the master branch of cleanlab, so some very recently added examples may instead require installing cleanlab from its master branch (pip install git+https://github.com/cleanlab/cleanlab.git).

You may run the notebooks individually, or run the bash script below, which executes and saves each notebook (examples 1-7 only). Before running the script for the first time, you need to create a Jupyter kernel named cleanlab-examples. Make sure the virtual environment from the steps above is created and activated before running the following command to create the kernel.

$ python -m ipykernel install --user --name=cleanlab-examples

Bash script to run all notebooks:

$ bash ./run_all_notebooks.sh
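
The contents of run_all_notebooks.sh are not reproduced here, so the following is only a sketch of one way to execute notebooks in place with the cleanlab-examples kernel, using the standard jupyter nbconvert command line. The function name build_nbconvert_commands is hypothetical, and the script in this repository may work differently. The sketch only builds the commands; it does not run them.

```python
from pathlib import Path

KERNEL_NAME = "cleanlab-examples"  # kernel created with ipykernel above


def build_nbconvert_commands(root, kernel_name=KERNEL_NAME):
    """Build one 'jupyter nbconvert' command per notebook under root."""
    commands = []
    for notebook in sorted(Path(root).rglob("*.ipynb")):
        # Skip Jupyter's autosave copies.
        if ".ipynb_checkpoints" in notebook.parts:
            continue
        commands.append([
            "jupyter", "nbconvert",
            "--to", "notebook",
            "--execute",   # run every cell
            "--inplace",   # overwrite the notebook with its executed version
            f"--ExecutePreprocessor.kernel_name={kernel_name}",
            str(notebook),
        ])
    return commands


if __name__ == "__main__":
    for command in build_nbconvert_commands("."):
        print(" ".join(command))
```

Each command list can be passed to subprocess.run(command, check=True) to actually execute the notebook.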

Instead of installing the requirements for all examples at once via pip install -r requirements.txt, you can install only the requirements for one particular example by running the same command inside the corresponding folder. This requires that cleanlab is already installed (pip install cleanlab), and some examples may require the latest developer version of cleanlab from GitHub (pip install git+https://github.com/cleanlab/cleanlab.git).

Older Examples

To run older versions of cleanlab, see the Tagged Releases of this repository for the corresponding older versions of the example notebooks.

See the contrib folder for examples from v1 of cleanlab, which may be helpful for reproducing results from the Confident Learning paper.

License

Copyright (c) 2017-2022 Cleanlab Inc.

All files listed above and contained in this folder (https://github.com/cleanlab/examples) are part of cleanlab.

cleanlab is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

cleanlab is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License in LICENSE.