Provided and maintained by UnicoLab
Transform your raw data into ML-ready features with just a few lines of code!
KDP provides a state-of-the-art preprocessing system built on TensorFlow Keras. It handles everything from feature normalization to advanced embedding techniques, making your ML pipelines faster, more robust, and easier to maintain. Built with ❤️ by UnicoLab, it provides a clean, efficient, and extensible foundation for building sophisticated machine learning models for enterprise AI applications.
- Efficient Single-Pass Processing: Process all features in one go, dramatically faster than alternatives
- Distribution-Aware Encoding: Automatically detects and optimally handles different data distributions
- Tabular Attention: Captures complex feature interactions for better model performance
- Feature Selection: Automatically identifies and focuses on the most important features
- Feature-wise Mixture of Experts: Specialized processing for different feature types
- Production-Ready: Deploy your preprocessing along with your model as a single unit
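To make the single-pass idea concrete, here is a minimal pure-Python sketch that gathers the statistics needed for normalization for several numeric columns while reading the data only once. It uses Welford's online algorithm and is an illustration of the concept only, not KDP's internal implementation. The function name `single_pass_stats` and the sample data are hypothetical.

```python
import csv
import io


def single_pass_stats(rows, numeric_columns):
    """Compute count, mean, and population variance per column in one pass.

    Uses Welford's online update, so no column is read twice and no values
    are kept in memory. Hypothetical helper, not part of the KDP API.
    """
    stats = {name: {"count": 0, "mean": 0.0, "m2": 0.0} for name in numeric_columns}
    for row in rows:
        for name in numeric_columns:
            value = float(row[name])
            s = stats[name]
            s["count"] += 1
            delta = value - s["mean"]
            s["mean"] += delta / s["count"]
            # m2 accumulates the sum of squared deviations from the running mean
            s["m2"] += delta * (value - s["mean"])
    return {
        name: {
            "count": s["count"],
            "mean": s["mean"],
            "variance": s["m2"] / s["count"] if s["count"] else 0.0,
        }
        for name, s in stats.items()
    }


# Made-up sample data standing in for a CSV file on disk.
SAMPLE_CSV = """age,income
20,1000
30,2000
40,3000
"""

stats = single_pass_stats(csv.DictReader(io.StringIO(SAMPLE_CSV)), ["age", "income"])
print(stats["age"]["mean"], stats["income"]["mean"])
```

Because every column is updated inside the same loop over rows, adding more features does not add more passes over the data.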
# Using pip
pip install kdp
# Using Poetry
poetry add kdp
from kdp import PreprocessingModel, FeatureType
# Define your features
features_specs = {
    "age": FeatureType.FLOAT_NORMALIZED,
    "income": FeatureType.FLOAT_RESCALED,
    "occupation": FeatureType.STRING_CATEGORICAL,
    "description": FeatureType.TEXT,
}
# Create and build the preprocessor
preprocessor = PreprocessingModel(
    path_data="data/my_data.csv",
    features_specs=features_specs,
    # Enable advanced features
    use_distribution_aware=True,
    tabular_attention=True,
)
result = preprocessor.build_preprocessor()
model = result["model"]
# Use the preprocessor with your data
processed_features = model(input_data)
We've built an extensive documentation system to help you get the most from KDP:
- Quick Start Guide - Get up and running in minutes
- Feature Processing - Learn about all supported feature types
- Auto-Configuration - Let KDP configure itself for your data
- Distribution-Aware Encoding - Smart handling of different distributions
- Tabular Attention - Capture complex feature interactions
- Advanced Numerical Embeddings - Rich representations for numbers
- Transformer Blocks - Apply transformer architecture to tabular data
- Feature Selection - Focus on what matters in your data
- Feature-wise Mixture of Experts - Specialized processing per feature
- Integration Guide - Use KDP with existing ML pipelines
- Tabular Optimization - Supercharge your preprocessing
- Performance Tips - Handling large datasets efficiently
- Motivation - Why we built KDP
- Contributing - Help improve KDP
Your preprocessing pipeline is built as a Keras model that can be used independently or as the first layer of any model.
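Both uses can be sketched with plain Keras. The `preprocessor` below is a hand-built stand-in for the model returned by `build_preprocessor()`, so the snippet runs without KDP or a data file. The layer choices, statistics, and feature shapes are assumptions made for illustration and may differ from what KDP actually generates.

```python
import numpy as np
import tensorflow as tf

# Stand-in preprocessing model (hypothetical). In KDP this role is played by
# result["model"]; the layers and constants here are illustrative assumptions.
raw_inputs = {
    "age": tf.keras.Input(shape=(1,), name="age"),
    "income": tf.keras.Input(shape=(1,), name="income"),
}
normalized_age = tf.keras.layers.Normalization(mean=40.0, variance=100.0)(raw_inputs["age"])
rescaled_income = tf.keras.layers.Rescaling(scale=1.0 / 100_000.0)(raw_inputs["income"])
features = tf.keras.layers.Concatenate()([normalized_age, rescaled_income])
preprocessor = tf.keras.Model(inputs=raw_inputs, outputs=features, name="preprocessor")

batch = {
    "age": tf.constant([[30.0], [50.0]]),
    "income": tf.constant([[50_000.0], [100_000.0]]),
}

# Use 1: call the preprocessing model on its own to get processed features.
processed = preprocessor(batch)

# Use 2: place the preprocessing model in front of a downstream network,
# so that raw inputs go in and predictions come out of a single Keras model.
hidden = tf.keras.layers.Dense(8, activation="relu")(preprocessor(raw_inputs))
prediction = tf.keras.layers.Dense(1)(hidden)
full_model = tf.keras.Model(inputs=raw_inputs, outputs=prediction, name="full_model")
predictions = full_model(batch)

print(tuple(processed.shape), tuple(predictions.shape))
```

Saving `full_model` then stores the preprocessing and the prediction layers together, which is what deploying them as a single unit amounts to.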
KDP outperforms alternative preprocessing approaches, especially as data size increases.
We welcome contributions! Please check out our Contributing Guide for guidelines on how to proceed.
Have questions or want to connect with other KDP users? Join us on Discord:
KDP includes tools to help developers:
- Documentation Generation: Automatically generate API docs from docstrings
- Model Diagram Generation: Visualize model architectures with
  make generate_doc_content
  or run:
  python scripts/generate_model_diagrams.py
  This creates diagram images in docs/features/imgs/models/ for all feature types and configurations.
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with TensorFlow and Keras
- Inspired by modern deep learning research
- Community-driven development
- All contributors who help make KDP better
Built with ❤️ for the ML community by UnicoLab.ai


