UnicoLab/keras-data-processor

🌟 Keras Data Processor (KDP) - Powerful Data Preprocessing for TensorFlow 🌟

Provided and maintained by 🦄 UnicoLab

Python 3.9+ | TensorFlow 2.18+ | License: MIT | 🦄 UnicoLab | Documentation

Transform your raw data into ML-ready features with just a few lines of code!

KDP is a preprocessing system built on TensorFlow Keras. It handles everything from feature normalization to advanced embedding techniques, with the aim of making your ML pipelines faster, more robust, and easier to maintain. Built with ❤️ by 🦄 UnicoLab, it offers a clean, efficient, and extensible foundation for building sophisticated machine learning models for enterprise AI applications.

✨ Key Features

  • 🚀 Efficient Single-Pass Processing: Process all features in one go, dramatically faster than alternatives
  • 🧠 Distribution-Aware Encoding: Automatically detects and optimally handles different data distributions
  • 👁️ Tabular Attention: Captures complex feature interactions for better model performance
  • 🔍 Feature Selection: Automatically identifies and focuses on the most important features
  • 🔄 Feature-wise Mixture of Experts: Specialized processing for different feature types
  • 📦 Production-Ready: Deploy your preprocessing along with your model as a single unit
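
To make the "single-pass" idea concrete, here is a minimal sketch of how statistics for several features can be gathered while reading each row only once. It uses Welford's online algorithm in plain Python and is an illustration only: the function name `single_pass_stats` and the sample rows are hypothetical, and this is not KDP's internal implementation.

```python
def single_pass_stats(rows, feature_names):
    """Compute per-feature mean and population variance in one pass (Welford's algorithm)."""
    # Running state per feature: count, running mean, sum of squared deviations (m2).
    state = {name: {"count": 0, "mean": 0.0, "m2": 0.0} for name in feature_names}

    for row in rows:  # every row is visited exactly once
        for name in feature_names:
            value = float(row[name])
            s = state[name]
            s["count"] += 1
            delta = value - s["mean"]
            s["mean"] += delta / s["count"]
            s["m2"] += delta * (value - s["mean"])

    return {
        name: {
            "mean": s["mean"],
            "variance": s["m2"] / s["count"] if s["count"] else 0.0,
        }
        for name, s in state.items()
    }


# Made-up sample rows, for illustration only.
rows = [
    {"age": 30, "income": 40_000},
    {"age": 40, "income": 60_000},
    {"age": 50, "income": 80_000},
]
stats = single_pass_stats(rows, ["age", "income"])
print(stats["age"]["mean"], stats["income"]["mean"])
```

The point of the sketch is that adding another numeric feature adds work per row but not another scan of the dataset.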

🚀 Quick Installation

# Using pip
pip install kdp

# Using Poetry
poetry add kdp

📋 Simple Example

from kdp import PreprocessingModel, FeatureType

# Define your features
features_specs = {
    "age": FeatureType.FLOAT_NORMALIZED,
    "income": FeatureType.FLOAT_RESCALED,
    "occupation": FeatureType.STRING_CATEGORICAL,
    "description": FeatureType.TEXT
}

# Create and build the preprocessor
preprocessor = PreprocessingModel(
    path_data="data/my_data.csv",
    features_specs=features_specs,
    # Enable advanced features
    use_distribution_aware=True,
    tabular_attention=True
)
result = preprocessor.build_preprocessor()
model = result["model"]

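# Use the preprocessor with your data.
# `input_data` is not defined above; it is assumed here to be a batch keyed by
# feature name (for example, a dict of tensors matching features_specs).
processed_features = model(input_data)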
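
The example above reads its data from `data/my_data.csv`. As a rough sketch of what such a file might look like, the snippet below writes a small CSV whose header matches the keys of `features_specs`. The assumption that KDP expects one column per feature name, the helper `write_sample_csv`, and every sample value are illustrative and not taken from the KDP documentation.

```python
import csv
import tempfile
from pathlib import Path

# Column names mirror the keys of `features_specs` in the example above.
FIELDNAMES = ["age", "income", "occupation", "description"]

# Made-up rows, for illustration only.
SAMPLE_ROWS = [
    {"age": 34, "income": 52000.0, "occupation": "engineer", "description": "Builds data pipelines"},
    {"age": 45, "income": 61000.0, "occupation": "teacher", "description": "Teaches high school math"},
    {"age": 29, "income": 48000.0, "occupation": "nurse", "description": "Works night shifts"},
]


def write_sample_csv(path):
    """Write SAMPLE_ROWS to `path` with a header row and return the path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(SAMPLE_ROWS)
    return path


# Written to a temporary directory so the sketch does not touch your project files.
csv_path = write_sample_csv(Path(tempfile.mkdtemp()) / "data" / "my_data.csv")
print(csv_path)
```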

📚 Comprehensive Documentation

We've built an extensive documentation system to help you get the most from KDP:

Core Guides

Advanced Topics

Integration & Performance

Background & Resources

🖼️ Model Architecture

Your preprocessing pipeline is built as a Keras model that can be used independently or as the first layer of any model.
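
As a hedged illustration of the "standalone or first layer" idea, the sketch below builds a small stand-in preprocessing model from stock Keras layers and then reuses it in two ways. The stand-in is not the graph KDP generates: the layer choices, the fixed constants, and the downstream `Dense` head are all assumptions made for the example. In practice the model returned by `build_preprocessor()` would take the stand-in's place.

```python
import numpy as np
import tensorflow as tf

# Stand-in preprocessing model (NOT KDP's generated graph): one named input per feature.
age_in = tf.keras.Input(shape=(1,), name="age")
income_in = tf.keras.Input(shape=(1,), name="income")

# Fixed constants stand in for statistics that would normally be learned from data.
age_norm = tf.keras.layers.Normalization(mean=40.0, variance=100.0)(age_in)
income_scaled = tf.keras.layers.Rescaling(scale=1.0 / 100_000.0)(income_in)
features = tf.keras.layers.Concatenate()([age_norm, income_scaled])

preprocessor = tf.keras.Model(
    inputs={"age": age_in, "income": income_in},
    outputs=features,
    name="preprocessor",
)

# A small batch keyed by feature name (made-up values).
batch = {
    "age": np.array([[30.0], [50.0]], dtype="float32"),
    "income": np.array([[50_000.0], [100_000.0]], dtype="float32"),
}

# 1) Standalone use: call the preprocessing model directly on raw data.
processed = preprocessor(batch)

# 2) First stage of a larger model: call it like a layer on fresh symbolic inputs.
raw_inputs = {
    "age": tf.keras.Input(shape=(1,), name="age"),
    "income": tf.keras.Input(shape=(1,), name="income"),
}
hidden = tf.keras.layers.Dense(8, activation="relu")(preprocessor(raw_inputs))
output = tf.keras.layers.Dense(1, activation="sigmoid")(hidden)
full_model = tf.keras.Model(inputs=raw_inputs, outputs=output, name="full_model")

predictions = full_model(batch)
print(processed.shape, predictions.shape)
```

Because the preprocessing step is part of `full_model`, saving that model also saves the preprocessing, which is the sense in which the two can be shipped as one unit.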

📊 Performance

KDP outperforms alternative preprocessing approaches, especially as data size increases.

🤝 Contributing

We welcome contributions! Please check out our Contributing Guide for guidelines on how to proceed.

💬 Join Our Community

Have questions or want to connect with other KDP users? Join us on Discord:

Discord

🛠️ Development Tools

KDP includes tools to help developers:

  • Documentation Generation: Automatically generate API docs from docstrings
  • Model Diagram Generation: Visualize model architectures with make generate_doc_content or run:
    python scripts/generate_model_diagrams.py
    This creates diagram images in docs/features/imgs/models/ for all feature types and configurations.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with TensorFlow and Keras
  • Inspired by modern deep learning research
  • Community-driven development
  • All contributors who help make KDP better

Built with ❤️ for the ML community by 🦄 UnicoLab.ai
