File Tools

A collection of command-line tools for efficient file management and analysis, built with Go.

Features

Current Tools

  • dupfind: Find duplicate files in a directory tree by comparing file hashes. Efficiently identifies identical files regardless of filename or location.
  • dirstat: Analyze a directory and its subdirectories for file statistics, including sizes, file types, and each type's share of the total size.
  • rename: Rename files in a directory using pattern matching and sed-like replacements.

Key Features

  • Multiple Output Formats: Support for text, JSON, XML, and HTML output formats
  • File Output: Redirect output to files instead of stdout
  • Flexible Hashing: Choose from MD5, SHA1, or SHA256 hash algorithms
  • File/Directory Exclusions: Exclude files and directories from processing with pattern matching and file type filtering
  • Structured Data: JSON/XML output provides machine-readable duplicate file information with metadata
  • Rich HTML Reports: Generate professional HTML reports with styling, statistics, and interactive features
  • Comprehensive Metadata: All structured outputs record the tool name, subcommand, flags, version, and generation timestamp

Planned Tools

  • File organizer
  • File size analyzer

Installation

Prerequisites

  • Go 1.24.5 or later

Build from Source

git clone https://github.com/amurru/filetools.git
cd filetools
make build
make test  # Run tests

The binary will be created as bin/filetools.

Install with Go

go install github.com/amurru/filetools@latest

Development

make test      # Run all tests
make clean     # Clean build artifacts
make run       # Build and run the application

Usage

dupfind

Find duplicate files in a directory tree with flexible output options.

Basic Usage

Find duplicate files in a directory:

filetools dupfind /path/to/directory

If no directory is specified, it uses the current directory:

filetools dupfind
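As a rough illustration of hash-based duplicate detection, the Go sketch below walks a directory, hashes every regular file with MD5, and groups paths that share a digest. It is not the filetools implementation: the names hashFile and groupByDigest are hypothetical, and a real tool may well add optimizations (for example, comparing file sizes before hashing) that are omitted here.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

// hashFile streams a file through MD5 and returns the hex digest.
// (Hypothetical helper, not taken from the filetools source.)
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// groupByDigest takes a path -> digest map and keeps only the digests
// shared by two or more paths, with each group's paths sorted.
func groupByDigest(digests map[string]string) map[string][]string {
	byDigest := make(map[string][]string)
	for path, d := range digests {
		byDigest[d] = append(byDigest[d], path)
	}
	for d, paths := range byDigest {
		if len(paths) < 2 {
			delete(byDigest, d) // unique content, not a duplicate
			continue
		}
		sort.Strings(paths)
	}
	return byDigest
}

func main() {
	root := "." // default to the current directory, like the CLI
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	digests := make(map[string]string)
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // skip directories, symlinks, devices
		}
		sum, err := hashFile(path)
		if err != nil {
			return err
		}
		digests[path] = sum
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "walk failed:", err)
		os.Exit(1)
	}
	for digest, paths := range groupByDigest(digests) {
		fmt.Println(digest)
		for _, p := range paths {
			fmt.Printf("  %s\n", p)
		}
	}
}
```

Separating the pure grouping step from the file I/O keeps the core logic easy to unit test with an in-memory map.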

Output Formats

Choose from multiple output formats:

# Text output (default)
filetools dupfind /path/to/directory

# JSON output
filetools dupfind -o json /path/to/directory
filetools dupfind -j /path/to/directory

# XML output
filetools dupfind -o xml /path/to/directory
filetools dupfind -x /path/to/directory

# HTML output (generates a styled web page)
filetools dupfind -o html /path/to/directory
filetools dupfind -w /path/to/directory

File Output

Redirect output to a file instead of stdout:

# Save results to a file
filetools dupfind -f results.txt /path/to/directory
filetools dupfind -o json -f duplicates.json /path/to/directory
filetools dupfind -w -f report.html /path/to/directory

Hash Algorithms

Choose the hash algorithm for file comparison:

# Use different hash algorithms (default: md5)
filetools dupfind -H sha256 /path/to/directory
filetools dupfind -H sha1 /path/to/directory
filetools dupfind -H md5 /path/to/directory
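The three algorithm names accepted by -H correspond to hashes in Go's standard library. The sketch below shows one plausible way to map a name to a hasher; newHasher and hashBytes are illustrative names, not functions from the filetools source.

```go
package main

import (
	"crypto/md5"
	"crypto/sha1"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
)

// newHasher maps an algorithm name (as passed to -H) to a hash.Hash.
// The function name and error text are assumptions for illustration.
func newHasher(name string) (hash.Hash, error) {
	switch name {
	case "md5":
		return md5.New(), nil
	case "sha1":
		return sha1.New(), nil
	case "sha256":
		return sha256.New(), nil
	default:
		return nil, fmt.Errorf("unsupported hash algorithm %q", name)
	}
}

// hashBytes returns the hex digest of data under the named algorithm.
func hashBytes(name string, data []byte) (string, error) {
	h, err := newHasher(name)
	if err != nil {
		return "", err
	}
	h.Write(data) // hash.Hash documents that Write never returns an error
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	for _, algo := range []string{"md5", "sha1", "sha256"} {
		sum, err := hashBytes(algo, []byte("hello"))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-6s %s\n", algo, sum)
	}
}
```

MD5 and SHA-1 are fast but not collision-resistant, so SHA-256 is the safer choice when the files may come from an untrusted source.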

Combined Usage

Combine multiple options:

# Generate JSON report with SHA256 hashes, save to file
filetools dupfind -H sha256 -o json -f report.json /path/to/directory

# Create HTML report with MD5 hashes
filetools dupfind -H md5 -w -f analysis.html /path/to/directory

Example Outputs

Text Output (default):

Generated by filetools dupfind v1.0.0 on 2025-10-27T14:30:45Z (hash: md5, output: text)

Duplicate files found:
- file1.txt (size: 1024 bytes, hash: a1b2c3d4...)
  - /path/to/dir1/file1.txt
  - /path/to/dir2/file1.txt
- file2.txt (size: 2048 bytes, hash: e5f6a7b8...)
  - /path/to/dir3/file2.txt
  - /path/to/dir4/file2.txt

JSON Output:

{
  "metadata": {
    "tool_name": "filetools",
    "sub_command": "dupfind",
    "flags": [
      {
        "name": "hash",
        "value": "md5"
      },
      {
        "name": "output",
        "value": "json"
      }
    ],
    "version": "1.0.0",
    "generated_at": "2025-10-27T14:30:45Z"
  },
  "groups": [
    {
      "hash": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
      "hash_type": "md5",
      "size": 1024,
      "files": ["/path/to/dir1/file1.txt", "/path/to/dir2/file1.txt"]
    }
  ],
  "found": true
}
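Because the JSON report is machine-readable, it can be consumed from another Go program. The sketch below decodes a report shaped like the sample above. The struct and function names are hypothetical, the field tags are inferred from the sample only, and wastedBytes assumes that "size" is the size of a single file in the group.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Types inferred from the sample JSON; names are illustrative.
type Flag struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

type Metadata struct {
	ToolName    string `json:"tool_name"`
	SubCommand  string `json:"sub_command"`
	Flags       []Flag `json:"flags"`
	Version     string `json:"version"`
	GeneratedAt string `json:"generated_at"`
}

type Group struct {
	Hash     string   `json:"hash"`
	HashType string   `json:"hash_type"`
	Size     int64    `json:"size"`
	Files    []string `json:"files"`
}

type Report struct {
	Metadata Metadata `json:"metadata"`
	Groups   []Group  `json:"groups"`
	Found    bool     `json:"found"`
}

// parseReport decodes a dupfind-style JSON document.
func parseReport(data []byte) (Report, error) {
	var r Report
	err := json.Unmarshal(data, &r)
	return r, err
}

// wastedBytes estimates the space freed by keeping one copy per group,
// assuming Size is the per-file size.
func wastedBytes(r Report) int64 {
	var total int64
	for _, g := range r.Groups {
		if n := len(g.Files); n > 1 {
			total += int64(n-1) * g.Size
		}
	}
	return total
}

func main() {
	sample := []byte(`{
  "groups": [
    {"hash": "0123456789abcdef0123456789abcdef", "hash_type": "md5", "size": 1024,
     "files": ["/path/to/dir1/file1.txt", "/path/to/dir2/file1.txt"]}
  ],
  "found": true
}`)
	report, err := parseReport(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%d duplicate group(s), %d bytes reclaimable\n",
		len(report.Groups), wastedBytes(report))
}
```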

XML Output:

<?xml version="1.0" encoding="UTF-8"?>
<DuplicateResult>
  <metadata>
    <toolName>filetools</toolName>
    <subCommand>dupfind</subCommand>
    <flags>
      <flag>
        <name>hash</name>md5
      </flag>
      <flag>
        <name>output</name>xml
      </flag>
    </flags>
    <version>1.0.0</version>
    <generatedAt>2025-10-27T14:30:45Z</generatedAt>
  </metadata>
  <groups>
    <hash>a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6</hash>
    <hashType>md5</hashType>
    <size>1024</size>
    <files>/path/to/dir1/file1.txt</files>
    <files>/path/to/dir2/file1.txt</files>
  </groups>
  <found>true</found>
</DuplicateResult>

HTML Output: Generates a complete HTML page with:

  • Professional styling and layout
  • Summary statistics
  • Color-coded file badges (original/duplicate)
  • Responsive design
  • Interactive features: clickable hashes (copy to clipboard), collapsible duplicate groups
  • Program branding footer

Exclusion Flags

Exclude files and directories from processing while still reporting them in the output.

File Exclusions

Exclude files matching patterns or file types:

# Exclude specific file patterns (globs)
filetools dupfind --exclude-file "*.log,*.tmp,cache/*" /path/to/directory

# Exclude by file type (matches extensions)
filetools dupfind --exclude-file "*.jpg,*.png,*.gif" /path/to/directory

# Combine with other options
filetools dirstat --exclude-file "*.log,*.tmp" -o json /path/to/directory

Directory Exclusions

Exclude entire directories matching patterns:

# Exclude common directories
filetools dupfind --exclude-dir "node_modules,.git,build" /path/to/directory

# Exclude by pattern
filetools dirstat --exclude-dir "temp*,cache*" /path/to/directory

Combined Exclusions

Use both file and directory exclusions together:

filetools dupfind --exclude-file "*.log,*.tmp" --exclude-dir "node_modules,.git" /path/to/directory
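The comma-separated glob lists above could be evaluated along the following lines. This is a sketch under stated assumptions, not the tool's actual matching logic: splitPatterns and isExcluded are hypothetical names, and the choice to test each pattern against both the slash-separated relative path and the base name is a guess at reasonable semantics.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// splitPatterns turns "*.log, *.tmp" into ["*.log", "*.tmp"].
func splitPatterns(list string) []string {
	var out []string
	for _, p := range strings.Split(list, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// isExcluded reports whether relPath (slash-separated) matches any pattern,
// either as a whole path (e.g. "cache/*") or by base name (e.g. "*.log").
func isExcluded(relPath string, patterns []string) bool {
	base := path.Base(relPath)
	for _, p := range patterns {
		// path.Match only errors on malformed patterns; treat those as non-matching.
		if ok, _ := path.Match(p, relPath); ok {
			return true
		}
		if ok, _ := path.Match(p, base); ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := splitPatterns("*.log,*.tmp,cache/*")
	for _, p := range []string{"src/app.log", "cache/index.bin", "src/main.go"} {
		fmt.Printf("%-16s excluded=%v\n", p, isExcluded(p, patterns))
	}
}
```

Note that in Go's glob syntax `*` does not cross a `/`, so under these assumptions "cache/*" matches cache/index.bin but not cache/sub/index.bin.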

Exclusion Output

Excluded items are listed in the "Exclusions" section of all output formats:

Text Output:

Excluded files and directories:
- node_modules (dir_pattern)
- cache/file.log (file_pattern)
- temp/image.jpg (file_type)

JSON/XML Output: Excluded items are included in the exclusions array with path and reason fields.

HTML Output: Exclusions are displayed in a sortable table with professional styling.

dirstat

Analyze a directory and its subdirectories for file statistics.

Basic Usage

Analyze a directory for file statistics:

filetools dirstat /path/to/directory

If no directory is specified, it uses the current directory:

filetools dirstat

Output Formats

Choose from multiple output formats:

# Text output (default)
filetools dirstat /path/to/directory

# JSON output
filetools dirstat -o json /path/to/directory
filetools dirstat -j /path/to/directory

# XML output
filetools dirstat -o xml /path/to/directory
filetools dirstat -x /path/to/directory

# HTML output (generates a styled web page)
filetools dirstat -o html /path/to/directory
filetools dirstat -w /path/to/directory

File Output

Redirect output to a file instead of stdout:

# Save results to a file
filetools dirstat -f stats.txt /path/to/directory
filetools dirstat -o json -f stats.json /path/to/directory
filetools dirstat -w -f report.html /path/to/directory

Combined Usage

Combine multiple options:

# Generate JSON statistics, save to file
filetools dirstat -o json -f stats.json /path/to/directory

# Create HTML report
filetools dirstat -w -f analysis.html /path/to/directory

# Analyze with exclusions
filetools dirstat --exclude-file "*.log,*.tmp" --exclude-dir "node_modules,.git" -w -f clean-report.html /path/to/directory

Example Outputs

Text Output (default):

Generated by filetools dirstat v1.0.0 on 2025-10-27T14:30:45Z (output: text)

Directory Statistics
===================

Total Files: 150
Total Size: 25.3 MB
Largest File: large_video.mp4 (15.2 MB)

File Types
----------
Extension    Count    Size      Percentage
------------ -------- --------- ----------
.mp4         5        18.5 MB   73.12%
.jpg         45       4.2 MB    16.60%
.txt         30       1.8 MB    7.11%
.pdf         12       780 KB    3.01%
(no ext)     58       45 KB     0.17%

Subdirectories
--------------
Path                    Files    Size      Percentage
----------------------- -------- --------- ----------
videos                  5        18.5 MB   73.12%
images                  45       4.2 MB    16.60%
documents               42       2.6 MB    10.28%
...

JSON Output:

{
  "metadata": {
    "tool_name": "filetools",
    "sub_command": "dirstat",
    "flags": [
      {
        "name": "output",
        "value": "json"
      }
    ],
    "version": "1.0.0",
    "generated_at": "2025-10-27T14:30:45Z"
  },
  "total_files": 150,
  "total_size": 26528934,
  "largest_file": {
    "name": "large_video.mp4",
    "size": 15920000,
    "path": "videos/large_video.mp4"
  },
  "file_types": [
    {
      "extension": ".mp4",
      "count": 5,
      "total_size": 19398656,
      "percentage": 73.12
    }
  ],
  "directories": [
    {
      "path": "videos",
      "file_count": 5,
      "total_size": 19398656,
      "percentage": 73.12
    }
  ]
}
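The file_types figures can be reproduced with a simple aggregation. The sketch below groups files by extension and computes each extension's share of the total size, rounded to two decimals as in the sample. FileEntry, TypeStat, and statsByExtension are hypothetical names, and lowercasing extensions is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"math"
	"path/filepath"
	"sort"
	"strings"
)

// FileEntry is a minimal stand-in for a scanned file.
type FileEntry struct {
	Path string
	Size int64
}

// TypeStat mirrors one entry of the "file_types" array in the sample.
type TypeStat struct {
	Extension  string
	Count      int
	TotalSize  int64
	Percentage float64
}

// statsByExtension aggregates count and size per extension and returns
// the results sorted by total size, largest first.
func statsByExtension(files []FileEntry) []TypeStat {
	totals := make(map[string]*TypeStat)
	var grand int64
	for _, f := range files {
		ext := strings.ToLower(filepath.Ext(f.Path)) // assumption: case-insensitive
		if ext == "" {
			ext = "(no ext)"
		}
		s, ok := totals[ext]
		if !ok {
			s = &TypeStat{Extension: ext}
			totals[ext] = s
		}
		s.Count++
		s.TotalSize += f.Size
		grand += f.Size
	}
	out := make([]TypeStat, 0, len(totals))
	for _, s := range totals {
		if grand > 0 {
			// Round to two decimal places.
			s.Percentage = math.Round(float64(s.TotalSize)/float64(grand)*10000) / 100
		}
		out = append(out, *s)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].TotalSize != out[j].TotalSize {
			return out[i].TotalSize > out[j].TotalSize
		}
		return out[i].Extension < out[j].Extension
	})
	return out
}

func main() {
	files := []FileEntry{
		{"videos/clip.mp4", 750},
		{"images/a.jpg", 200},
		{"images/b.JPG", 50},
	}
	for _, s := range statsByExtension(files) {
		fmt.Printf("%-10s %3d %6d %6.2f%%\n", s.Extension, s.Count, s.TotalSize, s.Percentage)
	}
}
```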

HTML Output: Generates a complete HTML page with:

  • Professional styling and layout
  • Summary statistics dashboard
  • Sortable tables for file types and directories
  • Visual percentage bars
  • Responsive design
  • Program branding footer

Version

Check the version and build information:

filetools version

Output:

version: 1.0.0
date: 2025-10-27T14:30:45Z

Project Structure

filetools/
├── cmd/                    # CLI commands
│   ├── dirstat.go         # Directory statistics command
│   ├── dupfind.go         # Duplicate file finder command
│   ├── dupfind_test.go    # Tests for dupfind
│   ├── root.go            # Root command and global flags
│   └── version.go         # Version command
├── internal/
│   └── output/            # Output formatting module
│       ├── formatter.go   # Core interfaces and data structures
│       ├── json.go        # JSON formatter
│       ├── xml.go         # XML formatter
│       ├── html.go        # HTML formatter
│       ├── text.go        # Text formatter
│       └── formatter_test.go # Output tests
├── main.go                # Application entry point
├── go.mod                 # Go module definition
├── Makefile               # Build automation
└── README.md              # This file

Architecture

The tool is built with a modular architecture:

  • CLI Layer: Uses Cobra for command-line interface with persistent flags
  • Core Logic: File hashing and duplicate detection algorithms
  • Output Layer: Pluggable formatters for different output types
  • Data Flow: Structured data flows from detection → formatting → output (stdout/file)

Command Reference

Global Flags

These flags work with all commands:

  • -o, --output string: Output format (text, json, xml, html) (default "text")
  • -f, --file string: Output file (default: stdout)
  • -j, --json: Shortcut for -o json
  • -x, --xml: Shortcut for -o xml
  • -w, --html: Shortcut for -o html
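Since -j, -x, and -w are shortcuts for -o, the two kinds of flag have to be reconciled somewhere. The sketch below shows one way to do that with plain Go and no CLI framework. The function name resolveFormat is hypothetical, and both the precedence (a shortcut overrides -o) and the rejection of multiple shortcuts are assumptions, since this README does not specify either.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveFormat picks the effective output format from -o and the
// boolean shortcut flags. Precedence rules here are assumptions.
func resolveFormat(output string, jsonFlag, xmlFlag, htmlFlag bool) (string, error) {
	shortcuts := []struct {
		name string
		set  bool
	}{
		{"json", jsonFlag},
		{"xml", xmlFlag},
		{"html", htmlFlag},
	}
	chosen, count := "", 0
	for _, s := range shortcuts {
		if s.set {
			chosen = s.name
			count++
		}
	}
	if count > 1 {
		return "", errors.New("only one of -j, -x, -w may be given")
	}
	if count == 1 {
		return chosen, nil
	}
	switch output {
	case "text", "json", "xml", "html":
		return output, nil
	}
	return "", fmt.Errorf("unknown output format %q", output)
}

func main() {
	format, err := resolveFormat("text", true, false, false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("effective format:", format)
}
```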

dupfind Flags

  • -H, --hash string: Hash algorithm (md5, sha1, sha256) (default "md5")

dirstat Flags

The dirstat command has no command-specific flags. It accepts the global flags and, as shown under Exclusion Flags, --exclude-file and --exclude-dir.

rename

Rename files in a directory using pattern matching and sed-like replacements.

Basic Usage

Rename files in a directory:

filetools rename --match "*.jpg" --sed "s/^/vacation_/" /photos

If no directory is specified, it uses the current directory:

filetools rename --match "*.txt" --sed "s/draft/final/g"

Output Formats

Choose from multiple output formats:

# Text output (default)
filetools rename --match "*.jpg" --sed "s/old/new/" /path

# JSON output
filetools rename -o json --match "*.jpg" --sed "s/old/new/" /path

# XML output
filetools rename -o xml --match "*.jpg" --sed "s/old/new/" /path

# HTML output
filetools rename -o html --match "*.jpg" --sed "s/old/new/" /path

Examples

Add prefix to all JPG files:

filetools rename --match "*.jpg" --sed "s/^/vacation_/" /photos

Remove suffix from files:

filetools rename --match "*_old.jpg" --sed "s/_old//" /photos

General replacement:

filetools rename --match "*.txt" --sed "s/draft/final/g" /docs

Dry Run (Default)

By default, the command runs in dry-run mode for safety:

filetools rename --match "*.jpg" --sed "s/old/new/" /photos
# Shows what would be renamed without making changes

To actually perform the renames:

filetools rename --force --match "*.jpg" --sed "s/old/new/" /photos

rename Flags

  • --match string: File pattern to match (glob, required)
  • --sed string: Sed-style replacement expression (e.g., s/old/new/g, required)
  • --dry-run: Preview changes without executing (default: true)
  • --force: Perform actual renames and overwrite existing files
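To make the --sed expressions concrete, here is a minimal Go sketch that parses an s/old/new/flags expression and applies it to a file name. It is not the filetools parser: parseSed and apply are hypothetical names, the pattern is compiled with Go's regexp package (RE2 syntax, which may differ from the dialect the tool uses), the replacement is treated as literal text, and escaped delimiters are not handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// sedExpr is a parsed s/pattern/replacement/flags expression.
type sedExpr struct {
	re     *regexp.Regexp
	repl   string
	global bool
}

// parseSed parses expressions such as "s/old/new/" or "s/old/new/g".
// The character after 's' is used as the delimiter.
func parseSed(expr string) (*sedExpr, error) {
	if len(expr) < 4 || expr[0] != 's' {
		return nil, fmt.Errorf("not a substitution expression: %q", expr)
	}
	delim := string(expr[1])
	parts := strings.Split(expr[2:], delim)
	if len(parts) != 3 {
		return nil, fmt.Errorf("expected s%sold%snew%s[flags], got %q", delim, delim, delim, expr)
	}
	re, err := regexp.Compile(parts[0])
	if err != nil {
		return nil, err
	}
	return &sedExpr{re: re, repl: parts[1], global: strings.Contains(parts[2], "g")}, nil
}

// apply rewrites name: every match with the g flag, otherwise only the first.
func (s *sedExpr) apply(name string) string {
	if s.global {
		return s.re.ReplaceAllLiteralString(name, s.repl)
	}
	loc := s.re.FindStringIndex(name)
	if loc == nil {
		return name // no match, leave the name unchanged
	}
	return name[:loc[0]] + s.repl + name[loc[1]:]
}

func main() {
	expr, err := parseSed("s/_old//")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Dry run: print the planned renames without touching the file system.
	for _, name := range []string{"photo_old.jpg", "notes.txt"} {
		if renamed := expr.apply(name); renamed != name {
			fmt.Printf("%s -> %s\n", name, renamed)
		}
	}
}
```

Computing the new names as a pure step, separate from the actual rename, is what makes a dry-run preview cheap to provide.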

Examples

# View help
filetools --help
filetools dupfind --help
filetools dirstat --help

# Different output combinations
filetools dupfind -j -f results.json /path
filetools dupfind -o xml -f report.xml /path
filetools dupfind -w /path > report.html

# Directory statistics
filetools dirstat -j -f stats.json /path
filetools dirstat -w -f analysis.html /path

Development

Testing

Run the test suite:

make test

Run specific tests:

go test -run TestCalculateHash ./cmd/
go test ./internal/output/

Code Quality

The project follows Go best practices:

  • Uses gofmt for consistent formatting
  • Includes comprehensive unit tests
  • Follows standard Go naming conventions
  • Uses Cobra for CLI framework
  • Modular architecture for maintainability

Adding New Output Formats

To add a new output format:

  1. Create a new formatter in internal/output/
  2. Implement the OutputFormatter interface
  3. Add the format to NewFormatter() function
  4. Add corresponding flag if needed
  5. Update tests and documentation
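The steps above might look roughly like the following for a CSV format. This README does not show the method set of OutputFormatter or the signature of NewFormatter(), so everything here is a hypothetical stand-in: the Formatter interface, its single Format method, the Group type, and the lowercase newFormatter and render helpers are all assumptions made for illustration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// Group is a simplified duplicate group (hypothetical shape).
type Group struct {
	Hash  string
	Size  int64
	Files []string
}

// Formatter is a hypothetical stand-in for the project's OutputFormatter.
type Formatter interface {
	Format(w io.Writer, groups []Group) error
}

// csvFormatter is the new format being added (steps 1 and 2).
type csvFormatter struct{}

func (csvFormatter) Format(w io.Writer, groups []Group) error {
	cw := csv.NewWriter(w)
	if err := cw.Write([]string{"hash", "size", "path"}); err != nil {
		return err
	}
	for _, g := range groups {
		for _, f := range g.Files {
			rec := []string{g.Hash, strconv.FormatInt(g.Size, 10), f}
			if err := cw.Write(rec); err != nil {
				return err
			}
		}
	}
	cw.Flush()
	return cw.Error() // surfaces any error from buffered writes
}

// newFormatter plays the role of a format registry (step 3).
func newFormatter(name string) (Formatter, error) {
	switch name {
	case "csv":
		return csvFormatter{}, nil
	}
	return nil, fmt.Errorf("unsupported format %q", name)
}

// render formats groups into a string, which is convenient in tests (step 5).
func render(name string, groups []Group) (string, error) {
	f, err := newFormatter(name)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := f.Format(&sb, groups); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render("csv", []Group{{Hash: "aa", Size: 10, Files: []string{"/a", "/b"}}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Writing to an io.Writer rather than returning a string lets the same formatter serve both stdout and file output.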

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Workflow

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Write tests for your changes
  4. Ensure all tests pass (make test)
  5. Update documentation if needed
  6. Commit your changes (git commit -m 'Add some AmazingFeature')
  7. Push to the branch (git push origin feature/AmazingFeature)
  8. Open a Pull Request

Guidelines

  • Follow the existing code style and architecture
  • Add tests for new functionality
  • Update README.md for new features
  • Ensure backward compatibility
  • Use meaningful commit messages

License

This project is licensed under the MIT License - see the LICENSE file for details.
