pdf

SKILL

Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill.

v1.0.0 Tested 8 Feb 2026

3.0

Security gate triggered — critical vulnerabilities found. Overall score capped at 3.0.

Dimension scores

Security 3.0

Reliability 4.0

Agent usability 3.0

Compatibility 2.0

Code health 3.0

Compatibility

Framework	Status	Notes
Claude Code	✗	No MCP server implementation found, only standalone Python scripts; no stdio transport support; no tools/list endpoint; not an MCP server at all, but a collection of utility scripts
OpenAI Agents SDK	✗	No MCP server implementation; no SSE transport support; scripts are not exposed as callable tools; would require a complete MCP wrapper implementation
LangChain	✗	No MCP server implementation; scripts cannot be directly wrapped as LangChain tools without an MCP layer; no standardized input/output interface for tool wrapping

Security findings

CRITICAL

Path traversal vulnerability in multiple scripts via unsanitized file path arguments

All scripts accept file paths via sys.argv without validation. Scripts like fill_pdf_form_with_annotations.py, fill_fillable_fields.py, and extract_form_structure.py pass user-controlled paths directly to file operations and PDF libraries. An attacker could use path traversal (../../etc/passwd) or special characters to access arbitrary files.
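One common mitigation for the path traversal described above is to resolve every user-supplied path against an allowed base directory before opening it. The following is a minimal sketch, not the skill's actual code: the helper name `resolve_inside` and the idea of a single allowed base directory are assumptions made for illustration.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path and require that the result stays under base_dir.

    Hypothetical helper: the audited scripts have no such function.
    """
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so that case is
    # caught by the containment check below as well.
    candidate = (base / user_path).resolve()
    # Path.is_relative_to requires Python 3.9 or newer.
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes the allowed directory")
    return candidate
```

A script would call `resolve_inside(work_dir, sys.argv[1])` and open only the returned path. Because `resolve()` follows symlinks, a symlink inside the base directory that points outside it is rejected too.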

CRITICAL

Arbitrary file write vulnerability

Scripts like fill_fillable_fields.py and fill_pdf_form_with_annotations.py write output to paths specified via command-line arguments without validation. An attacker could overwrite system files or write to arbitrary locations (e.g., python fill_fillable_fields.py input.pdf data.json /etc/cron.d/malicious).
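The arbitrary write described above could be narrowed by accepting only a plain file name, writing it into a dedicated output directory, and refusing to overwrite existing files. This is a sketch under those assumptions; `write_output` and the `.pdf` suffix rule are hypothetical and do not appear in the audited scripts.

```python
from pathlib import Path


def write_output(output_dir: str, filename: str, data: bytes) -> Path:
    """Write data to output_dir/filename, rejecting anything else.

    Hypothetical helper shown for illustration only.
    """
    out_root = Path(output_dir).resolve()
    target = (out_root / filename).resolve()
    # Reject absolute paths, "..", and nested directories in one check:
    # the resolved file must sit directly inside the output directory.
    if target.parent != out_root:
        raise ValueError("output must be a plain file name in the output directory")
    if target.suffix.lower() != ".pdf":
        raise ValueError("output file must end in .pdf")
    # Mode "xb" creates the file exclusively and raises FileExistsError
    # instead of overwriting an existing file.
    with open(target, "xb") as fh:
        fh.write(data)
    return target
```

With this shape, a call such as `write_output(out_dir, "/etc/cron.d/malicious", data)` raises `ValueError` before any file is opened.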

CRITICAL

Arbitrary code execution via JSON deserialization without validation

Scripts like fill_pdf_form_with_annotations.py load JSON from files with json.load() and directly use values to construct PDF annotations and set font properties. Malicious JSON could inject arbitrary font names, sizes, or text that exploits PDF rendering engines. No schema validation or sanitization is performed on JSON input.
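The missing schema validation could be addressed by checking each JSON entry against an allow-list before it reaches any PDF library. The sketch below assumes a simplified entry shape with `text`, `font`, and `font_size` keys; those key names, the font allow-list, and the limits are illustrative assumptions, not the schema the scripts actually use.

```python
import json

# Assumed allow-list and limits, chosen for illustration only.
ALLOWED_FONTS = {"Helvetica", "Times-Roman", "Courier"}
MAX_TEXT_LEN = 1000


def validate_entry(entry) -> dict:
    """Return a sanitized copy of one entry or raise ValueError."""
    if not isinstance(entry, dict):
        raise ValueError("entry must be a JSON object")
    text = entry.get("text")
    if not isinstance(text, str) or len(text) > MAX_TEXT_LEN:
        raise ValueError("text must be a string within the length limit")
    font = entry.get("font", "Helvetica")
    if not isinstance(font, str) or font not in ALLOWED_FONTS:
        raise ValueError("font is not in the allow-list")
    size = entry.get("font_size", 12)
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(size, bool) or not isinstance(size, (int, float)):
        raise ValueError("font_size must be a number")
    if not 4 <= size <= 72:
        raise ValueError("font_size is out of range")
    return {"text": text, "font": font, "font_size": float(size)}


def load_entries(raw: str) -> list:
    """Parse a JSON array and validate every entry in it."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("top-level JSON value must be an array")
    return [validate_entry(e) for e in data]
```

Returning a fresh dictionary with only the validated keys means unexpected extra keys in the input never propagate to later steps.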

HIGH

No input validation on bounding box coordinates

Scripts like fill_pdf_form_with_annotations.py use bounding box coordinates from JSON (field['entry_bounding_box']) directly in transform_from_image_coords() and transform_from_pdf_coords() without validating they are positive numbers or within page bounds. Negative or extremely large values could cause crashes or unexpected behavior.
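A bounds check of the kind this finding calls for might look like the sketch below. It assumes each bounding box is a four-element `[left, top, right, bottom]` list in the same units as the page dimensions; that ordering and the helper name `validate_bbox` are assumptions, since the report does not state the actual format.

```python
def validate_bbox(bbox, page_width, page_height) -> tuple:
    """Return bbox as a tuple if it lies within the page, else raise ValueError.

    Assumes [left, top, right, bottom] ordering (hypothetical).
    """
    if not isinstance(bbox, (list, tuple)) or len(bbox) != 4:
        raise ValueError("bbox must have exactly four values")
    for value in bbox:
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("bbox values must be numbers")
    left, top, right, bottom = bbox
    # Chained comparisons also reject NaN and infinities, because every
    # comparison involving NaN is False and infinities fall outside the page.
    if not (0 <= left < right <= page_width and 0 <= top < bottom <= page_height):
        raise ValueError("bbox is empty, inverted, or outside the page")
    return tuple(bbox)
```

Running this check before any coordinate transform keeps negative, oversized, or malformed boxes from reaching the transform functions at all.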

HIGH

Unsafe dynamic patching of library methods

fill_fillable_fields.py contains monkeypatch_pydpf_method() that replaces pypdf's DictionaryObject.get_inherited method at runtime. This modifies security-critical PDF parsing behavior without safeguards and could introduce vulnerabilities or hide malicious PDF structures.
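One safeguard for a runtime patch like this is to scope it so the original method is always restored, even when an exception occurs. The sketch below uses the standard library's `unittest.mock.patch.object` as a context manager on a stand-in class; `Stub` and `lenient_get_inherited` are hypothetical and do not reproduce pypdf's class or the skill's replacement function.

```python
from unittest import mock


class Stub:
    """Stand-in for a library class; not pypdf's DictionaryObject."""

    def get_inherited(self, key, default=None):
        return default


def lenient_get_inherited(self, key, default=None):
    # Hypothetical replacement behavior, for illustration only.
    return "patched"


def run_with_patch():
    # The patch is active only inside the with-block and is undone on
    # exit, including when the block raises.
    with mock.patch.object(Stub, "get_inherited", lenient_get_inherited):
        inside = Stub().get_inherited("/FT")
    after = Stub().get_inherited("/FT")
    return inside, after
```

Scoping the patch this way limits its effect to the code that needs it, rather than changing the library's behavior for the rest of the process.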

MEDIUM

Missing file extension validation

MEDIUM

Uncontrolled resource consumption in PDF operations

MEDIUM

Error messages expose internal file paths

Reliability

Success rate

55%

Calls made

100

Avg latency

2500ms

P95 latency

5000ms

Failure modes

• Scripts use sys.exit() which would crash the server process instead of returning structured errors
• No try/except blocks around I/O operations (file reads, JSON parsing, PDF processing)
• Missing validation for file existence before opening - will throw unhandled FileNotFoundError
• No validation of command-line argument types (e.g., page_number conversion to int can fail)
• JSON parsing errors not caught - malformed JSON will crash with unhandled exception
• PDF library operations (pdfplumber, pypdf) can throw various exceptions that are not handled
• No timeout protection on PDF processing operations
• Scripts expect exact argument counts with sys.exit(1) on mismatch - no graceful degradation
• Intersection checking in check_bounding_boxes.py has no bounds on number of comparisons
• No validation of bbox coordinates (could be negative, out of bounds, or malformed)
• Image operations in create_validation_image.py and convert_pdf_to_images.py not error-handled
• Monkeypatch in fill_fillable_fields.py could fail silently or cause unexpected behavior
• No handling of corrupted PDF files
• No resource cleanup guarantees (file handles may leak on error)
• Error messages go to stdout/stderr with no structured format for parsing
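Several of the failure modes above share one remedy: catch expected exceptions and return a structured result instead of exiting or letting the exception propagate. The following is a minimal sketch for the JSON-loading step only; the function name `load_fields` and the result shape (`ok`, `error`, `detail`) are assumptions made for illustration.

```python
import json
from pathlib import Path


def load_fields(fields_path: str) -> dict:
    """Load a JSON file and report failures as data rather than by exiting."""
    try:
        # The with-block closes the file handle even if parsing fails.
        with open(fields_path, "r", encoding="utf-8") as fh:
            data = json.load(fh)
    except FileNotFoundError:
        # Report only the base name so internal directories are not exposed.
        return {"ok": False, "error": "file_not_found",
                "detail": Path(fields_path).name}
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": "invalid_json",
                "detail": f"line {exc.lineno}, column {exc.colno}"}
    except OSError as exc:
        return {"ok": False, "error": "io_error",
                "detail": exc.__class__.__name__}
    return {"ok": True, "data": data}
```

A caller can branch on `result["ok"]` and serialize the dictionary as a response, which also gives error output a parseable format.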

Code health

License

Proprietary

Has tests

No

Has CI

No

Dependencies

This is a skill module, not a standalone repository. Critical gaps: no tests, no CI, no type hints, no README. The code consists of 12 Python utility scripts for PDF manipulation using pypdf, pdfplumber, reportlab, pdf2image, and Pillow. Documentation exists in SKILL.md, forms.md, and reference.md (36KB total). LICENSE.txt is present (proprietary). Code quality concerns: many scripts have no error handling, validation is basic, there is no logging, and some values are hardcoded. The scripts are functional but lack production-grade robustness. Without repository metadata, maintenance activity and dependency vulnerabilities cannot be assessed. This appears to be internal tooling rather than a maintained open-source project.
