
file-to-markdown


Updated: March 4, 2026

File to Markdown — Skill

Overview

Convert files into clean, structured, AI-ready Markdown using the markdown.new API powered by Cloudflare Workers AI toMarkdown().

Supports 20+ formats including documents, spreadsheets, images, and structured data.

No authentication required (500 requests/day per IP).


When to Use This Skill

Use this skill whenever you need to:

  • Extract text from files for LLM processing
  • Convert PDFs or Office files into Markdown
  • Normalize data into structured text
  • Process uploaded user files
  • Scrape webpage content into Markdown
  • Convert images into AI-generated descriptions + content

Common AI workflows:

  • RAG ingestion pipelines
  • Knowledge base creation
  • Document summarization
  • Dataset extraction
  • Spreadsheet analysis
  • OCR-like extraction from images

Supported Formats

Documents

  • .pdf
  • .docx
  • .odt

Spreadsheets

  • .xlsx
  • .xls
  • .xlsm
  • .xlsb
  • .et
  • .ods
  • .numbers

Images

  • .jpg
  • .jpeg
  • .png
  • .webp
  • .svg

Text & Structured Data

  • .txt
  • .md
  • .csv
  • .json
  • .xml
  • .html
  • .htm

Notes:

  • Image conversion uses AI object detection + summarization.
  • HTML URL conversion uses a web page pipeline.
  • Uploaded HTML uses Workers AI conversion.
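
The format lists above can double as a pre-flight check before spending a request. The following is a minimal sketch: the names `SUPPORTED_EXTENSIONS` and `is_supported` are illustrative and not part of the API, and the set simply mirrors the lists in this document, so the service itself may accept more or fewer types.

```python
from pathlib import PurePosixPath

# Extensions copied from the lists above; the service may accept more or fewer.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".odt",
    ".xlsx", ".xls", ".xlsm", ".xlsb", ".et", ".ods", ".numbers",
    ".jpg", ".jpeg", ".png", ".webp", ".svg",
    ".txt", ".md", ".csv", ".json", ".xml", ".html", ".htm",
}


def is_supported(filename: str) -> bool:
    """Return True if the filename's extension appears in the lists above."""
    return PurePosixPath(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

The check is case-insensitive and looks only at the final extension, so a name such as `archive.tar.gz` is judged by `.gz`.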

API Base URL

https://markdown.new

Endpoints

1️⃣ Convert Remote File (Simple GET)

Returns plain Markdown text.

GET /:file-url

Example:

curl -s "https://markdown.new/https://example.com/report.pdf"

2️⃣ Convert Remote File (JSON Response)

Returns metadata + Markdown.

GET /:file-url?format=json

Example:

curl -s "https://markdown.new/https://example.com/report.pdf?format=json"
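
For agents written in Python, the two GET endpoints above might be wrapped as follows. This is a sketch using only the standard library. The helper names (`build_convert_url`, `convert_remote`) and the 30-second timeout are assumptions made for illustration, and the URL is built by plain concatenation because that is what the curl examples show.

```python
import json
import urllib.request

BASE_URL = "https://markdown.new"


def build_convert_url(file_url: str, as_json: bool = False) -> str:
    """Append the target URL to the base URL, as in the curl examples above."""
    url = f"{BASE_URL}/{file_url}"
    if as_json:
        # Mirrors the documented example, which appends ?format=json verbatim.
        # Assumption: the target URL has no query string of its own.
        url += "?format=json"
    return url


def convert_remote(file_url: str, as_json: bool = False, timeout: float = 30.0):
    """Fetch the conversion; returns str (Markdown) or dict (JSON response)."""
    request_url = build_convert_url(file_url, as_json)
    with urllib.request.urlopen(request_url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if as_json else body
```

Calling `convert_remote("https://example.com/report.pdf")` would return the Markdown text, and passing `as_json=True` would return the parsed metadata dictionary instead.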

3️⃣ Convert Remote File via POST

Use when you want a structured JSON response.

POST /
Content-Type: application/json

Body:

{
  "url": "https://example.com/report.pdf"
}

Example:

curl -s https://markdown.new/ \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/report.pdf"}'

4️⃣ Upload Local File

Use when the file is not publicly accessible.

POST /convert
multipart/form-data

Example:

curl -s https://markdown.new/convert \
  -F "file=@document.pdf"
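
The same upload could be done from Python along these lines. This is a sketch that assumes the third-party `requests` package is installed. The function name `upload_file`, the injectable `post` parameter, and the 60-second timeout are illustrative choices; only the `/convert` path and the `file` form field come from the curl example above.

```python
from pathlib import Path
from typing import Any, Callable, Optional

UPLOAD_URL = "https://markdown.new/convert"


def upload_file(path: str, post: Optional[Callable[..., Any]] = None) -> dict:
    """Upload a local file as multipart/form-data and return the parsed JSON."""
    if post is None:
        import requests  # third-party; imported lazily so a fake can be injected
        post = requests.post
    file_path = Path(path)
    with file_path.open("rb") as handle:
        # "file" is the form field name used in the curl example above.
        response = post(UPLOAD_URL, files={"file": (file_path.name, handle)}, timeout=60)
    response.raise_for_status()  # surface HTTP errors before parsing
    return response.json()
```

Passing a stand-in for `post` makes the function testable without touching the network or the daily quota.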

Response Formats

URL Conversion Response

{
  "success": true,
  "url": "https://example.com/report.pdf",
  "title": "Quarterly Report",
  "content": "# Quarterly Report\n\n...",
  "method": "Workers AI (file)",
  "duration_ms": 1200,
  "tokens": 850
}

Upload Conversion Response

{
  "success": true,
  "data": {
    "title": "Q4 Report",
    "content": "# Q4 Report\n\n...",
    "filename": "report.xlsx",
    "file_type": ".xlsx",
    "tokens": 1250,
    "processing_time_ms": 320
  }
}
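
The two response shapes above differ in where `content` lives: URL conversions return it at the top level, while uploads nest it under `data`. A small normalizer, sketched here under the assumption that the examples above are representative, lets downstream code ignore the difference. The name `extract_markdown` is illustrative.

```python
def extract_markdown(response: dict) -> str:
    """Return the Markdown from either documented response shape."""
    if not response.get("success"):
        raise ValueError("conversion did not report success")
    # Upload responses nest fields under "data"; URL responses are flat.
    payload = response.get("data", response)
    return payload.get("content", "")
```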

Best Practices for AI Agents

Prefer GET for Simple Workflows

Use:

GET /:url

When:

  • You only need Markdown text
  • Speed is important
  • No metadata required

Prefer POST for Structured Pipelines

Use POST when:

  • Metadata is needed
  • Token counts are required
  • Requests are monitored or logged
  • The call is part of an automation workflow

File Upload Strategy

Use /convert only if:

  • File is local
  • File is private
  • File requires authentication to access

Otherwise, prefer URL conversion.


Error Handling Strategy

Agents should:

  1. Check that the response contains "success": true
  2. Retry once on network failure
  3. Validate that the content length is greater than 0
  4. Fall back to an alternate extraction method if needed
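
The four steps above can be sketched as one wrapper. Here `fetch` and `fallback` are hypothetical hooks supplied by the caller: `fetch` performs a single request and returns the parsed JSON, and `fallback` is whatever alternate extractor the agent has available. Treating `OSError` as the network-failure signal is an assumption; it covers the built-in connection and timeout errors as well as the exceptions raised by `urllib` and `requests`.

```python
from typing import Callable, Optional


def convert_with_checks(
    fetch: Callable[[], dict],
    fallback: Optional[Callable[[], str]] = None,
    retries: int = 1,
) -> str:
    """Apply the four checks above to a conversion call."""
    for _attempt in range(retries + 1):
        try:
            response = fetch()
        except OSError:
            # Network-level failure: try again, up to `retries` extra attempts.
            continue
        if response.get("success"):
            # Upload responses nest fields under "data"; URL responses are flat.
            payload = response.get("data", response)
            content = payload.get("content", "")
            if len(content) > 0:
                return content
        break  # the service answered but the result is unusable; do not retry
    if fallback is not None:
        return fallback()
    raise RuntimeError("conversion failed and no fallback was provided")
```

Only network failures trigger a retry. An unsuccessful or empty response goes straight to the fallback, which avoids spending a second request from the daily quota on a file the service has already declined.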

Rate Limits

  • 500 requests/day per IP without an API key
  • No signup required

Agents should:

  • Cache results when possible
  • Avoid duplicate conversions
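
One way to follow the caching advice is to memoize conversions by URL. The sketch below keeps results in memory only. The class name `CachedConverter` and the injected `convert` callable (any function that takes a URL and returns Markdown) are illustrative and not part of the API.

```python
from typing import Callable, Dict


class CachedConverter:
    """Memoize conversions by URL so repeated requests do not spend quota."""

    def __init__(self, convert: Callable[[str], str]) -> None:
        self._convert = convert  # hypothetical: any function URL -> Markdown
        self._cache: Dict[str, str] = {}

    def get(self, url: str) -> str:
        # Only call the underlying converter the first time a URL is seen.
        if url not in self._cache:
            self._cache[url] = self._convert(url)
        return self._cache[url]
```

An in-memory dictionary is lost when the process exits, so a long-running pipeline would probably want to persist results on disk or in a shared store instead.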

Integration Examples

JavaScript (Node.js)

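// Top-level await: run as an ES module (.mjs) on Node.js 18+, where fetch is built in.
const res = await fetch("https://markdown.new/", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    url: "https://example.com/file.pdf"
  })
});

// Fail early on HTTP errors before parsing the body.
if (!res.ok) {
  throw new Error(`Conversion request failed: HTTP ${res.status}`);
}

const data = await res.json();
console.log(data.content);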

Python

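import requests

# Ask markdown.new to convert a remote file; the timeout value is an arbitrary choice.
res = requests.post(
    "https://markdown.new/",
    json={"url": "https://example.com/file.pdf"},
    timeout=30,
)
res.raise_for_status()  # fail early on HTTP errors

data = res.json()
print(data["content"])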

Agent Decision Tree

If the user provides:

| Input Type      | Action                 |
|-----------------|------------------------|
| Public file URL | Use GET or POST        |
| Local file      | Use POST /convert      |
| Image           | Convert then summarize |
| Spreadsheet     | Convert then analyze   |
| Webpage         | Convert URL HTML       |
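
The routing part of the table above (URL versus local file) can be sketched as a small function. The name `choose_action` and its return strings are illustrative. Treating anything without an `http` or `https` scheme as a local path is an assumption, and the follow-up steps for images and spreadsheets (summarize, analyze) are left to the caller.

```python
from urllib.parse import urlparse


def choose_action(user_input: str) -> str:
    """Map an input string to the endpoint suggested by the table above."""
    scheme = urlparse(user_input).scheme.lower()
    if scheme in ("http", "https"):
        return "GET or POST /"  # public URL (file or webpage)
    return "POST /convert"  # anything else is treated as a local path
```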

Output Expectations

The Markdown should be:

  • Clean
  • Structured
  • AI-friendly
  • Minimal noise
  • Ready for LLM ingestion

Limitations

  • Complex PDF layouts may lose formatting
  • Large spreadsheets may be truncated
  • Images rely on AI interpretation accuracy
  • Token limits may apply

Summary

This skill provides a universal file-to-Markdown conversion layer for AI systems with:

  • No authentication
  • Simple HTTP interface
  • Multi-format support
  • Structured output
  • Fast processing

Ideal for document ingestion, RAG pipelines, and automation agents.