Automatic Contract Summarization

An open-source experimental tool by Development Gateway that extracts and summarizes key details from public contracts into a consistent structured format.

What it does

Given a PDF/Word public contract, the tool parses text and tables, runs an LLM‑driven extraction, and outputs a standardized human‑readable text summary.

  • PDF/Word processing
  • Document ingestion and text extraction
  • LLM‑prompted summarization to a key+values schema
  • Post‑processing for dates, currency, emails
  • Portable outputs (JSON/Markdown/CSV)
  • Automatic hallucination control

How it works

First we train the LLM with a subset of documents so it learns what is the information we want to extract, how it can appear in the documents and what is the output format we expect. Once this training phase is completed we can summarize new documents with this LLM in seconds.

The tool provides code to preprocess complex PDF/Word formats. Also, it removes hallucinated data not present in the original document.

Get started

Head to the Install page or read the Guide (live from the README).