What it does
Given a public contract in PDF or Word format, the tool parses its text and tables, runs an LLM‑driven extraction, and outputs a standardized, human‑readable summary.
- PDF/Word processing
- Document ingestion and text extraction
- LLM‑prompted summarization into a key‑value schema
- Post‑processing of dates, currency amounts, and email addresses
- Portable outputs (JSON/Markdown/CSV)
- Automatic hallucination control
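As an illustration of the post‑processing step, here is a minimal sketch of how dates, currency amounts, and email addresses might be normalized. The function names, the accepted date formats, and the rule that a comma is the decimal separator are assumptions made for this example, not the tool's actual behavior.

```python
import re
from datetime import datetime

# Deliberately simple email pattern; hypothetical, not the tool's own.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumed input formats; a real contract corpus may need more.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d.%m.%Y")


def normalize_date(raw):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_amount(raw):
    """Return the amount as a float, or None if no number is found.

    Assumption: when a comma is present it is the decimal separator and
    dots are thousands separators (e.g. "1.234.567,89").
    """
    cleaned = re.sub(r"[^\d.,]", "", raw)
    if "," in cleaned:
        cleaned = cleaned.replace(".", "").replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None


def extract_emails(raw):
    """Return all email addresses found in the text, lowercased."""
    return [match.lower() for match in EMAIL_RE.findall(raw)]
```

Returning `None` for values that cannot be parsed, rather than guessing, keeps malformed fields visible to whoever reviews the summary.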
How it works
First, we train the LLM on a subset of documents so that it learns which information to extract, how that information can appear in the documents, and the output format we expect. Once this training phase is complete, the LLM can summarize new documents in seconds.
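One possible way to represent the training subset is as pairs of document text and the expected key‑value output, stored one per line as JSON. The `TrainingExample` class, its field names, and the `to_jsonl` helper are hypothetical and only illustrate the idea; the format the tool actually uses may differ.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TrainingExample:
    """One training pair (hypothetical structure)."""
    document_text: str      # text extracted from the PDF/Word contract
    expected_output: dict   # the key-value summary we want the LLM to produce


def to_jsonl(examples):
    """Serialize examples as JSON Lines, one example per line."""
    return "\n".join(
        json.dumps(asdict(example), ensure_ascii=False) for example in examples
    )


examples = [
    TrainingExample(
        document_text="Contract awarded to ACME Ltd. on 31/12/2024.",
        expected_output={"supplier": "ACME Ltd.", "award_date": "2024-12-31"},
    ),
    TrainingExample(
        document_text="Supplier: Foo S.A.\nTotal: 10.000,00 EUR",
        expected_output={"supplier": "Foo S.A.", "amount": "10.000,00 EUR"},
    ),
]
jsonl = to_jsonl(examples)
```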
The tool also provides code to preprocess complex PDF/Word formats, and it removes hallucinated data, that is, extracted values that are not present in the original document.
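The hallucination check could be sketched as a simple grounding filter: keep an extracted value only if it appears in the source text. The `drop_unsupported` function and its matching rule (case‑insensitive substring match after collapsing whitespace) are assumptions for illustration; the tool's actual check may be stricter or more lenient.

```python
import re


def _normalize(text):
    """Collapse whitespace and ignore case so layout differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def drop_unsupported(fields, source_text):
    """Return only the fields whose value occurs in the source document.

    Hypothetical helper: a value counts as supported when its normalized
    form is a substring of the normalized source text.
    """
    haystack = _normalize(source_text)
    kept = {}
    for key, value in fields.items():
        needle = _normalize(str(value))
        if needle and needle in haystack:
            kept[key] = value
    return kept


source = "Contract awarded to ACME  Ltd.\nTotal amount: 10,000 EUR"
extracted = {
    "supplier": "Acme Ltd.",
    "amount": "10,000 EUR",
    "deadline": "2025-01-01",  # not in the source, so it is dropped
}
grounded = drop_unsupported(extracted, source)
```

A check like this has to run on the raw extracted values, before dates or amounts are reformatted, because a normalized value such as an ISO date would no longer match the wording in the document.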