Show HN: Form16x – Parse Tax PDFs into Structured JSON/CSV with a Python CLI

Hi HN,

Every year in India, salaried employees receive a tax document called Form 16 (similar to a W-2 in the US or a T4 in Canada). It’s a semi-structured PDF, but every employer generates it in a slightly different layout. Preparing annual tax returns often means manually copying numbers into filing software.

I built *Form16x*, a Python CLI + library that extracts structured data from these PDFs into JSON/CSV. Beyond extraction, it can:

- Consolidate multiple Form 16s (if you switched jobs in a year)
- Calculate taxes under both regimes and recommend the better option
- Show detailed salary breakdowns in the terminal (tree view, percentages, colored output)
- Suggest tax optimizations (80C, 80D, NPS, etc.) with potential savings
- Expose a Python API (`TaxCalculationAPI`) for programmatic use, with multi-year tax rules (AY 2020–2025)

*Why I built it*

I was frustrated with manually typing this data into filing portals, so I wrote my own parser + calculator. I open-sourced it in case others dealing with semi-structured PDFs (whether tax forms or other domains) find the approach useful.

*Technical notes*

- Built on top of camelot-py/pdfplumber with custom routing logic and confidence scoring
- CLI uses `rich` for colored output and breakdown trees
- Designed for offline-only use (no uploads, no external APIs)
- Outputs JSON/CSV aligned to the official form structure

Would love feedback from the HN community, both on the technical design and on whether there’s interest in extending this to other countries’ forms (W-2, T4, etc.).

Thanks!

Comments URL: https://news.ycombinator.com/item?id=45213416
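
For readers wondering what "routing logic and confidence scoring" across several PDF backends might look like, here is a minimal sketch. It is not Form16x's actual code. The field names, the `route` and `confidence` functions, and the scoring rule (the share of expected fields that parse as numbers) are all assumptions made for illustration, and the extractors are stand-in callables rather than real camelot-py or pdfplumber calls.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical field names; the real Form16x schema may differ.
EXPECTED_FIELDS = ("gross_salary", "total_tds", "taxable_income")

# An extractor takes a PDF path and returns raw field strings.
Extractor = Callable[[str], Dict[str, str]]


def parse_amount(raw: str) -> Optional[float]:
    """Parse strings like '12,00,000.00' into a float, or return None."""
    try:
        return float(raw.replace(",", "").strip())
    except (AttributeError, ValueError):
        return None


def confidence(fields: Dict[str, str]) -> float:
    """Fraction of expected fields that are present and parse as numbers."""
    hits = sum(
        1 for name in EXPECTED_FIELDS
        if parse_amount(fields.get(name, "")) is not None
    )
    return hits / len(EXPECTED_FIELDS)


def route(
    pdf_path: str, extractors: Dict[str, Extractor]
) -> Tuple[str, Dict[str, float], float]:
    """Run every extractor and keep the highest-confidence result."""
    best_name, best_fields, best_score = "", {}, -1.0
    for name, extract in extractors.items():
        try:
            fields = extract(pdf_path)
        except Exception:
            continue  # one backend failing on a layout should not abort the run
        score = confidence(fields)
        if score > best_score:
            best_name, best_fields, best_score = name, fields, score
    # Keep only the expected fields that parsed cleanly.
    parsed = {
        key: value for key in EXPECTED_FIELDS
        if (value := parse_amount(best_fields.get(key, ""))) is not None
    }
    return best_name, parsed, best_score


if __name__ == "__main__":
    # Stand-in backends returning canned data instead of reading a PDF.
    fake_backends = {
        "tables": lambda path: {"gross_salary": "12,00,000.00", "total_tds": "n/a"},
        "text": lambda path: {
            "gross_salary": "1200000",
            "total_tds": "95,000",
            "taxable_income": "10,50,000",
        },
    }
    print(route("form16.pdf", fake_backends))
```

A real scorer would probably also cross-check totals (for example, that the components sum to the gross figure) rather than only counting parseable fields, but the selection step would keep the same shape.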