Nova Gramática do Português Contemporâneo (Cunha & Cintra, 7ª ed. 2017, Lexikon) — plain-text conversion for MASHA
Marcos d2630e3c14 init: import 1 full + 16 chunked markdown files from Cunha & Cintra PDF
Conversion via PyMuPDF with TEXT_INHIBIT_SPACES + TEXT_DEHYPHENATE flags —
795 pages, ~1.2 MB clean text.
2026-05-10 23:42:42 -03:00

Nova Gramática do Português Contemporâneo

Plain-text (Markdown) conversion of:

Celso Cunha & Lindley Cintra, Nova Gramática do Português Contemporâneo, 7th edition, 2017, Lexikon Editora Digital, ISBN 9788583000310.

Used by the MASHA project as the source of grammatical rules and weights.

Layout

text/
├── nova_gramatica_full.md         — whole book in one file (~1.2 MB)
└── pages_NNNN-MMMM.md             — 50-page chunks for easier diffing
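
As a rough illustration of the chunk naming, the sketch below derives the file names for a 795-page book split into 50-page chunks. It assumes 1-based, inclusive page numbers zero-padded to four digits, which is only inferred from the `pages_NNNN-MMMM.md` pattern. The helper names `chunk_ranges` and `chunk_filename` are hypothetical and are not taken from the MASHA scripts.

```python
def chunk_ranges(total_pages: int, chunk_size: int = 50) -> list[tuple[int, int]]:
    """Return (first_page, last_page) pairs, 1-based and inclusive (assumed convention)."""
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


def chunk_filename(first: int, last: int) -> str:
    """Build a name following the pages_NNNN-MMMM.md pattern (four-digit padding assumed)."""
    return f"pages_{first:04d}-{last:04d}.md"


if __name__ == "__main__":
    for first, last in chunk_ranges(795):
        print(chunk_filename(first, last))
```

Under these assumptions a 795-page book yields 16 chunk files, which matches the 16 chunked files mentioned in the import commit; the final chunk is shorter than 50 pages.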

Provenance

  • Source PDF: 47.8 MB, 795 pages
  • Extracted with PyMuPDF (fitz) using TEXT_DEHYPHENATE | TEXT_INHIBIT_SPACES flags to defeat Lexikon's letter-spaced typesetting (e.g. "NOVA G R A M Á T IC A" → "NOVA GRAMÁTICA")
  • Soft-hyphens (U+00AD) stripped after dehyphenation
  • See MASHA scripts/extract_grammar.py for the exact pipeline
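
The extraction steps listed above can be sketched roughly as follows. This is a minimal sketch, not the contents of scripts/extract_grammar.py: the function names `extract_pages` and `strip_soft_hyphens` are hypothetical, and the real pipeline may differ in ordering and cleanup. It assumes PyMuPDF is installed and uses only `fitz.open`, `Page.get_text`, and the two flags named in this README.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, invisible hyphenation hint left over from typesetting


def strip_soft_hyphens(text: str) -> str:
    """Remove soft hyphens that survive dehyphenation."""
    return text.replace(SOFT_HYPHEN, "")


def extract_pages(pdf_path: str) -> list[str]:
    """Return one cleaned text string per PDF page (hypothetical helper)."""
    import fitz  # PyMuPDF; imported here so the pure-text helper works without it

    # Passing flags explicitly replaces PyMuPDF's default text flags.
    flags = fitz.TEXT_DEHYPHENATE | fitz.TEXT_INHIBIT_SPACES
    with fitz.open(pdf_path) as doc:
        return [strip_soft_hyphens(page.get_text("text", flags=flags)) for page in doc]
```

Joining the returned list would give something like the single-file version, while slicing it in groups of 50 would give the chunked files; both uses are assumptions about how the repository was produced.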

License

Conversion script: CHARRUA v1.2. The source text remains the property of the original publisher (Lexikon Editora Digital). This repository is for research use by MASHA contributors.