Public Release of HistText: A Major Milestone for Digital Historical Research

The ENP-China project is pleased to announce the public release of HistText, an innovative web-based application designed to transform how scholars explore and analyze large-scale historical text corpora. Developed through an ERC Proof of Concept grant, HistText addresses one of the core challenges in the digital humanities: making complex, multilingual, and heterogeneous textual data (newspapers, directories, periodicals, diaries) accessible and analytically usable.

HistText is more than a tool; it is a methodological advance. Featuring a user-friendly interface and powered by a robust Solr-based backend, HistText offers:

  • flexible full-text search and filtering across corpora,
  • concordance tools for contextual exploration,
  • query expansion via word embedding models,
  • named entity recognition (NER), including for non-Latin scripts and transitional Chinese,
  • interactive visualizations for pattern detection and hypothesis formation,
  • export options for use with Gephi, Cytoscape, or GIS environments.

Unlike a desktop application, HistText is a server-based platform intended for deployment by institutional IT teams, libraries, or archives. It is distributed free of charge for non-commercial use within the European Union. Full source code, documentation, and installation instructions are available here: https://github.com/BaptisteBlouin/HistText

HistText owes its technical achievement to Baptiste Blouin, the project’s computer scientist, who designed and developed both backend and frontend systems, adapted NLP models for historical Chinese texts, and implemented the application in Rust for enhanced performance and security.

Presentation videos and tutorials will be released soon.

As of today, HistText provides immediate and open access to the corpora assembled in the Modern China Text Base, a major new resource for scholars of modern East Asia and beyond. It is a concrete outcome of interdisciplinary collaboration between historians and computer scientists.

We warmly invite researchers, librarians, and digital humanists to explore and share this new platform with their networks.