Project 04 · Open Source · NLP
Arabic NLP Toolkit
Production-ready Arabic NLP for real-world text: eight Arabic varieties, sentiment, NER, morphology, Franco-Arabic transliteration, keywords, profiling, and a polished browser demo.
Most Arabic NLP libraries target Modern Standard Arabic only. arabic-nlp-toolkit is built for how Arabic is actually written — Egyptian, Gulf, Levantine, Maghrebi, Iraqi, Yemeni, and Sudanese dialects on social media, in reviews, and in Franco-Arabic chat.
The core ships with a single required dependency (Pydantic v2), rule-based models that work offline, typed JSON-serializable results, a full CLI, and a FastAPI web demo with an RTL dark UI for live playground testing.
End-to-end pipeline
Dialect detection
Confidence-ranked scores across eight Arabic varieties with Arabic display names.
Sentiment
Negation, intensifiers, and dialect-aware lexicon scoring for social text.
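To make the idea concrete, here is a minimal sketch of lexicon scoring with negation and intensifiers. It is not the toolkit's implementation: the tiny lexicon, the negator and intensifier lists, and the function name score_sentiment are all assumptions chosen for illustration, and the input is assumed to be already normalized (no diacritics or punctuation attached to tokens).

```python
# Illustrative only: toy word lists, not the toolkit's real lexicons.
LEXICON = {"رائع": 1.0, "جميل": 0.8, "حلو": 0.7, "سيء": -1.0, "وحش": -0.8}
NEGATORS = {"مش", "مو", "ما", "لا", "ليس"}
INTENSIFIERS = {"جدا": 1.5, "اوي": 1.5, "قوي": 1.5, "كتير": 1.4}


def score_sentiment(text):
    """Return (label, score) for whitespace-tokenized, pre-normalized text."""
    tokens = text.split()
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        value = LEXICON[tok]
        # A negator directly before the sentiment word flips its polarity.
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        # An intensifier directly after the word scales its weight.
        if i + 1 < len(tokens) and tokens[i + 1] in INTENSIFIERS:
            value *= INTENSIFIERS[tokens[i + 1]]
        total += value
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, total
```

A one-token window keeps the sketch readable; a dialect-aware scorer would presumably swap in per-dialect word lists and a wider negation scope.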
Named entities
Gazetteer + pattern NER for persons, locations, and organizations.
Transliteration
Franco-Arabic ↔ Arabic and Buckwalter for chat-alphabet workflows.
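For readers unfamiliar with Buckwalter, it is a one-to-one mapping between Arabic characters and ASCII symbols. The sketch below builds the standard table with the Python standard library; the function names to_buckwalter and from_buckwalter are hypothetical and are not taken from the toolkit's API.

```python
# Arabic code points U+0621..U+063A, U+0640..U+0652, and U+0670 (dagger alef),
# paired position by position with their standard Buckwalter symbols.
_ARABIC = "".join(
    chr(c) for c in [*range(0x0621, 0x063B), *range(0x0640, 0x0653), 0x0670]
)
_BUCKWALTER = (
    "'|>&<}AbptvjHxd*rzs$SDTZEg"  # hamza forms, alef .. ghain
    "_fqklmnhwYyFNKaui~o"         # tatweel, feh .. yeh, then diacritics
    "`"                           # dagger alef
)
assert len(_ARABIC) == len(_BUCKWALTER)

_TO_BW = str.maketrans(_ARABIC, _BUCKWALTER)
_FROM_BW = str.maketrans(_BUCKWALTER, _ARABIC)


def to_buckwalter(text):
    """Transliterate Arabic script to Buckwalter ASCII; other characters pass through."""
    return text.translate(_TO_BW)


def from_buckwalter(text):
    """Map Buckwalter ASCII back to Arabic script.

    Every character in the table is converted, so ordinary Latin text
    mixed into the input would be converted as well.
    """
    return text.translate(_FROM_BW)
```

Franco-Arabic (chat alphabet) conversion is a different problem: spellings such as 3 for ع or 7 for ح are ambiguous and informal, so a fixed character table like this one would not be enough.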
Morphology & POS
Roots, patterns, stemming, and Universal + Arabic POS tagging.
Keywords & profiling
TF keyword extraction, register detection, quality score, and recommendations.
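Term-frequency (TF) keyword extraction can be sketched in a few lines. This is an illustration under stated assumptions, not the toolkit's code: the stopword list is a small hand-picked sample, tokenization is plain whitespace splitting, and tf_keywords is a hypothetical name.

```python
from collections import Counter

# Small sample stopword list, assumed for illustration.
STOPWORDS = {"في", "من", "على", "و", "عن", "الى", "ان", "هذا"}


def tf_keywords(text, top_k=3):
    """Return up to top_k (word, term_frequency) pairs, most frequent first."""
    tokens = [t for t in text.split() if t not in STOPWORDS]
    if not tokens:
        return []
    counts = Counter(tokens)
    total = len(tokens)
    # Term frequency here is count divided by the number of non-stopword tokens.
    return [(word, count / total) for word, count in counts.most_common(top_k)]
```

Normalizing by the filtered token count is one possible choice; raw counts or length-independent weighting would work just as well for ranking within a single document.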
Normalization
Diacritics, alef variants, mentions, hashtags, and emoji cleanup.
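The normalization steps listed above can be approximated with the standard library alone. The sketch below is an assumption about what such a pass might look like, not the toolkit's actual rules: the function name normalize is hypothetical, and the emoji check is a rough heuristic based on Unicode categories.

```python
import re
import unicodedata

# Tashkeel U+064B..U+0652, dagger alef U+0670, and tatweel U+0640.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")
# Alef with madda, hamza above, hamza below, and alef wasla.
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625\u0671]")
_MENTION = re.compile(r"@\w+")
_HASHTAG = re.compile(r"#(\w+)")
_WHITESPACE = re.compile(r"\s+")


def _is_emoji(ch):
    # Heuristic: category "So" (Symbol, other) covers most emoji;
    # zero-width joiner and variation selector-16 glue emoji sequences.
    return unicodedata.category(ch) == "So" or ch in "\u200d\ufe0f"


def normalize(text):
    """Strip diacritics, unify alef forms, drop mentions and emoji, unwrap hashtags."""
    text = _DIACRITICS.sub("", text)
    text = _ALEF_VARIANTS.sub("\u0627", text)  # map every variant to bare alef
    text = _MENTION.sub(" ", text)             # remove @mentions entirely
    text = _HASHTAG.sub(r"\1", text)           # keep the hashtag word, drop '#'
    text = "".join(ch for ch in text if not _is_emoji(ch))
    return _WHITESPACE.sub(" ", text).strip()
```

Whether hashtags should be unwrapped or removed, and whether emoji should be dropped or kept as sentiment signals, depends on the downstream task; this sketch simply picks one option for each.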
Document export
analyze_document() → JSON-ready pipelines for APIs and ETL.
Web demo
Playground and project profile tabs. Start it with python webapp/app.py; it is served on port 8765.
pip install arabic-nlp-toolkit
from arabic_nlp import ArabicNLP
nlp = ArabicNLP()
nlp.detect_dialect("ازيك عامل ايه؟") # egyptian
nlp.sentiment("المنتج رائع جداً!") # positive
doc = nlp.analyze_document("نص كامل") # JSON export
Author & maintainer — library architecture, dialect lexicons, test suite (301+ tests), web demo, PyPI packaging, and documentation. MIT licensed, built from Egypt.