Japanese Linguistics Resources
From RevTK
Frequency lists
- Novel word frequency list - From Michiel Kamermans (2008).
- 5000+ novels word frequency list - From cb4960 (2010).
- Goo blogs word frequency list - From Hiroshi Utsumi (2008).
- Kanji Frequency in Wikipedia - From shang.
- Kanji frequency analysis in Wikipedia - From FooSoft.
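Lists like the kanji frequency analyses above are built by counting characters across a large body of text. The following is a minimal sketch of that counting step, not the method used by shang or FooSoft. The function names are hypothetical. It assumes that only the CJK Unified Ideographs block (U+4E00 to U+9FFF) counts as kanji, so extension blocks and the iteration mark 々 are ignored.

```python
from collections import Counter


def is_kanji(ch: str) -> bool:
    # Assumption: only the CJK Unified Ideographs block (U+4E00-U+9FFF) counts
    # as kanji; extension blocks and the iteration mark 々 are not counted.
    return "\u4e00" <= ch <= "\u9fff"


def kanji_frequency(text: str) -> list[tuple[str, int]]:
    """Return (kanji, count) pairs sorted by descending count.

    Ties keep the order in which each kanji first appeared in the text.
    """
    counts = Counter(ch for ch in text if is_kanji(ch))
    return counts.most_common()


if __name__ == "__main__":
    sample = "日本語の文章を読む。日本の本を読む。"
    for kanji, count in kanji_frequency(sample):
        print(kanji, count)
```

Counting kanji is a per-character task. Word frequency lists are harder because written Japanese does not separate words with spaces, so the text must first be segmented by a morphological analyzer.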
Corpora
- Balanced Corpus of Contemporary Written Japanese (BCCWJ) - Demo page for the BCCWJ. It offers a subset of the full corpus (10 million words, without certain annotations).
- Tanaka Corpus - A corpus of Japanese-English sentence pairs used as examples in WWWJDIC. Some RevTK forum users consider it problematic because it contains errors. It has since been incorporated into the Tatoeba Project.
Concordancing
- J-KWIC Online - Online KWIC concordancer for Japanese text.
- JReK - “JReK lets you search for Japanese words and see how they are used in context within sentences on the web.”
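A KWIC (key word in context) concordancer lists every occurrence of a search term, with its surrounding context, so that the term lines up in a column. The following is a minimal sketch of the idea and is not how J-KWIC or JReK are implemented. The function `kwic` and its `width` parameter are hypothetical. Because Japanese text has no spaces between words, context is measured in characters rather than words.

```python
def kwic(text: str, keyword: str, width: int = 5) -> list[str]:
    """Return one line per occurrence of keyword, with up to `width`
    characters of left and right context (character-based, since
    Japanese text has no spaces between words)."""
    lines = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        # Pad short left contexts with full-width spaces (U+3000) so the
        # keyword column stays aligned next to full-width characters.
        padded = left.rjust(width, "\u3000")
        lines.append(f"{padded} [{keyword}] {right}")
        # Search again from the next character, so overlapping matches are kept.
        start = text.find(keyword, start + 1)
    return lines


if __name__ == "__main__":
    for line in kwic("猫が好きです。犬も好きです。", "好き", width=3):
        print(line)
```

A real concordancer would normally search a whole corpus and may match on segmented words instead of raw substrings. This sketch only scans a single string.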
Databases
- Japanese Wordnet - A lexical database that aims to be “... a large scale, freely available, semantic dictionary of Japanese.”
- Japanese FrameNet - “Japanese FrameNet aims at building a lexicon that records the valence descriptions of Japanese words, based on Frame Semantics and corpus data.”
- Asian WordNet Project
Text Analysis Tools
- AntConc - Freeware concordancer with text-mining functions.
- KH Coder - “KH Coder is a free software for Quantitative Content Analysis or Text Mining of Japanese language data.”
- Language Grid Playground - Various language services, including semantic dictionaries, translators, morphological analysis, and dependency parsing.
- Transcriber - “A tool for segmenting, labeling and transcribing speech.” Superseded by TranscriberAG.
- Translation Aggregator - A suite that combines web-based tools for translation, morphological parsing, and annotation.
External Links
- Catalogue of Language Resources and Tools in Japan - Includes information on various spoken and written corpora.
- The Monash Nihongo FTP Archive - “The archive is maintained by Jim Breen (jwb@csse.monash.edu.au) as a repository of files and software related to Japan, its people and particularly the Japanese language.”
